If so, you can download any of the versions below for testing. The product functions as normal apart from an evaluation limitation. At the time of purchase we provide a license file via email that allows the product to work at full capacity. If you would also like an evaluation license to test without any restrictions for 30 days, please follow the directions provided here.
If you experience errors when you try to download a file, make sure your network policies (enforced by your company or ISP) allow downloading ZIP and/or MSI files.

Installation
The package is available on PyPI and can be installed via pip by executing the following command:
pip install groupdocs-parser-cloud

Requirements
Dependencies
The SDK automatically installs the following packages:
| Package | Constraint |
|---|---|
| urllib3 | >= 1.15 |
| six | >= 1.10 |
| certifi | — |
| python-dateutil | — |
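To confirm that the dependencies listed above meet their minimum versions in the current environment, a small check like the following can help. It is a sketch rather than part of the SDK: the helper names `parse_version`, `meets_minimum`, and `check_dependencies` are made up for this example, and the version parsing only handles plain dotted numeric versions.

```python
from importlib import metadata

# Minimum versions taken from the table above; None means no constraint.
MINIMUMS = {
    "urllib3": "1.15",
    "six": "1.10",
    "certifi": None,
    "python-dateutil": None,
}


def parse_version(text):
    """Turn '1.26.4' into (1, 26, 4); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    if minimum is None:
        return True
    # Compare as integer tuples so that 1.10 sorts after 1.9.
    return parse_version(installed) >= parse_version(minimum)


def check_dependencies(minimums=MINIMUMS):
    """Map each package name to 'ok', 'too old', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else "too old"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(package + ": " + status)
```

Since pip resolves these constraints during installation, a check like this is mainly useful when diagnosing an environment that was modified afterwards.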
GroupDocs.Parser Cloud SDK for Python empowers developers to integrate advanced document parsing and data extraction into Python web apps, scripts, and automation workflows. Extract text, images, metadata, and structured data from over 70 file formats — including Word, Excel, PDF, presentations, emails, archives, and eBooks. Define custom extraction templates to pull text fields, numbers, and tables from invoices, forms, and business documents. Whether parsing a single file or processing container items from ZIP archives, PST/OST mail stores, or PDF portfolios, GroupDocs.Parser delivers accurate, scalable tools for cloud-based document intelligence.
Extract plain text - Extract text content from documents in a simple form.
Extract formatted text - Extract text while preserving original formatting.
Extract text by page range - Extract text from specific pages only.
Extract text from containers - Extract text from documents inside ZIP archives, PST/OST files, and PDF portfolios.
Extract all images - Extract every embedded image from a whole document.
Extract images by page range - Extract images from specific pages based on a page range.
Extract images from containers - Extract images from documents inside container files.
Template-Based Parsing
Parse by template - Parse documents using user-defined templates for structured data extraction.
Create or update templates - Define and store extraction templates in cloud storage.
Get and delete templates - Retrieve or remove templates stored in user storage.
Parse by template object - Pass a template definition directly in the API request.
Get document information - Retrieve file extension, size in bytes, and page count.
Get container items information - List items within ZIP archives, PDF portfolios, and mail stores.
Get supported file formats - Retrieve the full list of supported parsing formats.
File Operations
Upload Files to Cloud - Upload files to cloud storage via the API.
Download Files from Cloud - Download files from cloud storage to local systems.
Copy Files - Copy files within the cloud storage to different locations.
Move Files - Move files between folders in cloud storage.
Delete Files - Delete specific files from cloud storage.
Folder Operations
Create Folder - Create new folders in the cloud storage.
Copy Folder - Duplicate folders within the cloud storage.
Move Folder - Move folders between directories in cloud storage.
Delete Folder - Remove entire folders from cloud storage.
Licensing and Authentication
Evaluation Mode - Try the API with a free trial account.
Secure Authentication - Use Client ID and Client Secret for secure API access.
MIT License - The Python SDK is licensed under the MIT License.
GroupDocs.Parser Cloud supports 70+ file formats with text extraction, image extraction, and template-based parsing capabilities:
- Word Processing: DOC, DOCX, DOCM, DOT, DOTX, DOTM, TXT, RTF, ODT, OTT
- PDF: PDF
- Markup: HTML, XHTML, MHTML, MD, XML
- eBooks: CHM, EPUB, FB2
- Spreadsheets: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, ODS, OTS, CSV, XLA, XLAM, NUMBERS
- Presentations: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM, ODP, OTP
- Emails: PST, OST, EML, EMLX, MSG
- Notes: ONE (Microsoft OneNote)
- Archives: ZIP
Supported operations vary by format. For the complete format matrix, see the documentation.
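Because supported operations depend on the format, it can be useful to check a file's extension against the list above before sending a request. The following is a minimal sketch: the `FORMATS` dictionary simply copies the list above, and `category_for` is a hypothetical helper, not an SDK function. It says nothing about which operations a given format supports.

```python
import os

# Extensions copied from the format list above, grouped by category.
FORMATS = {
    "Word Processing": "DOC DOCX DOCM DOT DOTX DOTM TXT RTF ODT OTT",
    "PDF": "PDF",
    "Markup": "HTML XHTML MHTML MD XML",
    "eBooks": "CHM EPUB FB2",
    "Spreadsheets": "XLS XLT XLSX XLSM XLSB XLTX XLTM ODS OTS CSV XLA XLAM NUMBERS",
    "Presentations": "PPT PPS POT PPTX PPTM POTX POTM PPSX PPSM ODP OTP",
    "Emails": "PST OST EML EMLX MSG",
    "Notes": "ONE",
    "Archives": "ZIP",
}

# Reverse lookup: lower-case extension -> category name.
EXTENSION_TO_CATEGORY = {
    ext.lower(): category
    for category, extensions in FORMATS.items()
    for ext in extensions.split()
}


def category_for(file_path):
    """Return the category for a file path, or None if the extension is not listed."""
    extension = os.path.splitext(file_path)[1].lstrip(".").lower()
    return EXTENSION_TO_CATEGORY.get(extension)
```

A static table like this can drift out of date; the supported formats call of the Info API remains the authoritative source at run time.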
Quick Start
Get your API credentials
To use GroupDocs.Parser Cloud, sign up at GroupDocs.Cloud Dashboard and get your Client ID and Client Secret.
Initialize the API
Use the following code to start using the GroupDocs.Parser Cloud SDK for Python:
import groupdocs_parser_cloud
# Get your ClientId and ClientSecret at https://dashboard.groupdocs.cloud
client_id = "YourClientId"
client_secret = "YourClientSecret"
# Create API configuration
configuration = groupdocs_parser_cloud.Configuration(client_id, client_secret)
configuration.api_base_url = "https://api.groupdocs.cloud"
# Create instance of the Parse API
parse_api = groupdocs_parser_cloud.ParseApi.from_config(configuration)
Once initialized, use this basic example to extract text from a document in cloud storage:
import groupdocs_parser_cloud
client_id = "YourClientId"
client_secret = "YourClientSecret"
parse_api = groupdocs_parser_cloud.ParseApi.from_keys(client_id, client_secret)
options = groupdocs_parser_cloud.TextOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "email/eml/embedded-image-and-attachment.eml"
request = groupdocs_parser_cloud.TextRequest(options)
result = parse_api.text(request)
print("Text: " + result.text)
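The snippets above hard-code the credentials for brevity. One common alternative is to read them from environment variables so they stay out of source control. This is a sketch only: the variable names `GROUPDOCS_CLIENT_ID` and `GROUPDOCS_CLIENT_SECRET` and the helper `load_credentials` are assumptions made for this example, not names the SDK looks for.

```python
import os


def load_credentials(environ=None):
    """Return (client_id, client_secret) from the environment, or raise ValueError."""
    env = os.environ if environ is None else environ
    # Hypothetical variable names chosen for this example.
    client_id = env.get("GROUPDOCS_CLIENT_ID", "").strip()
    client_secret = env.get("GROUPDOCS_CLIENT_SECRET", "").strip()
    missing = [
        name
        for name, value in (
            ("GROUPDOCS_CLIENT_ID", client_id),
            ("GROUPDOCS_CLIENT_SECRET", client_secret),
        )
        if not value
    ]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return client_id, client_secret
```

The returned pair can then be passed to `ParseApi.from_keys(client_id, client_secret)` exactly as in the example above.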
With this quick start guide, you’re all set to begin parsing documents using GroupDocs.Parser Cloud in your Python applications. For more details, visit the documentation.
Retrieve the full list of supported file formats available through the Parser API.
import groupdocs_parser_cloud
info_api = groupdocs_parser_cloud.InfoApi.from_keys("YourClientId", "YourClientSecret")
result = info_api.get_supported_file_formats()
for fmt in result.formats:
    print(fmt.file_format)
Parse Document by Template
Parse a document using a user-defined template stored in cloud storage to extract structured fields and tables.
import groupdocs_parser_cloud
parse_api = groupdocs_parser_cloud.ParseApi.from_keys("YourClientId", "YourClientSecret")
options = groupdocs_parser_cloud.ParseOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "words-processing/docx/companies.docx"
options.template_path = "templates/companies.json"
request = groupdocs_parser_cloud.ParseRequest(options)
result = parse_api.parse(request)
for data in result.fields_data:
    if data.page_area.page_text_area is not None:
        print("Field name: " + data.name + ". Text: " + data.page_area.page_text_area.text)
    if data.page_area.page_table_area is not None:
        print("Table name: " + data.name)
        for cell in data.page_area.page_table_area.page_table_area_cells:
            print("Row " + str(cell.row_index) + " column " + str(cell.column_index) + ": " + cell.page_area.page_text_area.text)
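For downstream processing it is often more convenient to collect the parsed fields into plain Python structures than to print them. The sketch below assumes the result has the attribute layout used in the loop above (`fields_data`, `page_area`, `page_text_area`, `page_table_area`, `page_table_area_cells`, `row_index`, `column_index`); `fields_to_dict` is a hypothetical helper, not an SDK function.

```python
def fields_to_dict(fields_data):
    """Map each field name to its text, or to a {(row, column): text} dict for tables.

    If a field carries both a text area and a table area, the table wins.
    """
    collected = {}
    for data in fields_data or []:
        area = data.page_area
        if area is None:
            continue
        if area.page_text_area is not None:
            collected[data.name] = area.page_text_area.text
        if area.page_table_area is not None:
            cells = {}
            for cell in area.page_table_area.page_table_area_cells:
                text_area = cell.page_area.page_text_area
                # Cells without a text area are kept, with None as their value.
                cells[(cell.row_index, cell.column_index)] = (
                    text_area.text if text_area is not None else None
                )
            collected[data.name] = cells
    return collected
```

With the request from the example above, this would be called as `fields_to_dict(result.fields_data)`.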
Extract all embedded images from a document and retrieve their cloud storage paths and download URLs.
import groupdocs_parser_cloud
parse_api = groupdocs_parser_cloud.ParseApi.from_keys("YourClientId", "YourClientSecret")
options = groupdocs_parser_cloud.ImagesOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "slides/three-slides.pptx"
request = groupdocs_parser_cloud.ImagesRequest(options)
result = parse_api.images(request)
for image in result.images:
    print("Image path: " + image.path + ". Download url: " + image.download_url)
    print("Format: " + image.file_format + ". Page index: " + str(image.page_index))
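When a document contains many images, grouping them by page can make the result easier to work with. A minimal sketch, assuming each image exposes `path` and `page_index` as in the loop above; `images_by_page` is a hypothetical helper, not an SDK function.

```python
from collections import defaultdict


def images_by_page(images):
    """Return {page_index: [image paths]} preserving the order of the input."""
    pages = defaultdict(list)
    for image in images or []:
        pages[image.page_index].append(image.path)
    return dict(pages)
```

With the request from the example above, this would be called as `images_by_page(result.images)`.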
Retrieve metadata about a document such as page count and file properties.
import groupdocs_parser_cloud
info_api = groupdocs_parser_cloud.InfoApi.from_keys("YourClientId", "YourClientSecret")
options = groupdocs_parser_cloud.InfoOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "words-processing/docx/password-protected.docx"
options.file_info.password = "password"
request = groupdocs_parser_cloud.GetInfoRequest(options)
result = info_api.get_info(request)
print("Page count: " + str(result.page_count))
List items within container files such as ZIP archives or mail stores.
import groupdocs_parser_cloud
info_api = groupdocs_parser_cloud.InfoApi.from_keys("YourClientId", "YourClientSecret")
options = groupdocs_parser_cloud.ContainerOptions()
options.file_info = groupdocs_parser_cloud.FileInfo()
options.file_info.file_path = "containers/archive/zip.zip"
request = groupdocs_parser_cloud.ContainerRequest(options)
result = info_api.container(request)
for item in result.container_items:
    print("Name: " + item.name + ". FilePath: " + item.file_path)
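A typical next step is to pick only some of the listed items, for example attachments of a given type, before processing them further. A minimal sketch, assuming each item exposes `name` and `file_path` as in the loop above; `items_with_extension` is a hypothetical helper, not an SDK function.

```python
def items_with_extension(container_items, extensions):
    """Return the file_path of every item whose name ends with one of the extensions."""
    # Normalize "PDF", ".pdf", and "pdf" to the same suffix form: ".pdf".
    wanted = tuple("." + ext.lower().lstrip(".") for ext in extensions)
    return [
        item.file_path
        for item in container_items or []
        if item.name and item.name.lower().endswith(wanted)
    ]
```

With the request from the example above, `items_with_extension(result.container_items, ["pdf", "docx"])` would return the paths of the PDF and DOCX items in the archive.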
Sample Projects on GitHub
The GroupDocs.Parser Cloud Python Samples repository includes ready-to-run examples covering:
| Category | Examples |
|---|---|
| Info Operations | Supported file formats, document information, container items information |
| Parse Operations — Extract Text | Extract text from whole document, formatted text, text by page range, text from container |
| Parse Operations — Extract Images | Extract images from whole document, images by page range, images from container |
| Parse Operations — Parse by Template | Parse by template in user storage, template defined as object, parse document inside container |
| Template Operations | Create or update template, get template, delete template |
How to run the examples
- Clone or download the samples repository
- Edit RunExamples.py and set your app_sid and app_key
- Go to the Examples directory
- Run pip install groupdocs-parser-cloud -U
- Execute python RunExamples.py
For more details, visit Getting Started.
