Maven
<dependency>
<groupId>com.groupdocs</groupId>
<artifactId>groupdocs-parser-cloud</artifactId>
<version>22.12</version>
</dependency>
Gradle
compile(group: 'com.groupdocs', name: 'groupdocs-parser-cloud', version: '22.12')
Ivy
<dependency org="com.groupdocs" name="groupdocs-parser-cloud" rev="22.12">
<artifact name="groupdocs-parser-cloud" ext="jar"/>
</dependency>
SBT
libraryDependencies += "com.groupdocs" % "groupdocs-parser-cloud" % "22.12"
Document Parser Java Cloud REST API
Product Page | Docs | Live Demos | Swagger UI | Code Samples | Blog | Free Support | Free Trial
GroupDocs.Parser Cloud SDK for Java helps you build cloud-based document parsing apps in Java without installing any third-party software. It is a wrapper around the GroupDocs.Parser Cloud REST APIs.
Cloud Document Parsing SDK Features
- Create user-defined data extraction templates to extract data from cloud documents.
- Retrieve user-defined templates created for parsing cloud data.
- Extract text from cloud-hosted files in several ways:
  - Extract text in simple form.
  - Extract text while keeping the formatting intact.
  - Extract text from specific pages only by providing a page range.
- Extract images from files hosted on the cloud:
  - Extract all images from the whole cloud document.
  - Extract images from specific pages based on a page range.
- Get a list of all supported file formats.
- Fetch useful information about a cloud document, such as:
  - File extension
  - Size in bytes
  - Page count
- Retrieve information about the items within a container, such as a ZIP archive or a PDF portfolio.
- Use the built-in cloud storage API to work with files and folders in cloud storage.
Supported Document Parsing File Formats
Microsoft Word®: DOC, DOT, DOCX, DOCM, DOTX, DOTM, TXT, RTF
OpenOffice Writer®: ODT, OTT
Microsoft Excel®: XLS, XLT, XLSX, XLSM, XLSB, XLTX, XLTM, CSV, XLA, XLAM
OpenOffice Calc®: ODS, OTS
Apple® iWork: NUMBERS
Microsoft PowerPoint®: PPT, PPS, POT, PPTX, PPTM, POTX, POTM, PPSX, PPSM
OpenOffice Impress®: ODP, OTP
Microsoft Outlook®: PST, OST, EML, MSG
Apple® Mail: EMLX
Microsoft OneNote®: ONE
Markup: HTML, XHTML, MHTML, MD (Markdown), XML
eBooks: CHM, EPUB, FB2
Fixed Layout: PDF
Archives: ZIP
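As an illustration of how the list above might be used on the client side, the following sketch checks a file name against a static snapshot of those extensions before a document is sent for parsing. `SupportedFormats` and `isSupported` are hypothetical names and are not part of the SDK; the API's own supported-formats call remains the authoritative source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of the SDK: a static snapshot of the
// extensions listed in this document.
class SupportedFormats {
    private static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
        "doc", "dot", "docx", "docm", "dotx", "dotm", "txt", "rtf",
        "odt", "ott",
        "xls", "xlt", "xlsx", "xlsm", "xlsb", "xltx", "xltm", "csv", "xla", "xlam",
        "ods", "ots",
        "numbers",
        "ppt", "pps", "pot", "pptx", "pptm", "potx", "potm", "ppsx", "ppsm",
        "odp", "otp",
        "pst", "ost", "eml", "msg",
        "emlx",
        "one",
        "html", "xhtml", "mhtml", "md", "xml",
        "chm", "epub", "fb2",
        "pdf",
        "zip"));

    /** Returns true when the file name ends with a listed extension (case-insensitive). */
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension present
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("pdf/four-pages.pdf")); // prints true
        System.out.println(isSupported("setup.exe"));          // prints false
    }
}
```

A hard-coded set like this can drift out of date, so it is best treated as a quick pre-check rather than a replacement for querying the service.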
Requirements
Building the API client library requires:
- Java 1.7+
- Maven
Prerequisites
To use GroupDocs.Parser Cloud SDK for Java, register an account with GroupDocs Cloud, then look up or create your Client ID and Client Secret in the Cloud Dashboard. A free quota is available. For more details, see GroupDocs Cloud Pricing.
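Once you have a Client ID and Client Secret, one common way to keep them out of source code is to read them from environment variables. The sketch below assumes two variable names, `GROUPDOCS_CLIENT_ID` and `GROUPDOCS_CLIENT_SECRET`; these names and the `Credentials` class are hypothetical and are not defined by the SDK.

```java
import java.util.Map;

// Hypothetical holder class, not part of the SDK.
class Credentials {
    final String clientId;
    final String clientSecret;

    Credentials(String clientId, String clientSecret) {
        this.clientId = clientId;
        this.clientSecret = clientSecret;
    }

    /**
     * Reads both values from the given environment map.
     * The variable names are assumptions chosen for this example.
     */
    static Credentials fromEnv(Map<String, String> env) {
        String id = env.get("GROUPDOCS_CLIENT_ID");
        String secret = env.get("GROUPDOCS_CLIENT_SECRET");
        if (id == null || id.isEmpty() || secret == null || secret.isEmpty()) {
            throw new IllegalStateException(
                "Set GROUPDOCS_CLIENT_ID and GROUPDOCS_CLIENT_SECRET");
        }
        return new Credentials(id, secret);
    }

    public static void main(String[] args) {
        // Fails fast with a clear message when either variable is missing.
        Credentials c = fromEnv(System.getenv());
        System.out.println("Loaded credentials for client " + c.clientId);
    }
}
```

The two fields can then be passed to the SDK's `Configuration` constructor in place of string literals.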
Install GroupDocs.Parser-Cloud from Maven
Add the GroupDocs Cloud repository to your application's pom.xml:
<repository>
<id>repository.groupdocs.cloud</id>
<name>repository.groupdocs.cloud</name>
<url>https://releases.groupdocs.cloud/java/repo/</url>
</repository>
Install from source
To install the API client library to your local Maven repository, simply execute:
mvn clean install
To deploy it to a remote Maven repository instead, configure the settings of the repository and execute:
mvn clean deploy
Refer to the OSSRH Guide for more information.
Maven users
Add this dependency to your project's POM:
<dependency>
<groupId>com.groupdocs</groupId>
<artifactId>groupdocs-parser-cloud</artifactId>
<version>22.12</version>
</dependency>
Others
First, generate the JAR by executing:
mvn clean package
Then manually install the following JARs:
target/groupdocs-parser-cloud-22.12.jar
target/lib/*.jar
Get Started
Please follow the Quick Start instructions.
Extract Text by a Page Number Range via Java Cloud SDK
// For complete examples and data files, see https://github.com/groupdocs-parser-cloud/groupdocs-parser-cloud-java-samples
import com.groupdocs.cloud.parser.api.ParseApi;
import com.groupdocs.cloud.parser.client.Configuration;
import com.groupdocs.cloud.parser.model.*;
import com.groupdocs.cloud.parser.model.requests.TextRequest;

// Get both values from https://dashboard.groupdocs.cloud
String MyAppKey = ""; // Client Secret
String MyAppSid = ""; // Client ID
Configuration configuration = new Configuration(MyAppSid, MyAppKey);
ParseApi apiInstance = new ParseApi(configuration);

// Path of the document in cloud storage
FileInfo fileInfo = new FileInfo();
fileInfo.setFilePath("pdf/four-pages.pdf");

// Limit extraction to a single page, starting at page number 1
TextOptions options = new TextOptions();
options.setStartPageNumber(1);
options.setCountPagesToExtract(1);
options.setFileInfo(fileInfo);

// Send the request and receive the extracted text
TextRequest request = new TextRequest(options);
TextResult response = apiInstance.text(request);
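The sample above expresses a page range as a start page plus a page count. If your own code works with an inclusive first/last page pair instead, a small conversion like the following sketch may help. `PageWindow` is a hypothetical helper and is not part of the SDK; this document does not state whether the service numbers pages from 0 or 1, so the helper passes the first page through unchanged and leaves that choice to the caller.

```java
// Hypothetical helper, not part of the SDK.
class PageWindow {
    final int startPageNumber;
    final int countPagesToExtract;

    private PageWindow(int startPageNumber, int countPagesToExtract) {
        this.startPageNumber = startPageNumber;
        this.countPagesToExtract = countPagesToExtract;
    }

    /**
     * Converts an inclusive [first, last] range into a start/count pair.
     * The numbering base (0 or 1) is whatever the caller already uses.
     */
    static PageWindow fromInclusiveRange(int first, int last) {
        if (first < 0 || last < first) {
            throw new IllegalArgumentException(
                "Invalid page range: " + first + ".." + last);
        }
        return new PageWindow(first, last - first + 1);
    }

    public static void main(String[] args) {
        PageWindow w = fromInclusiveRange(1, 3);
        // prints "start=1 count=3"
        System.out.println("start=" + w.startPageNumber + " count=" + w.countPagesToExtract);
    }
}
```

The two fields map directly onto `options.setStartPageNumber(...)` and `options.setCountPagesToExtract(...)` from the sample.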
Authorization & Authentication
The API defines the following authentication scheme:
JWT
- Type: OAuth 2.0
- Flow: application
- Authorization URL: https://api.groupdocs.cloud/connect/token
- Token Lifetime: 1 day (Default)
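The SDK's `Configuration` object accepts the Client ID and Client Secret, so token handling is presumably done for you. For readers who want to see what the flow looks like on the wire, here is a minimal sketch of a token request using only the Java standard library. It assumes the endpoint accepts the standard OAuth 2.0 client-credentials form parameters (`grant_type`, `client_id`, `client_secret`); `TokenRequest` is a hypothetical class, and the response is returned as raw text because its exact shape is not described in this document.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the SDK.
class TokenRequest {
    static final String TOKEN_URL = "https://api.groupdocs.cloud/connect/token";

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** Builds the form body; parameter names follow the OAuth 2.0 client-credentials grant (assumed). */
    static String formBody(String clientId, String clientSecret) {
        return "grant_type=client_credentials"
            + "&client_id=" + enc(clientId)
            + "&client_secret=" + enc(clientSecret);
    }

    /** POSTs the form to the token endpoint and returns the raw response body. */
    static String requestToken(String clientId, String clientSecret) throws IOException {
        byte[] body = formBody(clientId, clientSecret).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(TOKEN_URL).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return buf.toString("UTF-8");
        }
    }

    public static void main(String[] args) {
        // Only prints the request body; no network call is made here.
        System.out.println(formBody("my-id", "my secret"));
    }
}
```

Given the one-day default lifetime noted above, a caller using this approach would typically cache the token and request a new one only when it expires.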
File | Classifier | Size |
---|---|---|
groupdocs-parser-cloud-22.12-javadoc.jar | javadoc | 1 MB |
groupdocs-parser-cloud-22.12-sources.jar | sources | 177 KB |
groupdocs-parser-cloud-22.12.jar | | 259 KB |
groupdocs-parser-cloud-22.12.pom | | 2 KB |