How To Remove Excel Worksheets Using APIs in Java
Learn how to simplify the process of retrieving worksheet details from an Excel XLSX file, and removing specific worksheets based on that information.
Join the DZone community and get the full member experience.
Join For FreeManipulating Excel documents with code is more than just a value-add process in many environments. It’s often a necessity, eliminating the manpower that would otherwise be committed to the monotonous task of adjusting large and complex spreadsheets in a predictable, cyclical way.
Thanks to various open-source libraries (like Apache POI, for example), programmatically creating, manipulating, and deleting Excel data is reasonably straightforward for experienced developers. Libraries like these provide a layer of abstraction between the developer and the behind-the-scenes “button-pressing” required to make XLSX (open office Excel) files perform actions akin to what we might accomplish with points and clicks in the Excel application.
Sometimes, however, one layer of abstraction isn’t enough to simplify the task of Excel automation sufficiently. It’s easy to get bogged down by the learning curve that comes along with picking up any new library’s advanced features. If we’re trying to get a simple automated Excel editing workflow up and running in our application, the complexity of learning an open-source tool and all its unique capabilities might hinder us more than it helps in the immediate term.
In this article, we’ll learn how to manipulate Excel documents using a few free APIs with Java code examples. Specifically, we'll learn how to quickly retrieve worksheet information from an Excel document and use that information to remove (delete) specific worksheets. Thanks to simple request parameters and easily readable response objects, these APIs offer an additional layer of abstraction to XLSX file manipulation processes, and this makes them useful for faster-paced Excel automation projects.
How Are Worksheets Retrieved From XLSX File Structure?
Before we learn how to structure our worksheet retrieval and removal API calls, we'll first review how it's possible for various APIs and libraries to interact with Excel XLSX documents programmatically.
All open office XML (OpenXML) documents — including XLSX files — are ZIP archives containing multiple XML files (and other resources, including Binary Large Objects, plain text/binary metadata files, embedded objects, etc.). Within an XLSX archive, cell and row data is stored in worksheet XML files, and those files are found in a directory of the ZIP archive called xl/worksheets/
. Rows are represented as <row>
XML elements and each row element contains cell definitions. Cells themselves are structured as <c>
elements within each <row>
element, with each <c>
element carrying its own values and formatting specifications.
Any technology capable of manipulating Excel worksheet contents directly will have code that unzips the XLSX archive, identifies the xl/worksheets/
directory, and retrieves the correct sheets within that directory. More complex operations will involve querying sheets within the directory for specific <row>
and <c>
elements. With a deep understanding of OpenXML file structure, it's possible — albeit fairly burdensome and impractical — to build fully-fledged applications for editing and manipulating XLSX files.
Retrieve and Delete Worksheets From XLSX Files
Before we can programmatically remove specific worksheets from our Excel spreadsheets, we'll need to get worksheet information from our document.
While an open-source library like Apache POI allows this kind of operation to occur locally with a set of classes, the web APIs we'll learn about in this article handle that process a bit differently. When we get worksheet details in this demonstration, we'll return that information in a simple, easily readable response object which can be parsed into a subsequent API request. The file processing operations themselves will occur in memory on an external server (data is deleted upon task completion), offloading the memory cost we might otherwise expect our servers to bear when unzipping and manipulating larger Excel files.
We’ll perform two quick API calls that 1) retrieve worksheet details from an XLSX document (including the worksheet path and name) and 2) delete specific worksheets based on the details we retrieved.
To structure our API calls, we'll begin by adding the API client to our Maven project (this will take care of client installation for both APIs in this demonstration). In our pom.xml, let’s add the following repository reference (jitpack is used to dynamically compile the library):
<repositories>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
After that, let's add the following dependency:
<dependencies>
<dependency>
<groupId>com.github.Cloudmersive</groupId>
<artifactId>Cloudmersive.APIClient.Java</artifactId>
<version>v4.25</version>
</dependency>
</dependencies>
Next, let’s add the following imports to the top of our file:
// Import classes:
//import com.cloudmersive.client.invoker.ApiClient;
//import com.cloudmersive.client.invoker.ApiException;
//import com.cloudmersive.client.invoker.Configuration;
//import com.cloudmersive.client.invoker.auth.*;
//import com.cloudmersive.client.EditDocumentApi;
Right after that, let’s copy the below snippet to handle API configuration for both API requests in this demonstration. To authorize our requests, we’ll need a free API key (this allows 800 API calls/month, and we can paste that within the “YOUR API KEY
” placeholder string).
ApiClient defaultClient = Configuration.getDefaultApiClient();
// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");
// Uncomment the following line to set a prefix for the API key, e.g. "Token" (defaults to null)
//Apikey.setApiKeyPrefix("Token");
Finally, let’s use the below code to create an instance of the “Get Worksheets”
API and call the function:
EditDocumentApi apiInstance = new EditDocumentApi();
GetXlsxWorksheetsRequest input = new GetXlsxWorksheetsRequest(); // GetXlsxWorksheetsRequest | Document input request
try {
GetXlsxWorksheetsResponse result = apiInstance.editDocumentXlsxGetWorksheets(input);
System.out.println(result);
} catch (ApiException e) {
System.err.println("Exception when calling EditDocumentApi#editDocumentXlsxGetWorksheets");
e.printStackTrace();
}
When we structure our “Get Worksheets”
request, we'll have the option to provide our XLSX archive file bytes directly (in a Base64 encoded string) or supply a public file URL.
{
"InputFileBytes": "string",
"InputFileUrl": "string"
}
When we complete our request, the "Get Worksheets"
API response will return worksheet information following the below model:
{
"Successful": true,
"Worksheets": [
{
"Path": "string",
"WorksheetName": "string"
}
]
}
Once we successfully retrieve worksheet information, we can make a subsequent API call that deletes one or more of the worksheets in our Excel document.
We'll structure our "Delete Worksheets"
request following the request model below. Just like before, we can use our original Excel file bytes to provide the content for this operation, or we can use a public URL. Most importantly, we'll use the worksheet details we retrieved in our previous API call to define the exact path and name of the worksheet we’re deleting from our XLSX archive. We can expand the "WorksheetToRemove"
value in our request to include more than one Worksheet path and name.
{
"InputFileBytes": "string",
"InputFileUrl": "string",
"WorksheetToRemove": {
"Path": "string",
"WorksheetName": "string"
}
}
We can instance the "Delete Worksheets"
API and call the function using the below examples:
EditDocumentApi apiInstance = new EditDocumentApi();
RemoveXlsxWorksheetRequest reqConfig = new RemoveXlsxWorksheetRequest(); // RemoveXlsxWorksheetRequest | Spreadsheet input request
try {
byte[] result = apiInstance.editDocumentXlsxDeleteWorksheet(reqConfig);
System.out.println(result);
} catch (ApiException e) {
System.err.println("Exception when calling EditDocumentApi#editDocumentXlsxDeleteWorksheet");
e.printStackTrace();
}
This API call will directly return the file bytes (in a Base64 encoded string) for our updated XLSX file. We can write these file bytes to a new XLSX document (or overwrite the original file) and call it a day.
Conclusion
In this article, we learned how to call APIs that streamline the process of retrieving and deleting worksheets from XLSX files. Using APIs for this process helps get our automated Excel workflows off the ground without the overhead cost of learning a new open-source library.
Opinions expressed by DZone contributors are their own.
Comments