MarketplaceStudioActivityRead Extra Large SpreadSheets

Create your first automation in just a few minutes.Try Studio Web

Read Extra Large SpreadSheets

Read Extra Large SpreadSheets

by Internal Labs

StarStarStar

9

Activity

Downloads

60.3k

back button
back button
carouselImage0
next button
next button

Summary

Summary

An activity pack used to read and process extra large Excel files.

Overview

Overview

Objectives:

  • Providing a robust and facile manner of processing very large Excel files in the background and without relying on COM Interop.
  • The implementation of these custom activities will give the user the last row of the Excel file (extremely useful when trying to process a file in chunks). Currently trying to read a file in chunks is impossible without knowing the number of rows and columns in the sheet.
  • The implementation of an Application Scope activity that will allow the grouping of the custom activities in this package.
  • Providing activities that are similar in structure and type of parameters to the current Excel application.

Observation and limitations:

The activity will not keep track of the formats used to display values in the Excel application. For example, a cell with a value of 43006 formatted as date will be displayed as 9/28/2017 in Excel, but this activity will still retrieve it as 43006.

Files in which the row starts with the index 0 (this index can be found in the XML mark-up of the row) will not be processed. The row number also needs to be under 1048576 (the maximum number of rows in an Excel Sheet). If a row number outside of this interval is found, the application will throw an exception. This limitation is also valid for the Read Range in the Workbook activities package.

This package is not compatible with .xls files or .csv files. Only .xlsx files can be read with this activity. Reading Large XLS files can be done in conjunction with the Conversions Package, which will convert the file to an .xlsx file, thus making it compatible with this package.

Some very minor differences when deciding where a range ends between the standard Excel processing activities and this one.

Technical Approach:

The files will be processed in the background using the Open XML SDK library and the SAX (Simple API for XML) approach which instead of reading the entire document and exposing its DOM, will independently read the XML files that comprise the Excel document.

Read Range

Principles

This custom activity reads a Range of cells from an Excel Sheet. It achieves its goal by using the Open XML SDK framework and the OpenXmlReader class to parse the File in a SAX like approach. To get the number of rows, we simply parse the file and count the rows.

The custom activity is created by simply inheriting the CodeActivity class from the Windows Workflow Foundation framework. We define the logic of the function by implementing the Execute method and we define the input and output of the activity by fields of the InArgument and OutArgument types.

This activity can only be placed inside a XL Excel Application Scope activity (not necessarily directly inside, but one of its parents must be the XL Excel Application Scope).

Features

Features

Using SAX, the XML content of the file is loaded gradually, this means that the content is retrieved from the file without overloading the RAM memory or needing to have Microsoft Office installed. XLS files can be processed too, but they need to be converted to .xlsx first. 

The package has been successfully tested with files sizes of +200MB and over 1 million rows. There's no reason this activity could not read even bigger files, however, for this it would require to read the file in chunks.

Additional Information

Additional Information

Dependencies

DocumentFormat.OpenXML (> = 2.5.0) Microsoft.Office.Interop.Excel (> = 15.0.4795.1000)

Code Language

C#, Visual Basic

Runtime

Windows Legacy (.Net Framework 4.6.1)

Publisher

Internal Labs

Visit publisher's page

License & Privacy

License Agreement

Privacy Terms

Technical

Version

1.0.68

Updated

June 3, 2023

Works with

Studio: 21.10 - 22.10

Certification

Silver Certified

Application

Microsoft Excel

Support

UiPath Community Support