-
Notifications
You must be signed in to change notification settings - Fork 339
Samples of Generating Various Excel Files
【Abstract】
In real-world businesses, we often need to generate Excel files of various formats based on structured data of different sources. Those Excel formats include pure data export, export of huge amounts of data, specifying display properties, fill of fixed rows and columns, property display according to dynamic condition, groups with detailed data and aggregates, crosstab, etc. This article illustrates how to automatically implement the conversion from structured data to Excel data of different formats and gives sample programs in esProc SPL Samples of Generating Various Excel Files!
In some data analysis jobs, we need to generate Excel files of required formats from structured data automatically by writing a program. The sources of structured data are different, such as queries, analyses and computations, data mining and data sampling.
For each Excel format listed above, we offer the esProc SPL program and explain the implementation process. It’s convenient to export structured data to an Excel file in esProc SPL. esProc is the professional data computing engine based on Structured Process Language (SPL). The language boasts an all-round set of functions for exporting data to an Excel file. The source of the structured data isn’t our focus. We just assume that there is already a structured text file.
In this case, column headers will be exported to the first row in an Excel worksheet and data records to the rows directly after it. Each row contains a record.
Example: To export data in an orders table to a new Excel file orders.xlsx, esProc SPL script is as follows:
A | Comment | |
1 | =file("e:/txt/orders.txt":"UTF-8").import@t() | Import data from *orders.txt* |
2 | =file("e:/excel/orders.xlsx").xlsexport@t(A1) | Export the text data to an Excel file; @t option enables export column headers to the first row |
If the target Excel file doesn’t exist, the program will automatically create one. xlsexport() function can specify columns to be exported through a parameter. As no columns are specified in this example, all columns will be exported. We can also specify the name of the sheet to which is exported using a parameter. By default data is exported to the first sheet.
Below is part of orders.xlsx:
To append data is to add new data at the end of an existing Excel file. There’s no need to export column headers in this case.
Example: To append new orders data of the current day to the existing orders.xlsx, esProc SPL script is as follows:
A | Comment | |
1 | =file("e:/txt/aday.txt":"UTF-8").import@t() | Import data from *aday.txt* |
2 | =file("e:/excel/orders.xlsx").xlsexport@a(A1) | @a option enables adding data to the end of the existing table |
Specify the name of sheet to which data will be exported.
Example: To export certain columns of orders data of Shantai company to Sheet Shantai in orders.xlsx and rename certain columns, esProc SPL script is as follows:
A | Comment | |
1 | =file("e:/txt/orders.txt":"UTF-8").import@t() | Import data from *orders.txt* |
2 | =A1.select(Company==“Shantai") | Select data of Shantai company |
3 | =file(“e:/excel/orders.xlsx”).xlsexport@t(A2,ID,Company,OrderDate:Date,Amount:Money;”Shantai") | Export four columns – ID, Company, OrderDate and Amount – to the sheet named Shantai and rename OrderDate column Date and Amount column Money |
Below is part of the orders.xlsx after data export:
f the to be exported data can’t be wholly loaded into the memory at once, we need to read in data with the cursor in batches and then write it to an Excel file of OpenXml format (with the extension xlsx) in a stream style. This read/write method requires as few resources as possible. As Excel regulates that one sheet can hold 1048576 rows of data at most, a new sheet is needed to store data if the number of rows to be exported exceeds the limit.
Example: To export data in a big orders table to big.xlsx, esProc SPL script is as follows:
A | Comment | |
1 | =file("e:/txt/big.txt":"UTF-8").cursor@t() | Read in text data with the cursor |
2 | =file("e:/excel/big.xlsx").xlsexport@st(A1) | @s options enables exporting data in a stream style |
Below are the last rows of data in the result big.xlsx:
The Excel file receives a total of 123663 rows of data. You can export any number of rows to an Excel file using the above SPL program if there is enough disk space.
We can specify a series of display properties, such as font, color, background color, text alignment type and display format, to make an Excel file looks attractive. To do this, we create the Excel file (the template), define the display properties and then export data to the file.
Example: To export data of orders table to orders.xlsx and display data in the following format:
As my aim is illustration instead of beauty, I set a lot of display properties here. Here we take the esProc script in section 1as an example:
Below is the result Excel file of the required display format:
Here’s a common scenario in data input business. There is an Excel file that has fixed format and number of rows and columns but that contains empty cells. We need to fill certain data to the empty cells. This requires a feature that enables inputting data to individual cells.
Example: The following is an Excel sheet sent by a fund house to its branches. The branches are required to fill data in it and sent the form back to the parent company.
esProc SPL script:
A | B | C | D | E | F | |
1 | Mengniu Funds | 2017 | 3 | 58.2 | 364 | 300 |
2 | 8.5 | 50 | 200 | 100 | 400 | 200 |
3 | 182.6 | 76.3 | 43.7 | 28.5 | 16.4 | |
4 | 120 | 1.07 | 30 | 0.27 | 90 | 0.8 |
5 | 154 | 6 | 4 | |||
6 | =file("e:/excel/result.xlsx") | =A6.xlsopen() | ||||
7 | =C6.xlscell("B2",1;A1) | =C6.xlscell("J2",1;B1) | 7 | |||
8 | =C6.xlscell("B3",1;D1) | =C6.xlscell("G3",1;E1) | 8 | |||
9 | =C6.xlscell("B6",1;[A2:F2].concat("\t")) | =C6.xlscell("H6",1;[A3:E3].concat("\t")) | ||||
10 | =C6.xlscell("B9",1;[A4:F4].concat("\t")) | =C6.xlscell("B11",1;[A5:C5].concat("\t")) | ||||
11 | =A6.xlswrite(C6) | 11 | =A6.xlswrite(C6) |
Suppose the data to-be-populated is ready (and stored in the first 5 rows). In the original form, the first 6 cells to be filled are separate and we have to fill them one by one. There are continuous empty cells on row 6. We can convert the to-be-populated data into a string delimited by \t and fill it to the empty cells continuously beginning from the specified cell. After all data is populated, write the Excel object stored in C6 back to the result.xlsx.
Below is the finished result.xlsx:
In this case, we want to display the cell properties dynamically according to the specified condition, such as displaying different background colors alternatively and showing cell color and font according to data values. To do this we need a mechanism that can define a conditional expression for displaying cell properties and get the desired properties by calculating the expression at the export. Such a mechanism in esProc is RaqReport.
Example: Export orders data to orders.xlsx where two background colors of data rows display alternatively and where amounts above 2000 is displayed in red and those below 500 displayed in green.
Open RaqReport designer and create a new report template orders.rpx, as shown below:
In the report, the first row contains table title, the second row holds column headers, and the third row stored detailed data.
Select all cells on the third row and enter background color expression as if(row()%2==0,-853778,-1) to define the two alternatively displayed colors. Select the last cell on the third row to set the display format as #.00 and enter foreground color expression as if(value()>2000,-65536,if(value()<500,-16711936,-16777216)) to display different font colors according to amounts.
esProc SPL script:
A | Comment | |
1 | =file("e:/txt/orders.txt":"UTF-8").import@t() | Import data from *orders.txt* |
2 | >report_config("e:/cfg/raqsoftConfig.xml") | Load the report environment configurations |
3 | =report_open("orders.rpx") | Open the desired report template |
4 | =report_run(A3;A1:"ds1") | Take A1’s table sequence as data set ds1 to pass to A3’s report for calculation |
5 | =report_exportXls@x(A3,"e:/excel/orders.xlsx") | Export the calculated report object in A3 to an Excel file |
Below is part of the final order.xlsx:
This format requires grouping data and displaying detailed data and aggregates for each group at export to the Excel file. We also need RaqReport to do this.
Example: Export orders data to orders.xlsx where data needs to be grouped by area and company to display detailed data and aggregates for each group.
Open RaqReport designer to create a report template named orders_group.rpx, as shown below:
In the report, the first row contains table title and the second row holds column headers. A3 groups rows by area; B3 groups rows by company; C3, D3 and E3 display detailed data; and E5 calculates total amount for each area.
esProc SPL script:
A | Comment | |
1 | =file("e:/txt/orders.txt":"UTF-8").import@t() | Import data from *orders.txt* |
2 | >report_config("e:/cfg/raqsoftConfig.xml") | Load the report environment configurations |
3 | =report_open("orders_group.rpx") | Open the desired report template |
4 | =report_run(A3;A1:"ds1") | Take A1’s table sequence as data set ds1 to pass to A3’s report for calculation |
5 | =report_exportXls@x(A3,"e:/excel/orders.xlsx") | Export the calculated report object in A3 to an Excel file |
Below is part of the final order.xlsx:
RaqReport is also needed to export data as an Excel crosstab.
Example: Export orders data to orders.xlsx of crosstab format where data needs to be displayed by both area and year.
Open RaqReport designer to create a report template named orders_cross.rpx, as shown below:
In the report, the first row contains table title. B2 groups rows by the year the order date belongs to; A3 groups rows by area; and B3 calculates total amount for each area.
esProc SPL script:
A | Comment | |
1 | =file("e:/txt/orders.txt":"UTF-8").import@t() | Import data from *orders.txt* |
2 | >report_config("e:/cfg/raqsoftConfig.xml") | Load the report environment configurations |
3 | =report_open("orders_cross.rpx") | Open the desired report template |
4 | =report_run(A3;A1:"ds1") | Take A1’s table sequence as data set ds1 to pass to A3’s report for calculation |
5 | =report_exportXls@x(A3,"e:/excel/orders.xlsx") | Export the calculated report object in A3 to an Excel file |
Below is part of the final order.xlsx:
Find more examples in SPL Cookbook.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code