Jump to content

read and write xlsx files without Excel


AspirinJunkie
 Share

Recommended Posts

Hi AspirinJunkie and thank you,

But since yesterday I'm trying to make it work with my program 😞. My version Autoit is 3.3.14.5 and my  feeling is this new version _xlsx_2Array() is not compatible with this.

I have issue  : error: MapAppend(): undefined function   MapAppend($mSheets, $mSheet) when i run my program.

Any ideas ?

Thank you again.

Regards

 

 

Link to comment
Share on other sites

Maps are only supported in the latest AutoIt release.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Hi AspirinJunkie

Indeed, that's what I thought, thanks Water.
I took some time to migrate to this new autoit version to check that my program still works without too many side effects, it seems to be ok 🙂

Thank you very much, your program works very well.
Is it difficult to add a complementary parameter that would allow to extract precise columns (ex: 4;7;8;15 etc...) and not continuous columns (ex: 5 to 15) ?
Thanks for your time and your help.

Regards

 

Link to comment
Share on other sites

  • 4 weeks later...

I tried to use this but it's very slow, with or without 7za.exe.  I have:

#include <Array.au3>
;#include <_XLSXReadToArray.au3>
#include <xlsxNative.au3>


Local $a, $t, $d, $Error, $Extended
Global $airportfile = @ScriptDir & "\all-airport-data.xlsx"
; https://adip.faa.gov/publishedAirports/all-airport-data.xlsx

$t = TimerInit()
;Global $a = _XLSXReadToArray($airportfile, 0, 0, 0)
$a = _xlsx_2Array($airportfile,1,1,3)
$error = @error
$d = TimerDiff($t)
ConsoleWrite("Return = " & $a & "; @Error = " & $Error & @LF)
ConsoleWrite("Timer = " & $d / 1000 & " seconds to read file" & @LF)

_ArrayDisplay($a)
ConsoleWrite("DONE")

With the file from the commented link it takes 18 minutes to read the file, even trying to read only three lines.  With an abridged version of the file (the original is 17MB, 98 columns and ~20k rows and ) it processes in under a second. 

XLSXReadToArray takes 24 seconds to read the big file, but for some reason drops the first line unless the file is opened in Excel and  resaved first.

Ultimately I only need to read 5 columns, but first I need to read the entire first line to get the column names and identify the columns I need.

 

Edited by Dana
Link to comment
Share on other sites

1 hour ago, Dana said:

With the file from the commented link it takes 18 minutes to read the file, even trying to read only three lines.

Yep - that sounds plausible and I don't really see any way to change this much.
There are a total of 737373 cells defined in the worksheet in the file. Each one in the form of an XML object.
The worksheet XML is 46 MB and that has to be parsed first.
The parser Microsoft.XMLDOM used in the UDF needs almost 3 minutes for this alone:

$sFile = "sheet1.xml"

$iT = TimerInit()
Global $oXML = ObjCreate("Microsoft.XMLDOM")
If Not $oXML.load($sFile) Then ConsoleWrite("error" & @CRLF)
ConsoleWrite(TimerDiff($iT) & " ms" & @CRLF)

And the whole data preparation for the transformation into an AutoIt array still follows.
So I have to say: The amount of data is simply too large for this kind of approach.

1 hour ago, Dana said:

XLSXReadToArray takes 24 seconds to read the big file

There the parsing is purely handled by StringRegExp instead of a DOM parser.
The approach seems to be faster than the XMLDOM parser.

Who knows, maybe I'll find the time and motivation to implement a worksheet parser via StringRegExp and see if it can actually increase the performance that much.

Link to comment
Share on other sites

Wow, thanks!  That's a LOT faster... 4 seconds to read 3 lines and 56 seconds to read the entire file, 38 seconds to read a single column.  Now to see how it does on a slower computer at home, and work out the best way to extract the 5 columns I need.

Link to comment
Share on other sites

  • 2 weeks later...

Hi,

I test more thoroughly, and I find major differences on the last 2 versions:  0.6 and 0.7. the parameters are the same :
_xlsx_2Array($TableExcel,"1","2",Default,"1",Default)
This is a screenshot of the Excel file that i want to handle in the array :

image.thumb.png.7e0f0c3724e6292c3082cfde0042b8df.png

In the version 0.6, the first column is not reading but the following columns seems to be good :

image.thumb.png.b601f15bfd83d8d2c21eb99a1cd0da0b.png

 

In the version 0.7, the first colomn is  reading but for the following colomns the result is different with previous version : some values are missing (Column  3 : CSV fields)
 et some colomns where there are not values in the Excel file, we can see some numeric datas !? (column 1, 4 or 6 for examples), I don't understand.


image.thumb.png.e10db069e31402e67493160121c87690.png

regards

Eric

Fichier_Test_Excel.xlsx

Link to comment
Share on other sites

Thank you - the file has exposed 2 cases that were not covered before.
I have uploaded the new fixed version.

Apart from that, the file is creepy. Data is only in the first 24 columns but there are 265 columns defined.
This is probably due to the fact that someone has wildly scrolled through and first assigned a formatting to all these cells although there is nothing in it.
Remove the columns and the handling of the file should be much smoother.

Link to comment
Share on other sites

Great ! thanks

Indeed, it is a legacy of some files I have to work with. This is why I wanted to be able to limit the display to the number of columns I wanted to define with the $dColTo parameter. 🙂

 

Link to comment
Share on other sites

  • 2 months later...

Hi AspirinJunkie,

With a tab containing several items of information, but separated by blank lines, udf ejects the program :

image.png.f1371eaad580543277dc4b6147fbea68.png

issue :

image.thumb.png.ad3ae793d3183674c87bce3a6bcb409e.png

Would it be possible to add a control to prevent this from happening with an error, or to capture all the data? ? this is the test file.

thanks for your time

regards

Eric

test_issue.xlsx

Link to comment
Share on other sites

I did have version 0.7.1, so I tried to compare it with the one I found on github and found this  little difference :

image.thumb.png.d1396067df46de36349becf69d5ef4aa.png

the only thing missing was the = and +$dRowFrom + 2  !

Thank you very much.

Regards

Edited by ermar
Link to comment
Share on other sites

  • 6 months later...

Hi AspirinJunkie,

This time I'm interested in writing results in an Excel File with thid udf : _xlsx_WriteFromArray

Is it possible to write several results in different tabs and rename it ? Using the example, I can see that the tab is called 1

Thanks

Regards

Link to comment
Share on other sites

1 hour ago, ermar said:

Is it possible to write several results in different tabs and rename it ?

No it's not possible with this function.

However, you can use the function as a template to understand the structure of an xlsx file and write your own function with the corresponding functionality based on this.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...