Popular Post AspirinJunkie Posted January 19, 2023 Popular Post Posted January 19, 2023 (edited) This UDF provides 2 functions to read data directly from xlsx files or to output data as xlsx file. Only the cell contents are considered - no cell formatting and the like. It is therefore explicitly not a full replacement for the Excel UDF, since its scope goes well beyond that. But to quickly read in data or to work with xlsx files without having Excel installed, the UDF can be quite useful. There may also be specially formatted xlsx files which I have not yet encountered during testing and which may cause problems. In this case it is best to make a message about it here and upload the file. Note: xlsx files must be unpacked for reading. To make this as fast as possible it is recommended to put a >>7za.exe<< file into the script directory, otherwise a slow alternative will be used. Otherwise an example says more than 1000 words: Spoiler #include "xlsxNative.au3" ; create 2D array Global $A[][] = [[1, 2, 3, 4, 5], ["", "6", 7, "", "8"], [], [9, "", "10", 11, True]] _ArrayDisplay($A, "source array") ; convert the Array into a xlsx-file: _xlsx_WriteFromArray(@ScriptDir & "\Text.xlsx", $A) ; read this xlsx-file into a 2D-Array: $aSheet = _xlsx_2Array(@ScriptDir & "\Text.xlsx") _ArrayDisplay($aSheet, "imported data from xlsx") >>sourcecode and download on github<< Changelog: Spoiler 2023-01-26: Complete rebuild of the element path extraction - should run more stable now _xlsx_2Array(): Added parameters to restrict to specific columns 2023-02-27: Complete revision of the XML parser. Partly massive performance boosts especially for very large files. 2023-03-13: Bugfix: empty cells, which have been attributed, are not recognized correctly. Bugfix: shared strings whose <t> elements have attributes were ignored. 2023-03-25: Bugfix: array range error if cell-coordinates are used and at least a whole row is not used Edited May 2, 2023 by AspirinJunkie mLipok, Zedna, argumentum and 6 others 7 2
ermar Posted January 19, 2023 Posted January 19, 2023 Hi, I discovered this udf on the German forum where you posted different evolutions, I was already working on a similar udf _XLSXReadToArray in my program. And now i use xlsxNative. Congratulations, i love it, It's an excellent work, thank you very much because it's really very convenient to get a array without having to open excel. It would be nice for a future evolution to obtain the BookLists, it will be great for me in order to work on the good workbook 🙂 Great Job ! Eric
AspirinJunkie Posted January 19, 2023 Author Posted January 19, 2023 1 minute ago, ermar said: It would be nice for a future evolution to obtain the BookLists A good suggestion - it goes on my to-do list. First and foremost, I'm not 100% sure about the compatibility with as many xlsx subformats as possible. The internal structure can be very different, so I can't guarantee that the UDF can handle all of them. At least it has been able to handle all the files I have fed it so far. dmob 1
AutoBert Posted January 21, 2023 Posted January 21, 2023 (edited) I tested with #include <xlsxNative.au3> ; create 2D array Global $A[][] = [[1, 2, 3, 4, 5], ["", "6", 7, "", "8"], [], [9, "", "10", 11, True]] _ArrayDisplay($A, "source array") $iStart = TimerInit() ; convert the Array into a xlsx-file: _xlsx_WriteFromArray(@ScriptDir & "\Text.xlsx", $A) $iDiff = TimerDiff($iStart) ConsoleWrite($iDiff & 'ms. for saving ' & @CRLF) ; read this xlsx-file into a 2D-Array: $aSheet = _xlsx_2Array(@ScriptDir & "\Text.xlsx") ConsoleWrite('Error: ' & @error & ' | ' & @extended & @CRLF) $iDiff2 = TimerDiff($iStart) ConsoleWrite($iDiff2-$iDiff & 'ms. for loading ' & @CRLF) ConsoleWrite($iDiff2 & 'ms. total ' & @CRLF) _ArrayDisplay($aSheet, "imported data from xlsx") Console output: 11563.3009ms. for saving Error: 0 | 0 3713.4193ms. for loading 15276.7202ms. total +>17:49:52 AutoIt3.exe ended.rc:0 +>17:49:52 AutoIt3Wrapper Finished. >Exit code: 0 Time: 73.55 Edited January 21, 2023 by AutoBert
AspirinJunkie Posted January 21, 2023 Author Posted January 21, 2023 (edited) And now put a 7za.exe in the folder and see how long this takes. For comparison - these are my results with your script when the 7za.exe is in the @Scriptdir: 180.0047ms. for saving Error: 0 | 0 99.362ms. for loading 279.3667ms. total Edited January 21, 2023 by AspirinJunkie
AspirinJunkie Posted January 22, 2023 Author Posted January 22, 2023 (edited) On 1/19/2023 at 6:16 PM, ermar said: It would be nice for a future evolution to obtain the BookLists, it will be great for me in order to work on the good workbook 🙂 I'll just interpret that by "workbook" you meant "worksheets" instead - right? As I understand it, there is only one workbook in an xlsx file - am I right? So I added a new function _xlsx_getWorkSheets() which returns a list of the existing worksheets. This is being used as follows: #include "xlsxNative.au3" Local $sFile = @ScriptDir & "\Test.xlsx" ; determine the worksheets of a file Local $aSheets = _xlsx_getWorkSheets($sFile) _ArrayDisplay($aSheets, "Sheet List", "", 64 + 32 , Default, "ID|Name") Edited January 26, 2023 by AspirinJunkie dmob 1
AutoBert Posted January 22, 2023 Posted January 22, 2023 (edited) 16 hours ago, AspirinJunkie said: For comparison - these are my results with your script when the 7za.exe is in the @Scriptdir Consoleoutput after downloading 7za.exe to @scriptdir: >Running:(3.3.16.0):C:\Program Files (x86)\AutoIt3\autoit3_x64.exe "C:\Users\(len)Bert\AutoIt\Downloaded\AspirinJunkey\xlsxNativeTEST.au3" +>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop. 1119.4133ms. for saving Error: 3 | 2 177.7074ms. for loading 1297.1207ms. total +>15:16:56 AutoIt3.exe ended.rc:0 Saving ~ 1/10 time, but error while loading. I think it's MS defender which blocks writing. xlsxWork is created, but nothing in. Changing to Local $pthWorkDir = @ScriptDir & "\xlsxWork\" ;original @tempdir brings throws same error: 3 | 2. Edited January 22, 2023 by AutoBert
AspirinJunkie Posted January 22, 2023 Author Posted January 22, 2023 7 minutes ago, AutoBert said: Error: 3 | 2 [...] but error while loading. This combination of @error and @extended should only be able to occur when writing with _xlsx_WriteFromArray(). Are you sure that it occurs when loading (with _xlsx_2Array() )? In case of _xlsx_WriteFromArray() this combination would mean that an error occurred during the FileWrite of the file `_rels\.rels`. I cannot reproduce this for now, so it is unclear what is causing this. I have adjusted the @error handling - especially of DirCreate and uploaded a new version. I have - if it would be a rights problem - the folder creation in suspicion.
AutoBert Posted January 22, 2023 Posted January 22, 2023 8 minutes ago, AspirinJunkie said: Are you sure that it occurs when loading (with _xlsx_2Array() )? Yes, i changed my testscript to #include <xlsxNative.au3> ; create 2D array Global $A[][] = [[1, 2, 3, 4, 5], ["", "6", 7, "", "8"], [], [9, "", "10", 11, True]] _ArrayDisplay($A, "source array") $iStart = TimerInit() ; convert the Array into a xlsx-file: _xlsx_WriteFromArray(@ScriptDir & "\Text.xlsx", $A) ConsoleWrite('_xlsx_WriteFromArray Error: ' & @error & ' | ' & @extended & @CRLF) $iDiff = TimerDiff($iStart) ConsoleWrite($iDiff & 'ms. for saving ' & @CRLF) ; read this xlsx-file into a 2D-Array: $aSheet = _xlsx_2Array(@ScriptDir & "\Text.xlsx") ConsoleWrite('_xlsx_2Array Error: ' & @error & ' | ' & @extended & @CRLF) $iDiff2 = TimerDiff($iStart) ConsoleWrite($iDiff2-$iDiff & 'ms. for loading ' & @CRLF) ConsoleWrite($iDiff2 & 'ms. total ' & @CRLF) _ArrayDisplay($aSheet, "imported data from xlsx") resulting console: +>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop. _xlsx_WriteFromArray Error: 0 | 0 950.3584ms. for saving _xlsx_2Array Error: 3 | 2 187.4876ms. for loading 1137.846ms. total +>16:00:27 AutoIt3.exe ended.rc:0 as XLS-Format isn't supported (~1k) , i have no real need. For the view Files (<5) in XLSx-Format i can live with 10* slower time.
argumentum Posted January 22, 2023 Posted January 22, 2023 Like it !, now, some error checking would be nice: ; function to share one single xmldom-object over the functions but without beeing a global variable Func __xlsx_getXMLObject() Local Static $c = 0 Local Static $oX = ObjCreate("Microsoft.XMLDOM") If @error Then Return SetError(@error, @extended, 0) ....... Func __xlsx_getSubFiles($pthWorkDir = @TempDir & "\xlsxWork\") Local $oXML = __xlsx_getXMLObject() If @error Then Return SetError(@error, @extended, 0) ....... ...and I believe that a local static is like a global internally for AutoIt, so is just a coding preference. ( I just wanted to say it for some reason ? ) Follow the link to my code contribution ( and other things too ). FAQ - Please Read Before Posting.
AspirinJunkie Posted January 22, 2023 Author Posted January 22, 2023 The error occurs when calling 7za.exe. Can you run the following manually from the command line from your script folder and see if this manual call works?: 7za.exe x "Text.xlsx" -o"%TEMP%\xlsxWork" shared*.xml sheet.xml sheet1.xml *.rels -r -tzip -bd -bb0 -aoa
AspirinJunkie Posted January 22, 2023 Author Posted January 22, 2023 1 minute ago, argumentum said: now, some error checking would be nice: Deal - new version is up now. 8 minutes ago, argumentum said: ...and I believe that a local static is like a global internally for AutoIt, so is just a coding preference. It is a value which remains in memory throughout the entire program run. Whether the variable belongs global or to another scope, however, refers purely to the identifier of the variable. If I can address a variable by its name from anywhere - then it is global. In the concrete example this is not the case - therefore the variable is also not global.
AutoBert Posted January 22, 2023 Posted January 22, 2023 (edited) C:\Users\(len)Bert\AutoIt\Downloaded\AspirinJunkey>7za.exe x "Text.xlsx" -o "%TEMP%\xlsxWork" shared*.xml sheet.xml sheet1.xml *.rels -r -tzip -bd -bb0 -aoa Resulting Error: Incorrect command line 7za.exe Just analyzed: https://www.virustotal.com/gui/file/4e15b455930cf0f0dd0732aee62a1482e90f88838eac2e38d0d248bae5934b20/details Edited January 22, 2023 by AutoBert
AspirinJunkie Posted January 22, 2023 Author Posted January 22, 2023 Where did you dig up this ancient version of 7za.exe? Are you an archeologist? The version you are using is from 2010 - so it is already 13 years old! So I can imagine that there are some commands in it that were not there at that time. In the opening post I have a link to a current 7za.exe - so your errors should no longer appear.
AutoBert Posted January 22, 2023 Posted January 22, 2023 Your linked file results with: --------------------------- Nicht unterstützte 16 Bit-Anwendung --------------------------- Das Programm bzw. das Feature "\??\C:\Users\(len)Bert\AutoIt\Downloaded\AspirinJunkey\7za.exe" kann aufgrund einer Inkompatibilität mit 64 Bit-Versionen von Windows nicht gestartet bzw. ausgeführt werden. Wenden Sie sich an den Softwarehersteller, um zu erfahren, ob eine mit 64 Bit-Windows kompatible Version verfügbar ist. --------------------------- OK --------------------------- so i downloaded the https://7-zip.org/a/7z2201-x64.exe and after installed i copied 7z.exe as 7za.exe to @scritdir. And yet it works: +>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop. _xlsx_WriteFromArray Error: 0 | 0 1134.811ms. for saving _xlsx_2Array Error: 0 | 0 595.9215ms. for loading 1730.7325ms. total
argumentum Posted January 22, 2023 Posted January 22, 2023 35 minutes ago, AspirinJunkie said: therefore the variable is also not global. ...I meant internally, in the stub at the C++ level, not script wise Follow the link to my code contribution ( and other things too ). FAQ - Please Read Before Posting.
AspirinJunkie Posted January 22, 2023 Author Posted January 22, 2023 3 minutes ago, AutoBert said: Your linked file results with: My linked file is not the 7za.exe - it is a 7z-archive which contains the 7za.exe. You have to unpack the file before. 4 minutes ago, argumentum said: ...I meant internally, in the stub at the C++ level, not script wise AutoIt manages its variables in a table which is divided in lists. In the old AutoIt source code these were two different lists - one for the globals and one for the locals. In the lists themselves all variables are equal. The assignment to a scope results from the affiliation to a certain list. If the variable would be part of the global list, it would be globally accessible. In the concrete case the variable remains a local variable and is not accessible for scopes outside (except just by the return). It must handled a bit differently in the AutoIt interpreter, because it is static (we don't know how exactly, because the source code for this is not available) but it doesn't end up in the global list. A global variable on C++ level would also mean that it can be addressed by C++ via its identifier. AutoIt variables, however, do not get a C++ identifier at all but get their name via their list entry as structure _VarNode with the attribute szName. The static persistence has nothing to do with scope for now. It is still not a global variable - not even on C++ level. argumentum 1
ermar Posted January 23, 2023 Posted January 23, 2023 Hi AspirinJunkie, Thanks for the new function _xlsx_getWorkSheets(). Exactly, workbook is for me Worksheets tab, sorry i'm french user and i'm not comfortable with the US version of excel. With my xlsx file, there are several worksheets tab. I tested the function with a file test and i want provide further clarification : The function gives the all worksheets, it's perfect 🙂 but the first column "ID" 's values does not necessarily correspond to the order of the worksheets that we see when i open the file. For example, I moved the fourth worksheet and if I use_xlsx_2Array and I want to read the first worksheet ($sSheetNr=1), I can't use this identification value because the value is "4" in the array, while the expected value should be "1". Great Job, 🤗 Would it be possible to provide an additional functionality with _xlsx_2Array ? We can select the numbers of rows to extract but could we choose some columns to extract ? it would be great ! Thank you.
AspirinJunkie Posted January 24, 2023 Author Posted January 24, 2023 (edited) On 1/23/2023 at 2:23 PM, ermar said: The function gives the all worksheets, it's perfect 🙂 but the first column "ID" 's values does not necessarily correspond to the order of the worksheets that we see when i open the file. The fact that the IDs no longer correspond to the order is not that bad. Worse, the IDs don't match the internal sheet file numbering, so you can't use them in the xlsx_2Array() to select the sheet. I have therefore adjusted it a bit so that the correct IDs appear first. However, the sheet determination is currently still very quick and dirty and works with many files but possibly not with all.When I get some time, I will therefore redo everything so that the workbook.xml.rels is evaluated properly so that all the assignments are evaluated more cleanly.Edit: Done! On 1/23/2023 at 2:23 PM, ermar said: Would it be possible to provide an additional functionality with _xlsx_2Array ? We can select the numbers of rows to extract but could we choose some columns to extract ? Yes sure - i can do that. Edited January 26, 2023 by AspirinJunkie
AspirinJunkie Posted January 26, 2023 Author Posted January 26, 2023 On 1/23/2023 at 2:23 PM, ermar said: Would it be possible to provide an additional functionality with _xlsx_2Array ? We can select the numbers of rows to extract but could we choose some columns to extract ? it would be great ! Thank you. Done. The _xlsx_2Array()-function now has two additional parameters to restrict to specific columns. Besides that, the extraction of the relevant files has been completely rewritten and should now be more stable and thus cover more shapes.
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now