BigDaddyO Posted June 10, 2016

Is there a correlation between a text file's size and the memory usage when loading that file with FileReadToArray? That command seems to fully lock my script (not even Adlib works), so I created a second helper script that just displays a popup showing the % loaded. It reads the memory usage of the main script and calculates against the target file size, but I'm having problems figuring out the correct math to get a good %.

On one file where the lines are not very long but there are > 1 million rows, (Mem Size / (File Size * 2)) * 100 was pretty close.

On another file that has really long lines, but only around 400k rows, I actually needed ((Mem Size * 2) / File Size) * 100 to get close to the right %.

Is there some other way I could get a better % no matter the file?

Thanks, Mike
jchd Posted June 10, 2016

The ratio will heavily depend on which encoding the text file uses and on its actual content. AutoIt strings use 16 bits per character; add something for array overhead.
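To make the arithmetic in that reply concrete, here is a rough sketch. The function name `_EstimateStringBytes`, the file path, and the bytes-per-character values are illustrative assumptions, not something posted in the thread. It ignores array and per-string overhead, and the process memory figure a helper script reads will include more than the array, so treat the result as a lower bound rather than a formula for the progress %.

```autoit
; Rough estimate only: AutoIt keeps strings as 16-bit characters, so the size in
; memory depends on how many bytes each character occupies in the file.
; $iBytesPerChar is an assumption supplied by the caller (1 for ANSI,
; 2 for UTF-16; UTF-8 varies per character).
Func _EstimateStringBytes($iFileSize, $iBytesPerChar)
    Local $iChars = $iFileSize / $iBytesPerChar ; approximate character count
    Return $iChars * 2 ; 2 bytes per character in memory, overhead not included
EndFunc

Local $iSize = FileGetSize("C:\data\input.txt") ; hypothetical path
ConsoleWrite("If ANSI:   ~" & _EstimateStringBytes($iSize, 1) & " bytes" & @CRLF)
ConsoleWrite("If UTF-16: ~" & _EstimateStringBytes($iSize, 2) & " bytes" & @CRLF)
```

Under these assumptions an ANSI file would take roughly twice its size in memory and a UTF-16 file roughly its own size, which is one reason a single ratio cannot fit every file.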
orbs Posted June 10, 2016

@BigDaddyO, I know that is not what you asked, but...

5 hours ago, BigDaddyO said: "... that command seems to fully lock my script ..."

If you need to handle large files, perhaps you should read and parse them line by line rather than reading the entire text and then parsing it. That will certainly decrease your script's memory footprint.
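A minimal sketch of that line-by-line approach, with a progress bar driven by the file position, might look like the following. The path, the window title, and the update interval of 1000 lines are placeholders, and it assumes FileGetPos() reports a byte offset comparable to FileGetSize(); check that on your own files before relying on the %.

```autoit
#include <FileConstants.au3>

Local $sFile = "C:\data\input.txt" ; hypothetical path
Local $iTotal = FileGetSize($sFile) ; size in bytes, obtained without reading the file
Local $hFile = FileOpen($sFile, $FO_READ)
If $hFile = -1 Then Exit MsgBox(16, "Error", "Cannot open " & $sFile)

ProgressOn("Validating", $sFile)
Local $sLine, $iLines = 0
While 1
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; -1 = end of file, 1 = read error
    $iLines += 1
    ; ... validate $sLine here ...
    ; Update the bar only every 1000 lines to keep the overhead low.
    If Mod($iLines, 1000) = 0 And $iTotal > 0 Then
        ProgressSet(Int(FileGetPos($hFile) / $iTotal * 100))
    EndIf
WEnd
ProgressOff()
FileClose($hFile)
```

Because the loop is ordinary AutoIt code, Adlib functions and GUI messages keep being serviced between lines, and only one line is held in memory at a time.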
jchd Posted June 10, 2016

If you really have to read a text file into an array while preserving AdLib functionality, prefer _FileReadToArray over FileReadToArray. The former is a UDF, hence an interruptible piece of AutoIt code, while the latter is a single built-in (uninterruptible) instruction.
BrewManNH Posted June 10, 2016

_FRTA mostly uses FRTA unless you're using the Delimiter parameter, or one of the flags that return the array as an array of arrays or return the count in the [0] element. So it's only interruptible if you're using one of them.
jchd Posted June 10, 2016

Well spotted.
BigDaddyO Posted June 13, 2016

@orbs I ended up doing it your way. I'm parsing the file line by line as I'm validating. Overall it added about 10 minutes to the 3-hour-long process, but as it gives the users a good progress bar, they seem to prefer it. Though as the file is on a network share, hopefully they don't lose connectivity while it is running, or else it will probably error out.
orbs Posted June 13, 2016

3 minutes ago, BigDaddyO said: "hopefully they don't lose connectivity while it is running, or else it will probably error out."

You ought to introduce proper error checking then. After each FileReadLine(), check the @error status and the return value. You may want to record the file size before you start reading and increment a counter as you read line by line, so you can verify the entire file was parsed.
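One way the checks described above could be sketched is shown below. The function name `_ReadAllLinesChecked`, its error codes, and the ByRef parameter are made up for illustration. It relies on FileReadLine() setting @error to -1 at end of file and to 1 on other failures, and it assumes FileGetPos() returns the byte offset reached.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: returns the number of lines read.
; @error = 1 if the file could not be opened,
; @error = 2 if reading stopped for any reason other than end of file.
; $iBytesRead receives the file position reached, for comparison with the size.
Func _ReadAllLinesChecked($sFile, ByRef $iBytesRead)
    $iBytesRead = 0
    Local $hFile = FileOpen($sFile, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, 0)

    Local $sLine, $iErr, $iLines = 0
    While 1
        $sLine = FileReadLine($hFile)
        $iErr = @error ; save it before any other call can reset it
        If $iErr Then ExitLoop
        $iLines += 1
        ; ... validate $sLine here ...
    WEnd

    $iBytesRead = FileGetPos($hFile)
    FileClose($hFile)
    ; -1 means a clean end of file; anything else is treated as a failed read.
    If $iErr <> -1 Then Return SetError(2, 0, $iLines)
    Return $iLines
EndFunc

Local $sPath = "C:\data\input.txt" ; hypothetical path
Local $iExpected = FileGetSize($sPath) ; recorded before reading starts
Local $iRead = 0
Local $iCount = _ReadAllLinesChecked($sPath, $iRead)
If @error Or $iRead < $iExpected Then
    ConsoleWrite("Incomplete read after " & $iCount & " lines" & @CRLF)
EndIf
```

If the network share drops mid-run, the loop should exit with a non-EOF error or a position short of the recorded size, so the script can report a partial parse instead of silently finishing.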
BigDaddyO Posted June 13, 2016

5 hours ago, orbs said: "You ought to introduce proper error checking then. After each FileReadLine(), check the @error status and the return value. You may want to record the file size before you start reading and increment a counter as you read line by line, so you can verify the entire file was parsed."

I did use _FileCountLines(), so I know the total row count, but I missed adding the error check on FileReadLine(). I'll add that. Thanks!
orbs Posted June 15, 2016

@BigDaddyO,

On 6/13/2016 at 9:32 PM, BigDaddyO said: "I did use _FileCountLines() ..."

I was wondering how _FileCountLines() could determine the count of lines without reading the entire file, so I looked at the UDF. And guess what? It can't. _FileCountLines() reads the entire file and then counts the line breaks, so you are actually still reading the entire file. And at the origin of this topic, you were actually reading the entire file twice: once in _FileCountLines(), and once in your main script! No wonder it took so long. You should drop _FileCountLines() and stick to the file size as your indicator, which does not require reading the file at all.
BigDaddyO Posted June 17, 2016

I didn't have _FileCountLines() initially, as I was just using the array size to know the count. _FileCountLines() may read the whole file, but it doesn't take long at all: about 15 seconds for the 1 million rows in my test file.

I did find out that _FileCountLines() does not seem to work for text files in Binary format, as it only ever comes back with a line count of 1. I tried updating the UDF to open the file in Binary mode, but that didn't work. So for now I have split my script into two different files: one for the majority of the files, and the original one, without much of a progress indicator, for the Binary file.
MilesAhead Posted June 17, 2016

Could the network be slowing you down? Have you tried a simple copy to the local drive, then reading from the HD? If it is still slow, you might try memory-mapped files. That is what the OS uses to map exe images into RAM. User programs can create memory-mapped files to share the same data across processes. Where it may help you is the MapViewOfFile call, which lets you map a chunk of the file into a memory range. With some experience, or if you can find a library for memory-mapped files, you can use that call to create a window into the file. Both the position in the file used to fill the buffer and the size of the window are adjustable.
water Posted June 17, 2016

I wonder why it takes 3 hours to process 1 million lines of a text file. How do you process the lines? Do you write the result to Excel, Word, ...? Maybe your script could be enhanced to run much faster.