deezed Posted October 5, 2012
Hey guys,
OK, I'll try to describe my issue as best I can. Here goes:
So I have a script that simply
- reads a RANDOM line in a rather large file of ~30,000 lines
- does its thing with the line it picked
- after it's done, uses the _FileWriteToLine("file", $linenumber, "", 1) command to delete the line I just used, because it's not needed anymore
- then loops to begin the same process again
Now, if I leave it at that, then obviously it works fine without a hitch. However, I thought it'd be faster if I just compiled the script and ran multiple copies of it... it shouldn't ever read the same line across copies because I delete the lines I use, right? A multi-threading solution of sorts.
WRONG. The issue I come across is that the file starts getting chopped down (sometimes by thousands of lines at a time) very quickly. I assume it's because I have multiple handles of the file being written to at the same time or something along those lines (pun intended), and it just bugs out.
Now I'm stuck thinking of a solution. I've tried multiple 'filedelete' and 'filewrite' functions posted on this board to no avail; the text file still gets messed up in a short amount of time. I'm guessing I need something in the script that checks whether the file is currently being written to, and waits until it is no longer being accessed before proceeding with the line deletion.
My goal is to run at least 5 copies simultaneously to get the job done quicker. Any tips? Or is there no solution to my conundrum?

czardas Posted October 5, 2012
Hmm, not at all sure about this. If anything, I suspect that running multiple instances would make the whole process slower. It isn't going to create any more RAM. With dual core you can run two instances to gain (I imagine) double speed, and with a quad core four instances to gain even more speed. Don't ask me how it works though. I suggest you rethink your idea. Writing to the same file simultaneously is not recommended even with multi-core.
Edited October 5, 2012 by czardas
operator64 ArrayWorkshop

water Posted October 5, 2012
I would do it in a single script and try to do it all in memory. So don't read a line, process it and write/delete the line. Use _FileReadToArray to read the whole file into memory, process the array and write the results back to disk.
My UDFs and Tutorials: UDFs: Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki Standard UDFs: Excel - Example Scripts - Wiki Word - Wiki Tutorials: ADO - Wiki WebDriver - Wiki

deezed Posted October 5, 2012 Author
But then how would I go about speeding it up using multiple compiled exes? I can run the script by itself without any problems whatsoever, it's just slow. It's when I run multiple instances of the script that I run into problems. I don't see how each of them having its own separate memory would solve my problem, as I want them all to read/write to one single file. I was under the impression that multiple running AutoIt compiled scripts can't all share/modify one array that's in memory; they'd each have their own separate one... which defeats my purpose.
I'm *assuming* that my problem would be solved by having some sort of function that detects whether a file is presently being written to, makes the script sleep until it isn't, and then proceeds to write to that file when it's 'free'...
My current theory as to why this isn't working is that multiple instances of my compiled exe script are trying to write to the file at the same time, and that's causing my bugs.
Here's a very bastardized version of what I figure the solution *MIGHT* be:
    While 1
        Global $howmanylines = _FileCountLines("file.txt") ; read how many lines there are
        Global $k = Random(1, $howmanylines, 1) ; take a random line number
        Global $fileline = FileReadLine("text.txt", $k) ; read that line number
        ; ... bunch of irrelevant code ...
        ; If $file is currently being written to, then   (this is my 'theory' on the solution, obviously not actual code)
        ;     sleep until file isn't being written to anymore
        ; EndIf
        _FileWriteToLine("file.txt", $k, "", 1) ; delete the line we parsed at the beginning of the script
    WEnd
Something like that would maybe solve it? I have no idea since I'm a newb. I hope I made it clear.
Just to reiterate, I can run my script (in SciTE or compiled) just fine; it's when I try to run *multiple* instances of said script that I run into problems with it butchering file.txt, presumably due to multiple concurrent attempts to write to the same exact file.
Edited October 5, 2012 by deezed
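
A minimal sketch of the "wait until the file is free" idea from the post above, not something anyone in the thread posted. It assumes a named Windows mutex is an acceptable lock between instances; the mutex name and the three _Lock* helper names are made up for illustration. It also claims (deletes) the line before the slow work rather than after, so two instances cannot pick the same line; the trade-off is that a crash mid-request loses that line.

```autoit
#include <File.au3>

Global Const $WAIT_INFINITE = 0xFFFFFFFF ; INFINITE timeout for WaitForSingleObject

; Create the named mutex, or open it if another instance already created it.
Func _LockCreate($sName)
    Local $aRet = DllCall("kernel32.dll", "handle", "CreateMutexW", "ptr", 0, "bool", 0, "wstr", $sName)
    If @error Or Not $aRet[0] Then Return SetError(1, 0, 0)
    Return $aRet[0]
EndFunc

; Block until this instance owns the mutex.
Func _LockAcquire($hMutex)
    DllCall("kernel32.dll", "dword", "WaitForSingleObject", "handle", $hMutex, "dword", $WAIT_INFINITE)
EndFunc

; Give the mutex up so the next waiting instance can proceed.
Func _LockRelease($hMutex)
    DllCall("kernel32.dll", "bool", "ReleaseMutex", "handle", $hMutex)
EndFunc

Global $hLock = _LockCreate("file_txt_lock_example") ; hypothetical name
If @error Then Exit 1

Global $iLines, $k, $sLine
While 1
    ; Count, read and delete while holding the lock, so no other
    ; instance can touch file.txt between these three steps.
    _LockAcquire($hLock)
    $iLines = _FileCountLines("file.txt")
    If $iLines < 1 Then
        _LockRelease($hLock)
        ExitLoop
    EndIf
    If $iLines = 1 Then
        $k = 1 ; avoid calling Random with min = max
    Else
        $k = Random(1, $iLines, 1)
    EndIf
    $sLine = FileReadLine("file.txt", $k)
    _FileWriteToLine("file.txt", $k, "", 1) ; overwrite with empty text = delete the line
    _LockRelease($hLock)

    ; ... the slow web requests on $sLine go here, outside the lock ...
WEnd
```

Keeping the web requests outside the locked region matters: if they ran while the mutex was held, the instances would simply take turns and nothing would run in parallel.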

water Posted October 5, 2012
Your script is slow because it does so many I/O operations. Read the file in one operation using _FileReadToArray, process all records in the array and write the result in one operation to disk. That makes it a lot faster and doesn't require multiple instances.
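
The read-once, write-once pattern described above might look roughly like this. It is a sketch under assumptions: results.txt and _ProcessLine are placeholders invented here, standing in for whatever the real script produces per line.

```autoit
#include <File.au3>

Global $aLines
If Not _FileReadToArray("file.txt", $aLines) Then
    MsgBox(16, "Error", "Could not read file.txt")
    Exit 1
EndIf

; $aLines[0] holds the number of lines; the lines themselves are in [1] .. [$aLines[0]].
For $i = 1 To $aLines[0]
    $aLines[$i] = _ProcessLine($aLines[$i])
Next

; Write elements 1..n to disk in a single operation (third parameter = start index).
_FileWriteFromArray("results.txt", $aLines, 1)

; Hypothetical placeholder for the real per-line work.
Func _ProcessLine($sLine)
    Return StringStripWS($sLine, 3) ; 3 = strip leading and trailing whitespace
EndFunc
```

This touches the disk twice in total, instead of several times per processed line.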

deezed Posted October 5, 2012 Author
I understand what you're saying, water, but my script isn't limited to reading/deleting text; that's a very small part of it (and ironically the part that's giving me the most trouble). It involves connecting to my webserver and sending/receiving data as well, which is obviously a tad slower. One run through the loop of my script takes ~10 seconds, for example. So the bottleneck isn't the I/O operations, as those are done relatively quickly. I'm just trying to have it 'multi-threaded' in the sloppiest and quickest way possible, and I thought it would work.
edit: made a boo-boo in my above code.
    Global $fileline = FileReadLine("text.txt", $k) ; read that line number
should read
    Global $fileline = FileReadLine("file.txt", $k) ; read that line number
Edited October 5, 2012 by deezed

water Posted October 5, 2012
Doing _FileCountLines for every record you process is very slow because the function reads the whole file and counts the lines. As I have only seen the file handling portion of your script, I don't know what else slows down the script. Before doing any optimization I would recommend doing some time measurement. Use TimerInit and TimerDiff to see how long each part of your script takes and where to improve it.
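
For the measurement suggested above, a small sketch with TimerInit and TimerDiff. The two step functions are stand-ins invented for this example (they only sleep); in the real script they would be the file handling and the web requests.

```autoit
Global $hTimer, $fFileMs, $fWebMs

$hTimer = TimerInit()
_DoFileStep()
$fFileMs = TimerDiff($hTimer) ; elapsed time in milliseconds

$hTimer = TimerInit()
_DoWebStep()
$fWebMs = TimerDiff($hTimer)

ConsoleWrite("file step: " & Round($fFileMs) & " ms, web step: " & Round($fWebMs) & " ms" & @CRLF)

; Stand-ins for the real work, so the sketch runs on its own.
Func _DoFileStep()
    Sleep(50)
EndFunc

Func _DoWebStep()
    Sleep(200)
EndFunc
```

Comparing the two numbers over a few loop iterations shows which part is actually worth optimizing or parallelizing.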

deezed Posted October 5, 2012 Author
Making the GET/POST requests to my webserver is what takes the most time, I can tell you that for sure. Those things take 95-99% of the time. Optimizing the file-handling portions of my script won't make a big enough difference. I just want multiple instances working concurrently without bugs, that's it. This is already getting out of hand: I'm asking relatively simple questions and I'm getting told something completely different here.

water Posted October 5, 2012
On this forum we try to teach a man how to fish and not hand-feed him for the rest of his life. The "relatively simple question" you ask has already been answered in the first post: "Writing to the same file simultaneously is not recommended even with multi-core"

deezed Posted October 5, 2012 Author
I don't know if you're trolling me or if you lack reading comprehension. Anyway, we're in agreement here: I DON'T WANT TO WRITE TO THE FILE SIMULTANEOUSLY. In fact, that's the thing I'm trying to *avoid*. So a function that detects whether a file is currently being written to, sleeps until it isn't, and then proceeds with the script is *EXACTLY* what I want, as shown in my code example above. As I've stated, the bottleneck isn't the reading/writing, it's the web stuff that I'm doing.
I'm probably going to have to solve this one myself because the so-called MVPs seem to be useless around here. I literally couldn't make it any clearer. Can't post until OCT 6 - 12:33 am due to the 5 post limit on new accounts, but I'll try to have a solution and post it after that time, to help other people in the future.
Edited October 5, 2012 by deezed

water Posted October 5, 2012
"I dont know if you're trolling me or if you lack reading comprehension."
I suggest you slow down a bit!
Everyone who posted on this thread tries to help you! This isn't my first thread where the user asks for a solution which he can't implement himself, but after some discussion and an overall look at the script design the final solution looked completely different.
What I would do:
- You need a script (Script A) that starts multiple instances (e.g. 10) of your processing script (Script B)
- Let Script A split the input file into multiple files (e.g. 10) with only a few thousand records each
- Pass each filename to an instance of Script B
This way every instance has its own file to work with. No multiple write access needed any more.
Edited October 5, 2012 by water
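
The Script A / Script B split proposed above could be sketched as follows. This is only an illustration of the launcher side: the worker name ScriptB.exe, the chunk file names and the worker count of 5 are assumptions made here, not details from the thread.

```autoit
; Script A (launcher) - sketch only.
#include <File.au3>

Global Const $WORKERS = 5 ; assumed number of instances
Global $aLines
If Not _FileReadToArray(@ScriptDir & "\file.txt", $aLines) Then Exit 1

; Open one chunk file per worker (mode 2 = write, erasing previous contents).
Global $ahChunk[$WORKERS], $asChunk[$WORKERS]
For $w = 0 To $WORKERS - 1
    $asChunk[$w] = @ScriptDir & "\chunk" & $w & ".txt" ; hypothetical file names
    $ahChunk[$w] = FileOpen($asChunk[$w], 2)
    If $ahChunk[$w] = -1 Then Exit 2
Next

; Deal the lines out round-robin so the chunks end up nearly the same size.
For $i = 1 To $aLines[0]
    FileWriteLine($ahChunk[Mod($i, $WORKERS)], $aLines[$i])
Next

; Close each chunk, then start one worker instance on it.
For $w = 0 To $WORKERS - 1
    FileClose($ahChunk[$w])
    Run('"' & @ScriptDir & '\ScriptB.exe" "' & $asChunk[$w] & '"', @ScriptDir)
Next
```

Inside Script B the chunk path arrives as $CmdLine[1] (with $CmdLine[0] holding the number of arguments), so each instance reads and rewrites only its own file and no locking is needed.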

hannes08 Posted October 5, 2012
Another solution would be to use a small database instead of a plain text file. Anyway, listen to what water says, as he knows what he's talking about.
Regards, Hannes
If you can't convince them, confuse them!

water Posted October 5, 2012
"Anyway listen to what water says as he knows what he's talking about."
Thanks for the compliment

czardas Posted October 5, 2012
I have no idea what this is about - it's only 30,000 lines. That can be handled in memory. Random line selection from the file is very slow - instead, shuffle the array using one of the excellent random array snippets somewhere around here. Then loop through the shuffled array [bunch of irrelevant code] and dump the resulting lines into a file.
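
A sketch of the shuffle-then-loop approach suggested above. It uses a hand-written Fisher-Yates shuffle rather than any particular forum snippet, and the ConsoleWrite line is only a placeholder for the real per-line work.

```autoit
#include <File.au3>

Global $aLines
If Not _FileReadToArray("file.txt", $aLines) Then Exit 1

; Fisher-Yates shuffle of elements 1..n ([0] holds the line count and stays put).
Global $j, $sTmp
For $i = $aLines[0] To 2 Step -1
    $j = Random(1, $i, 1) ; random integer in 1..$i
    $sTmp = $aLines[$i]
    $aLines[$i] = $aLines[$j]
    $aLines[$j] = $sTmp
Next

; Every line is now visited exactly once, in random order,
; without re-reading or rewriting the file for each pick.
For $i = 1 To $aLines[0]
    ; ... bunch of irrelevant code working on $aLines[$i] ...
    ConsoleWrite($aLines[$i] & @CRLF)
Next
```

Because each element is used once, there is no need to delete lines from the file to avoid repeats.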

water Posted October 5, 2012
The OP describes in post #8 that the time-consuming processing is done with GET/POST requests to his webserver. That's why he wants to run multiple instances of his script. (That's what I understand with my lack of reading comprehension.)

czardas Posted October 5, 2012
Yeah I read that, but I still don't understand how that changes things, or why it requires multiple instances of the script.
Edited October 5, 2012 by czardas

water Posted October 5, 2012
The OP hasn't provided any timing information, but I can imagine that processing 30,000 records and interacting with a webserver for each of these records can take quite long. So if he can run the webserver part in parallel, that should reduce the total run time.

czardas Posted October 5, 2012
Well, I would be interested to hear the actual justification behind this, if it really is necessary to spawn multiple processes. I suppose it depends on the nature and the order of tasks. I can imagine it in some circumstances.
Edited October 5, 2012 by czardas

water Posted October 5, 2012
This is the next question I would ask. But the OP seems to be quite reluctant to give us additional information.

czardas Posted October 5, 2012
Well water, I didn't like the way he responded to you in post No 10, yet you still tried to help him. That's admirable!