KeeWay, posted February 3, 2010

I have a basic script that reads a text file, takes the text between a start string and an end string, and writes it to a separate file. Currently my small test file of 22k lines takes about 1 minute to process. Can anyone show or recommend a way to speed up the routine? The files may grow to upwards of 100k lines. Here is what I have working so far:

    #include <File.au3> ; needed for _FileReadToArray

    Global $FileArray
    $file = "C:\SBT\Incoming\SBTFILE.txt"
    $newfile = "C:\SBT\Work\SBTFILE-"
    $filecount = 0
    _FileReadToArray($file, $FileArray)
    ProgressOn("Processing SBT File", "Reading The File...", "0 Lines")
    For $i = 1 To $FileArray[0]
        If StringInStr($FileArray[$i], "ISA*") Then
            ;MsgBox(0, " ", $FileArray[$i])
            $filecount = $filecount + 1
            FileWriteLine($newfile & $filecount & ".txt", $FileArray[$i] & @CRLF)
        Else
            FileWriteLine($newfile & $filecount & ".txt", $FileArray[$i] & @CRLF)
        EndIf
        $Percent = Int(($i / $FileArray[0]) * 100)
        ProgressSet($Percent, $Percent & " Percent Complete")
    Next
    ProgressSet(100, "Done", "Complete")
    Sleep(1000)
    ProgressOff()

Thanks,
James
Melba23 (Moderator), posted February 3, 2010

KeeWay,

One of the reasons this might be slow is that you use FileWriteLine with a file name. Called that way, it opens and closes the file each time you use it, and for a 22k line file that is a lot of opening and closing. I have rewritten the code so that it stores each ISA* line and all lines until the next ISA* line (which is what I understood you wanted to have in each new file) in a string, which we write to file whenever a new ISA* line is found:

    #include <File.au3>

    Global $FileArray
    $file = @ScriptDir & "\test.txt" ;"C:\SBT\Incoming\SBTFILE.txt"
    $newfile = @ScriptDir & "\split-" ; "C:\SBT\Work\SBTFILE-"
    _FileReadToArray($file, $FileArray)
    $filecount = 0
    $sNewFile_Text = ""
    ProgressOn("Processing SBT File", "Reading The File...", "0 Lines")
    For $i = 1 To $FileArray[0]
        If StringInStr($FileArray[$i], "ISA*") Then
            ; We need to start a new file
            ; So write the existing one unless we have yet to start
            If $filecount > 0 Then FileWrite($newfile & $filecount & ".txt", $sNewFile_Text)
            ; And start a new one
            $filecount = $filecount + 1
            $sNewFile_Text = $FileArray[$i] & @CRLF
        Else
            $sNewFile_Text &= $FileArray[$i] & @CRLF
        EndIf
        $Percent = Int(($i / $FileArray[0]) * 100)
        ProgressSet($Percent, $Percent & " Percent Complete")
    Next
    ; Now write the final file!
    FileWrite($newfile & $filecount & ".txt", $sNewFile_Text)
    ProgressSet(100, "Done", "Complete")
    Sleep(1000)
    ProgressOff()

It works fine on my short 20 line test file. I am not going to write a 22k line one, so I will leave you to test it on a file of the size you want. Over to you!

I hope it helps. Come back if it does not do what you want.
M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
ArrayMultiColSort - Sort arrays on multiple columns
ChooseFileFolder - Single and multiple selections from specified path treeview listing
Date_Time_Convert - Easily convert date/time formats, including the language used
ExtMsgBox - A highly customisable replacement for MsgBox
GUIExtender - Extend and retract multiple sections within a GUI
GUIFrame - Subdivide GUIs into many adjustable frames
GUIListViewEx - Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx - Check/clear parent and child checkboxes in a TreeView
Marquee - Scrolling tickertape GUIs
NoFocusLines - Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify - Small notifications on the edge of the display
Scrollbars - Automatically sized scrollbars with a single command
StringSize - Automatically size controls to fit text
Toast - Small GUIs which pop out of the notification area
KeeWay, posted February 3, 2010

That worked out great, thanks. It cut the time for the same file to about 8 seconds. I am still new to AutoIt and to working with arrays. I see that you just wrote the found array to the file at once. Since I don't have a clear understanding of arrays yet, I thought I had to write it out one line at a time.

Anyway, thanks again, this is great.
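To check this kind of speed-up on your own machine, the two write strategies discussed above can be timed side by side. The following is only a rough sketch, not code from the thread: the scratch file names, the line count and the variable names are assumptions made up for illustration, and the timings will vary with the disk and the system.

```autoit
; Sketch only: compare per-line FileWriteLine (by file name) with a single FileWrite.
Global Const $iLines = 5000                        ; arbitrary test size (assumption)
Global $sFileA = @TempDir & "\split_timing_a.txt"  ; hypothetical scratch file
Global $sFileB = @TempDir & "\split_timing_b.txt"  ; hypothetical scratch file
FileDelete($sFileA)
FileDelete($sFileB)

; Method 1: passing a file name means the file is opened and closed on every call
Global $hTimer = TimerInit()
For $i = 1 To $iLines
    FileWriteLine($sFileA, "Line " & $i)
Next
Global $fPerLine = TimerDiff($hTimer)

; Method 2: build one long string in memory, then write it in a single call
$hTimer = TimerInit()
Global $sText = ""
For $i = 1 To $iLines
    $sText &= "Line " & $i & @CRLF
Next
FileWrite($sFileB, $sText)
Global $fOnce = TimerDiff($hTimer)

; TimerDiff returns elapsed milliseconds
ConsoleWrite("Per-line writes: " & Round($fPerLine) & " ms" & @CRLF)
ConsoleWrite("Single write:    " & Round($fOnce) & " ms" & @CRLF)
```

Run it from SciTE to see the two figures in the console pane.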
Melba23 (Moderator), posted February 3, 2010

KeeWay,

"I see that you just wrote the found array to the file at once"

Not quite. What I did was read the array created with _FileReadToArray line by line and save the relevant lines into one long string, which I then wrote to a file in one pass with FileWrite whenever I wanted to start a new file. I created no arrays, although it could be done that way if you wanted to.

M23

P.S. When you reply, please use the "Add Reply" button at the top and bottom of the page rather than the "Reply" button in the post itself. That way you do not get the contents of the previous post quoted in your reply and the whole thread becomes easier to read.
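Building one string and writing it once is the approach used in this thread. Another way to avoid the repeated open and close is to open each output file once and write through its handle. The following is only a sketch under the same assumptions as the script above (each section starts with a line containing ISA*, same test paths); the variable names such as $hOut are introduced here for illustration and are not from the original posts.

```autoit
#include <File.au3>

; Sketch only: same split as above, but each output file is opened once
; with FileOpen, written through its handle, and then closed.
Global $aLines
Global $sSource = @ScriptDir & "\test.txt"   ; assumed test input
Global $sPrefix = @ScriptDir & "\split-"     ; assumed output prefix
Global $hOut = -1                            ; -1 means "no output file open yet"
Global $iCount = 0

If Not _FileReadToArray($sSource, $aLines) Then
    MsgBox(16, "Error", "Could not read " & $sSource)
    Exit 1
EndIf

For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], "ISA*") Then
        ; A new section starts: close the previous file and open the next one
        If $hOut <> -1 Then FileClose($hOut)
        $iCount += 1
        $hOut = FileOpen($sPrefix & $iCount & ".txt", 2) ; 2 = write mode, erases old contents
        If $hOut = -1 Then
            MsgBox(16, "Error", "Could not open output file " & $iCount)
            Exit 1
        EndIf
    EndIf
    ; Lines before the first ISA* line have no file to go to and are skipped
    If $hOut <> -1 Then FileWriteLine($hOut, $aLines[$i])
Next

If $hOut <> -1 Then FileClose($hOut)
```

With a handle, FileWriteLine adds the line ending itself, so no @CRLF is appended here. Which of the two approaches is faster for a given file is something to measure rather than assume.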
KeeWay, posted February 3, 2010

Melba23,

Thanks, I see now. Sorry, I will remember to hit the correct reply button next time. Thanks again,
James
mjfoxtrot, posted June 17, 2014

I know this is an old topic, but the script here works extremely well, so much so that it trumps all the third-party text-file splitters I have tried (and there are a lot of them). Is it possible to modify this script so that it takes the specified search term (ISA* in the example) and puts it at the BOTTOM of the split files? Right now, it puts the search term at the top. For the text splitting I am doing, I need the boundary line to be at the bottom of each of the split files.

To clarify, the text boundary that I am using is:

    ****** END OF REPORT ******

The current script produces this in each of the split files:

    ****** END OF REPORT ******
    (the rest of the text in the file)

What I need is for the script to produce the following in each of the split files:

    (text in the file)
    ****** END OF REPORT ******

Basically, I need the delimiter line to appear at the bottom of each of the split files. The script would hunt for the delimiter, and when it finds it, split the file right there, with the delimiter line at the bottom. Then it would continue through the text document and do the same for every other instance of the ****** END OF REPORT ****** delimiter.

Thanks for any help that someone can provide. I am most grateful already for this script, as it has saved me a lot of time in processing files; it can save me even more with this modification.
Melba23 (Moderator), posted June 17, 2014

mjfoxtrot,

Welcome to the AutoIt forum.

Using this file:

    Line 1 | File 1
    Line 2 |
    ****** END OF REPORT ******
    Line 4 | File 2
    Line 5 |
    ****** END OF REPORT ******
    Line 7 | File 3
    Line 8 |
    Line 9 |
    Line 10 |
    Line 11 |
    ****** END OF REPORT ******
    Line 13 | File 4
    Line 14 |
    Line 15 |
    ****** END OF REPORT ******
    Line 17
    Line 19
    Line 19

the following code splits it as indicated:

    #include <File.au3>

    Global $FileArray
    $file = @ScriptDir & "\test.txt" ;"C:\SBT\Incoming\SBTFILE.txt"
    $newfile = @ScriptDir & "\split-" ; "C:\SBT\Work\SBTFILE-"
    _FileReadToArray($file, $FileArray)
    $filecount = 1
    $sNewFile_Text = ""
    ProgressOn("Processing SBT File", "Reading The File...", "0 Lines")
    For $i = 1 To $FileArray[0]
        If StringInStr($FileArray[$i], "****** END OF REPORT ******") Then
            ; We need to write the current file, with the delimiter line last
            FileWrite($newfile & $filecount & ".txt", $sNewFile_Text & $FileArray[$i])
            ; And start a new one
            $filecount = $filecount + 1
            $sNewFile_Text = ""
        Else
            ; Add line to string
            $sNewFile_Text &= $FileArray[$i] & @CRLF
        EndIf
        $Percent = Int(($i / $FileArray[0]) * 100)
        ProgressSet($Percent, $Percent & " Percent Complete")
    Next
    ; Any lines after the last delimiter are not written to a file
    ProgressSet(100, "Done", "Complete")
    Sleep(1000)
    ProgressOff()

Is that what you are looking for? If not, then please post a test file showing how it should be split and I can modify the code.
M23
Oscis, posted June 17, 2014

I made a script that splits a file in binary or normal mode. You can use the interface, or just integrate the function into another script. If you use the interface, you can drag a file into the first input control to get the file path, and drag a folder into the second input control to set the destination folder for the resulting files. In the third input control you can type a prefix that all the resulting files will share, and in the fourth the delimiter you want to use to split the file. If you leave the second and third input controls blank, the program will fill in default values based on the given file path. If you have any suggestions, or find bugs, let me know. Here is the code:

    #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
    #AutoIt3Wrapper_Run_Au3Stripper=y
    #Au3Stripper_Parameters=/RM /SF = 1 /SV = 1 /PE
    #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****
    Opt("MustDeclareVars",1)
    ;These includes are only needed for the interface.
    #include <GUIConstantsEx.au3>
    #include <WindowsConstants.au3>
    #include <StaticConstants.au3>
    ;_Example1()
    _Example2()

    Func _Example1();Splitting a file without a user interface
        Local $FileName = ""
        Local $Folder = ""
        Local $Prefix = ""
        Local $Delimiter = "****** END OF REPORT ******"
        ;_SplitFileHex($FileName,$Folder,$Prefix,$Delimiter)
        _SplitFile($FileName,$Folder,$Prefix,$Delimiter)
    EndFunc

    Func _Example2();Splitting a file with a user interface
        Local $GUI = GUICreate("File Splitter",300,130,-1,-1,-1,$WS_EX_ACCEPTFILES)
        Local $Label = GUICtrlCreateLabel("File:",10,10,68,20,$SS_RIGHT)
        GUICtrlSetFont($Label,12)
        Local $Input = GUICtrlCreateInput("",88,10,202,20)
        Local $Label2 = GUICtrlCreateLabel("Folder:",10,40,68,20,$SS_RIGHT)
        GUICtrlSetFont($Label2,12)
        Local $Input2 = GUICtrlCreateInput("",88,40,202,20)
        Local $Label3 = GUICtrlCreateLabel("Prefix:",10,70,68,20,$SS_RIGHT)
        GUICtrlSetFont($Label3,12)
        Local $Input3 = GUICtrlCreateInput("",88,70,202,20)
        Local $Label4 = GUICtrlCreateLabel("Delimiter:",10,100,68,20,$SS_RIGHT)
        GUICtrlSetFont($Label4,12)
        Local $Input4 = GUICtrlCreateInput("",88,100,202,20)
        ;Hidden button bound to ENTER through an accelerator
        Local $NextButton = GUICtrlCreateButton("",0,0,0,0)
        GUICtrlSetState($NextButton,$GUI_HIDE)
        Local $GUIAccelerators[1][2] = [["{ENTER}",$NextButton]]
        GUISetAccelerators($GUIAccelerators,$GUI)
        GUICtrlSetState($Input,$GUI_DROPACCEPTED)
        GUICtrlSetState($Input2,$GUI_DROPACCEPTED)
        GUISetState()
        Local $File, $FileName, $Prefix, $Delimiter, $A, $B, $Folder, $Text, $Continue = True
        While 1
            Switch GUIGetMsg()
                Case -3 ;-3 = $GUI_EVENT_CLOSE
                    ExitLoop
                Case $NextButton
                    $FileName = GUICtrlRead($Input)
                    If $FileName == "" Then
                        MsgBox(0,"Error","You must type in a file name, or drag a file into the first input box before you continue.")
                    ElseIf FileExists($FileName) Then
                        ConsoleWrite($FileName & @CR)
                        $A = StringInStr($FileName,"\",0,-1)
                        $Folder = GUICtrlRead($Input2)
                        $Continue = True
                        If $Folder == "" Then
                            ;Default folder: the folder of the source file
                            $Folder = StringLeft($FileName,$A)
                            GUICtrlSetData($Input2,$Folder)
                            $Continue = False
                            ConsoleWrite($Folder & @CR)
                        EndIf
                        $Prefix = GUICtrlRead($Input3)
                        If $Prefix == "" Then
                            ;Default prefix: the source file name without its extension
                            $A += 1
                            $B = StringInStr($FileName,".",0,-1)
                            ConsoleWrite($A & @CR & $B & @CR & StringLen($FileName) & @CR)
                            $Prefix = StringMid($FileName,$A,$B - $A)
                            GUICtrlSetData($Input3,$Prefix)
                            $Continue = False
                        EndIf
                        If $Continue Then
                            $Delimiter = GUICtrlRead($Input4)
                            ConsoleWrite("$Delimiter = " & $Delimiter & @CR)
                            If $Delimiter == "" Then
                                MsgBox(0,"Error","You did not type in a delimiter. Where do you want this program to split the file?")
                            Else
                                If MsgBox(4,"Question","Do you want to open the file in binary mode?") = 6 Then
                                    _SplitFileHex($FileName,$Folder,$Prefix,$Delimiter)
                                Else
                                    _SplitFile($FileName,$Folder,$Prefix,$Delimiter)
                                EndIf
                            EndIf
                        EndIf
                    Else
                        MsgBox(0,"Error","I could not find the file. Please check the file path, and try again.")
                    EndIf
            EndSwitch
        WEnd
        Exit
    EndFunc

    Func _SplitFileHex($FileName,$Folder,$Prefix,$Delimiter);The main function, Hex version
        ProgressOn("File Splitter","Processing Your File","0%",-1,-1,18)
        Local $File = FileOpen($FileName,16), $Text = FileRead($File)
        FileClose($File)
        Local $TextLen = StringLen($Text)
        If StringRight($Folder,1) <> "\" Then $Folder &= "\"
        DirCreate($Folder)
        If StringLeft($Delimiter,2) <> "0x" Then $Delimiter = StringToBinary($Delimiter)
        $Delimiter = StringTrimLeft($Delimiter,2)
        ;Search for the last dot so that dots in folder names are ignored
        Local $Suffix = StringMid($FileName,StringInStr($FileName,".",0,-1))
        $FileName = $Folder & $Prefix & " - "
        Local $DelimiterLen = StringLen($Delimiter), $String = "", $Start = 3, $End, $Count
        Local $F = 0, $Percent
        While 1
            $F += 1
            $End = StringInStr($Text,$Delimiter,2,1,$Start)
            If $End Then
                ;Each piece ends just after the delimiter
                $End += $DelimiterLen
                $File = FileOpen($FileName & $F & $Suffix, 26)
                $Count = $End - $Start
                FileWrite($File,"0x" & StringMid($Text,$Start,$Count))
                FileClose($File)
                $Start += $Count
                $Percent = Round(($Start/$TextLen)*100,2)
                ProgressSet($Percent,$Percent & "%","File " & $F)
            Else
                ExitLoop
            EndIf
        WEnd
        ProgressSet(100,"100%","Done!")
        ProgressOff()
        MsgBox(0,"Done","I've split your file.")
    EndFunc

    Func _SplitFile($FileName,$Folder,$Prefix,$Delimiter);The main function
        ProgressOn("File Splitter","Processing Your File","0%",-1,-1,18)
        Local $File = FileOpen($FileName), $Text = FileRead($File)
        FileClose($File)
        Local $TextLen = StringLen($Text)
        If StringRight($Folder,1) <> "\" Then $Folder &= "\"
        DirCreate($Folder)
        ;Search for the last dot so that dots in folder names are ignored
        Local $Suffix = StringMid($FileName,StringInStr($FileName,".",0,-1))
        $FileName = $Folder & $Prefix & " - "
        Local $DelimiterLen = StringLen($Delimiter), $String = "", $Start = 1, $End, $Count
        Local $F = 0, $Percent
        While 1
            $F += 1
            $End = StringInStr($Text,$Delimiter,1,1,$Start)
            If $End Then
                ;Each piece ends just after the delimiter
                $End += $DelimiterLen
                $File = FileOpen($FileName & $F & $Suffix, 26)
                $Count = $End - $Start
                FileWrite($File,StringMid($Text,$Start,$Count))
                FileClose($File)
                $Start += $Count
                $Percent = Round(($Start/$TextLen)*100,2)
                ProgressSet($Percent,$Percent & "%","File " & $F)
            Else
                ExitLoop
            EndIf
        WEnd
        ProgressSet(100,"100%","Done!")
        ProgressOff()
        MsgBox(0,"Done","I've split your file.")
    EndFunc

Let me know what you think.
mjfoxtrot, posted June 17, 2014

Melba23,

I can't thank you enough. That is EXACTLY what I was looking for. I just gave your script a test run and it works perfectly for what I need. I had been able to make the original script work by adding some clunky lines of code myself (I had to add the delimiter to the top of the file, then do the splitting, then add the "END OF REPORT" line back to the bottom). Now I don't have to do that anymore, because your code does it all for me.

I also would like to thank you for the nice welcome message. Yes, this was my first post. AutoIt is an amazing tool, and for a few months now I have been perusing the site, picking up bits and pieces on how to use it effectively. I really appreciate the help from an advanced user such as yourself. It is interesting to see such a useful script that does a fast, clean, straightforward text split on a delimiter line; I looked around the internet quite a bit for an application that does this, and the quality options are extremely scarce.

Oscis: I have not tried out your suggestion yet, but I will, and I appreciate you providing it. I will let you know how it works.

Edited June 18, 2014 by mjfoxtrot
mjfoxtrot, posted June 18, 2014

Oscis,

I tried out your script. It's very nice; I like the front end you built onto it. It makes things very convenient for a splitting operation that a user intends to run just once or twice. For everyday splitting operations, it's easy enough to integrate into another script, as you mentioned. But a couple of points, if I may:

1. Is it possible to make the script a bit more forgiving about the delimiter? For instance, I noticed that some of my "****** END OF REPORT ******" lines in the source file have an extra space or two in them. When I use Melba23's script, I simply use the term "END OF REPORT" as the delimiter; it seeks out any line that has that phrase and takes the entire line, no matter what the other characters are. That is what I need for my file splitting operations. Maybe there is a way to build the function and/or front end so that a user could specify whether the delimiter phrase is verbatim (literal) or just a key phrase? In the case of a key phrase, the script would find the term in any line, make the split at that point, and put the entire line at the bottom of the split file.

2. This is more of a pie-in-the-sky request than anything else: can the splitting operation be controlled by a .txt file that acts as an index and has the page breaks clearly defined by FIRST LINE - LAST LINE? In other words, the text file would map out the splitting process, page by page. I realize that preparing the .txt index file is more time-consuming, but for more complex splitting operations this would be a godsend. I've looked high and low for this kind of "split by index" functionality and not found it anywhere.

Anyway, I really appreciate the script you sent. It is a nice piece of programming and does exactly what I had specified. Sorry to ask for more, but it seems like what you have built has tremendous possibilities for me and others.

Edited June 18, 2014 by mjfoxtrot
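The literal versus key-phrase choice described in point 1 could be sketched roughly as follows. This is only an illustration and not part of Oscis's script: the helper name _IsSplitLine and its $bLiteral parameter are hypothetical, and it assumes the file is processed line by line, as in Melba23's scripts.

```autoit
; Hypothetical helper: decide whether a line marks a split point.
; $bLiteral = True  -> the whole line must equal $sDelimiter exactly (case-sensitive)
; $bLiteral = False -> the line only has to contain $sDelimiter somewhere (not case-sensitive)
Func _IsSplitLine($sLine, $sDelimiter, $bLiteral = False)
    If $bLiteral Then
        Return $sLine == $sDelimiter
    EndIf
    Return StringInStr($sLine, $sDelimiter) > 0
EndFunc

; Usage sketch: a delimiter line with an extra space in it
Global $sLine = "******  END OF REPORT ******"
ConsoleWrite("Key phrase match: " & _IsSplitLine($sLine, "END OF REPORT") & @CRLF)
ConsoleWrite("Literal match:    " & _IsSplitLine($sLine, "****** END OF REPORT ******", True) & @CRLF)
```

In a line-by-line loop, such a helper would take the place of the StringInStr test, and the whole matching line would still be written at the bottom of the current file.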
Oscis, posted June 20, 2014

I'm working on a second version of this program now. I'll see what I can do.
mjfoxtrot, posted June 21, 2014

Thanks, Oscis. Whatever you can do is much appreciated. I'll play around with the code as well. I am an amateur at this point, but it is fun to try.

Edited June 22, 2014 by Melba23 (fixed font size)
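For the "split by index" idea raised earlier in the thread, one possible starting point is sketched below. Nothing here comes from the posted scripts: the index format (one first-last pair of line numbers per line, for example 1-3), the file names and the variable names are all assumptions made for illustration.

```autoit
#include <File.au3>

; Sketch only: split a text file by line ranges listed in an index file.
; Assumed index format (hypothetical): one "first-last" pair per line, e.g. 1-3
Global $aText, $aIndex
Global $sSource = @ScriptDir & "\test.txt"   ; assumed input file
Global $sIndex = @ScriptDir & "\index.txt"   ; assumed index file
Global $sPrefix = @ScriptDir & "\split-"     ; assumed output prefix

If Not _FileReadToArray($sSource, $aText) Then Exit 1
If Not _FileReadToArray($sIndex, $aIndex) Then Exit 1

Global $aRange, $iFirst, $iLast, $sOut, $iFile = 0
For $i = 1 To $aIndex[0]
    ; Remove all whitespace, then split "first-last" on the hyphen
    $aRange = StringSplit(StringStripWS($aIndex[$i], 8), "-")
    If $aRange[0] <> 2 Then ContinueLoop ; skip blank or malformed entries
    $iFirst = Int($aRange[1])
    $iLast = Int($aRange[2])
    ; Skip ranges that fall outside the source file or are reversed
    If $iFirst < 1 Or $iLast > $aText[0] Or $iFirst > $iLast Then ContinueLoop

    ; Collect the requested lines into one string, then write once
    $sOut = ""
    For $j = $iFirst To $iLast
        $sOut &= $aText[$j] & @CRLF
    Next
    $iFile += 1
    FileWrite($sPrefix & $iFile & ".txt", $sOut)
Next
```

As in Melba23's scripts, each output file is built in memory and written with a single FileWrite call, so the number of file operations stays at one per range.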