PureNewb Posted November 27, 2017 Share Posted November 27, 2017 (edited) Hello,I have a very large .csv file that i want to split but not by lines or size(kb,mb...) like https://www.gdgsoft.com/gsplit/ but according to the starting character of lines. I am a little newb on autoit and i tried find something in forum but i didnt find something about that parametre splitting. I found only that script in this post to splitting by lines but not good in my case,also i dont know how to work with StringRegExp so any help would be blessing. #include <File.au3> $sFilePath = @ScriptDir & "\test.csv" SplitCSVtoNewFile($sFilePath, 50) Func SplitCSVtoNewFile($sFilePath, $iLine) Local $aCSV_Lines = StringSplit(StringStripCR(FileRead($sFilePath)), @LF, $STR_CHRSPLIT) If @error Then Return SetError(1, 0, 0) Local $i = 1, $c = 1, $sLine, $hFile, $d = StringLen(UBound($aCSV_Lines)) Do $sLine &= $aCSV_Lines[$i] & @CRLF If Not Mod($i, $iLine) Then $hFile = FileOpen(StringTrimRight($sFilePath, 4) & StringFormat("%0" & $d & "i", $c) & ".csv", $FO_OVERWRITE) FileWrite($hFile, StringTrimRight($sLine, 2)) FileClose($hFile) $sLine = "" $c += 1 EndIf $i += 1 Until $i = $aCSV_Lines[0] If $sLine <> "" Then $hFile = FileOpen(StringTrimRight($sFilePath, 4) & StringFormat("%0" & $d & "i", $c) & ".csv", $FO_OVERWRITE) FileWrite($hFile, StringTrimRight($sLine, 2)) FileClose($hFile) EndIf EndFunc EXAMPLE OF CSV FILE : What i want is to split in to 2 files,the file of September and the second one the file of October.So the script should read every line and find all the lines that starts with the $variable of September,then copy those lines and create a file with that name(September) and "paste" them.Then the same again for the $variable of October. expandcollapse popup.CSV File September,1 September,2 September,3 September,4 September,5 September,6 September,7 September,8 September,9 September,10 September,11 September,12 September,13 September,14 September,15 September,16 September,17 September,18 September,19 September,20 September,21 September,22 September,23 September,24 September,25 September,26 September,27 September,28 September,29 September,30 October,1 October,2 October,3 October,4 October,5 October,6 October,7 October,8 October,9 October,10 October,11 October,12 October,13 October,14 October,15 October,16 October,17 October,18 October,19 October,20 October,21 October,22 October,23 October,24 October,25 October,26 October,27 October,28 October,29 October,30 EXAMPLE OF RESULT FILES expandcollapse popup1) File "September.csv" September,1 September,2 September,3 September,4 September,5 September,6 September,7 September,8 September,9 September,10 September,11 September,12 September,13 September,14 September,15 September,16 September,17 September,18 September,19 September,20 September,21 September,22 September,23 September,24 September,25 September,26 September,27 September,28 September,29 September,30 2) File "October.csv" October,1 October,2 October,3 October,4 October,5 October,6 October,7 October,8 October,9 October,10 October,11 October,12 October,13 October,14 October,15 October,16 October,17 October,18 October,19 October,20 October,21 October,22 October,23 October,24 October,25 October,26 October,27 October,28 October,29 October,30 Edited November 27, 2017 by Melba23 Added code tags Link to comment Share on other sites More sharing options...
mikell Posted November 27, 2017 Share Posted November 27, 2017 (edited) ? $sFilePath = @ScriptDir & "\test.csv" SplitCSVtoNewFile($sFilePath, "September") Func SplitCSVtoNewFile($sFilePath, $var) Local $content = FileRead($sFilePath) $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "") $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2) FileWrite($hFile, $new) FileClose($hFile) EndFunc This SRER deletes all lines which don't begin with the wanted starting characters Edited November 27, 2017 by mikell Malkey 1 Link to comment Share on other sites More sharing options...
PureNewb Posted November 27, 2017 Author Share Posted November 27, 2017 AWESOME! Thank u! Link to comment Share on other sites More sharing options...
Moderators Melba23 Posted November 27, 2017 Moderators Share Posted November 27, 2017 PureNewb, When you post code or large text dumps please use Code tags - see here how to do it. Then you get a scrolling box and syntax colouring as you can see above now I have added the tags. M23 Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind Open spoiler to see my UDFs: Spoiler ArrayMultiColSort ---- Sort arrays on multiple columnsChooseFileFolder ---- Single and multiple selections from specified path treeview listingDate_Time_Convert -- Easily convert date/time formats, including the language usedExtMsgBox --------- A highly customisable replacement for MsgBoxGUIExtender -------- Extend and retract multiple sections within a GUIGUIFrame ---------- Subdivide GUIs into many adjustable framesGUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView itemsGUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeViewMarquee ----------- Scrolling tickertape GUIsNoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxesNotify ------------- Small notifications on the edge of the displayScrollbars ----------Automatically sized scrollbars with a single commandStringSize ---------- Automatically size controls to fit textToast -------------- Small GUIs which pop out of the notification area Link to comment Share on other sites More sharing options...
PureNewb Posted November 27, 2017 Author Share Posted November 27, 2017 1 hour ago, Melba23 said: PureNewb, When you post code or large text dumps please use Code tags - see here how to do it. Then you get a scrolling box and syntax colouring as you can see above now I have added the tags. M23 ok 3 hours ago, mikell said: ? $sFilePath = @ScriptDir & "\test.csv" SplitCSVtoNewFile($sFilePath, "September") Func SplitCSVtoNewFile($sFilePath, $var) Local $content = FileRead($sFilePath) $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "") $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2) FileWrite($hFile, $new) FileClose($hFile) EndFunc This SRER deletes all lines which don't begin with the wanted starting characters also can i get the list of starting characters(September,October...etc) somehow? Link to comment Share on other sites More sharing options...
mikell Posted November 27, 2017 Share Posted November 27, 2017 Sure. Just extract all the first elements to an array and then use _ArrayUnique The example below will work if these elements don't include a comma (which is the delimiter) $content = FileRead($sFilePath) $a = StringRegExp($content, '(?m)^[^,]+', 3) $b = _ArrayUnique($a) _ArrayDisplay($b) 2 hours ago, PureNewb said: please use Code tags Melba, I'm sorry. I should have mentioned this Link to comment Share on other sites More sharing options...
czardas Posted November 28, 2017 Share Posted November 28, 2017 (edited) Perhaps this might be useful: _ArrayToSubItemCSV() It produces a 1D array of (2D) CSV results which can then be written to separate files.Edit: looking again at your request; you stated that you only want to specify the starting character, so this won't work because it splits on exact full matches within the specified column. I'm not sure how you had intended to deal with january, June and July: which all begin with the same letter. My function would be overkill in this case because, to solve your problem, you only need to split the csv without the need to search for matching (out of sequence) entries. @mikell's solution is probably best. Edited November 28, 2017 by czardas operator64 ArrayWorkshop Link to comment Share on other sites More sharing options...
PureNewb Posted November 28, 2017 Author Share Posted November 28, 2017 16 hours ago, mikell said: Sure. Just extract all the first elements to an array and then use _ArrayUnique The example below will work if these elements don't include a comma (which is the delimiter) $content = FileRead($sFilePath) $a = StringRegExp($content, '(?m)^[^,]+', 3) $b = _ArrayUnique($a) _ArrayDisplay($b) Melba, I'm sorry. I should have mentioned this oh thanks again,it seems everything easy for you 8 hours ago, czardas said: Perhaps this might be useful: _ArrayToSubItemCSV() It produces a 1D array of (2D) CSV results which can then be written to separate files.Edit: looking again at your request; you stated that you only want to specify the starting character, so this won't work because it splits on exact full matches within the specified column. I'm not sure how you had intended to deal with january, June and July: which all begin with the same letter. My function would be overkill in this case because, to solve your problem, you only need to split the csv without the need to search for matching (out of sequence) entries. @mikell's solution is probably best. thanks for replay but as you said @mikell solution matching exactly in my needs.about january concept i will put the whole word"january". czardas 1 Link to comment Share on other sites More sharing options...
PureNewb Posted December 29, 2017 Author Share Posted December 29, 2017 On 11/27/2017 at 6:21 PM, mikell said: ? $sFilePath = @ScriptDir & "\test.csv" SplitCSVtoNewFile($sFilePath, "September") Func SplitCSVtoNewFile($sFilePath, $var) Local $content = FileRead($sFilePath) $new = StringRegExpReplace($content, '(?m)^\Q' & $var & '\E.*\R?(*SKIP)(*F)|^.*\R?', "") $hFile = FileOpen(@ScriptDir & "\" & $var & ".csv", 2) FileWrite($hFile, $new) FileClose($hFile) EndFunc This SRER deletes all lines which don't begin with the wanted starting characters Hello,and what if the csv file is 5gb and giving an error ram "Error allocating memory."? Link to comment Share on other sites More sharing options...
Earthshine Posted December 29, 2017 Share Posted December 29, 2017 (edited) Get more ram. 😋 File editor’s face this problem too when files become too large. What you have to do is you have to kind of build it so it scrolls through the file chunks at a time not the whole file at once. Edited December 29, 2017 by Earthshine My resources are limited. You must ask the right questions Link to comment Share on other sites More sharing options...
Developers Jos Posted December 29, 2017 Developers Share Posted December 29, 2017 Just change the process to read a record at a time. This will take longer but will be independent of the filesize. Jos Earthshine 1 SciTE4AutoIt3 Full installer Download page - Beta files Read before posting How to post scriptsource Forum etiquette Forum Rules Live for the present, Dream of the future, Learn from the past. Link to comment Share on other sites More sharing options...
Earthshine Posted December 29, 2017 Share Posted December 29, 2017 I like my buffered approach better but that’s just me. I’m sure you could read a chunk of it into an array process it read the next chalk process it My resources are limited. You must ask the right questions Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now