Alex117 Posted November 13, 2008

Hi everyone,

I am trying to write a script that downloads information from a web site and inserts it directly into SQL. In my scripts I download an HTML file (InetGet) to analyze it locally, and I use FileReadLine to read a specific line that I later pass to StringSplit. I have used this method for a long time :-)

This time I ran into a problem because the line is very, very long (FYI, the analyzed file is attached to this topic). The line number is 570. I searched the forum but never found a similar topic.

The FileReadLine function is not able to read the entire line; it is cut off before the end. For example, I use this code:

    $base_illustrateur = FileOpen(@ScriptDir & "\illustrateurs.html", 0)
    $line = FileReadLine($base_illustrateur, 571)
    $split = StringSplit($line, "</table>", 1)
    $split = StringSplit($split[1], "</tr>", 1)
    FileWriteLine("debug_line.txt", $line)

If I compare the first line of debug_line.txt with the 571st line of illustrateurs.html, they differ. The size of the debug file is exactly 64K every time!

Is this a limit of the FileReadLine function?

Thank you very much. Have a nice day.
Regards,
Alex117

PS: Sorry for my poor English, I'm French.

Attachment: illustrateurs.zip
PsaltyDS Posted November 13, 2008

AutoIt's theoretical limit for a string is 2GB. Practical limits in most real machines are more like 128MB, still much more than 64K. I guess there may be a null character in it though. See what you get from running it this way:

    $base_illustrateur = FileOpen(@ScriptDir & "\illustrateurs.html", 0)
    $line = FileReadLine($base_illustrateur, 571)
    ; Compare the character count with the byte count of the line that was read
    ConsoleWrite("Debug: $line length = " & StringLen($line) & @LF)
    ConsoleWrite("Debug: Binary $line length = " & BinaryLen(Binary($line)) & @LF)
    $split = StringSplit($line, "</table>", 1)
    $split = StringSplit($split[1], "</tr>", 1)
    FileWriteLine("debug_line.txt", $line)

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law
Alex117 Posted November 14, 2008 (Author)

Hi,

Thank you for your response.

The line is generated by a while loop, so it can't contain a NULL character. I read a line that contains an HTML table and split it on <tr>. Example of the lines:

    <tr style="background-color: transparent;" onmouseover="javascript:this.style.backgroundColor='#52A1CA';" onmouseout="javascript:this.style.backgroundColor='transparent'"><td><a href="gathering-cartes-illustrateur-1-aaron-boyd.html">Aaron Boyd</a></td></tr>
    <tr style="background-color: transparent;" onmouseover="javascript:this.style.backgroundColor='#52A1CA';" onmouseout="javascript:this.style.backgroundColor='transparent'"><td><a href="gathering-cartes-illustrateur-2-adam-rex.html">Adam Rex</a></td></tr>
    <tr style="background-color: transparent;" onmouseover="javascript:this.style.backgroundColor='#52A1CA';" onmouseout="javascript:this.style.backgroundColor='transparent'"><td><a href="gathering-cartes-illustrateur-3-adrian-smith.html">Adrian Smith</a></td></tr>
    etc...

I tried your suggested code, and the console printed this:

    Debug: $line length = 61002
    Debug: Binary $line length = 61002

If I add the two debug lines to my original script (for the same file), I get a different result:

    Debug: $line length = 65534
    Debug: Binary $line length = 65534

In the two cases, FileWriteLine("debug_line.txt", $line) does not give the same result.

How can the same code give a different result?

Thank you. Have a nice day.
Alex
PsaltyDS Posted November 14, 2008

It's easy for the same code to produce different results under different run-time circumstances. Only if the files read are exactly the same, so that $line is exactly the same, should you get the same result. Can you post illustrateurs.html or another file like it that produces the same symptoms for you? Without that, I don't see how we can reproduce your conditions and symptoms.
The short answer is: There is no 64KB string limit on the functions you've used in this topic, so that is not the problem.

Edit: I'm just all kinds of wrong here... illustrateurs.html is posted in OP, and the FileReadLine() function is showing a 64K limit that I don't think is intended.

(Edited November 14, 2008 by PsaltyDS)
Alex117 Posted November 14, 2008 (Author)

Hi, the file is already uploaded in my first post. I am uploading my script file too.

Attachment: html2mysql.au3
Joon Posted November 14, 2008

It appears that FileReadLine has a limit at 64K. Here is the test I did. I get 65534 on my Vista machine.

    ; Build a 70000-character line, write it, then read it back
    $line = ""
    For $i = 1 To 70000
        $line &= "A"
    Next
    FileWriteLine("test.txt", $line)
    $line = FileReadLine("test.txt", 1)
    MsgBox(0, '', StringLen($line))

(Edited November 14, 2008 by Joon)
GEOSoft Posted November 14, 2008

There may be a far easier way to do this. Exactly what is it that you are trying to get from the table? Do you need the link and the link text of just one of them?

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.
*** The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"
Joon Posted November 14, 2008

Further testing shows that FileReadLine splits the line at 65534 characters. Output from the test:

    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 65534
    Line size: 44660

The test script:

    FileDelete("test.txt")

    ; Build one 700000-character line and write it to the file
    $line = ""
    For $i = 1 To 700000
        $line &= "A"
    Next
    FileWriteLine("test.txt", $line)

    ; Read the file back line by line until end of file
    $file = FileOpen("test.txt", 0)
    While 1
        $line = FileReadLine($file)
        If @error = -1 Then ExitLoop
        ConsoleWrite("Line size: " & StringLen($line) & @LF)
    WEnd
    FileClose($file)

(Edited November 14, 2008 by Joon)
PsaltyDS Posted November 14, 2008

Well... rats! Confirmed on XP Pro with 3.2.12.1 and 3.2.13.9 Beta (will try .10 in just a bit):

    #include <File.au3>

    Global $sFile = "test.txt", $hfile
    Global $sLine = "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()"

    FileDelete($sFile)

    ; Double the string until it is longer than 64K
    Do
        $sLine &= $sLine
    Until StringLen($sLine) > 2^16
    ConsoleWrite("$sLine length = " & StringLen($sLine) & @LF)

    ; Write the same long line ten times
    For $n = 1 To 10
        FileWriteLine($sFile, $sLine)
    Next
    ConsoleWrite("Line count = " & _FileCountLines($sFile) & @LF)

    ; Test with FileReadLine()
    ConsoleWrite(@LF & "Test with FileReadLine() -----------------------" & @LF)
    $n = 1
    While 1
        $sLine = FileReadLine($sFile, $n)
        If @error Then ExitLoop
        ConsoleWrite("Line " & $n & " length = " & StringLen($sLine) & @LF)
        $n += 1
    WEnd

    ; Test with FileRead()
    ConsoleWrite(@LF & "Test with FileRead() -----------------------" & @LF)
    $sLine = FileRead($sFile)
    $avLine = StringSplit($sLine, @CRLF, 1)
    For $n = 1 To 10
        ConsoleWrite("Line " & $n & " length = " & StringLen($avLine[$n]) & @LF)
    Next

Edit: Confirmed with 3.2.13.10 Beta.
Output:

    >Running:(3.2.13.10):C:\Program Files\AutoIt3\beta\autoit3.exe "C:\temp\Test.au3"
    $sLine length = 73728
    Line count = 10

    Test with FileReadLine() -----------------------
    Line 1 length = 65534
    Line 2 length = 8194
    Line 3 length = 65534
    Line 4 length = 8194
    Line 5 length = 65534
    Line 6 length = 8194
    Line 7 length = 65534
    Line 8 length = 8194
    Line 9 length = 65534
    Line 10 length = 8194
    Line 11 length = 65534
    Line 12 length = 8194
    Line 13 length = 65534
    Line 14 length = 8194
    Line 15 length = 65534
    Line 16 length = 8194
    Line 17 length = 65534
    Line 18 length = 8194
    Line 19 length = 65534
    Line 20 length = 8194

    Test with FileRead() -----------------------
    Line 1 length = 73728
    Line 2 length = 73728
    Line 3 length = 73728
    Line 4 length = 73728
    Line 5 length = 73728
    Line 6 length = 73728
    Line 7 length = 73728
    Line 8 length = 73728
    Line 9 length = 73728
    Line 10 length = 73728
    +>15:26:29 AutoIT3.exe ended.rc:0

(Edited November 14, 2008 by PsaltyDS)
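The chunk sizes reported in this thread are consistent with FileReadLine() returning at most 65534 characters per call and handing back the rest of the physical line on the following call. A minimal sketch of that arithmetic, assuming the 65534-character cap observed here; `_ChunkSizes` is a hypothetical helper written for this note, not an AutoIt built-in:

```autoit
; Sketch only: predicts the pieces FileReadLine() would return for one long
; line, ASSUMING the 65534-character cap observed in this thread.
; _ChunkSizes() is a hypothetical helper, not part of AutoIt.
Func _ChunkSizes($iLineLen, $iLimit = 65534)
    Local $sOut = ""
    ; Peel off full-size chunks until the remainder fits under the cap
    While $iLineLen > $iLimit
        $sOut &= $iLimit & " "
        $iLineLen -= $iLimit
    WEnd
    Return $sOut & $iLineLen
EndFunc

ConsoleWrite(_ChunkSizes(73728) & @LF)  ; 65534 8194
ConsoleWrite(_ChunkSizes(700000) & @LF) ; ten chunks of 65534, then 44660
```

This matches both tests: 73728 = 65534 + 8194 for each of the ten 73728-character lines, and 700000 = 10 x 65534 + 44660 for the single 700000-character line.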
PsaltyDS Posted November 14, 2008

It was pointed out by herewasplato that this limit is old in AutoIt: FileReadLine Limit 65534 characters?

Since that limit may have been imposed by Win9x compatibility back then, it might be worth a feature request to change it now. It is at the least a likely documentation update for the FileReadLine() function in the help file.

There are workarounds:

1. Use FileRead() and StringSplit(); this is demonstrated in my code above.
2. Use _FileReadToArray(), demonstrated in the demo below.
3. More options if you want to use WinAPI/FileSystemObject functions.

Demo using _FileReadToArray():

    #include <File.au3>

    Global $sFile = "test.txt", $avFile
    Global $sLine = "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()"

    FileDelete($sFile)

    ; Double the string until it is longer than 64K
    Do
        $sLine &= $sLine
    Until StringLen($sLine) > 2^16
    ConsoleWrite("$sLine length = " & StringLen($sLine) & @LF)

    ; Write the same long line ten times
    For $n = 1 To 10
        FileWriteLine($sFile, $sLine)
    Next
    ConsoleWrite("Line count = " & _FileCountLines($sFile) & @LF)

    ; Test with _FileReadToArray(); $avFile[0] holds the number of lines read
    ConsoleWrite(@LF & "Test with FileReadToArray() -----------------------" & @LF)
    _FileReadToArray($sFile, $avFile)
    For $n = 1 To $avFile[0]
        ConsoleWrite("Line " & $n & " length = " & StringLen($avFile[$n]) & @LF)
    Next

(Edited November 14, 2008 by PsaltyDS)
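For the script in the first post, the FileRead() plus StringSplit() workaround could look roughly like this. It is a sketch, not a tested fix: the file name and the line number 571 are taken from the original post, while the line-ending handling (stripping carriage returns, then splitting on line feeds) is an assumption about how the downloaded file is terminated.

```autoit
; Sketch only: read the whole file with FileRead() and split it into lines,
; instead of calling FileReadLine(). File name and line number come from the
; first post; the line-ending handling is an assumption about the file.
Local $sPath = @ScriptDir & "\illustrateurs.html"
Local $hFile = FileOpen($sPath, 0) ; 0 = read mode
If $hFile = -1 Then
    ConsoleWrite("Unable to open " & $sPath & @LF)
    Exit 1
EndIf
Local $sText = FileRead($hFile)
FileClose($hFile)

; Drop carriage returns so CRLF and LF files split the same way.
Local $aLines = StringSplit(StringStripCR($sText), @LF)

; $aLines[0] holds the number of lines found.
If $aLines[0] >= 571 Then
    Local $sLine = $aLines[571]
    ConsoleWrite("Line 571 length = " & StringLen($sLine) & @LF)
Else
    ConsoleWrite("File has only " & $aLines[0] & " lines" & @LF)
EndIf
```

From there, `$sLine` can be passed to the same StringSplit() calls on "</table>" and "</tr>" that the original script uses.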
PsaltyDS Posted November 15, 2008

Created ticket #681 to change the documentation for FileReadLine() to explain the current 64K limit.

Oops, too slow, as Joon had already reported the documentation change in ticket #679.

Also created Feature Request #682 to remove that limit. The DEVs are aware of the issue.

(Edited November 15, 2008 by PsaltyDS)
Alex117 Posted November 16, 2008 (Author)

Hi everyone,

Thank you for your help. I will wait for the 3.2.13.11 update.

Have a nice day,
Alex117
www.alexgiraud.net