PnD Posted May 1, 2021

Dear all,

Currently, I have a text file like this:

    Description
    REMOTE
    Order No : 1028263
    Date Range: 04/19/2021
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    Appointment

What I am trying to do is get only "1028263", and I can do that easily with the script below:

    #include <MsgBoxConstants.au3>
    #include <File.au3>
    #include <Array.au3>

    Global $file = @ScriptDir & "\test.txt"
    Global $sList = StringReplace(FileRead($file), @CRLF, ",")
    Global $ITIorder = StringRegExp($sList, "ITI Order No(.*?),", 3)
    $RealITIOrder = StringRight($ITIorder[0], 7)
    MsgBox(0, 0, $RealITIOrder)

However, sometimes our text has a different layout, such as:

    Description
    REMOTE
    Order No :
    1028263
    Date Range: 04/19/2021
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    Appointment

in which "1028263" has moved to the next line. I am stuck on how to add a new condition to the script above to handle this new layout. I would very much appreciate your feedback.

Thank you all.

Nine Posted May 1, 2021

Try:

    Global $ITIorder = StringRegExp(FileRead($sFile), "(?s)Order No\h*:[^\d]*(\d*)", 1)[0]
    ConsoleWrite($ITIorder & @CRLF)

jguinch Posted May 1, 2021

or just

    Order No\D*(\d+)

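Editor's sketch: the pattern above can be checked against both layouts from the first post without touching a file. The helper name `_GetOrderNo` and the two in-memory sample strings are assumptions made for illustration; only the pattern itself comes from the thread.

```autoit
#include <MsgBoxConstants.au3>

; Hypothetical helper (not from the thread): returns the first run of digits
; that follows "Order No", or "" with @error set when nothing matches.
Func _GetOrderNo($sText)
    ; \D* skips any non-digits, including the colon, spaces and line breaks.
    Local $aMatch = StringRegExp($sText, "Order No\D*(\d+)", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

; Layout 1: the number is on the same line as the label.
Local $sSameLine = "Description" & @CRLF & "REMOTE" & @CRLF & _
        "Order No : 1028263" & @CRLF & "Date Range: 04/19/2021"

; Layout 2: the number has been pushed to the next line.
Local $sNextLine = "Description" & @CRLF & "REMOTE" & @CRLF & _
        "Order No :" & @CRLF & "1028263" & @CRLF & "Date Range: 04/19/2021"

MsgBox($MB_OK, "Order No", _GetOrderNo($sSameLine) & @CRLF & _GetOrderNo($sNextLine))
```

With these sample strings, both calls should show the same order number. Checking `@error` before indexing avoids a subscript error on input that contains no match, which the one-line `StringRegExp(...)[0]` form does not guard against.
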
Gianni Posted May 1, 2021

Surely the use of regexp is more elegant; however, some time ago I created two functions, based on the string functions, that find the word that follows a known word, or the word that precedes it. If you are interested, you can find them here: https://www.autoitscript.com/forum/topic/155726-searching-for-a-string-after-the-string/?do=findComment&comment=1174545

JockoDundee Posted May 1, 2021

    _StringBetween(StringStripWS(FileRead($sFile), 8), "OrderNo:", "DateRange:")

PnD (Author) Posted May 2, 2021

8 hours ago, Nine said:
> Try:
>
>     Global $ITIorder = StringRegExp(FileRead($sFile), "(?s)Order No\h*:[^\d]*(\d*)", 1)[0]
>     ConsoleWrite($ITIorder & @CRLF)

Thank you so much, Nine! It works beautifully with just a single line of code.

PnD (Author) Posted May 2, 2021

8 hours ago, jguinch said:
> or just
>
>     Order No\D*(\d+)

Thank you, jguinch. I tried your suggestion and it works perfectly as:

    $ITIorder = StringRegExp(FileRead($File), "(?s)Order No\D*(\d+)", 1)[0]

PnD (Author) Posted May 2, 2021

7 hours ago, JockoDundee said:
>     _StringBetween(StringStripWS(FileRead($sFile), 8), "OrderNo:", "DateRange:")

Thank you, JockoDundee! I tried your suggestion, but it gave me the error message "_StringBetween(): undefined function".

PnD (Author) Posted May 2, 2021

8 hours ago, Chimp said:
> Surely the use of regexp is more elegant; however, some time ago I created two functions, based on the string functions, that find the word that follows a known word, or the word that precedes it. If you are interested, you can find them here: https://www.autoitscript.com/forum/topic/155726-searching-for-a-string-after-the-string/?do=findComment&comment=1174545

Thank you, Chimp! I will definitely try out your suggestion, as it may be helpful for other scenarios where regexp is not needed.

JockoDundee Posted May 2, 2021

26 minutes ago, PnD said:
> Thank you, JockoDundee! I tried your suggestion, but it gave me the error message "_StringBetween(): undefined function".

    #include <String.au3>

    $ITIOrder = _StringBetween(StringStripWS(FileRead($sFile), 8), "OrderNo:", "DateRange:")

PnD - please try it now, I left out the include...

Musashi Posted May 2, 2021

@PnD: _StringBetween returns an array (not a string), so the result should be presented e.g. as follows:

    #include <String.au3>

    $sFile = @ScriptDir & "\test.txt"
    $aArr = _StringBetween(StringStripWS(FileRead($sFile), 8), "OrderNo:", "DateRange:")
    If Not @error Then
        $ITIOrder = $aArr[0]
        MsgBox(0, "Order No. :", $ITIOrder)
    Else
        MsgBox(0, "Order No. :", "no match found")
    EndIf

JockoDundee Posted May 2, 2021

2 minutes ago, Musashi said:
> @PnD: _StringBetween returns an array (not a string), so the result should be presented e.g. as follows:

I was just mirroring @PnD's own code, which returns an array:

    Global $ITIorder = StringRegExp($sList, "ITI Order No(.*?),", 3)

@Nine actually started the trend of returning the first match.

PnD (Author) Posted May 2, 2021

18 minutes ago, Musashi said:
> @PnD: _StringBetween returns an array (not a string), so the result should be presented e.g. as follows:
>
>     #include <String.au3>
>
>     $sFile = @ScriptDir & "\test.txt"
>     $aArr = _StringBetween(StringStripWS(FileRead($sFile), 8), "OrderNo:", "DateRange:")
>     If Not @error Then
>         $ITIOrder = $aArr[0]
>         MsgBox(0, "Order No. :", $ITIOrder)
>     Else
>         MsgBox(0, "Order No. :", "no match found")
>     EndIf

Thank you both, Musashi and JockoDundee, for following up on this thread. I tried your code and it worked perfectly as well. The only disadvantage of this method is that we have to rely on both OrderNo and DateRange for the script to work. If DateRange is changed to !!!!! or something else, it will not work. In my personal opinion, the solutions from Nine and jguinch work best regardless of the scenario.

Deye Posted May 2, 2021

Get the first explicit row with numbers, without having to mention any preceding row:

    $ION = StringRegExpReplace($s, "[^\w\d].*|\D{0,}", "")

PnD (Author) Posted May 3, 2021

13 hours ago, Deye said:
> Get the first explicit row with numbers, without having to mention any preceding row:
>
>     $ION = StringRegExpReplace($s, "[^\w\d].*|\D{0,}", "")

Hi Deye,

Thank you for your solution. I tried it and it worked great! However, it only works for this particular scenario, in which none of the lines before "Order No : 1028263" contain a number. If, for example, a line reads "Remote 15", then 15 will be captured instead of 1028263. I still think the solutions from Nine and jguinch are perfect for all scenarios.

Deye Posted May 3, 2021

One more thing worth knowing: if the sequence of digits you are looking for is at least 6 digits long, and you know for a fact that this sequence is always longer than any other digit sequence elsewhere in the data, then it could also be done this way:

    Local $s = 'Description' _
            & @LF & 'REMOTE: 12' _
            & @LF & 'Order No : 1028263' _
            & @LF & 'Date Range: 04/19/2021'

    Local $a = StringRegExp($s, "\d{6,}", 3)
    MsgBox(0, "", ($a ? "None Found" : $a[0]))

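Editor's sketch of a slightly more defensive variant of the same idea: rather than relying on how an array or a failure value evaluates inside the ternary, the result is tested explicitly with `IsArray`. The sample string is the one from the post above; the use of the named constant from `StringConstants.au3` in place of the literal `3` is a stylistic choice, not something from the thread.

```autoit
#include <StringConstants.au3>

Local $s = 'Description' _
        & @LF & 'REMOTE: 12' _
        & @LF & 'Order No : 1028263' _
        & @LF & 'Date Range: 04/19/2021'

; Collect every run of six or more digits (global match, same as flag 3).
Local $a = StringRegExp($s, "\d{6,}", $STR_REGEXPARRAYGLOBALMATCH)

; StringRegExp only hands back an array when at least one match was found,
; so IsArray is a direct test for success.
MsgBox(0, "", IsArray($a) ? $a[0] : "None Found")
```

The short "12" after REMOTE and the date fragments never reach six consecutive digits, so they are skipped, which is the property this approach depends on.
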
PnD (Author) Posted May 4, 2021

13 hours ago, Deye said:
> One more thing worth knowing: if the sequence of digits you are looking for is at least 6 digits long, and you know for a fact that this sequence is always longer than any other digit sequence elsewhere in the data, then it could also be done this way:
>
>     Local $s = 'Description' _
>             & @LF & 'REMOTE: 12' _
>             & @LF & 'Order No : 1028263' _
>             & @LF & 'Date Range: 04/19/2021'
>
>     Local $a = StringRegExp($s, "\d{6,}", 3)
>     MsgBox(0, "", ($a ? "None Found" : $a[0]))

Thank you, Deye, for this quick solution! This also works great, and the \d{6,} is fantastic!

JockoDundee Posted May 4, 2021

@PnD, one thing that I was going to point out earlier when you commented about my code:

On 5/1/2021 at 11:54 PM, PnD said:
> The only disadvantage of this method is that we have to rely on both OrderNo and DateRange

was that, IMHO, this dependence may actually be an advantage, depending on factors that only you know. However, looking at the data set you provided:

    Description
    REMOTE
    Order No :
    1028263
    Date Range: 04/19/2021
    !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
    Appointment

it appears to be some sort of order/invoice/service memo, with possibly multiple free-text fields, e.g. "Description". That allows for the possibility of a part #, a P.O. number, a phone number without punctuation, or a previous order # cut and pasted from another record, complete with the words "Order No:". So relying on just digits, or even on the single token "Order No:", might lead to misinterpretation. Moreover, this entered text could be hard to predict if free entry is allowed.

On the other hand, if it can be confirmed that DateRange does indeed follow OrderNo, this would not change, barring a modification of the program or template that creates it.

IMO, if this is not a one-off report but something that runs periodically, you may want to use even more filters and sanitizers.

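Editor's sketch of one way to read this advice; nothing below was posted in the thread. It keeps the regular expression but requires the "Order No" label and colon before the number, the "Date Range" label right after it, and a plausible digit count. The helper name `_GetOrderNoStrict`, the six-or-more digit requirement, and the noisy sample line are all assumptions for illustration.

```autoit
; Hypothetical stricter extractor: the number must sit between "Order No :"
; and "Date Range", separated only by whitespace (spaces or line breaks).
Func _GetOrderNoStrict($sText)
    Local $aMatch = StringRegExp($sText, "Order No\s*:\s*(\d{6,})\s*Date Range", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

; The real field, with the number on its own line.
Local $sGood = "REMOTE" & @CRLF & "Order No :" & @CRLF & "1028263" & @CRLF & _
        "Date Range: 04/19/2021"

; An invented free-text line that mentions "Order No" but has no colon and no
; following "Date Range", so the stricter pattern should pass over it.
Local $sNoisy = "Description: see Order No 15 from last week" & @CRLF & $sGood

ConsoleWrite(_GetOrderNoStrict($sNoisy) & @CRLF)
```

The trade-off is the one PnD raised earlier: this version fails, returning an empty string with `@error` set, if the OCR output ever drops or mangles the "Date Range" label. Whether that is a safeguard or a nuisance depends on how much accuracy matters for the job.
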
PnD (Author) Posted May 5, 2021

On 5/3/2021 at 8:24 PM, JockoDundee said:
> @PnD, one thing that I was going to point out earlier when you commented about my code:
>
> was that, IMHO, this dependence may actually be an advantage, depending on factors that only you know. However, looking at the data set you provided:
>
>     Description
>     REMOTE
>     Order No :
>     1028263
>     Date Range: 04/19/2021
>     !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>     Appointment
>
> it appears to be some sort of order/invoice/service memo, with possibly multiple free-text fields, e.g. "Description". That allows for the possibility of a part #, a P.O. number, a phone number without punctuation, or a previous order # cut and pasted from another record, complete with the words "Order No:". So relying on just digits, or even on the single token "Order No:", might lead to misinterpretation. Moreover, this entered text could be hard to predict if free entry is allowed.
>
> On the other hand, if it can be confirmed that DateRange does indeed follow OrderNo, this would not change, barring a modification of the program or template that creates it.
>
> IMO, if this is not a one-off report but something that runs periodically, you may want to use even more filters and sanitizers.

Hi @JockoDundee, you are absolutely correct, and you are a very good observer with very good logical thinking. (Wondering how you ended up here instead of being a detective 😄.)

The data that I provided is just a small sample of the real data, which I cannot post here publicly because of its sensitivity. It is actually from an Ntag that has all kinds of information about a product. I extracted the text from the scanned document, in PDF format, using OCR software, and that is why you can see !!!!!!!!!!!! and many more weird characters, which I tried not to include in my text sample.

One thing is certain: "Order No:" is fixed, and the 6-digit number can be either on the same line or on the next line, depending on how the OCR converts the image to text. The rest of the data changes dynamically. Based on my real data, and after a lot of testing of the solutions that all you geniuses provided, StringRegExp is still the best solution for getting the result. However, that does not mean your stringbetween solution is not great. I actually used it for another project of mine and it works beautifully as well.

Programming is not my expertise, and I only started learning how to write code in the last couple of weeks. AutoIt accidentally opened the gateway for me to explore, and I am learning a lot through your generosity in helping me and others in this forum. Once again, thank you all very much.

Another day and another learning. Life is beautiful.

JockoDundee Posted May 5, 2021

15 minutes ago, PnD said:
> Based on my real data, and after a lot of testing of the solutions that all you geniuses provided, StringRegExp is still the best solution for getting the result. However, that does not mean your stringbetween solution is not great...

Just to be clear, I'm not necessarily advocating for stringbetween; I only used it to mix it up. Had I been a first responder, I may have reg'ed myself.

Rather, I'm speaking about the business logic: IF accuracy is paramount, then maybe you can find another filter, for instance a line# range, etc. But since you're ok with it, then happy data mining!