Fr33b0w Posted March 5, 2022 (edited)

There must be a very simple solution for this "problem". I know how I would do it if there were just a few, but I need to do it for every instance, and there might be 1,000 of them.

Quote
QVzADyw", "author_thumbnail": "https://yt3.ggpht.com/ytc/AKedOLTt1rCz0emlXz_QUVNB7T1AH11QBO13oYbFZw=s176-c-k-c0x00ffffff-no-rj", "author_is_uploader": false, "parent": "root"}, {"id": "UgxjxCRsERmHwFHpnXN4AaABAg", "text": "Some Youtube comment as example", "timestamp": 1583452800, "time_text": "2 years ago", "like_count": 1, "is_favorited": false, "author": "Aramis Papadopulos", "author_id": "UCRkVkOOpOBYkdrvGLmAETLQ", "author_thumbnail": "https://yt3.ggpht.com/ytc/AKedOLSbooVjQtUXaSGBpjrxOWh3kVTfXLRpsvcmu-AP=s176-c-k-c0x00ffffff-no-rj", "author_is_uploader": false, "parent": "root"}, {"id": "UgxbPLIT8b8pFW3K3f54AaABAg", "text": "some other text example", "timestamp": 1583452800, "time_text": "2 years ago", "like_count": 0, "is_favorited": false, "author": "BL1TZ", "author_id": "UCXqbwq4gWUpk8yJC561Pqew", "author_thumbnail": "https://yt3.ggpht.com/ytc/AKedOLRmm233HPvOaL-ohfRufFpnpAaHEoaMxKcypVru=s176-c-k-c0x00ffffff-no-rj", "author_is_uploader": false, "parent": "root"}, {"id": "UgwRoRLyWM_TIiaTEZp4AaABAg", "text": "another text example and so on", "timestamp": 1583452800, "time_text": "2 years ago", "like_count": 6, "is_favorited": false,

I guess it is just a few lines of code. I am not so good with regular expressions, so for my better understanding a solution without them would be much appreciated. I have used AutoIt for a long time, but I am not an expert, and I am recovering from COVID and haven't been coding for some time. So, if any good soul could give me a hint: the comments I would like to extract are between "text": " and ", "timestamp":. Anyone? Thanks!

Edited March 8, 2022 by Fr33b0w
Trong Posted March 5, 2022 (marked as the solution)

I don't know how to use RegEx, but you can use _StringBetween():

```autoit
#include <String.au3>

Local $InputData = '"text": "Some Youtube comment as example", "timestamp":346230, "text": "SomeSDGs example", "timestamp": 15833460, "text": "Some YoutFGNSFGJnt as example", "timestamp": 45634572800, "'
; Remove the optional spaces after "," and ":" so the delimiters below match every entry
$InputData = StringReplace($InputData, ', "', ',"')
$InputData = StringReplace($InputData, '": ', '":')

; Everything between "text": and the next ," (the comment, still in quotes)
Local $textArray = _StringBetween($InputData, '"text":', ',"')
If IsArray($textArray) Then
    For $i = 0 To UBound($textArray) - 1
        ConsoleWrite($textArray[$i] & @CRLF)
    Next
EndIf

; Everything between "timestamp": and the next ,"
Local $timestampArray = _StringBetween($InputData, '"timestamp":', ',"')
If IsArray($timestampArray) Then
    For $i = 0 To UBound($timestampArray) - 1
        ConsoleWrite($timestampArray[$i] & @CRLF)
    Next
EndIf
```

Regards,
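A minimal sketch of a small variation on the answer above (not from the thread): with the delimiters used there, each returned comment still carries its surrounding quote marks. Moving the opening quote into the start delimiter and the closing quote into the end delimiter returns just the text. It assumes every comment is directly followed by the "timestamp" key, as in this sample.

```autoit
#include <String.au3>

; Shortened sample input from the post above
Local $sInput = '"text": "Some Youtube comment as example", "timestamp":346230, "text": "SomeSDGs example", "timestamp": 15833460, "'

; Normalise the spacing exactly as the original answer does
$sInput = StringReplace($sInput, ', "', ',"')
$sInput = StringReplace($sInput, '": ', '":')

; The start delimiter ends with the opening quote and the end delimiter starts with
; the closing quote, so only the comment text itself is returned
Local $aText = _StringBetween($sInput, '"text":"', '","timestamp"')
If @error Then
    ConsoleWrite("No comments found" & @CRLF)
Else
    For $i = 0 To UBound($aText) - 1
        ConsoleWrite($aText[$i] & @CRLF)
    Next
EndIf
```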
Subz Posted March 5, 2022

Or something like:

```autoit
#include <Array.au3>

Global $sText = '"text": "Some Youtube comment as example", "timestamp":346230, "text": "SomeSDGs example", "timestamp": 15833460, "text": "Some YoutFGNSFGJnt as example", "timestamp": 45634572800, "'
Global $aText = StringRegExp($sText, '(?<=\"text\": \").*?(?=\", \"timestamp\")', 3)
_ArrayDisplay($aText)
```
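The same pattern, split into its three parts with comments, as a sketch. Inside a single-quoted AutoIt string the double quotes need no escaping, and \" in the original pattern is just a literal " to the regex engine, so both forms match the same text. ConsoleWrite stands in for _ArrayDisplay only so the sketch runs without a GUI.

```autoit
#include <StringConstants.au3>

Local $sText = '"text": "Some Youtube comment as example", "timestamp":346230, "text": "SomeSDGs example", "timestamp": 15833460, "text": "Some YoutFGNSFGJnt as example", "timestamp": 45634572800, "'

; Lookbehind: the match must be preceded by  "text": "  (this part is not returned)
Local $sPattern = '(?<="text": ")'
; Lazy: take as few characters as possible
$sPattern &= '.*?'
; Lookahead: the match must be followed by  ", "timestamp"  (this part is not returned)
$sPattern &= '(?=", "timestamp")'

; Flag 3 ($STR_REGEXPARRAYGLOBALMATCH) returns every match in a 0-based array
Local $aText = StringRegExp($sText, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then
    For $i = 0 To UBound($aText) - 1
        ConsoleWrite($aText[$i] & @CRLF)
    Next
EndIf
```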
Fr33b0w Posted March 5, 2022 (edited)

Thank you very much, VIP. This solves my problem and does exactly what I wanted to achieve. It was not as simple as I thought it would be, so sorry for that. I am learning from your example. I wish you a good and healthy life.

Re-edit: Thanks, Subz! This regex I can understand and learn from. Guys, thanks a lot. You made my day.

Edited March 5, 2022 by Fr33b0w
Fr33b0w Posted March 18, 2022 (edited)

Sorry... I still have some problems with this. It won't process all the files. I tried renaming them and changing the code, but it doesn't work. It processes 223 of 327 files, and I don't know why. The script I am trying to use is:

```autoit
#include <String.au3>
#include <Array.au3>
#include <MsgBoxConstants.au3>

Local $search = FileFindFirstFile("*.info.json")
DirCreate(@ScriptDir & "\comments\")
Local $dir = @ScriptDir & "\comments\"
If $search = -1 Then
    MsgBox($MB_SYSTEMMODAL, "", "Error: No files/directories matched the search pattern.")
    Exit ; Exit instead of Return: Return is only allowed inside a Func
EndIf
While 1
    Local $file = FileFindNextFile($search)
    If @error Then ExitLoop
    Local $target = StringReplace($file, '.info.json', '.txt')
    Local $InputData = FileRead($file)
    $InputData = StringReplace($InputData, ', "', ',"')
    $InputData = StringReplace($InputData, '": ', '":')
    Local $textArray = _StringBetween($InputData, '"text":', ',"')
    If IsArray($textArray) Then
        For $i = 0 To UBound($textArray) - 1
            FileWriteLine($dir & $target, @CRLF & " * " & $textArray[$i] & @CRLF)
        Next
    EndIf
    Local $timestampArray = _StringBetween($InputData, '"timestamp":', ',"')
    If IsArray($timestampArray) Then
        For $i = 0 To UBound($timestampArray) - 1
            FileWriteLine($dir & $target, @CRLF & " * " & $textArray[$i] & @CRLF)
        Next
    EndIf
    FileClose($dir & $target)
WEnd
Exit
```

I added the files I am trying to scrape; they are in the same folder as the script. The files are in the attachment. Thanks.

test.zip

Edited March 18, 2022 by Fr33b0w: I didn't say how many files were processed out of how many targeted... Brain burnt by a non-working script...
Nine Posted March 18, 2022 (edited)

A few suggestions for your script:

1- Use _FileListToArray instead of FileFindFirstFile/FileFindNextFile. You can then use _ArrayDisplay to make sure you got all the files in the array.
2- Your second FileWriteLine should use $timestampArray instead of $textArray.
3- FileClose on a file name is useless (see the help file: it expects a handle).
4- You should add a ConsoleWrite warning when your _StringBetween does not work.
5- Adding traces to a script to understand what is going on is the best way to debug...

Edited March 18, 2022 by Nine

“They did not know it was impossible, so they did it” ― Mark Twain Spoiler Block all input without UAC Save/Retrieve Images to/from Text Monitor Management (VCP commands) Tool to search in text (au3) files Date Range Picker Virtual Desktop Manager Sudoku Game 2020 Overlapped Named Pipe IPC HotString 2.0 - Hot keys with string x64 Bitwise Operations Multi-keyboards HotKeySet Recursive Array Display Fast and simple WCD IPC Multiple Folders Selector Printer Manager GIF Animation (cached) Screen Scraping Multi-Threading Made Easy
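A minimal sketch of how these suggestions could fit together, assuming only the comment texts are needed (so the timestamp loop from suggestion 2 is left out) and that the .info.json files are UTF-8 encoded. It uses Subz's regular expression instead of _StringBetween, checks FileOpen and StringRegExp for errors, and prints a warning for every file that produced no output. _ExtractComments is a hypothetical helper name, not part of AutoIt.

```autoit
#include <File.au3>
#include <FileConstants.au3>
#include <StringConstants.au3>

; Hypothetical helper: extracts the comment texts from every *.info.json file in $sSrcDir
; and writes one .txt file per input file into $sOutDir. Returns the number of .txt files written.
Func _ExtractComments($sSrcDir, $sOutDir)
    ; Suggestion 1: list the files first, so the count can be checked (e.g. with _ArrayDisplay)
    Local $aFiles = _FileListToArray($sSrcDir, "*.info.json", $FLTA_FILES)
    If @error Then
        ConsoleWrite("No *.info.json files found in " & $sSrcDir & @CRLF)
        Return 0
    EndIf
    ConsoleWrite($aFiles[0] & " file(s) found" & @CRLF) ; element [0] holds the count

    DirCreate($sOutDir)
    Local $iWritten = 0, $hIn, $sData, $aText, $hOut
    For $i = 1 To $aFiles[0]
        ; Assumption: the JSON files are UTF-8 without BOM
        $hIn = FileOpen($sSrcDir & "\" & $aFiles[$i], $FO_READ + $FO_UTF8_NOBOM)
        If $hIn = -1 Then
            ConsoleWrite("WARNING: cannot open " & $aFiles[$i] & @CRLF)
            ContinueLoop
        EndIf
        $sData = FileRead($hIn)
        FileClose($hIn) ; Suggestion 3: FileClose takes a handle, not a file name

        $aText = StringRegExp($sData, '(?<="text": ").*?(?=", "timestamp")', $STR_REGEXPARRAYGLOBALMATCH)
        If @error Then
            ; Suggestions 4 and 5: report files where nothing was extracted
            ConsoleWrite("WARNING: no comments found in " & $aFiles[$i] & @CRLF)
            ContinueLoop
        EndIf

        $hOut = FileOpen($sOutDir & "\" & StringReplace($aFiles[$i], ".info.json", ".txt"), $FO_OVERWRITE + $FO_UTF8)
        For $j = 0 To UBound($aText) - 1
            FileWriteLine($hOut, " * " & $aText[$j] & @CRLF)
        Next
        FileClose($hOut)
        $iWritten += 1
    Next
    Return $iWritten
EndFunc

; Usage, matching the folder layout of the original script:
; _ExtractComments(@ScriptDir, @ScriptDir & "\comments")
```

Comparing the "file(s) found" count and the warnings against the number of files in the folder should show whether files are missed while listing, opening, or matching.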
Fr33b0w Posted March 18, 2022

Thanks. I decided to use the second example (the regex one), which I can see is better, but it is far beyond my level of knowledge. It works even better than the first one, but it is much harder for me to adapt. It feels like the script is playing with me...

The problem is that in this case I can't add @CRLF after every piece of text that is found, and I don't know how to do that. I tried using StringReplace to replace every @CRLF with two, so I would get a blank line after every piece of text found... But I am not good with arrays and RegEx... Got nothing.

I am still using FileFindFirstFile/FileFindNextFile instead of _FileListToArray as you suggested, but only because I would like to get this code working on ground where I am less uncomfortable; after that I could try it the other way. For someone this is a piece of cake, and for me it is the rest of that cake... How can I add @CRLF or @CR so that it works?

```autoit
#include <String.au3>
#include <Array.au3>
#include <File.au3>
#include <MsgBoxConstants.au3>

Local $search = FileFindFirstFile("*.info.json")
DirCreate(@ScriptDir & "\comments\")
Local $dir = @ScriptDir & "\comments\"
If $search = -1 Then
    MsgBox($MB_SYSTEMMODAL, "", "Error: No files/directories matched the search pattern.")
    Exit ; Exit instead of Return: Return is only allowed inside a Func
EndIf
While 1
    Local $file = FileFindNextFile($search)
    If @error Then ExitLoop
    Local $target = StringReplace($file, '.info.json', '.txt')
    Local $InputDataa = FileRead($file)
    ; Flag 3: 0-based array of all matches (element [0] is the first match, not a count)
    Global $InputDatab = StringRegExp($InputDataa, '(?<=\"text\": \").*?(?=\", \"timestamp\")', 3)
    ;_ArrayDisplay($InputDatab)
    ; The third parameter (1) is the start index, so element [0], the first comment, is not written
    _FileWriteFromArray($dir & $target, $InputDatab, 1)
WEnd
Exit
```
Nine Posted March 18, 2022

Replace your _FileWriteFromArray line with this one:

FileWriteLine($dir & $target, _ArrayToString($InputDatab, "|"))
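A possible explanation for lost lines with the previous _FileWriteFromArray line: StringRegExp with flag 3 returns a 0-based array with no count element, and the third parameter of _FileWriteFromArray is the start index, so passing 1 skips element [0], the first comment. _ArrayToString starts at element 0 by default. A small in-memory sketch of the same effect, using the start-index parameter of _ArrayToString (the sample string is made up):

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Made-up sample with two comments
Local $aComments = StringRegExp('"text": "first", "timestamp" "text": "second", "timestamp"', _
        '(?<="text": ").*?(?=", "timestamp")', $STR_REGEXPARRAYGLOBALMATCH)

Local $sAll = _ArrayToString($aComments, "|")        ; starts at index 0: first|second
Local $sSkipped = _ArrayToString($aComments, "|", 1) ; starts at index 1, like a base of 1: second
ConsoleWrite($sAll & @CRLF & $sSkipped & @CRLF)
```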
Fr33b0w Posted March 18, 2022 (edited)

Nine said: Replace your _FileWriteFromArray line by this one: FileWriteLine($dir & $target, _ArrayToString($InputDatab, "|"))

I had seen that default array delimiter in the help file but wasn't sure how to use it. Now the lines are joined with "|" instead of line breaks. Any tip for that? So, from:

Line 1
Line 2
Line 3

I am getting:

Line 1|Line 2|Line 3

Edited March 18, 2022 by Fr33b0w
Nine Posted March 18, 2022

Replace "|" with @CRLF.
Fr33b0w Posted March 18, 2022

FileWriteLine($dir & $target, _ArrayToString($InputDatab, @CRLF & @CRLF))

Just for the record, I used two @CRLFs so there is a blank line between the comments. Thanks a ton, I have solved the problem and did what I wanted to do! Happy
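For reference, a minimal sketch of the finished write step with a made-up sample: with @CRLF & @CRLF as the delimiter there is an empty line between comments, and FileWriteLine adds the final line break when the string is written. The @error check is an addition not in the posts above, so that a file without any matching comments is skipped cleanly.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

Local $sOut = ""
Local $sJson = '"text": "Line 1", "timestamp": 1, "text": "Line 2", "timestamp": 2'
Local $aComments = StringRegExp($sJson, '(?<="text": ").*?(?=", "timestamp")', $STR_REGEXPARRAYGLOBALMATCH)

; Join only if StringRegExp found something
If Not @error Then $sOut = _ArrayToString($aComments, @CRLF & @CRLF)
; $sOut now holds "Line 1", an empty line, then "Line 2"
ConsoleWrite($sOut & @CRLF)
```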
Fr33b0w Posted April 25 (edited)

Hi, sorry for bumping an old post, but I have a problem again because the site's data changed. Everything worked fine, but now there is a new field that stops the regex from working. Where "author_id" used to close the comment text, there is now sometimes "like_count" instead; "author_id" is still there, but later, after more data I don't need to extract. I tried to use the | delimiter in the RegEx, but I guess regex is not easy for me... Can someone give me a suggestion for a regex that says: get the text from here to (here or here)? I tried it like this:

Global $InputDatab = StringRegExp($InputDataa, '(?<=\"text\": \").*?(?=\", \"author_id\|like_count\")', 3)

...but it didn't work. The line I used before, which took the data from "text": up to "timestamp", was:

Global $InputDatab = StringRegExp($InputDataa, '(?<=\"text\": \").*?(?=\", \"timestamp\")', 3)

Here is an example of the text in the .info.json file:

Quote
"text": "8 hours later the Fire HD 8 is $109. 99. I wish I would have gotten to watch this earlier. 
\nThanks for all you do Matt even if I'm late to the party.", "like_count": 1, "author_id": "UCWFKQey1WtCgGyxHPMhPtGQ", "author": "@kaceycampbell5550", "author_thumbnail": "https://yt3.ggpht.com/ytc/AOPolaSLWOprKke3uCsTselIrClAYoEM8RqDNcgadJvxBg=s176-c-k-c0x00ffffff-no-rj", "parent": "root", "_time_text": "2 years ago", "timestamp": 1630454400, "author_url": "https://www.youtube.com/channel/UCWFKQey1WtCgGyxHPMhPtGQ", "author_is_uploader": false, "is_favorited": false}, {"id": "Ugx1HNwbzMS9V0pUrgN4AaABAg", "text": "Lmao that first product is definitely photoshopped 😂", "like_count": 1, "author_id": "UCzeJMeX2bFwqvs9IJGKorfQ", "author": "@mrhappygoluckyjock", "author_thumbnail": "https://yt3.ggpht.com/ytc/AOPolaTRITy2x4xoy7aYMgIpyvmdF-ixQlv9thvtg7To=s176-c-k-c0x00ffffff-no-rj", "parent": "root", "_time_text": "2 years ago", "timestamp": 1630454400, "author_url": "https://www.youtube.com/channel/UCzeJMeX2bFwqvs9IJGKorfQ", "author_is_uploader": false, "is_favorited": false}, {"id": "Ugx1HNwbzMS9V0pUrgN4AaABAg.9HYyh7phVcw9HbAfkT-Wfr", "text": "Really, how can you tell? 
Genuinely asking, it looks too good to me", "author_id": "UCzTLWlN4pDD1jLiJJLVrfDA", "author": "@kikikiki3216", "author_thumbnail": "https://yt3.ggpht.com/ytc/AOPolaQiA8_KkqCrK7o7WNNL5qLk3C-PrOy1S591OQ=s176-c-k-c0x00ffffff-no-rj", "parent": "Ugx1HNwbzMS9V0pUrgN4AaABAg", "_time_text": "2 years ago", "timestamp": 1630454400, "author_url": "https://www.youtube.com/channel/UCzTLWlN4pDD1jLiJJLVrfDA", "author_is_uploader": false, "is_favorited": false}, {"id": "UgzhitiBqqS5dUzDfIZ4AaABAg", "text": "That echo auto does not have good user reviews", "author_id": "UC8Krza6o2IbS9zTGjYgd4jA", "author": "@soupedkid13", "author_thumbnail": "https://yt3.ggpht.com/ytc/AOPolaQbtLnxh1qhqgYU8i3LsO_6qE8lCRmBbV_OJ6f-=s176-c-k-c0x00ffffff-no-rj", "parent": "root", "_time_text": "2 years ago", "timestamp": 1630454400, "author_url": "https://www.youtube.com/channel/UC8Krza6o2IbS9zTGjYgd4jA", "author_is_uploader": false, "is_favorited": false}, {"id": "UgyWmQAcvH3gCxWMz9x4AaABAg", "text": "Merry Christmas , thank you for your videos and energy", "author_id": "UCOcfr_BebW1QqXpTI-PNEaQ", "author": "@teresafinnerty207", "author_thumbnail": "https://yt3.ggpht.com/ytc/AOPolaQDSwM9-eRu5aBKVVC1bh4xx4A6LoH2Vaompo-j=s176-c-k-c0x00ffffff-no-rj", "parent": "root", "_time_text": "2 years ago", "timestamp": 1630454400, "author_url": "https://www.youtube.com/channel/UCOcfr_BebW1QqXpTI-PNEaQ", "author_is_uploader": false, "is_favorited": false}, {"id": "UgyWmQAcvH3gCxWMz9x4AaABAg.9HYtClhcvg19HYwMNLud3_", "text": "Thanks for being here Teresa!", "author_id": "UC5Qbo0AR3CwpmEq751BIy0g", "author": "@thedealguy", "author_thumbnail": "https://yt3.ggpht.com/PHbn_ZwKQ-3PPhTtF7k6Q5t-vGBnENCPZAQc9lNe-EGCeJJ8T5DgbNIvGSSmFNVUrOCV6l3q=s176-c-k-c0x00ffffff-no-rj", "parent": "UgyWmQAcvH3gCxWMz9x4AaABAg", "_time_text": "2 years ago", "timestamp": 1630454400, "author_url": "https://www.youtube.com/channel/UC5Qbo0AR3CwpmEq751BIy0g", "author_is_uploader": true, "is_favorited": false, "author_is_verified": true}, {"id": "U 
So now there are two possible closing delimiters for the text: ', "like_count":' and ', "author_id":'. How can I express that in the RegEx? I tried on my own with examples I found online, but it doesn't work... Again, many thanks in advance.

Sorry, I just tried a bit more and solved the problem. The correct line is:

Global $InputDatab = StringRegExp($InputDataa, '(?<=\"text\": \").*?(?=\", \"author_id|\", "like_count\")', 3)

Thanks, and sorry!

Edited April 25 by Fr33b0w: Had a problem I couldn't solve, but while waiting for an answer I had an idea and... solved it myself.
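Why the first attempt failed, and a slightly more compact form of the working line, as a sketch: in \"author_id\|like_count\" the backslash turns | into a literal pipe character, so the pattern looked for the text author_id|like_count. An unescaped | inside a group is an alternation, and grouping only the two key names with (?:...) keeps the shared ", " prefix in one place. The sample is shortened from the JSON quoted above.

```autoit
#include <StringConstants.au3>

Local $sJson = '{"id": "a", "text": "Lmao that first product is definitely photoshopped", "like_count": 1, "author_id": "UCz"}, ' & _
        '{"id": "b", "text": "That echo auto does not have good user reviews", "author_id": "UC8"}'

; The closing quote of the comment must be followed by either "like_count" or "author_id"
Local $aText = StringRegExp($sJson, '(?<="text": ").*?(?=", "(?:like_count|author_id)")', $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then
    For $i = 0 To UBound($aText) - 1
        ConsoleWrite($aText[$i] & @CRLF)
    Next
EndIf
```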
Nine Posted April 25 (edited)

Global $InputDatab = StringRegExp($InputDataa, '(?<="text": )(.+?)(?|, "like_count"|, "author_id")', 3)

Try this.

Edited April 25 by Nine: forgot to have a capturing group
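A note on the pattern above, with a sketch: with flag 3 and a capturing group, StringRegExp returns the contents of the group. Because the lookbehind stops after "text": (before the opening quote) and the alternation starts at the comma (after the closing quote), each result keeps its surrounding quote marks. (?|...) is a branch-reset group; since it contains no capture groups, it behaves like a plain (?:...) here. The second pattern, which matches the quotes outside the group, is an alternative in case the bare text is wanted; the sample string is shortened from the JSON quoted earlier.

```autoit
#include <StringConstants.au3>

Local $sJson = '"text": "Thanks for being here Teresa!", "author_id": "UC5", "text": "Really, how can you tell?", "like_count": 1'

; Nine's pattern: the captured group includes the quotes
Local $aQuoted = StringRegExp($sJson, '(?<="text": )(.+?)(?|, "like_count"|, "author_id")', $STR_REGEXPARRAYGLOBALMATCH)

; Quotes matched outside the group, so only the comment text is captured
Local $aClean = StringRegExp($sJson, '"text": "(.+?)", "(?:like_count|author_id)"', $STR_REGEXPARRAYGLOBALMATCH)

ConsoleWrite($aQuoted[0] & @CRLF) ; "Thanks for being here Teresa!" (with quotes)
ConsoleWrite($aClean[0] & @CRLF)  ; Thanks for being here Teresa!
```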
Fr33b0w Posted April 25

Hi Nine, and thanks for trying to help. This version of your solution leaves ", "like_count" and , "author_id" after every line. I am very bad at regex, so I don't know why, but I would like to see whether you can correct it, because your solution looks much clearer to me.
Nine Posted April 25

Already done, see my edit.
Fr33b0w Posted April 25

Sorry, I didn't refresh. Yes, it works great! Thank you for your help! Glad to see you again.
AspirinJunkie Posted April 25

The string appears to be JSON. Have you already tried one of the corresponding JSON UDFs? That should be easier to understand and more robust than using RegEx.
Fr33b0w Posted April 25

Oh, thanks for that. I am looking forward to checking out those UDFs. I haven't been around much lately. I have to start learning RegEx properly, but I also like what you said about JSON UDFs...