Posted

Hi all, I've been playing with my own home server, and one of the cool things I've been doing is running my own LLM, for privacy and mainly just to screw around. The only programming language I really 'know' is AutoIt, and I would love to use my LLM to help me with it. To get it to really understand the language, I would need to feed it the forums and help files. I'm sure I could upload the help files if that was OK, but I'm not comfortable scraping the forums, especially without permission. I imagine that scraping every page uses a decent amount of resources. Is it possible instead to get something like monthly 'scrapes' so I don't have to do my own? Any ideas? Admins? Better options? Also, does anyone know how to turn all the help files into PDFs or something similar for easier LLM digestion?

 

Posted (edited)
1 hour ago, BatMan22 said:

I would need to scrape the forums...Any ideas?

 

A while back I created a script that does something similar.  My script periodically checks the forum activity looking for new topics whose titles or text contain one or more specified keywords.  My keywords are related to my UDFs and areas of interest.  For instance, if a new topic's title contains a keyword like CryptoNG, jq, encryption, hash, json, HTTPAPI, etc., the script will send me an email alert. 

Instead of scraping the whole forum, I just use the forum's RSS feeds. The RSS activity feeds contain something like the last 25 posts, and each entry includes the post's date, time, title, and text in XML format. Since a feed only has the last 25 or so entries, it isn't a lot of data to parse, and fetching it uses far less bandwidth than a typical user who hangs out monitoring, reading, or searching the forums. Depending on the tools you use for parsing and processing the information, going through the feeds can be quite fast. I convert the XML feed to JSON and process that JSON using jq. Of course, you can use whatever tools and processes you're most comfortable with.

For an example, here's the URL for the ALL ACTIVITY feed:
https://www.autoitscript.com/forum/discover/all.xml
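
To give a rough idea of the keyword check described above, here is a minimal Python sketch. It is not my actual script (that one converts the XML to JSON and uses jq); it just parses the XML directly with the standard library. It assumes the feed is plain RSS 2.0 with `item` elements holding `title`, `link`, `pubDate`, and `description`, which I have not verified against the live feed. The function names and the User-Agent string are made up for illustration, and the keyword list is just the examples mentioned earlier.

```python
import urllib.request
import xml.etree.ElementTree as ET

# URL taken from the post above; everything else here is illustrative.
FEED_URL = "https://www.autoitscript.com/forum/discover/all.xml"
KEYWORDS = ["CryptoNG", "jq", "encryption", "hash", "json", "HTTPAPI"]


def fetch_feed(url=FEED_URL, timeout=30):
    """Download the feed once and return the raw XML bytes."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "rss-keyword-watch/0.1"}  # hypothetical name
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def parse_items(xml_data):
    """Turn RSS <item> elements into dicts (assumes RSS 2.0 element names)."""
    root = ET.fromstring(xml_data)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "date": item.findtext("pubDate", default=""),
            "text": item.findtext("description", default=""),
        })
    return items


def match_keywords(items, keywords=KEYWORDS):
    """Return (item, matched_keywords) pairs using case-insensitive substring search."""
    hits = []
    for it in items:
        haystack = (it["title"] + " " + it["text"]).lower()
        found = [k for k in keywords if k.lower() in haystack]
        if found:
            hits.append((it, found))
    return hits


def main():
    """Fetch the feed once and print any entries that mention a keyword."""
    for it, found in match_keywords(parse_items(fetch_feed())):
        print(it["date"], it["title"], it["link"], found)
```

Calling `main()` on a schedule (say, every 15 or 30 minutes) keeps it to one small request per run. Note that plain substring matching is crude: `jq` would also match "jquery", so a real script would probably want word-boundary matching, plus a record of entries it has already seen so the same post doesn't trigger twice.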

 

Edited by TheXman
  • Jos locked this topic
This topic is now closed to further replies.