
LLMs and Scraping of autoit forums



Hi all, I've been playing with my own home server, and one of the cool things I've been doing is running my own LLM, partly for privacy and mainly to screw around. The only programming language that I really 'know' is AutoIt, and I would love to use my LLM to help me with it. To really get it to understand the language, I would need to scrape the forums and the help files. I'm sure I could upload the help files if that's OK, but I'm not comfortable scraping the forums, especially without permission. Is it possible to get something like monthly 'scrapes' instead, so I don't have to do my own? I imagine that scraping every page uses a decent amount of resources, and I'd rather not do that without permission, sharing, or finding a better alternative. Any ideas? Admins? Better options? Also, does anyone know how to turn all the help files into PDFs or something for easier LLM digestion?

 


35 minutes ago, BatMan22 said:

Also, does anyone know how to turn all the help files into PDFs or something for easier LLM digestion?

Yes. The help file is just HTML compiled into a CHM file. The docs source archive is here: https://www.autoitscript.com/autoit3/files/archive/autoit/autoit-docs-v3.3.16.1-src.zip
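Since the help file is just HTML, one possible way to get LLM-friendly text out of it is to strip the markup from each page. The following is only a minimal sketch: it assumes the archive has been unzipped into a folder of .htm/.html pages readable as UTF-8 (not verified here), and it writes plain text rather than PDF, which is a swapped-in alternative to what was asked. The names `html_to_text` and `convert_tree` are hypothetical, made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of one HTML page, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def convert_tree(src_dir, out_dir):
    """Hypothetical helper: mirror every .htm/.html page under src_dir as a .txt file."""
    src, out = Path(src_dir), Path(out_dir)
    for page in src.rglob("*.htm*"):
        target = out / page.relative_to(src).with_suffix(".txt")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Assumption: pages are UTF-8; undecodable bytes are replaced, not fatal.
        text = html_to_text(page.read_text(encoding="utf-8", errors="replace"))
        target.write_text(text, encoding="utf-8")
```

Usage would be something like `convert_tree("autoit-docs-src", "autoit-docs-txt")`, where both folder names are placeholders. Tables and code samples lose their layout with this approach, so a smarter converter may be worth it if formatting matters.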


1 hour ago, BatMan22 said:

I would need to scrape the forums...Any ideas?

 

A while back I created a script that does something similar.  My script periodically checks the forum activity looking for new topics whose titles or text contain one or more specified keywords.  My keywords are related to my UDFs and areas of interest.  For instance, if a new topic's title contains a keyword like CryptoNG, jq, encryption, hash, json, HTTPAPI, etc., the script will send me an email alert. 

Instead of scraping the whole forum, I just use the forum's RSS feeds. The RSS activity feeds contain something like the last 25 posts, and each entry includes the post's date, time, title, and text in an XML format. Since a feed only has the last 25 or so entries, it isn't a lot of data to parse, and it uses far less bandwidth than a normal user who hangs out monitoring, reading, or searching the forums. And depending on the tools you use for parsing and processing the information, it can be quite fast to go through the feeds. I convert the XML feed to JSON and process that JSON using jq. Of course, you can use whatever tools and processes you're most comfortable with.

For example, here's the URL for the ALL ACTIVITY feed:
https://www.autoitscript.com/forum/discover/all.xml
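
For anyone who wants to try the keyword idea described above, here is a rough sketch in Python. It is not the script from this post (that one converts the feed to JSON and uses jq); it parses the XML directly instead. It assumes the feed follows the usual RSS 2.0 layout with `<item>` elements holding `<title>`, `<link>`, and `<description>`, which has not been checked against the real feed. The function names are hypothetical, and the keyword list just reuses the examples given earlier.

```python
import urllib.request
import xml.etree.ElementTree as ET

FEED_URL = "https://www.autoitscript.com/forum/discover/all.xml"  # URL from the post
KEYWORDS = ["CryptoNG", "jq", "encryption", "hash", "json", "HTTPAPI"]  # examples from the post


def find_matching_items(rss_xml, keywords):
    """Return (title, link) pairs for items whose title or text contains a keyword."""
    root = ET.fromstring(rss_xml)
    lowered = [k.lower() for k in keywords]
    matches = []
    # Assumption: standard RSS 2.0 structure, <rss><channel><item>...</item></channel></rss>
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        body = item.findtext("description", default="")
        haystack = (title + " " + body).lower()
        # Plain case-insensitive substring match, not whole-word matching.
        if any(k in haystack for k in lowered):
            matches.append((title, item.findtext("link", default="")))
    return matches


def fetch_feed(url=FEED_URL):
    """Download the feed once and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


if __name__ == "__main__":
    for title, link in find_matching_items(fetch_feed(), KEYWORDS):
        print(title, link)
```

Because this is a substring match, a short keyword like jq will also hit words such as "jquery", so a word-boundary check may be worth adding. Running it on a schedule (rather than in a tight loop) keeps the load on the forum low, which is the whole point of using the feed.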

 

Edited by TheXman

  • Jos locked this topic