MikeFez Posted March 8, 2016 (edited)

Hello,

I have a script that copies the text of a webpage into a variable named "$pagetext", and then runs a RegEx search looking for an email address in a specific spot. Originally, the relevant area of the page had this format:

    Phone: (555) 555-5555
    Email: example@example.com
    Example Company

For that format, the following RegEx code worked:

    $extractedEmail = StringRegExp($pagetext, "(?i)(?m:^)\s*email:\s*(.+)(?:\v|$)", 1)

Unfortunately, the format recently changed so that it now looks like:

    Phone
    (555) 555-5555
    Email
    example@example.com
    Company
    Example Company

Using this tool, I was able to determine that "Email" is followed by a [Tab] and then an [End of Line (LF)]. Therefore, I tweaked my RegEx to this:

    $extractedEmail = StringRegExp($pagetext, "(?i)(?m:^)\s*email\t\n*(.+)(?:\v|$)", 1)

While this tool shows that the pattern should match, it doesn't seem to work within AutoIt. Does anyone have any suggestions on how I could resolve this, or what I'm doing wrong?

Edited March 8, 2016 by MikeFez: Clarification
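The whitespace that follows "Email" can also be checked from inside AutoIt rather than with an external tool. The snippet below is only a sketch: `$sPageText` is a made-up sample that mirrors the new layout described above, not the real page, and the choice to print four characters is arbitrary.

```autoit
; Assumed sample text standing in for the scraped page (new layout).
Local $sPageText = "Phone" & @LF & "(555) 555-5555" & @LF & _
        "Email" & @TAB & @LF & "example@example.com"

; StringInStr is case-insensitive by default and returns a 1-based position (0 if not found).
Local $iPos = StringInStr($sPageText, "Email")
If $iPos = 0 Then
    ConsoleWrite("'Email' not found" & @CRLF)
Else
    ; Print the character code of each of the 4 characters after "Email".
    ; 9 = tab, 10 = LF, 13 = CR; anything else is regular text.
    For $i = $iPos + 5 To $iPos + 8
        ConsoleWrite($i & ": " & AscW(StringMid($sPageText, $i, 1)) & @CRLF)
    Next
EndIf
```

If a 13 shows up before the 10, the text uses CRLF line endings, which a pattern written for a bare LF may not expect.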
iamtheky Posted March 8, 2016

    ; Sample text reproducing the new layout
    $pagetext = "Phone" & @LF & _
            "(555) 555-5555" & @LF & _
            "Email" & @TAB & @LF & _
            "example@example.com" & @LF & _
            "Company" & @LF & _
            "Example Company"

    ; Strip all whitespace (flag 8), then capture whatever sits between "Email" and "Company"
    MsgBox(0, '', StringRegExp(StringStripWS($pagetext, 8), "Email(.*?)Company", 3)[0])
jguinch Posted March 8, 2016

Another one:

    $pagetext = "Phone" & @CRLF & _
            "(555) 555-5555" & @CRLF & _
            "Email" & @TAB & @CRLF & _
            "example@example.com" & @CRLF & _
            "Company" & @CRLF & _
            "Example Company"

    $extractedEmail = StringRegExp($pagetext, "(?is)\s*email\s*([^@]+@\N+)", 1)
    MsgBox(0, "", $extractedEmail[0])
mikell Posted March 8, 2016

Email addresses on webpages have usually already been validated, so this should work:

    $extractedEmail = StringRegExp($pagetext, ".+@.+", 1)
MikeFez Posted March 10, 2016 (Author)

Thanks for the replies, everyone. I tried each of those variations, but unfortunately none of them seem to work on my end. I've attached a copy of the form I'm trying to pull this from, which seems to be the same as the one I quoted in the original post. Does anyone know why this would not work?

Example.txt
mikell Posted March 10, 2016

Hum, this works for me ...

    $txt = FileRead("Example.txt")
    $extractedEmail = StringRegExp($txt, ".+@.+", 1)[0]
    MsgBox(0, "", $extractedEmail)
MikeFez Posted March 10, 2016 (Author)

Hi mikell,

You're completely right. The issue was in another spot of my code. Thank you very much for taking the time to help me out.
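For anyone landing here with a similar problem, here is a minimal sketch that combines the ideas from the replies into one pattern and checks @error before touching the result. The problem in this thread turned out to be elsewhere in the script, so this is not the fix itself. `_ExtractEmail` is a hypothetical helper name, and the pattern assumes the address contains no whitespace and follows an "Email" label, with or without a colon.

```autoit
; Hypothetical helper, not from the thread.
; Returns the address found after an "Email" label, or "" with @error = 1 if nothing matches.
Func _ExtractEmail($sText)
    ; (?im)  case-insensitive, ^ matches at the start of every line
    ; :?     optional colon, so both the old and the new layout are accepted
    ; \s*    any run of spaces, tabs, CR or LF between the label and the address
    Local $aMatch = StringRegExp($sText, "(?im)^\s*email:?\s*(\S+@\S+)", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_ExtractEmail

; Sample inputs assumed from the two layouts quoted in the first post.
Local $sOld = "Phone: (555) 555-5555" & @CRLF & "Email: example@example.com" & @CRLF & "Example Company"
Local $sNew = "Phone" & @LF & "(555) 555-5555" & @LF & "Email" & @TAB & @LF & _
        "example@example.com" & @LF & "Company" & @LF & "Example Company"

ConsoleWrite(_ExtractEmail($sOld) & @CRLF)
ConsoleWrite(_ExtractEmail($sNew) & @CRLF)
```

Checking @error right after StringRegExp matters because indexing the return value with [0] when nothing matched stops the script with an array error, which can look as if the pattern itself were at fault.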