caramen Posted September 28, 2020 (edited)

Well, I'm trying to improve myself day after day, lol. StringRegExp is very powerful, but I'm stuck with this. It would be hard to put all my attempts here; I spent something like 4 hours on this regex.

$TextToTest:

    Jordane-Durand Guevara-Lolito bllabalbalabalbalbabalablalab
    Jordane-Durand Guevara-LOLITO bllabalbalabalbalbabalablalab
    Jordane-Durand GUEVARA-Lolito bllabalbalabalbalbabalablalab
    Jordane-DURAND Guevara-Lolito bllabalbalabalbalbabalablalab
    JORDANE-Durand Guevara-Lolito bllabalbalabalbalbabalablalab
    Jordane GUEVARA bllabalbalabalbalbabalablalab
    JORDANE Guevara bllabalbalabalbalbabalablalab
    Jordane Guevara bllabalbalabalbalbabalablalab
    BLOG DE SHOPIFY bllabalbalabalbalbabalablalab
    GUEVARA Mme. Le-mana. Je suis joignable sans problème bllabalbalabalbalbabalablalab vous pouvez également demander Durand-Michel Marc ou Levis PLOUQUEZ

This is my best match so far, $MyPattern:

    (([A-Z][a-zA-Z]+)[-]([A-Z][a-zA-Z]+)|([A-Z][a-zA-Z]+)) (([A-Z][a-zA-Z]+)[-]([A-Z][a-zA-Z]+)|([A-Z][a-zA-Z]+))

The goal is to match every first name and surname here, but without BLOG DE SHOPIFY. Does anyone have a better pattern for names and surnames?

A late edit, to add: BLOG DE SHOPIFY can change to ANYTHING. I just want to avoid selecting random runs of capital letters.
Edited September 29, 2020 by caramen
AspirinJunkie Posted September 29, 2020

It is a bit difficult to find a distinguishing feature that separates BLOG DE SHOPIFY from a name by your definition. One approach would be to define what may appear before and after a name.

Suppose you say: a name may only be preceded by the start of a line or by a lower-case word, and another capitalized word may not follow the name. Then you could solve it this way:

    #include <Array.au3>

    ; the text to search is taken from the clipboard
    $sString = ClipGet()

    ; (?x) pattern: the '#' comment runs to the end of the line, hence the @CRLF between the parts
    $sPattern = '(?xms)' & @CRLF & _
            '(?xs) (?> ^ | \b[[:lower:]]+ ) \h* \K # positive list of what may appear before the name' & @CRLF & _
            '( \b [[:upper:]] [[:alpha:]]+ \b (?>-\b [[:upper:]] [[:alpha:]]+ \b)? \h \b [[:upper:]] [[:alpha:]]+ \b (?:-\b [[:upper:]] [[:alpha:]]+ \b)? )' & @CRLF & _
            '(?!\h* \b [[:upper:]] [[:alpha:]]+ \b (?>-\b [[:upper:]] [[:alpha:]]+ \b)?)'

    ; flag 3 = $STR_REGEXPARRAYGLOBALMATCH
    $aNames = StringRegExp($sString, $sPattern, 3)
    _ArrayDisplay($aNames, "names found", "", 64)
dmob Posted September 29, 2020

5 minutes ago, AspirinJunkie said:

    #include <Array.au3>
    $sString = ClipGet()
    $sPattern = '(?xms)' & @CRLF & _
            '(?xs) (?> ^ | \b[[:lower:]]+ ) \h* \K # positive list of what may appear before the name' & @CRLF & _
            '( \b [[:upper:]] [[:alpha:]]+ \b (?>-\b [[:upper:]] [[:alpha:]]+ \b)? \h \b [[:upper:]] [[:alpha:]]+ \b (?:-\b [[:upper:]] [[:alpha:]]+ \b)? )' & @CRLF & _
            '(?!\h* \b [[:upper:]] [[:alpha:]]+ \b (?>-\b [[:upper:]] [[:alpha:]]+ \b)?)'
    $aNames = StringRegExp($sString, $sPattern, 3)
    _ArrayDisplay($aNames, "names found", "", 64)

How many aspirins did that require 😄
AspirinJunkie Posted September 29, 2020

12 minutes ago, dmob said: How many aspirins did that require 😄

Not as many as one might think at first. The pattern is a variation of this clearer one:

    (?xs)
    (
      (?(DEFINE)
        (?<NamePart> \b [[:upper:]] [[:alpha:]]+ \b)
        (?<Name> (?&NamePart) (?>-(?&NamePart))? )
        (?<FullName>
          (?> ^ | \b[[:lower:]]+ ) \h* \K # positive list of what may appear before the name
          (?&Name) \h (?&Name)
          (?!\h* (?&Name))
        )
      )
      (?&FullName)
    )

I rewrote the pattern only because I couldn't squeeze the matches into group 1 (and thus $STR_REGEXPARRAYGLOBALMATCH was not sufficient).
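A hedged aside on the group 1 problem: StringRegExp also accepts $STR_REGEXPARRAYGLOBALFULLMATCH (flag 4), which returns one inner array per match with the full match at index 0, so the result does not depend on which group number the name lands in. The sketch below is not AspirinJunkie's pattern. The pattern and the sample text are simplified assumptions, chosen only to show how the flag is used together with a DEFINE block.

```autoit
#include <StringConstants.au3>

; Hypothetical sample text, standing in for the clipboard content used in the thread.
Local $sText = "Jordane Guevara blabla" & @CRLF & "BLOG DE SHOPIFY blabla"

; Simplified illustration pattern (an assumption, not the pattern from the post):
; two capitalized mixed-case words separated by one horizontal space.
Local $sPattern = '(?x)' & _
        '(?(DEFINE) (?<Part> [A-Z][a-z]+ ) )' & _
        '\b (?&Part) \h (?&Part) \b'

; Flag 4 returns an array of arrays; element [0] of each inner array is the full match.
Local $aAll = StringRegExp($sText, $sPattern, $STR_REGEXPARRAYGLOBALFULLMATCH)
If @error Then
    ConsoleWrite("no match" & @CRLF)
    Exit
EndIf

For $i = 0 To UBound($aAll) - 1
    Local $aMatch = $aAll[$i] ; copy the inner array so it can be indexed
    ConsoleWrite($aMatch[0] & @CRLF)
Next
```

The trade-off is one extra level of array handling in exchange for keeping the readable named-group version of the pattern.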
genius257 Posted September 29, 2020 (edited)

Here you go:

    (?mx)(?!\QBLOG\E|\QSHOPIFY\E)\b(?:[A-Z][a-zA-Z]{3,}(?:-[A-Z][a-zA-Z]{3,})?)(?(?=\s[A-Z][a-zA-Z]{3,}(?:\s|$))\s[A-Z][a-zA-Z]+|)\b

Edited September 29, 2020 by genius257: added flags
caramen Posted September 29, 2020 (edited)

5 hours ago, AspirinJunkie said: How many aspirins did that require

By the way, yesterday was the first time I spent 4 hours in a row on regex without a break, and the effect is real. It is the first time I haven't had enough room in my head to think about everything I'm typing. I don't know if I'm making myself clear.

Is there a way to work on one group without having to look at the preceding groups, so the pattern is easier to edit? I mean, if I want to ADD a condition after my first attempt, it's kinda F*** HARD to do that while keeping the whole pattern in mind. It's the first time my head couldn't do something not for lack of technical knowledge but because of how complex it is to think about. I need a trick for that... And I know good practice would say you need to see the whole pattern to know what you're doing, but... it's not intuitive at all.

Well, thanks for all these answers. I'm going to read them, try them, try to understand them, and reply.

5 hours ago, AspirinJunkie said: One approach would be to define what may appear before and after a name.

Not possible, sorry bro. It's like asking people to always put things in the same place in their emails, which you can't ask.

5 hours ago, AspirinJunkie said: Suppose you say: a name may only be preceded by the start of a line or by a lower-case word, and another capitalized word may not follow the name. Then you could solve it this way:

But this is possible, and good practice for me.

5 hours ago, genius257 said: Here you go:

    (?mx)(?!\QBLOG\E|\QSHOPIFY\E)\b(?:[A-Z][a-zA-Z]{3,}(?:-[A-Z][a-zA-Z]{3,})?)(?(?=\s[A-Z][a-zA-Z]{3,}(?:\s|$))\s[A-Z][a-zA-Z]+|)\b

Bro, I'm sorry, I thought it was obvious, but it seems it isn't. BLOG DE SHOPIFY can change to ANYTHING. I just want to avoid selecting random runs of capital letters in my $String. Sorry, I should have made that clear. I edited the topic.
Edited September 29, 2020 by caramen
genius257 Posted September 29, 2020

Hi @caramen

13 minutes ago, caramen said: Is there a way to work on one group without having to look at the preceding groups, so the pattern is easier to edit?

I would suggest using named capturing groups, like @AspirinJunkie did. Example:

    (?xm)(?(DEFINE)
      (?<FirstName>[A-Z][a-z]+)
      (?<LastName>[A-Z][a-z]+)
      (?<FullName>\b(?&FirstName)[ ](?&LastName)\b)
    )
    (?&FullName)

It gives you a much-needed overview when writing big regular expressions.

16 minutes ago, caramen said: Bro, I'm sorry, I thought it was obvious, but it seems it isn't. BLOG DE SHOPIFY can change to ANYTHING. I just want to avoid selecting random runs of capital letters in my $String.

Yeah, I was afraid of that. I just took on the problem for fun, going by your initial specification.

If the capitalized string you want to avoid can be ANYTHING, it is close to impossible unless you write an overcomplicated regex or a post-processing function that validates the captures as real names. Otherwise you need to be able to control the data and its format, like @AspirinJunkie said:

6 hours ago, AspirinJunkie said: Suppose you say: a name may only be preceded by the start of a line or by a lower-case word, and another capitalized word may not follow the name.

Anyway, my two cents. Hope you get your problem solved.
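Another way to keep a long pattern manageable, offered as a sketch rather than as what either poster did, is to build it from small AutoIt string variables so that each piece can be edited on its own. The variable names and the sample text below are hypothetical; the character classes are the ones from the pattern in the first post.

```autoit
#include <StringConstants.au3>

; One capitalized word, as in the first post's pattern.
Local $sPart = '[A-Z][a-zA-Z]+'
; A name part that may be hyphenated, e.g. Jordane-Durand.
Local $sName = $sPart & '(?:-' & $sPart & ')?'
; Two name parts separated by a single space.
Local $sFullName = '\b' & $sName & ' ' & $sName & '\b'

; Hypothetical sample text.
Local $sText = "Jordane-Durand Guevara-Lolito blabla"

; The assembled pattern has no capturing groups, so flag 3 returns the full matches.
Local $aNames = StringRegExp($sText, $sFullName, $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then
    For $i = 0 To UBound($aNames) - 1
        ConsoleWrite($aNames[$i] & @CRLF)
    Next
EndIf
```

This only addresses readability. The assembled pattern is as loose as the original one and would still accept an all-capitals pair such as BLOG DE.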
caramen Posted September 29, 2020

1 hour ago, genius257 said: or a post-processing function that validates the captures as real names

It's in my TODO 🤣 I was gonna do it anyway, even with a very strong pattern. I'll check with the AD UDF.
caramen Posted September 29, 2020 (edited)

1 hour ago, genius257 said: it is close to impossible

I agree with that. While I was writing my pattern for that purpose, it felt like trying to compute an infinite number. RegEx is a crazy thing.

Edit: aw, sorry for the double post (tired). I thought I was editing.

Edited September 29, 2020 by caramen
jchd Posted September 29, 2020

3 hours ago, caramen said: RegEx is a crazy thing.

No, it's a full-fledged language of its own. Only the grammar and syntax are a bit unusual.
mikell Posted September 29, 2020

This "a bit" is lovely
JockoDundee Posted September 29, 2020

What is the exact challenge? Does it have to be a single RegEx, or is any solution allowed? And do you know it's not a name when there are no adjacent mixed-case (upper then lower) strings?
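The heuristic asked about above, combined with the post-processing idea from earlier in the thread, can be sketched in two steps: collect loose candidates with a simple pattern, then keep only those with at least one mixed-case part. This is a minimal sketch under stated assumptions: the helper name _HasMixedCasePart, the candidate pattern, and the sample text are all hypothetical, and the check is ASCII-only.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: True if some part of the candidate starts with an
; upper-case letter directly followed by a lower-case letter (e.g. "Jordane").
Func _HasMixedCasePart($sCandidate)
    Return StringRegExp($sCandidate, '\b[A-Z][a-z]') = 1
EndFunc   ;==>_HasMixedCasePart

; Hypothetical sample text.
Local $sText = "Jordane GUEVARA blabla" & @CRLF & "BLOG DE SHOPIFY blabla"

; Step 1: loose candidates, two capitalized (optionally hyphenated) words.
Local $sCandidate = '\b[A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?\h[A-Z][a-zA-Z]+(?:-[A-Z][a-zA-Z]+)?\b'
Local $aCandidates = StringRegExp($sText, $sCandidate, $STR_REGEXPARRAYGLOBALMATCH)

; Step 2: keep only the candidates that pass the mixed-case heuristic.
If Not @error Then
    For $i = 0 To UBound($aCandidates) - 1
        If _HasMixedCasePart($aCandidates[$i]) Then ConsoleWrite($aCandidates[$i] & @CRLF)
    Next
EndIf
```

The heuristic is imperfect by design. A pair such as GUEVARA Mme from the original test text would still pass, so a further check against a list of known names would be needed for reliable results.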
seadoggie01 Posted September 30, 2020 (edited)

My best attempt?

    (?xm)(?(DEFINE)
      (?<FirstN>[A-Z][a-z]+)
      (?<FirstL>[a-z]+)
      (?<FirstU>[A-Z]+)
      (?<LastN>[A-Z][a-z]+)
      (?<LastL>[a-z]+)
      (?<LastU>[A-Z]+)
      (?<FullName>
        \b(?&FirstN)[ -](?&LastN)\b|
        \b(?&FirstN)[ -](?&LastL)\b|
        \b(?&FirstN)[ -](?&LastU)\b|
        \b(?&FirstL)[ -](?&LastN)\b|
        \b(?&FirstL)[ -](?&LastL)\b|
        \b(?&FirstL)[ -](?&LastU)\b|
        \b(?&FirstU)[ -](?&LastN)\b|
        \b(?&FirstU)[ -](?&LastL)\b
        #|\b(?&FirstU)[ -](?&LastU)\b
      )
    )
    (?&FullName)

It creates first- and last-name groups in uppercase, lowercase, and "normal" case. FullName is one giant alternation of the possible combinations (I commented out double uppercase).

The é breaks it, but I imagine there's a way to fix that. I don't deal with strange Unicode characters, I delete them.

(Edit: This is basically an extension of genius257's RegEx, so mostly I copy-pasted.)

Edited September 30, 2020 by seadoggie01
jchd Posted September 30, 2020

Adding (*UCP) at the beginning of the pattern allows many more filtering possibilities. See the StringRegExp() help.
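As a hedged illustration of that suggestion, the sketch below runs the same word pattern with and without the (*UCP) prefix on a name containing accented letters, so the two results can be compared. The sample text and variable names are assumptions; the POSIX classes are the ones used earlier in the thread.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; "Hélène Durand", built with ChrW() so the script's file encoding does not matter.
Local $sText = "H" & ChrW(233) & "l" & ChrW(232) & "ne Durand"

; One capitalized word, written with POSIX classes.
Local $sWord = '\b[[:upper:]][[:alpha:]]+\b'

; Same pattern without the prefix.
Local $aPlain = StringRegExp($sText, $sWord, $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then ConsoleWrite("without (*UCP): " & _ArrayToString($aPlain, " | ") & @CRLF)

; Same pattern with (*UCP) placed at the very start.
Local $aUcp = StringRegExp($sText, '(*UCP)' & $sWord, $STR_REGEXPARRAYGLOBALMATCH)
If Not @error Then ConsoleWrite("with (*UCP):    " & _ArrayToString($aUcp, " | ") & @CRLF)
```

Explicit ranges such as [A-Z] are not widened by (*UCP), so a pattern that should accept accented names would presumably need POSIX classes or Unicode properties instead of the ASCII ranges.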