
RegEx Challenge


caramen

Well, I'm trying to improve myself day after day lol.

 

StringRegExp is very powerful, but I'm stuck on this.

It would be hard to post all my attempts here. I spent something like 4 hours on this regex, and this is my best match.

$TextToTest :

Jordane-Durand Guevara-Lolito
bllabalbalabalbalbabalablalab
Jordane-Durand Guevara-LOLITO
bllabalbalabalbalbabalablalab
Jordane-Durand GUEVARA-Lolito
bllabalbalabalbalbabalablalab
Jordane-DURAND Guevara-Lolito
bllabalbalabalbalbabalablalab
JORDANE-Durand Guevara-Lolito
bllabalbalabalbalbabalablalab
Jordane GUEVARA
bllabalbalabalbalbabalablalab
JORDANE Guevara
bllabalbalabalbalbabalablalab
Jordane Guevara
bllabalbalabalbalbabalablalab
BLOG DE SHOPIFY
bllabalbalabalbalbabalablalab
GUEVARA
Mme. Le-mana. Je suis joignable sans problème
bllabalbalabalbalbabalablalab
vous pouvez également demander Durand-Michel Marc ou Levis PLOUQUEZ

This is my best match : 

$MyPattern

(([A-Z][a-zA-Z]+)[-]([A-Z][a-zA-Z]+)|([A-Z][a-zA-Z]+)) (([A-Z][a-zA-Z]+)[-]([A-Z][a-zA-Z]+)|([A-Z][a-zA-Z]+))


 

The goal is to match every first name and surname here, but not BLOG DE SHOPIFY.

Does anyone have a better pattern for first names and surnames?

Late edit: BLOG DE SHOPIFY can change to ANYTHING. I just want to avoid matching random words in capital letters.

Edited by caramen

My video tutorials : ( In construction )  || My Discord : https://discord.gg/S9AnwHw

How to Ask Help ||  UIAutomation From Junkew || WebDriver From Danp2 || And Water's UDFs in the Quote

Spoiler

 Water's UDFs:
Active Directory (NEW 2018-10-19 - Version 1.4.10.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (2018-10-31 - Version 1.3.4.1) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
PowerPoint (2017-06-06 - Version 0.0.5.0) - Download - General Help & Support
Excel - Example Scripts - Wiki
Word - Wiki
 
Tutorials:

ADO - Wiki

 


It is a bit difficult to find a distinguishing feature that separates BLOG DE SHOPIFY from a name under your definition.
One approach is to define what may come before and after a name. If you say that a name can only be preceded by the beginning of a line or a lowercase word, and that another capitalized word may not follow the name, then you could solve it this way:

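#include <Array.au3>
#include <StringConstants.au3>

; The text to search is read from the clipboard.
Local $sString = ClipGet()

; (?x) ignores whitespace and allows # comments, (?m) lets ^ match at every line start, (?s) lets . match newlines.
Local $sPattern = '(?xms)' & @CRLF & _
    '(?> ^ | \b[[:lower:]]+ ) \h* \K  # positive list of what may appear before the name' & @CRLF & _
    '( \b [[:upper:]]  [[:alpha:]]+ \b (?>-\b [[:upper:]]  [[:alpha:]]+ \b)? \h \b [[:upper:]]  [[:alpha:]]+ \b (?:-\b [[:upper:]]  [[:alpha:]]+ \b)? )' & @CRLF & _
    '(?!\h* \b [[:upper:]]  [[:alpha:]]+ \b (?>-\b [[:upper:]]  [[:alpha:]]+ \b)?)'

; Collect every group 1 capture (the name) into a flat array.
Local $aNames = StringRegExp($sString, $sPattern, $STR_REGEXPARRAYGLOBALMATCH)
_ArrayDisplay($aNames, "names found", "", 64)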

 


5 minutes ago, AspirinJunkie said:

 

[code quoted from the previous post]

How many aspirins did that require 😄

 


12 minutes ago, dmob said:

How many aspirins did that require 😄

Not as many as one might think at first.
The pattern is a variation of this clearer one:

(?xs)
((?(DEFINE)
   (?<NamePart> \b [[:upper:]]  [[:alpha:]]+ \b)
   (?<Name> (?&NamePart)  (?>-(?&NamePart))?  )
   (?<FullName> 
          (?> ^ | \b[[:lower:]]+ ) \h* \K  # positive list of what may appear before the name
          (?&Name) \h (?&Name) 
          (?!\h* (?&Name))
    )
) (?&FullName) )

I only rewrote it because, with this version, I couldn't squeeze the matches into group 1, so $STR_REGEXPARRAYGLOBALMATCH was not sufficient.


here you go:

(?mx)(?!\QBLOG\E|\QSHOPIFY\E)\b(?:[A-Z][a-zA-Z]{3,}(?:-[A-Z][a-zA-Z]{3,})?)(?(?=\s[A-Z][a-zA-Z]{3,}(?:\s|$))\s[A-Z][a-zA-Z]+|)\b

 

Edited by genius257
added flags

5 hours ago, dmob said:

How many aspirins did that require

By the way, yesterday was the first time I spent 4 hours in a row on regex without interruption.

This is a real effect. It is the first time I didn't have enough room in my head to keep track of everything I was typing. I don't know if I'm making myself clear.

Is there a way to build the pattern in groups, so I don't have to look at the earlier groups while editing it?

I mean, if I want to ADD a condition after my first attempt, it's kinda F*** HARD to do that while keeping the whole pattern in mind. For the first time my head couldn't do something, not for lack of technical knowledge but because of how complex it is to think about. I need a trick for that...

And I know good practice would say you need to see the whole pattern to know what you're doing, but... it's not intuitive at all…

Well, thanks for all these answers. I'm going to read them, try them, try to understand them, and reply.

 

  

5 hours ago, AspirinJunkie said:

One approach would be to define what can be placed before and after a name

Not possible, sorry bro :( It's like asking people to always put things in the same place in their emails, which you can't ask.

  

5 hours ago, AspirinJunkie said:

If you say: A name can only have a beginning of a line or a lower case word in front of it and a word in capital letters can not appear again behind the name then you could solve this in the following way:

But this is possible, and good practice for me.

 

 

  

5 hours ago, genius257 said:

here you go:

(?mx)(?!\QBLOG\E|\QSHOPIFY\E)\b(?:[A-Z][a-zA-Z]{3,}(?:-[A-Z][a-zA-Z]{3,})?)(?(?=\s[A-Z][a-zA-Z]{3,}(?:\s|$))\s[A-Z][a-zA-Z]+|)\b

 

Bro, I'm sorry, I thought it was obvious but it seems it isn't. BLOG DE SHOPIFY can change to ANYTHING. I just want to avoid matching random capital letters in my $String.

Sorry, I should have specified that. I edited the topic.

 

Edited by caramen


Hi @caramen

13 minutes ago, caramen said:

Is there a way to build the pattern in groups, so I don't have to look at the earlier groups while editing it?

I would suggest using named capturing groups, like @AspirinJunkie did.

Example:

(?xm)(?(DEFINE)
  (?<FirstName>[A-Z][a-z]+)
  (?<LastName>[A-Z][a-z]+)
  (?<FullName>\b(?&FirstName)[ ](?&LastName)\b)
)
(?&FullName)

It gives you a much-needed overview when building big regular expressions.
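For anyone who wants to try the example above from a script, here is a minimal sketch of how it could be fed to StringRegExp(). The sample text and the variable names are my own assumptions. Because the groups inside (?(DEFINE) ...) are only called as subroutines and never capture anything themselves, the sketch reads the full match through flag 4 ($STR_REGEXPARRAYGLOBALFULLMATCH) rather than relying on group captures.

```autoit
#include <Array.au3>
#include <StringConstants.au3>

; Sample input (an assumption, modelled on the first post).
Local $sText = "Jordane Guevara" & @CRLF & "bllabalbalabalbalbabalablalab"

; With (?x) PCRE ignores whitespace inside the pattern,
; so the pieces can simply be joined with spaces.
Local $sPattern = '(?xm)(?(DEFINE)' & _
        ' (?<FirstName>[A-Z][a-z]+)' & _
        ' (?<LastName>[A-Z][a-z]+)' & _
        ' (?<FullName>\b(?&FirstName)[ ](?&LastName)\b)' & _
        ')' & _
        ' (?&FullName)'

; Flag 4 returns an array of arrays; element [0] of each inner array is the full match.
Local $aMatches = StringRegExp($sText, $sPattern, $STR_REGEXPARRAYGLOBALFULLMATCH)
If @error Then
    ConsoleWrite("No match found" & @CRLF)
    Exit
EndIf

; Copy the full match of every hit into a flat array for display.
Local $aNames[UBound($aMatches)]
For $i = 0 To UBound($aMatches) - 1
    Local $aOne = $aMatches[$i]
    $aNames[$i] = $aOne[0]
Next
_ArrayDisplay($aNames, "full matches")
```

With this sample text the display should list Jordane Guevara only, since the second line contains no capital letter.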

16 minutes ago, caramen said:

Bro, I'm sorry, I thought it was obvious but it seems it isn't. BLOG DE SHOPIFY can change to ANYTHING. I just want to avoid matching random capital letters in my $String.

Yeah, I was afraid of that. I just took on the problem for fun, according to your initial specifications.

If the string in capital letters you want to avoid can be ANYTHING, it is close to impossible unless you write an overcomplicated regex or a post-processing function that validates the captures as names, or you are able to control the data and its format, like @AspirinJunkie said:

6 hours ago, AspirinJunkie said:

If you say: A name can only have a beginning of a line or a lower case word in front of it and a word in capital letters can not appear again behind the name

 

Anyway my two cents.

Hope you get your problem solved :)
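As a rough illustration of the post-processing idea, here is a minimal sketch. The helper _LooksLikeName() is hypothetical, and its rule (a candidate must contain at least one lowercase letter, so runs of capitals like BLOG DE are dropped) is only an assumption about what counts as a name. Real validation would probably need a lookup against known names.

```autoit
#include <Array.au3>

; Hypothetical helper: accept a candidate only if it contains
; at least one lowercase letter (an assumed heuristic, not a rule from this thread).
Func _LooksLikeName($sCandidate)
    ; With the default flag 0, StringRegExp() returns 1 on a match and 0 otherwise.
    Return StringRegExp($sCandidate, "[a-z]") = 1
EndFunc   ;==>_LooksLikeName

; Candidates as a loose pattern might return them (sample data, assumed).
Local $aCandidates[4] = ["Jordane GUEVARA", "BLOG DE", "JORDANE Guevara", "SHOPIFY"]

; Keep only the candidates that pass the check.
Local $aNames[0]
For $sCandidate In $aCandidates
    If _LooksLikeName($sCandidate) Then _ArrayAdd($aNames, $sCandidate)
Next
_ArrayDisplay($aNames, "after filtering")
```

The trade-off is that a name written entirely in capitals is rejected as well, so this only helps when at least one part of the name is in mixed case.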


1 hour ago, genius257 said:

or post process function to validate the captures as valid names

It's on my TODO list 🤣 I was going to do it anyway, even with a very strong pattern. I'll check with the AD UDF.


1 hour ago, genius257 said:

It is close to impossible

I agree with that. It felt like computing an infinite number while I was building my pattern for that purpose.

RegEx is a crazy thing.

Edit: aw, sorry, double post (tired). I thought I was editing.

Edited by caramen


3 hours ago, caramen said:

RegEx is a crazy thing.

No, it's a full-fledged language of its own. Only the grammar and syntax are a bit unusual.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


My best attempt?

(?xm)(?(DEFINE)
  (?<FirstN>[A-Z][a-z]+)
  (?<FirstL>[a-z]+)
  (?<FirstU>[A-Z]+)
  (?<LastN>[A-Z][a-z]+)
  (?<LastL>[a-z]+)
  (?<LastU>[A-Z]+)
  (?<FullName>
    \b(?&FirstN)[ -](?&LastN)\b|
    \b(?&FirstN)[ -](?&LastL)\b|
    \b(?&FirstN)[ -](?&LastU)\b|
    \b(?&FirstL)[ -](?&LastN)\b|
    \b(?&FirstL)[ -](?&LastL)\b|
    \b(?&FirstL)[ -](?&LastU)\b|
    \b(?&FirstU)[ -](?&LastN)\b|
    \b(?&FirstU)[ -](?&LastL)\b
    #|\b(?&FirstU)[ -](?&LastU)\b
    )
  )
(?&FullName)

This creates first and last name groups in uppercase, lowercase, and "normal" (capitalized) case. FullName is one giant alternation of the possible combinations (I commented out the double-uppercase one).

The é breaks it, but I imagine there's a way to fix that. I don't deal with strange Unicode characters, I delete them :D

(Edit: This is basically an extension of genius257's RegEx, so I mostly copy-pasted.)

Edited by seadoggie01

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.

Spoiler

My Humble Contributions:
Personal Function Documentation - A personal HelpFile for your functions
Acro.au3 UDF - Automating Acrobat Pro
ToDo Finder - Find #ToDo: lines in your scripts
UI-SimpleWrappers UDF - Use UI Automation more Simply-er
KeePass UDF - Automate KeePass, a password manager
InputBoxes - Simple Input boxes for various variable types


Adding (*UCP) at the beginning of the pattern allows many more filtering possibilities. See StringRegExp() help.
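A small sketch of what that changes, assuming a sample name that starts with É (my own test string, not taken from the thread). The same pattern is run without and with the (*UCP) prefix, which makes POSIX classes such as [[:upper:]] and \b use Unicode properties.

```autoit
; ChrW(0xC9) is the character É; building it with ChrW() avoids depending on the script file encoding.
Local $sText = ChrW(0xC9) & "milie Guevara"

; Same pattern twice, once without and once with the (*UCP) prefix.
Local $sPlain = "\b[[:upper:]][[:lower:]]+\b"
Local $sUcp = "(*UCP)" & $sPlain

; Flag 1 returns the first match as an array, or sets @error if nothing matched.
Local $aPlain = StringRegExp($sText, $sPlain, 1)
Local $sFirstPlain = @error ? "no match" : $aPlain[0]
Local $aUcp = StringRegExp($sText, $sUcp, 1)
Local $sFirstUcp = @error ? "no match" : $aUcp[0]

MsgBox(0, "(*UCP) comparison", "without (*UCP): " & $sFirstPlain & @CRLF & "with (*UCP): " & $sFirstUcp)
```

Without the prefix the accented first name is expected to be skipped, while with (*UCP) it should be picked up as the first match.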

