jerem488 Posted April 8, 2010

Hello,

It's very frustrating not knowing how to use the StringRegExp function! I have run many tests, but none were conclusive.

I have this text:

SIRET : 390 019 891 00014
Effectif : 2
Voie : Les Bouchats
Code postal : 71370
Ville : Saint Etienne En Bresse
Téléphone : +33 3 85 96 40 00
Activité en clair : Exploitation De Biens Agricole
Dénomination : Acle
Forme juridique : SARL (Société à Responsabilité Limitée)

SIRET : 448 026 153 00024
Voie : Le Gros Chigy
Code postal : 71220
Ville : Saint Andre Le Desert
Téléphone : +33 3 85 59 48 71
Activité en clair : Platrerie Peinture Pose Revetement De Sols Et Murs Plafonds Divers Travaux De Renovation
Dénomination : Galimi Samuel
Forme juridique : EI (Entreprise Individuelle)

SIRET : 489 314 963 00013
Effectif : 5
Voie : 30 Rue Bernard Renault
Code postal : 71400
Ville : Autun
Téléphone : +33 3 85 52 45 93
Activité en clair : Platrerie Peinture Pour Le Batiment Travaux D'Isolation Ravalement De Facades Revetements Sols Et Murs Negoce De Produits Lies Au Batiment
Dénomination : Garcia
Forme juridique : SARL (Société à Responsabilité Limitée)

SIRET : 340 904 416 00013
Effectif : 2
Voie : 40 Rue De Creteuil Le Bas
Code postal : 71150
Ville : Chaudenay
Téléphone : +33 3 85 87 32 04
Activité en clair : Peinture Pose Revetem. Sols Murs Pose Plaq De Platre Nettoyage Vehic. Locaux Tissus Stores Tendues Demoussage & Protection De Surface Transp Meubles
Dénomination : Garnaud Daniel
Forme juridique : EI (Entreprise Individuelle)

I want to read each paragraph (from the SIRET line to the last line of the paragraph, Forme juridique) and, in each paragraph, read the "Code postal" line and test its number (71150, 71300, ...) against a list of numbers. If the number is not in the list, I want to delete the paragraph. To do this I must use the functions StringRegExp and StringRegExpReplace, but I don't know how to proceed.
dantay9 Posted April 8, 2010 (edited)

I could give you the answer, but that wouldn't be much of a learning experience for you. If you are having trouble specifically with formatting the patterns, here is an excellent (free) tool that helps me create regex patterns. I hope this helps you out. It really helped me out.

Edited April 8, 2010 by dantay9
Fulano Posted April 8, 2010 (edited)

You are going to want to split things up a bit to make things easier. I agree with dantay9, learning it is much better. That being said, debugging more than one thing at a time is a pain, so I made a framework for you to test in.

    Global $REGULAR_Expression = '' ; put the pattern you are testing here

    ; This is just formatting so I can test with it. I used CRLF because that's what your data
    ; appeared to have; it's possible that the website had something to do with it.
    Local $LogText = "SIRET : 390 019 891 00014" & @CRLF & _
            "Effectif : 2" & @CRLF & _
            "Voie : Les Bouchats" & @CRLF & _
            "Code postal : 71370" & @CRLF & _
            "Ville : Saint Etienne En Bresse" & @CRLF & _
            "Téléphone : +33 3 85 96 40 00 " & @CRLF & _
            "Activité en clair : Exploitation De Biens Agricole " & @CRLF & _
            "Dénomination : Acle" & @CRLF & _
            "Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _
            @CRLF & _
            "SIRET : 448 026 153 00024" & @CRLF & _
            "Voie : Le Gros Chigy" & @CRLF & _
            "Code postal : 71220" & @CRLF & _
            "Ville : Saint Andre Le Desert" & @CRLF & _
            "Téléphone : +33 3 85 59 48 71 " & @CRLF & _
            "Activité en clair : Platrerie Peinture Pose Revetement De Sols Et Murs Plafonds Divers Travaux De Renovation " & @CRLF & _
            "Dénomination : Galimi Samuel" & @CRLF & _
            "Forme juridique : EI (Entreprise Individuelle)" & @CRLF & _
            @CRLF & _
            "SIRET : 489 314 963 00013" & @CRLF & _
            "Effectif : 5" & @CRLF & _
            "Voie : 30 Rue Bernard Renault" & @CRLF & _
            "Code postal : 71400" & @CRLF & _
            "Ville : Autun" & @CRLF & _
            "Téléphone : +33 3 85 52 45 93 " & @CRLF & _
            "Activité en clair : Platrerie Peinture Pour Le Batiment Travaux D'Isolation Ravalement De Facades Revetements Sols Et Murs Negoce De Produits Lies Au Batiment " & @CRLF & _
            "Dénomination : Garcia" & @CRLF & _
            "Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _
            @CRLF & _
            "SIRET : 340 904 416 00013" & @CRLF & _
            "Effectif : 2" & @CRLF & _
            "Voie : 40 Rue De Creteuil Le Bas" & @CRLF & _
            "Code postal : 71150" & @CRLF & _
            "Ville : Chaudenay" & @CRLF & _
            "Téléphone : +33 3 85 87 32 04 " & @CRLF & _
            "Activité en clair : Peinture Pose Revetem. Sols Murs Pose Plaq De Platre Nettoyage Vehic. Locaux Tissus Stores Tendues Demoussage & Protection De Surface Transp Meubles " & @CRLF & _
            "Dénomination : Garnaud Daniel" & @CRLF & _
            "Forme juridique : EI (Entreprise Individuelle)"

    MsgBox(0, "Results:", ParseText($LogText))

    Func ParseText($text)
        Local $ZipArray[2] = [71150, 71400]
        ; Flag 1: treat the whole CRLF & CRLF string as one delimiter (one element per paragraph)
        Local $LogArray = StringSplit($text, @CRLF & @CRLF, 1)

        For $index = 1 To $LogArray[0]
            Local $zipCode = StringRegExp($LogArray[$index], $REGULAR_Expression, 1)
            ; Drop the paragraph if the pattern did not match or the code is not in the list
            If @error Or Not InList($zipCode[0], $ZipArray) Then
                $LogArray[$index] = ""
            EndIf
        Next

        ; Rebuild the text from the paragraphs that were kept
        Local $ReturnString = ""
        For $index = 1 To $LogArray[0]
            If $LogArray[$index] Then $ReturnString &= $LogArray[$index] & @CRLF & @CRLF
        Next
        Return $ReturnString
    EndFunc

    Func InList($zip_code, $compare_array)
        For $code In $compare_array
            If $code = $zip_code Then Return True
        Next
        Return False
    EndFunc

Basically it splits things into paragraphs so you only have to work on one at a time. Good luck.

Edited April 8, 2010 by Fulano
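For readers who want a starting point for the blank $REGULAR_Expression, here is a minimal sketch, not the poster's own answer. It assumes every paragraph contains a line of the form "Code postal : nnnnn" and captures the five digits; the variable names $sPattern, $sParagraph and $aMatch are made up for this illustration.

```autoit
; Hypothetical pattern: capture the five digits that follow the "Code postal :" label.
Local $sPattern = "(?i)Code\h*postal\h*:\h*(\d{5})"

; A shortened paragraph taken from the sample data in this thread.
Local $sParagraph = "SIRET : 340 904 416 00013" & @CRLF & _
        "Code postal : 71150" & @CRLF & _
        "Ville : Chaudenay"

; Flag 1 returns an array holding the captured groups of the first match.
Local $aMatch = StringRegExp($sParagraph, $sPattern, 1)
If @error Then
    ConsoleWrite("No postal code found" & @CRLF)
Else
    ConsoleWrite("Postal code: " & $aMatch[0] & @CRLF) ; prints "Postal code: 71150"
EndIf
```

Because the digits sit in a capture group, element [0] of the returned array is the postal code itself, which is the value a per-paragraph test against a list of wanted codes needs.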
jchd Posted April 8, 2010 Posted April 8, 2010 (edited) Salut compatriote, Try to go with something like: $str = FileRead("boites.txt") ;; beware, there are still html character codes inside, like & Local $ofs, $res While 1 $res = StringRegExp($str, "(?is)SIRET : .*?Code postal : (\d{5}).*?juridique.*?\r\n", 2, $ofs) If @error Then ExitLoop $ofs = @extended If CPbonPourTraitement($res[1]) Then OnEnvoitLeSpam($res[0]) WEnd Is that clear? Edited April 8, 2010 by jchd This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Malkey Posted April 8, 2010 (edited)

And another example.

    Local $LogText = "SIRET : 390 019 891 00014" & @CRLF & _
            "Effectif : 2" & @CRLF & _
            "Voie : Les Bouchats" & @CRLF & _
            "Code postal : 71370" & @CRLF & _
            "Ville : Saint Etienne En Bresse" & @CRLF & _
            "Téléphone : +33 3 85 96 40 00 " & @CRLF & _
            "Activité en clair : Exploitation De Biens Agricole " & @CRLF & _
            "Dénomination : Acle" & @CRLF & _
            "Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _
            @CRLF & _
            "SIRET : 448 026 153 00024" & @CRLF & _
            "Voie : Le Gros Chigy" & @CRLF & _
            "Code postal : 71220" & @CRLF & _
            "Ville : Saint Andre Le Desert" & @CRLF & _
            "Téléphone : +33 3 85 59 48 71 " & @CRLF & _
            "Activité en clair : Platrerie Peinture Pose Revetement De Sols Et Murs Plafonds Divers Travaux De Renovation " & @CRLF & _
            "Dénomination : Galimi Samuel" & @CRLF & _
            "Forme juridique : EI (Entreprise Individuelle)" & @CRLF & _
            @CRLF & _
            "SIRET : 489 314 963 00013" & @CRLF & _
            "Effectif : 5" & @CRLF & _
            "Voie : 30 Rue Bernard Renault" & @CRLF & _
            "Code postal : 71400" & @CRLF & _
            "Ville : Autun" & @CRLF & _
            "Téléphone : +33 3 85 52 45 93 " & @CRLF & _
            "Activité en clair : Platrerie Peinture Pour Le Batiment Travaux D'Isolation Ravalement De Facades Revetements Sols Et Murs Negoce De Produits Lies Au Batiment " & @CRLF & _
            "Dénomination : Garcia" & @CRLF & _
            "Forme juridique : SARL (Société à Responsabilité Limitée)" & @CRLF & _
            @CRLF & _
            "SIRET : 340 904 416 00013" & @CRLF & _
            "Effectif : 2" & @CRLF & _
            "Voie : 40 Rue De Creteuil Le Bas" & @CRLF & _
            "Code postal : 71150" & @CRLF & _
            "Ville : Chaudenay" & @CRLF & _
            "Téléphone : +33 3 85 87 32 04 " & @CRLF & _
            "Activité en clair : Peinture Pose Revetem. Sols Murs Pose Plaq De Platre Nettoyage Vehic. Locaux Tissus Stores Tendues Demoussage & Protection De Surface Transp Meubles " & @CRLF & _
            "Dénomination : Garnaud Daniel" & @CRLF & _
            "Forme juridique : EI (Entreprise Individuelle)"
    ; Or
    ;Local $LogText = FileRead("LogText.txt")

    Local $sRetString, $aPara, $sCheckREPattern
    Local $sCheckNums = "71150, 71300, 71220"

    ; Strip all whitespace and turn the comma list into a grouped alternation, (?:71150|71300|71220).
    ; The group keeps every alternative tied to the "code postal :" prefix used below;
    ; without it, "|" would let 71300 or 71220 match anywhere in the paragraph.
    $sCheckREPattern = "(?:" & StringReplace(StringStripWS($sCheckNums, 8), ",", "|") & ")"

    ; One array element per paragraph (flag 1: the whole CRLF & CRLF string is the delimiter)
    $aPara = StringSplit($LogText, @CRLF & @CRLF, 1)
    For $i = 1 To $aPara[0]
        ; Keep the paragraph only if its "Code postal" line holds one of the wanted codes
        If StringRegExp($aPara[$i], "(?i)code\h*postal\h*:\h*" & $sCheckREPattern) Then $sRetString &= $aPara[$i] & @CRLF & @CRLF
    Next
    ConsoleWrite($sRetString & @CRLF)

Edited April 9, 2010 by Malkey
jchd Posted April 8, 2010 Posted April 8, 2010 Hi, Malkey. Your code will raise false positives if ever it matches a wanted ZIP code inside the last part of Siret number, or if the phone number is formated differently. Quite still possible with low "region code" (first 2 digits of Zipcode, from 01..95 and 97..98) and high "establishment" number (last 5 digits of SIRET #), who knows? This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe hereRegExp tutorial: enough to get startedPCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta. SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
Malkey Posted April 9, 2010

jchd,

With reference to my example script in post #5: within the For-Next loop, I have changed the RE pattern in the StringRegExp() call from

    $sCheckREPattern

to

    "(?i)code\h*postal\h*:\h*" & $sCheckREPattern

This specifically targets the line "Code postal : nnnnn" in each paragraph, so false positives should no longer occur from matching the numeric postcode at other locations in the paragraph.

Malkey