Posted (edited)

Just to be clear, I personally do not care, because I never use "(?m)etc." or "[etc.]"; even my report about (*ANYCRLF) was meant in general terms (I always use \n, never [\r\n] or \R or anything like that). Regexp has its flaws, so to speak, so you always need to find compromises. For example,

StringRegExpReplace(StringRegExpReplace($sData, "\x{2028}+", ""), "\x{2029}+", "")

is almost 300% to 400% faster than

StringRegExpReplace($sData, "[\x{2028}\x{2029}]+", "")

I mean that to get the best result you always need to make compromises. But doing some tests on an ASCII file, I noticed that there is no difference in performance between "etc." and "(*UCP)etc."; indeed, "(*UCP)etc." sometimes seems to perform better. Perhaps it would be better to restore UCP as the default (I would not say that choosing not to use Unicode by default was wrong, but I would say it was a bit premature), also because it is more compatible with the AutoIt functions, which all support Unicode.

Sorry again for my English.

Bye, everyone.

Edited by DXRW4E


Posted (edited)

@DXRW4E,

Flat ASCII regexps run anywhere from on par with (*UCP) runs to about 20% faster. It all depends on how much the pattern relies on features that UCP affects. Testing on a sizeable ASCII text file (39 MB), I got the following:

Local $s = FileRead("binarbug.txt")

Local $p = "\bdest(uvgot)?"
Local $t = TimerInit()
StringRegExp($s, $p, 3)
ConsoleWrite("      " & $p & " --> " & TimerDiff($t) & @LF)
$p = "(*UCP)" & $p
$t = TimerInit()
StringRegExp($s, $p, 3)
ConsoleWrite($p & " --> " & TimerDiff($t) & @LF)

$p = "(?i)\bdest(uvgot)?"
$t = TimerInit()
StringRegExp($s, $p, 3)
ConsoleWrite("      " & $p & " --> " & TimerDiff($t) & @LF)
$p = "(*UCP)" & $p
$t = TimerInit()
StringRegExp($s, $p, 3)
ConsoleWrite($p & " --> " & TimerDiff($t) & @LF)

$p = "(?i)\w{2,5}s(\w{3,}got)?"
$t = TimerInit()
StringRegExp($s, $p, 3)
ConsoleWrite("      " & $p & " --> " & TimerDiff($t) & @LF)
$p = "(*UCP)" & $p
$t = TimerInit()
StringRegExp($s, $p, 3)
ConsoleWrite($p & " --> " & TimerDiff($t) & @LF)

$t = TimerInit()
StringRegExpReplace(StringRegExpReplace($s, "\x{2028}+", ""), "\x{2029}+", "")
ConsoleWrite("Dual Replace  --> " & TimerDiff($t) & @LF)
$t = TimerInit()
StringRegExpReplace($s, "[\x{2028}-\x{2029}]+", "")
ConsoleWrite("Range Replace --> " & TimerDiff($t) & @LF)
$t = TimerInit()
StringRegExpReplace($s, "[\x{2028}\x{2029}]+", "")
ConsoleWrite("Class Replace --> " & TimerDiff($t) & @LF)
$t = TimerInit()
StringRegExpReplace($s, "(\x{2028}|\x{2029})+", "")
ConsoleWrite("Alternation   --> " & TimerDiff($t) & @LF)

Results (TimerDiff values, in ms):

      \bdest(uvgot)? --> 217.114116458935
(*UCP)\bdest(uvgot)? --> 222.978828314772
      (?i)\bdest(uvgot)? --> 320.376262904922
(*UCP)(?i)\bdest(uvgot)? --> 320.728751838572
      (?i)\w{2,5}s(\w{3,}got)? --> 8658.34212169424
(*UCP)(?i)\w{2,5}s(\w{3,}got)? --> 10425.5265683208
Dual Replace  --> 837.210595201345
Range Replace --> 5099.80598092774
Class Replace --> 5196.39772652669
Alternation   --> 8383.2922201006

You're right to point out that classes are a slow feature, but they are also a fundamental building block of regexps. It's hard to avoid them in practice. Also, PCRE compares very well against many other engines.

Your specific use case in this comparison (single vs. dual Replace) doesn't do justice to the engine. You know that a search for a single-character pattern can use a single CPU instruction, making full use of microcode and internal architecture (pipelines and read-ahead). Obviously a class or alternation can't enjoy the same low-level optimization.

In most real-world use, the pattern is likely to be more involved -- or involve more sophisticated operations. Then the comparison doesn't hold anymore.
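To make comparisons like the ones above easier to repeat on other patterns, here is a minimal sketch of a timing helper. The function name _TimeRegExp, the number of runs, and the choice to keep the best run are my own assumptions, not something used in the tests above; binarbug.txt is the same test file as in the script above.

```autoit
; Hypothetical helper (not part of the original tests): time one pattern
; against a subject string and return the best run in milliseconds.
Func _TimeRegExp(ByRef $sSubject, $sPattern, $iRuns = 3)
    Local $fBest = -1, $hTimer, $fElapsed
    For $i = 1 To $iRuns
        $hTimer = TimerInit()
        StringRegExp($sSubject, $sPattern, 3) ; flag 3: array of all global matches
        $fElapsed = TimerDiff($hTimer)
        ; Keep the fastest run to reduce noise from caching and other processes
        If $fBest < 0 Or $fElapsed < $fBest Then $fBest = $fElapsed
    Next
    Return $fBest
EndFunc

Local $s = FileRead("binarbug.txt")
Local $aPatterns[2] = ["\bdest(uvgot)?", "(*UCP)\bdest(uvgot)?"]
For $sPattern In $aPatterns
    ConsoleWrite($sPattern & " --> " & _TimeRegExp($s, $sPattern) & " ms" & @LF)
Next
```

Keeping the best of a few runs is only one way to smooth out noise; averaging the runs would be just as defensible.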

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Posted (edited)

Hi jchd, by "just to be clear, I personally do not care" I mean just that: it is not my personal problem, and if it were, I am already able to solve it. Even with ASCII as the default, everything is OK for me, but I do not think that is so for all users; not all users have the knowledge of RegExp that we have. The example above was just to say that nobody should expect a default AutoIt setting to work miracles (you need to run tests and make compromises yourself to get the maximum performance out of a function). However, as mentioned above, (*UCP) mode is more compatible with the AutoIt functions, which all support Unicode.

It is difficult to judge the speed of a regexp when the pattern is complicated, because at that point I can no longer tell what causes the delay: an infinite loop or something else. What is important to know is whether, on an ASCII file (which is what is used in the majority of cases), (*UCP) causes a delay or not. In my opinion, no; indeed, it seems that sometimes it goes better. (Of course, a complicated pattern costs a lot of time; as in the examples above, change just one detail and the time required is multiplied many times.)

Ciao.

Edited by DXRW4E


Posted (edited)

While I fully understand your points, the rationale behind going back (remember I was the one who first lobbied for (*UCP) being the default) is that it heavily changes the meaning of \b \B \w \W and a number of other common escapes and routinely used POSIX classes.
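As a small illustration of that change in meaning, the sketch below tests \w against a single non-ASCII letter with and without (*UCP). The test character is an arbitrary choice of mine, and the result depends on the build: on builds where (*UCP) is not the default, the first test should report 0 and the second 1, while on a build with UCP enabled by default both would match.

```autoit
; Sketch only: the subject is one Cyrillic letter (U+0436), picked arbitrarily.
Local $sSubject = ChrW(0x0436)

; Without UCP, PCRE limits \w to ASCII letters, digits and underscore
ConsoleWrite("      ^\w$ --> " & StringRegExp($sSubject, "^\w$") & @LF)

; With UCP, \w follows Unicode properties, so any Unicode letter qualifies
ConsoleWrite("(*UCP)^\w$ --> " & StringRegExp($sSubject, "(*UCP)^\w$") & @LF)
```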

Speed gain is only a very marginal side effect: we gained much, much more by updating to v8.33 and, mostly, by using the PCRE16 interfaces, which allowed us to get rid of cumbersome conversions and painful low-level loops between the "natural" UTF-16 AutoIt subject and pattern strings and the internal UTF-8 encoding and interface that were used before. Also, don't forget that for short strings, the pattern compilation time often dominates the actual regexp run time.

You, I and a good number of others know how to navigate thru options and use the regexp tool with reasonable efficiency. But we are not yet at the point where Unicode is the absolute lingua franca inside AutoIt and accompanying tools. It will take time and we (I say we, but I should say "the DEV team", to which I don't belong) need to regard backward compatibility as something important. Ideally, every .au3 source should be UTF8 or UTF16, tools should only process that (and convert on the fly if not), and every function should make use of Unicode advantages to the maximum extent. Again I'm not complaining, but simply looking forward.

You surely know that Unicode doesn't solve all the issues with the complexities of human languages and traditions, yet it represents a fairly good step forward compared to the ANSI-hell era. The IT community at large can't just let tech-savvy (Unicode-savvy) people dictate their way without taking into consideration the zillions of lines of code relying on ambiguous, unstated encodings. Guiding casual users towards better practices is our duty, but that will still require ample time today, even after 20+ years of Unicode existence.

My humble contribution of revising the regexp help doc aimed at just that: offering clearer (I hope) and more complete (I'm sure) information about a tool which is largely underused -- and I don't say AutoIt should mimic Perl and do everything with regexps. Just by looking at some large and complex Example Scripts applications, we start to see the need for internationalization, or at least for regard to contexts that are not purely 7-bit ASCII. I routinely see more complex or careful regexp patterns here than I saw, say, 5 years ago.

I now believe we've discussed these points clearly enough in this thread.

Edited by jchd


Posted

Not to break up the regexp monopoly on discussion... but can anyone else test / take a look here:

I'm getting a crash now with Ward's base64 machine code functions, x64 version, specifically in the _Base64EncodeEnd function. This is the first AutoIt beta in this series I've tested with them, but I've never had a crash before. So something changed in AutoIt that either created a new bug, or brought to light a bug in Ward's code.

  • 4 weeks later...
Posted

I believe there is a bug in the latest beta, but I can't say exactly what it is.

When I compile one of my scripts with 3.3.8.1, it runs well on both 32-bit and 64-bit Windows, but with the latest beta it doesn't work on 64-bit Windows.

I will post again when I find out exactly what the matter is. For now I guess it's something with DllCall(), maybe some problem with calling the Windows API on 64-bit Windows; just a guess.

  • Moderators
Posted

So, you want to post a bug, but don't know what it is. And you don't think it is beneficial to post your code, so you have as many sets of eyes looking at it as possible. If you aren't going to add any information, why bother posting at all??


Posted (edited)

It happened to me too; you only need to compile the exe again. It seems that when the build changes (v3.3.8.1 to v3.3.9.21), the first attempt is not OK, hmmm (you will notice immediately, as the exe is about 40-50 KB smaller). If you compile again afterwards, all is OK.

Ciao.

Edited by DXRW4E


Posted

As Melba23 said in post #8:

"As always, there will be a release when Jon thinks it is ready. :)"


Posted

Couple of weird things that have been happening lately since I started using this version

When AutoIt opens, it opens like this. Sometimes I get the slim part for a fraction of a second and then it continues to open, but often I just get this and have to maximize. It happens either from the Start menu or when opening a script with a double-click.

post-60350-0-90256900-1382167370_thumb.p

AutoIt3Wrapper v.2.1.2.9    Environment(Language:0409  Keyboard:00000809  OS:WIN_7/Service Pack 1  CPU:X64 OS:X64)

(3.3.9.21)

and every time I use Tidy I get this

post-60350-0-37850400-1382167376_thumb.p

Tidy AutoIt3 v2.3.0.8   Copyright © Jos van der Zande  March 24, 2013

These may be known, but I thought I would mention them.

 

  • Developers
Posted (edited)

Couple of weird things that have been happening lately since I started using this version

All reported items are related to SciTE4AutoIt3, not to the AutoIt3 beta.

Use the latest beta versions of all the SciTE4AutoIt3 tools or the current beta installer and test again.

Let me know which issues still remain.

Jos

Edited by Jos


Posted

Used this

SciTE4AutoIt3.exe 19-Oct-2013 16:57 6.0M

 

and the Tidy issue seems to have disappeared. The first time I opened SciTE it was still thin, so I maxed it and exited properly; now you get a brief thin SciTE as it loads, which then maximizes as it should.

It's kinda like a fleeting glance before the window loads; I'll keep an eye on it.

Thx for the help

  • 2 weeks later...
Posted (edited)

I was just thinking about this change made to arrays in AutoIt v3.3.9.6 Beta. I haven't tried anything, but it occurred to me that the change may break a massive number of scripts.

- Added: Empty arrays.

I may be interpreting this incorrectly, in which case please tell me I'm wrong. I am assuming that empty arrays with zero dimensions can now be created. If this is the case, then many error checks which rely on IsArray() and UBound() will fail. It could be that I misread something somewhere down the line, or (more likely) someone used the term zero dimensions when they meant to say zero elements, a common mistake. An array with zero dimensions is theoretically meaningless. I hope I'm wrong in my assumptions. Perhaps someone could put my mind at ease and tell me I'm wrong.

Edited by czardas

Posted

You're not completely wrong: IsArray will detect an empty array as an array. UBound($array) will return 0 though, which is more reliable in detecting whether the array actually has something in it.

Don't misunderstand though: empty arrays could always be encountered in AutoIt as the return value of certain functions (DllCall, I believe, was one of them); the change is that you can now create them natively in AutoIt.

Any script that this would break was already broken before this change.
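A minimal sketch of the check described above, assuming a build that accepts the empty-array declaration (the betas discussed here or later). The variable name is only illustrative.

```autoit
; Declare an empty one-dimensional array (requires a build with empty-array support)
Local $aEmpty[0]

ConsoleWrite("IsArray --> " & IsArray($aEmpty) & @LF) ; still reports an array
ConsoleWrite("UBound  --> " & UBound($aEmpty) & @LF)  ; element count

; Guard that only proceeds when there is at least one element to read
If IsArray($aEmpty) And UBound($aEmpty) > 0 Then
    ConsoleWrite("First element: " & $aEmpty[0] & @LF)
Else
    ConsoleWrite("Nothing to process" & @LF)
EndIf
```

A script that indexes element 0 after checking only IsArray() is the kind of code that was already at risk whenever a function handed back an empty array.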


Posted (edited)

Thanks BrewManNH. Looking more closely, it appears empty arrays always have at least one dimension, which is a relief to me. The old version of UBound() returned zero when an error occurred. Now zero is a valid return value, apart from here ==> UBound($aArray, 0).

Returning zero as a valid element count may still break well written code. I need to run some tests.
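One way to tell the two zero cases apart is to look at @error right after the call. This is a sketch under the assumption that UBound() sets @error when its argument is not an array, as the help file describes; the function name _Describe is hypothetical.

```autoit
; Hypothetical helper: classify a value by what UBound() reports about it
Func _Describe($vValue)
    Local $iCount = UBound($vValue)
    ; Check @error immediately, before any other call can overwrite it
    If @error Then Return "not an array (UBound set @error)"
    If $iCount = 0 Then Return "empty array, " & UBound($vValue, 0) & " dimension(s)"
    Return "array with " & $iCount & " element(s) in its first dimension"
EndFunc

Local $aEmpty[0]
Local $aFilled[3] = [1, 2, 3]

ConsoleWrite(_Describe($aEmpty) & @LF)
ConsoleWrite(_Describe($aFilled) & @LF)
ConsoleWrite(_Describe("just a string") & @LF)
```

Code that treated a return value of 0 as an error on its own would need a check like this to keep working with empty arrays.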

Edited by czardas
