Remove symbols/emoticons and other junk from a string with StringRegExpReplace



Hi!

I started using AutoIt three days ago, and I'm working on a screenshot script for a 5:4 display that captures the screen without the black borders (for example from a fullscreen YouTube video) and turns the title of the active window into a filename. My problem is that some people fill video titles with emoticons and other junk. I want to remove those, but I don't know how to define them within StringRegExpReplace, and I'd like to learn how.

 

This topic helped me quite a lot but I still don't get the important part.

Sections I want to remove from the symbol list:

  • Miscellaneous Symbols and Pictographs
  • Emoticons
  • Supplemental Symbols and Pictographs
  • (The ability to remove more if necessary)

(Somehow I can't remove the "YouTube", "Mozilla Firefox" and "Google Chrome" parts with StringRegExpReplace either.)

 

Here's the code:
 

#include <ScreenCapture.au3>
#include <Date.au3>
#include <MsgBoxConstants.au3>

HotKeySet("{PRINTSCREEN}", "prtSc")
HotKeySet("{ESC}", "leave")

Local $sString = StringReplace(_NowDate(), "/", "-")

While 1
    Sleep(100)
WEnd

Func prtSc()
    Local $nonstandard = "[\x10-\x1F\x21-\x2F\x3A-\x40\x5B-\x60\x80-\xFF]"
    Local $title = WinGetTitle("[active]")
    Local $titley = StringRegExpReplace($title, $nonstandard, "")
    _ScreenCapture_Capture(@DesktopDir & "\" & $titley & " " & @HOUR & @MIN & @SEC & ".jpg", 0, 153, 1279, 870)
EndFunc   ;==>prtSc

Func leave()
    Exit
EndFunc   ;==>leave

 

Thanks for any kind of help in advance!


  • Moderators

SirAlonne,

Welcome to the AutoIt forums.

As always with RegEx questions, a sample string with the required result is a helpful addition to your post.

M23


Melba23

Youtube = Stefan Kunz – Hand Lettering Artist Livestream 🔴  YouTube  Mozilla Firefox 173339.jpg

 

Dionysis

Youtube = Stefan Kunz  Hand Lettering Artist Livestream  - YouTube - Mozilla Firefox 174434.jpg

The main problem is solved, thank you! Only the YouTube and Firefox parts remain.
Could you explain how this part works and how I can expand it?

"[^\w\h-\.!\()]"



 


Well, I solved it the ugly way:

Func prtSc()
    Local $filter1 = " - Mozilla Firefox"
    Local $filter2 = " - YouTube"
    Local $filter3 = " on Vimeo"
    Local $nonstandard = "[^\w\h-\.!\()]"
    Local $title = WinGetTitle("[active]")
    Local $titley = StringRegExpReplace($title, $nonstandard, "")
    Local $titleyy = StringReplace($titley, $filter1, "")
    Local $titleyyy = StringReplace($titleyy, $filter2, "")
    Local $titleyyyy = StringReplace($titleyyy, $filter3, "")
    _ScreenCapture_Capture(@DesktopDir & "\" & $titleyyyy & " " & @HOUR & @MIN & @SEC & ".jpg", 0, 153, 1279, 870)
EndFunc   ;==>prtSc

 

Youtube = Stefan Kunz  Hand Lettering Artist Livestream  191214.jpg
Vimeo = AnimationFX Reel 191435.jpg


You might find it useful to keep Unicode letters and digits as well. Prepend (*UCP) to the pattern to make \w match any Unicode letter, digit or _, and \d any Unicode digit.

Fictitious example:

Local $title = "영국여사 조쉬엄마의 첫 치맥 먹방 도전!?! ◓ ♬♫\ghuo .*. ỸἇἮὤ ∑ [ທຊکڄຮ] (+조쉬 사춘기 썰)"
Local $titley = StringRegExpReplace($title, "(*UCP)[^\w\h-\.!\()]", "")
MsgBox(0, "", $titley)

If you ever need to filter specific languages or types of Unicode codepoint in or out, you may find the \p<something> construct pretty handy. See the help under StringRegExp() for the basics, or the full official PCRE1 documentation: https://www.pcre.org/original/doc/html/pcrepattern.html
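As a rough illustration of the \p construct, here is a minimal sketch that strips the Unicode category "Symbol, other" (\p{So}), which covers many pictographs such as musical notes and geometric shapes. The sample title is made up, and ChrW() is used only so the example does not depend on the script file's encoding. How characters outside the Basic Multilingual Plane behave is worth testing against your own titles.

```autoit
; Sketch only: the title below is a made-up sample, not real data.
Local $sTitle = "Live " & ChrW(0x266B) & " Session " & ChrW(0x25D3) & " 2019"

; \p{So} matches any code point in the Unicode category "Symbol, other".
Local $sClean = StringRegExpReplace($sTitle, "\p{So}", "")

; Flag 4 of StringStripWS collapses the doubled spaces left behind.
$sClean = StringStripWS($sClean, 4)

ConsoleWrite($sClean & @CRLF)
```

Other categories (for example \p{L} for letters or \p{N} for numbers) can be combined inside a set in the same way.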

Edited by jchd


@SirAlonne

Well, jchd explained most of my regexp and pointed you to the complete reference manual.

If that is overwhelming, you can check the StringRegExp helpfile and the Regular Expression Tutorial for a more approachable introduction.

As for my re:

"[^\w\h\.!\()-]"

"[]" ;square brackets mark a set. A set matches any one of the characters included in it
"^" ;as the first char inside a set it gives the complement of the set, e.g. "[e]" matches the "e" char, "[^e]" matches any char but "e"
"\w" ;matches any word char, that is any letter, digit or the underscore "_" char
"\h" ;matches any horizontal space char, that is space, tab and some more, you can find them in the StringRegExp helpfile
"\." ;matches the dot char. Outside a set an unescaped "." matches any char with some exceptions, see StringRegExp helpfile again; inside a set the backslash is optional
"!" ;matches the exclamation mark
"\(" ;matches the opening parenthesis. Outside a set unescaped parentheses mark a group, again look it up, too long to cover here
")" ;matches the closing parenthesis. Inside a set neither parenthesis needs escaping, so the backslash before "(" is optional too
"-" ;matches the hyphen char. If put between chars inside a set it marks a range, e.g. "[a-z]" matches any lowercase latin character, but
; "[a\-z]" matches either "a", "-" or "z"
;so my previous re "[^\w\h-\.!\()]" was risky: the hyphen should be either escaped or at the end of the set so it can't be read as a range

"[^\w\h\.!\()-]" ;matches any char that isn't a word char, a horizontal space char, a dot, an exclamation mark, a parenthesis or a hyphen
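To see the set in action, here is a small sketch that runs the pattern over a made-up ASCII sample (the sample string is an assumption, chosen so that only two characters fall outside the set).

```autoit
; Sketch: apply the negated set to a made-up sample string.
Local $sPattern = "[^\w\h\.!\()-]"
Local $sSample = "Hello, World! (take #2) a-b_c.txt"

; Everything NOT in the set is replaced with an empty string.
; Here only the comma and the "#" are outside the set, so only they are removed.
Local $sResult = StringRegExpReplace($sSample, $sPattern, "")

ConsoleWrite($sResult & @CRLF)
```

To keep more characters, add them inside the brackets (before the trailing hyphen); to drop more, take them out of the set.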

I hope I made it clear enough :P


@Dionysis @jchd

Thanks for the thorough explanation and the resources!
Somehow I missed the StringRegExp part in the help file... my bad lol. I fixed and changed some parts of the code (it now saves files in increasing numerical order and resets the counter if the active window changes), but I still couldn't figure out how to define the Firefox and YouTube parts along with the Unicode and the rest in one pattern. Is that even possible?

#include <ScreenCapture.au3>

HotKeySet("{PRINTSCREEN}", "prtSc")
HotKeySet("{ESC}", "leave")

Global $num = 0
Global $folder = @DesktopDir & "\Screenshots"

While 1
    Sleep(100)
WEnd

Func prtSc()
    Local $filter1 = " - Mozilla Firefox"
    Local $filter2 = " - YouTube"
    Local $filter3 = " on Vimeo"
    Local $filter4 = "  "
    Local $nonstandard = "(*UCP)[^\w\h-\.!\()]"
    Local $title = WinGetTitle("[active]")
    Local $titley = StringRegExpReplace($title, $nonstandard, "")
    Local $titleyy = StringReplace($titley, $filter1, "")
    Local $titleyyy = StringReplace($titleyy, $filter2, "")
    Local $titleyyyy = StringReplace($titleyyy, $filter3, "")
    Global $titleyyyyy = StringReplace($titleyyyy, $filter4, " ")
    Call("check")
    $num = $num + 1
    If Not FileExists($folder) Then DirCreate($folder) ; macros are not expanded inside string literals, so test $folder itself
    _ScreenCapture_Capture(@DesktopDir & "\Screenshots\" & $titleyyyyy & " " & $num & ".jpg", 0, 153, 1279, 870, False)

EndFunc   ;==>prtSc

Func check()
    $test = FileFindFirstFile($folder & "\" & $titleyyyyy & "*" & ".jpg")
    If $test = -1 Then
        $num = 0
    EndIf

EndFunc   ;==>check

Func leave()
    Exit
EndFunc   ;==>leave

 


You can use an re to match any of the three strings in a single StringRegExpReplace like

StringRegExpReplace($titley,"- Mozilla Firefox|- YouTube|on Vimeo","") ;the hyphen doesn't need to be escaped as it's not in a set

and you can also remove extra spaces using an re like

$titley = StringRegExpReplace($titley," +","")

Also, Why are you using a different variable for each replacement?
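For what it's worth, reusing one variable could look like the sketch below. The sample title is made up, the alternation is a slightly stricter variant of the one above (it also swallows the spaces around the hyphen), and StringStripWS with flag 7 (leading + trailing + doubled whitespace) is one possible way to tidy up afterwards.

```autoit
; Sketch: chain the clean-up steps on a single variable.
Local $sTitle = "Some Video - YouTube - Mozilla Firefox" ; made-up sample title

; 1) Drop the site/browser suffixes, including the spaces around the hyphen.
$sTitle = StringRegExpReplace($sTitle, "\h*-\h*(?:Mozilla Firefox|YouTube)|\h+on Vimeo", "")

; 2) Drop everything that is not a word char, horizontal space, dot, "!", parenthesis or hyphen.
$sTitle = StringRegExpReplace($sTitle, "(*UCP)[^\w\h\.!\()-]", "")

; 3) Flag 7 = 1 (leading) + 2 (trailing) + 4 (doubled) whitespace.
$sTitle = StringStripWS($sTitle, 7)

ConsoleWrite($sTitle & @CRLF)
```

Each step overwrites $sTitle, so no chain of $titley, $titleyy, ... is needed.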


10 hours ago, Dionysis said:

Also, Why are you using a different variable for each replacement?

Mostly "bad" habits that came from batch scripting.:D

I did some experiments with StringRegExpReplace for kicks, but somehow that " +" part removed all the spaces from the string, so I used StringStripWS instead. What I meant by "experiments": the previous version reset the counter when I changed tab/video, so what would happen if I stepped back to the previous tab? It overwrites the files. (It's not something I usually do, but I wanted to practice. lol) So I tried to scan the Screenshots folder to find the latest file, then use StringRegExpReplace to select the part of the filename like "..... 17.jpg", clean it up until only the number "17" is left, and then set $num. It worked three times in a row when I tested it, but then it failed to go over 10 lol. I can't figure out what I messed up.
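A note on the spaces: the pattern " +" matches every run of one or more spaces, single spaces included, so what happens to them depends entirely on the replacement string. A minimal sketch with a made-up sample:

```autoit
; Sketch: same pattern, different replacement strings.
Local $sTitle = "Hand   Lettering  Artist" ; made-up sample with extra spaces

; Empty replacement: every run of spaces is deleted, so the words run together.
ConsoleWrite(StringRegExpReplace($sTitle, " +", "") & @CRLF)

; Single-space replacement: every run is collapsed to one space.
ConsoleWrite(StringRegExpReplace($sTitle, " +", " ") & @CRLF)

; StringStripWS with flag 4 also collapses doubled whitespace between words.
ConsoleWrite(StringStripWS($sTitle, 4) & @CRLF)
```

A pattern such as " {2,}" (two or more spaces) with a " " replacement would leave single spaces untouched altogether.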

#include <ScreenCapture.au3>
#include <File.au3>
#include <Array.au3>
#include <Constants.au3>

HotKeySet("{PRINTSCREEN}", "prtSc")
HotKeySet("{ESC}", "leave")

Global $num = 0
Global $folder = @DesktopDir & "\Screenshots\"
Global $type = ".jpg"


While 1
    Sleep(100)
WEnd

Func prtSc()
    Local $space = " "
    Local $nonstandard = "(*UCP)[^\w\h-\.!\()]|- Mozilla Firefox|- YouTube|on Vimeo"
    Local $title = WinGetTitle("[active]")
    Global $title1 = StringRegExpReplace($title, $nonstandard, "")
    Global $ch = StringStripWS($title1, 4)
    Call("check")
    $num = $num + 1
    Local $end = $title1 & $space & $num & $type
    Local $endlin = StringStripWS($end, 4)
    If Not FileExists($folder) Then DirCreate($folder)
    _ScreenCapture_Capture($folder & $endlin, 0, 153, 1279, 870, False)

EndFunc   ;==>prtSc

Func check()
    $test = FileFindFirstFile($folder & $ch & "*" & $type)
    If $test = -1 Then
        $num = 0
    Else
        $file = FileFindNextFile($test)
        $latest = _FileVersion($folder, $ch & "*" & $type, 1, False)
        $nomero = StringRegExpReplace($latest, '(?i).*?((\h)[[:digit:]]\.).*?$', "$1")
        $nomer = StringRegExpReplace($nomero, "[\h\.]", "")
        $num = $nomer
    EndIf

EndFunc   ;==>check

Func _FileVersion($sFilePath, $sMask = "*.*", $iFlag = 0, $bFormat = True)
    Local $aFileList = _FileListToArrayRec($sFilePath, $sMask, 1, 0, 0, 2)
    If @error Then Return 0
    Local $aFileVersion[0][4]
    _ArrayAdd($aFileVersion, UBound($aFileList) - 1 & "|Modified|Created|Accessed")
    For $i = 1 To $aFileList[0]
        _ArrayAdd($aFileVersion, $aFileList[$i] & "|" & FileGetTime($aFileList[$i], 0, 1) & "|" & FileGetTime($aFileList[$i], 1, 1) & "|" & FileGetTime($aFileList[$i], 2, 1))
    Next
    Switch $iFlag
        Case 1
            _ArraySort($aFileVersion, 1, 1, 0, 2)
        Case 2
            _ArraySort($aFileVersion, 1, 1, 0, 3)
        Case Else
            _ArraySort($aFileVersion, 1, 1, 0, 1)
    EndSwitch
    Return $bFormat = True ? $aFileVersion : $aFileVersion[1][0]
EndFunc   ;==>_FileVersion


Func leave()
    Exit
EndFunc   ;==>leave
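One possible reason the counter stalls at 10: in check(), "[[:digit:]]\." accepts exactly one digit before the dot, so a name ending in " 10.jpg" no longer seems to match and the replace returns the whole string. Below is a hedged sketch of pulling the trailing number out with a capturing group instead; the file name is made up and the ".jpg" extension is assumed.

```autoit
; Sketch: read the trailing counter from a file name with a capturing group.
Local $sName = "AnimationFX Reel 17.jpg" ; made-up file name
Local $iNum = 0

; \d+ accepts one or more digits; flag 1 returns an array of the captured groups.
Local $aMatch = StringRegExp($sName, "\h(\d+)\.jpg$", 1)
If Not @error Then $iNum = Number($aMatch[0])

ConsoleWrite($iNum & @CRLF)
```

If nothing matches, StringRegExp sets @error and $iNum simply stays 0.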

 


@Dionysis

There's no difference between these two, thanks!

 

However, _FileListToArrayRec within _FileVersion sometimes hits an error (so _FileVersion returns 0) and I don't know why. That means $num = 0, so it can't continue from the previous number and just overwrites the existing file every time. Otherwise the script works perfectly.

#include <ScreenCapture.au3>
#include <File.au3>
#include <Array.au3>
#include <Constants.au3>

HotKeySet("{PRINTSCREEN}", "prtSc")
HotKeySet("{ESC}", "leave")

Global $num = 0
Global $folder = @DesktopDir & "\Screenshots\"
Global $type = ".jpg"
Global $wildcard = "*"


While 1
    Sleep(100)
WEnd

Func prtSc()
    Local $space = " "
    Local $nonstandard = "(*UCP)[^\w\h-\.!\() +]|- Mozilla Firefox|- YouTube|on Vimeo"
    Global $title = WinGetTitle("[active]")
    $title = StringRegExpReplace($title, $nonstandard, "")
    $title = StringRegExpReplace($title," +"," ")
    Call("check")
    $num = $num + 1
    $title = $title & $space & $num & $type
    $title = StringRegExpReplace($title," +"," ")
    If Not FileExists($folder) Then DirCreate($folder)
    _ScreenCapture_Capture($folder & $title, 0, 153, 1279, 870, False)

EndFunc   ;==>prtSc

Func check()
    $test = FileFindFirstFile($folder & $title & $wildcard & $type)
    If $test = -1 Then
        $num = 0
    Else
        $nametype = $title & $wildcard & $type
        $last = _FileVersion($folder, $nametype, 1, False)
        $last = StringRegExpReplace($last, '(?i).*?((\h)(\d+)(\.)).*?$', "$1")
        $last = StringRegExpReplace($last, "[\h\.]", "")
        $num = $last
    EndIf

EndFunc   ;==>check

Func _FileVersion($sFilePath, $sMask = "*.*", $iFlag = 0, $bFormat = True)
    Local $aFileList = _FileListToArrayRec($sFilePath, $sMask, 1, 0, 0, 2)
    If @error Then Return 0
    Local $aFileVersion[0][4]
    _ArrayAdd($aFileVersion, UBound($aFileList) - 1 & "|Modified|Created|Accessed")
    For $i = 1 To $aFileList[0]
        _ArrayAdd($aFileVersion, $aFileList[$i] & "|" & FileGetTime($aFileList[$i], 0, 1) & "|" & FileGetTime($aFileList[$i], 1, 1) & "|" & FileGetTime($aFileList[$i], 2, 1))
    Next
    Switch $iFlag
        Case 1
            _ArraySort($aFileVersion, 1, 1, 0, 2)
        Case 2
            _ArraySort($aFileVersion, 1, 1, 0, 3)
        Case Else
            _ArraySort($aFileVersion, 1, 1, 0, 1)
    EndSwitch
    Return $bFormat = True ? $aFileVersion : $aFileVersion[1][0]
EndFunc   ;==>_FileVersion


Func leave()
    Exit
EndFunc   ;==>leave
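To narrow down why _FileListToArrayRec fails, it may help to print @error and @extended right after the call instead of silently returning 0. A minimal sketch, assuming the same folder as above and a plain "*.jpg" mask (the mask is an assumption for the demo); the meaning of each @extended code is listed in the help file for _FileListToArrayRec.

```autoit
#include <File.au3>

; Sketch: report why the file listing failed.
Local $sFolder = @DesktopDir & "\Screenshots\"

; Same parameters as in _FileVersion: files only (1), no recursion (0), no sort (0), full path (2).
Local $aFiles = _FileListToArrayRec($sFolder, "*.jpg", 1, 0, 0, 2)

; Copy both values immediately, because the next function call resets them.
Local $iErr = @error, $iExt = @extended

If $iErr Then
    ConsoleWrite("_FileListToArrayRec failed: @error=" & $iErr & ", @extended=" & $iExt & @CRLF)
Else
    ConsoleWrite($aFiles[0] & " file(s) found" & @CRLF)
EndIf
```

Running this with the actual mask built in check() should show whether the problem is the path, the mask, or simply that no file matched.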


I forgot to add the link.. Maybe the "w/" part in the title is messing with something.

Youtube =  Embracing Randomness Imperfection in Graphic Design Typography w Chris Ashworth 1.jpg
(If I copy this file and paste it next to the original, Windows simply ignores the fact that the two coexist with the same name.)

Edited by SirAlonne

@SirAlonne

Just to clarify: if you take a screenshot manually and save it yourself under this name, and the folder already contains a file with the same name created by the AutoIt script, Windows doesn't show any prompt to rename or replace one of them?

If that's the case, you can compare the two filenames by length, and if they are the same length, check their hex values; there might be some similar-looking but different chars.
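The comparison could be sketched like this. The two names are made up (the second one contains a no-break space, U+00A0, in place of a normal space, purely as an example of a look-alike char), and _DumpCodeUnits is a hypothetical helper written for this post, not a standard UDF.

```autoit
; Sketch: compare two look-alike file names by length and by code unit.
Local $sA = "w Chris Ashworth 1.jpg"
Local $sB = "w Chris" & ChrW(0x00A0) & "Ashworth 1.jpg" ; no-break space instead of a space

ConsoleWrite("Lengths: " & StringLen($sA) & " vs " & StringLen($sB) & @CRLF)
ConsoleWrite(_DumpCodeUnits($sA) & @CRLF)
ConsoleWrite(_DumpCodeUnits($sB) & @CRLF)

; Hypothetical helper: returns each character's code as 4 hex digits, separated by spaces.
Func _DumpCodeUnits($sText)
    Local $sOut = ""
    For $i = 1 To StringLen($sText)
        $sOut &= Hex(AscW(StringMid($sText, $i, 1)), 4) & " "
    Next
    Return $sOut
EndFunc   ;==>_DumpCodeUnits
```

Any position where the two dumps differ points at the char that makes Windows treat the names as distinct.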
