If you're certain the script is not going to forbid any of the normal 7-bit text characters, you can use a much simpler range regex that avoids all the issues of escaping those 5 special characters.


The above regex pattern looks for anything that's not in the $chr(32) through $chr(126) range (hex 20 through 7E). That range holds the space plus the 94 printable 7-bit characters, 95 in total. If you want to allow additional accented characters above codepoint 126, you can include them literally inside the [^list]. Since this range already includes A-Z and a-z, there's no need for the case-insensitive /i flag.
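To illustrate the range idea outside mIRC, here is a minimal sketch in Python's `re` module, assuming Python's character-class semantics match PCRE for this simple case. The extra accented letters in `ALLOWED_EXTRA` and the helper name `has_forbidden` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical extra characters to allow on top of the space..tilde range.
ALLOWED_EXTRA = "éèà"

# [^ -~...] matches any single character that is NOT a space, NOT a printable
# 7-bit character (0x20..0x7E), and NOT one of the listed extras.
# re.escape guards against extras like ']' or '-' breaking the class.
FORBIDDEN = re.compile("[^ -~" + re.escape(ALLOWED_EXTRA) + "]")

def has_forbidden(text: str) -> bool:
    """Return True if text contains a character outside the allowed set."""
    return FORBIDDEN.search(text) is not None

print(has_forbidden("Hello, World! 123"))  # False
print(has_forbidden("café"))               # False (é is explicitly allowed)
print(has_forbidden("naïve"))              # True (ï is not in the list)
```

No case-insensitive flag is used, because both A-Z and a-z already fall inside the range.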

That makes the regex much simpler. The /S flag is there in case you want to allow colors/bold/etc.: it strips those control codes before matching, so they aren't flagged. Instead of listing the accented letters literally, you can list them in the \x format. If the codepoint is above 255, the hex value must be wrapped in {braces}. For example, \x31 is the digit '1', but codepoint 10004 is \x{2714}.
If you want to put allowed emojis on the list, you can either put the literal emoji into the %allowed list or use its escape. For example, the cat face emoji at codepoint 128049 is \x{1f431}.
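The escape forms above are the PCRE-style syntax. Python's `re` spells the same codepoints differently (`\u2714` and `\U0001F431` instead of `\x{2714}` and `\x{1f431}`), so this sketch is a translation rather than a drop-in copy. It shows that allowing a specific symbol or emoji by codepoint is just one more entry in the negated class. The helper name `has_forbidden` is hypothetical.

```python
import re

# Same negated class as before, plus two extras given by codepoint:
#   U+2714  HEAVY CHECK MARK (decimal 10004) -> \x{2714} in PCRE
#   U+1F431 CAT FACE        (decimal 128049) -> \x{1f431} in PCRE
# Python's re writes these as \u2714 and \U0001F431.
FORBIDDEN = re.compile(r"[^ -~\u2714\U0001F431]")

def has_forbidden(text: str) -> bool:
    """Return True if text contains a character outside the allowed set."""
    return FORBIDDEN.search(text) is not None

# The two-digit \x form works the same way in both engines: \x31 is '1'.
print(bool(re.fullmatch(r"\x31", "1")))  # True
print(has_forbidden("done \u2714"))      # False (check mark allowed)
print(has_forbidden("meow \U0001F431"))  # False (cat face allowed)
print(has_forbidden("star \u2605"))      # True (black star not listed)
```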

If you change the above regex pattern to have parentheses around it, you can retrieve the offending symbol.


This will put the first disallowed character encountered into $regml(1).
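For comparison, a minimal Python sketch of the same capture trick: wrapping the class in parentheses makes group 1 hold the first disallowed character, which is roughly the role $regml(1) plays in mIRC. The function name `first_forbidden` is hypothetical, and reporting the codepoint as `U+XXXX` is just one convenient way to show it.

```python
import re

# Parentheses turn the class into capture group 1.
FORBIDDEN = re.compile(r"([^ -~])")

def first_forbidden(text: str):
    """Return the first character outside space..tilde, or None if none."""
    m = FORBIDDEN.search(text)
    return m.group(1) if m else None

ch = first_forbidden("abc\u00e9def\u00fc")
print(ch, f"U+{ord(ch):04X}")          # é U+00E9
print(first_forbidden("plain ascii"))  # None
```

Because `search` stops at the first match, only the earliest offending character is reported, just as $regml(1) only holds the first match.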