I've found that certain characters (surrogate pairs), when generated inside $regsubex, do not combine into a single character as they should.

The code below should decode the two escapes and place the resulting surrogates next to each other, so that they display as a single character: a '10' inside a box. The same thing done with $replace works fine, but $regsubex corrupts the string.
Code:
//echo -ag $json.unescape(\ud83d\udd1f)

//echo -ag $regsubex(aa,/(a)/gu,$chr($gettok(55357 56607,\n,32))) vs $replace(ab,a,$chr(55357),b,$chr(56607))
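
For reference, the code point that the pair should combine into can be checked with the standard UTF-16 surrogate arithmetic. This is only a sketch to confirm the expected result, not a workaround for the $regsubex behaviour; 55357 and 56607 are the decimal forms of \ud83d and \udd1f from the example above.
Code:
//echo -ag $base($calc(65536 + (55357 - 55296) * 1024 + (56607 - 56320)),10,16)

This echoes 1F51F (decimal 128287), the code point of the keycap ten character.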


Code:
alias json.unescape {
  return $regsubex($1-,/\\(?:u(....)|(.))/gu,$escape.map(\t))
}
 
alias -l escape.map {
  if ($1 isalpha) return $chr(160)
  if ($1 !isalnum) return $1
  if ($base($1,16,10) > 32) return $chr($v1)
  return $chr(160)
}


Other examples, taken from https://github.com/minimaxir/big-list-of-naughty-strings/blob/master/blns.txt
Code:
0\uFE0F\u20E3 1\uFE0F\u20E3 2\uFE0F\u20E3 3\uFE0F\u20E3 4\uFE0F\u20E3 5\uFE0F\u20E3 6\uFE0F\u20E3 7\uFE0F\u20E3 8\uFE0F\u20E3 9\uFE0F\u20E3 \uD83D\uDD1F
\ud83c\uddfa\ud83c\uddf8\ud83c\uddf7\ud83c\uddfa\ud83c\uddf8 \ud83c\udde6\ud83c\uddeb\ud83c\udde6\ud83c\uddf2\ud83c\uddf8
\ud835\udce3\ud835\udcf1\ud835\udcee \ud835\udcfa\ud835\udcfe\ud835\udcf2\ud835\udcec\ud835\udcf4 \ud835\udceb\ud835\udcfb\ud835\udcf8\ud835\udd00\ud835\udcf7 \ud835\udcef\ud835\udcf8\ud835\udd01 \ud835\udcf3\ud835\udcfe\ud835\udcf6\ud835\udcf9\ud835\udcfc \ud835\udcf8\ud835\udcff\ud835\udcee\ud835\udcfb \ud835\udcfd\ud835\udcf1\ud835\udcee \ud835\udcf5\ud835\udcea\ud835\udd03\ud835\udd02 \ud835\udced\ud835\udcf8\ud835\udcf0


Last edited by Loki12583; 11/06/18 01:23 AM.