#225583 05/09/10 02:59 PM
Joined: Sep 2010
Posts: 14
M
moocat Offline OP
Pikka bird
Determining upper and lowercase chars is not functioning properly in 7.1

Let's take a code example:

Code:
/test {
  var %i = 1
  ; Check each of the first 300 characters with the operators and with the regex classes.
  while (%i <= 300) {
    var %c = $chr(%i)
    echo -a > %c $iif(%c isupper, UPPER) $iif(%c islower, LOWER) $iif($regex(%c, /[[:upper:]]/g), REG_UPPER) $iif($regex(%c, /[[:lower:]]/g), REG_LOWER)
    inc %i
  }
}


Results:
Code:

  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
	  UPPER LOWER
 
 UPPER LOWER
  UPPER LOWER
  UPPER LOWER
 
 UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
 UPPER LOWER
 ! UPPER LOWER
 " UPPER LOWER
 # UPPER LOWER
 $ UPPER LOWER
 % UPPER LOWER
 & UPPER LOWER
 ' UPPER LOWER
 ( UPPER LOWER
 ) UPPER LOWER
 * UPPER LOWER
 + UPPER LOWER
 , UPPER LOWER
 - UPPER LOWER
 . UPPER LOWER
 / UPPER LOWER
 0 UPPER LOWER
 1 UPPER LOWER
 2 UPPER LOWER
 3 UPPER LOWER
 4 UPPER LOWER
 5 UPPER LOWER
 6 UPPER LOWER
 7 UPPER LOWER
 8 UPPER LOWER
 9 UPPER LOWER
 : UPPER LOWER
 ; UPPER LOWER
 < UPPER LOWER
 = UPPER LOWER
 > UPPER LOWER
 ? UPPER LOWER
 @ UPPER LOWER
 A UPPER REG_UPPER
 B UPPER REG_UPPER
 C UPPER REG_UPPER
 D UPPER REG_UPPER
 E UPPER REG_UPPER
 F UPPER REG_UPPER
 G UPPER REG_UPPER
 H UPPER REG_UPPER
 I UPPER REG_UPPER
 J UPPER REG_UPPER
 K UPPER REG_UPPER
 L UPPER REG_UPPER
 M UPPER REG_UPPER
 N UPPER REG_UPPER
 O UPPER REG_UPPER
 P UPPER REG_UPPER
 Q UPPER REG_UPPER
 R UPPER REG_UPPER
 S UPPER REG_UPPER
 T UPPER REG_UPPER
 U UPPER REG_UPPER
 V UPPER REG_UPPER
 W UPPER REG_UPPER
 X UPPER REG_UPPER
 Y UPPER REG_UPPER
 Z UPPER REG_UPPER
 [ UPPER LOWER
 \ UPPER LOWER
 ] UPPER LOWER
 ^ UPPER LOWER
 _ UPPER LOWER
 ` UPPER LOWER
 a LOWER REG_LOWER
 b LOWER REG_LOWER
 c LOWER REG_LOWER
 d LOWER REG_LOWER
 e LOWER REG_LOWER
 f LOWER REG_LOWER
 g LOWER REG_LOWER
 h LOWER REG_LOWER
 i LOWER REG_LOWER
 j LOWER REG_LOWER
 k LOWER REG_LOWER
 l LOWER REG_LOWER
 m LOWER REG_LOWER
 n LOWER REG_LOWER
 o LOWER REG_LOWER
 p LOWER REG_LOWER
 q LOWER REG_LOWER
 r LOWER REG_LOWER
 s LOWER REG_LOWER
 t LOWER REG_LOWER
 u LOWER REG_LOWER
 v LOWER REG_LOWER
 w LOWER REG_LOWER
 x LOWER REG_LOWER
 y LOWER REG_LOWER
 z LOWER REG_LOWER
 { UPPER LOWER
 | UPPER LOWER
 } UPPER LOWER
 ~ UPPER LOWER
  UPPER LOWER
 &#128; UPPER LOWER
  UPPER LOWER
 &#130; UPPER LOWER
 &#131; UPPER LOWER
 &#132; UPPER LOWER
 &#133; UPPER LOWER
 &#134; UPPER LOWER
 &#135; UPPER LOWER
 &#136; UPPER LOWER
 &#137; UPPER LOWER
 &#138; UPPER LOWER
 &#139; UPPER LOWER
 &#140; UPPER LOWER
  UPPER LOWER
 &#142; UPPER LOWER
  UPPER LOWER
  UPPER LOWER
 &#145; UPPER LOWER
 &#146; UPPER LOWER
 &#147; UPPER LOWER
 &#148; UPPER LOWER
 &#149; UPPER LOWER
 &#150; UPPER LOWER
 &#151; UPPER LOWER
 &#152; UPPER LOWER
 &#153; UPPER LOWER
 &#154; UPPER LOWER
 &#155; UPPER LOWER
 &#156; UPPER LOWER
  UPPER LOWER
 &#158; UPPER LOWER
 &#159; UPPER LOWER
   UPPER LOWER
 ¡ UPPER LOWER
 ¢ UPPER LOWER
 £ UPPER LOWER
 ¤ UPPER LOWER
 ¥ UPPER LOWER
 ¦ UPPER LOWER
 § UPPER LOWER
 ¨ UPPER LOWER
 © UPPER LOWER
 ª UPPER LOWER
 « UPPER LOWER
 ¬ UPPER LOWER
 ­ UPPER LOWER
 ® UPPER LOWER
 ¯ UPPER LOWER
 ° UPPER LOWER
 ± UPPER LOWER
 ² UPPER LOWER
 ³ UPPER LOWER
 ´ UPPER LOWER
 µ UPPER LOWER
 ¶ UPPER LOWER
 · UPPER LOWER
 ¸ UPPER LOWER
 ¹ UPPER LOWER
 º UPPER LOWER
 » UPPER LOWER
 ¼ UPPER LOWER
 ½ UPPER LOWER
 ¾ UPPER LOWER
 ¿ UPPER LOWER
 À UPPER
 Á UPPER
 Â UPPER
 Ã UPPER
 Ä UPPER
 Å UPPER
 Æ UPPER
 Ç UPPER
 È UPPER
 É UPPER
 Ê UPPER
 Ë UPPER
 Ì UPPER
 Í UPPER
 Î UPPER
 Ï UPPER
 Ð UPPER
 Ñ UPPER
 Ò UPPER
 Ó UPPER
 Ô UPPER
 Õ UPPER
 Ö UPPER
 × UPPER LOWER
 Ø UPPER
 Ù UPPER
 Ú UPPER
 Û UPPER
 Ü UPPER
 Ý UPPER
 Þ UPPER
 ß LOWER
 à LOWER
 á LOWER
 â LOWER
 ã LOWER
 ä LOWER
 å LOWER
 æ LOWER
 ç LOWER
 è LOWER
 é LOWER
 ê LOWER
 ë LOWER
 ì LOWER
 í LOWER
 î LOWER
 ï LOWER
 ð LOWER
 ñ LOWER
 ò LOWER
 ó LOWER
 ô LOWER
 õ LOWER
 ö LOWER
 ÷ UPPER LOWER
 ø LOWER
 ù LOWER
 ú LOWER
 û LOWER
 ü LOWER
 ý LOWER
 þ LOWER
 ÿ LOWER
 &#256; UPPER
 &#257; LOWER
 &#258; UPPER
 &#259; LOWER
 &#260; UPPER
 &#261; LOWER
 &#262; UPPER
 &#263; LOWER
 &#264; UPPER
 &#265; LOWER
 &#266; UPPER
 &#267; LOWER
 &#268; UPPER
 &#269; LOWER
 &#270; UPPER
 &#271; LOWER
 &#272; UPPER
 &#273; LOWER
 &#274; UPPER
 &#275; LOWER
 &#276; UPPER
 &#277; LOWER
 &#278; UPPER
 &#279; LOWER
 &#280; UPPER
 &#281; LOWER
 &#282; UPPER
 &#283; LOWER
 &#284; UPPER
 &#285; LOWER
 &#286; UPPER
 &#287; LOWER
 &#288; UPPER
 &#289; LOWER
 &#290; UPPER
 &#291; LOWER
 &#292; UPPER
 &#293; LOWER
 &#294; UPPER
 &#295; LOWER
 &#296; UPPER
 &#297; LOWER
 &#298; UPPER
 &#299; LOWER
 &#300; UPPER 


Unicode can't be displayed here but you get the idea.
The results: http://pastebin.com/HP5dzBLF

As you can see, the isupper and islower operators are true not only for letters but also for digits, punctuation and control characters. This can be worked around with the isletter operator.
Also, the regex classes [[:upper:]] and [[:lower:]] only match A-Z and a-z.
Writing a regex that covers every upper or lowercase letter by explicit ranges wouldn't be feasible, as they are not grouped together (see the end of the output above, where uppercase and lowercase alternate from character 256 onwards).

Is there another (fast) way to determine properly between lower and uppercase letters?

Joined: Dec 2002
Posts: 344
D
Pan-dimensional mouse
Offline
I don't think there is any bug here. mIRC just defines "isupper" and "islower" differently than you were expecting. In particular, the following two lines will always give the same result, and you can think of the first line as just being a shortcut for the second line:

Code:
if (%c isupper) { ... }
if (%c === $upper(%c)) { ... }


Assuming you want the same behavior you get with regex, and if you are working with a variable %c that contains a single character, you can use:

Code:
if ($asc(%c) isnum 65-90) { echo -a %c is an uppercase letter }
if ($asc(%c) isnum 97-122) { echo -a %c is a lowercase letter }


I'm not sure if there is a more efficient way, though.

Joined: Sep 2010
Posts: 14
M
moocat Offline OP
Pikka bird
Yeah, this will work:
Code:
if (%c === $upper(%c)) { ... }


But this won't (as it will be true for symbols too):
Code:
if (%c isupper) { ... }


The problem comes in when you want to count the uppercase letters in a message for example.
Without a regex you'd need to use isupper and islower, and do the message char by char.
For example like this:
Code:
char.upper {
  ; $1- Message
  tokenize 32 $remove($1-, $chr(32))
  var %i = 1, %c = 0
  while (%i <= $len($1-)) {
    if ($mid($1-, %i, 1) === $upper($v1)) { inc %c }
    inc %i
  }
  return %c
}


Now that works, but a regex character class is far faster.
On my computer, 1000 iterations of that take 1872 ticks; with the [[:upper:]] regex class it's 63 (but that doesn't include the Unicode uppercase letters I need, so it's useless).
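
For anyone wanting to reproduce this kind of measurement, a minimal timing sketch could look like the following. bench.upper is a hypothetical helper, not the code the figures above came from; it assumes char.upper is already loaded as an alias and uses $ticks, which counts in milliseconds.

```mirc
bench.upper {
  ; Hypothetical timing helper: $1 = number of iterations, $2- = sample text.
  var %n = $1, %start = $ticks
  while (%n > 0) {
    ; call the alias under test; the result is only stored so the call is made
    var %r = $char.upper($2-)
    dec %n
  }
  echo -a char.upper: $calc($ticks - %start) ticks for $1 iterations
}
```

It would be run as /bench.upper 1000 Some Sample TEXT. To time a different counter, swap the alias name in the loop.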

Joined: Jul 2006
Posts: 4,149
W
Hoopy frood
Offline
This thread might help you : https://forums.mirc.com/ubbthreads.php?ubb=showflat&Board=8&Number=214238&Searchpage=1&Main=39805&Words=%2Bupper+%2Blower&topic=0&Search=true#Post214238


#mircscripting @ irc.swiftirc.net == the best mIRC help channel
Joined: Sep 2010
Posts: 14
M
moocat Offline OP
Pikka bird
Originally Posted By: Wims
This thread might help you : https://forums.mirc.com/ubbthreads.php?ubb=showflat&Board=8&Number=214238&Searchpage=1&Main=39805&Words=%2Bupper+%2Blower&topic=0&Search=true#Post214238


Thanks, that explains the isupper/islower issue.
Either way, checking (%c isupper && $v1 isletter) isn't much slower than just checking (%c isupper), so I guess there isn't really a problem there.
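
A minimal sketch of that combined check as a char-by-char counter, in the same aliases-file format as char.upper. The alias name count.upper.loop is hypothetical, and the second test is written with %c instead of $v1 purely for readability.

```mirc
count.upper.loop {
  ; $1- = message; spaces are stripped first, as in char.upper
  var %text = $remove($1-, $chr(32))
  var %i = 1, %n = 0
  while (%i <= $len(%text)) {
    var %c = $mid(%text, %i, 1)
    ; isupper alone is also true for digits and symbols, so require isletter as well
    if ((%c isupper) && (%c isletter)) { inc %n }
    inc %i
  }
  return %n
}
```

For example, //echo -a $count.upper.loop(Hello World 123!) should count only the letters that are uppercase, ignoring the digits and the exclamation mark.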

However my problem with the regex still stands as it does not recognize unicode and looping through chars is too inefficient. Is there a way around this or is it simply not supported?

Joined: Dec 2002
Posts: 3,138
C
Hoopy frood
Offline
Originally Posted By: moocat
However my problem with the regex still stands as it does not recognize unicode and looping through chars is too inefficient. Is there a way around this or is it simply not supported?

You can enable UTF-8 mode using the (*UTF8) sequence. To match upper and lowercase characters use \p{Lu} and \p{Ll} respectively, for example:

//echo -a $iif($regex($chr(256), /(*UTF8)\p{Lu}/g), REG_UPPER) $iif($regex($chr(256), /(*UTF8)\p{Ll}/g), REG_LOWER)
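
Applied to counting letters in a whole message, the same pattern could be wrapped as below. This is a sketch under two assumptions: the alias names are made up for illustration, and it relies on $regex returning the number of matches when the g modifier is used, so no character loop is needed.

```mirc
count.upper.regex {
  ; $1- = message. Hypothetical alias name, aliases-file format.
  ; Pattern as in the one-liner above: UTF-8 mode plus the Lu (uppercase letter) property.
  return $regex($1-, /(*UTF8)\p{Lu}/g)
}
count.lower.regex {
  ; Same idea with the Ll (lowercase letter) property.
  return $regex($1-, /(*UTF8)\p{Ll}/g)
}
```

Usage would be along the lines of //echo -a $count.upper.regex(Some TEXT here). Avoid commas in the literal argument, since they separate identifier parameters.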

Joined: Sep 2010
Posts: 14
M
moocat Offline OP
Pikka bird
Originally Posted By: Collective
You can enable UTF-8 mode using the (*UTF8) sequence. To match upper and lowercase characters use \p{Lu} and \p{Ll} respectively, for example:

//echo -a $iif($regex($chr(256), /(*UTF8)\p{Lu}/g), REG_UPPER) $iif($regex($chr(256), /(*UTF8)\p{Ll}/g), REG_LOWER)


That is truly beautiful. Went from 1825 ticks to 172. Thank you so much, good sir.

Here's a quick reference for the different Unicode categories if anyone else needs them: http://www.fileformat.info/info/unicode/category/index.htm

Joined: Feb 2006
Posts: 546
J
Fjord artisan
Offline
there are a number of peculiar discrepancies between these types of operations in mIRC. unfortunately the identity suggested by drum doesn't hold true for over half of the range of characters supported by $chr()! here are a few examples of pairs of seemingly identical checks along with the number of characters for which they have different results:

Code:
if ($chr(N) isupper)
if ($chr(N) === $upper($chr(N)))


45,825 chars, $chr(223) is the first.

Code:
if ($chr(N) islower)
if ($chr(N) === $lower($chr(N)))


45,603 chars, $chr(304) is the first.

these results are mostly accounted for by the 45,533 characters which are neither upper nor lower (according to islower and isupper), the first example being $chr(443).

Code:
if ($chr(N) isalnum)
if ($chr(N) isalpha) || ($chr(N) isnum)


303 characters, $chr(178) is the first.

on the plus side: the following are, rather unremarkably, pairs of equivalent checks:

Code:
if ($chr(N) isupper)
if ($isupper($chr(N)))

if ($chr(N) islower)
if ($islower($chr(N)))

if ($chr(N) isalpha)
if ($chr(N) isletter)


"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
Joined: Dec 2002
Posts: 344
D
Pan-dimensional mouse
Offline
Originally Posted By: jaytea
unfortunately the identity suggested by drum doesn't hold true for over half of the range of characters supported by $chr()!


Thanks for pointing that out. My error was in skimming the help file and misreading what it said. Still, I probably should have tested it first before stating it.

Joined: Oct 2003
Posts: 3,918
A
Hoopy frood
Offline
I'm confused about the direction of this conversation.

Is the consensus that the is* (islower, isupper, etc) operators should be updated to support unicode characters? Or are we saying this is not a bug and just the "Way It Works"(tm)?

Fixing the operators to support Unicode would be my suggestion, but nobody has really stated what the solution should be-- I'm only seeing descriptions of the problem.


- argv[0] on EFnet #mIRC
- "Life is a pointer to an integer without a cast"
Joined: Dec 2002
Posts: 344
D
Pan-dimensional mouse
Offline
There does appear to be a quirk where $upper() will not correctly replace a lowercase letter with its uppercase equivalent. The example that jaytea gave was $chr(223) which is a German lowercase character (ß). However, this link explains what is going on, and why it probably shouldn't be considered an mIRC bug (but rather a limitation with Microsoft's Unicode routines):

http://blogs.msdn.com/b/michkap/archive/2005/04/10/406880.aspx

Also to clarify, mIRC's case routines do support Unicode already, it's just that there are quirks like this one. The reason the OP didn't want to use mIRC's routines was because it was inefficient at counting the number of uppercase/lowercase characters in a given string compared to regex.

Last edited by drum; 07/09/10 01:53 PM.
