Bug Reports: Upper and lowercase


#225583 05/09/10 02:59 PM

moocat

Determining upper- and lowercase characters does not work properly in mIRC 7.1.

Let's take a code example:

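Code:

/test {
  var %i = 1
  ; echo each of $chr(1) to $chr(300) together with the results of four checks
  while (%i <= 300) {
    var %c = $chr(%i)
    ; isupper/islower operators vs. the POSIX [[:upper:]]/[[:lower:]] regex classes
    echo -a > %c $iif(%c isupper, UPPER) $iif(%c islower, LOWER) $iif($regex(%c, /[[:upper:]]/g), REG_UPPER) $iif($regex(%c, /[[:lower:]]/g), REG_LOWER)
    inc %i
  }
}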

Results:

 [$chr(1) to $chr(31): control characters, not displayable] UPPER LOWER
 [$chr(32): space] UPPER LOWER
 ! UPPER LOWER
 " UPPER LOWER
 # UPPER LOWER
 $ UPPER LOWER
 % UPPER LOWER
 & UPPER LOWER
 ' UPPER LOWER
 ( UPPER LOWER
 ) UPPER LOWER
 * UPPER LOWER
 + UPPER LOWER
 , UPPER LOWER
 - UPPER LOWER
 . UPPER LOWER
 / UPPER LOWER
 0 UPPER LOWER
 1 UPPER LOWER
 2 UPPER LOWER
 3 UPPER LOWER
 4 UPPER LOWER
 5 UPPER LOWER
 6 UPPER LOWER
 7 UPPER LOWER
 8 UPPER LOWER
 9 UPPER LOWER
 : UPPER LOWER
 ; UPPER LOWER
 < UPPER LOWER
 = UPPER LOWER
 > UPPER LOWER
 ? UPPER LOWER
 @ UPPER LOWER
 A UPPER REG_UPPER
 B UPPER REG_UPPER
 C UPPER REG_UPPER
 D UPPER REG_UPPER
 E UPPER REG_UPPER
 F UPPER REG_UPPER
 G UPPER REG_UPPER
 H UPPER REG_UPPER
 I UPPER REG_UPPER
 J UPPER REG_UPPER
 K UPPER REG_UPPER
 L UPPER REG_UPPER
 M UPPER REG_UPPER
 N UPPER REG_UPPER
 O UPPER REG_UPPER
 P UPPER REG_UPPER
 Q UPPER REG_UPPER
 R UPPER REG_UPPER
 S UPPER REG_UPPER
 T UPPER REG_UPPER
 U UPPER REG_UPPER
 V UPPER REG_UPPER
 W UPPER REG_UPPER
 X UPPER REG_UPPER
 Y UPPER REG_UPPER
 Z UPPER REG_UPPER
 [ UPPER LOWER
 \ UPPER LOWER
 ] UPPER LOWER
 ^ UPPER LOWER
 _ UPPER LOWER
 ` UPPER LOWER
 a LOWER REG_LOWER
 b LOWER REG_LOWER
 c LOWER REG_LOWER
 d LOWER REG_LOWER
 e LOWER REG_LOWER
 f LOWER REG_LOWER
 g LOWER REG_LOWER
 h LOWER REG_LOWER
 i LOWER REG_LOWER
 j LOWER REG_LOWER
 k LOWER REG_LOWER
 l LOWER REG_LOWER
 m LOWER REG_LOWER
 n LOWER REG_LOWER
 o LOWER REG_LOWER
 p LOWER REG_LOWER
 q LOWER REG_LOWER
 r LOWER REG_LOWER
 s LOWER REG_LOWER
 t LOWER REG_LOWER
 u LOWER REG_LOWER
 v LOWER REG_LOWER
 w LOWER REG_LOWER
 x LOWER REG_LOWER
 y LOWER REG_LOWER
 z LOWER REG_LOWER
 { UPPER LOWER
 | UPPER LOWER
 } UPPER LOWER
 ~ UPPER LOWER
 [$chr(127) to $chr(160): DEL, C1 control characters and the no-break space, not displayable] UPPER LOWER
 ¡ UPPER LOWER
 ¢ UPPER LOWER
 £ UPPER LOWER
 ¤ UPPER LOWER
 ¥ UPPER LOWER
 ¦ UPPER LOWER
 § UPPER LOWER
 ¨ UPPER LOWER
 © UPPER LOWER
 ª UPPER LOWER
 « UPPER LOWER
 ¬ UPPER LOWER
 [$chr(173): soft hyphen, not displayable] UPPER LOWER
 ® UPPER LOWER
 ¯ UPPER LOWER
 ° UPPER LOWER
 ± UPPER LOWER
 ² UPPER LOWER
 ³ UPPER LOWER
 ´ UPPER LOWER
 µ UPPER LOWER
 ¶ UPPER LOWER
 · UPPER LOWER
 ¸ UPPER LOWER
 ¹ UPPER LOWER
 º UPPER LOWER
 » UPPER LOWER
 ¼ UPPER LOWER
 ½ UPPER LOWER
 ¾ UPPER LOWER
 ¿ UPPER LOWER
 À UPPER
 Á UPPER
 Â UPPER
 Ã UPPER
 Ä UPPER
 Å UPPER
 Æ UPPER
 Ç UPPER
 È UPPER
 É UPPER
 Ê UPPER
 Ë UPPER
 Ì UPPER
 Í UPPER
 Î UPPER
 Ï UPPER
 Ð UPPER
 Ñ UPPER
 Ò UPPER
 Ó UPPER
 Ô UPPER
 Õ UPPER
 Ö UPPER
 × UPPER LOWER
 Ø UPPER
 Ù UPPER
 Ú UPPER
 Û UPPER
 Ü UPPER
 Ý UPPER
 Þ UPPER
 ß LOWER
 à LOWER
 á LOWER
 â LOWER
 ã LOWER
 ä LOWER
 å LOWER
 æ LOWER
 ç LOWER
 è LOWER
 é LOWER
 ê LOWER
 ë LOWER
 ì LOWER
 í LOWER
 î LOWER
 ï LOWER
 ð LOWER
 ñ LOWER
 ò LOWER
 ó LOWER
 ô LOWER
 õ LOWER
 ö LOWER
 ÷ UPPER LOWER
 ø LOWER
 ù LOWER
 ú LOWER
 û LOWER
 ü LOWER
 ý LOWER
 þ LOWER
 ÿ LOWER
 Ā UPPER
 ā LOWER
 Ă UPPER
 ă LOWER
 Ą UPPER
 ą LOWER
 Ć UPPER
 ć LOWER
 Ĉ UPPER
 ĉ LOWER
 Ċ UPPER
 ċ LOWER
 Č UPPER
 č LOWER
 Ď UPPER
 ď LOWER
 Đ UPPER
 đ LOWER
 Ē UPPER
 ē LOWER
 Ĕ UPPER
 ĕ LOWER
 Ė UPPER
 ė LOWER
 Ę UPPER
 ę LOWER
 Ě UPPER
 ě LOWER
 Ĝ UPPER
 ĝ LOWER
 Ğ UPPER
 ğ LOWER
 Ġ UPPER
 ġ LOWER
 Ģ UPPER
 ģ LOWER
 Ĥ UPPER
 ĥ LOWER
 Ħ UPPER
 ħ LOWER
 Ĩ UPPER
 ĩ LOWER
 Ī UPPER
 ī LOWER
 Ĭ UPPER

Unicode may not display correctly here, but you get the idea.
The results: http://pastebin.com/HP5dzBLF

As you can see, the isupper and islower operators are true not only for letters but also for digits, punctuation and other symbols. This can be worked around with the isletter operator.
You can also see that the regex classes [[:upper:]] and [[:lower:]] only match A-Z and a-z.
Writing a regex by hand that covers all upper- or lowercase letters wouldn't be feasible, because they are not grouped together in contiguous ranges (as you can see at the end of the output, where they alternate).

Is there another (fast) way to distinguish properly between lower- and uppercase letters?
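One way to see why a hand-written range is impractical, and what a category-based test looks like instead, is the Python sketch below. Python is used only for illustration: its `unicodedata` tables are not mIRC's, and the helper name `case_category` is hypothetical. It classifies each character by its Unicode general category (Lu for uppercase letters, Ll for lowercase letters) and shows that from $chr(256) to $chr(300) the two categories alternate one code point at a time.

```python
import unicodedata


def case_category(ch: str) -> str:
    """Hypothetical helper: classify one character by its Unicode general category."""
    category = unicodedata.category(ch)
    if category == "Lu":  # uppercase letter
        return "UPPER"
    if category == "Ll":  # lowercase letter
        return "LOWER"
    return ""  # digits, punctuation, symbols, control characters, etc.


# Code points 256..300, the same range as the end of the mIRC output:
# upper- and lowercase letters alternate, so no contiguous range covers either case.
for cp in range(256, 301):
    print(cp, chr(cp), case_category(chr(cp)))
```

Unlike the isupper/islower results in the output, digits, punctuation and signs such as × and ÷ fall into neither category here.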

Re: Upper and lowercase #225587 05/09/10 04:11 PM
drum, Pan-dimensional mouse. Joined: Dec 2002. Posts: 294

I don't think there is any bug here. mIRC just defines "isupper" and "islower" differently than you were expecting. In particular, the following two lines will always give the same result, and you can think of the first line as just a shortcut for the second:

Code:
if (%c isupper) { ... }
if (%c === $upper(%c)) { ... }

Assuming you want the same behavior you get with regex, and you are working with a variable %c that contains a single character, you can use:

Code:
if ($asc(%c) isnum 65-90) { echo -a %c is an uppercase letter }
if ($asc(%c) isnum 97-122) { echo -a %c is a lowercase letter }

I'm not sure if there is a more efficient way, though.

Re: Upper and lowercase drum #225588 05/09/10 04:34 PM
moocat

Originally Posted By: drum
I don't think there is any bug here. mIRC just defines "isupper" and "islower" differently than you were expecting. [...]

Yeah, this will work:

Code:
if (%c === $upper(%c)) { ... }

But this won't (as it will be true for symbols too):

Code:
if (%c isupper) { ... }

The problem comes in when you want to count the uppercase letters in a message, for example. Without a regex you'd need to use isupper and islower and go through the message character by character, for example like this:

Code:
char.upper {
  ; $1- = message
  tokenize 32 $remove($1-, $chr(32))
  var %i = 1, %c = 0
  while (%i <= $len($1-)) {
    if ($mid($1-, %i, 1) === $upper($v1)) { inc %c }
    inc %i
  }
  return %c
}

Now that works, but a regex class is much faster: 1000 iterations of that take 1872 ticks on my computer, while with the [[:upper:]] regex class it's 63. (That doesn't include the Unicode uppercase letters I need, though, so it's useless.)

Re: Upper and lowercase #225589 05/09/10 05:12 PM
Wims, Hoopy frood. Joined: Jul 2006. Posts: 4,020. France

This thread might help you: https://forums.mirc.com/ubbthreads.php?ubb=showflat&Board=8&Number=214238&Searchpage=1&Main=39805&Words=%2Bupper+%2Blower&topic=0&Search=true#Post214238

Re: Upper and lowercase Wims #225590 05/09/10 05:41 PM
moocat

Originally Posted By: Wims
This thread might help you: https://forums.mirc.com/ubbthreads.php?ubb=showflat&Board=8&Number=214238&Searchpage=1&Main=39805&Words=%2Bupper+%2Blower&topic=0&Search=true#Post214238

Thanks, that explains the isupper/islower issue. Either way, (%c isupper && $v1 isletter) isn't much slower than just (%c isupper), so I guess there isn't really a problem there. However, my problem with the regex still stands: it does not recognize Unicode, and looping through characters is too inefficient. Is there a way around this, or is it simply not supported?

Re: Upper and lowercase #225591 05/09/10 05:51 PM
Collective, Hoopy frood. Joined: Dec 2002. Posts: 3,015. London, UK

Originally Posted By: moocat
However my problem with the regex still stands as it does not recognize unicode and looping through chars is too inefficient. Is there a way around this or is it simply not supported?

You can enable UTF-8 mode using the (*UTF8) sequence. To match upper- and lowercase characters, use \p{Lu} and \p{Ll} respectively, for example:

Code:
//echo -a $iif($regex($chr(256), /(*UTF8)\p{Lu}/g), REG_UPPER) $iif($regex($chr(256), /(*UTF8)\p{Ll}/g), REG_LOWER)

Re: Upper and lowercase Collective #225592 05/09/10 06:07 PM
moocat

Originally Posted By: Collective
You can enable UTF-8 mode using the (*UTF8) sequence. To match upper and lowercase characters use \p{Lu} and \p{Ll} respectively, for example: [...]

That is truly beautiful. It went from 1825 ticks to 172. Thank you so much, good sir! Here's a quick list of the different Unicode categories if anyone else needs them: http://www.fileformat.info/info/unicode/category/index.htm

Re: Upper and lowercase #225594 05/09/10 07:28 PM
jaytea, Fjord artisan. Joined: Feb 2006. Posts: 523

There are a number of peculiar discrepancies between these types of operations in mIRC. Unfortunately, the identity suggested by drum doesn't hold for over half of the range of characters supported by $chr()! Here are a few pairs of seemingly identical checks, along with the number of characters for which they give different results:

Code:
if ($chr(N) isupper)
if ($chr(N) === $upper($chr(N)))

45,825 chars; $chr(223) is the first.

Code:
if ($chr(N) islower)
if ($chr(N) === $lower($chr(N)))

45,603 chars; $chr(304) is the first.

These results are mostly accounted for by the 45,533 characters which are neither upper nor lower (according to islower and isupper), the first example being $chr(443).

Code:
if ($chr(N) isalnum)
if ($chr(N) isalpha) || ($chr(N) isnum)

303 chars; $chr(178) is the first.

On the plus side, the following are, rather unremarkably, pairs of equivalent checks:

Code:
if ($chr(N) isupper)
if ($isupper($chr(N)))

if ($chr(N) islower)
if ($islower($chr(N)))

if ($chr(N) isalpha)
if ($chr(N) isletter)

Re: Upper and lowercase jaytea #225608 06/09/10 02:25 AM
drum, Pan-dimensional mouse. Joined: Dec 2002. Posts: 294

Originally Posted By: jaytea
unfortunately the identity suggested by drum doesn't hold true for over half of the range of characters supported by $chr()!

Thanks for pointing that out. My error was in skimming the help file and misreading what it said. Still, I probably should have tested it before stating it.

Re: Upper and lowercase jaytea #225634 06/09/10 10:04 PM
argv0, Hoopy frood. Joined: Oct 2003. Posts: 3,641. Montreal, QC, Canada

I'm confused about the direction of this conversation. Is the consensus that the is* operators (islower, isupper, etc.) should be updated to support Unicode characters? Or are we saying this is not a bug and just the "Way It Works"(tm)? Fixing the operators to support Unicode would be my suggestion, but nobody has really stated what the solution should be; I'm only seeing descriptions of the problem.

Re: Upper and lowercase argv0 #225667 07/09/10 01:31 PM
drum, Pan-dimensional mouse. Joined: Dec 2002. Posts: 294

There does appear to be a quirk where $upper() will not correctly replace a lowercase letter with its uppercase equivalent. The example that jaytea gave was $chr(223), which is the German lowercase character ß. However, this link explains what is going on, and why it probably shouldn't be considered an mIRC bug (but rather a limitation of Microsoft's Unicode routines): http://blogs.msdn.com/b/michkap/archive/2005/04/10/406880.aspx

Also, to clarify: mIRC's case routines already support Unicode; there are just quirks like this one. The reason the OP didn't want to use mIRC's routines was that they were inefficient at counting the uppercase/lowercase characters in a given string compared to regex.

Last edited by drum; 07/09/10 01:53 PM.
