Upper and lowercase

Posted By: moocat

Upper and lowercase - 05/09/10 02:59 PM

Determining upper and lowercase chars is not functioning properly in 7.1

Let's take a code example:

Code:
/test {
  var %i = 1
  while (%i <= 300) {
    var %c = $chr(%i)
    echo -a > %c $iif(%c isupper, UPPER) $iif(%c islower, LOWER) $iif($regex(%c, /[[:upper:]]/g), REG_UPPER) $iif($regex(%c, /[[:lower:]]/g), REG_LOWER)
    inc %i
  }
}


Results:
Code:

  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
	  UPPER LOWER
 
 UPPER LOWER
  UPPER LOWER
  UPPER LOWER
 
 UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
 UPPER LOWER
 ! UPPER LOWER
 " UPPER LOWER
 # UPPER LOWER
 $ UPPER LOWER
 % UPPER LOWER
 & UPPER LOWER
 ' UPPER LOWER
 ( UPPER LOWER
 ) UPPER LOWER
 * UPPER LOWER
 + UPPER LOWER
 , UPPER LOWER
 - UPPER LOWER
 . UPPER LOWER
 / UPPER LOWER
 0 UPPER LOWER
 1 UPPER LOWER
 2 UPPER LOWER
 3 UPPER LOWER
 4 UPPER LOWER
 5 UPPER LOWER
 6 UPPER LOWER
 7 UPPER LOWER
 8 UPPER LOWER
 9 UPPER LOWER
 : UPPER LOWER
 ; UPPER LOWER
 < UPPER LOWER
 = UPPER LOWER
 > UPPER LOWER
 ? UPPER LOWER
 @ UPPER LOWER
 A UPPER REG_UPPER
 B UPPER REG_UPPER
 C UPPER REG_UPPER
 D UPPER REG_UPPER
 E UPPER REG_UPPER
 F UPPER REG_UPPER
 G UPPER REG_UPPER
 H UPPER REG_UPPER
 I UPPER REG_UPPER
 J UPPER REG_UPPER
 K UPPER REG_UPPER
 L UPPER REG_UPPER
 M UPPER REG_UPPER
 N UPPER REG_UPPER
 O UPPER REG_UPPER
 P UPPER REG_UPPER
 Q UPPER REG_UPPER
 R UPPER REG_UPPER
 S UPPER REG_UPPER
 T UPPER REG_UPPER
 U UPPER REG_UPPER
 V UPPER REG_UPPER
 W UPPER REG_UPPER
 X UPPER REG_UPPER
 Y UPPER REG_UPPER
 Z UPPER REG_UPPER
 [ UPPER LOWER
 \ UPPER LOWER
 ] UPPER LOWER
 ^ UPPER LOWER
 _ UPPER LOWER
 ` UPPER LOWER
 a LOWER REG_LOWER
 b LOWER REG_LOWER
 c LOWER REG_LOWER
 d LOWER REG_LOWER
 e LOWER REG_LOWER
 f LOWER REG_LOWER
 g LOWER REG_LOWER
 h LOWER REG_LOWER
 i LOWER REG_LOWER
 j LOWER REG_LOWER
 k LOWER REG_LOWER
 l LOWER REG_LOWER
 m LOWER REG_LOWER
 n LOWER REG_LOWER
 o LOWER REG_LOWER
 p LOWER REG_LOWER
 q LOWER REG_LOWER
 r LOWER REG_LOWER
 s LOWER REG_LOWER
 t LOWER REG_LOWER
 u LOWER REG_LOWER
 v LOWER REG_LOWER
 w LOWER REG_LOWER
 x LOWER REG_LOWER
 y LOWER REG_LOWER
 z LOWER REG_LOWER
 { UPPER LOWER
 | UPPER LOWER
 } UPPER LOWER
 ~ UPPER LOWER
  UPPER LOWER
 &#128; UPPER LOWER
  UPPER LOWER
 &#130; UPPER LOWER
 &#131; UPPER LOWER
 &#132; UPPER LOWER
 &#133; UPPER LOWER
 &#134; UPPER LOWER
 &#135; UPPER LOWER
 &#136; UPPER LOWER
 &#137; UPPER LOWER
 &#138; UPPER LOWER
 &#139; UPPER LOWER
 &#140; UPPER LOWER
  UPPER LOWER
 &#142; UPPER LOWER
  UPPER LOWER
  UPPER LOWER
 &#145; UPPER LOWER
 &#146; UPPER LOWER
 &#147; UPPER LOWER
 &#148; UPPER LOWER
 &#149; UPPER LOWER
 &#150; UPPER LOWER
 &#151; UPPER LOWER
 &#152; UPPER LOWER
 &#153; UPPER LOWER
 &#154; UPPER LOWER
 &#155; UPPER LOWER
 &#156; UPPER LOWER
  UPPER LOWER
 &#158; UPPER LOWER
 &#159; UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER LOWER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER LOWER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  UPPER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  UPPER LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
  LOWER
 &#256; UPPER
 &#257; LOWER
 &#258; UPPER
 &#259; LOWER
 &#260; UPPER
 &#261; LOWER
 &#262; UPPER
 &#263; LOWER
 &#264; UPPER
 &#265; LOWER
 &#266; UPPER
 &#267; LOWER
 &#268; UPPER
 &#269; LOWER
 &#270; UPPER
 &#271; LOWER
 &#272; UPPER
 &#273; LOWER
 &#274; UPPER
 &#275; LOWER
 &#276; UPPER
 &#277; LOWER
 &#278; UPPER
 &#279; LOWER
 &#280; UPPER
 &#281; LOWER
 &#282; UPPER
 &#283; LOWER
 &#284; UPPER
 &#285; LOWER
 &#286; UPPER
 &#287; LOWER
 &#288; UPPER
 &#289; LOWER
 &#290; UPPER
 &#291; LOWER
 &#292; UPPER
 &#293; LOWER
 &#294; UPPER
 &#295; LOWER
 &#296; UPPER
 &#297; LOWER
 &#298; UPPER
 &#299; LOWER
 &#300; UPPER 


Unicode can't be displayed here but you get the idea.
The results: http://pastebin.com/HP5dzBLF

As you can see, the isupper and islower operators are also true for characters that are not letters. This can be worked around by combining them with the isletter operator.
You can also see that the regex character classes only match a-z and A-Z.
Writing a regex by hand to cover every uppercase or lowercase letter isn't feasible, since they are not grouped together (as you can see at the end of the output).

Is there another (fast) way to determine properly between lower and uppercase letters?
Posted By: drum

Re: Upper and lowercase - 05/09/10 04:11 PM

I don't think there is any bug here. mIRC just defines "isupper" and "islower" differently than you were expecting. In particular, the following two lines will always give the same result, and you can think of the first line just being a shortcut for the second line:

Code:
if (%c isupper) { ... }
if (%c === $upper(%c)) { ... }


Assuming you want the same behavior you get with regex, and if you are working with a variable %c that contains a single character, you can use:

Code:
if ($asc(%c) isnum 65-90) { echo -a %c is an uppercase letter }
if ($asc(%c) isnum 97-122) { echo -a %c is a lowercase letter }


I'm not sure if there is a more efficient way, though.
Posted By: moocat

Re: Upper and lowercase - 05/09/10 04:34 PM

Yeah, this will work:
Code:
if (%c === $upper(%c)) { ... }


But this won't (it will be true for symbols too):
Code:
if (%c isupper) { ... }


The problem comes in when you want to count the uppercase letters in a message for example.
Without a regex you'd need to use isupper and islower, and process the message character by character.
For example like this:
Code:
char.upper {
  ; $1- Message
  tokenize 32 $remove($1-, $chr(32))
  var %i = 1, %c = 0
  while (%i <= $len($1-)) {
    if ($mid($1-, %i, 1) === $upper($v1)) { inc %c }
    inc %i
  }
  return %c
}


Now that works, but a regex character class is much faster.
1000 iterations of that on my computer take 1872 ticks; with the [[:upper:]] regex class it takes 63 (but that doesn't include the Unicode uppercase letters I need, so it's useless).
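
A timing like the one quoted above could be taken roughly as follows. This is only a sketch: the alias name bench.upper is hypothetical, and it assumes the char.upper alias above is loaded so that it can be called as $char.upper().

```mirc
; Hypothetical helper (not from the thread): times 1000 calls of char.upper.
alias bench.upper {
  var %i = 1
  var %start = $ticks
  while (%i <= 1000) {
    ; /noop evaluates the identifier and discards the result
    noop $char.upper($1-)
    inc %i
  }
  ; start the text with a word so the number is not read as a colour code
  echo -a Elapsed: $calc($ticks - %start) ticks for 1000 iterations
}
```

It would be called as /bench.upper followed by a test message; swapping in another identifier on the /noop line allows two approaches to be compared on the same input.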
Posted By: Wims

Re: Upper and lowercase - 05/09/10 05:12 PM

This thread might help you: http://forums.mirc.com/ubbthreads.php?ubb=showflat&Board=8&Number=214238&Searchpage=1&Main=39805&Words=%2Bupper+%2Blower&topic=0&Search=true#Post214238
Posted By: moocat

Re: Upper and lowercase - 05/09/10 05:41 PM

Thanks, that explains the isupper/islower issue.
Either way, using (%c isupper && $v1 isletter) doesn't make much difference in speed compared to just using (%c isupper), so I guess there isn't really a problem there.
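
For anyone following along, the combined check could be dropped into a counting loop along these lines. This is a minimal sketch; the alias name count.upper.loop is made up for illustration, and spaces are stripped first in the same way as in char.upper.

```mirc
; Hypothetical alias: counts characters that are both "upper" and letters.
alias count.upper.loop {
  ; remove spaces so $mid never returns an empty value
  var %s = $remove($1-, $chr(32))
  var %i = 1
  var %n = 0
  while (%i <= $len(%s)) {
    ; $v1 holds the first operand of the preceding comparison
    if ($mid(%s, %i, 1) isupper && $v1 isletter) { inc %n }
    inc %i
  }
  return %n
}
```

It is still a character-by-character loop, so it shares the speed limitation described above; the isletter test only filters out the digits and symbols that isupper accepts.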

However my problem with the regex still stands as it does not recognize unicode and looping through chars is too inefficient. Is there a way around this or is it simply not supported?
Posted By: Collective

Re: Upper and lowercase - 05/09/10 05:51 PM

Originally Posted By: moocat
However my problem with the regex still stands as it does not recognize unicode and looping through chars is too inefficient. Is there a way around this or is it simply not supported?

You can enable UTF-8 mode using the (*UTF8) sequence. To match upper and lowercase characters use \p{Lu} and \p{Ll} respectively, for example:

//echo -a $iif($regex($chr(256), /(*UTF8)\p{Lu}/g), REG_UPPER) $iif($regex($chr(256), /(*UTF8)\p{Ll}/g), REG_LOWER)
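
Applied to the counting problem from earlier in the thread, the same patterns could be wrapped up as below. This is a sketch with hypothetical alias names; it relies on $regex returning the number of matches when the /g modifier is used.

```mirc
; Hypothetical aliases: count Unicode uppercase / lowercase letters in a string.
alias count.upper {
  ; \p{Lu} = Unicode "Letter, uppercase"
  return $regex($1-, /(*UTF8)\p{Lu}/g)
}
alias count.lower {
  ; \p{Ll} = Unicode "Letter, lowercase"
  return $regex($1-, /(*UTF8)\p{Ll}/g)
}
```

For example, //echo -a Upper: $count.upper(Some Test MESSAGE) would print the count for that string. Because the whole string is handled in a single $regex call, no per-character loop is needed.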
Posted By: moocat

Re: Upper and lowercase - 05/09/10 06:07 PM

That is truly beautiful. Went from 1825 ticks to 172. Thank you so much, good sir.

Here's a quick reference for the different Unicode categories if anyone else needs them: http://www.fileformat.info/info/unicode/category/index.htm
Posted By: jaytea

Re: Upper and lowercase - 05/09/10 07:28 PM

there are a number of peculiar discrepancies between these types of operations in mIRC. unfortunately the identity suggested by drum doesn't hold true for over half of the range of characters supported by $chr()! here are a few examples of pairs of seemingly identical checks along with the number of characters for which they have different results:

Code:
if ($chr(N) isupper)
if ($chr(N) === $upper($chr(N)))


45,825 chars, $chr(223) is the first.

Code:
if ($chr(N) islower)
if ($chr(N) === $lower($chr(N)))


45,603 chars, $chr(304) is the first.

these results are mostly accounted for by the 45,533 characters which are neither upper nor lower (according to islower and isupper), the first example being $chr(443).
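
counts like the ones above could be reproduced with a loop roughly like this one. it is only a sketch: the alias name is hypothetical, and the upper bound of 65535 is an assumption about the range $chr() supports, not something stated in this thread.

```mirc
; Hypothetical alias: counts code points where the two checks disagree.
alias count.mismatch {
  var %n = 1
  var %diff = 0
  var %first = 0
  ; 65535 is an assumed upper bound for $chr()
  while (%n <= 65535) {
    var %a = $iif($chr(%n) isupper, 1, 0)
    var %b = $iif($chr(%n) === $upper($chr(%n)), 1, 0)
    if (%a != %b) {
      inc %diff
      ; remember the first code point that disagrees
      if (%first == 0) { var %first = %n }
    }
    inc %n
  }
  echo -a Mismatches: %diff (first at code point %first $+ )
}
```

replacing isupper and $upper with islower and $lower gives the second comparison.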

Code:
if ($chr(N) isalnum)
if (($chr(N) isalpha) || ($chr(N) isnum))


303 characters, $chr(178) is the first.

on the plus side: the following are, rather unremarkably, pairs of equivalent checks:

Code:
if ($chr(N) isupper)
if ($isupper($chr(N)))

if ($chr(N) islower)
if ($islower($chr(N)))

if ($chr(N) isalpha)
if ($chr(N) isletter)
Posted By: drum

Re: Upper and lowercase - 06/09/10 02:25 AM

Originally Posted By: jaytea
unfortunately the identity suggested by drum doesn't hold true for over half of the range of characters supported by $chr()!


Thanks for pointing that out. My error was in skimming the help file and misreading what it said. Still, I probably should have tested it before stating it.
Posted By: argv0

Re: Upper and lowercase - 06/09/10 10:04 PM

I'm confused about the direction of this conversation.

Is the consensus that the is* (islower, isupper, etc) operators should be updated to support unicode characters? Or are we saying this is not a bug and just the "Way It Works"(tm)?

Fixing the operators to support Unicode would be my suggestion, but nobody has really stated what the solution should be; I'm only seeing descriptions of the problem.
Posted By: drum

Re: Upper and lowercase - 07/09/10 01:31 PM

There does appear to be a quirk where $upper() does not always replace a lowercase letter with its uppercase equivalent. The example that jaytea gave was $chr(223), which is a German lowercase character (ß). However, this link explains what is going on, and why it probably shouldn't be considered an mIRC bug (but rather a limitation of Microsoft's Unicode routines):

http://blogs.msdn.com/b/michkap/archive/2005/04/10/406880.aspx

Also, to clarify: mIRC's case routines already support Unicode; it's just that there are quirks like this one. The reason the OP didn't want to use mIRC's routines was that they were inefficient at counting the number of uppercase/lowercase characters in a given string compared to regex.