moocat
Determining uppercase and lowercase characters is not functioning properly in 7.1. Let's take a code example:

/test {
  var %i = 1
  while (%i <= 300) {
    var %c = $chr(%i)
    echo -a > %c $iif(%c isupper, UPPER) $iif(%c islower, LOWER) $iif($regex(%c, /[[:upper:]]/g), REG_UPPER) $iif($regex(%c, /[[:lower:]]/g), REG_LOWER)
    inc %i
  }
}
Results:
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
UPPER LOWER
! UPPER LOWER
" UPPER LOWER
# UPPER LOWER
$ UPPER LOWER
% UPPER LOWER
& UPPER LOWER
' UPPER LOWER
( UPPER LOWER
) UPPER LOWER
* UPPER LOWER
+ UPPER LOWER
, UPPER LOWER
- UPPER LOWER
. UPPER LOWER
/ UPPER LOWER
0 UPPER LOWER
1 UPPER LOWER
2 UPPER LOWER
3 UPPER LOWER
4 UPPER LOWER
5 UPPER LOWER
6 UPPER LOWER
7 UPPER LOWER
8 UPPER LOWER
9 UPPER LOWER
: UPPER LOWER
; UPPER LOWER
< UPPER LOWER
= UPPER LOWER
> UPPER LOWER
? UPPER LOWER
@ UPPER LOWER
A UPPER REG_UPPER
B UPPER REG_UPPER
C UPPER REG_UPPER
D UPPER REG_UPPER
E UPPER REG_UPPER
F UPPER REG_UPPER
G UPPER REG_UPPER
H UPPER REG_UPPER
I UPPER REG_UPPER
J UPPER REG_UPPER
K UPPER REG_UPPER
L UPPER REG_UPPER
M UPPER REG_UPPER
N UPPER REG_UPPER
O UPPER REG_UPPER
P UPPER REG_UPPER
Q UPPER REG_UPPER
R UPPER REG_UPPER
S UPPER REG_UPPER
T UPPER REG_UPPER
U UPPER REG_UPPER
V UPPER REG_UPPER
W UPPER REG_UPPER
X UPPER REG_UPPER
Y UPPER REG_UPPER
Z UPPER REG_UPPER
[ UPPER LOWER
\ UPPER LOWER
] UPPER LOWER
^ UPPER LOWER
_ UPPER LOWER
` UPPER LOWER
a LOWER REG_LOWER
b LOWER REG_LOWER
c LOWER REG_LOWER
d LOWER REG_LOWER
e LOWER REG_LOWER
f LOWER REG_LOWER
g LOWER REG_LOWER
h LOWER REG_LOWER
i LOWER REG_LOWER
j LOWER REG_LOWER
k LOWER REG_LOWER
l LOWER REG_LOWER
m LOWER REG_LOWER
n LOWER REG_LOWER
o LOWER REG_LOWER
p LOWER REG_LOWER
q LOWER REG_LOWER
r LOWER REG_LOWER
s LOWER REG_LOWER
t LOWER REG_LOWER
u LOWER REG_LOWER
v LOWER REG_LOWER
w LOWER REG_LOWER
x LOWER REG_LOWER
y LOWER REG_LOWER
z LOWER REG_LOWER
{ UPPER LOWER
| UPPER LOWER
} UPPER LOWER
~ UPPER LOWER
UPPER LOWER
€ UPPER LOWER
UPPER LOWER
‚ UPPER LOWER
ƒ UPPER LOWER
„ UPPER LOWER
… UPPER LOWER
† UPPER LOWER
‡ UPPER LOWER
ˆ UPPER LOWER
‰ UPPER LOWER
Š UPPER LOWER
‹ UPPER LOWER
Œ UPPER LOWER
UPPER LOWER
Ž UPPER LOWER
UPPER LOWER
UPPER LOWER
‘ UPPER LOWER
’ UPPER LOWER
“ UPPER LOWER
” UPPER LOWER
• UPPER LOWER
– UPPER LOWER
— UPPER LOWER
˜ UPPER LOWER
™ UPPER LOWER
š UPPER LOWER
› UPPER LOWER
œ UPPER LOWER
UPPER LOWER
ž UPPER LOWER
Ÿ UPPER LOWER
UPPER LOWER
¡ UPPER LOWER
¢ UPPER LOWER
£ UPPER LOWER
¤ UPPER LOWER
¥ UPPER LOWER
¦ UPPER LOWER
§ UPPER LOWER
¨ UPPER LOWER
© UPPER LOWER
ª UPPER LOWER
« UPPER LOWER
¬ UPPER LOWER
UPPER LOWER
® UPPER LOWER
¯ UPPER LOWER
° UPPER LOWER
± UPPER LOWER
² UPPER LOWER
³ UPPER LOWER
´ UPPER LOWER
µ UPPER LOWER
¶ UPPER LOWER
· UPPER LOWER
¸ UPPER LOWER
¹ UPPER LOWER
º UPPER LOWER
» UPPER LOWER
¼ UPPER LOWER
½ UPPER LOWER
¾ UPPER LOWER
¿ UPPER LOWER
À UPPER
Á UPPER
 UPPER
à UPPER
Ä UPPER
Å UPPER
Æ UPPER
Ç UPPER
È UPPER
É UPPER
Ê UPPER
Ë UPPER
Ì UPPER
Í UPPER
Î UPPER
Ï UPPER
Ð UPPER
Ñ UPPER
Ò UPPER
Ó UPPER
Ô UPPER
Õ UPPER
Ö UPPER
× UPPER LOWER
Ø UPPER
Ù UPPER
Ú UPPER
Û UPPER
Ü UPPER
Ý UPPER
Þ UPPER
ß LOWER
à LOWER
á LOWER
â LOWER
ã LOWER
ä LOWER
å LOWER
æ LOWER
ç LOWER
è LOWER
é LOWER
ê LOWER
ë LOWER
ì LOWER
í LOWER
î LOWER
ï LOWER
ð LOWER
ñ LOWER
ò LOWER
ó LOWER
ô LOWER
õ LOWER
ö LOWER
÷ UPPER LOWER
ø LOWER
ù LOWER
ú LOWER
û LOWER
ü LOWER
ý LOWER
þ LOWER
ÿ LOWER
Ā UPPER
ā LOWER
Ă UPPER
ă LOWER
Ą UPPER
ą LOWER
Ć UPPER
ć LOWER
Ĉ UPPER
ĉ LOWER
Ċ UPPER
ċ LOWER
Č UPPER
č LOWER
Ď UPPER
ď LOWER
Đ UPPER
đ LOWER
Ē UPPER
ē LOWER
Ĕ UPPER
ĕ LOWER
Ė UPPER
ė LOWER
Ę UPPER
ę LOWER
Ě UPPER
ě LOWER
Ĝ UPPER
ĝ LOWER
Ğ UPPER
ğ LOWER
Ġ UPPER
ġ LOWER
Ģ UPPER
ģ LOWER
Ĥ UPPER
ĥ LOWER
Ħ UPPER
ħ LOWER
Ĩ UPPER
ĩ LOWER
Ī UPPER
ī LOWER
Ĭ UPPER
Unicode can't be displayed here, but you get the idea. The results: http://pastebin.com/HP5dzBLF

As you can see, the isupper and islower operators are true for more than just letters. I can work around this with the isletter operator. You can also see that the regex character classes only match a-z and A-Z. Writing a regex by hand that covers every uppercase or lowercase letter wouldn't be feasible, as they are not grouped together (as you can see at the end of the output). Is there another (fast) way to properly distinguish lowercase from uppercase letters?
Pan-dimensional mouse
Joined: Dec 2002
Posts: 294
I don't think there is any bug here. mIRC just defines "isupper" and "islower" differently than you were expecting. In particular, the following two lines will always give the same result, and you can think of the first line as a shortcut for the second:

if (%c isupper) { ... }
if (%c === $upper(%c)) { ... }

Assuming you want the same behavior you get with regex, and if you are working with a variable %c that contains a single character, you can use:

if ($asc(%c) isnum 65-90) { echo -a %c is an uppercase letter }
if ($asc(%c) isnum 97-122) { echo -a %c is a lowercase letter }

I'm not sure if there is a more efficient way, though.
moocat
Yeah, this will work:

if (%c === $upper(%c)) { ... }

But this won't: (as it will be true for signs too)

The problem comes in when you want to count the uppercase letters in a message, for example. Without a regex you'd need to use isupper and islower and go through the message character by character. For example:

char.upper {
  ; $1- Message
  tokenize 32 $remove($1-, $chr(32))
  var %i = 1, %c = 0
  while (%i <= $len($1-)) {
    if ($mid($1-, %i, 1) === $upper($v1)) { inc %c }
    inc %i
  }
  return %c
}

Now that works, but a regex character class is very much faster. 1000 iterations of that on my computer take 1872 ticks; with the [[:upper:]] regex class it takes 63 (but that doesn't include the Unicode uppercase letters I need, so it's useless).
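For anyone wanting to reproduce a comparison like this, here is a rough sketch of how the timing could be done. The alias name bench.upper and the count of 1000 iterations are assumptions for illustration, and it expects the char.upper alias from this post to be loaded. Timings will of course differ between machines and messages.

```mirc
; bench.upper is a hypothetical name; this form goes in a remote script file.
; Usage: /bench.upper Some Test MESSAGE
alias bench.upper {
  var %n = 1000, %t = $ticks
  while (%n > 0) {
    ; char-by-char loop, using the char.upper alias from the post above
    noop $char.upper($1-)
    dec %n
  }
  echo -a char.upper: $calc($ticks - %t) ticks

  var %n = 1000, %t = $ticks
  while (%n > 0) {
    ; POSIX character class, which only matches A-Z
    noop $regex($1-, /[[:upper:]]/g)
    dec %n
  }
  echo -a regex: $calc($ticks - %t) ticks
}
```

$ticks counts milliseconds, so each echoed figure is the elapsed time for the whole loop, not for a single call.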
Hoopy frood
Joined: Jul 2006
Posts: 4,020
This thread might help you: https://forums.mirc.com/ubbthreads.php?ubb=showflat&Board=8&Number=214238&Searchpage=1&Main=39805&Words=%2Bupper+%2Blower&topic=0&Search=true#Post214238
#mircscripting @ irc.swiftirc.net == the best mIRC help channel
moocat
Thanks, that explains the isupper/islower issue. Either way, using (%c isupper && $v1 isletter) isn't much slower than just using (%c isupper), so I guess there isn't really a problem there.

However, my problem with the regex still stands, as it does not recognize Unicode and looping through characters is too inefficient. Is there a way around this, or is it simply not supported?
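As a sketch of the isupper plus isletter combination mentioned here, the counting loop could look something like this. The alias name char.upper.letters and the variable names are made up for illustration, and this version names the character explicitly in both checks instead of relying on $v1.

```mirc
; char.upper.letters is a hypothetical name for a variant of char.upper
; that ignores digits, punctuation and other non-letters.
alias char.upper.letters {
  ; $1- = message; spaces are removed before counting
  var %text = $remove($1-, $chr(32))
  var %i = 1, %c = 0
  while (%i <= $len(%text)) {
    var %ch = $mid(%text, %i, 1)
    ; count only characters that are both uppercase and letters
    if ((%ch isupper) && (%ch isletter)) inc %c
    inc %i
  }
  return %c
}
```

It would be called as an identifier, for example //echo -a $char.upper.letters(Hello WORLD 123!). It is still a character-by-character loop, so it does nothing for the speed problem.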
Hoopy frood
Joined: Dec 2002
Posts: 3,015
"However my problem with the regex still stands as it does not recognize unicode and looping through chars is too inefficient. Is there a way around this or is it simply not supported?"

You can enable UTF-8 mode using the (*UTF8) sequence. To match uppercase and lowercase characters, use \p{Lu} and \p{Ll} respectively, for example:

//echo -a $iif($regex($chr(256), /(*UTF8)\p{Lu}/g), REG_UPPER) $iif($regex($chr(256), /(*UTF8)\p{Ll}/g), REG_LOWER)
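Since $regex returns the number of matches when the /g modifier is used, the same patterns can also be used to count the uppercase or lowercase letters in a whole message without a loop. A minimal sketch, where the alias names count.upper and count.lower are assumptions for illustration:

```mirc
; count.upper and count.lower are hypothetical names.
alias count.upper {
  ; $1- = message; returns how many characters are in Unicode category Lu
  return $regex($1-, /(*UTF8)\p{Lu}/g)
}

alias count.lower {
  ; $1- = message; returns how many characters are in Unicode category Ll
  return $regex($1-, /(*UTF8)\p{Ll}/g)
}
```

These would be used as identifiers, for example //echo -a $count.upper(Hello WORLD) $count.lower(Hello WORLD).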
moocat
That is truly beautiful. Went from 1825 ticks to 172. Thank you so much, good sir. Here's a quick reference for the different Unicode categories if anyone else needs them: http://www.fileformat.info/info/unicode/category/index.htm
Fjord artisan
Joined: Feb 2006
Posts: 523
there are a number of peculiar discrepancies between these types of operations in mIRC. unfortunately the identity suggested by drum doesn't hold true for over half of the range of characters supported by $chr()! here are a few examples of pairs of seemingly identical checks along with the number of characters for which they have different results:

if ($chr(N) isupper)
if ($chr(N) === $upper($chr(N)))
45,825 chars, $chr(223) is the first.

if ($chr(N) islower)
if ($chr(N) === $lower($chr(N)))
45,603 chars, $chr(304) is the first.

these results are mostly accounted for by the 45,533 characters which are neither upper nor lower (according to islower and isupper), the first example being $chr(443).

if ($chr(N) isalnum)
if ($chr(N) isalpha) || ($chr(N) isnum)
303 characters, $chr(178) is the first.

on the plus side: the following are, rather unremarkably, pairs of equivalent checks:

if ($chr(N) isupper)
if ($isupper($chr(N)))

if ($chr(N) islower)
if ($islower($chr(N)))

if ($chr(N) isalpha)
if ($chr(N) isletter)
"The only excuse for making a useless script is that one admires it intensely" - Oscar Wilde
Pan-dimensional mouse
Joined: Dec 2002
Posts: 294
"unfortunately the identity suggested by drum doesn't hold true for over half of the range of characters supported by $chr()!"

Thanks for pointing that out. My error was in skimming the help file and misreading what it said. Still, I probably should have tested it before stating it.
Hoopy frood
Joined: Oct 2003
Posts: 3,641
I'm confused about the direction of this conversation.
Is the consensus that the is* (islower, isupper, etc) operators should be updated to support unicode characters? Or are we saying this is not a bug and just the "Way It Works"(tm)?
Fixing the operators to support Unicode would be my suggestion, but nobody has really stated what the solution should be-- I'm only seeing descriptions of the problem.
Pan-dimensional mouse
Joined: Dec 2002
Posts: 294
There does appear to be a quirk where $upper() will not correctly replace a lowercase letter with its uppercase equivalent. The example that jaytea gave was $chr(223), which is a German lowercase character (ß). However, this link explains what is going on, and why it probably shouldn't be considered an mIRC bug (but rather a limitation of Microsoft's Unicode routines): http://blogs.msdn.com/b/michkap/archive/2005/04/10/406880.aspx

Also, to clarify: mIRC's case routines do support Unicode already; it's just that there are quirks like this one. The reason the OP didn't want to use mIRC's routines was that they were inefficient at counting the number of uppercase/lowercase characters in a given string compared to regex.
Last edited by drum; 07/09/10 01:53 PM.