Joined: Jan 2021
Posts: 32
Ameglian cow
OP
Hello and welcome to my tutorial on how to grab the title of a YouTube link, either by typing //youtube youtubelink or by sitting in a channel when someone posts a YouTube link. I find this handy: instead of being surprised each time you click a YouTube link that someone has posted, you are told the title of the video first. Who knows, maybe it's a rickroll?

To get this to work we need to use the YouTube Data API v3. It used to be easy to just read the YouTube page itself, but that is much harder now, which is why we use the API. All you need is a Gmail account so we can get the API key. Google will of course log everything you do when using the API, so use a new or old Gmail account that you only use for sites you don't care much about.

Let's start by generating the API key.

1. Log in to Gmail, then visit https://console.developers.google.com/.
2. Click Credentials on the left.
3. Click + CREATE CREDENTIALS at the top middle of the screen and select API key. A popup should appear.
4. Since we don't want the key to be usable for anything other than YouTube, select RESTRICT KEY. You should be redirected to a new page.
5. Go down to API restrictions and select Restrict key.
6. Click Select APIs, select YouTube Data API v3, and click OK. Under Selected APIs you should now see YouTube Data API v3, which means the key can only be used for the YouTube API.
7. Click SAVE. You should be redirected to the Credentials page, where API key 1 is now listed.

Copy the key from the Key column so we can use it in our script. But remember, don't share your key with other people. Let them make their own API keys.

Now that we have the API key, we can insert it into the YouTube script.
; Trigger on any channel message that contains "youtu".
on *:text:*youtu*:#: {
  /youtube $1-
}
alias youtube {
  ; Replace YourAPIKey with the key copied from the Credentials page.
  set %apikey YourAPIKey
  if ($regex($1-,(https:\/\/www\.youtube\.com\/watch\?v=[0-9a-zA-Z-_]{8,20}))) {
    var %x1 $regex($regml(1),([0-9a-zA-Z-_]{8,20}))
    set %id $regml(1)
  }
  else if ($regex($1-,(https:\/\/youtu\.be\/[0-9a-zA-Z-_]{8,20}))) {
    var %x1 $regex($regml(1),([0-9a-zA-Z-_]{8,20}))
    set %id $regml(1)
  }
  ; No YouTube link found: stop here instead of looking up a stale %id.
  else { return }
  ; Close a lookup that is still open, otherwise /sockopen reports the socket as in use.
  if ($sock(ytube)) { sockclose ytube }
  sockopen -e ytube www.googleapis.com 443
}
on *:sockopen:ytube:{
  if (!$sockerr) {
    ; Ask the API for the snippet (which holds the title) of the video in %id.
    sockwrite -n $sockname GET $+(/youtube/v3/videos?id=,%id,&key=,%apikey,&part=snippet) HTTP/1.0
    sockwrite -n $sockname Host: www.googleapis.com
    sockwrite -n $sockname $crlf
  }
  else {
    echo -a $sockerr
    sockclose $sockname
    halt
  }
}
on *:sockread:ytube:{
  if ($sockerr) { echo -a $sockerr | sockclose $sockname | return }
  else {
    var %check | sockread %check
    while ($sockbr) {
      if ("title": isin %check) {
        ; Strip the key name and quotes, then keep everything after the first colon.
        echo -a $+($chr(3),76,$chr(2),$gettok($remove($remove(%check,title),"),2-,58),$chr(2),$chr(3))
      }
      sockread %check
    }
  }
  sockclose $sockname
}
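For anyone who wants to see the same lookup outside mIRC, here is a minimal Python sketch of the three steps the script performs: pull the video ID out of a message, build the videos request, and read the title from the JSON reply. It is only an illustration, not part of the script above. The function names and the sample response body are made up for this example; only the endpoint and the id, key and part=snippet parameters come from the script. Nothing is sent over the network.

```python
import json
import re
from urllib.parse import urlencode

# Endpoint used by the mIRC script above; the helper names below are hypothetical.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/videos"

# Accepts the two link shapes the script handles: youtube.com/watch?v=ID and youtu.be/ID.
_ID_PATTERN = re.compile(
    r"https://(?:www\.youtube\.com/watch\?v=|youtu\.be/)([0-9A-Za-z_-]{8,20})"
)


def extract_video_id(text):
    """Return the first video ID found in text, or None when there is no link."""
    match = _ID_PATTERN.search(text)
    return match.group(1) if match else None


def build_videos_url(video_id, api_key):
    """Build the request URL with the id, key and part=snippet parameters."""
    query = urlencode({"id": video_id, "key": api_key, "part": "snippet"})
    return f"{API_ENDPOINT}?{query}"


def extract_titles(response_body):
    """Parse the JSON body and return the title of every video it lists."""
    data = json.loads(response_body)
    return [item["snippet"]["title"] for item in data.get("items", [])]


# Made-up response body, shaped like a reply that lists one video.
sample = '{"items": [{"id": "abcdefghijk", "snippet": {"title": "Example: a title, with \\"quotes\\""}}]}'

print(extract_video_id("look at https://youtu.be/abcdefghijk please"))
print(build_videos_url("abcdefghijk", "YourAPIKey"))
print(extract_titles(sample))
```

Parsing the reply as JSON rather than line by line means a title that contains colons, commas or quotes comes out intact.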
I hope someone finds this script useful, or maybe someone wants to pick apart the code I have written. It has been a long time since I coded in MSL.

Joined: Dec 2002
Posts: 252
Fjord artisan
Most websites (YouTube included) usually have this information within the title or meta tags describing what the content contains. With the addition of $urlget() and of being able to perform regex matches on binary variables, we can quickly rough-search the entire webpage!

Previously, using sockets, we would have to back-log it ourselves, because the chunked socket data may not fully contain our search query. For instance, looking for title, you might have "...<ti" in one chunk and "tle>..." within the next chunk, not to mention all the other complexities of HTTP/1.1, like compression, chunked data with markers of how much to read, etc. $urlget() takes care of all this for you! And regex on a binvar allows us to peek at all the data in one go, without the $maxlenl line-size limitations we might encounter otherwise, especially with JavaScript being everywhere and minified into a HUGE one-liner to save space.

I didn't make this example specific to YouTube, but it will effectively achieve the same thing for any website, as it appears you're only searching for title data anyway, and it requires no 3rd party APIs. I opted to also look for meta description tags for a bit more information. This is not perfect for sure, but it's just meant to be a quick and dirty example of how one might go about scraping content from modern websites these days.

Examples:

/websitedatatester https://www.youtube.com/watch?v=KxwUy2S2n-Q
1: Title: Summer - Bensound | Royalty Free Music - No Copyright Music - YouTube
2: Description: Music

/websitedatatester https://www.mirc.com
1: Title: mIRC: Internet Relay Chat client
2: Description: mIRC: Internet Relay Chat client

/websitedatatester https://www.microsoft.com
1: Title: Microsoft – Cloud, Computers, Apps & Gaming
2: Description: Explore Microsoft products and services for your home or business. Shop Surface, Microsoft 365, Xbox, Windows, Azure, and more. Find downloads and get support.

And lastly, here is the code:
; Download the page into a uniquely named binvar, then call ScrapeWebsiteData when done.
alias WebsiteDataTester { noop $urlget($1-,gb,& $+ $ticks,ScrapeWebsiteData) }
alias ScrapeWebsiteData {
  ; %b holds the name of the binvar that received the page.
  var %b = $urlget($1).target
  if ($bfind(%b,1,/<title>(.*)<\/title>/i,Title).regex) { var %title = $regml(Title,1) }
  if ($bfind(%b,1,/<meta name="description".*content="([^"]+)"(?:[^>]+)?>/i,Desc).regex) { var %desc = $regml(Desc,1) }
  echo -a 1: Title: %title
  echo -a 2: Description: %desc
}
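To illustrate the chunk problem described above, here is a small Python sketch (Python rather than mSL, purely as an illustration). It shows that a search on a single chunk misses a title split across two chunks, while searching the joined page finds it. The names scrape, TITLE_RE and DESC_RE and the sample chunks are invented for this example, and the patterns are simplified stand-ins for the ones in the alias, not copies of them.

```python
import re

# Simplified patterns for this sketch; real pages vary in attribute order and quoting.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
DESC_RE = re.compile(
    r'<meta\s+name="description"[^>]*?\scontent="([^"]*)"', re.IGNORECASE
)


def scrape(chunks):
    """Join all received chunks first, then search the whole page once."""
    page = b"".join(chunks).decode("utf-8", errors="replace")
    title = TITLE_RE.search(page)
    desc = DESC_RE.search(page)
    return (
        title.group(1).strip() if title else None,
        desc.group(1) if desc else None,
    )


# Made-up page, deliberately split in the middle of the <title> tag.
chunks = [
    b"<html><head><ti",
    b"tle>Example page</title>",
    b'<meta name="description" content="A short ',
    b'summary."></head></html>',
]

# Searching one chunk on its own finds nothing.
print(TITLE_RE.search(chunks[0].decode("utf-8")))
# Searching the joined page finds both values.
print(scrape(chunks))
```

This is the back-logging that had to be done by hand with sockets and that a whole-page download makes unnecessary.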

Joined: Jan 2004
Posts: 2,127
Hoopy frood
Just something minor to add to this post, about protecting yourself from being exploited. If all you're doing with this is echoing the Title and Description to yourself, there's no problem. The exploit happens ONLY when you send the string to the IRC server, because of the way the IRC server handles messages. If it receives a message which contains an embedded line ending (either $cr or $lf or $crlf), it treats the following text as if it's another server command. If it's impossible for the html strings for title and description to contain $cr and/or $lf, then this warning doesn't apply to this specific case, but it DOES apply to the mp3 tags I mention below.

If you're typing in the editbox for a channel named #test, and you type without quotes "Title goes here", it's the same as your client issuing the command:

/raw PRIVMSG #test :Title goes here

However, since you're sending a string scraped from a website created by an unknown person, there's the possibility that someone sharing a link in channel has done it for a page that they can control, and they can insert a $cr into a string that you'll be sending to the server. Assuming it's possible to create a webpage title consisting of

Title goes Here $cr mode #test +o maroon

... and your script sends a channel chat message containing the %title string, the equivalent of this happens:

//raw PRIVMSG #test :Title goes Here $cr mode #test +o maroon

The message everyone in channel sees will consist only of the portion preceding the $cr (or $lf or $crlf), and the remaining portion will be executed by the server as if it's a separate command with your nick's current level of privileges. In this case it will give @op to whoever is using the maroon nick, but only if you have @op status or better, which causes the server to accept it as a valid command. A valid server command could be anything from giving status to someone, kicking someone, or telling nickserv to drop YOUR account. It would be especially dangerous for someone who has IRCOP privileges, since there's another raft of powerful commands which become valid for them.

Note that the 2nd command is not an exploit against mIRC or any client, it's just how servers are designed to handle 2 commands in the same string. It cannot be used to get someone to execute code on their computer, so it can't launch something using /run or delete something using /remove etc.

The defense is simple: you can either strip the portion of the string following the line-ending character, or show the entire string with the cr/lf removed or replaced by spaces. I prefer the latter, because that allows everyone in channel to see the attempted trickery, rather than the threat being silently ignored in this case while still leaving others in channel possibly vulnerable later on. For this script, these 2 commands:
var %title = $regml(Title,1)
var %desc = $regml(Desc,1)
... would be replaced by:
var %title = $replace($regml(Title,1),$cr,$chr(32),$lf,$chr(32))
var %desc = $replace($regml(Desc,1),$cr,$chr(32),$lf,$chr(32))
Since you're sanitizing a &binvar whose name is held in %b, you can ensure all line endings are removed in a single command prior to creating the %title and %desc variables:
var %b = $urlget($1).target
becomes
var %b = $urlget($1).target
breplace %b 10 32 13 32
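The same defense, sketched in Python for readers who want to see it outside mSL. This is only an illustration under the assumption that your own code builds the raw PRIVMSG line; sanitize_for_irc and build_privmsg are hypothetical helper names, not part of any IRC library.

```python
def sanitize_for_irc(text):
    """Replace CR and LF with spaces so the text cannot start a second command."""
    return text.replace("\r", " ").replace("\n", " ")


def build_privmsg(target, text):
    """Build one raw PRIVMSG line; the only line ending is the one added here."""
    return f"PRIVMSG {target} :{sanitize_for_irc(text)}\r\n"


# A scraped title carrying an embedded CR followed by a server command.
hostile = "Title goes Here\rmode #test +o maroon"
line = build_privmsg("#test", hostile)
print(repr(line))
```

Because the line endings are replaced rather than cut off, the attempted command shows up in channel as ordinary text.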
NOTE: This warning is just as applicable for other strings sent to the server, of any kind. Another area of concern would be if you have a script which sends a 'now playing' message to channel containing the Artist/Album/Title 'tag' info from an .mp3, where someone could create a malicious tag then wait for you to announce to the channel that you're listening to that song. You can use the following aliases to scan the comment string for all mp3's in your $mp3dir and below, and it will show you which, if any, contain $cr or $lf in the 'comment' tag, though they could just as easily be inserted into other fields. It's common for cr/lf to be in a comment field, but it's unlikely that will be used in the 'now playing' message. However, someone shouldn't be putting it into the title, artist, album, etc.
alias testmp3 { noop $findfile($mp3dir,*.mp3,0,testmp3_sub $1-) }
alias -l testmp3_sub {
var %a $sound($1-).comment
if ($regex(foo,%a,[\n\r])) {
noop $regsubex(foo,%a,,,&foo)
echo 4 -a $1- comment contains cr or lf: %a $bvar(&foo,1-)
}
}

Joined: Jan 2021
Posts: 32
Ameglian cow
OP
I didn't know about $urlget. I was using sockets and ran into Cloudflare protection, which is the reason I made this script. Also, thanks to maroon for explaining $cr and $lf and how they can be exploited.
This is what I was getting at in the end: I like it when other people write about how to fix or change my script, or suggest better ways. I have learned a lot from you, Talon and Maroon.

Joined: Nov 2003
Posts: 101
Vogon poet
Currently I'm using this code, but I've added a line or two because I'd like to share the YouTube link with my friends on my IRC server. For now I'm echoing to myself for testing purposes, but I will change it to msg the channel afterwards.

alias YouTube {
  set %YTLink $1
  noop $urlget($1-,gb,& $+ $ticks,ScrapeWebsiteData)
}
alias ScrapeWebsiteData {
  var %b = $urlget($1).target
  if ($bfind(%b,1,/<title>(.*)<\/title>/i,Title).regex) { var %title = $regml(Title,1) }
  if ($bfind(%b,1,/<meta name="description".*content="([^"]+)"(?:[^>]+)?>/i,Desc).regex) { var %desc = $regml(Desc,1) }
  echo -a Link : %YTLink
  echo -a 1: Title: %title
  echo -a 2: Description: %desc
}

Output shows:

Link : https://www.y*****e.com/watch?v=JK2rqxLLMRw
1: Title: The Tiger's Apprentice Teaser Trailer (2024) - YouTube
2: Description: Film & Animation

It seems to be working. However, how do I avoid HTML entities like the one that shows up in place of the apostrophe in 's, and also &amp; (as shown in the screenshot), or any others in the future? Thanks

Joined: Jul 2006
Posts: 4,180
Hoopy frood
Use this:
alias -l HT2AS {
  var %A quot amp lt gt nbsp iexcl cent pound curren yen brvbar sect uml copy ordf $&
    laquo not shy reg macr deg plusmn sup2 sup3 acute micro para middot cedil sup1 $&
    ordm raquo frac14 frac12 frac34 iquest Agrave Aacute Acirc Atilde Auml Aring AElig $&
    Ccedil Egrave Eacute Ecirc Euml Igrave Iacute Icirc Iuml ETH Ntilde Ograve Oacute $&
    Ocirc Otilde Ouml times Oslash Ugrave Uacute Ucirc Uuml Yacute THORN szlig agrave $&
    aacute acirc atilde auml aring aelig ccedil egrave eacute ecirc euml igrave iacute $&
    icirc iuml eth ntilde ograve oacute ocirc otilde ouml divide oslash ugrave uacute $&
    ucirc uuml yacute thorn yuml trade
  var %B 34 38 60 62 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 $&
    177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 $&
    199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 $&
    221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 $&
    243 244 245 246 247 248 249 250 251 252 253 254 255 153
  return $chr($gettok(%B,$findtokcs(%a,$1,32),32))
}
alias -l html2ascii {
  var %r /&(.{2,6});/Ug
  return $regsubex($1-,%r, $iif(#* iswm \t, $chr($mid(\t,2) ), $HT2AS(\t) ))
}
Then pass the title through it when echoing:
echo -a 1: Title: $html2ascii(%title)
A Unicode version of this exists, but it's a huge script and it's rare to get Unicode HTML entities, so I'm giving you ASCII only.
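For comparison only, here is how the same entity decoding looks in Python, where the standard library's html.unescape handles named and numeric entities, Unicode ones included. This is a sketch for illustration and not a replacement for the mSL aliases; decode_entities is a hypothetical wrapper name and the sample string is made up.

```python
import html


def decode_entities(text):
    """Turn HTML entities such as &amp; or &#39; back into plain characters."""
    return html.unescape(text)


scraped = "The Tiger&#39;s Apprentice &amp; friends &eacute;"
print(decode_entities(scraped))
```

Text without entities passes through unchanged, so it is safe to apply to every scraped title.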