The Un-Official Proxomitron Forum

document.write(unescape
Hi,

Many sportstream sites (http://www.fromsport.com, for example) make heavy use of all kinds of tricks to hide their (sometimes not so nice) code, such as countless document.write(unescape(...)) scripts.

Would it be hard to do, or is anyone willing to give me a little push on how to create a filter that still allows the unescape script but adds a comment after it containing the decoded (unescaped) source of the script above?

Something like this :
Code:
<script language='Javascript'>document.write(unescape('%54%6f%72%6f%6e%74%6f%20%4d%61%70%6c%65%20%4c%65%61%66%73%20%76%73%20%57%61%73%68%69%6e%67%74%6f%6e%20%43%61%70%69%74%61%6c%73'));</script>
<!--
Unescape script above reads :
Toronto Maple Leafs vs Washington Capitals
//-->
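For reference, this is roughly what the decoding involves: each %XX is one hex byte and each %uXXXX one UTF-16 code unit, which is what JavaScript's legacy unescape() turns back into characters. A minimal JavaScript sketch; decodeEscapes is a hypothetical name, not anything built into Proxomitron:

```javascript
// Hypothetical helper: decode %XX (one byte) and %uXXXX (one UTF-16 code unit)
// escapes, mirroring what JavaScript's legacy unescape() does with them.
function decodeEscapes(s) {
  return s.replace(/%u([0-9a-fA-F]{4})|%([0-9a-fA-F]{2})/g, (match, u4, h2) =>
    String.fromCharCode(parseInt(u4 !== undefined ? u4 : h2, 16))
  );
}

const escaped =
  "%54%6f%72%6f%6e%74%6f%20%4d%61%70%6c%65%20%4c%65%61%66%73" +
  "%20%76%73%20%57%61%73%68%69%6e%67%74%6f%6e%20%43%61%70%69%74%61%6c%73";

// Build the comment that should follow the </script> tag.
console.log("<!--\nUnescape script above reads :\n" + decodeEscapes(escaped) + "\n//-->");
```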
D'oh!
Code:
[Patterns]
Name = "New HTML filter"
Active = FALSE
Bounds = "<script*</script>"
Limit = 256
Match = "\0"
        "&*document.write\(unescape\($AV(\1)"
Replace = "\0\r\n"
          "<!--"
          "Unescape script above reads :"
          "$UESC(\1)"
          "//-->"
Thanks JJoe,

It doesn't catch most of these, for example:
Code:
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
but it's a start.
I don't even know whether it's possible to catch exactly what it should; I'll probably have to take into account that all those calls start with a ( and end with a ).
My limited knowledge isn't up to handling it perfectly yet, but I'll keep trying.
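The "( ... )" idea can be sketched in JavaScript, assuming the call always appears literally as document.write(unescape( . It walks forward from the opening parenthesis, counting nesting and skipping anything inside quotes, so a concatenated argument like the Google Analytics one is captured whole. All helper names are hypothetical. Variables such as gaJsHost only get a value when the page runs, so a static filter can only decode the quoted pieces; the sketch shows the variable name in brackets instead.

```javascript
// Hypothetical helper: starting just after an opening "(", return the text up
// to the matching ")". Brackets inside '...' or "..." literals are ignored and
// a backslash inside a literal skips the next character. Null if unbalanced.
function matchingParenContent(src, start) {
  let depth = 1;
  let quote = null;
  for (let i = start; i < src.length; i++) {
    const c = src[i];
    if (quote !== null) {
      if (c === "\\") i++;                // skip the escaped character
      else if (c === quote) quote = null; // end of the string literal
      continue;
    }
    if (c === '"' || c === "'") quote = c;
    else if (c === "(") depth++;
    else if (c === ")" && --depth === 0) return src.slice(start, i);
  }
  return null;
}

// Collect the raw argument text of every document.write(unescape(...)) call.
function unescapeArguments(src) {
  const marker = "document.write(unescape(";
  const args = [];
  let pos = src.indexOf(marker);
  while (pos !== -1) {
    const start = pos + marker.length;
    const arg = matchingParenContent(src, start);
    if (arg === null) break; // unbalanced: give up on the rest
    args.push(arg);
    pos = src.indexOf(marker, start + arg.length);
  }
  return args;
}

// Decode the quoted pieces with JavaScript's own unescape(); anything else
// (e.g. the variable gaJsHost) is only known at run time, so show it as [name].
// Backslash escapes inside the literals are not interpreted.
function describeArgument(arg) {
  const tokens = arg.match(/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|[^+\s]+/g) || [];
  return tokens
    .map((t) => {
      const quoted =
        t.length >= 2 && (t[0] === '"' || t[0] === "'") && t[t.length - 1] === t[0];
      return quoted ? unescape(t.slice(1, -1)) : "[" + t + "]";
    })
    .join("");
}

const sample = `document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));`;
for (const arg of unescapeArguments(sample)) console.log(describeArgument(arg));
```

The quote tracking matters here: the single quotes inside the double-quoted pieces would otherwise confuse a naive "up to the next )" match.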

EDIT: Currently I have this, which *seems* to catch them all at first sight:
Code:
[Patterns]
Name = "Log Unescaped javascript code - Tpy - TEST"
Active = TRUE
Multi = TRUE
Bounds = "<script*</script>"
Limit = 4096
Match = "\0"
        "&*document.write\(unescape\("\1"\)"
Replace = "\0\r\n"
          "<!--"
          "Unescape script above reads :\r\n"
          "$UESC(\1)"
          "\r\n//-->\r\n"
...yet I don't know why the unescaped replacement still shows %20 instead of a space:
Code:
<script language='Javascript'>document.write(unescape('%4e%41%53%43%41%52%20%53%70%72%69%6e%74%20%43%75%70'));</script>

<!--Unescape script above reads :
NASCAR%20Sprint%20Cup
//-->
I don't think Proxomitron's unescape routine ($UESC) decodes space (%20), tab (\t, %09), carriage return (\r, %0d), newline (\n, %0a), or other non-displayable ASCII characters.
http://www.proxomitron.info/45/help/Matc....html#UESC

Code:
[Patterns]
Name = "Log Unescaped javascript code - Tpy - TEST"
Active = TRUE
Multi = TRUE
Bounds = "<script*</script>"
Limit = 4096
Match = "*document.write\(unescape\("(\#%20$SET(\#= ))+\#"\)"
        "&\0"
Replace = "\0\r\n"
          "<!--"
          "Unescape script above reads :\r\n"
          "$UESC(\@)"
          "\r\n//-->\r\n"

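To see why the first filter left %20 in place and what the revised one changes, here's a JavaScript model. uescLike is a hypothetical imitation of the behaviour described above (space, tab, CR, LF and other control characters left encoded), not Proxomitron's actual $UESC code. As I read the revised Match, its (\#%20$SET(\#= ))+\# part effectively swaps each %20 for a literal space before $UESC runs, which decodeWithSpaceFix mimics.

```javascript
// Hypothetical imitation of the $UESC behaviour described above: %XX escapes
// are decoded, except space, tab, CR, LF and other control characters
// (codes 0x00-0x20 and 0x7F), which are left encoded. Not Proxomitron's code.
function uescLike(s) {
  return s.replace(/%([0-9a-fA-F]{2})/g, (match, hex) => {
    const code = parseInt(hex, 16);
    return code <= 0x20 || code === 0x7f ? match : String.fromCharCode(code);
  });
}

// The revised filter's idea: turn every %20 into a literal space first,
// then let the unescape routine handle the remaining escapes.
function decodeWithSpaceFix(s) {
  return uescLike(s.split("%20").join(" "));
}

const nascar = "%4e%41%53%43%41%52%20%53%70%72%69%6e%74%20%43%75%70";
console.log(uescLike(nascar));           // NASCAR%20Sprint%20Cup
console.log(decodeWithSpaceFix(nascar)); // NASCAR Sprint Cup
```

Only %20 gets this treatment, so (under the same assumption) an escaped tab or newline would still come out encoded.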
Checking for "document.write" first should speed up page load by letting the filter fail quicker.
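The same fail-fast idea outside Proxomitron, as a JavaScript sketch: a plain substring check rejects most script blocks before the regular expression runs. annotateScript and CALL_RE are hypothetical names, and the pattern only handles a single quoted literal, not concatenations.

```javascript
// One literal argument in '...' or "...", e.g. document.write(unescape('%41'));
// Concatenations such as "..." + gaJsHost + "..." are not handled here.
const CALL_RE = /document\.write\(unescape\((["'])(.*?)\1\)\)/g;

// Hypothetical helper: append a comment with the decoded text of every call
// found in one <script>...</script> block.
function annotateScript(block) {
  // Fail fast: most scripts never mention document.write, so skip the regex.
  if (!block.includes("document.write")) return block;

  const decoded = [];
  for (const m of block.matchAll(CALL_RE)) decoded.push(unescape(m[2]));
  if (decoded.length === 0) return block;

  return block + "\r\n<!--\r\nUnescape script above reads :\r\n" +
    decoded.join("\r\n") + "\r\n//-->\r\n";
}

console.log(annotateScript("<script>var x = 1;</script>"));
console.log(annotateScript(
  "<script language='Javascript'>document.write(unescape(" +
  "'%4e%41%53%43%41%52%20%53%70%72%69%6e%74%20%43%75%70'));</script>"));
```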

Sidki's set does have a converter.
"Test Window: Unescape Strings" unescapes most escaped characters in code entered in Proxomitron's Test Window.