Obscured code on web pages
Sep. 13, 2005, 05:15 AM
Post: #16
 
That scrambled code business has been bugging me. There is real potential for nastiness that escapes filtering simply because the filtering program cannot read the code. There must be a way to remove scrambled webpage elements with filters. Kye-U's original idea was the best one: remove code if it is scrambled. How would I get Prox to recognize code that has been scrambled?
Sep. 13, 2005, 06:45 AM
Post: #17
 
Well, there are two variations of this encryption method, and the link you posted uses the easy one. It hides its decryption key inside an "unescape()" string.

My config usually blocks this type of code, as long as the escaped string contains certain keywords, ".charAt" in this case.

Have a look at "Block: Specific JS Code - Escaped". The general idea is to scan for escaped strings inside scripts, unescape them, and look at the content, bearing in mind that escaped content is more likely to be fishy than normal code (although "unescape" is often used in good scripts as well).
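
To make the "unescape, then inspect" idea concrete, here is a minimal JavaScript sketch. It is not the Proxomitron filter itself; the function name `findSuspiciousEscapes` and the keyword list are assumptions made up for illustration, and only plain %xx escapes in simple string literals are handled (no %uXXXX, no concatenated strings).

```javascript
// Hypothetical helper (not part of any config mentioned in this thread):
// find string literals passed to unescape(), decode them, and report
// which suspicious keywords show up in the decoded text.
function findSuspiciousEscapes(source, keywords = [".charAt", "fromCharCode"]) {
  const hits = [];
  // Matches unescape("...") or unescape('...') with a single plain literal.
  const pattern = /unescape\(\s*(["'])((?:(?!\1).)*)\1\s*\)/g;
  for (const match of source.matchAll(pattern)) {
    // Decode %xx escapes by hand; %uXXXX sequences are left untouched.
    const decoded = match[2].replace(/%([0-9a-f]{2})/gi, (_, hex) =>
      String.fromCharCode(parseInt(hex, 16))
    );
    const found = keywords.filter((keyword) => decoded.includes(keyword));
    if (found.length > 0) {
      hits.push({ decoded, found });
    }
  }
  return hits;
}

// Usage: the escaped string below decodes to "var x=s.charAt(0)".
const page =
  'document.write(unescape("%76%61%72%20%78%3D%73%2E%63%68%61%72%41%74%28%30%29"));';
console.log(findSuspiciousEscapes(page));
```

A keyword hit only means the script deserves a closer look, since plenty of harmless scripts call unescape() as well.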


Then there is the direct method. Sometimes it uses a "fromCharCode(....charCodeAt...)" sequence; that's easy too, and I didn't see any false hits while blocking it. But mostly the decryption key goes like:
Code:
function wsp_ne(u){var Alpha = "Az0By1Cx2Dw3Ev4Fu5Gt6Hs7Ir8Jq9KpLoMnNmOkPjQlRiShTgUfVeWdXcYbZa";var ALen = Alpha.length-1;var C = "";var I=0;for(I=0;I<u.length;I++){L = u.charAt(I);pos = Alpha.indexOf(L);if(pos == -1) C += L;else C += Alpha.charAt(ALen - pos);}return C;}
That's tricky. I'm trying to block it in my private config. Currently I use this match:
Code:
.char(Code|)At \( [a-z_][a-z0-9_.]+ [+-] [a-z_]*\)
When looking at my log, I see 15 hits. Only two are true matches; the rest are false positives (incl. Google and major news sites).
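
The false positives are easy to reproduce outside Proxomitron. Below is a rough JavaScript approximation of the match above. It is an assumption on my part, not an exact translation: Proxomitron's space and `*` wildcards are only approximated with `\s*` and `[^)]*`, and the sample lines other than the wsp_ne fragment are invented for illustration.

```javascript
// Rough regex approximation of the Proxomitron match (assumption, see above):
// .charAt( or .charCodeAt( whose argument is "identifier +/- identifier".
const charAtArithmetic =
  /\.char(?:Code)?At\s*\(\s*[a-z_][a-z0-9_.]+\s*[+-]\s*[a-z_][^)]*\)/i;

const samples = {
  // Taken from the wsp_ne decoder quoted earlier in this post.
  decoder: "C += Alpha.charAt(ALen - pos);",
  // Invented example of ordinary string handling with index arithmetic.
  benign: "var next = text.charAt(index + offset);",
  // Invented example with no arithmetic inside the parentheses.
  plain: "var first = text.charAt(0);",
};

// Returns true when the line would be flagged by the pattern.
function isFlagged(code) {
  return charAtArithmetic.test(code);
}

for (const [name, code] of Object.entries(samples)) {
  console.log(name, isFlagged(code));
}
```

Both the decoder line and the harmless index arithmetic are flagged, which is the same kind of false hit the log shows; only the plain charAt(0) call passes.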

sidki
Sep. 14, 2005, 06:25 AM
Post: #18
 
Sidki,
Quote: When looking at my log, I see 15 hits. Only two are true matches; the rest are false positives (incl. Google and major news sites).
Maybe I should think about adding a $CONFIRM if I do decide to use such a filter, to give a warning but give me leeway to allow the code through or not.
Sep. 25, 2005, 10:19 PM
Post: #19
 
I have found a solution to the popup test, using Kye-U's charAt filter.
With a small modification, it now prevents the unscrambling of the scrambled
code, when used in conjunction with a hexadecimal to ASCII filter. The hex
to ASCII filter should be the first filter Prox uses. Here is the solution:
Code:
Name = "Hexadecimal to ASCII"
Active = TRUE
Multi = TRUE
URL = "$IHDR(Content-Type: (*(html|xml)*))"
Bounds = "<(a|img|image|iframe|input|script)\9\s*</\9>"
Limit = 900
Match = "(*%[2-7][0-9a-f]*)\1"
Replace = "$UESC(\1)"


Name = "Javascript "charAt" Remover"
Active = TRUE
URL = "($TYPE(htm)|$TYPE(js))"
Limit = 128
Match = "(\w.|)charAt\(\w\)(\)|)(\+|\-)[#1-9]\)"
        "|(\w.|)charAt\(\w(\+|\-)[#1-9]\)(\)|)"
        "$SET(\9=Adding a number to an "encoded" set of characters can often lead to the"
        "download/installation of unauthorized applications/scripts to your hard-drive.)"
Replace = "Shonenscape"
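
For anyone who wants to see what the hex to ASCII step does to a page, here is a minimal JavaScript sketch of the same idea. It is a simplification and an assumption: the function name `hexToAscii` is made up, and it decodes only escapes in the %20 to %7F range that the match tests for, whereas the filter hands the whole matched span to $UESC.

```javascript
// Hypothetical stand-in for the "Hexadecimal to ASCII" step:
// turn %20..%7F escapes back into the characters they encode.
function hexToAscii(text) {
  return text.replace(/%([2-7][0-9a-f])/gi, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Usage: an escaped script fragment becomes readable, so a later
// filter (such as the charAt remover) can match against it.
const scrambled = "%3Cscript%3Ealert%281%29";
console.log(hexToAscii(scrambled));
```

This is also why the order matters: the charAt filter can only match code that has already been decoded into plain text.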
Sep. 26, 2005, 08:48 AM
Post: #20
 
Siamesecat;

There's a discussion going on right now over on the Yahoo list, about this very topic. JJoebugg included this filter, from JD5000's set:

[Patterns]
Name = "Convert: BASE16 to ASCII v.2.2 {7.d}"
Active = FALSE
Multi = TRUE
URL = "$TYPE(htm)(^$LST(Bypass-Con))"
Bounds = "<a\s[^>]++href=*>"
Limit = 768
Match = "(*%??*)\1&(^*%(3C|20|22|5B|5D)*)"
Replace = "$UESC(\1)"

Essentially, it will convert hex to ASCII. While jjoe is recommending this particular thread to the list's readers, he points out not only JD's filter set but also that sidki3003 has more of this kind of thing in his config set.

Just to pass the time, you understand.


Oddysey

I'm no longer in the rat race - the rats won't have me!
Sep. 28, 2005, 03:18 AM
Post: #21
 
Siamesecat's post was accidentally deleted! Good thing I quoted it, so that this reply makes sense. Sorry 'bout that. O.


Siamesecat;
Quote: Oddysey,
Code:
Bounds = "<a\s[^>]++href=*>"
Limit = 768
Match = "(*%??*)\1&(^*%(3C|20|22|5B|5D)*)"
Why restrict the unescape of ASCII code to only anchors? I had to add script to the list of tags for that filter I used in order to make it work on that page. The unscrambling code was a Javascript.
Why exclude "<", space, quotes, and brackets?
I believe that the original construct was aimed only at links. Of course, a URL can be reached in other ways, so I assume that the original author 'just figgered' that users could modify at will, and didn't put the finishing touches on the basic working filter.

And..... The \s catches the initial space of course, and I don't recall that an anchor tag would work if there weren't a space after the <a, so that particular test should never fail. I think the double plus signs will catch everything else you mentioned. Whether or not the quote marks remain in place, and useful in the correct context, I don't know - never had one fail or go sideways on me.


Oddysey

Sep. 28, 2005, 06:15 AM
Post: #22
 
Oddysey,
Look at the match. It excludes the codes for "<" (3C), space (20), quotes (22), "[" (5B), and "]" (5D).
Those would not be translated into ASCII characters. Why??
Sep. 29, 2005, 03:26 AM
Post: #23
 
Siamesecat;
Siamesecat Wrote:Oddysey,
Look at the match. It excludes the codes for "<" (3C), space (20), quotes (22), "[" (5B), and "]" (5D).
Those would not be translated into ASCII characters. Why??
Well, obviously I didn't go quite far enough, did I? <grrrr>

I think that, because this particular filter was designed to work on URL addresses, converting those particular characters runs the risk of turning the address into something that won't work! I can imagine this happening easily for someone who isn't using the (pretty much standard) Western Europe encoding set.

Bear in mind that I've not personally tested this, nor have I conversed with anyone who has. So..... Your Mileage May Vary!
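
Here is one way the conversion could go wrong, as a minimal JavaScript sketch. This is only a plausible reason, not something confirmed by the filter's author. The function names, the example.com URLs, and my reading that the filter leaves the whole tag alone when any of the five codes is present are all assumptions.

```javascript
// The five codes the match excludes: "<", space, quote, "[" and "]".
const SKIP_CODES = ["3C", "20", "22", "5B", "5D"];

// Decode every %xx escape, with no exclusions.
function naiveDecode(tag) {
  return tag.replace(/%([0-9a-f]{2})/gi, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// Assumed behaviour of the filter: decode the tag only if it contains
// an escape and none of the excluded codes; otherwise leave it untouched.
function guardedDecode(tag) {
  const hasEscape = /%[0-9a-f]{2}/i.test(tag);
  const upper = tag.toUpperCase();
  const hasSkipped = SKIP_CODES.some((code) => upper.includes("%" + code));
  return hasEscape && !hasSkipped ? naiveDecode(tag) : tag;
}

const safeLink = '<a href="http://example.com/%70%61%67%65">';
const riskyLink = '<a href="http://example.com/a%22%3Eb">';

console.log(guardedDecode(safeLink));  // escapes decoded
console.log(naiveDecode(riskyLink));   // decoded quote ends the href early
console.log(guardedDecode(riskyLink)); // left as it was
```

In the risky case the decoded quote mark closes the href attribute early, so the tag no longer points where it did before.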


Oddysey
