Difficulties filtering XML
Aug. 10, 2004, 11:14 PM
Post: #1
 
I'm trying to filter data that is passed in an XML stream through IE. For some reason Proxomitron doesn't appear to filter XML. Is there a way to have it parse XML as well?

Thanks,

Korax
Aug. 11, 2004, 01:44 AM
Post: #2
 
Korax;

First, welcome to the Un-Official Proxomitron Forum! Glad to have ya here, and we hope you find the forum to be useful.

Quote:For some reason Proxomitron doesn't appear to filter XML. Is there a way to have it parse XML as well?
The Proxomitron (or, as we call it around here, Proxo) is so simple in nature that it doesn't discriminate between XML and HTML, nor even JavaScript, CSS styles, etc. Proxo is only looking for strings of text, and when a match is found, the string is either modified, deleted, wrapped in more text, or even just left alone, if it was used merely to trigger some other action. Generally speaking, Proxo has no limits - all text is fair game for matching. (Unless the Bypass mode has been invoked, of course.)

In order to help you further, we'll need to ask you to provide an example of what seems to be not working. You can do this either by pasting the code directly into your reply, or by attaching a file to it. If your code is lengthy, we ask that you use the attachment method, please. That way, we don't scroll forever when navigating upwards or downwards through a thread. :P

BTW, how did you hear about us, if you don't mind my asking? TIA.


Oddysey

I'm no longer in the rat race - the rats won't have me!
Aug. 11, 2004, 07:50 AM
Post: #3
 
Quote:Is there a way to have it parse XML as well?
I use a header filter to force filtering of several types of files, using the extension to determine whether to filter. Here is the filter I have:
Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: Filter *.js/*.vbs/*.xml/*.xsl/*.xhtml (in)"
URL = "*.(js|jse|vbs|vbe|x(htm|m|s)l)"
Match = "\0"
Replace = "\0$FILTER(True)"
Aug. 11, 2004, 04:26 PM
Post: #4
 
Siamesecat;
Quote:[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: Filter *.js/*.vbs/*.xml/*.xsl/*.xhtml (in)"
URL = "*.(js|jse|vbs|vbe|x(htm|m|s)l)"
Match = "\0"
Replace = "\0$FILTER(True)"
OK, I'm feeling particularly dense this morning, so I'll bite. Why don't I see any difference between what this filter does, and Proxo's normal default behavior? I base my question on the Help file entry for $FILTER, which says in relevant part:
Quote:$FILTER
.....
Normally only specific types are filtered (like text/html, text/css, image/gif, etc).
Scott did not mean for his list to be exhaustive, only illustrative. I have always worked on the hypothesis that all text types were filtered, and so far (knock on wood), I haven't had any problems. I took this explanation to mean that one could force Proxo to pay attention to non-textual types, and manipulate filter behavior as one desired. One example would be mime-types.

This isn't to say that your filter isn't worthwhile or anything, far from it. I'm just wondering how it helps the situation in which Korax finds himself, that's all. And just for drill, how come you chose not to filter CSS styles? Inquiring minds wanna know!


Oddysey

I'm no longer in the rat race - the rats won't have me!
Aug. 11, 2004, 04:40 PM
Post: #5
 
Siamesecat;

Forgive me for bugging you again, but I felt something gnawing at me as I wrote that last post. It finally dawned on me......

In your Replace string, you use a "\0". According to the Help file, that's superfluous. You're not concerned with any matching at all, you only want to force the web filters to work on all the listed file types, no matter what their content might be. According to Scott's example, it would work equally well to eliminate the Match component entirely, and simply use the "$FILTER(True)" statement in the Replace string.

I don't think this would speed things up more than a nanosecond or so, but it would be more elegant to make Proxo do the least amount of work in order to get the job done.


Oddysey

I'm no longer in the rat race - the rats won't have me!
Aug. 11, 2004, 06:24 PM
Post: #6
 
Hi all,

Here's the filter that I use for filtering xml:

Code:
[HTTP headers]
In = FALSE
Out = FALSE
Key = "Content-Type: 5. Filter XML (in)"
Match = "((text/xml|application/xml)*)\0"
Replace = "\0$FILTER(true)"


HTH
Mike
Aug. 11, 2004, 06:36 PM
Post: #7
 
z12 Wrote:Here's the filter that I use for filtering xml:
Similar to what I used to use as well. I posted a topic on XML on Arne's board quite a while
back when I was using Proxo to filter SOAP messages. (I was going to point to it,
but Arne's site is giving an ODBC error right now)

I used "(*/xml)" to match the content type; I can't remember whether there was
another XML content type besides "text" and "application" that I needed to match.
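
A rough, untested sketch of that variant, modeled on z12's filter above with only the Match pattern swapped. The Key description is made up for illustration, and the trailing "*" inside the capture is an assumption, added so that \0 keeps any charset parameter that follows the type:
Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: Filter */xml - untested sketch (in)"
Match = "(*/xml*)\0"
Replace = "\0$FILTER(true)"
Being a wildcard, this is broader than listing text/xml and application/xml explicitly, so it may catch types the explicit list would not.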

text/plain is not filtered by default either; I have a rule that enables filtering
for it as well, although I can't remember why I needed it.
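
Such a rule could presumably follow the same pattern as z12's XML filter. A minimal, untested sketch, assuming the same \0 capture approach (the Key description is hypothetical):
Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: Filter text/plain - untested sketch (in)"
Match = "(text/plain*)\0"
Replace = "\0$FILTER(true)"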
Aug. 11, 2004, 08:17 PM
Post: #8
 
(goes off muttering about spending the afternoon making up a bunch of tests, just to see if he's been out of step with everyone else.....) (again.)

I'm no longer in the rat race - the rats won't have me!
Aug. 12, 2004, 05:50 AM
Post: #9
 
Oddysey,
I thought that Prox filtered only text/html unless you told it to filter something else, either with a $TYPE() command in a web filter or with $FILTER(true) in a header filter. I got that filter from someone else and modified it slightly.
Quote:how come you chose not to filter CSS styles?
If I think that stylesheets need to have something changed, I use $TYPE(css) because it is a web filter that I am using.
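A rough sketch of what such a web filter might look like. The Name, Limit, Match, and Replace values here are made-up placeholders, not a filter taken from this thread; only the $TYPE(css) in the URL field is the point:
Code:
[Patterns]
Name = "CSS example - hypothetical sketch"
Active = TRUE
URL = "$TYPE(css)"
Limit = 32
Match = "cursor: wait"
Replace = "cursor: default"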
Quote:According to Scott's example, it would work equally well to eliminate the Match component entirely, and simply use the "$FILTER(True)" statement in the Replace string.
It just seemed to me to be the best way to handle the problem. I wanted to make sure that the file type, whatever it was, was going to be filtered. I probably got the idea from a filter from someone else that did something similar.

pooms,
Quote:text/plain is not filtered by default either, I have a rule for enabling filtering for it
as well, although I can't remember why I needed it.
If you use Internet Explorer, you might well want to filter plain text, since a file can have the content-type (and extension) stated as plain text, but if that file contains HTML tags, IE will render it as HTML.