Need help with text replacement
Nov. 29, 2005, 03:10 PM
Post: #1
Hi!

I just downloaded Proxomitron a few hours ago and I'm having some problems getting it to work properly. I would like to use the program to filter some offensive/negative words from web pages and replace them with stars ('*****'). I briefly went through the online help and tried a few of the replace commands (/w, /s, etc.), but none of them helped.

So far I have managed to get the filtering to work, but it doesn't work like I would like it to. For example, if I tell Proxomitron to replace the word 'kill' with '****' and then load a web page, it would replace all instances of 'kill' with '****' as expected. However, if I come across a web page with the word 'skill' on it, it would show 's****' instead, and all instances of the word 'skilled' would show up as 's****ed'. I obviously don't want words like skill, skilled or skillful to be censored so how do I stop this from happening?
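The behaviour described above is what any plain substring replacement does. Here is a minimal Python sketch of the problem (Python is used only for illustration; it is not Proxomitron syntax, and the sample sentence is made up):

```python
# Plain substring replacement: every occurrence of "kill" is replaced,
# including the one inside "skill" and "skilled".
text = "A kill is bad, but skill and skilled are fine."
censored = text.replace("kill", "****")
print(censored)  # A **** is bad, but s**** and s****ed are fine.
```

To avoid this, the match has to look at the characters on either side of the word, not just the word itself.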
Nov. 29, 2005, 09:56 PM
Post: #2
 
Code:
[Patterns]
Name = "Filter Offensive Language"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 256
Match = "(?)\1"
        "kill$SET(9=****)"
        "|stupid$SET(9=******)"
        "(?)\2$TST(\1=\s)$TST(\2=\s)"
Replace = " \9 "

This makes the filter match a listed word only when it has a space directly in front of it and a space directly after it.

For example, it won't match:

Quote:skill

But will match:

Quote:A kill is bad
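As a rough illustration of the space-before/space-after idea, here is a small Python sketch (not Proxomitron syntax; the function name `censor_space_delimited` is made up for this example). It also shows the cases this approach misses: a word at the very start of the text or one followed by punctuation.

```python
import re

def censor_space_delimited(text, word):
    """Replace `word` with stars only when a space sits on both sides of it."""
    pattern = " " + re.escape(word) + " "
    stars = " " + "*" * len(word) + " "
    return re.sub(pattern, stars, text)

print(censor_space_delimited("A kill is bad", "kill"))   # A **** is bad
print(censor_space_delimited("A skill is good", "kill")) # unchanged
print(censor_space_delimited("Don't kill.", "kill"))     # unchanged: "." is not a space
print(censor_space_delimited("kill it", "kill"))         # unchanged: no space in front
```

So requiring literal spaces protects 'skill', but it is stricter than a real word boundary.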

I'm sure there are ways to improve the filter, but I can't think of any other solution right now. Perhaps Sidki can suggest something.

BTW, welcome!

Nov. 29, 2005, 10:31 PM
Post: #3
 
Kye-U Wrote:
Code:
[Patterns]
Name = "Filter Offensive Language"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 256
Match = "(?)\1"
        " **** $SET(9=****)"
        "|stupid$SET(9=******)"
        "(?)\2$TST(\1=\s)$TST(\2=\s)"
Replace = " \9 "

This makes it so that it will only match the words included that have a space in front and a space at the end of it.

For example, it won't match:

Quote: ****

But will match:

Quote:A **** is bad

I'm sure there are ways to improve the filter, but I can't think of any other solution right now. Perhaps Sidki can suggest something.

BTW, welcome!


Sorry, but your filter doesn't work properly; it does the same thing as the filter I created earlier. It still censors the words 'skill', 'skilled', etc.

Thanks for trying, though.
Nov. 30, 2005, 10:11 AM
Post: #4
 
take a look at http://www.w3.org and search for PICS-Label...

perhaps you can tackle your little censoring project by merely blocking access to all sites whose HTML header contains a PICS-Label with an 'age range' of "Adults Only", "Adult Supervision Recommended", "Older Teens", et cetera... or blocking access to sites containing a PICS-Label with a 'category' of "Sex Violence and Profanity", for example...


would seem to me that filtering a page's entire textual content for various four-letter words, for example, would cause the page to load very, very slowly...
Nov. 30, 2005, 10:43 AM
Post: #5
 
ProxRocks Wrote:take a look at http://www.w3.org and search for PICS-Label...

perhaps you can tackle your little censoring project by merely blocking access to all sites whose HTML header contains a PICS-Label with an 'age range' of "Adults Only", "Adult Supervision Recommended", "Older Teens", et cetera... or blocking access to sites containing a PICS-Label with a 'category' of "Sex Violence and Profanity", for example...


would seem to me that filtering a page's entire textual content for various four-letter words, for example, would cause the page to load very, very slowly...

I'm not that bothered about blocking web sites with adult content, I just want certain words censored. I suffer from a mental illness, and some negative words TRIGGER severe panic attacks, or trigger negative thoughts/mental chatter in my head. Unfortunately I see these negative words every day when I visit various web sites and forums, so I want them censored in my browser.
Nov. 30, 2005, 11:26 AM
Post: #6
 
would not writing the filter itself induce said panic attack?
sorry, just messin'...

we have a guy here that goes into convulsions every time someone says "Republican", "GOP", or "welfare reform"...
ironically, "social security reform" doesn't seem to affect him so adversely...


but anyway, off topic...
Nov. 30, 2005, 11:42 AM
Post: #7
 
Igraz Wrote:
Kye-U Wrote:
Code:
[Patterns]
Name = "Filter Offensive Language"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 256
Match = "(?)\1"
        " **** $SET(9=****)"
        "|stupid$SET(9=******)"
        "(?)\2$TST(\1=\s)$TST(\2=\s)"
Replace = " \9 "

Sorry, but your filter doesn't work properly; it does the same thing as the filter I created earlier. It still censors the words 'skill', 'skilled', etc.

Thanks for trying, though.

Try something like this instead (have not tried it):
Code:
[Patterns]
Name = "Trap 'kill', leave 'skill'..."
Active = TRUE
URL = "$TYPE(htm)"
Limit = 8
Match = "(^[a-z])kill(^[a-z])"
Replace = "****"
Basically, so long as 'kill' has any letter before or after it, it is NOT matched and therefore NOT replaced... (that is, assuming I have the ^ syntax correct, I haven't tried it...)

You'd obviously need to tweak it 'cause the above will really only work for four-letter words... and you'd obviously want a $LST...
Nov. 30, 2005, 12:21 PM
Post: #8
 
ProxRocks Wrote:Try something like this instead (have not tried it):
Code:
[Patterns]
Name = "Trap 'kill', leave 'skill'..."
Active = TRUE
URL = "$TYPE(htm)"
Limit = 8
Match = "(^[a-z])kill(^[a-z])"
Replace = "****"
Basically, so long as 'kill' has any letter before or after it, it is NOT matched and therefore NOT replaced... (that is, assuming I have the ^ syntax correct, I haven't tried it...)

You'd obviously need to tweak it 'cause the above will really only work for four-letter words... and you'd obviously want a $LST...

That filter doesn't work either; it doesn't even censor the word 'kill'. You did say you never tried the filter, though, so I'll let you off this time. :p
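One possible explanation, offered as an assumption rather than a tested diagnosis: if Proxomitron's `(^...)` is a negation that consumes no characters, then `(^[a-z])kill` asks for a position where the next character is not a letter and is also the 'k' of 'kill', which can never be true. The Python sketch below mimics that with a negative lookahead and contrasts it with a check that looks behind the word instead. The sample text and variable names are made up.

```python
import re

text = "A kill is bad, but skill is fine."

# Analogue of "(^[a-z])kill(^[a-z])" under the assumption above:
# the leading check looks at the 'k' itself, so nothing can match.
lookahead_only = re.compile(r"(?![a-z])kill(?![a-z])")
never_matches = lookahead_only.search(text)
print(never_matches)  # None

# Checking the character *before* the word instead gives the intended result.
boundary = re.compile(r"(?<![a-z])kill(?![a-z])")
fixed = boundary.sub("****", text)
print(fixed)  # A **** is bad, but skill is fine.
```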
Nov. 30, 2005, 12:35 PM
Post: #9
 
oh well, was only supposed to invoke thought, not as a direct plug-and-play, import-it-and-run-with-it solution...

for kill versus skill, import this directly (no guarantees)...
Code:
[Patterns]
Name = "Trap 'kill', leave 'skill'..."
Active = TRUE
URL = "$TYPE(htm)"
Limit = 8
Match = "(^s)kill"
Replace = "****"
or maybe:
Code:
[Patterns]
Name = "Trap 'kill', leave 'skill'..."
Active = TRUE
URL = "$TYPE(htm)"
Limit = 8
Match = "[^s]kill"
Replace = "****"
or maybe:
Code:
[Patterns]
Name = "Trap 'kill', leave 'skill'..."
Active = TRUE
URL = "$TYPE(htm)"
Limit = 8
Match = "(^s)(kill)"
Replace = "****"
I don't know "exactly" how the syntax "should" be, play around with it...
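For anyone playing around with these variants, here is a Python sketch of how the two kinds of test differ, assuming `(^s)` behaves like a negative lookahead and `[^s]` like an ordinary one-character class (both are assumptions about Proxomitron, and the sample sentence is made up):

```python
import re

text = "They kill time; skill and overkill differ."

# "(^s)kill" analogue: the check looks at the 'k', which is never 's',
# so every "kill" is still replaced, including the one in "skill".
lookahead = re.sub(r"(?!s)kill", "****", text)
print(lookahead)   # They **** time; s**** and over**** differ.

# "[^s]kill" analogue: the preceding character is part of the match,
# so it is swallowed by the replacement, and "overkill" is still hit.
char_class = re.sub(r"[^s]kill", "****", text)
print(char_class)  # They**** time; skill and ove**** differ.
```

Neither variant gives a true word boundary: the first ignores what comes before the word, and the second only excludes the single letter 's' and eats the character it tested.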
Nov. 30, 2005, 08:12 PM
Post: #10
 
Igraz Wrote:I'm not that bothered about blocking web sites with adult content, I just want certain words censored. I suffer from a mental illness, and some negative words TRIGGER severe panic attacks, or trigger negative thoughts/mental chatter in my head. Unfortunately I see these negative words every day when I visit various web sites and forums, so I want them censored in my browser.
Normally, I'd try to limit the filter to text that is seen but, since you have a significant problem, let's be more aggressive. For now I'd filter all $TYPEs.

Some warnings:
A problem is pictures that contain the words.
Another is filters that will match the words first and hide them from your censor filter.
Also, the words can be obfuscated or jumbled in the page's code.


Try:

[Patterns]
Name = "Censor words with list"
Active = TRUE
Multi = TRUE
Bounds = "[^a-z]$LST(BadStrings)[^a-z]"
Limit = 256
Match = "([^a-z])\#(?$SET(#=*))++([^a-z])\#"
Replace = "\@"

or:

[Patterns]
Name = "Censor words no list"
Active = TRUE
Multi = TRUE
Bounds = "[^a-z](kill|nextwordhere|anotherword)[^a-z]"
Limit = 256
Match = "([^a-z])\#(?$SET(#=*))++([^a-z])\#"
Replace = "\@"

You'll need to add a list to Proxomitron to use the first.
For the second you can change or add to the Bounds.
Copy and import the filters.
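To show the intended effect of the two filters (a listed word is censored only when no letter touches it, and each of its characters becomes a star), here is a minimal Python sketch. It is not a translation of the Proxomitron syntax: the word list, the function name `censor` and the case-insensitive matching are assumptions made for the example, and it uses lookarounds, so unlike the `[^a-z]` bounds it also matches at the very start or end of the text.

```python
import re

BAD_WORDS = ["kill", "stupid"]  # hypothetical list, stands in for BadStrings

def censor(text, words=BAD_WORDS):
    """Star out whole words from `words`; leave words that merely contain them."""
    # Longest first, so "words" would be tried before "word".
    ordered = sorted(words, key=len, reverse=True)
    alternation = "|".join(re.escape(w) for w in ordered)
    pattern = re.compile(
        r"(?<![a-z])(?:" + alternation + r")(?![a-z])", re.IGNORECASE
    )
    # One star per matched character, surrounding text untouched.
    return pattern.sub(lambda m: "*" * len(m.group(0)), text)

print(censor("A kill is bad, but skill and skilled are fine. Stupid!"))
# A **** is bad, but skill and skilled are fine. ******!
```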

HTH,
--
JJoe
Dec. 01, 2005, 04:14 AM
Post: #11
 
ProxRocks;
Quote:would not writing the filter itself induce said panic attack?
sorry, just messin'...

we have a guy here that goes into convulsions every time someone says "Republican", "GOP", or "welfare reform"...
ironically, "social security reform" doesn't seem to affect him so adversely...


but anyway, off topic...
That was mean! But I like the way you think!

I'm no longer in the rat race - the rats won't have me!
Dec. 01, 2005, 04:42 AM
Post: #12
 
Igraz;

First of all, welcome to the UOPF!

Second, while this may seem a bit insulting, did you just download the latest and greatest version of Proxo? There are plenty of places on the innerweb that still carry older versions of the proggie, so I thought we'd better check.

You should have, hopefully, version 4.5j. There was a 4.5m, but most reputable places where you might find that one will also carry the warning that it has what the community in general considers to be a bug - it was the reason for 4.5j, the error was corrected therein.

And the first reason I ask this is, Kye-U's filter should work, but no version of Proxo prior to 4.5 had the $TST matching command. That's one possible reason why the filter didn't work for you.

And as luck would have it, JJoe's filters should work in any version back to 3 (I think). But if you need to excise more than 3 or 4 words, then I suggest that you use a list. If that doesn't make sense to you, then just let us know, and we'll get you over that hurdle.


Oddysey

Dec. 01, 2005, 06:58 PM
Post: #13
 
Oddysey Wrote:But if you need to excise more than 3 or 4 words, then I suggest that you use a list. If that doesn't make sense to you, then just let us know, and we'll get you over that hurdle.
And about that List:
If you use the filter above and wanted to censor "word" and "words",
you'd want "word" to come after "words".
Like:
    words
    word
Otherwise, Proxomitron could find the match for "word" first and the match would fail when the "s" in words was found.
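The ordering issue can be sketched with a deliberately simplified Python model. It assumes, as described above, that the matcher commits to the first list entry that matches at a position and does not go back to try later entries when the trailing non-letter check fails. The function name `censor_first_entry` is hypothetical, and `str.isalpha` stands in for `[a-z]`.

```python
def censor_first_entry(text, entries):
    """Simplified model: commit to the first matching entry, no retry."""
    out = []
    i = 0
    while i < len(text):
        # Only start a match where the previous character is not a letter.
        prev_ok = i == 0 or not text[i - 1].isalpha()
        hit = None
        if prev_ok:
            hit = next((e for e in entries if text.startswith(e, i)), None)
        if hit is not None:
            end = i + len(hit)
            # Trailing check: end of text or a non-letter must follow.
            if end == len(text) or not text[end].isalpha():
                out.append("*" * len(hit))
                i = end
                continue
        out.append(text[i])
        i += 1
    return "".join(out)

print(censor_first_entry("bad words here", ["word", "words"]))  # bad words here
print(censor_first_entry("bad words here", ["words", "word"]))  # bad ***** here
```

With "word" listed first, the model picks it, sees the "s" that follows, and gives up on that position, so "words" is never censored.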

If you change the filter to:
[Patterns]
Name = "Censor words with list"
Active = TRUE
Multi = TRUE
Bounds = "[^a-z]$LST(BadStrings)"
Limit = 256
Match = "([^a-z])\#(?$SET(#=*))++([^a-z])\#"
Replace = "\@"

and build the list like:
    word[^a-z]
    words[^a-z]
then the order of the words shouldn't matter.
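A rough Python model of why this works: when each list entry carries its own trailing `[^a-z]`, an entry only counts as matched if the delimiter is there too, so "word" can no longer claim the start of "words". The function name `censor_delimited` is made up, the entries are treated as Python regular expressions purely for illustration, and, like the `[^a-z]` in Bounds, the model needs a character in front of the word.

```python
import re

def censor_delimited(text, entries):
    """Each entry is 'word' plus its own trailing non-letter, e.g. 'word[^a-z]'."""
    compiled = [re.compile(e) for e in entries]
    out = []
    i = 0
    while i < len(text):
        # Like the leading [^a-z] in Bounds: need a preceding non-letter.
        if i > 0 and re.match(r"[^a-z]", text[i - 1]):
            matches = (p.match(text, i) for p in compiled)
            m = next((m for m in matches if m), None)
            if m:
                word_len = m.end() - i - 1  # last matched char is the delimiter
                out.append("*" * word_len)
                i += word_len               # leave the delimiter in place
                continue
        out.append(text[i])
        i += 1
    return "".join(out)

sample = "a word and words."
print(censor_delimited(sample, ["word[^a-z]", "words[^a-z]"]))  # a **** and *****.
print(censor_delimited(sample, ["words[^a-z]", "word[^a-z]"]))  # a **** and *****.
```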

HTH,
--
JJoe