Bytes | Software Development & Data Engineering Community

A better webpage filter

For a few days now I've been experimenting with a setup that lets me
send the source code of the web page I'm reading through a Python
script and then into a new tab in Mozilla. The new tab opens
automatically, so the process feels very natural, although there's a lot
of reading, filtering and writing going on behind the scenes.

I want to do three things with this post:

A) Explain the process so that people can try it for themselves and say
"Hey stupid, I've been doing the same thing with greasemonkey for ages",
or maybe "You're great, this is easy to see, since the crux of the
biscuit is the apostrophe." Both kinds of comments are very welcome.

B) Explain why I want such a thing.

C) If this approach still seems worthwhile after all that, ask for help
writing a better Python htmlfilter.py.

So here we go:

A) Explain the process

We need:

- Mozilla Firefox http://en-us.www.mozilla.com/en-US/
- the viewsourcewith add-on https://addons.mozilla.org/firefox/394/
- a batch file (on Windows):
(htmfilter.bat)
d:\python25\python.exe D:\Python25\Scripts\htmlfilter.py "%1" > out.html
start out.html
- a Python script:
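# htmlfilter.py
import sys

def htmlfilter(fname, skip=()):
    """Return the contents of fname with every tag that contains one of
    the strings in skip replaced by a single space."""
    f = open(fname)
    try:
        data = f.read()
    finally:
        f.close()
    # Record the (start, end) positions of every <...> tag.
    L = []
    j = None
    for i, x in enumerate(data):
        if x == '<':
            j = i
        elif x == '>' and j is not None:
            L.append((j, i))
            j = None
    R = list(data)
    # Replace from the end backwards so the positions of earlier tags
    # stay valid after each slice assignment shortens the list.
    for i, j in reversed(L):
        s = data[i:j+1]
        for x in skip:
            if x in s:
                R[i:j+1] = ' '
                break
    return ''.join(R)

def test():
    # Usage: python htmlfilter.py page.html > out.html
    if len(sys.argv) == 2:
        skip = ['div', 'table']
        fname = sys.argv[1].strip()
        print(htmlfilter(fname, skip))

if __name__ == '__main__':
    test()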

Now install htmlfilter.py in your Python Scripts directory and adapt
the batch file to point to it.

To open a page with the batch file through the viewsourcewith add-on: go
to some web page, right-click, and choose to view the source with the
batch file.

B) Explain why I want such a thing.

OK, maybe this should have been the thing to start with, but hey, it's
such an interesting technique that it would almost be a waste not to
give it a chance before my idea is dissed :-)

Most web pages I visit lately take up so much room for ads (even with an
ad blocker installed) that the mere 20 columns of text left for reading
slow me down unacceptably. I have tried clicking 'print this' or
'printer friendly', or using 'no style' from the Mozilla menu and
switching back again for other pages, but it was tedious to say the
least. Every web page has different conventions. In the end I just
started editing web pages' source code by hand, cutting out the beef and
saving it as an HTML file with only text, no scripts or formatting. But
that was also not very satisfying because raw web pages are *big*.
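The "only text, no scripts or formatting" step can be partly automated. A minimal sketch, assuming a regular expression is good enough for a quick personal filter (it is not a real HTML parser, and a literal "</script>" inside a JavaScript string would end the match early); the name strip_scripts_and_styles is hypothetical:

```python
import re

# Matches a whole <script>...</script> or <style>...</style> element,
# including its body. Non-greedy, case-insensitive, and DOTALL so the
# body may span several lines; \1 requires the matching closing tag.
_SCRIPT_OR_STYLE = re.compile(
    r'<(script|style)\b[^>]*>.*?</\1\s*>',
    re.IGNORECASE | re.DOTALL,
)

def strip_scripts_and_styles(html):
    """Return html with every script and style element removed."""
    return _SCRIPT_OR_STYLE.sub('', html)

page = ('<html><head><style>p {color: red}</style></head>'
        '<body><script>alert("ad")</script><p>Text</p></body></html>')
print(strip_scripts_and_styles(page))
```

Unlike blanking single tags, this removes the elements together with their contents, since script and style bodies are never readable text.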

Then I found out that I could often just replace all 'table' or 'div'
tags with a space, and the page (although no longer very HTML compliant)
still loads, and often the text looks a lot better. This worked for at
least 50 percent of the pages and restored my autonomy and independence
in reading web pages! (I read a lot, by the way; maybe for most people
the problem is not very irritating because they don't read as much?
Tell me that too, I want to know :-)
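The script in part A decides whether to blank a tag with a substring test ('if x in s'), so a tag such as <a href="/divisions"> is blanked too, because its text contains "div". A sketch that matches on the tag name instead, assuming a regular expression over the raw source is acceptable (a '>' inside an attribute value would still confuse it); replace_tags and its parameters are hypothetical:

```python
import re

def replace_tags(html, names=('div', 'table'), replacement=' '):
    """Replace every start or end tag whose *name* is in names.

    Text between the tags is kept. Tags that merely contain one of the
    names somewhere (e.g. <a href="/divisions">) are left alone.
    """
    alternatives = '|'.join(re.escape(n) for n in names)
    # </?  : optional slash, so end tags match too
    # \b   : the name must end here, so 'div' does not match '<divider>'
    # [^>]*: any attributes up to the closing '>'
    pattern = re.compile(r'</?(?:%s)\b[^>]*>' % alternatives, re.IGNORECASE)
    # A function replacement avoids backslash-escape processing.
    return pattern.sub(lambda m: replacement, html)
```

For example, replace_tags('<div class="ad"><p>Hello</p></div>') keeps the paragraph and turns the two div tags into spaces.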

C) Ask help writing a better Python htmlfilter.py

Please. You can see the code for yourself; this must be done better :-)
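One possible direction, sketched under assumptions: parse the page with the standard library's html.parser.HTMLParser (Python 3) instead of scanning for '<' and '>' by hand, so that a '>' inside a quoted attribute value, or a '<' inside script text, no longer confuses the tag boundaries. Unlike the original, this htmlfilter takes the page text rather than a file name, matches tag names exactly, and assumes the saved page is UTF-8; TagFilter is a hypothetical name. Entity references are re-emitted as &name;, and kept end tags come out in lower case.

```python
import sys
from html.parser import HTMLParser

class TagFilter(HTMLParser):
    """Copy HTML through as-is, except that start and end tags whose
    name is in `skip` are replaced by a single space."""

    def __init__(self, skip):
        # convert_charrefs=False: entity references reach handle_entityref
        # and handle_charref instead of being decoded into the text.
        super().__init__(convert_charrefs=False)
        self.skip = {name.lower() for name in skip}
        self.out = []

    def handle_starttag(self, tag, attrs):
        # `tag` is already lower-case; get_starttag_text() is the raw tag.
        if tag in self.skip:
            self.out.append(' ')
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: treat like a start tag only.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        self.out.append(' ' if tag in self.skip else '</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)

    def handle_pi(self, data):
        self.out.append('<?%s>' % data)

def htmlfilter(html, skip=('div', 'table')):
    """Return html with every tag named in skip replaced by a space."""
    parser = TagFilter(skip)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)

if __name__ == '__main__' and len(sys.argv) == 2:
    # Usage: python htmlfilter.py page.html > out.html
    # Assumes the saved page is UTF-8; undecodable bytes become U+FFFD.
    with open(sys.argv[1], encoding='utf-8', errors='replace') as f:
        sys.stdout.write(htmlfilter(f.read()))
```

The command line stays the same as in the batch file from part A: python htmlfilter.py page.html > out.html.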

A.
Mar 24 '07 #1
On Sat, 24 Mar 2007 15:45:41 -0300, Anton Vredegoor
<an*************@gmail.com> wrote:
> A) Explain the process so that people can try it for themselves and say
> "Hey stupid, I've been doing the same thing with greasemonkey for ages",
> or maybe "You're great, this is easy to see, since the crux of the
> biscuit is the apostrophe." Both kinds of comments are very welcome.
I use the Opera browser: http://www.opera.com
Among other things (like having had tabs for ages!), it can:
- enable/disable tables and divs (like you do)
- enable/disable images with a keystroke, or show only cached images
- enable/disable CSS
- suppress banners (aggressively)
- enable/disable scripting
- "fit to page width" (for those annoying sites that insist on using a
fixed width of about 400 pixels, less than 1/3 of my actual screen size)
- apply your custom CSS or JavaScript on any page
- edit the page source and *refresh* the original page to reflect your
changes

All of this makes for very smooth web navigation, especially on a slow
computer or slow connection.

--
Gabriel Genellina

Mar 24 '07 #2
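In the filter-script approach from the first post, a rough equivalent of Opera's "apply your custom CSS" is to splice a <style> element into the saved copy before it is opened. A minimal sketch; the stylesheet in USER_CSS and the name inject_css are hypothetical, chosen here to undo fixed widths and floats on div and table elements:

```python
import re

# Hypothetical user stylesheet: undo fixed widths and floats on the
# elements that usually squeeze the text into a narrow column.
USER_CSS = 'div, table, td { width: auto !important; float: none !important; }'

def inject_css(html, css=USER_CSS):
    """Insert a <style> element just before the first </head> tag, or at
    the very start of the document if there is no </head>."""
    style = '<style type="text/css">%s</style>' % css
    # A function replacement keeps backslashes in the CSS from being
    # interpreted as regex escapes.
    new_html, count = re.subn(r'</head\s*>', lambda m: style + m.group(0),
                              html, count=1, flags=re.IGNORECASE)
    return new_html if count else style + html
```

Because the rule uses !important, it overrides most width declarations in the page's own stylesheets, though inline styles marked !important would still win.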
Gabriel Genellina wrote:
> I use the Opera browser: http://www.opera.com
> Among other things (like having tabs for ages!):
> - enable/disable tables and divs (like you do)
> [...]
Thanks! I forgot about that one. It does what I want natively, so I will
go that route for now. Still, I think there must be some use for my
method of filtering. It's just too good not to have some use :-) Maybe
it will find one in the future, when web pages add new advertising
tactics faster than browser builders can change their toolboxes or
instruct their users. After all, I was editing the filter script on one
screen while another screen was using the new filter as soon as I had
saved it.

Maybe someday someone will write a GUI where one can click some radio
buttons that define what goes through and what doesn't. Possibly such a
filter could be maintained collectively on a live web page with an
update frequency of a few seconds or so. Just to make sure we're
prepared for the worst :-)

A.
Mar 24 '07 #3
Anton Vredegoor <an*************@gmail.com> writes:
> [...]
> Most web pages I visit lately are taking so much room for ads (even
> with adblocker installed) that the mere 20 columns of text that are
> available for reading are slowing me down unacceptably. I have tried
> [...]

http://webcleaner.sourceforge.net/
Not actually tried it myself, though did browse some of the code once
or twice -- does some clever stuff.

Lots of other Python-implemented HTTP proxies, some of which are
relevant (though AFAIK all less sophisticated than webcleaner), are
listed on Alan Kennedy's nice page here:

http://xhaus.com/alan/python/proxies.html
A surprising amount of diversity there.
John
Mar 25 '07 #4
John J. Lee wrote:
> http://webcleaner.sourceforge.net/
Thanks, I will look into it sometime. Essentially my problem has been
solved by switching to Opera, but old habits die hard and I find myself
using Mozilla and my little script more often than would be logical.

Maybe the idea of having a *Python* script open at all times, through
which all content passes, is just too tempting. I mean, if there's some
possible irritation on a site, theoretically I could just write a
specific function to get rid of it. This mental setting works as a
placebo on my web browsing experience, so the actual problems don't
always even need to be solved ... I hope I'm not losing all the
traditional programmers here with this approach :-)
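The "one specific function per irritation" idea can be sketched as a list of small text-to-text functions applied in order. Both rules below (drop_iframes, drop_images) and the FILTERS list are hypothetical examples, not part of the original script, and they use regular expressions rather than a real HTML parser:

```python
import re

def drop_iframes(html):
    """Hypothetical rule: remove <iframe>...</iframe> elements."""
    return re.sub(r'<iframe\b.*?</iframe\s*>', '', html,
                  flags=re.IGNORECASE | re.DOTALL)

def drop_images(html):
    """Hypothetical rule: remove every <img> tag."""
    return re.sub(r'<img\b[^>]*>', '', html, flags=re.IGNORECASE)

# The active rules, applied in this order. Since the batch file starts a
# fresh Python process for every page, an edited list takes effect on
# the next page that is filtered.
FILTERS = [drop_iframes, drop_images]

def apply_filters(html, filters=FILTERS):
    """Pass html through each filter function in turn."""
    for f in filters:
        html = f(html)
    return html
```

Adding a new irritation-specific rule is then a matter of writing one more function and appending it to FILTERS.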
> Not actually tried it myself, though did browse some of the code once
> or twice -- does some clever stuff.
>
> Lots of other Python-implemented HTTP proxies, some of which are
> relevant (though AFAIK all less sophisticated than webcleaner), are
> listed on Alan Kennedy's nice page here:
>
> http://xhaus.com/alan/python/proxies.html
> A surprising amount of diversity there.
At least now I know which general category seems nearest to my
solution, so thanks again for that. However, my solution doesn't really
do anything like the programs on that page (although it is related to
removing ads); instead, it modifies a copy of the page after it has been
saved to disk. This removes all kinds of links and lets one definitively
reshape the form the page will take. As such, it is more concerned with
the metaphysical image the page makes on the user's brain and less with
the actual content or the security aspects.

One thing I noticed, though, on that (nice!) Alan Kennedy page is that
there was a script so small that it didn't even have a homepage;
instead it just relied on a Google Groups post! I guess you can see
that I liked that one :-)

My filter is even smaller. I've tried to make it smaller still by
removing the batch file and using webbrowser.open(some cStringIO object),
but that didn't work on Windows.
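webbrowser.open() takes a URL string rather than a file-like object, which would explain why passing a cStringIO object did not work. A sketch of dropping the batch file anyway, assuming Python 3: write the filtered text to a temporary .html file and open its file:// URL in a new tab. show_in_browser is a hypothetical name, and the file is written as UTF-8 and deliberately left on disk.

```python
import pathlib
import tempfile
import webbrowser

def show_in_browser(html):
    """Write html to a temporary .html file, open it in a new browser
    tab and return the file's path."""
    # delete=False: the browser reads the file after this function
    # returns. With the default delete=True the file would vanish on
    # close, and on Windows it could not be opened by another process
    # while still open.
    with tempfile.NamedTemporaryFile('w', suffix='.html', delete=False,
                                     encoding='utf-8') as f:
        f.write(html)
    path = pathlib.Path(f.name)
    # webbrowser expects a URL, so pass an absolute file:// URL.
    webbrowser.open_new_tab(path.as_uri())
    return path
```

Calling show_in_browser('<p>filtered text</p>') writes the file and opens it; the returned path can be removed later with path.unlink().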

regards,

A.
Mar 26 '07 #5
On Mon, 26 Mar 2007 06:06:00 -0300, Anton Vredegoor
<an*************@gmail.com> wrote:
> Maybe the idea of having a *Python* script open at all times, through
> which all content passes, is just too tempting. I mean, if there's some
> possible irritation on a site, theoretically I could just write a
> specific function to get rid of it. [...]
If you don't mind using JavaScript instead of Python, UserJS is for you:
http://www.opera.com/support/tutorials/userjs/

--
Gabriel Genellina

Mar 26 '07 #6
Gabriel Genellina wrote:
> If you don't mind using JavaScript instead of Python, UserJS is for you:
> http://www.opera.com/support/tutorials/userjs/
My script loads a saved copy of a page and uses it to open an extra tab
with a filtered view. It also works when JavaScript is disabled.

A.
Mar 26 '07 #7
