Using difflib to compare text ignoring whitespace differences

Hi

I'm trying to compare some text to find differences other than whitespace.
I seem to be misunderstanding something, since I can't even get a basic
example to work:

In [104]: d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)

In [105]: list(d.compare([' a'], ['a']))
Out[105]: ['- a', '+ a']

Surely if whitespace characters are being ignored those two strings should
be marked as identical? What am I doing wrong?

Thanks
Neilen

--
you know its kind of tragic
we live in the new world
but we've lost the magic
-- Battery 9 (www.battery9.co.za)

Dec 19 '06 #1
On 19 Dec, 11:53, Neilen Marais <nmar...@sun.ac.za> wrote:
> Hi
>
> I'm trying to compare some text to find differences other than whitespace.
> I seem to be misunderstanding something, since I can't even get a basic
> example to work:
>
> In [104]: d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
>
> In [105]: list(d.compare([' a'], ['a']))
> Out[105]: ['- a', '+ a']
>
> Surely if whitespace characters are being ignored those two strings should
> be marked as identical? What am I doing wrong?

The docs for Differ are a bit terse and misleading.
compare() does two-level matching: first on a *line* level,
considering only the linejunk parameter; then, for each pair of
similar lines found in the first stage, it does an intraline match
considering only the charjunk parameter.
Also note that junk != ignored: the algorithm tries to "find the longest
contiguous matching subsequence that contains no 'junk' elements".
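
To make the "junk != ignored" point concrete, here is a small sketch that runs the character-level matcher directly on the two strings from the question. It assumes that Differ builds its intraline matcher as SequenceMatcher(charjunk), and that it only attempts intraline marking when the similarity ratio is above a cutoff of about 0.75. Both are details read from the CPython implementation, not documented guarantees.

```python
import difflib

a, b = ' a', 'a'

# Stage 1 works on whole lines: the two strings are simply unequal.
lines_equal = (a == b)

# Stage 2 compares characters, with space and tab treated as junk.
matcher = difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a, b)

# ratio() is 2 * (matched characters) / (total characters in both strings).
similarity = matcher.ratio()

# The junk space is not dropped: it still shows up as a 'delete' opcode.
opcodes = matcher.get_opcodes()

print(lines_equal)
print(round(similarity, 3))
print(opcodes)
```

The leading space is reported as a deletion even though it counts as junk, and the ratio for such a short pair is low, which would explain why Differ falls back to a plain '-' / '+' pair here.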

Using a slightly longer text gets closer to what you want, I think:

import difflib

d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
for delta in d.compare([' a larger line'], ['a longer line']):
    print(delta)

- a larger line
? --- ^^

+ a longer line
? ^^
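
If the goal is to treat whitespace-only changes as no change at all, one option is to compare whitespace-free keys and report the original lines. The sketch below is not a difflib feature: ws_key and diff_ignoring_whitespace are hypothetical helper names, and the approach swaps Differ for a plain SequenceMatcher over the keys, so it gives no intraline "?" hints.

```python
import difflib

def ws_key(line):
    # Hypothetical helper: remove all whitespace so only other characters count.
    return ''.join(line.split())

def diff_ignoring_whitespace(old, new):
    """Yield ndiff-style lines, treating whitespace-only changes as equal."""
    matcher = difflib.SequenceMatcher(None,
                                      [ws_key(s) for s in old],
                                      [ws_key(s) for s in new])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            # Keys match, so show the new version of each line as unchanged.
            for line in new[j1:j2]:
                yield '  ' + line
            continue
        # 'replace', 'delete' and 'insert' reduce to removed plus added lines.
        for line in old[i1:i2]:
            yield '- ' + line
        for line in new[j1:j2]:
            yield '+ ' + line

old = [' a larger line', 'same  text']
new = ['a longer line', 'same text']
result = list(diff_ignoring_whitespace(old, new))
for delta in result:
    print(delta)
```

With this key, 'a b' and 'ab' also compare equal. Using ' '.join(line.split()) as the key instead would only collapse runs of whitespace and strip the ends.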

--
Gabriel Genellina

Dec 21 '06 #2
