Using difflib to compare text ignoring whitespace differences

Hi

I'm trying to compare some text to find differences other than whitespace.
I seem to be misunderstanding something, since I can't even get a basic
example to work:

In [104]: d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)

In [105]: list(d.compare([' a'], ['a']))
Out[105]: ['- a', '+ a']

Surely if whitespace characters are being ignored those two strings should
be marked as identical? What am I doing wrong?

Thanks
Neilen

--
you know its kind of tragic
we live in the new world
but we've lost the magic
-- Battery 9 (www.battery9.co.za)

Dec 19 '06 #1


On 19 Dec, 11:53, Neilen Marais <nmar...@sun.ac.za> wrote:
> Hi
>
> I'm trying to compare some text to find differences other than whitespace.
> I seem to be misunderstanding something, since I can't even get a basic
> example to work:
>
> In [104]: d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
>
> In [105]: list(d.compare([' a'], ['a']))
> Out[105]: ['- a', '+ a']
>
> Surely if whitespace characters are being ignored those two strings should
> be marked as identical? What am I doing wrong?

The docs for Differ are a bit terse and misleading.
compare() does a two-level matching: first on a *line* level,
considering only the linejunk parameter. Then, for each pair of
similar lines found in the first stage, it does an intraline match
considering only the charjunk parameter. In your example the two
lines are not similar enough to be paired up, so charjunk never
comes into play.
Also note that junk != ignored: the algorithm tries to "find the longest
contiguous matching subsequence that contains no 'junk' elements", so
junk characters can still show up as differences.
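
To make the point above concrete, here is a small sketch (the variable names are mine, not from the original post). It runs the one-character example with and without charjunk, then asks SequenceMatcher directly how it aligns the two strings when the space is declared junk. This reflects CPython's difflib as I understand it; other implementations may differ in detail.

```python
import difflib

old, new = [' a'], ['a']

# Same comparison with and without a charjunk filter.
with_junk = list(difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK).compare(old, new))
without_junk = list(difflib.Differ().compare(old, new))

# The lines are never paired for an intraline comparison, so the
# charjunk setting makes no visible difference here.
print(with_junk)
print(with_junk == without_junk)

# At the character level the junk space is not dropped either:
# it still appears as a 'delete' in the opcodes.
ops = difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, ' a', 'a').get_opcodes()
print(ops)
```

So declaring a character as junk only steers where matches may start; it does not make the comparison blind to that character.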

Using a slightly longer text gets closer to what you want, I think:

import difflib

d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
for delta in d.compare(['   a larger line'], ['a longer line']):
    print(delta)

-    a larger line
? ---   ^^

+ a longer line
?    ^^
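
If the goal is really to treat whitespace-only changes as "no difference", one common workaround is to normalize the whitespace yourself before handing the lines to Differ. This is a different technique from the junk parameters, not something they do for you. A minimal sketch, assuming that collapsing every run of whitespace to a single space is acceptable for your data; normalize_ws and diff_ignoring_ws are hypothetical helper names of my own:

```python
import difflib


def normalize_ws(line):
    """Hypothetical helper: strip the ends and collapse inner whitespace runs."""
    return ' '.join(line.split())


def diff_ignoring_ws(old_lines, new_lines):
    """Return only the Differ lines that report a change, after normalizing."""
    old_norm = [normalize_ws(line) for line in old_lines]
    new_norm = [normalize_ws(line) for line in new_lines]
    deltas = difflib.Differ().compare(old_norm, new_norm)
    # Lines common to both inputs are prefixed with two spaces; skip them.
    return [d for d in deltas if not d.startswith('  ')]


print(diff_ignoring_ws([' a'], ['a']))                  # no differences left
print(diff_ignoring_ws(['a  b', 'c'], ['a b', 'd']))    # only the c/d change
```

Note that the reported lines show the normalized text, not the original, so this fits best when you only need to know whether (and roughly where) the non-whitespace content changed.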

--
Gabriel Genellina

Dec 21 '06 #2
