On 19 Dec, 11:53, Neilen Marais <nmar...@sun.ac.za> wrote:
> Hi
> I'm trying to compare some text to find differences other than whitespace.
> I seem to be misunderstanding something, since I can't even get a basic
> example to work:
> In [104]: d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
> In [105]: list(d.compare([' a'], ['a']))
> Out[105]: ['-  a', '+ a']
> Surely if whitespace characters are being ignored, those two strings should
> be marked as identical? What am I doing wrong?
The docs for Differ are a bit terse and misleading.
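compare() does a two-level matching. First, at the *line* level, it
considers only the linejunk parameter. Then, for each pair of similar
lines found in the first stage, it does an intraline match that
considers only the charjunk parameter. Line pairs that are not similar
enough, such as ' a' and 'a', are simply reported as a deletion plus an
insertion, so charjunk never comes into play for them.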
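A minimal sketch of the two stages, using only the public SequenceMatcher API. The helper name intraline_candidates and the CUTOFF constant are illustrative assumptions: 0.75 is roughly the similarity threshold CPython's Differ applies internally, an implementation detail rather than documented behaviour. Unlike Differ, the sketch scores every replaced pair instead of choosing the best one. It shows why ' a' and 'a' never reach the intraline stage: their character-level ratio is only 2/3.

```python
import difflib

# Roughly the similarity threshold CPython's Differ uses before it attempts
# intraline marking (an implementation detail, not documented API).
CUTOFF = 0.75


def intraline_candidates(a_lines, b_lines, linejunk=None,
                         charjunk=difflib.IS_CHARACTER_JUNK):
    """Hypothetical helper: score every replaced line pair at char level.

    Returns a list of (a_line, b_line, ratio, similar) tuples.
    """
    results = []
    # Stage 1: match whole lines; only linejunk is consulted here.
    line_matcher = difflib.SequenceMatcher(linejunk, a_lines, b_lines)
    for tag, i1, i2, j1, j2 in line_matcher.get_opcodes():
        if tag != "replace":
            continue
        # Stage 2: compare the characters of each candidate pair; only
        # charjunk is consulted here.
        for a_line in a_lines[i1:i2]:
            for b_line in b_lines[j1:j2]:
                ratio = difflib.SequenceMatcher(charjunk, a_line, b_line).ratio()
                results.append((a_line, b_line, ratio, ratio > CUTOFF))
    return results


# ' a' vs 'a': ratio 2/3, below the cutoff, so no intraline match.
print(intraline_candidates([" a"], ["a"]))
# The longer pair scores 22/27 (about 0.81), so it gets character-level marks.
print(intraline_candidates([" a larger line"], ["a longer line"]))
```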
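Also note that junk != ignored: the algorithm tries to "find the longest
contiguous matching subsequence that contains no 'junk' elements". Junk
characters cannot start a match, but they are not removed either:
matching junk right next to a match is absorbed into it, and unmatched
junk still shows up as a difference.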
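A few direct SequenceMatcher calls make the difference between junk and ignored visible (a sketch; the test strings are arbitrary). The leading space in ' a' stays unmatched, so the ratio is 2/3 rather than 1.0; spaces marked as junk cannot start a match on their own; and a space adjacent to a real match is still pulled into it.

```python
import difflib

junk = difflib.IS_CHARACTER_JUNK  # true only for ' ' and '\t'

# 1. Junk is not ignored: the leading space stays unmatched, so ' a' and
#    'a' are still different (ratio 2/3, not 1.0).
sm = difflib.SequenceMatcher(junk, " a", "a")
print(sm.get_matching_blocks(), sm.ratio())

# 2. Junk cannot *start* a match: with spaces as junk, "a b" and "c d"
#    share nothing; without a junk function the spaces do match.
print(difflib.SequenceMatcher(junk, "a b", "c d").ratio())  # 0.0
print(difflib.SequenceMatcher(None, "a b", "c d").ratio())  # 1/3

# 3. ...but junk adjacent to a real match is absorbed into it: the longest
#    match covers " line", leading space included.
sm = difflib.SequenceMatcher(junk, "x line", "y line")
print(sm.find_longest_match(0, 6, 0, 6))
```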
Using a slightly longer text gets closer to what you want, I think:
import difflib
d = difflib.Differ(charjunk=difflib.IS_CHARACTER_JUNK)
for delta in d.compare([' a larger line'], ['a longer line']):
    print(delta.rstrip('\n'))  # the '?' guide lines carry their own '\n'

-  a larger line
? -   ^^
+ a longer line
?    ^^
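The '?' guide lines are derived from the character-level opcodes of the similar pair: '-' marks characters present only in the first line, '+' characters present only in the second, and '^' characters that were replaced. The sketch below rebuilds them for this pair; guide_lines is a hypothetical name, and it simplifies Differ's real code (for example, it always returns both tag strings). It makes clear that the leading space still counts as a deletion even though it is junk.

```python
import difflib


def guide_lines(a, b, charjunk=difflib.IS_CHARACTER_JUNK):
    """Hypothetical helper: build Differ-style '?' tag strings for a pair."""
    atags = btags = ""
    sm = difflib.SequenceMatcher(charjunk, a, b)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        la, lb = i2 - i1, j2 - j1
        if tag == "replace":
            atags += "^" * la
            btags += "^" * lb
        elif tag == "delete":
            atags += "-" * la
        elif tag == "insert":
            btags += "+" * lb
        else:  # "equal": no mark under unchanged characters
            atags += " " * la
            btags += " " * lb
    # Trailing blanks carry no information, so drop them as Differ does.
    return atags.rstrip(), btags.rstrip()


a, b = " a larger line", "a longer line"
# Show the raw opcodes the tags come from.
for opcode in difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a, b).get_opcodes():
    print(opcode)
atags, btags = guide_lines(a, b)
print("- " + a)
print("? " + atags)
print("+ " + b)
print("? " + btags)
```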
--
Gabriel Genellina