Bytes IT Community

difflib.ndiff broken?

Can anyone try the following in their python interpreter?

These give correct output:
>>> print list(ndiff(['saving2 <<A'],['saving <<a>>']))
['- saving2 <<A', '?       -   ^\n', '+ saving <<a>>', '?          ^^^\n']
>>> print list(ndiff(['saving2 <<AA'],['saving <<a>>']))
['- saving2 <<AA', '?       -   ^^\n', '+ saving <<a>>', '?          ^^^\n']
>>> print list(ndiff(['saving2 <<A'],['saving <<aa>>']))
['- saving2 <<A', '?       -   ^\n', '+ saving <<aa>>', '?          ^^^^\n']
>>> print list(ndiff(['saving <<A'],['saving <<aa>>']))
['- saving <<A', '?          ^\n', '+ saving <<aa>>', '?          ^^^^\n']

Now try the very slight variations:
>>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
['- saving2 <<AA', '+ saving <<aa>>']
>>> print list(ndiff(['saving2 <<AA'],['saving <<aa>>']))
['- saving2 <<AA', '+ saving <<aa>>']

This can't be right... or is it? Where are the '? ...' lines? It does this
for both Python 2.3.2 on Windows 2000 and Python 2.3.3 on SGI. If it's
correct, how come???

Oliver
Jul 18 '05 #1


ndiff produces intraline difference marking if and only if it thinks
the inputs are "reasonably close". The cutoff between "reasonably
close" and "not reasonably close" is necessarily heuristic. '?' lines
are more irritating than helpful when they have a lot of markup in
them, so it certainly wasn't intended that '?' lines *always* be
produced. The '+' and '-' lines contain all the information about how
to change one sequence into another; the '?' lines are fluff (albeit
sometimes helpful fluff -- that's why they're (sometimes) there).
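One way to see that the '?' lines carry no information of their own is to rebuild both inputs from an ndiff delta with and without them. The sketch below is Python 3 and relies on the standard `difflib.restore(delta, which)`, which keeps only lines starting with two blanks or with '- ' (which=1) / '+ ' (which=2) and skips everything else. The sample pair is the first one from the question.

```python
import difflib

old = ['saving2 <<A']    # first pair from the question: ndiff adds '?' lines here
new = ['saving <<a>>']

delta = list(difflib.ndiff(old, new))
hints = [line for line in delta if line.startswith('? ')]
no_hints = [line for line in delta if not line.startswith('? ')]

# restore(delta, 1) rebuilds the first sequence, restore(delta, 2) the second.
# It never looks at '?' lines, so removing them first changes nothing.
assert list(difflib.restore(delta, 1)) == list(difflib.restore(no_hints, 1)) == old
assert list(difflib.restore(delta, 2)) == list(difflib.restore(no_hints, 2)) == new

print(len(hints), "guide lines; both inputs rebuilt without them")
```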

Concretely, ndiff produces intraline marking iff two lines have a
similarity ratio of at least 0.75. In your first examples, the lines
do:
>>> import difflib
>>> m = difflib.SequenceMatcher()
>>> m.set_seqs('saving2 <<A', 'saving <<a>>')
>>> print m.ratio()
0.782608695652

In your last examples, the lines don't:

>>> m.set_seqs('saving2 <<AA', 'saving <<aa>>')
>>> print m.ratio()
0.72
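The ratio is 2*M/T, where M is the number of matching characters and T the total length of both lines: the first pair has M = 9 ("saving" plus " <<") and T = 23, so 18/23 is about 0.78; the last has M = 9 and T = 25, so 18/25 = 0.72. A Python 3 sketch that checks all five pairs from the question against the 0.75 cutoff is below. `has_guide_lines` and `char_ratio` are hypothetical helpers; the matcher uses `difflib.IS_CHARACTER_JUNK` because that is ndiff's default `charjunk`, and for these strings it gives the same ratios as the plain `SequenceMatcher()` above.

```python
import difflib

# The five one-line pairs from the question, in the order they were posted.
PAIRS = [
    ('saving2 <<A',  'saving <<a>>'),
    ('saving2 <<AA', 'saving <<a>>'),
    ('saving2 <<A',  'saving <<aa>>'),
    ('saving <<A',   'saving <<aa>>'),
    ('saving2 <<AA', 'saving <<aa>>'),
]

def has_guide_lines(a, b):
    """Hypothetical helper: True if ndiff emits any '?' line for this pair."""
    return any(line.startswith('? ') for line in difflib.ndiff([a], [b]))

def char_ratio(a, b):
    """Hypothetical helper: similarity 2*M/T with ndiff's default charjunk."""
    return difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a, b).ratio()

for a, b in PAIRS:
    r = char_ratio(a, b)
    print(f"{a!r:15} {b!r:16} ratio={r:.4f} guide lines={has_guide_lines(a, b)}")
```

Only the last pair falls below 0.75, and it is the only one without '?' lines. The second and third pairs sit exactly on 0.75 and still get them, which is why the cutoff reads "at least".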


Internally, 0.75 is the default value of FancyReplacer's optional
minimal_cutoff argument.
Jul 18 '05 #2

OK, forget it, sorry it was my mistake: it wasn't obvious from the difflib
docs, but it appears that ndiff points out the sub-line differences (lines
that start with ?) only if it was able to figure out operations that could
be applied to substrings on the line. Though often such operations are
obvious by looking at the strings being compared, ndiff doesn't always find
them, and so marks the whole line as + or -.

Anyone know of a web site that explains ndiff output? I couldn't figure out a
good set of search terms for Google, and didn't get anything useful. Thanks,

Oliver


Jul 18 '05 #3

[Humpdydum]
> OK, forget it, sorry it was my mistake:

I didn't see a mistake, just a question.

> it wasn't obvious from the difflib docs, but it appears that ndiff points out the
> sub-line differences (lines that start with ?) only if it was able to figure out
> operations that could be applied to substrings on the line. Though often such
> operations are obvious by looking at the strings being compared,

They can be for a program but often aren't for people. That's why
ndiff produces '?' lines when it thinks they might help. This is a
heuristic -- a guess. Sometimes it's not the same guess you'd make.
There's always a sequence of operations that can be applied to change
any line into any other line, but *usually* they're uninteresting.
'?' lines attempt to point out "minor edits".

> ndiff doesn't always find them, and so marks the whole line as + or -.

It marks two input lines that differ with - and + regardless of
whether it produces two ? lines too.

> Anyone know of a web site that explains ndiff output? I couldn't figure out a
> good set of search terms for Google, and didn't get anything useful. Thanks,


ndiff is unique to Python, and you have the source code for it.
Because '?' lines are fluff, precise docs for them would be
counterproductive. They're meant to guide the eye to minor intraline
differences, and that's all.

If a ? line appears, there are always two of them, interleaved between
a -+ pair, in this pattern:

-
?
+
?

Each ? line implicitly refers to the line immediately above it. Four
meaningful characters appear in ? lines. A caret (^) means the
character immediately above it was replaced, in going from the - to
the + line. '-' means the character immediately above it was deleted;
'+' means it was inserted; and a blank means the character immediately
above it is the same in both (- and +) lines. A '-' can appear only
in the ? line following a - line, and a '+' can appear only in the ?
line following a + line, because we're picturing the edits needed to
change the - line into the + line.
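The rules in the last paragraph can be made concrete by building the two hint strings straight from `SequenceMatcher.get_opcodes()`. This is a hedged Python 3 sketch: `guide_lines` is a hypothetical helper written to mirror those rules, not ndiff's own code, and unlike ndiff it applies no similarity cutoff, so it marks up any pair. Like ndiff, it drops trailing blanks from each hint string.

```python
import difflib

def guide_lines(a, b):
    """Hypothetical helper: return (hint under the - line, hint under the + line).

    '^' marks replaced characters, '-' deleted ones (only under the - line),
    '+' inserted ones (only under the + line), and a blank marks characters
    common to both lines. No similarity cutoff is applied.
    """
    matcher = difflib.SequenceMatcher(difflib.IS_CHARACTER_JUNK, a, b)
    atags, btags = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'replace':
            atags.append('^' * (i2 - i1))
            btags.append('^' * (j2 - j1))
        elif tag == 'delete':
            atags.append('-' * (i2 - i1))
        elif tag == 'insert':
            btags.append('+' * (j2 - j1))
        else:  # 'equal': the same characters appear on both lines
            atags.append(' ' * (i2 - i1))
            btags.append(' ' * (j2 - j1))
    return ''.join(atags).rstrip(), ''.join(btags).rstrip()

a, b = 'saving2 <<A', 'saving <<a>>'
hint_a, hint_b = guide_lines(a, b)
# Print the - ? + ? block; each hint lines up under the line above it.
for line in ('- ' + a, '? ' + hint_a, '+ ' + b, '? ' + hint_b):
    print(line)
# Expected output:
# - saving2 <<A
# ?       -   ^
# + saving <<a>>
# ?          ^^^
```

For this pair the hints match the '?' lines ndiff itself produces. Calling `guide_lines('saving2 <<AA', 'saving <<aa>>')` also returns hints, even though ndiff suppresses them there because the pair's ratio (0.72) is under the cutoff.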
Jul 18 '05 #4
