FIX: Show a correct diff when editing consecutive paragraphs (PR #8177)

Currently, if two consecutive paragraphs are edited, their contents are not diffed properly:

[screenshot: Screen Shot 2019-10-10 at 09 35 13]

The current algorithm performs a pairwise diff whenever a deletion follows an addition or vice versa. This happens here:

https://github.com/discourse/discourse/blob/c5326682d6346c13b495ece9f3ee4e82f1b13a72/lib/discourse_diff.rb#L38-L39

However, in the example above we have two additions followed by two deletions. This leads to:

  • an added paragraph: this is one great paragraph
  • a pairwise comparison: this is one paragraph vs. this is another
  • a deleted paragraph: here is yet another

The ideal approach would be to perform an alignment step before diffing, but that is not trivial to implement (see the Gale-Church algorithm). A good compromise is to use one of the alternative edit sequences that is equivalent to the one returned by ONPDiff.

In other words, ONPDiff.paragraph_diff takes the edit sequence returned by ONPDiff.diff (Add1 Add2 Delete1 Delete2) and reorders it into Add1 Delete1 Add2 Delete2, so that each addition sits next to the deletion it should be compared against.
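The reordering can be illustrated with a minimal sketch. This is not the code from the PR: `interleave_edits` is a hypothetical helper, and it assumes the edit sequence is an array of `[text, op]` pairs where `op` is `:common`, `:add`, or `:delete`. It groups consecutive edits of the same kind into runs and, whenever a run of additions sits next to a run of deletions, zips the two runs together.

```ruby
# Hypothetical helper for illustration only; not the PR's implementation.
# Assumes each edit is a [text, op] pair with op in :common, :add, :delete.
def interleave_edits(edits)
  result = []
  # Group consecutive edits that share the same operation into runs.
  runs = edits.chunk_while { |a, b| a[1] == b[1] }.to_a

  i = 0
  while i < runs.size
    run = runs[i]
    nxt = runs[i + 1]

    if nxt && [run[0][1], nxt[0][1]].sort == [:add, :delete]
      # A run of additions next to a run of deletions (in either order):
      # alternate their elements so each pair ends up adjacent.
      longest = [run.size, nxt.size].max
      longest.times do |k|
        result << run[k] if run[k]
        result << nxt[k] if nxt[k]
      end
      i += 2
    else
      # Common text, or a run with no opposite neighbour: keep as-is.
      result.concat(run)
      i += 1
    end
  end

  result
end

edits = [
  ["Add1", :add], ["Add2", :add],
  ["Delete1", :delete], ["Delete2", :delete]
]
p interleave_edits(edits).map(&:first)
# => ["Add1", "Delete1", "Add2", "Delete2"]
```

With the reordered sequence, a pairwise diff that triggers on adjacent add/delete edits compares Add1 with Delete1 and Add2 with Delete2, instead of only comparing Add2 with Delete1. When the two runs differ in length, the leftover edits in this sketch simply stay unpaired.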

Current behavior using ONPDiff.diff:

image

Proposed behavior using ONPDiff.paragraph_diff:

image

You’ve signed the CLA, nachocab. Thank you! This pull request is ready for review.

Wow :heart_eyes:

SO MUCH BETTERER :clap:

Can you benchmark this on very large diffs? Both when there is a lot of text but a small number of diffs, and when there is a large-ish amount of text but a lot of diffs.

Sure, do you need me to use a specific tool or just to sanity check it on my computer?

Sanity check is fine as long as I have before/after :wink:

And how big do you want the diffs?

Regarding text size, near the default maximum post length.

@ZogStriP I tried creating a 32K message and made a few edits. The diff was instantaneous:

I also tried adding 1100 edits and it was also instantaneous:

Just to compare, how long did it take when using the previous algorithm?

They were both equally fast (less than a second, probably less than .5s):
