In this article we will create a simple file diff tool using the Apache Commons Text library and output the diff as HTML.
Example in this article:
- Take two text files as input, i.e. file-1.txt & file-2.txt.
- Compare both files using Apache Commons Text & generate HTML output highlighting the differences between them.
- Use a plain Java main program to keep things simple.
Basic concept:
The org.apache.commons.text.diff package of the Apache Commons Text library is based on a “very efficient algorithm from Eugene W. Myers”. The algorithm compares the ‘left’ & ‘right’ strings character by character. We provide a visitor object, which the algorithm calls back with one of three commands for each character:
- ‘KeepCommand’: the character is present in both ‘left’ & ‘right’.
- ‘DeleteCommand’: the character is present in ‘left’ but not in ‘right’, so it needs to be deleted from ‘left’ to match ‘right’.
- ‘InsertCommand’: the character is not present in ‘left’ but is present in ‘right’, so it needs to be inserted into ‘left’ to match ‘right’.
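To make the three commands concrete, here is a minimal sketch that builds a keep/delete/insert script for two strings using only the JDK. Please note the assumptions: it uses a plain dynamic-programming longest-common-subsequence table, not the Myers algorithm that the library implements, and the class name EditScriptSketch and the K/D/I notation are made up for this illustration. It only shows what the commands mean, not how Commons Text computes them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Commons Text.
final class EditScriptSketch {

    // Returns commands such as "K:a" (keep), "D:b" (delete from left)
    // and "I:c" (insert into left) that turn 'left' into 'right'.
    static List<String> script(String left, String right) {
        int n = left.length();
        int m = right.length();
        // lcs[i][j] = length of the longest common subsequence of
        // left.substring(i) and right.substring(j).
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = left.charAt(i) == right.charAt(j)
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        // Walk the table from the start and emit one command per character.
        List<String> commands = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (left.charAt(i) == right.charAt(j)) {
                commands.add("K:" + left.charAt(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                commands.add("D:" + left.charAt(i));
                i++;
            } else {
                commands.add("I:" + right.charAt(j));
                j++;
            }
        }
        // Whatever remains on one side is deleted or inserted as a whole.
        while (i < n) {
            commands.add("D:" + left.charAt(i++));
        }
        while (j < m) {
            commands.add("I:" + right.charAt(j++));
        }
        return commands;
    }

    public static void main(String[] args) {
        System.out.println(script("cat", "cut"));
        // prints [K:c, D:a, I:u, K:t]
    }
}
```

A visitor in Commons Text receives the same kind of sequence, one callback per command, instead of a list of strings.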
Library dependency (Maven, Gradle etc.)
You will need to add the library dependencies to your project.
- Look up the latest commons-text version on Maven Central (group org.apache.commons, artifact commons-text).
- This program also uses commons-io (group commons-io, artifact commons-io) for file handling, so add that dependency as well.
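For a Maven project, the dependency section could look roughly like the fragment below. The version numbers are examples only and are an assumption on my part, so replace them with the latest releases.

```xml
<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <!-- Example version only; replace with the latest release. -->
        <version>1.10.0</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <!-- Example version only; replace with the latest release. -->
        <version>2.11.0</version>
    </dependency>
</dependencies>
```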
Here is the simplest code to diff two strings using org.apache.commons.text.diff.StringsComparator with a very simple CommandVisitor. The program marks characters deleted from the left string with curly braces and characters inserted from the right string with parentheses, as shown in the output.
```java
package com.itsallbinary.apache.diff;

import org.apache.commons.text.diff.CommandVisitor;
import org.apache.commons.text.diff.StringsComparator;

public class SimpleDiff {

    public static void main(String[] args) {
        // Create a diff comparator with two input strings.
        StringsComparator comparator = new StringsComparator("Its All Binary.", "Its All fun.");

        // Initialize custom visitor and visit char by char.
        MyCommandsVisitor myCommandsVisitor = new MyCommandsVisitor();
        comparator.getScript().visit(myCommandsVisitor);

        // Print final diff.
        System.out.println("FINAL DIFF = " + myCommandsVisitor.left + " | " + myCommandsVisitor.right);
    }
}

/*
 * Custom visitor.
 */
class MyCommandsVisitor implements CommandVisitor<Character> {

    String left = "";
    String right = "";

    @Override
    public void visitKeepCommand(Character c) {
        // Character is present in both strings.
        left = left + c;
        right = right + c;
    }

    @Override
    public void visitInsertCommand(Character c) {
        /*
         * Character is present in right but not in left. The method name
         * 'InsertCommand' means c needs to be inserted into left to match right.
         */
        right = right + "(" + c + ")";
    }

    @Override
    public void visitDeleteCommand(Character c) {
        /*
         * Character is present in left but not in right. The method name
         * 'DeleteCommand' means c needs to be deleted from left to match right.
         */
        left = left + "{" + c + "}";
    }
}
```
Output:

```
FINAL DIFF = Its All {B}{i}n{a}{r}{y}. | Its All (f)(u)n.
```
Let's code to compare files & generate an HTML diff
Now we will apply the same logic as above to compare two files & generate a diff in HTML format with the differences highlighted.
For the output, we will use this very simple HTML template, which shows the left & right content side by side. It has the placeholders ${left} & ${right}, which our program replaces using String.replace(). The program reads the template from difftemplate.html in the project root, so save it under that name. (You may choose to use a proper template library such as Apache Velocity instead.)
```html
<!DOCTYPE html>
<html>
<head>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body {
    font-family: Arial;
    color: black;
}

.split {
    height: 100%;
    width: 50%;
    position: fixed;
    z-index: 1;
    top: 0;
    overflow-x: hidden;
    padding-top: 20px;
}

.left {
    left: 0;
    background-color: #D7FBF6;
}

.right {
    right: 0;
    background-color: #FCFCC3;
}
</style>
</head>
<body>

    <div class="split left">
        <div class="centered">
            <p>${left}</p>
        </div>
    </div>

    <div class="split right">
        <div class="centered">
            <p>${right}</p>
        </div>
    </div>

</body>
</html>
```
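The placeholder replacement itself can be sketched in a few lines of plain Java. The class and method names here (TemplateFillSketch, fill) are hypothetical and only for illustration. The point worth noticing is that String.replace() works on literal text, so "${left}" needs no escaping, whereas String.replaceAll() would treat it as a regular expression.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class TemplateFillSketch {

    // Replaces the ${left} and ${right} placeholders with the given HTML fragments.
    static String fill(String template, String leftHtml, String rightHtml) {
        // String.replace works on literal text, so "${left}" is not treated as a regex.
        return template.replace("${left}", leftHtml).replace("${right}", rightHtml);
    }

    public static void main(String[] args) {
        String template = "<p>${left}</p><p>${right}</p>";
        System.out.println(fill(template, "Java", "Javascript"));
        // prints <p>Java</p><p>Javascript</p>
    }
}
```

One caveat of this simple approach: if the left content itself contained the text ${right}, the second replace would touch it too. A template library avoids that kind of surprise.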
Here is the complete code for the diff program with explanatory comments inline. It consists of:
- A main program which reads ‘file-1.txt’ & ‘file-2.txt’ from the root of the project (make sure these files exist there). It uses Apache Commons Text to do the diff with a custom visitor.
- A visitor class which stores the character diff in HTML format with highlighting spans. It also provides a method which generates the final diff HTML.
```java
package com.itsallbinary.apache.diff;

import java.io.File;
import java.io.IOException;

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;
import org.apache.commons.text.diff.CommandVisitor;
import org.apache.commons.text.diff.EditScript;
import org.apache.commons.text.diff.StringsComparator;

public class FileDiff {

    public static void main(String[] args) throws IOException {
        // Read both files with line iterator.
        LineIterator file1 = FileUtils.lineIterator(new File("file-1.txt"), "utf-8");
        LineIterator file2 = FileUtils.lineIterator(new File("file-2.txt"), "utf-8");

        // Initialize visitor.
        FileCommandsVisitor fileCommandsVisitor = new FileCommandsVisitor();

        // Read file line by line so that comparison can be done line by line.
        while (file1.hasNext() || file2.hasNext()) {
            /*
             * In case both files have a different number of lines, fill in with empty
             * strings. Also append a newline char at the end so the next line
             * comparison moves to the next line.
             */
            String left = (file1.hasNext() ? file1.nextLine() : "") + "\n";
            String right = (file2.hasNext() ? file2.nextLine() : "") + "\n";

            // Prepare diff comparator with lines from both files.
            StringsComparator comparator = new StringsComparator(left, right);
            // Build the edit script once and reuse it below.
            EditScript<Character> script = comparator.getScript();

            if (script.getLCSLength() > (Integer.max(left.length(), right.length()) * 0.4)) {
                /*
                 * Only if both lines have more than 40% commonality, compare them with
                 * each other so that they are aligned in the final diff HTML.
                 */
                script.visit(fileCommandsVisitor);
            } else {
                /*
                 * If both lines do not have 40% commonality, compare each with an empty
                 * line so that they are not aligned to each other in the final diff.
                 * Instead they show up on separate lines.
                 */
                StringsComparator leftComparator = new StringsComparator(left, "\n");
                leftComparator.getScript().visit(fileCommandsVisitor);
                StringsComparator rightComparator = new StringsComparator("\n", right);
                rightComparator.getScript().visit(fileCommandsVisitor);
            }
        }

        fileCommandsVisitor.generateHTML();
    }
}

/*
 * Custom visitor for file comparison which stores the comparison & also
 * generates HTML in the end.
 */
class FileCommandsVisitor implements CommandVisitor<Character> {

    // Spans with red & green highlights to put highlighted characters in HTML
    private static final String DELETION = "<span style=\"background-color: #FB504B\">${text}</span>";
    private static final String INSERTION = "<span style=\"background-color: #45EA85\">${text}</span>";

    private String left = "";
    private String right = "";

    @Override
    public void visitKeepCommand(Character c) {
        // For new line use <br/> so that in HTML also it shows on next line.
        String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
        // KeepCommand means c is present in both left & right. So add it to both
        // without any highlight.
        left = left + toAppend;
        right = right + toAppend;
    }

    @Override
    public void visitInsertCommand(Character c) {
        // For new line use <br/> so that in HTML also it shows on next line.
        String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
        // InsertCommand means character is present in right file but not in left.
        // Show with green highlight on right.
        right = right + INSERTION.replace("${text}", "" + toAppend);
    }

    @Override
    public void visitDeleteCommand(Character c) {
        // For new line use <br/> so that in HTML also it shows on next line.
        String toAppend = "\n".equals("" + c) ? "<br/>" : "" + c;
        // DeleteCommand means character is present in left file but not in right.
        // Show with red highlight on left.
        left = left + DELETION.replace("${text}", "" + toAppend);
    }

    public void generateHTML() throws IOException {
        // Get the template & replace the placeholders with the left & right
        // comparison results.
        String template = FileUtils.readFileToString(new File("difftemplate.html"), "utf-8");
        String out1 = template.replace("${left}", left);
        String output = out1.replace("${right}", right);

        // Write file to disk.
        FileUtils.write(new File("finalDiff.html"), output, "utf-8");
        System.out.println("HTML diff generated.");
    }
}
```
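The 40% rule that decides whether two lines are aligned can be tried out in isolation. The sketch below uses only the JDK: lcsLength() is a simple dynamic-programming stand-in for the value that EditScript.getLCSLength() reports, and the class and method names (AlignmentRuleSketch, shouldAlign) are hypothetical. Treat it as an illustration of the threshold, not as the library's implementation.

```java
// Hypothetical stand-alone illustration of the 40% alignment rule.
final class AlignmentRuleSketch {

    // Length of the longest common subsequence of the two strings,
    // computed with a plain DP table instead of the Myers algorithm.
    static int lcsLength(String left, String right) {
        int[][] lcs = new int[left.length() + 1][right.length() + 1];
        for (int i = 1; i <= left.length(); i++) {
            for (int j = 1; j <= right.length(); j++) {
                lcs[i][j] = left.charAt(i - 1) == right.charAt(j - 1)
                        ? lcs[i - 1][j - 1] + 1
                        : Math.max(lcs[i - 1][j], lcs[i][j - 1]);
            }
        }
        return lcs[left.length()][right.length()];
    }

    // True when the common part is more than 40% of the longer line.
    static boolean shouldAlign(String left, String right) {
        return lcsLength(left, right) > Math.max(left.length(), right.length()) * 0.4;
    }

    public static void main(String[] args) {
        // 12 of 13 characters are common, so the lines are aligned.
        System.out.println(shouldAlign("Java is fun.\n", "Java is fun!\n")); // prints true
        // Only the newline is common, so the lines go on separate rows.
        System.out.println(shouldAlign("abc\n", "xyz123456\n")); // prints false
    }
}
```

Changing the 0.4 factor makes the alignment stricter or more lenient, which is an easy knob to experiment with.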
Time to execute the file diff
We will test the program using the test files below, placed in the project root. Some of their lines have more than 40% commonality with the matching line in the other file & some do not.
file-1.txt:

```
I like Java a lot.
Java is fun.
Java is platform independent.
This goes great with server.
Java is oops based.
```

file-2.txt:

```
I like Javascript very very much.
Javascript is cool.
Javascript is not platform independent.
Browser loves it.
Javascript is procedural based.
```
With these files in place, we execute the program.
```
HTML diff generated.
```
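This will generate “finalDiff.html” in the project root. Open this file in a browser. It shows the two files side by side, with characters deleted from the left file highlighted in red & characters inserted in the right file highlighted in green.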
Further improvements:
You can take this & improve it further on your own to build a better diff tool. Here are a few ideas to get you started.
- Currently, if lines don’t have 40% commonality, we simply show them on separate lines. You can improve this by trying to match such a line against the following lines & aligning it with the line it matches best.
- Currently HTML highlighting is done per character. Enhance it to group consecutive inserted or deleted characters into a single span.
- Make the program more efficient for larger files & try different output formats.
- Convert the main program into a UI-oriented tool with a fancy look-and-feel. Enjoy!
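As a starting point for the span-grouping idea, here is a minimal sketch under a few assumptions. The class GroupedHighlightSketch and its methods are hypothetical; they mirror the visitor callbacks but do not implement the Commons Text CommandVisitor interface, so the sketch runs with the JDK alone. It builds the HTML for one side only, so a real visitor would hold one instance for the left side (call highlight() on delete) and one for the right side (call highlight() on insert). It also escapes the characters that HTML would otherwise read as markup, which the program above does not do.

```java
// Hypothetical builder for one side of the diff; not part of Commons Text.
final class GroupedHighlightSketch {
    private static final String CLOSE = "</span>";

    private final String open;
    private final StringBuilder html = new StringBuilder();
    private boolean inSpan = false;

    // color is a CSS color such as "#FB504B" for deletions.
    GroupedHighlightSketch(String color) {
        this.open = "<span style=\"background-color: " + color + "\">";
    }

    // Call for an unchanged character: closes any open span first.
    void keep(char c) {
        closeSpanIfOpen();
        html.append(escape(c));
    }

    // Call for a changed character: opens a span only at the start of a run.
    void highlight(char c) {
        if (!inSpan) {
            html.append(open);
            inSpan = true;
        }
        html.append(escape(c));
    }

    // Closes a trailing span and returns the accumulated HTML.
    String finish() {
        closeSpanIfOpen();
        return html.toString();
    }

    private void closeSpanIfOpen() {
        if (inSpan) {
            html.append(CLOSE);
            inSpan = false;
        }
    }

    // Escapes markup characters and turns newlines into <br/>.
    private static String escape(char c) {
        switch (c) {
            case '<': return "&lt;";
            case '>': return "&gt;";
            case '&': return "&amp;";
            case '\n': return "<br/>";
            default: return String.valueOf(c);
        }
    }

    public static void main(String[] args) {
        // Left side of "Binary" vs "fun": only 'n' is kept.
        GroupedHighlightSketch left = new GroupedHighlightSketch("#FB504B");
        left.highlight('B');
        left.highlight('i');
        left.keep('n');
        left.highlight('a');
        left.highlight('r');
        left.highlight('y');
        System.out.println(left.finish());
    }
}
```

Using a StringBuilder instead of repeated string concatenation also helps with the efficiency point for larger files.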