In this article we will look at different ways to check whether two strings are similar. ‘Being similar’ is different from ‘being equal’ or ‘being equal ignoring case’: the inputs may have slight differences yet still be similar in some ways.
Similarity features
Here are a few features of two input strings that can tell us how similar they are.
- Matching – How many words or characters match between the two inputs. Ex: “abc” and “abz” have 2 matching characters and 1 different character.
- Sequence – Are the matching characters in the same order? Ex: “abc” and “abz” have the matching characters ‘ab’ in the same sequence. “abc” and “zba” also share the characters ‘a’ and ‘b’, but not in the same sequence.
- Sound – Some words are spelled differently but sound the same when pronounced. That can be a criterion for categorizing them as similar.
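The first two features can be made concrete with a small sketch in plain Java. The class and method names here are illustrative only and do not come from any library: one method counts characters that match at the same index, the other counts characters the two strings share regardless of position.

```java
import java.util.HashMap;
import java.util.Map;

class SimilarityFeatures {

    // Counts the positions at which both strings hold the same character.
    static int samePositionMatches(String a, String b) {
        int matches = 0;
        int limit = Math.min(a.length(), b.length());
        for (int i = 0; i < limit; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                matches++;
            }
        }
        return matches;
    }

    // Counts the characters the two strings have in common, ignoring order.
    static int sharedCharacters(String a, String b) {
        Map<Character, Integer> remaining = new HashMap<>();
        for (char c : a.toCharArray()) {
            remaining.merge(c, 1, Integer::sum);
        }
        int shared = 0;
        for (char c : b.toCharArray()) {
            Integer left = remaining.get(c);
            if (left != null && left > 0) {
                remaining.put(c, left - 1);
                shared++;
            }
        }
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(samePositionMatches("abc", "abz")); // 2
        System.out.println(sharedCharacters("abc", "zba"));    // 2
        System.out.println(samePositionMatches("abc", "zba")); // 1
    }
}
```

For “abc” and “zba” the two counts differ, which is exactly the gap between ‘matching’ and ‘sequence’.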
Algorithms
Below is a quick comparison of existing algorithms for measuring string similarity or string distance, as implemented by Apache. The algorithms are compared on:
- Comparison by – Whether the algorithm compares the input strings by words, by characters or by phonetics.
- Case sensitive – Some algorithms treat the upper and lower case forms of a character as different characters, so they are case sensitive. Others treat them as the same, so they are case insensitive.
- Sequence matters? – In addition to matching characters, some algorithms also check whether the matching characters appear in the same order in both strings. This gives an additional view of how similar the strings are.
- Result – Some algorithms return a number between 0 and 1 that can be read as a fraction of similarity or distance. Others return a score in points, a count of edits, or a string.
Algorithm | Comparison by | Case Sensitive | Sequence Matters? | Result |
---|---|---|---|---|
Cosine Similarity / Distance | word | ✓ | ✗ | Number (0 to 1) |
Fuzzy Score | character | ✗ | ✓ | Score / points |
Hamming Distance (same-length input strings only) | character | ✓ | ✓ | Count of substitutions |
Jaccard Similarity / Distance | character | ✓ | ✗ | Number (0 to 1) |
Jaro Winkler Similarity / Distance | character | ✓ | ✓ | Number (0 to 1) |
Levenshtein Distance | character | ✓ | ✓ | Count of edits |
Longest Common Subsequence | character | ✓ | ✓ | Common string |
Soundex | phonetics | ✗ | ✓ | Soundex codes or their diff |
How do similarity algorithms work?
The video in the original article explains in detail how each of the above similarity algorithms works. The code examples follow.
Apache implementation examples
Apache provides out-of-the-box implementations of the above algorithms. You will need the library dependencies below to run the examples.
Library Dependency
- Maven dependency for the similarity package – Apache Commons Text (`org.apache.commons:commons-text`)
- Maven dependency for Soundex – Apache Commons Codec (`commons-codec:commons-codec`)
Cosine similarity/distance
- Algorithm
  - Similarity is checked by the words in both inputs. The word counts of each input form a vector, and the cosine similarity formula (shown in the video above) is applied to those vectors.
  - The algorithm is case sensitive, so the same word in a different case counts as a different word.
  - The order of the words doesn’t matter. Two sentences with the same words in a different order count as exactly the same.
- Result
  - The algorithm gives a value between 0 and 1. Multiply it by 100 to get a percentage.
  - Similarity = 1 - Distance
- Usage
```java
import org.apache.commons.text.similarity.CosineDistance;

public class CosineDistanceExamples {

    private static String gravityCambridge = "the force that makes objects fall toward the earth, or toward some other large object such as a planet or a star";
    private static String gravityNasa = " the force by which a planet or other body draws objects toward its center";

    public static void main(String[] args) {
        String[][] inputStrings = new String[][] {
            // No similarity
            { "Its All Binary", "Java is great" },
            // Two of the four distinct words in common
            { "Hi All Its All Binary", "Hi Binary" },
            // Three out of four words in common
            { "Its All Binary", "Really Its All Binary" },
            // Exactly the same
            { "Its All Binary", "Its All Binary" },
            // Same words in a different order
            { "Its All Binary", "All Binary Its" },
            // Same string in a different case. The algorithm is case sensitive.
            { "Its All Binary", "iTS aLL bINARY" } };

        for (String[] input : inputStrings) {
            // How dissimilar the words in the two strings are.
            double cosineDistance = new CosineDistance().apply(input[0], input[1]);
            double cosineDistancePercentage = Math.round(cosineDistance * 100);
            double cosineSimilarityPercentage = Math.round((1 - cosineDistance) * 100);
            System.out.println("CosineDistance of '" + input[0] + "' & '" + input[1] + "' | Words in strings are "
                    + cosineDistancePercentage + "% dis-similar or " + cosineSimilarityPercentage + "% similar.");
        }

        // Realistic example: compare two documents and find how similar they are.
        double cosineDistanceOfGravityDefinitions = new CosineDistance().apply(gravityNasa, gravityCambridge);
        System.out.println("Gravity definitions from Nasa Website & Cambridge Dictionary are "
                + Math.round((1 - cosineDistanceOfGravityDefinitions) * 100) + "% similar.");
    }
}
```

```
CosineDistance of 'Its All Binary' & 'Java is great' | Words in strings are 100.0% dis-similar or 0.0% similar.
CosineDistance of 'Hi All Its All Binary' & 'Hi Binary' | Words in strings are 47.0% dis-similar or 53.0% similar.
CosineDistance of 'Its All Binary' & 'Really Its All Binary' | Words in strings are 13.0% dis-similar or 87.0% similar.
CosineDistance of 'Its All Binary' & 'Its All Binary' | Words in strings are 0.0% dis-similar or 100.0% similar.
CosineDistance of 'Its All Binary' & 'All Binary Its' | Words in strings are 0.0% dis-similar or 100.0% similar.
CosineDistance of 'Its All Binary' & 'iTS aLL bINARY' | Words in strings are 100.0% dis-similar or 0.0% similar.
Gravity definitions from Nasa Website & Cambridge Dictionary are 59% similar.
```
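To show what the cosine formula does with word counts, here is a minimal sketch using only the JDK. It is not the Apache implementation: the class name `CosineSketch` is made up for illustration, and splitting on whitespace is a simplifying assumption, so results can differ from the library for inputs containing punctuation.

```java
import java.util.HashMap;
import java.util.Map;

class CosineSketch {

    // Builds a word -> count vector. Splitting on whitespace is an assumption of this sketch.
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Length (Euclidean norm) of a word-count vector.
    static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    // cosine similarity = dot(a, b) / (|a| * |b|)
    static double similarity(String left, String right) {
        Map<String, Integer> a = wordCounts(left);
        Map<String, Integer> b = wordCounts(right);
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        // Same words in a different order: similarity is (almost exactly) 1.
        System.out.println(similarity("Its All Binary", "All Binary Its"));
        // Shares "Hi" and "Binary": roughly 0.53.
        System.out.println(similarity("Hi All Its All Binary", "Hi Binary"));
        // Different case means different words: 0.
        System.out.println(similarity("Its All Binary", "iTS aLL bINARY"));
    }
}
```

Because only the counts enter the formula, reordering the words cannot change the result, which is why the table above marks sequence as not mattering for this algorithm.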
Fuzzy score
- Algorithm
  - Similarity is checked by matching the characters of a query string against a term string.
  - The algorithm is case insensitive. The Apache implementation converts both inputs to lower case before comparing them.
  - When consecutive characters of the query also appear consecutively in the term, the algorithm awards bonus points. A higher score therefore also indicates that the characters are in the same sequence.
  - Note that the method arguments in the Apache implementation are position sensitive. The first argument is the term and the second is the query. The Locale is needed to normalize the strings to lower case.
  - The position of the query within the term doesn’t matter. The query ‘abc’ scores 7.0 points against each of the terms ‘abcxyz’, ‘xyzabc’ and ‘xabcyz’, because in all three the query characters appear together and in the same sequence.
- Result
  - The algorithm gives a score in points. More points mean more similarity.
- Usage
  - As noted in the Apache documentation, editors like Sublime Text use similar algorithms to show suggestions in order of score. The original article shows a screenshot of Sublime Text with the same test data as the example below, with the search results ordered by score.
```java
import java.util.Locale;

import org.apache.commons.text.similarity.FuzzyScore;

public class FuzzyScoreExamples {

    public static void main(String[] args) {
        String[][] inputStrings = new String[][] {
            // Matches abc at start of term
            { "abcxyz", "abc" },
            // ABC in different case than term
            { "abcxyz", "ABC" },
            // Matches abc at end of term
            { "xyzabc", "abc" },
            // Matches abc in middle
            { "xabcyz", "abc" },
            // Matches abc but not continuous.
            { "abxycz", "abc" },
            { "axbycz", "abc" },
            // Reverse order of abc
            { "cbaxyz", "abc" },
            // Matches abc but different order.
            { "cabxyz", "abc" } };

        for (String[] input : inputStrings) {
            String term = input[0];
            String query = input[1];
            // Fuzzy score of query against term
            double fuzzyScore = new FuzzyScore(Locale.getDefault()).fuzzyScore(term, query);
            System.out.println(
                    "FuzzyScore of query '" + query + "' against term '" + term + "' is " + fuzzyScore + " points");
        }
    }
}
```

```
FuzzyScore of query 'abc' against term 'abcxyz' is 7.0 points
FuzzyScore of query 'ABC' against term 'abcxyz' is 7.0 points
FuzzyScore of query 'abc' against term 'xyzabc' is 7.0 points
FuzzyScore of query 'abc' against term 'xabcyz' is 7.0 points
FuzzyScore of query 'abc' against term 'abxycz' is 5.0 points
FuzzyScore of query 'abc' against term 'axbycz' is 3.0 points
FuzzyScore of query 'abc' against term 'cbaxyz' is 1.0 points
FuzzyScore of query 'abc' against term 'cabxyz' is 4.0 points
```
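The scores above are consistent with a simple rule: one point for every query character found in the term (scanning left to right), plus two bonus points when a match immediately follows the previous match. The sketch below re-implements that rule with the JDK only. The class name `FuzzyScoreSketch` is hypothetical, and the code is an approximation written for illustration, so it may differ from the library in edge cases.

```java
import java.util.Locale;

class FuzzyScoreSketch {

    static int score(String term, String query) {
        // Case insensitive: normalise both inputs first.
        String t = term.toLowerCase(Locale.ROOT);
        String q = query.toLowerCase(Locale.ROOT);

        int score = 0;
        int termIndex = 0;       // the term is only ever scanned forwards
        int previousMatch = -2;  // index of the last matched term character, -2 = none yet

        for (int i = 0; i < q.length(); i++) {
            char wanted = q.charAt(i);
            while (termIndex < t.length()) {
                int current = termIndex++;
                if (t.charAt(current) == wanted) {
                    score++;                       // one point per matched character
                    if (previousMatch + 1 == current) {
                        score += 2;                // bonus for consecutive matches
                    }
                    previousMatch = current;
                    break;
                }
            }
        }
        return score;
    }

    public static void main(String[] args) {
        System.out.println(score("abcxyz", "abc")); // 7
        System.out.println(score("abxycz", "abc")); // 5
        System.out.println(score("cbaxyz", "abc")); // 1
    }
}
```

Since the term is scanned forwards only, a query character that appears before the previous match is never found, which is why ‘cbaxyz’ scores just 1 point.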
Hamming distance
- Algorithm
  - Distance (dissimilarity) is checked by comparing the characters at the same index in both inputs. It is purely a distance algorithm, so it does not give a ‘similarity’.
  - The algorithm is case sensitive, so a different case means a different character.
  - Sequence matters. The same characters in a different order in the two words count as different.
  - The algorithm requires both inputs to be the same length. The Apache implementation throws an IllegalArgumentException if the lengths differ.
- Result
  - The number of substitutions needed to change one input into the other. It is an edit distance that allows substitutions only.
- Usage
  - Hamming distance is used to detect bit errors in transmission. For example, if the receiver is supposed to receive the code 0101 but receives 0100, the number of error bits is the Hamming distance of 0101 & 0100, i.e. 1. So there is a 1-bit error.
```java
import org.apache.commons.text.similarity.HammingDistance;

public class HammingDistanceExamples {

    public static void main(String[] args) {
        String[][] inputStrings = new String[][] {
            // Exact same strings
            { "Binary", "Binary" },
            // Same characters but 1 character misplaced
            { "ABinary", "BinaryA" },
            // Only 2 characters misplaced.
            { "Binary", "Binray" },
            // Different case.
            { "Binary", "bINARY" },
            // Same characters different order
            { "sing", "sign" },
            // Different length strings
            { "singing", "sign" },
            // Completely different characters
            { "Binary", "uvwstu" } };

        for (String[] input : inputStrings) {
            try {
                // How many characters need to be changed to match both strings.
                double hammingDistance = new HammingDistance().apply(input[0], input[1]);
                System.out.println("HammingDistance of '" + input[0] + "' & '" + input[1] + "' | Need to change "
                        + hammingDistance + " characters to match both strings.");
            } catch (Exception e) {
                System.out.println("HammingDistance: input-1 = " + input[0] + " | input-2 " + input[1]
                        + " | Result = " + e.getClass() + ' ' + e.getMessage());
                continue;
            }
        }

        // Real world usage example of 1-bit error detection.
        double bitErrorCount = new HammingDistance().apply("0101", "0100");
        System.out.println("Bit stream of " + bitErrorCount + " bit error.");
    }
}
```

```
HammingDistance of 'Binary' & 'Binary' | Need to change 0.0 characters to match both strings.
HammingDistance of 'ABinary' & 'BinaryA' | Need to change 7.0 characters to match both strings.
HammingDistance of 'Binary' & 'Binray' | Need to change 2.0 characters to match both strings.
HammingDistance of 'Binary' & 'bINARY' | Need to change 6.0 characters to match both strings.
HammingDistance of 'sing' & 'sign' | Need to change 2.0 characters to match both strings.
HammingDistance: input-1 = singing | input-2 sign | Result = class java.lang.IllegalArgumentException CharSequences must have the same length
HammingDistance of 'Binary' & 'uvwstu' | Need to change 6.0 characters to match both strings.
Bit stream of 1.0 bit error.
```
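The index-by-index comparison is short enough to write by hand. The following is a minimal JDK-only sketch, not the library code: the class name `HammingSketch` and the exception message are made up for illustration.

```java
class HammingSketch {

    // Counts the positions at which the two equal-length strings differ.
    static int distance(String left, String right) {
        if (left.length() != right.length()) {
            throw new IllegalArgumentException("Inputs must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < left.length(); i++) {
            if (left.charAt(i) != right.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    public static void main(String[] args) {
        System.out.println(distance("0101", "0100"));       // 1
        System.out.println(distance("Binary", "Binray"));   // 2
        System.out.println(distance("ABinary", "BinaryA")); // 7
    }
}
```

The last case shows the cost of strict index comparison: shifting every character by one position makes all seven positions differ, even though the two strings contain the same letters.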
Jaccard distance
- Algorithm
  - Similarity is checked by characters: the size of the intersection of the two character sets is divided by the size of their union. For an exact match the intersection equals the union. If nothing matches, the intersection is empty.
  - The algorithm is case sensitive.
  - Neither the order of the characters nor how often each character occurs matters. Two words made of the same distinct characters count as the same even if the counts differ, i.e. ‘singing’ & ‘sign’ are considered 100% the same.
- Result
  - The algorithm gives a value between 0 and 1. Multiply it by 100 to get a percentage.
  - Similarity = 1 - Distance
```java
import org.apache.commons.text.similarity.JaccardDistance;

public class JaccardDistanceExamples {

    public static void main(String[] args) {
        String[][] inputStrings = new String[][] {
            // Exact same
            { "Binary", "Binary" },
            // Same characters but different counts of characters
            { "singing", "sign" },
            // Half characters similar
            { "Binary", "ray" },
            // Different case.
            { "Binary", "bINARY" },
            // No similarity
            { "cat", "dog" } };

        for (String[] input : inputStrings) {
            // How many of the distinct characters are shared between the two strings.
            double jaccardDistance = new JaccardDistance().apply(input[0], input[1]);
            System.out.println("JaccardDistance of '" + input[0] + "' & '" + input[1]
                    + "' | Distinct characters in strings are " + (jaccardDistance * 100) + "% dis-similar or "
                    + ((1 - jaccardDistance) * 100) + "% similar.");
        }
    }
}
```

```
JaccardDistance of 'Binary' & 'Binary' | Distinct characters in strings are 0.0% dis-similar or 100.0% similar.
JaccardDistance of 'singing' & 'sign' | Distinct characters in strings are 0.0% dis-similar or 100.0% similar.
JaccardDistance of 'Binary' & 'ray' | Distinct characters in strings are 50.0% dis-similar or 50.0% similar.
JaccardDistance of 'Binary' & 'bINARY' | Distinct characters in strings are 100.0% dis-similar or 0.0% similar.
JaccardDistance of 'cat' & 'dog' | Distinct characters in strings are 100.0% dis-similar or 0.0% similar.
```
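Intersection over union maps directly onto `java.util.Set` operations. Here is a minimal sketch with the JDK only. The class name `JaccardSketch` is illustrative, and treating two empty strings as fully similar is an assumption of this sketch rather than documented library behaviour.

```java
import java.util.HashSet;
import java.util.Set;

class JaccardSketch {

    // Collects the distinct characters of a string.
    static Set<Character> distinctChars(String text) {
        Set<Character> chars = new HashSet<>();
        for (char c : text.toCharArray()) {
            chars.add(c);
        }
        return chars;
    }

    // similarity = |intersection| / |union|
    static double similarity(String left, String right) {
        Set<Character> a = distinctChars(left);
        Set<Character> b = distinctChars(right);
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // assumption: two empty inputs are treated as identical
        }
        Set<Character> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<Character> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(similarity("singing", "sign")); // 1.0
        System.out.println(similarity("Binary", "ray"));   // 0.5
        System.out.println(similarity("cat", "dog"));      // 0.0
    }
}
```

Using sets discards both order and repetition, which is why ‘singing’ and ‘sign’ come out as identical.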
Jaro Winkler distance
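- Algorithm
  - Similarity is checked by matching characters within a limited range of each other, then applying the Jaro formula and the Winkler adjustment to get the final similarity or distance.
  - The algorithm is case sensitive.
  - The algorithm doesn’t strictly look for sequence, but sequence plays a role: if the same character lies beyond the matching range, it is not considered a match. The video above shows the matching range in action.
- Result
  - Gives a value between 0 and 1 that can be read as a percentage.
  - Whether `JaroWinklerDistance` returns the similarity or the distance depends on the library version. The output below comes from a version in which it returns the similarity (1.0 for identical strings). Newer Commons Text releases return the distance instead, so compute 1 - x, or use the separate `JaroWinklerSimilarity` class, to get the similarity.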
```java
import org.apache.commons.text.similarity.JaroWinklerDistance;

public class JaroWinklerDistanceExamples {

    public static void main(String[] args) {
        String[][] inputStrings = new String[][] {
            { "winkler", "welfare" },
            { "singing", "sign" },
            // Similar but different case
            { "Binary", "bINARY" },
            // Similar word at the end
            { "Binary", "Its All Binary" },
            // Different characters
            { "cat", "dog" },
            // Variations of containing abc
            { "abcxyz", "abc" },
            { "xyzabc", "abc" },
            { "xabcyz", "abc" },
            { "axbycz", "abc" },
            // Exact similar
            { "Binary", "Binary" },
            // One character difference
            { "ABinary", "BinaryA" },
            { "ABinary", "BAinary" } };

        for (String[] input : inputStrings) {
            double jaroWinklerDistance = new JaroWinklerDistance().apply(input[0], input[1]);
            System.out.println("JaroWinklerDistance of '" + input[0] + "' & '" + input[1]
                    + "' | Distinct characters in strings are " + (jaroWinklerDistance * 100) + "% similar or "
                    + ((1 - jaroWinklerDistance) * 100) + "% dis-similar.");
        }
    }
}
```

```
JaroWinklerDistance of 'winkler' & 'welfare' | Distinct characters in strings are 63.095238095238095% similar or 36.904761904761905% dis-similar.
JaroWinklerDistance of 'singing' & 'sign' | Distinct characters in strings are 81.9047619047619% similar or 18.0952380952381% dis-similar.
JaroWinklerDistance of 'Binary' & 'bINARY' | Distinct characters in strings are 0.0% similar or 100.0% dis-similar.
JaroWinklerDistance of 'Binary' & 'Its All Binary' | Distinct characters in strings are 0.0% similar or 100.0% dis-similar.
JaroWinklerDistance of 'cat' & 'dog' | Distinct characters in strings are 0.0% similar or 100.0% dis-similar.
JaroWinklerDistance of 'abcxyz' & 'abc' | Distinct characters in strings are 88.33333333333334% similar or 11.666666666666659% dis-similar.
JaroWinklerDistance of 'xyzabc' & 'abc' | Distinct characters in strings are 0.0% similar or 100.0% dis-similar.
JaroWinklerDistance of 'xabcyz' & 'abc' | Distinct characters in strings are 83.33333333333334% similar or 16.666666666666664% dis-similar.
JaroWinklerDistance of 'axbycz' & 'abc' | Distinct characters in strings are 85.00000000000001% similar or 14.999999999999991% dis-similar.
JaroWinklerDistance of 'Binary' & 'Binary' | Distinct characters in strings are 100.0% similar or 0.0% dis-similar.
JaroWinklerDistance of 'ABinary' & 'BinaryA' | Distinct characters in strings are 90.47619047619048% similar or 9.523809523809524% dis-similar.
JaroWinklerDistance of 'ABinary' & 'BAinary' | Distinct characters in strings are 95.23809523809524% similar or 4.761904761904756% dis-similar.
```
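The matching range and the two formulas can be sketched in plain Java as follows. This is a textbook-style re-implementation, not the library code. The class name `JaroWinklerSketch`, the prefix weight of 0.1, the prefix cap of 4 characters and the 0.7 boost threshold are assumptions of this sketch, chosen because they are the commonly quoted defaults and agree with the figures printed above.

```java
class JaroWinklerSketch {

    static double jaro(String s1, String s2) {
        if (s1.isEmpty() && s2.isEmpty()) {
            return 1.0;
        }
        // Characters only match if they are at most 'range' positions apart.
        int range = Math.max(Math.max(s1.length(), s2.length()) / 2 - 1, 0);
        boolean[] matched1 = new boolean[s1.length()];
        boolean[] matched2 = new boolean[s2.length()];
        int matches = 0;
        for (int i = 0; i < s1.length(); i++) {
            int from = Math.max(0, i - range);
            int to = Math.min(s2.length(), i + range + 1);
            for (int j = from; j < to; j++) {
                if (!matched2[j] && s1.charAt(i) == s2.charAt(j)) {
                    matched1[i] = true;
                    matched2[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) {
            return 0.0;
        }
        // Walk both strings over matched characters only and count order mismatches.
        int outOfOrder = 0;
        int j = 0;
        for (int i = 0; i < s1.length(); i++) {
            if (!matched1[i]) {
                continue;
            }
            while (!matched2[j]) {
                j++;
            }
            if (s1.charAt(i) != s2.charAt(j)) {
                outOfOrder++;
            }
            j++;
        }
        int transpositions = outOfOrder / 2;
        double m = matches;
        return (m / s1.length() + m / s2.length() + (m - transpositions) / m) / 3.0;
    }

    static double jaroWinkler(String s1, String s2) {
        double jaro = jaro(s1, s2);
        if (jaro < 0.7) {
            return jaro; // assumed boost threshold: no prefix bonus for weak matches
        }
        int limit = Math.min(4, Math.min(s1.length(), s2.length()));
        int prefix = 0;
        while (prefix < limit && s1.charAt(prefix) == s2.charAt(prefix)) {
            prefix++;
        }
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    public static void main(String[] args) {
        System.out.println(jaroWinkler("winkler", "welfare")); // about 0.63
        System.out.println(jaroWinkler("abcxyz", "abc"));      // about 0.88
        System.out.println(jaroWinkler("xyzabc", "abc"));      // 0.0
    }
}
```

The ‘xyzabc’ case shows the matching range at work: with a range of 2, none of ‘a’, ‘b’, ‘c’ in the short string is close enough to its counterpart in the long one, so nothing matches.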
Levenshtein Distance
- Algorithm
  - Similarity is checked by matching characters and working out the edits needed. An edit can replace a character, insert a character or delete a character.
  - The algorithm is case sensitive.
  - The order of the characters matters. A character may be present in both inputs, but if it is in the wrong position it still costs edits.
- Result
  - The number of edits needed to change one input into the other.
```java
import org.apache.commons.text.similarity.LevenshteinDistance;

public class LevenshteinDistanceExamples {

    public static void main(String[] args) {
        String[][] inputStrings = new String[][] {
            // Same characters different counts
            { "singing", "sign" },
            // Similar but different case.
            { "Binary", "bINARY" },
            // One word same
            { "Binary", "Its All Binary" },
            // Different characters
            { "cat", "dog" },
            // Different variations of contains abc
            { "abcxyz", "abc" },
            { "xyzabc", "abc" },
            { "xyabcz", "abc" },
            { "xabcyz", "abc" },
            { "axbycz", "abc" },
            // Similar words
            { "Binary", "Biryani" },
            // Exact same
            { "Binary", "Binary" },
            // One character variations
            { "Binary", "ABinary" },
            { "Binary", "BinaryA" },
            { "Binary", "BiAnary" },
            // Spelling mistake
            { "Binary", "Binray" },
            // Misplaced A
            { "ABinary", "BinaryA" } };

        for (String[] input : inputStrings) {
            double levenshteinDistance = LevenshteinDistance.getDefaultInstance().apply(input[0], input[1]);
            System.out.println("LevenshteinDistance of '" + input[0] + "' & '" + input[1] + "' | Need to change "
                    + levenshteinDistance + " characters to match both strings.");
        }
    }
}
```

```
LevenshteinDistance of 'singing' & 'sign' | Need to change 3.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'bINARY' | Need to change 6.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'Its All Binary' | Need to change 8.0 characters to match both strings.
LevenshteinDistance of 'cat' & 'dog' | Need to change 3.0 characters to match both strings.
LevenshteinDistance of 'abcxyz' & 'abc' | Need to change 3.0 characters to match both strings.
LevenshteinDistance of 'xyzabc' & 'abc' | Need to change 3.0 characters to match both strings.
LevenshteinDistance of 'xyabcz' & 'abc' | Need to change 3.0 characters to match both strings.
LevenshteinDistance of 'xabcyz' & 'abc' | Need to change 3.0 characters to match both strings.
LevenshteinDistance of 'axbycz' & 'abc' | Need to change 3.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'Biryani' | Need to change 4.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'Binary' | Need to change 0.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'ABinary' | Need to change 1.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'BinaryA' | Need to change 1.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'BiAnary' | Need to change 1.0 characters to match both strings.
LevenshteinDistance of 'Binary' & 'Binray' | Need to change 2.0 characters to match both strings.
LevenshteinDistance of 'ABinary' & 'BinaryA' | Need to change 2.0 characters to match both strings.
```
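The count of edits is usually computed with dynamic programming. Below is a minimal sketch of the classic two-row formulation using only the JDK. The class name `LevenshteinSketch` is illustrative, and this is not presented as the way the library computes it.

```java
class LevenshteinSketch {

    static int distance(String left, String right) {
        // previous[j] = edits needed to turn the processed prefix of 'left'
        // into the first j characters of 'right'.
        int[] previous = new int[right.length() + 1];
        int[] current = new int[right.length() + 1];
        for (int j = 0; j <= right.length(); j++) {
            previous[j] = j; // building j characters from nothing takes j inserts
        }
        for (int i = 1; i <= left.length(); i++) {
            current[0] = i; // deleting i characters to reach the empty string
            for (int j = 1; j <= right.length(); j++) {
                int replaceCost = left.charAt(i - 1) == right.charAt(j - 1) ? 0 : 1;
                int insert = current[j - 1] + 1;
                int delete = previous[j] + 1;
                int replace = previous[j - 1] + replaceCost;
                current[j] = Math.min(Math.min(insert, delete), replace);
            }
            int[] swap = previous;
            previous = current;
            current = swap;
        }
        return previous[right.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("singing", "sign"));    // 3
        System.out.println(distance("Binary", "Binray"));   // 2
        System.out.println(distance("ABinary", "BinaryA")); // 2
    }
}
```

Compare the last line with Hamming distance, which reported 7 for the same pair: allowing inserts and deletes lets the algorithm delete the leading ‘A’ and append it at the end.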
Longest Common Subsequence
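- Algorithm
  - Similarity is checked by finding the longest run of characters that appears in both inputs in the same order. The characters do not have to be adjacent, e.g. ‘abc’ is a common subsequence of ‘axbycz’ and ‘abc’.
  - The algorithm is case sensitive.
  - Sequence matters in this algorithm. A character that is out of sequence is not considered a match.
- Result
  - This algorithm doesn’t give a similarity or distance value as such.
  - It returns the string made of the common matching sequence of characters.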
```java
import org.apache.commons.text.similarity.LongestCommonSubsequence;

public class LongestCommonSubsequenceExamples {

    public static void main(String[] args) {
        String[][] inputStrings = new String[][] {
            // Same characters but different counts
            { "singing", "sign" },
            // Similar but different case
            { "Binary", "bINARY" },
            // One word similar
            { "Binary", "Its All Binary" },
            // Not same
            { "cat", "dog" },
            // Variations of contains abc
            { "abcxyz", "abc" },
            { "xyzabc", "abc" },
            { "xyabcz", "abc" },
            { "xabcyz", "abc" },
            { "axbycz", "abc" },
            // Minor variations
            { "Binary", "Biryani" },
            { "Biryani", "Binary" },
            { "Binary", "Binary" },
            { "Binary", "ABinary" },
            { "Binary", "BinaryA" },
            { "Binary", "BiAnary" },
            { "Binary", "Binray" },
            { "ABinary", "BinaryA" } };

        for (String[] input : inputStrings) {
            CharSequence longestCommonSubsequence = new LongestCommonSubsequence()
                    .longestCommonSubsequence(input[0], input[1]);
            System.out.println("LongestCommonSubsequence of '" + input[0] + "' & '" + input[1] + "' is '"
                    + longestCommonSubsequence + "'");
        }
    }
}
```

```
LongestCommonSubsequence of 'singing' & 'sign' is 'sign'
LongestCommonSubsequence of 'Binary' & 'bINARY' is ''
LongestCommonSubsequence of 'Binary' & 'Its All Binary' is 'Binary'
LongestCommonSubsequence of 'cat' & 'dog' is ''
LongestCommonSubsequence of 'abcxyz' & 'abc' is 'abc'
LongestCommonSubsequence of 'xyzabc' & 'abc' is 'abc'
LongestCommonSubsequence of 'xyabcz' & 'abc' is 'abc'
LongestCommonSubsequence of 'xabcyz' & 'abc' is 'abc'
LongestCommonSubsequence of 'axbycz' & 'abc' is 'abc'
LongestCommonSubsequence of 'Binary' & 'Biryani' is 'Biry'
LongestCommonSubsequence of 'Biryani' & 'Binary' is 'Biry'
LongestCommonSubsequence of 'Binary' & 'Binary' is 'Binary'
LongestCommonSubsequence of 'Binary' & 'ABinary' is 'Binary'
LongestCommonSubsequence of 'Binary' & 'BinaryA' is 'Binary'
LongestCommonSubsequence of 'Binary' & 'BiAnary' is 'Binary'
LongestCommonSubsequence of 'Binary' & 'Binray' is 'Binry'
LongestCommonSubsequence of 'ABinary' & 'BinaryA' is 'Binary'
```
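A common way to find the longest common subsequence is a dynamic-programming table of lengths followed by a walk back through the table. The sketch below uses only the JDK. The class name `LcsSketch` is made up for illustration. When several subsequences share the maximum length (as with ‘Binary’ and ‘Binray’), this sketch may return a different one than the library does, although the length is the same.

```java
class LcsSketch {

    static String lcs(String left, String right) {
        int n = left.length();
        int m = right.length();
        // length[i][j] = LCS length of the first i chars of left and first j chars of right.
        int[][] length = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                if (left.charAt(i - 1) == right.charAt(j - 1)) {
                    length[i][j] = length[i - 1][j - 1] + 1;
                } else {
                    length[i][j] = Math.max(length[i - 1][j], length[i][j - 1]);
                }
            }
        }
        // Walk back from the bottom-right corner, collecting matched characters.
        StringBuilder reversed = new StringBuilder();
        int i = n;
        int j = m;
        while (i > 0 && j > 0) {
            if (left.charAt(i - 1) == right.charAt(j - 1)) {
                reversed.append(left.charAt(i - 1));
                i--;
                j--;
            } else if (length[i - 1][j] >= length[i][j - 1]) {
                i--;
            } else {
                j--;
            }
        }
        return reversed.reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(lcs("singing", "sign"));          // sign
        System.out.println(lcs("axbycz", "abc"));            // abc
        System.out.println(lcs("Binary", "Binray").length()); // 5
    }
}
```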
Soundex
- Algorithm
  - Similarity is checked by converting each input string into a Soundex code. Comparing the two codes then tells whether the strings are phonetically similar. The video above shows the algorithm in action.
  - The algorithm is case insensitive. Case doesn’t matter because it is intended to find similarity based on phonetics or pronunciation.
- Result
  - The algorithm can give the Soundex code for each input individually so that you can compare them yourself, or you can use the built-in diff, which gives a value between 0 and 4. 0 is no similarity, 4 is strong similarity.
```java
import org.apache.commons.codec.EncoderException;
import org.apache.commons.codec.language.Soundex;

public class SoundexExamples {

    public static void main(String[] args) throws EncoderException {
        String[][] inputStrings = new String[][] {
            // Same words from US & UK spellings
            { "recognise", "recognize" },
            // Nick names
            { "John", "Jonny" },
            { "John", "Jon" },
            // Grammatic variations
            { "Code", "Coding" },
            { "Code", "Coded" },
            { "Code", "Codes" },
            // Similar spellings
            { "singing", "sign" },
            { "Binary", "bINARY" },
            { "Binary", "Its All Binary" },
            // Not similar
            { "cat", "dog" },
            { "apple", "rock" } };

        for (String[] input : inputStrings) {
            Soundex soundex = new Soundex();
            int diff = soundex.difference(input[0], input[1]);
            System.out.println("Soundex codes: " + input[0] + " = " + soundex.encode(input[0]) + " " + input[1]
                    + " = " + soundex.encode(input[1]));
            System.out.println("Soundex diff of '" + input[0] + "' & '" + input[1] + "' is " + diff
                    + " (0=No Similarity, 4=Strong Similarity)");
        }
    }
}
```

```
Soundex codes: recognise = R225 recognize = R225
Soundex diff of 'recognise' & 'recognize' is 4 (0=No Similarity, 4=Strong Similarity)
Soundex codes: John = J500 Jonny = J500
Soundex diff of 'John' & 'Jonny' is 4 (0=No Similarity, 4=Strong Similarity)
Soundex codes: John = J500 Jon = J500
Soundex diff of 'John' & 'Jon' is 4 (0=No Similarity, 4=Strong Similarity)
Soundex codes: Code = C300 Coding = C352
Soundex diff of 'Code' & 'Coding' is 2 (0=No Similarity, 4=Strong Similarity)
Soundex codes: Code = C300 Coded = C330
Soundex diff of 'Code' & 'Coded' is 3 (0=No Similarity, 4=Strong Similarity)
Soundex codes: Code = C300 Codes = C320
Soundex diff of 'Code' & 'Codes' is 3 (0=No Similarity, 4=Strong Similarity)
Soundex codes: singing = S525 sign = S250
Soundex diff of 'singing' & 'sign' is 1 (0=No Similarity, 4=Strong Similarity)
Soundex codes: Binary = B560 bINARY = B560
Soundex diff of 'Binary' & 'bINARY' is 4 (0=No Similarity, 4=Strong Similarity)
Soundex codes: Binary = B560 Its All Binary = I324
Soundex diff of 'Binary' & 'Its All Binary' is 0 (0=No Similarity, 4=Strong Similarity)
Soundex codes: cat = C300 dog = D200
Soundex diff of 'cat' & 'dog' is 2 (0=No Similarity, 4=Strong Similarity)
Soundex codes: apple = A140 rock = R200
Soundex diff of 'apple' & 'rock' is 1 (0=No Similarity, 4=Strong Similarity)
```
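To see where codes such as R225 and C352 come from, here is a minimal sketch of the usual American Soundex rules in plain Java: keep the first letter, map the remaining consonants to digits, skip vowels, collapse repeated digits, and pad to four characters. The class name `SoundexSketch` is hypothetical. The code is written for illustration and is not the library's implementation, so edge cases may differ. The `difference` method assumes the diff is the number of positions at which the two codes agree, which is consistent with the output above.

```java
import java.util.Locale;

class SoundexSketch {

    // Digit for each letter A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        // Keep letters only and ignore case.
        StringBuilder letters = new StringBuilder();
        for (char c : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (c >= 'A' && c <= 'Z') {
                letters.append(c);
            }
        }
        if (letters.length() == 0) {
            return "";
        }
        StringBuilder code = new StringBuilder().append(letters.charAt(0));
        char previous = CODES.charAt(letters.charAt(0) - 'A');
        for (int i = 1; i < letters.length() && code.length() < 4; i++) {
            char letter = letters.charAt(i);
            if (letter == 'H' || letter == 'W') {
                continue; // H and W do not separate letters that share a digit
            }
            char digit = CODES.charAt(letter - 'A');
            if (digit != '0' && digit != previous) {
                code.append(digit);
            }
            previous = digit; // a vowel resets this, so a repeated digit after it is kept
        }
        while (code.length() < 4) {
            code.append('0');
        }
        return code.toString();
    }

    // Number of positions (0 to 4) at which the two codes hold the same character.
    static int difference(String a, String b) {
        String codeA = encode(a);
        String codeB = encode(b);
        int same = 0;
        for (int i = 0; i < Math.min(codeA.length(), codeB.length()); i++) {
            if (codeA.charAt(i) == codeB.charAt(i)) {
                same++;
            }
        }
        return same;
    }

    public static void main(String[] args) {
        System.out.println(encode("recognise"));        // R225
        System.out.println(encode("John"));             // J500
        System.out.println(difference("Code", "Coding")); // 2
    }
}
```

The diff for ‘cat’ and ‘dog’ is 2 only because both codes end in padding zeros, so a low non-zero diff should not be read as real phonetic similarity.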