Run this notebook

Use Livebook to open this notebook and explore new ideas.

It is easy to get started, on your machine or the cloud.

Click below to open and run it in your Livebook at .

(or change your Livebook location)

# Disambiguation ## Akin Akin is a collection of string comparison algorithms for Elixir. Algorithms can be called independently or combined to return a map of metrics. This library was built to facilitiate the disambiguation of names but can be used to compare any two binaries. ## Algorithms Utilities are provided to return all avialable algorithms. ```elixir Akin.Util.list_algorithms() ``` **Note**: Hamming Distance is excluded as it only compares strings of equal length. To use the Hamming Distance algorithm, call it directly (see: [Independent Algorithms](#independent-algorithms)). ## Combined Algorithms ### Metrics Results from all algorithms are returned as a map of metrics. <!-- livebook:{"break_markdown":true} --> #### Compare Strings Experiment by changing the value of the strings. ```elixir a = "weird" b = "wierd" Akin.compare(a, b) ``` ### Options Comparison accepts options in a Keyword list. 1. `algorithms`: algorithms to use in comparision. Accepts the name or a keyword list. Default is algorithms/0. 1. `metric` - algorithm metric. Default is both * "string": uses string algorithms * "phonetic": uses phonetic algorithms 2. `unit` - algorithm unit. Default is both. * "whole": uses algorithms best suited for whole string comparison (distance) * "partial": uses algorithms best suited for partial string comparison (substring) 2. `level` - level for double phonetic matching. Default is "normal". * "strict": both encodings for each string must match * "strong": the primary encoding for each string must match * "normal": the primary encoding of one string must match either encoding of other string (default) * "weak": either primary or secondary encoding of one string must match one encoding of other string 3. `match_at`: an algorith score equal to or above this value is condsidered a match. Default is 0.9 4. `ngram_size`: number of contiguous letters to split strings into. Default is 2. 5. `short_length`: qualifies as "short" to recieve a shortness boost. Used by Name Metric. Default is 8. 6. `stem`: boolean representing whether to compare the stemmed version the strings; uses Stemmer. Default `false` ```elixir opts = [algorithms: ["bag_distance", "jaccard", "jaro_winkler"]] Akin.compare(a, b, opts) ``` ```elixir opts = [algorithms: [metric: "phonetic", unit: "whole"]] Akin.compare(a, b, opts) ``` ```elixir Akin.compare(a, b, algorithms: [metric: "string", unit: "whole"], ngram_size: 1) ``` #### n-gram Size The default ngram size for the algorithms is 2. You can change by setting a value in opts. ```elixir opts = [algorithms: ["sorensen_dice"]] Akin.compare(a, b, opts) ``` ```elixir opts = [algorithms: ["sorensen_dice"], ngram_size: 1] Akin.compare(a, b, opts) ``` #### Match Level The default match strictness is "normal" You change it by setting a value in opts. Currently it only affects the outcomes of the `substring_set` and `double_metaphone` algorithms ```elixir left = "Alice in Wonderland" right = "Alice's Adventures in Wonderland" Akin.compare(left, right, algorithms: ["substring_set"]) ``` ```elixir Akin.compare(left, right, algorithms: ["substring_set"], level: "weak") ``` ```elixir left = "which way" right = "whitch way" Akin.compare(left, right, algorithms: ["double_metaphone"], level: "weak") ``` ```elixir Akin.compare(left, right, algorithms: ["double_metaphone"], level: "strict") ``` #### Stems Compare the stemmed version of two strings. ```elixir not_gerund = "write" gerund = "writing" Akin.compare(not_gerund, gerund, algorithms: ["bag_distance", "double_metaphone"]) ``` ```elixir Akin.compare(not_gerund, gerund, algorithms: ["bag_distance", "double_metaphone"], stem: true) ``` ### Preprocessing Before being compared, strings are converted to downcase and unicode standard, whitespace is standardized, nontext (like punctuation & emojis) is replaced, and accents are converted. The string is then composed into a struct representing the corpus of data used by the comparison algorithms. ```elixir name = "Alice Liddell" Akin.Util.compose(name) ``` ### Accents ```elixir name_a = "Hubert Łępicki" Akin.Util.compose(name_a) ``` ```elixir name_b = "Hubert Lepicki" Akin.compare(name_a, name_b) ``` ### Phonemes ```elixir Akin.phonemes(name) ``` ```elixir Akin.phonemes("wonderland") ``` ## Independent Algorithms Each algorithm can be called directly. Module names are camelcased versions of the the snakecased algorithm names returned by `list_algorithms/0`. ```elixir a = Akin.Util.compose("weird") b = Akin.Util.compose("wierd") Akin.BagDistance.compare(a, b) ``` Hamming Distance is excluded from `list_algorithms/0` and the combined algorithm metrics as it only compares strings of equal length. To use the Hamming Distance algorithm, call it directly. ```elixir Akin.Hamming.compare("weird", "wierd") ```
See source

Have you already installed Livebook?

If you already installed Livebook, you can configure the default Livebook location where you want to open notebooks.
Livebook up Checking status We can't reach this Livebook (but we saved your preference anyway)
Run notebook

Not yet? Install Livebook in just a minute

Livebook is open source, free, and ready to run anywhere.

Run on your machine

with Livebook Desktop

Run in the cloud

on select platforms

To run on Linux, Docker, embedded devices, or Elixir’s Mix, check our README.

PLATINUM SPONSORS
SPONSORS
Code navigation with go to definition of modules and functions Read More ×