R Programming - Approximate String Matching (Fuzzy Matching)

Edward Chan

Published Feb 8, 2016

Thought I would share this with my (nerd) friends...

Approximate String Matching (Fuzzy Matching)

Description

Searches for approximate matches to pattern (the first argument) within each element of the string x (the second argument) using the generalized Levenshtein edit distance (the minimal, possibly weighted, number of insertions, deletions and substitutions needed to transform one string into another).

Usage
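A quick way to see the calling convention is to ask R itself for the argument lists. This is a minimal sketch, assuming a stock base R session; the input strings are invented for illustration.

```r
# Print the argument lists as defined in the running R session.
# In base R the documented defaults include max.distance = 0.1,
# value = FALSE and useBytes = FALSE.
args(agrep)
args(agrepl)

# Minimal call: the pattern comes first, then the character vector to search.
# With the default max.distance, a 4-character pattern tolerates one edit.
hits <- agrep("lasy", c("1 lazy 2", "unrelated"))
print(hits)
```

Only the first element is within one edit of the pattern, so only its index is returned.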

Arguments

Details

The Levenshtein edit distance is used as the measure of approximateness: it is the (possibly cost-weighted) total number of insertions, deletions and substitutions required to transform one string into another.

This uses the TRE library by Ville Laurikari (http://laurikari.net/tre/), which supports MBCS character matching.

The main effect of useBytes is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as "bytes" (see Encoding).
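To make the edit distance concrete, the sketch below uses adist from package utils, which reports the distance itself rather than a match. The word pair and the cost weights are illustrative choices, not taken from the help page.

```r
# Plain Levenshtein distance: every insertion, deletion and substitution costs 1.
# "kitten" -> "sitting" needs two substitutions and one insertion.
d_plain <- utils::adist("kitten", "sitting")[1, 1]

# Generalized (weighted) distance: here a substitution costs as much as
# one deletion plus one insertion, so the total rises.
d_weighted <- utils::adist(
  "kitten", "sitting",
  costs = list(insertions = 1, deletions = 1, substitutions = 2)
)[1, 1]

print(c(plain = d_plain, weighted = d_weighted))
```

The unweighted distance is 3; with substitutions weighted at 2 it becomes 5.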

Value

agrep returns a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements (after coercion, preserving names but no other attributes).

agrepl returns a logical vector.
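The three return shapes are easiest to compare side by side. A small sketch on a made-up named vector (the names and strings are hypothetical):

```r
# A named character vector to search; contents are invented for illustration.
x <- c(first = "apple pie", second = "banana split", third = "aple tart")

# Default: positions of the elements that matched.
idx <- agrep("apple", x)

# value = TRUE: the matching elements themselves, with their names kept.
vals <- agrep("apple", x, value = TRUE)

# agrepl: one TRUE/FALSE per element of x.
flags <- agrepl("apple", x)

print(idx)
print(vals)
print(flags)
```

The logical form is handy for subsetting a data frame, while the index form is closer to what grep returns.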

Note

Since someone who read the description carelessly even filed a bug report on it, do note that this matches substrings of each element of x (just as grep does) and not whole elements. See also adist in package utils, which optionally returns the offsets of the matched substrings.
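The substring behaviour in the note above is easy to trip over, so here is a hedged sketch of the difference. The strings and the whole-element threshold of 1 are arbitrary choices for illustration; adist is used as one possible way to compare whole elements.

```r
x <- c("lazy", "the lazy dog")

# agrepl matches substrings, so the long sentence counts as a hit too.
sub_hits <- agrepl("lazy", x)

# Whole-element distance: the second string needs 8 insertions to be reached.
d_whole <- as.vector(utils::adist("lazy", x))

# partial = TRUE switches adist to substring distances, in line with agrep.
d_part <- as.vector(utils::adist("lazy", x, partial = TRUE))

# One way to require a whole-element match: threshold the full distance.
whole_hits <- d_whole <= 1

print(sub_hits)
print(d_whole)
print(d_part)
print(whole_hits)
```

If the goal is to match entire values (for example, cleaning up misspelt category labels), thresholding the full distance is usually closer to the intent than agrep on its own.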

Author(s)

Original version in R < 2.10.0 by David Meyer. Current version by Brian Ripley and Kurt Hornik.

Examples
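A few calls in the spirit of the help page, showing how max.distance, value and ignore.case change the result. The strings are illustrative, and the results are stored in hypothetical variable names so they can be inspected.

```r
# One substitution ("s" for "z") is within the default tolerance.
e1 <- agrep("lasy", "1 lazy 2")

# Forbid substitutions: only the element containing "lasy" exactly still matches.
e2 <- agrep("lasy", c(" 1 lazy 2", "1 lasy 2"),
            max.distance = list(substitutions = 0))

# Allow up to two edits; matching is case-sensitive by default.
e3 <- agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2)

# Same search, returning the matched strings instead of their positions.
e4 <- agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2, value = TRUE)

# Ignoring case brings in the upper-case element as well.
e5 <- agrep("laysy", c("1 lazy", "1", "1 LAZY"), max.distance = 2,
            ignore.case = TRUE)

print(list(e1, e2, e3, e4, e5))
```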

[Package base version 3.3.0 Index]
