R Programming - Approximate String Matching (Fuzzy Matching)

Thought I would share this with my (nerd) friends...

Approximate String Matching (Fuzzy Matching)

Description

Searches for approximate matches to pattern (the first argument) within each element of the string x (the second argument) using the generalized Levenshtein edit distance (the minimal, possibly weighted, number of insertions, deletions and substitutions needed to transform one string into another).
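As a minimal sketch of the behaviour described above, the call below searches a small character vector for the pattern "lasy" using agrep from base R. The vector x is made-up illustration data, and the default tolerance is assumed to allow one edit for a four-character pattern.

```r
# Hypothetical strings to search; the data is illustrative only.
x <- c("1 lazy 2", "nothing here", "so lasy")

# Indices of the elements that contain an approximate match to "lasy".
hits <- agrep("lasy", x)
print(hits)   # 1 3
```

Element 1 matches because "lazy" is one substitution away from "lasy"; element 3 contains the pattern exactly.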

Usage
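One hedged way to see the call signatures is to ask R itself. The sketch below assumes a standard R installation; the defaults it prints depend on the installed version.

```r
# Print the argument lists as defined by the R installation at hand.
print(args(agrep))
print(args(agrepl))

# The same information as a character vector of argument names.
agrep_args <- names(formals(agrep))
print(agrep_args)
```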

Arguments

Details

The Levenshtein edit distance is used as the measure of approximateness: it is the (possibly cost-weighted) total number of insertions, deletions and substitutions required to transform one string into another.
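To make the measure concrete, the sketch below computes edit distances directly with adist from package utils. The example strings and the cost vector are illustrative assumptions, not part of the original help text.

```r
# Unweighted Levenshtein distance: each insertion, deletion or
# substitution costs 1.
d_plain <- utils::adist("kitten", "sitting")
print(d_plain)      # 1 x 1 matrix holding 3

# Cost-weighted variant: substitutions cost 2, so a substitution is no
# cheaper than a deletion followed by an insertion.
d_weighted <- utils::adist("kitten", "sitting",
                           costs = c(insertions = 1, deletions = 1,
                                     substitutions = 2))
print(d_weighted)   # 1 x 1 matrix holding 5
```

With unit costs the shortest edit script is two substitutions plus one insertion. Doubling the substitution cost raises the total to 5.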

This uses tre by Ville Laurikari (http://laurikari.net/tre/), which supports MBCS character matching.

The main effect of useBytes is to avoid errors/warnings about invalid inputs and spurious matches in multibyte locales. It inhibits the conversion of inputs with marked encodings, and is forced if any input is found which is marked as "bytes" (see Encoding).

Value

agrep returns a vector giving the indices of the elements that yielded a match, or, if value is TRUE, the matched elements (after coercion, preserving names but no other attributes).

agrepl returns a logical vector.
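The three return shapes can be compared side by side. This is a small sketch using agrep and agrepl from base R; the input vector is illustrative data and max.distance = 2 is an arbitrary choice for the example.

```r
x <- c("1 lazy", "1", "1 LAZY")

# Default: integer indices of the matching elements.
idx <- agrep("laysy", x, max.distance = 2)
print(idx)    # 1

# value = TRUE: the matching elements themselves.
vals <- agrep("laysy", x, max.distance = 2, value = TRUE)
print(vals)   # "1 lazy"

# agrepl: one logical per element of x.
flags <- agrepl("laysy", x, max.distance = 2)
print(flags)  # TRUE FALSE FALSE
```

The logical form is convenient for subsetting, for example x[flags].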

Note

Since someone who read the description carelessly even filed a bug report on it, do note that this matches substrings of each element of x (just as grep does) and not whole elements. See also adist in package utils, which optionally returns the offsets of the matched substrings.
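The difference between substring and whole-element matching can be sketched with adist from package utils, whose partial argument switches between the two notions of distance. The strings are illustrative.

```r
# agrepl finds "lasy" inside the longer string, because only a
# substring has to be close to the pattern.
found <- agrepl("lasy", "1 lazy 2")
print(found)       # TRUE

# Distance to the whole element: four insertions plus one substitution.
d_whole <- utils::adist("lasy", "1 lazy 2")
print(d_whole)     # 1 x 1 matrix holding 5

# Distance to the best-matching substring, as used for agrep-style matching.
d_partial <- utils::adist("lasy", "1 lazy 2", partial = TRUE)
print(d_partial)   # 1 x 1 matrix holding 1

# With counts = TRUE the result also carries an "offsets" attribute
# describing where the matched substring sits.
d_counts <- utils::adist("lasy", "1 lazy 2", partial = TRUE, counts = TRUE)
print(attr(d_counts, "offsets"))
```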

Author(s)

Original version in R < 2.10.0 by David Meyer. Current version by Brian Ripley and Kurt Hornik.

See Also

grep, adist.

Examples
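A few illustrative calls, given as a sketch rather than a complete tour. They show how max.distance and ignore.case change what counts as a match; the input strings are example data.

```r
# Forbid substitutions: "lazy" no longer matches "lasy", so only the
# second element is reported.
no_sub <- agrep("lasy", c(" 1 lazy 2", "1 lasy 2"),
                max.distance = list(substitutions = 0))
print(no_sub)     # 2

x <- c("1 lazy", "1", "1 LAZY")

# Allow up to two edits, case-sensitive.
two_edits <- agrep("laysy", x, max.distance = 2)
print(two_edits)  # 1

# Same tolerance, ignoring case: the upper-case element now matches too.
any_case <- agrep("laysy", x, max.distance = 2, ignore.case = TRUE)
print(any_case)   # 1 3
```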


[Package base version 3.3.0 Index]
