A Comparison of Personal Name Matching: Techniques and Practical Issues

@inproceedings{Christen2006ACO,
  title={A Comparison of Personal Name Matching: Techniques and Practical Issues},
  author={Peter Christen},
  booktitle={Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)},
  year={2006},
  pages={290--294}
}
  • P. Christen
  • Published 2006
  • Computer Science
  • Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06)
Finding and matching personal names is at the core of an increasing number of applications, from text and Web mining and search engines to information extraction, deduplication and data linkage systems. [...] Key Method: We then give an overview of a comprehensive set of commonly used, as well as some recently developed, name matching techniques. Experimental comparisons using four large name data sets indicate that there is no clear best matching technique.
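One family of approximate techniques commonly applied to names compares the q-grams (substrings of length q) the two names share. The sketch below computes a Dice coefficient over padded bigrams; it is a minimal illustration, not Christen's implementation, and the helper names `qgrams` and `dice_qgram_similarity` as well as the `#`/`$` padding markers are assumptions chosen for this example.

```python
from collections import Counter


def qgrams(s: str, q: int = 2, pad: bool = True) -> list[str]:
    """Return the q-grams of s (lower-cased), optionally padded at both ends."""
    s = s.lower()
    if pad:
        # '#' and '$' are arbitrary start/end markers assumed not to occur in names.
        s = "#" * (q - 1) + s + "$" * (q - 1)
    return [s[i:i + q] for i in range(len(s) - q + 1)]


def dice_qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Dice coefficient over q-gram multisets: 2 * |common| / (|A| + |B|)."""
    grams_a, grams_b = Counter(qgrams(a, q)), Counter(qgrams(b, q))
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        # Neither string yields a q-gram (e.g. two empty strings with q = 1).
        return 1.0 if a.lower() == b.lower() else 0.0
    # Counter '&' keeps the minimum count of each shared q-gram (multiset intersection).
    common = sum((grams_a & grams_b).values())
    return 2.0 * common / total


if __name__ == "__main__":
    for other in ("Peter", "Pete", "Petra", "Christen"):
        print(other, round(dice_qgram_similarity("Peter", other), 3))
```

Padding gives the first and last letters of a name their own q-grams, so differences at the start or end of a name affect the score.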
A comparison of techniques for name matching
TLDR
This paper analyses and evaluates a set of popular name matching techniques on several carefully designed datasets and confirms that there is no clear best technique.
Approximate String Matching for Geographic Names and Personal Names
TLDR
A novel method for approximate string matching, developed for the recognition of geographic and personal names, handles abbreviations, name inversions, stopwords and omitted name parts.
Approximate String Matching Techniques
TLDR
This paper analyses and evaluates a set of popular token-based string matching techniques on several carefully designed datasets and confirms that there is no clear overall best technique.
Comparative Study of Name Matching Algorithms
TLDR
The types of name variations are described, together with a basic description of name matching algorithms in three categories (phonetic, pattern and dictionary based), and the issues that must be addressed in name matching to improve data accuracy are illustrated by applying Soundex, Double Metaphone and Levenshtein distance to sample data.
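To make the contrast between a phonetic encoding and an edit distance concrete, here is a minimal sketch of American Soundex and Levenshtein distance. It is not the cited paper's code; Double Metaphone is omitted because its rule set is much larger. The function names are illustrative, and the Soundex variant assumes ASCII letters only (other characters are dropped) and the common rule that `h` and `w` do not separate letters with the same digit.

```python
# American Soundex digit groups; vowels, 'y', 'h' and 'w' get no digit.
_SOUNDEX_GROUPS = {
    "bfpv": "1", "cgjkqsxz": "2", "dt": "3", "l": "4", "mn": "5", "r": "6",
}
SOUNDEX_CODES = {ch: digit for letters, digit in _SOUNDEX_GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Four-character American Soundex code (ASCII letters only; others are ignored)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit:
            if digit != prev:  # collapse adjacent letters with the same digit
                code += digit
            prev = digit
        elif ch not in "hw":
            prev = ""  # a vowel or 'y' separates letters with the same digit
        # 'h' and 'w' are skipped without resetting prev
        if len(code) == 4:
            break
    return code.ljust(4, "0")  # pad short codes with zeros


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1,                # delete ca
                           row[j - 1] + 1,                 # insert cb
                           prev_row[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev_row = row
    return prev_row[-1]
```

For example, "Robert" and "Rupert" share the Soundex code R163, so a phonetic match treats them as identical, while their Levenshtein distance is 2.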
Matching person names through name transformation
TLDR
A novel person name matching model is presented: common name variations in the English-speaking world are formalized and the concept of name transformation paths is introduced; name similarity is measured after the best transformation path has been selected.
Hybrid Matching Algorithm for Personal Names
TLDR
A hybrid matching algorithm (PNRS) is developed that employs phonetic encoding, string matching and statistical facts to suggest possible candidates for misspelled names, and its efficiency is compared with other well-known spelling correction techniques.
A Large Scale Name Matching and Search Framework
TLDR
The purpose is to develop a framework that addresses the name matching and search problem by combining different strategies, and that can also take into account the semantics of the string representing a name.
Classification of personal names with application to DBLP
  • M. Biryukov, Yafang Wang
  • Computer Science
  • 2008 Third International Conference on Digital Information Management
  • 2008
TLDR
A statistical tool for the automatic language detection of personal names, fine-tuned to achieve precision and recall above 90% for many languages, which performs better than some other systems aimed at identifying the language of personal names.
Similarity measures for title matching
TLDR
This paper evaluates 21 measures with the aim of detecting the most appropriate measure for matching the titles of entity names and shows that Soft-TFIDF performs best.
Use of latent semantic indexing to identify name variants in large data collections
  • R. Bradford
  • Computer Science
  • 2013 IEEE International Conference on Intelligence and Security Informatics
  • 2013
TLDR
An approach to attaining both high precision and high recall for name variant identification in large text collections by exploiting the technique of latent semantic indexing (LSI).

References

Showing 1-10 of 48 references
An Assessment of Name Matching Algorithms
TLDR
A comparative analysis of a number of algorithms developed for name matching; based on an analysis of their comparative strengths and weaknesses, a new and improved name matching algorithm, called Phonex, is proposed.
A Comparison of String Distance Metrics for Name-Matching Tasks
TLDR
This work investigates a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods, and finds the best-performing method is a hybrid scheme combining a TFIDF weighting scheme with the Jaro-Winkler string-distance scheme.
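The Jaro-Winkler comparator named above is widely used for names. The following is a minimal sketch of the standard Jaro similarity with Winkler's common-prefix boost; it does not reproduce the TFIDF hybrid itself, and the defaults (prefix scale 0.1, prefix length capped at 4) are the conventional Winkler values, assumed here rather than taken from the paper. Some implementations only apply the boost above a Jaro threshold; this sketch always applies it.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(i + window + 1, len2)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both sets of matched characters in order; each mismatch is half a transposition.
    k = 0
    half_transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro score for strings sharing a common prefix (up to max_prefix chars)."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return j + prefix * prefix_scale * (1 - j)
```

The classic example "MARTHA" vs "MARHTA" has six matching characters and one transposition, giving a Jaro score of 17/18; the shared three-letter prefix raises the Jaro-Winkler score to about 0.961.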
Approximate String Joins in a Database (Almost) for Free
TLDR
This paper develops a technique for building approximate string join capabilities on top of commercial databases by exploiting facilities already available in them, and demonstrates experimentally the benefits of the technique over the direct use of user-defined functions (UDFs).
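A common idea behind q-gram based approximate joins is to prune candidate pairs cheaply before running an exact edit-distance check. The sketch below uses two necessary conditions: a length filter, and a count filter that follows because each edit operation can destroy at most q of a padded string's q-grams, so strings within edit distance k share at least max(|s1|, |s2|) - 1 - (k - 1)q q-grams. It is written in plain Python rather than SQL, the function names are hypothetical, and the nested loop still enumerates every pair, which a database join over q-gram tables would avoid.

```python
from collections import Counter


def padded_qgrams(s: str, q: int) -> Counter:
    """Multiset of q-grams of s, padded with q-1 assumed marker characters per side."""
    padded = "#" * (q - 1) + s + "$" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance, used only to verify pairs that survive the filters."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        row = [i]
        for j, cb in enumerate(b, start=1):
            row.append(min(prev_row[j] + 1, row[j - 1] + 1, prev_row[j - 1] + (ca != cb)))
        prev_row = row
    return prev_row[-1]


def passes_count_filter(s1: str, s2: str, k: int, q: int = 2) -> bool:
    """Necessary (not sufficient) condition for levenshtein(s1, s2) <= k."""
    common = sum((padded_qgrams(s1, q) & padded_qgrams(s2, q)).values())
    return common >= max(len(s1), len(s2)) - 1 - (k - 1) * q


def approximate_join(left: list[str], right: list[str], k: int, q: int = 2) -> list[tuple[str, str]]:
    """Return all pairs within edit distance k, filtering before the exact check."""
    pairs = []
    for s1 in left:
        for s2 in right:
            if abs(len(s1) - len(s2)) > k:  # length filter
                continue
            if not passes_count_filter(s1, s2, k, q):  # count filter
                continue
            if levenshtein(s1, s2) <= k:  # exact verification
                pairs.append((s1, s2))
    return pairs
```

Because both filters are necessary conditions, they never discard a true match; they only reduce how many expensive edit-distance computations are needed.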
The Field Matching Problem: Algorithms and Applications
To combine information from heterogeneous sources, equivalent data in the multiple sources must be identified. This task is the field matching problem. Specifically, the task is to determine whether or [...]
The Field Matching Problem: Algorithms and Applications
TLDR
Three field matching algorithms are described, one of which is the well-known Smith-Waterman algorithm for comparing DNA and protein sequences, and their performance on real-world datasets is evaluated.
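Smith-Waterman computes the best-scoring local alignment, which suits fields where only part of one string corresponds to part of the other (for example, a surname embedded in a longer name field). The following is a minimal sketch of the score computation with a linear gap penalty; the scoring parameters (match 2, mismatch -1, gap -1) are arbitrary assumptions for illustration, not values from the paper.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # scores for the previous row of the DP matrix
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)  # align ca with cb
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            # Clamping at 0 lets an alignment restart anywhere: this makes it local.
            score = max(0, diag, up, left)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best
```

For instance, "Peter Christen" and "Christen, P." align locally on the eight shared characters of "Christen", giving a score of 16 with these parameters, even though the two strings differ a lot globally.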
Getty's Synoname™ and its cousins: A survey of applications of personal name‐matching algorithms
TLDR
Personal name‐matching techniques may be included in name authority work, information retrieval, or duplicate detection, with some applications matching on name only, and others combining personal names with other data elements in record linkage techniques.
Adaptive duplicate detection using learnable string similarity measures
TLDR
This paper proposes to employ learnable text distance functions for each database field, and shows that such measures are capable of adapting to the specific notion of similarity that is appropriate for the field's domain.
Approximate String Comparison and its Effect on an Advanced Record Linkage System
TLDR
Overall matching efficacy is further improved by a linear assignment algorithm that forces one-to-one matching.
Names: A New Frontier in Text Mining
TLDR
This paper surveys existing technologies for name matching and proposes a direction for future work in which existing entity extraction, coreference, and database name matching technologies would be harnessed for cross-document coreference and linking capabilities.
String Edit Analysis for Merging Databases
TLDR
A flexible approach to string edit distance is presented that can be automatically tuned to different data sets and can use synonym dictionaries; when the costs are correctly tuned, it significantly increases the algorithm's accuracy.
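The core of a tunable edit distance is that each operation carries its own cost. The sketch below shows only that part: insertion, deletion and substitution costs are parameters, and substitution cost is a function of the two characters. The automatic tuning and synonym dictionaries described above are not shown, and the `vowel_aware_sub_cost` function with its 0.5 vowel-for-vowel cost is a hypothetical hand-picked example, not a value from the paper.

```python
from collections.abc import Callable


def unit_sub_cost(x: str, y: str) -> float:
    """Standard Levenshtein substitution cost."""
    return 0.0 if x == y else 1.0


def weighted_edit_distance(a: str, b: str, ins_cost: float = 1.0, del_cost: float = 1.0,
                           sub_cost: Callable[[str, str], float] = unit_sub_cost) -> float:
    """Minimum total cost of edits turning a into b, with configurable operation costs."""
    prev = [j * ins_cost for j in range(len(b) + 1)]  # "" -> b[:j] needs j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i * del_cost]  # a[:i] -> "" needs i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + del_cost,                # delete ca
                            curr[j - 1] + ins_cost,            # insert cb
                            prev[j - 1] + sub_cost(ca, cb)))   # substitute ca -> cb
        prev = curr
    return prev[-1]


VOWELS = set("aeiou")


def vowel_aware_sub_cost(x: str, y: str) -> float:
    """Hypothetical cost: swapping one vowel for another is cheaper than other substitutions."""
    if x == y:
        return 0.0
    if x in VOWELS and y in VOWELS:
        return 0.5
    return 1.0
```

With the default costs this reduces to Levenshtein distance ("jonathan" vs "jonathon" costs 1.0); passing `sub_cost=vowel_aware_sub_cost` lowers that to 0.5, which is the kind of adjustment a tuning procedure would learn from data instead of fixing by hand.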