IRS for Computer Character Sequences Filtration: a new software tool and algorithm to support the IRS at tokenization process

Abstract

Tokenization is the task of chopping a character sequence up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. This paper presents a new software tool and algorithm to support the IRS in the tokenization process.
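To make the definition concrete, the sketch below shows one common way to tokenize a character sequence: punctuation characters are discarded and the remaining text is split on whitespace. This is only a minimal illustration of the general tokenization step described above, not the filtration algorithm or software tool proposed in the paper; the function name tokenize is introduced here purely for the example.

    import string

    def tokenize(text):
        # Illustrative only: drop punctuation characters, then split
        # the remaining character sequence on whitespace to get tokens.
        cleaned = text.translate(str.maketrans("", "", string.punctuation))
        return cleaned.split()

    print(tokenize("Friends, Romans, Countrymen, lend me your ears;"))
    # ['Friends', 'Romans', 'Countrymen', 'lend', 'me', 'your', 'ears']

A real IRS tokenizer would typically add further filtering steps (case folding, stop-word removal, handling of numbers and hyphenation), which is the kind of character-sequence filtration the paper's tool addresses.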
