Simple tools for processing strings in Russian (choosing the proper plural form, in-words representation of numerals, dates in Russian without locales, transliteration, etc.)
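As an illustration of what "choosing the proper plural form" involves, here is a minimal sketch of the standard Russian agreement rule for numerals. The function name `choose_plural` and the three-form argument layout are hypothetical and are not taken from the library described above.

```python
from typing import Sequence


def choose_plural(n: int, forms: Sequence[str]) -> str:
    """Pick the Russian noun form that agrees with the number n.

    forms holds three variants: the form used with 1, the form used
    with 2-4, and the form used with 5 and up,
    e.g. ("файл", "файла", "файлов").
    """
    n = abs(n)
    # 1, 21, 31, ... take the first form, but 11, 111, ... do not.
    if n % 10 == 1 and n % 100 != 11:
        return forms[0]
    # 2-4, 22-24, ... take the second form, but 12-14 do not.
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return forms[1]
    # Everything else (0, 5-20, 25-30, ...) takes the third form.
    return forms[2]


if __name__ == "__main__":
    forms = ("файл", "файла", "файлов")
    for number in (1, 3, 7, 11, 21, 112):
        print(number, choose_plural(number, forms))
```

The rule depends only on the last two digits of the number, which is why the teens (11 to 14) need to be excluded explicitly.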
Sally is a small tool for mapping a set of strings to a set of vectors. This mapping is referred to as embedding and allows for applying techniques of machine learning and data mining for analysis of string data. Sally implements a standard technique for mapping strings to a vector space that is often referred to as the vector space model or bag-of-words model. The strings are characterized by a set of features, where each feature is associated with one dimension of the vector space. Sally proceeds by counting the occurrences of the specified features in each string and generating a sparse vector of count values. The tool then normalizes the vectors and outputs them in a given format.
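The counting and normalization steps described above can be sketched roughly as follows. This is not Sally's code: the choice of character n-grams as features, the L2 normalization, and the function name `embed` are assumptions made for illustration only.

```python
import math
from collections import Counter
from typing import Dict


def embed(string: str, n: int = 2) -> Dict[str, float]:
    """Map a string to a sparse, L2-normalized vector of n-gram counts.

    Assumption: features are overlapping character n-grams. The sparse
    vector is stored as a dict from feature to value, so dimensions
    with a zero count are simply absent.
    """
    # Count every overlapping substring of length n.
    counts = Counter(string[i:i + n] for i in range(len(string) - n + 1))
    # Euclidean length of the count vector.
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        # The string is shorter than n, so it has no features.
        return {}
    return {gram: c / norm for gram, c in counts.items()}


if __name__ == "__main__":
    print(embed("abab"))
```

Storing only the non-zero dimensions keeps the representation compact even when the set of possible features is very large.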
Yet Another String Library for C
yasl started its life as a fork of sds, which is an extracted version of the dynamic string library used in the Redis codebase. yasl was created because the sds maintainer was unresponsive.
Harry is a small tool for comparing strings. The tool supports several common distance and kernel functions for strings as well as some exotic similarity measures. The focus of Harry lies on implicit similarity measures, that is, comparison functions that do not give rise to an explicit vector space. Examples of such similarity measures are the Levenshtein distance, the Jaro-Winkler distance or the spectrum kernel.
Harry is implemented using OpenMP, such that the computation time for a set of strings scales linearly with the number of available CPU cores. Moreover, efficient implementations of several similarity measures, effective caching of similarity values and low-overhead locking further speed up the computation.
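To make one of the named measures concrete, here is a minimal single-threaded sketch of the Levenshtein distance using the classic dynamic-programming recurrence. It is not Harry's implementation and does not attempt the parallelism or caching mentioned above; the function name `levenshtein` is chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b. Start with the empty prefix.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.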
cl-heredoc is an implementation of "here documents" that allow the user to embed literal strings into code or data without any need for quoting, something that is missing in both ANSI CL and popular implementations.