|Overview Usage Performance Notes Links||tst 0.68 - 23 Jan 2009|
Ternary Search Tree containers to replace
Table of contents
Download: Latest version (0.684) http://abc.se/~re/code/tst/ternary_tree.zip
Of course there is a price to pay: structured containers use much more memory than other containers: Around 6-8 bytes per letter inserted (whether
wchar_t); an English 150 k word dictionary uses eg 7.3 MB to store 1.2 MB words (2.4 MB of
The container classes in this library can be used as drop-in replacements for
While the STL standard associative containers are normally backed by a binary tree structure, Structured Containers are backed by a Ternary Search Tree, as presented by Jon Bentley and Robert Sedgewick in .
Class ternary_tree<Key, Value, Comp, Alloc> provides the implementation backend. Due to its internals, its interface cannot easily be made to conform with standard STL concepts, so it is used internally by the structured* wrapper classes (much like STL's internal
Basically, if you have code using sets or maps, you have code to use structured containers. And with 1-3 lines of code, you're ready to make advanced imprecise searches in your dictionaries.
See the usage section for examples of how to use these classes.
|Compatibility:||Note that the file tst_concept_checks.cpp is currently broken. Will investigate.|
|version 0.684: (Jan 2009)||Fix standard-breakage in multimap/multiset return from |
|version 0.683: (March 2007)||Fix portability issues for GCC and non-STLport libraries. Fix longest_match.|
Thanks to Arjen Wagenaar for several reports, fixes and encouragement. Thanks also to Michel Tourn for reports.
|version 0.68: (Dec 2006)||Implement TST_NODE_COUNT_TYPE macro, which can be used to control node size on 64-bit systems. See class ternary_tree|
|version 0.68 (alpha):||Reimplemented node type. Do proper management of value type (was inconsistent, partly unimplemented - duh!) |
(a new interface for these searches will be specified in the future)
Ternary trees allow searches that match parts of keys and ignores mismatches in other parts.
In the current interface we specify a small number of searches facilitated by the tree structure; the Partial Match and Hamming searches are defined in several other implementations (showcased in Bentley and Sedgewick code). The Levenshtein and combinatorial searches are not found in other ternary trees (that I know of).
|Name (function name)||Description|
|Prefix match (prefix_range)||Finds keys sharing a common prefix, returns a pair of iterators.|
|Longest match (longest_match)||Finds the longest key that matches beginning of search string. A typical application is to tokenize a string using the ternary tree as dictionary.|
|Partial match, or wildcard search (partial_match_search)||Accepts a search string with wildcard characters that will match any letter, eg "b?nd" would match "band", "bend", "bind", "bond" in an English dictionary.|
|Search allowing ||Accepts a search string and an integer |
The version here, following DDJ code, extends the strict Hamming search by also allowing shorter and longer strings; a search for "band", dist = 1, also finds "ban" and "bandy" etc.
See also http://wikipedia.org/wiki/Hamming_distance
|Levenshtein distance search (levenshtein_search - consider descriptive name)|
Hamming search matches characters in fixed position, allowing substitution of dist chars. Levenshtein search also allows shifting parts of the search string by insertion or skipping chars (in dist places). So
|Combinatorial or "scrabble" search (combinatorial_search)||Finds all keys using the characters in search string. |
See advanced search overview in the tutorial.
These searches are defined for all containers in this library. But they are also marked as deprecated (to be replaced by generic algorithms with same interface). For a relative performance comparison of imprecise searches, see the second table in Performance Notes.
The searches currently defined are clearly special cases in a sea of search possibilities. We have only defined searches that are relatively efficient, compared to other combinations of containers and algorithms. But there can be many variations on the available searches: increasing Hamming/Levenshtein distance at the end of words, or matching limited ranges of characters (eg allowing mismatches only in vowels), etc.
The next step for this project is to support a more flexible low-level interface for traversing and filtering tree nodes. The interface for these "structured searches" is open for consideration, but it will basically define sub-key iterators, conversion of full-key from sub-key iterators, and a small collection of algorithms operating on these sub-key iterators.
At least the following operations are needed:
is_key(subkey_iterator pos): true if end-of-key exists at iterator position.
count_elements(subkey_iterator pos): returns number of available key elements at position.
|ternary_tree 0.68 -- by rasmus ekman -- Page generated with Doxygen 1.5.6 on 23 Jan 2009||12934|