A collection of functions frequently used in string-handling programs. The STL is used where possible. Locale and internationalization are not considered. This library is not intended to replace or imitate the excellent StrTk library. It supports whitespace stripping, delimited splitting, set operations, delimited field searches, case handling, comparisons, and delimited file parsing.
In my mock use cases (benchmarks), DSV (delimiter-separated values) file splitting is faster with this library than with StrTk, Boost, Python, and Java.
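To make two of the listed operations concrete, here is a minimal STL-only sketch of whitespace stripping and delimited splitting. The names `sketch_strip` and `sketch_split` are hypothetical and chosen for illustration; they are not this library's API, and the library's own functions may differ in signature and behavior.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the library's API): remove leading and
// trailing characters found in `ws` from `s`.
std::string sketch_strip(const std::string& s,
                         const std::string& ws = " \t\r\n") {
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();  // all whitespace
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper (not the library's API): split `s` on a single
// delimiter character. Empty fields are kept, so "a,,b" yields 3 fields.
std::vector<std::string> sketch_split(const std::string& s, char delim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        const std::string::size_type pos = s.find(delim, start);
        if (pos == std::string::npos) {
            fields.push_back(s.substr(start));  // last (or only) field
            break;
        }
        fields.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```

For example, `sketch_split("a,,b", ',')` keeps the empty middle field, which matters for DSV data where column positions must be preserved.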
# Build targets:
make test
make memtest
make bench
make bench_python
make bench_pypy
make bench_strtk
make bench_boost
make bench_java
make clean
sudo make install
make dist VER=X.Y.Z
# Environment:
Ubuntu 12.04
g++ 4.6.3
Python 2.7.3
OpenJDK 1.6.0_24 64bit 6b24-1.11.1-4ubuntu2, 20.0-b12, mixed
Boost 1.46
i7 x990, 8 x 3.47GHz, 24GB RAM, 256 GB SSD
# Getting 20th column in a 1GB DSV:
1) c++ fsplit_stream with sync=false, vector of vectors (1.52s)
2) c++ fsplit_stream with sync=false, deque of vectors (1.74s)
3) c++ fsplit_stream with sync=true (2.0s)
4) c++ fsplit_fgets (2.0s)
5) c++ strtk, limited to the 10th column (20th not available, so the real time could be worse) (2.0s)
6) pypy str.index (3.4s)
7) c++ fsplit_fread (3.9s)
8) c++ fsplit_strtok (3.9s)
9) pypy str.split (5.5s)
10) python str.split (8.8s)
11) python csv module (11.7s)
12) c++ boost split (19s)
13) python str.index (24s)
14) pypy csv module (37s)
15) c++ boost tokenize (39s)
16) java scanner (95s)
# Writing 1GB DSV:
1) c++ fprintf (11s)
2) c++ stream (19s)
3) pypy (19s)
4) python (70s)
5) java (113s)
# Release steps:
make html
make clean
make dist VER=0.5.4
ssh -t rsz,ezstringutil@shell.sourceforge.net create
scp html/* rsz,ezstringutil@shell.sourceforge.net:/home/project-web/ezstringutil/htdocs
scp ../ezstringutil-0.5.4.tar.gz rsz,ezstringutil@shell.sourceforge.net:/home/frs/project/e/ez/ezstringutil
v0.5.4 2012-8-1
Copyright (C) 2011,2012 Remik Ziemlinski (see LICENSE.MIT)