Overview

A collection of functions frequently needed in string-handling programs. The STL is used wherever possible. Locale and internationalization are not considered. This library is not intended to replace or imitate the excellent StrTk library. It supports whitespace stripping, delimited splitting, set operations, delimited field searches, case handling, comparisons, and delimited file parsing.
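
To make two of the listed operations concrete, here is a minimal sketch of whitespace stripping and delimited splitting written against the STL only. The names `strip` and `split_fields` are hypothetical and chosen for illustration; they are not taken from this library's headers, and the library's own functions may differ in name, signature, and behavior.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not this library's API): removes leading and
// trailing ASCII whitespace. Locale is ignored, as in the library itself.
std::string strip(const std::string& s) {
    const char* ws = " \t\r\n\f\v";
    std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return std::string();  // all whitespace
    std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Hypothetical helper (not this library's API): splits `line` at every
// occurrence of `delim`. Empty fields are kept, so "a,,b" yields 3 fields
// and an empty line yields one empty field.
std::vector<std::string> split_fields(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    for (;;) {
        std::string::size_type end = line.find(delim, start);
        if (end == std::string::npos) {
            fields.push_back(line.substr(start));  // last field
            break;
        }
        fields.push_back(line.substr(start, end - start));
        start = end + 1;  // continue after the delimiter
    }
    return fields;
}
```

Keeping empty fields is a design choice that suits DSV data, where column positions must stay stable even when a value is missing.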

In my mock use cases (benchmarks), DSV file splitting is faster with this library than with StrTk, Boost, Python, and Java.

Download

Source code

Git

Installation

make test
make memtest
make bench
make bench_python
make bench_pypy
make bench_strtk
make bench_boost
make bench_java
make clean
sudo make install
make dist VER=X.Y.Z

Performance

# Environment:
Ubuntu 12.04
g++ 4.6.3
Python 2.7.3
OpenJDK 1.6.0_24 64bit 6b24-1.11.1-4ubuntu2, 20.0-b12, mixed
Boost 1.46
i7 x990, 8 x 3.47GHz, 24GB RAM, 256 GB SSD

# Getting the 20th column of a 1GB DSV file:
1) c++ fsplit_stream with sync=false, vector of vectors (1.52s)
2) c++ fsplit_stream with sync=false, deque of vectors (1.74s)
3) c++ fsplit_stream with sync=true (2.0s)
4) c++ fsplit_fgets (2.0s)
5) c++ strtk, limited to the 10th column (20th not available, so the real time could be worse) (2.0s)
6) pypy str.index (3.4s)
7) c++ fsplit_fread (3.9s)
8) c++ fsplit_strtok (3.9s)
9) pypy str.split (5.5s)
10) python str.split (8.8s)
11) python csv module (11.7s)
12) c++ boost split (19s)
13) python str.index (24s)
14) pypy csv module (37s)
15) c++ boost tokenize (39s)
16) java scanner (95s)

# Writing a 1GB DSV file:
1) c++ fprintf (11s)
2) c++ stream (19s)
3) pypy (19s)
4) python (70s)
5) java (113s)

Distribution

make html
make clean
make dist VER=0.5.4

Publishing

ssh -t rsz,ezstringutil@shell.sourceforge.net create
scp html/* rsz,ezstringutil@shell.sourceforge.net:/home/project-web/ezstringutil/htdocs
scp ../ezstringutil-0.5.4.tar.gz rsz,ezstringutil@shell.sourceforge.net:/home/frs/project/e/ez/ezstringutil

Changelog

v0.5.4 2012-8-1

License

Copyright (C) 2011,2012 Remik Ziemlinski (see LICENSE.MIT)