Performance of std::string_view vs. std::string From C++17
Take a look at a comparison of the performance and speed of std::string_view versus std::string in C++17 and how it can be optimized.
Join the DZone community and get the full member experience.
Join For FreeHow much is std::string_view
faster than standard std::string
operations?
Have a look at a few examples where I compare std::string_view
against std::string
.
Intro
I was looking for some examples of string_view
, and after a while, I got curious about the performance gain we might get.
string_view
is conceptually only a view of the string: usually implemented as[ptr, length]
. When a string_view
is created there's no need to copy the data (as opposite when you create a copy of a string). What's more string_view
is smaller than std::string
- regarding the size on the stack/heap.
For example when we look at a possible (pseudo) implementation:
string_view { size_t _len; const CharT* _str; }
Depending on the architecture the total size is 8 or 16 bytes.
Due to Small String Optimizations, std::string
is usually 24 or 32 bytes so doubles or triple the size of string_view
. In that form, such a string can hold between 15 (GCC, MSVC) and 22 characters (Clang) without having to allocate memory on the heap. Of course larger string will use more memory, but 24/32 bytes is the minimal size of the std::string
.
You can read more details about SSO in this excellent post Exploring std::string. Or here: SSO-23 (as suggested in a comment).
Obviously returning string views, creating string views, using substr
is definitely much faster than deep copies of std::string
. However, the initial performance tests showed that std::string
is usually highly optimized and sometimes string_view
doesn't win that much.
The Series
This article is part of my series about C++17 Library Utilities. Here's the list of the other topics that I'll cover:
- Using
std::optional
- Error handling and
std::optional
- About
std::variant
- About
std::any
- In place construction for
std::optional
,std::variant
andstd::any
std::string_view
Performance (this post)- C++17 string searchers & conversion utilities
- Working with
std::filesystem
- Something more?
Resources about C++17 STL:
- C++17 - The Complete Guide by Nicolai Josuttis
- C++ Fundamentals Including C++ 17 by Kate Gregory
- Practical C++14 and C++17 Features - by Giovanni Dicanio
- C++17 STL Cookbook by Jacek Galowicz
string_view
Operations
string_view
is modelled to be very similar to std::string
. Yet, the view is non-owning, so any operation that modifies the data cannot go into the API. Here's a brief list of methods that you can use with this new type:
operator[]
at
front
back
data
size
/length
max_size
empty
remove_prefix
remove_suffix
swap
copy
(notconstexpr
)substr
- complexityO(1)
and notO(n)
as instd::string
compare
find
rfind
find_first_of
find_last_of
find_first_not_of
find_last_not_of
- operators for lexicography compare:
==, !=, <=, >=, <, >
operator <<
One important note is that all of the above methods (except for copy
and operator <<
) are also constexpr
! With this capability, you might now work with strings in constant expressions.
What's more, for C++20, we'll get at least two new methods:
starts_with
ends_with
That are implemented both for std::string_view
and std::string
. As of now (July 2018) Clang 6.0 supports those functions. So you can experiment with them.
A Basic Test — substr
substr
gives probably the best advantage over the standard string substr
. It has the complexity of O(1)
and not O(n)
as with regular strings.
I've created a basic test using Quick C++ Benchmark and got the following results:
Using Clang 6.0.0, -O3, libc++
The code:
static void StringSubStr(benchmark::State& state) { std::string s = "Hello Super Extra Programming World"; for (auto _ : state) { auto oneStr = s.substr(0, 5); auto twoStr = s.substr(6, 5); auto threeStr = s.substr(12, 5); auto fourStr = s.substr(18, 11); auto fiveStr = s.substr(30, 5); // Make sure the variable is not optimized away by compiler benchmark::DoNotOptimize(oneStr); benchmark::DoNotOptimize(twoStr); benchmark::DoNotOptimize(threeStr); benchmark::DoNotOptimize(fourStr); benchmark::DoNotOptimize(fiveStr); } }
And for string_view
:
static void StringViewSubStr(benchmark::State& state) { // Code before the loop is not measured std::string s = "Hello Super Extra Programming World"; for (auto _ : state) { std::string_view sv = s; auto oneSv = sv.substr(0, 5); auto twoSv = sv.substr(6, 5); auto threeSv = sv.substr(12, 5); auto fourSv = sv.substr(18, 11); auto fiveSv = sv.substr(30, 5); benchmark::DoNotOptimize(oneSv); benchmark::DoNotOptimize(twoSv); benchmark::DoNotOptimize(threeSv); benchmark::DoNotOptimize(fourSv); benchmark::DoNotOptimize(fiveSv); } }
Here's the full experiment: @Quick C++ Bench
For this test, we have a 10x speed-up!
Can we achieve similar results in other cases?
String Split
After the basic tests we can do one more step and try to compose a more complicated algorithm: let's take string splitting.
For this experiment I've gathered code from these resources:
Here are the two versions, one for std::string
and the second one for std::string_view
:
std::vector<std::string> split(const std::string& str, const std::string& delims = " ") { std::vector<std::string> output; auto first = std::cbegin(str); while (first != std::cend(str)) { const auto second = std::find_first_of(first, std::cend(str), std::cbegin(delims), std::cend(delims)); if (first != second) output.emplace_back(first, second); if (second == std::cend(str)) break; first = std::next(second); } return output; }
Now, the string_view
version:
std::vector<std::string_view> splitSV(std::string_view strv, std::string_view delims = " ") { std::vector<std::string_view> output; size_t first = 0; while (first < strv.size()) { const auto second = strv.find_first_of(delims, first); if (first != second) output.emplace_back(strv.substr(first, second-first)); if (second == std::string_view::npos) break; first = second + 1; } return output; }
And here's the benchmark:
const std::string_view LoremIpsumStrv{ /*one paragraph of lorem ipsum */ }; static void StringSplit(benchmark::State& state) { std::string str { LoremIpsumStrv }; for (auto _ : state) { auto v = split(str); benchmark::DoNotOptimize(v); } } // Register the function as a benchmark BENCHMARK(StringSplit); static void StringViewSplit(benchmark::State& state) { for (auto _ : state) { auto v = splitSV(LoremIpsumStrv); benchmark::DoNotOptimize(v); } } BENCHMARK(StringViewSplit);
Will we get the same 10X perf speed as in the previous benchmark...? Hmmm:
This is GCC 8.1, -O3.
A bit better with Clang 6.0.0, -O3:
A slightly better result when I run it locally in MSVC 2017:
string length: 486 test iterations: 10000 string split: 36.7115 ms string_view split: 30.2734 ms
Here's the benchmark: @Quick C++ Bench
Do you have any ideas why we do not see 10X speed up as with the initial experiment?
Of course, we cannot assume 10X is realistic in this case.
First of all, we have a container — std::vector
— that the algorithm uses to output the results. The memory allocations inside std::vector
will affect the overall speed.
If we run the iteration once, and when I override operator new
I can see the following numbers (MSVC):
string length: 486 test iterations: 1 string split: 0.011448 ms, Allocation count: 15, size 6912 string_view split: 0.006316 ms, Allocation count: 12, size 2272
We have 69 words in that string, the string
version generated 15 memory allocations (both for strings and to increase the vector
space), and in total it allocated 6912 bytes.
The strng_view
version used 12 memory allocations (only for vector
as there's no need to allocate memory for string_view
) and in total it used 2272 bytes (3x less than the std::string
version).
Some Ideas to Improve
See the comment by JFT where he reimplemented the split algorithms using raw pointers rather than iterators, and he got much more performance improvements.
Another possibility is to reserve some space up-front in the vector (and later we can use shrink_to_fit
- that way we save a lot of memory allocations.
Comparing With boost::split
:
For completeness, I also run the benchmark against boost::split
(1.67), and both of our versions are much faster:
Running on WandBox, GCC 8.1
string length: 489 test iterations: 10000 string split: 42.8627 ms, Allocation count: 110000, size 82330000 string_view split: 45.6841 ms, Allocation count: 80000, size 40800000 boost split: 117.521 ms, Allocation count: 160000, size 83930000
So the hand-crafted version is almost 3x faster than the boost.split
algorithm!
String Split and Loading From a File
You might notice that my test string is just one paragraph of "lorem ipsum." Such a simple test case might cause some additional optimizations in the compiler and produce unrealistic results.
I've found a nice post from Rainer Grimm: C++17 - Avoid Copying with std::string_view - ModernesCpp.com. In the article, he used TXT files to process strings. It's a much better idea to work on some real and large text files, rather than simple strings.
Instead of my lorem ipsum paragraph, I'm just loading a file, for example, ~540kb of text (Gutenberg project). Here's a result from a test run over that file:
string length: 547412 test iterations: 100 string split: 564.215 ms, Allocation count: 191800, size 669900000 string_view split: 363.506 ms, Allocation count: 2900, size 221262300
The test is run 100 times, so for one iteration we have 191800/100 = 1918
memory allocations (in total we use 669900000/100 = 6699000 bytes
per iteration) for std::string
.
For string_view
we have only 2900/100 = 29
memory allocations and 221262300/100 = 2212623 bytes
used per iteration.
While it's still not 10x gain, we have 3x less memory used and around a 1.5x perf boost.
Sorry for a little interruption in the flow — I've prepared a little bonus if you're interested in C++17, check it out here.
Risks With Using string_view
I think that every article about string_view
should also mention the potential risks involved with this new type:
- Taking care of the (non)null-terminated strings —
string_view
may not contain NULL at the end of the string. So you have to be prepared for such a case.- Problematic when calling functions like
atoi
,printf
that accepts null-terminated strings - Conversion into strings
- Problematic when calling functions like
- References and Temporary objects —
string_view
doesn't own the memory, so you have to be very careful when working with temporary objects.- When returning
string_view
from a function - Storing
string_view
in objects or container.
- When returning
Wrap Up
By leveraging string_view
, you can achieve a lot of performance boost in many use cases. However, it's important to know that there are caveats and sometimes the perf might be even slower as compare to std::string
!
The first thing is that string_view
doesn't own the data - thus you need to be careful, so you don't end up with references to deleted memory!
The second thing is that compilers are very smart when handling strings, especially when strings are short (so they work nicely with SSO - Small String Optimization), and in that case, the perf boost might not be that visible.
A Few Questions for You
What is your experience with string_view
performance?
Can you share some results and examples?
Published at DZone with permission of Bartłomiej Filipek, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments