How to Check String or String View Prefixes and Suffixes in C++20
In this article, see how to check string or string view prefixes and suffixes in C++20.
Up to (and including) C++17, if you wanted to check the start or the end of a string, you had to use custom solutions, Boost, or other third-party libraries. Fortunately, this changes with C++20.
In this article, I'll show you the new functionality and discuss a couple of examples.
This article was originally published at bfilipek.com.
Intro
Here's the main proposal that was added into C++20:
std::string/std::string_view .starts_with() and .ends_with() P0457
In the new C++ Standard, we'll get the following member functions for std::string and std::string_view:
constexpr bool starts_with(string_view sv) const noexcept;
constexpr bool starts_with(CharT c) const noexcept;
constexpr bool starts_with(const CharT* s) const;
And also for suffix checking:
constexpr bool ends_with(string_view sv) const noexcept;
constexpr bool ends_with(CharT c) const noexcept;
constexpr bool ends_with(const CharT* s) const;
As you can see, there are three overloads: for a string_view, a single character, and a pointer to a null-terminated string (such as a string literal).
Simple example:
const std::string url { "https://isocpp.org" };

// string literals
if (url.starts_with("https") && url.ends_with(".org"))
    std::cout << "you're using the correct site!\n";

// a single char:
if (url.starts_with('h') && url.ends_with('g'))
    std::cout << "letters matched!\n";
You can play with this basic example @Wandbox
Token Processing Example
Below, you can find an example which takes a set of HTML tokens and extracts only the text that would be rendered on that page. It skips the HTML tags and leaves only the content and also tries to preserve the line endings.
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

int main() {
    const std::vector<std::string> tokens {
        "<header>",
        "<h1>",
        "Hello World",
        "</h1>",
        "<p>",
        "This is my super cool new web site.",
        "</p>",
        "<p>",
        "Have a look and try!",
        "</p>",
        "</header>"
    };

    // Turn closing heading/paragraph tags into line breaks.
    const auto convertToEol = [](const std::string& s) {
        if (s.starts_with("</h") || s.starts_with("</p"))
            return std::string("\n");
        return s;
    };

    std::vector<std::string> tokensTemp;
    std::transform(tokens.cbegin(), tokens.cend(),
                   std::back_inserter(tokensTemp),
                   convertToEol);

    // Anything of the form <...> counts as an HTML tag.
    const auto isHtmlToken = [](const std::string& s) {
        return s.starts_with('<') && s.ends_with('>');
    };

    std::erase_if(tokensTemp, isHtmlToken); // cpp20!

    for (const auto& str : tokensTemp)
        std::cout << str;

    return 0;
}
You can play with the code at @Wandbox
The most interesting parts:
- there's a lambda convertToEol which takes a string and then returns the same string or converts it to EOL if it detects a closing HTML tag.
  - the lambda is then used in the std::transform call that converts the initial set of tokens into the temporary version.
- later, the HTML tokens are removed from the temporary vector by using another predicate lambda. This time we have a simple test for an HTML token.
- you can also see the use of std::erase_if, which works nicely on our vector; this functionality is also new in C++20. There's no need to use the remove/erase pattern.
- at the end, we can display the final tokens that are left.
Prefix and a (Sorted) Container
Let's try another use case. For example, if you have a container of strings, then you might want to search for all elements that start with a prefix.
A simple example with unsorted vector:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

int main() {
    const std::vector<std::string> names { "Edith", "Soraya", "Nenita",
        "Lanny", "Marina", "Clarine", "Cinda", "Mike", "Valentin",
        "Sylvester", "Lois", "Yoshie", "Trinidad", "Wilton", "Horace",
        "Willie", "Aleshia", "Erminia", "Maybelle", "Brittany", "Breanne",
        "Kerri", "Dakota", "Roseanna", "Edra", "Estell", "Fabian",
        "Arlen", "Madeleine", "Genia" }; // listofrandomnames.com

    const std::string_view prefix { "M" };

    // Immediately invoked lambda, so that foundNames can be const.
    const std::vector<std::string> foundNames = [&names, &prefix]{
        std::vector<std::string> tmp;
        std::copy_if(names.begin(), names.end(),
            std::back_inserter(tmp), [&prefix](const std::string& str){
                return str.starts_with(prefix);
            });
        return tmp;
    }();

    std::cout << "Names starting with \"" << prefix << "\":\n";
    for (const auto& str : foundNames)
        std::cout << str << ", ";
}
Play with code @Wandbox
In the sample code, I'm computing the foundNames vector, which contains the entries from names that start with a given prefix. The code uses copy_if with a predicate that leverages the starts_with() function.
On the other hand, if you want better complexity for this kind of query, it might be wiser to store those strings (or string views) in a sorted container. This is the case when you use a std::map or std::set, or when you sort your container. Then, we can use lower_bound to quickly (in logarithmic time) find the first element that could match the prefix and then perform a linear scan over the neighbouring elements.
#include <iostream>
#include <set>
#include <string>
#include <vector>

int main() {
    const std::set<std::string> names { "Edith", "Soraya", "Nenita",
        "Lanny", "Marina", "Clarine", "Cinda", "Mike", "Valentin",
        "Sylvester", "Lois", "Yoshie", "Trinidad", "Wilton", "Horace",
        "Willie", "Aleshia", "Erminia", "Maybelle", "Brittany", "Breanne",
        "Kerri", "Dakota", "Roseanna", "Edra", "Estell", "Fabian",
        "Arlen", "Madeleine", "Genia", "Mile", "Ala", "Edd" };
    // listofrandomnames.com

    const std::string prefix { "Ed" };

    // First element that is not less than the prefix (logarithmic search).
    const auto startIt = names.lower_bound(prefix);

    const std::vector<std::string> foundNames = [&names, &startIt, &prefix]{
        std::vector<std::string> tmp;
        // Matching elements are adjacent, so stop at the first mismatch.
        for (auto it = startIt; it != names.end(); ++it)
            if (it->starts_with(prefix))
                tmp.emplace_back(*it);
            else
                break;

        return tmp;
    }();

    std::cout << "Names starting with \"" << prefix << "\":\n";
    for (const auto& str : foundNames)
        std::cout << str << ", ";
}
Play with the code @Wandbox
As a side note, you might also try a different approach, which should be even faster. Rather than checking elements one by one starting from the lower-bound iterator, we can modify the last letter of the pattern so that it comes "later" in the sort order, and then call lower_bound with that modified pattern as well. The two iterators then delimit the matching range, and finding it takes just two log(n) searches. I'll leave that experiment for you as "homework".
Case (in)Sensitivity
All the examples that I've shown so far used regular std::string objects, and thus we could only compare strings case-sensitively. But what if you want to compare them case-insensitively?
For example, Boost provides separate functions for this, such as istarts_with and iends_with in its String Algorithms library.
In Qt, similar functions take an additional argument that selects the comparison technique (see QString::startsWith in the QString class documentation).
In the Standard Library, we can go another way and write a custom character trait for the string type.
As you may recall, std::string is just a specialisation of the following template:
template<class charT,
         class traits = char_traits<charT>,
         class Allocator = allocator<charT>>
class basic_string;
The traits class is used for all core operations that you can perform on characters. You can implement a trait that compares strings case-insensitively.
You can find the examples in the following websites:
After implementing the trait, you'll end up with a string type that is different from std::string:
using istring = std::basic_string<char, case_insensitive_trait>;
// assuming case_insensitive_trait is a proper char trait
Is that a limitation? For example, you won't be able to easily copy from std::string into your new istring. For some designs, it might be fine, but on the other hand, it can also be handy to have just a simple runtime parameter or a separate function that checks case-insensitively. What's your opinion on that?
Another option is to "normalise" the string and the pattern, for example, by making both lowercase. This approach, unfortunately, requires creating extra copies of the strings, so it might not be the best.
Compiler Support
Most recent compilers already support the new functionality!
Summary
In this article, you've seen how to leverage new functionality that we get with C++20: string prefix and suffix checking member functions.
You've seen a few examples, and we also discussed options if you want your comparisons to be case insensitive.
And you can read about other techniques of prefix and suffix checking in:
- How to Check If a String Is a Prefix of Another One in C++ - Fluent C++
- C++ : Check if a String starts with an another given String – thispointer.com
More from the Author:
Bartek recently published a book - "C++17 In Detail"- learn the new C++ Standard in an efficient and practical way. The book contains more than 360 pages filled with C++17 content!
Published at DZone with permission of Bartłomiej Filipek, DZone MVB. See the original article here.