Channel: Eric Niebler
Container Algorithms

The recent meeting of the C++ Standardization Committee in Urbana-Champaign was a watershed moment for my work on ranges. Ultimately, my presentation was well received (Herb Sutter used the phrase “palpable excitement” to describe the feeling in the room), but it wasn’t at all certain that things would go that way, and in fact an eleventh-hour addition pushed the proposal over the top: container algorithms.

Ranges, as of N4128

The existing algorithms in the C++ standard library operate eagerly. After std::transform returns, for instance, you can be sure that all the transform-y stuff is done. Some algorithms are also mutating. Once std::sort returns, the data has been sorted in place.

Not so with the range views that N4128 proposes. These are like lazily-evaluated, non-mutating algorithms that present custom views of data stored elsewhere. For instance, when you say:

std::vector<int> ints{1,2,3,4};
auto squared = ints
    | view::transform([](int i){return i*i;});

… not one iota of transforming has happened. You have just created a view that, when it’s iterated, does transformation on-the-fly, without mutating the underlying sequence.
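
To make the laziness concrete without depending on any range library, here is a minimal hand-rolled sketch. The names transform_view_sketch and make_transform_view are hypothetical and this is not how range-v3 implements view::transform; it only shows that the function runs when the view is iterated, not when it is created.

```cpp
#include <vector>

// Hypothetical, minimal stand-in for a lazy transform view.
// It stores a pointer to the data and a copy of the function, nothing else.
template <class T, class F>
class transform_view_sketch {
    const std::vector<T>* data_;
    F fn_;

public:
    transform_view_sketch(const std::vector<T>& data, F fn)
        : data_(&data), fn_(fn) {}

    class iterator {
        const T* pos_;
        const F* fn_;

    public:
        iterator(const T* pos, const F* fn) : pos_(pos), fn_(fn) {}
        // The function is applied here, on dereference, and nowhere else.
        auto operator*() const { return (*fn_)(*pos_); }
        iterator& operator++() { ++pos_; return *this; }
        bool operator!=(const iterator& other) const { return pos_ != other.pos_; }
    };

    iterator begin() const { return iterator(data_->data(), &fn_); }
    iterator end() const { return iterator(data_->data() + data_->size(), &fn_); }
};

// Helper so the template arguments can be deduced (hypothetical name).
template <class T, class F>
transform_view_sketch<T, F> make_transform_view(const std::vector<T>& data, F fn) {
    return transform_view_sketch<T, F>(data, fn);
}
```

Creating the view does no work; a range-based for loop over it calls the function once per element and leaves the underlying vector untouched.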

The algorithms and the views differ in another important way: the views easily compose — filter a transformed slice? No problem! — but the algorithms don’t. Doing that sort of thing with the algorithms requires fiddling about with iterators and named temporaries, and takes several lines of chatty code.
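
For comparison, here is roughly what "filter a transformed slice" looks like with only the iterator-based standard algorithms. The function name even_squares_of_prefix and the specific transform and filter are made up for illustration; the point is the named temporaries and the iterator bookkeeping.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Squares the first `count` elements, then keeps only the even squares.
// Each stage needs its own named temporary.
std::vector<int> even_squares_of_prefix(const std::vector<int>& input,
                                        std::size_t count) {
    const auto n = static_cast<std::ptrdiff_t>(std::min(count, input.size()));

    // Stage 1: slice and transform into a temporary.
    std::vector<int> squared;
    squared.reserve(static_cast<std::size_t>(n));
    std::transform(input.begin(), input.begin() + n,
                   std::back_inserter(squared),
                   [](int i) { return i * i; });

    // Stage 2: filter into a second temporary.
    std::vector<int> result;
    std::copy_if(squared.begin(), squared.end(), std::back_inserter(result),
                 [](int i) { return i % 2 == 0; });
    return result;
}
```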

The Missing Piece

So to sum up, in the world of N4128, we have this:

  1. Eager algorithms that can mutate but that don’t compose.
  2. Lazy algorithms that can’t mutate but do compose.
  3. ??!!!!

Whoops! Something is missing. If I want to read a bunch of ints, sort them, and make them unique, here’s what that would look like in N4128:

extern std::vector<int> read_ints();
std::vector<int> ints = read_ints();
std::sort(ints);
auto i = std::unique(ints);
ints.erase(i, ints.end());

Blech! A few people noticed this shortcoming of my proposal. A week before the meeting, I was seriously worried that this issue would derail the whole effort. I needed a solution, and quick.

Container Algorithms

The solution I presented in Urbana is container algorithms. These are composable algorithms that operate eagerly on container-like things, mutating them in-place, then forwarding them on for further processing. For instance, the read+sort+unique example looks like this with container algorithms:

std::vector<int> ints =
    read_ints() | cont::sort | cont::unique;

Much nicer. Since the container algorithm executes eagerly, it can take a vector and return a vector. The range views can’t do that.
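
The following is a minimal sketch of how such a pipeline could be wired up with nothing but the standard library. Everything in namespace sketch (sort_fn, unique_fn, and the operator|) is a hypothetical stand-in, not range-v3's actual implementation. This version takes the container by value, so a temporary is moved through each stage.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical names, not range-v3's

struct sort_fn {
    template <class C>
    C operator()(C c) const {  // takes the container by value
        std::sort(c.begin(), c.end());
        return c;              // moved out to the next stage
    }
};

struct unique_fn {
    template <class C>
    C operator()(C c) const {
        // std::unique only shuffles; erase actually drops the duplicates.
        c.erase(std::unique(c.begin(), c.end()), c.end());
        return c;
    }
};

constexpr sort_fn sort{};
constexpr unique_fn unique{};

// container | algorithm: hand the container to the algorithm, return its result.
template <class C, class Fn>
auto operator|(C c, const Fn& fn) -> decltype(fn(std::move(c))) {
    return fn(std::move(c));
}

}  // namespace sketch
```

With this in place, std::vector<int>{3, 1, 2, 3, 1} | sketch::sort | sketch::unique evaluates left to right and yields a sorted vector with the duplicates removed.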

A Moving Example

Move semantics makes all of this work smoothly. A temporary container is moved into a chain of mutating container algorithms, where it is munged and moved out, ready to be slurped up by the next container algorithm. (Naturally, performance would suffer if container algorithms were used with a container that wasn’t efficiently movable, like a big std::array. Don’t do that.)

Since container algorithms accept and return containers by value, I worried that people might do this and be surprised by the result:

std::vector<int> v{/*...*/};
// Oops, this doesn't sort v:
v | cont::sort;

The author of this code might expect this to sort v. Instead, v would be copied, the copy would be sorted, and then the result would be ignored.

Also, there’s a potential performance bug in code like the following if we allow people to pass lvalues to container algorithms:

// Oops, this isn't very efficient:
std::vector<BigObject> bigvec{/*...*/};
bigvec = bigvec | cont::sort | cont::unique;

bigvec is copied when it’s passed to cont::sort by value. That’s bad! The alternative would be to have container algorithms do perfect forwarding — in which case what gets returned is a reference to bigvec. That then gets assigned back to bigvec! Assigning a container to itself is … weird. It’s guaranteed to work, but it’s not guaranteed to be efficient. An interface that makes it easy to make this mistake is a bad interface.

Instead, in my current thinking, the above code should fail to compile. The container algorithms require rvalue containers; you should move or copy a container into the chain. With range-v3, that looks like this:

using namespace ranges;
bigvec = std::move(bigvec) | cont::sort | cont::unique;

That fixes the performance problem, and also makes it pretty obvious that you ignore the return value of move(v) | cont::sort at your own peril.
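
One way the "rvalues only" rule could be enforced is with a constrained forwarding reference. The sketch below assumes a hypothetical sketch::sort_fn and a detection trait is_pipeable that exist only for this illustration; range-v3's real mechanism may well differ.

```cpp
#include <algorithm>
#include <type_traits>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical names, not range-v3's

struct sort_fn {
    template <class C>
    C operator()(C c) const {
        std::sort(c.begin(), c.end());
        return c;
    }
};
constexpr sort_fn sort{};

// C&& is a forwarding reference: for an lvalue argument C deduces to an
// lvalue reference type, and the enable_if then removes this overload.
template <class C, class Fn,
          class = std::enable_if_t<!std::is_lvalue_reference<C>::value>>
auto operator|(C&& c, const Fn& fn) -> decltype(fn(std::move(c))) {
    return fn(std::move(c));
}

// Detection trait so the constraint can be checked without a hard error.
template <class C, class Fn, class = void>
struct is_pipeable : std::false_type {};

template <class C, class Fn>
struct is_pipeable<C, Fn,
                   decltype(void(std::declval<C>() | std::declval<const Fn&>()))>
    : std::true_type {};

// Rvalue containers are accepted; lvalue containers are rejected.
static_assert(is_pipeable<std::vector<int>, sort_fn>::value,
              "temporaries and moved-from lvalues can enter the chain");
static_assert(!is_pipeable<std::vector<int>&, sort_fn>::value,
              "plain lvalues cannot");

}  // namespace sketch
```

With this constraint, v | sketch::sort does not compile for an lvalue v, while std::move(v) | sketch::sort and a copy such as std::vector<int>(v) | sketch::sort both do.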

I also offer this short form to apply a chain of mutating operations to a container:

bigvec |= cont::sort | cont::unique;

If you’re not a fan of the pipe syntax, this works too:

cont::unique(cont::sort(bigvec));

Both of these syntaxes will refuse to operate on temporary containers.

What is a Container?

Consider this line of code from above, which applies a chain of mutating operations to a container:

bigvec |= cont::sort | cont::unique;

How is this implemented? One simple answer is to make it a synonym for the following:

bigvec = std::move(bigvec) | cont::sort | cont::unique;

But not all containers are efficiently movable, so this would be needlessly inefficient. Instead, what gets passed around is a reference-wrapped container. Essentially, it’s implemented like this:

std::ref(bigvec) | cont::sort | cont::unique;

But cont::sort and cont::unique are container algorithms. Is a reference-wrapped container also a container, then? Un-possible!

Containers own their elements and copy them when the container is copied. A reference-wrapped container doesn’t have those semantics. It’s a range: an Iterable object that refers to elements stored elsewhere. But ref(v) | cont::sort sure seems like a reasonable thing to do.

In other words, container algorithms are misnamed! They work just fine when they are passed ranges, so long as the range provides the right operations. cont::sort needs an Iterable with elements it can permute, and that’s it. It doesn’t care at all who owns the elements.
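
To make the reference-wrapping idea concrete, here is a simplified sketch of how |= might hand a std::ref of the container to a chain. Since | binds tighter than |=, the right-hand side is composed into a single algorithm first. All names (action_base, composed, sort_fn, unique_fn) are hypothetical, and for brevity these algorithms only accept reference-wrapped containers and return nothing, which is a simplification of the design described above.

```cpp
#include <algorithm>
#include <functional>
#include <type_traits>
#include <vector>

namespace sketch {  // hypothetical names, not range-v3's

// Marker base so the operators below only apply to these algorithm objects.
struct action_base {};

struct sort_fn : action_base {
    template <class C>
    void operator()(std::reference_wrapper<C> r) const {
        std::sort(r.get().begin(), r.get().end());
    }
};

struct unique_fn : action_base {
    template <class C>
    void operator()(std::reference_wrapper<C> r) const {
        C& c = r.get();
        c.erase(std::unique(c.begin(), c.end()), c.end());
    }
};

constexpr sort_fn sort{};
constexpr unique_fn unique{};

// Two algorithms piped together form a new one that runs them in order.
template <class First, class Second>
struct composed : action_base {
    First first;
    Second second;
    composed(First f, Second s) : first(f), second(s) {}

    template <class C>
    void operator()(std::reference_wrapper<C> r) const {
        first(r);
        second(r);
    }
};

template <class A, class B,
          class = std::enable_if_t<std::is_base_of<action_base, A>::value &&
                                   std::is_base_of<action_base, B>::value>>
composed<A, B> operator|(A a, B b) {
    return composed<A, B>(a, b);
}

// container |= chain: wrap the container in std::ref and run the chain on it.
// C& cannot bind to a temporary, so rvalue containers are rejected.
template <class C, class A,
          class = std::enable_if_t<std::is_base_of<action_base, A>::value>>
C& operator|=(C& c, const A& action) {
    action(std::ref(c));
    return c;
}

}  // namespace sketch
```

Nothing is copied or moved here: the chain sorts and de-duplicates the original container through the reference wrapper.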

cont::unique is also indifferent to element ownership, so long as it has a way to remove the non-unique elements. Rather than relying on an erase member function to do the erasing, we can define erase as a customization point — a free function — that any Iterable type can overload. With the appropriate overload of erase for reference-wrapped containers, std::ref(v) | cont::unique will Just Work.
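
A minimal sketch of that customization point follows. The free function erase, the helper elements_of, and unique_fn are all hypothetical names invented for this example, and the real range-v3 design may differ. The algorithm calls erase unqualified, so overload resolution picks the reference_wrapper overload when it is the more specialized match.

```cpp
#include <algorithm>
#include <functional>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical names, not range-v3's

// Default customization point: forward to the member function.
template <class C, class It>
void erase(C& c, It first, It last) {
    c.erase(first, last);
}

// Overload for reference-wrapped containers: erase from the referent.
template <class C, class It>
void erase(std::reference_wrapper<C>& r, It first, It last) {
    r.get().erase(first, last);
}

// Helper to reach the elements whether or not the range owns them.
template <class C>
C& elements_of(C& c) { return c; }

template <class C>
C& elements_of(std::reference_wrapper<C>& r) { return r.get(); }

struct unique_fn {
    template <class Rng>
    Rng operator()(Rng rng) const {
        auto& elems = elements_of(rng);
        auto new_end = std::unique(elems.begin(), elems.end());
        erase(rng, new_end, elems.end());  // unqualified: the customization point
        return rng;
    }
};
constexpr unique_fn unique{};

template <class Rng, class Fn>
auto operator|(Rng rng, const Fn& fn) -> decltype(fn(std::move(rng))) {
    return fn(std::move(rng));
}

}  // namespace sketch
```

The same unique_fn then serves both std::vector<int>{...} | sketch::unique, which owns its elements, and std::ref(v) | sketch::unique, which edits v in place.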

The interesting (to me, at least) result of this is that containers are not interesting. Instead, we get much farther with refinements of the Iterable concept that add specific behaviors, like EraseableIterable. The container algorithms accept any Iterable that offers the right set of behaviors. They don’t care one whit who owns the elements.

Summary

Over the past month, I’ve been adding a full suite of container algorithms to my range-v3 library for things like sorting, removing elements, slicing, inserting, and more. These are eager algorithms that compose. I call them “container algorithms” since “eager, composable algorithms” doesn’t roll off the tongue, but they are perfectly happy working with ranges. If you want to send a non-owning slice view to cont::sort, knock yourself out.

Container algorithms fill a gaping hole in N4128. They went a long, long way to appeasing many of the committee members who dearly want ranges to solve the usability problems with the current standard algorithms. I can only assume that, had I left container algorithms out of my presentation, the reception in Urbana would have been a few degrees colder.

Acknowledgements

The design of container algorithms presented here benefited tremendously from feedback from Sean Parent.

UPDATE:

I’ve heard you! “Container algorithm” is a confusing name. They aren’t restricted to containers, and that isn’t the interesting bit anyway. The interesting bit is that they are eager, mutating, composable algorithms. There is no one pithy word that conveys all of that (AFAICT), but so far “action” has come closest. So we now have view::transform (lazy, non-mutating) and action::transform (eager, mutating). Not perfect, but better, certainly.

