Way back in 1999, when the ink on the first C++ standard was still damp, Herb Sutter posed a GoTW puzzler in the still extant C++ Report (RIP): When Is a Container Not a Container? In that article, Herb described the problems of the now-infamous vector<bool>
. According to the standard’s own container requirements, vector<bool>
is not a container.
In a nutshell, it’s because vector<bool>
‘s iterators claim to be random-access, but they’re not. Random-access iterators, when you dereference them, must return a real reference. They can only do that if the thing they point to really exists somewhere. But the bool
that a vector<bool>::iterator
points to does not exist anywhere. It’s actually a bit in a packed integer, and dereferencing a vector<bool>
‘s iterator returns an object of some type that merely acts like a bool&
without actually being a bool&
.
Herb goes so far as to say this:
[…] although a proxied collection can be an important and useful tool, by definition it must violate the standard’s container requirements and therefore can never be a conforming container.
At the end of his article, Herb suggests that people stop using vector<bool>
and use std::bitset
if they want bit-packing. But that just pushes the problem around. Why shouldn’t std::bitset
be a conforming container with random-access iterators? If proxied collections are so useful, why should we content ourselves with a standard library that treats them like second-class citizens?
A Brief History of Proxy Iterators
Herb wrote his article in 1999, so we’ve been living with this problem for a long time. Many have tried to fix it and ultimately failed for one reason or another. Mostly it’s because all the solutions have tried to be backwards compatible, shoehorning a richer iterator hierarchy into a standard that doesn’t easily allow it, or else breaking iterators themselves apart into separate objects that control traversal and element access. Each time the committee has balked, preferring instead the devil it knew.
An interesting historical note: the original STL design didn’t have the “true reference” requirement that causes the problem. Take a look at the SGI docs for the Forward Iterator concept. Nowhere does it say that *it
should be a real reference. The docs for Trivial Iterators specifically mention proxy references and say they’re legit.
Recently, a who’s who of C++ luminaries put their names on N3351, the so-called Palo Alto TR, which proposes a concept-based redesign of the STL, using the syntax of Concepts Lite. Interestingly, the Palo Alto TR is a throw-back to the original SGI design: there is no “true-reference” requirement on the return type of *it
; it merely must be convertible to const ValueType<I> &
:
// This must work, according to the Palo Alto TR const ValueType<I> & val = *it;
It’s not hard for a proxy reference type to provide such a conversion. For instance, the following compiles today:
std::vector<bool> vb{true, false, true, false}; auto it = vb.begin(); const bool & val = *it;
*it
has an implicit conversion to bool
, which binds to a const bool&
. Awesome! So the problem is solved, right? Not quite.
A Panoply of Proxy Problems
To better see the problems with proxy iterators, let’s look at a more interesting example: a zip
view. When you zip two sequences together, you get a single sequence where each element is a std::pair
of elements from the two source sequences. This can be done lazily, creating pairs on demand as the zip view is iterated:
std::vector<int> v1 { 1,2,3 }; std::vector<int> v2 { 9,8,7 }; auto z = view::zip( v1, v2 ); auto it = z.begin(); assert( *it == std::make_pair(1,9) ); assert( *++it == std::make_pair(2,8) ); assert( *++it == std::make_pair(3,7) );
Since the zip view is generating the pairs on demand, they don’t exist anywhere in memory. But the elements they refer to do! See?
std::pair<int&,int&> p = *z.begin(); assert( &p.first == &v1[0] ); assert( &p.second == &v2[0] );
The zip view is a very interesting beastie. Its reference type is pair<T&,U&>
and its value type is pair<T,U>
. This poses some very interesting challenges for the iterator concepts.
1. Values and References
Recall that the Palo Alto TR requires *it
to be convertible to const ValueType<I>&
. So we should be able to do this:
auto z = view::zip( v1, v2 ); const pair<int,int>& val = *z.begin();
That works! As it so happens, there is a conversion from std::pair<T&,U&>
to std::pair<T,U>
— but there’s a catch: it only works if T
and U
are copyable! And even when they’re not, it’s clear that copying is not the behavior one would expect when using *it
to initialize a const reference. If T
or U
is expensive to copy, you’re not going to get the performance or the behavior you expect, and if it’s unique_ptr
it’s not going to compile at all.
Requiring that an iterator’s reference type be convertible to const ValueType<I>&
is over-constraining. But then what useful thing can we say about the relationship between these two types?
2. Algorithm Constraints
All the algorithm signatures in the Palo Alto TR use ValueType
in the concept checks in order to constrain the templates. For example, here’s the constrained signature of for_each
:
template<InputIterator I, Semiregular F> requires Function<F, ValueType<I>> F for_each(I first, I last, F f);
If you’re not familiar with C++ concepts, what lines 1 and 2 say is: first
and last
must satisfy the requirements of the InputIterator
concept, F
must be Semiregular
(I’ll gloss over this bit), and it must be callable with one argument of the iterator’s value type.
Now imagine code like this:
// As before, v1 and v2 are vectors of ints: auto z = view::zip( v1, v2 ); // Let Ref be the zip iterator's reference type: using Ref = decltype(*z.begin()); // Use for_each to increment all the ints: for_each( z.begin(), z.end(), [](Ref r) { ++r.first; ++r.second; });
This seems perfectly reasonable. The lambda accepts an object of the zip view’s reference type, which is a pair<int&,int&>
, and then it increments both first and second members. But this doesn’t type-check. Why?
Remember the concept check: Function<F, ValueType<I>>
. The function we pass to for_each
must be callable with an object of the iterator’s value type. In this case, the value type is pair<int,int>
. There is no conversion from that to the type the function expects, which is pair<int&,int&>
. Bummer.
If we change the lambda to take a pair<int,int>&
, then the concept check passes, but the template will fail to instantiate correctly. It’s easy to see why when you look at a typical for_each
implementation:
template<InputIterator I, Semiregular F> requires Function<F, ValueType<I>> F for_each(I first, I last, F f) { for(; first != last; ++first) f(*first); return f; }
The lambda is called with *first
which has type pair<int&,int&>
, but that doesn’t convert to pair<int,int>&
. Gah!!!
The most galling bit is that the code we wrote above — the code with the lambda that takes the reference type — works just fine if we simply delete the requires Function<F, ValueType<I>>
constraint. Clearly something is wrong with the constraints, the concepts, or our expectations.
I should add that the problem is not specific to the zip
view. Any sequence with a proxy reference type has this problem, vector<bool>
included. If we just slap these constraints on the existing algorithms, some code that works today will break, and the only “fix” would be to stop using the standard algorithms.
3. Permutability of Move-Only Types
Unfortunately, the problems don’t end there. The sort
algorithm requires a sequence to be permutable; that is, you should be able to shuffle its elements around. And since it should support move-only types, that means that the sequence’s iterators should be indirectly-movable. The Palo Alto TR has this to say about it:
The
IndirectlyMovable
andIndirectlyCopyable
concepts describe copy and move relationships between the values of an input iterator,I
, and an output iteratorOut
. For an output iteratorout
and an input iteratorin
, their syntactic requirements expand to:
IndirectlyMovable
requires*out = move(*in)
But what if *in
returns a proxy? Then move(*in)
is moving the proxy, not the object to which the proxy refers. In the case of sorting a zip view, we’re trying to move a (temporary) pair<T&,U&>
into a pair<T&,U&>
. As with issue (1), that won’t work at all for move-only types. But you would probably fail before that, at the sort
requires clause, because of issue (2). Sheesh!
Summary, For Now…
Even though the Palo Alto TR lifts the over-constraining requirement that ForwardIterator
s return real references, the proxy iterator problem remains. On the one hand, it says that proxy iterators are OK. On the other hand, some interesting proxy iterators fail to model the Iterator
concept or satisfy the algorithm constraints, and those that do don’t have the right semantics or performance characteristics. What are our options?
- The
zip
view,vector<bool>
, and its ilk are useful, but are not legitimate containers and ranges, and the STL can’t support them, full stop; or - The iterator concepts (and probably the algorithm constraints) as specified in the Palo Alto TR need to be tweaked somehow to support proxy iterators, and some algorithm implementations probably need to change, too; or
- The language needs to change to better support proxy references (an idea from Sean Parent); or
- Something else.
I really don’t like option (1); there are too many interesting forward iterators that can’t return true references, and I’m tired of doing without. I have some rudimentary ideas about option (2) that I plan to describe in my next post. Option (3) can’t be ruled out, but IANALL (I Am Not A Language Lawyer) and have no idea what would be involved. It’s clear that with C++17 shaping up, and with the Concepts Lite TR finally reaching PDTS status <fanfare>, and a range-ified, concept-ified STL in the works, the time to start making decisions about this stuff is now.