Channel: Eric Niebler

A Slice of Python in C++


This post describes a fun piece of hackery that went into my Range-v3 library recently: a Python-like range slicing facility with cute, short syntax. It’s nothing earth-shattering in terms of functionality, but it makes a fun little case study in library design, and it nicely illustrates my philosophy on the subject.

Python Slicing

In Python, you can slice a container — that is, create a view of a contiguous subrange — using a very concise syntax. For instance:

>>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> letters
['a', 'b', 'c', 'd', 'e', 'f', 'g']
>>> # access a subrange with a slice operation
>>> letters[2:5]
['c', 'd', 'e']
>>> # replace some values
>>> letters[2:5] = ['C', 'D', 'E']
>>> letters
['a', 'b', 'C', 'D', 'E', 'f', 'g']

On line 5, we access elements of the list letters in the half-open sequence [2,5) using the syntax letters[2:5]. Short and sweet. On line 8, we assign through the slice, which mutates the underlying letters list. That proves that Python slices have reference semantics.

That’s not all that the Python slice operator can do. You can leave off slice offsets, in which case Python takes a smart default:

>>> # A missing first offset means "from the beginning"
>>> letters[:5]
['a', 'b', 'C', 'D', 'E']
>>> # A missing end offset means "to the end"
>>> letters[5:]
['f', 'g']

You can even slice from the end with negative offsets:

>>> # Take the last two elements:
>>> letters[-2:]
['f', 'g']

This is all pretty handy and really cool.

Old-Style Slicing in C++ with Range-v3

My range-v3 library has had a slice operation for a long time now, but it wasn’t as powerful and the syntax wasn’t as cool:

using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << (letters | view::slice(2,5)) << '\n';
// prints: {c,d,e}

In the above code, view::iota is a view that generates all the characters from 'a' to 'g' (inclusive), and view::slice is a view of the elements from offset 2 through 5 (exclusive). As with Python’s slice, this slice is lightweight and non-owning.

This syntax is not terrible per se, but it’s certainly not as fun as Python’s. And view::slice didn’t accept negative offsets to slice from the end, so it wasn’t as powerful, either.

New-Style Slicing in C++ with Range-v3

First, I wanted to find a nice short form for creating slices, so I took a page from the array_view proposal, which has a really, really clever syntax for indexing into a multi-dimensional array. Here’s an example lifted straight from the proposal:

char a[3][1][4] {{{'H', 'i'}}};
auto av = array_view<char, 3>{a};
// the following assertions hold:
assert((av.bounds() == bounds<3>{3, 1, 4}));
assert((av[{0, 0, 0}] == 'H'));

Lines 1-2 declare a 3-D array of characters and then create a 3-D view of it. Line 5 is where the magic happens. It accesses the element at the (0,0,0) position with the slightly alien-looking av[{0,0,0}] syntax. What the heck is this?!

It’s really very simple: a novel use of uniform initialization syntax. Consider this type:

struct indices
{
    std::size_t i, j, k;
};
struct my_array_view
{
    double & operator[](indices x);
};

Now I can index into a my_array_view object with the av[{0,0,0}] syntax. Neat-o!
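To make the trick concrete, here is a minimal, self-contained sketch. The 2 x 3 x 4 extents, the flat `data` pointer, and the `write_then_read` helper are assumptions added for illustration; they are not part of the array_view proposal.

```cpp
#include <cassert>
#include <cstddef>

// Aggregate holding three indices; the braces at the call site build one of these.
struct indices
{
    std::size_t i, j, k;
};

// Hypothetical view over a flat buffer with fixed extents 2 x 3 x 4.
struct my_array_view
{
    double* data;

    double& operator[](indices x)
    {
        // Row-major offset for extents 2 x 3 x 4.
        return data[(x.i * 3 + x.j) * 4 + x.k];
    }
};

// Writes through the view, then reads the same slot from the flat buffer.
inline double write_then_read()
{
    double storage[2 * 3 * 4] = {};
    my_array_view av{storage};
    av[{1, 2, 3}] = 42.0; // {1, 2, 3} copy-list-initializes an `indices`
    return storage[(1 * 3 + 2) * 4 + 3];
}
```

The braced list is not a new kind of expression; it is just an initializer for the `indices` parameter of `operator[]`.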

I realized I could use this trick to give people a super-short and cute syntax for slicing ranges. Check it out:

using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << letters[{2,5}] << '\n';
// prints: {c,d,e}

Hey, that’s not half bad!

Slicing From the End, A Dilemma

But that’s not enough. I want the handy slice-from-the-end functionality. But here’s where things get a bit … interesting … from a library design perspective. Not all range types support slicing from the end. To see what I mean, consider a range of ints read from an istream. This is an input range. You don’t know the end until you reach it, which means that you don’t know the last-minus-N element until you’re N elements past it!

In other words, the following code makes no sense:

using namespace ranges;
// An input range of ints read from cin
auto ints = istream<int>(std::cin);
// I'm sorry, I can't do that, Dave:
std::cout << ints[{0,-2}] << '\n';

The istream range returned by istream totally knows at compile time that it can’t be sliced from the end. But whether the offsets are negative or positive is a runtime property, so it can’t be checked at compile time. That would make this a runtime failure. Ugh.

To make matters worse, the rules about what categories of ranges accept negative offsets are surprisingly subtle. Consider this variation of the code above:

using namespace ranges;
// Take the first 10 ints read from cin:
auto ints = istream<int>(std::cin) | view::take(10);
// This should work! It should take the first 8 ints:
std::cout << ints[{0,-2}] << '\n';

In this case, we’ve taken the first 10 integers from an istream. The ints range is still an input range, but it’s a sized input range. Now we can slice from the end because we know where the end is.

And if we have a forward range, we can always slice from the end, even if we don’t know where that is (e.g. a null-terminated string), by computing the length of the sequence and then advancing distance-minus-N from the front (although that’s not always the most efficient way to do it).
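As a rough sketch of that forward-range strategy, the hypothetical helper below (`slice_from_end` is an illustrative name, not a range-v3 function) measures the sequence once and then walks forward from the front. It copies the elements into a vector to stay self-contained, where a real view would hold iterators instead.

```cpp
#include <cassert>
#include <cstddef>
#include <forward_list>
#include <iterator>
#include <vector>

// Copies the elements [from, end - back) of a forward range into a vector.
// Hypothetical helper for illustration only.
template <typename ForwardRange>
std::vector<typename ForwardRange::value_type>
slice_from_end(const ForwardRange& rng, std::ptrdiff_t from, std::ptrdiff_t back)
{
    auto first = std::begin(rng);
    auto last = std::end(rng);

    // For forward iterators this is an O(N) walk to find the length.
    const std::ptrdiff_t n = std::distance(first, last);
    assert(from >= 0 && back >= 0 && from + back <= n);

    // Advance from the front: `from` steps to the start of the slice,
    // then (n - back - from) more steps to its end.
    auto slice_first = std::next(first, from);
    auto slice_last = std::next(slice_first, n - back - from);
    return std::vector<typename ForwardRange::value_type>(slice_first, slice_last);
}
```

Called as `slice_from_end(letters, 2, 2)` on a `std::forward_list<char>` holding 'a' through 'g', this yields c, d, e. The cost is one full pass to measure the length plus a partial pass to reach the slice, which is why it is not always the most efficient approach.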

And you should never specify a negative offset if the range is infinite. Never, ever, ever.

It gets even more subtle still: if both offsets are negative, or if both offsets are non-negative, then the resulting slice knows its size in O(1); otherwise, it only knows its size if the underlying range knows its size. When the O(1)-sized-ness of a range is part of the type system, it enables all sorts of optimizations. If we don’t know the sign of the offsets until runtime, we can’t ever return a type that advertises itself as sized.
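The arithmetic behind that observation can be sketched as three small functions. The names are hypothetical, and each function assumes the underlying range is long enough for the requested offsets. Only the mixed case needs the length of the range.

```cpp
#include <cassert>

// Both offsets count from the front: [from, to) has to - from elements,
// with no need to look at the range.
constexpr int size_both_front(int from, int to)
{
    return to - from;
}

// Both offsets count back from the end: [end - from_back, end - to_back)
// has from_back - to_back elements, again independent of the range.
constexpr int size_both_back(int from_back, int to_back)
{
    return from_back - to_back;
}

// Mixed: [from, end - to_back) depends on the length of the range,
// so the slice is only O(1)-sized if the underlying range is.
constexpr int size_mixed(int from, int to_back, int range_size)
{
    return range_size - to_back - from;
}
```

For a seven-element range, `size_both_front(2, 5)`, `size_both_back(5, 2)`, and `size_mixed(2, 2, 7)` all describe the same three elements, but only the last one has to ask the range how long it is.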

My point is that the rules for when it’s OK to slice from the end are subtle — far too subtle to leave the error reporting until runtime. And doing so leaves valuable optimizations on the floor.

Slicing From the End, A Solution

The solution I came up with was to disallow negative offsets with an unconditional assert. But wait before you flame me! I added an alternate syntax for denoting an offset from the end. Check it out:

using namespace ranges;
auto letters = view::iota('a','g');
std::cout << letters << '\n';
// prints: {a,b,c,d,e,f,g}
std::cout << letters[{2,end-2}] << '\n';
// prints: {c,d,e}

Instead of using a negative offset, we say end-2 to mean the 2nd from the end. What is end here? It’s the same end function that you call to get the end of an Iterable (think std::end), only in my library it’s not a function; it’s a function object. (For more about why I chose to make begin and end global function objects instead of free functions, see my blog post about Customization Point Design.) Since end is an object, I can define an overloaded operator- that takes end on the left-hand-side and an int on the right. That can return an object of some type that makes the from-the-end-ness of the offset a part of the type system.

struct from_end { int i; };

from_end operator-( decltype(ranges::end), int i )
{
    assert(i >= 0); // No funny business, please
    return {i};
}

Now I can define an overloaded operator[] on my range type that accepts a std::pair<int,from_end>:

struct my_range
{
    // callable as rng[{2,end-2}]
    slice_view<my_range>
    operator[](std::pair<int, from_end> p)
    {
        // ... slicing happens here
    }
};

Voilà! Now I get slicing from the end with a short, readable syntax and compile-time type checking without leaving any optimization opportunities on the floor.
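Putting the pieces together, here is a self-contained sketch that compiles without range-v3. `end_fn` and `end_marker` are hypothetical stand-ins for the library's `end` function object, and this toy `my_range` returns a copied `std::string` where the library would return a lightweight view.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Stand-in for the library's `end` function object (hypothetical; not range-v3).
struct end_fn {};
constexpr end_fn end_marker{};

// An offset measured back from the end, encoded in the type.
struct from_end { int i; };

inline from_end operator-(end_fn, int i)
{
    assert(i >= 0); // No funny business, please
    return from_end{i};
}

// Toy range over a string.
struct my_range
{
    std::string chars;

    // callable as rng[{2, end_marker - 2}]
    std::string operator[](std::pair<int, from_end> p) const
    {
        const int size = static_cast<int>(chars.size());
        // The front offset must be non-negative and the slice must fit.
        assert(p.first >= 0 && p.first + p.second.i <= size);
        const int count = size - p.second.i - p.first;
        return chars.substr(static_cast<std::string::size_type>(p.first),
                            static_cast<std::string::size_type>(count));
    }
};
```

With `my_range letters{"abcdefg"}`, the expression `letters[{2, end_marker - 2}]` evaluates to "cde". Because the second element of the pair has type `from_end`, the from-the-end-ness is visible to overload resolution rather than hidden in the sign of an int.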

Yes, But…

That’s great and all, but code like “rng[{2,-2}]” still compiles and fails at runtime. How is the situation any better? The difference now is that passing a negative offset to slice is always a runtime error. There is no situation in which it will succeed and do what you want, even if the range type could conceivably support it. Users will quickly learn that that isn’t the way to do it.

Had we allowed negative offsets in a way that sometimes worked and sometimes didn’t, it would have made the interface far more dangerous. Users would try it, meet with some success, and conclude incorrectly that it will always work. They’d discover their error the hard way after their application had been deployed.

Which brings me to my Philosophy of Library Design:

I can’t keep people from writing bad code. But I’m guilty of collusion if I make it easy.

And a corollary that relates to this problem:

If you can’t make something succeed consistently, it’s better to make it fail consistently.

Hope you enjoyed this little case study in library design.

Acknowledgements

I’d like to thank Chandler Carruth for drawing my attention to the pithy coolness of Python’s slice operator.

Footnote:

In the C++ containers, the indexing operation is only allowed for random-access containers, where the element can be accessed in O(1). Here, I’m allowing users to slice ranges with an indexing-like notation, even though it could be an O(N) operation. I’m currently undecided if slicing is sufficiently different from indexing to justify this decision. Thoughts welcome.

