Towards the end of last millennium I went to my first job interview and was asked the following question:
Given that
a,b,canddare of typestring, what does the following line of code do?string x = a + b + c + d;
I answered that (a + b) will return a temporary string object which will be concatenated to c creating a second temporary object which in turn will be concatenated to d. The final temporary object will then be used to initialize x. This means that there will be at least 3 memory allocations.
The follow-up question was how can I make this same line perform no more than one memory allocation[1].
That was my introduction to expression templates. If you are familiar with expression templates you can skip to the point (yes there actually is a point for a change).
The idea behind expression templates is that instead of performing an expensive operation eagerly you can return a proxy object which will be able to compute the required value in a cheaper way than naïvely performing the operation.
Examples for this can be:
- Matrix and vector arithmetics, e.g. given vectors
AandBthe expression(A+B)[2]can compute only one cell of the resulting vector by addingA[2]toB[2]rather than calculating the value of all the cells. - In the string example above[2] do not create temporary string objects. Instead calculate the length of the resulting string and perform one allocation to create it when it is used.
Lets pretend that we’re writing a string class (the full code is available here). In order to facilitate the expression template I gave the string class a copy_to(buff) method which copies the string to a char* buffer and a way to construct a string by giving it a buffer so it doesn’t need to allocate anything.
Our helper class (concat) will be returned from operator+, it is temlated on its first parameter[3] in order to support arbitrary long concatenations. concat exposes a similar static interface to that of string (it has functions with the same names) so that when we say lhs.length() it will work whether lhs is a string, a concat<string> or a concat<concat<concat<string>>>.
private: template <class T> class concat { const T& lhs; const string& rhs; public: concat(const T& left, const string& right) : lhs(left) , rhs(right) {} size_t length() const { return lhs.length() + rhs.length(); } void copy_to(char* dest) const { helper::copy(lhs, dest); helper::copy(rhs, dest + lhs.length()); } operator string() const { char* buf = new char[length() + 1]; copy_to(buf); return helper::create(buf, length()); } concat<concat<T> > operator+(const string& str) const { return concat<concat<T> >(*this, str); } }; public: concat<string> operator+(const string& str) const { return concat<string>(*this, str); }
Since concat has a cast operator to string it will not break existing code that uses string concatenation.
string a("hello"), b(" "), c("world"), d("!"); string s = a + b + c + d; // OK std::cout << a + b + c + d; // Also OK
If we place a breakpoint in concat::operator string() we will see the following picture:
We can see that the type of this encodes the expression which created it (hence the name expression templates).
Since concat is a private class inside string we know that a variable of its type will never be created making it safe to hold on to references to temporary objects. We’re guaranteed that an expression template is only used within an expression (typically up to the next semicolon) and any temporaries it references will be valid as long as it exists.
string::concat x = a + b + c; // compilation error
Enter Type Inference
Wait, what was that last part again? Hasn’t C++0x added type inference? What happens if I write auto x = a + b + c;?
Now I’m screwed, not only isn’t this a compilation error, when building with no optimizations (in both VC10 and g++4.5) it seems to behave correctly even thought this is the mother of undefined behaviour, calling an object after it has been destructed is rarely never a Good Thing™.
When compiling it with optimizations in VS10 the output was rather ironic (who knew the optimizer had a sense of humour)
In order to prove to myself that this is really what’s happening (also in debug mode) I added a valid boolean member to concat which is set to true in the constructor and to false in the destructor, I then added the following lines to the beginning of copy_to
if (!valid) throw "Here there be dragons^H^H^H undefined behaviour";
Sure enough now when I use auto instead of string (and only then) the program terminates with an unhanded exception.
What does this mean?
This means that library writers can no longer depend on the fact that their private classes are never stored for longer than the expression in which they were created. This problem isn’t confined to outlandish optimizations. The standard defines that std::vector<bool> is a special case which packs the elements into bits and not real bools, this means that operator[] of vector<bool> must return a proxy type.
The following program:
#include <vector> #include <iostream> #include <limits> std::vector<bool> to_bits(unsigned int n) { const int bits = std::numeric_limits<unsigned int>::digits; std::vector<bool> ret(bits); for(int i = 0, mask = 1; i < bits; ++i, mask *= 2) ret[i] = (n & mask) != 0; return ret; } int main() { bool b = to_bits(42)[3]; auto a = to_bits(42)[3]; std::cout << std::boolalpha << b << std::endl; std::cout << std::boolalpha << a << std::endl; }
Produces the same undefined behaviour, when tested in g++ and VC10 it outputs:
true
false
And when it’s compiled in debug mode in VC10 an assert is fired from debug iterators
Unfortunately (in this context) if C#’s var keyword is any indicator people will start using auto everywhere, I know people who use var for loop variables rather than int. The fact that tools like ReSharper recommend this line of action will add to the trend.
What can we do?
I don’t think there’s much we can do, it seems too late in the standardization process to actually change anything. Perhaps future versions of C++ will allow specifying how a type behaves in a type inference (auto/decltype) situation.
string operator auto() const { return *this; }
Edit
After posting a link to this on reddit I got a lot of comments and I would like to address some of them.
Had I madeconcatnon-copy constructable theautoline would fail to compile (just like the C++03 code) and the problem wouldn’t appear. I hadn’t thought of this and it certainly alleviates the severity of the “problem” I describe.- Actually this is not true, as Geoff points out if
concathas a deleted copy constructor it can’t be returned fromoperator+.
- Actually this is not true, as Geoff points out if
- Even
ifthough the problemwasis as bad as I originally thought it was it’s still just one more gotcha in the language. All languages have these pitfalls and I don’t think that one more is a reason to invalidate the entire language (as some people seem to think).
Footnotes
- Assuming RVO which is out of scope for this post. ↩
- This optimization can’t be used on
std::stringsince the C++ standard mandates that the return value ofoperator+will be anstd::stringand not some other type. ↩ - Ignoring the fact that
a + (b+c)will have aconcatas a second parameter allows me to significantly simplify the code in this post. ↩


I think that the compilers are wrong if they accept the instantiation of a private class through type inference. Auto or not, if a class is private, it should not be available outside of its scope.
As vtj explained on reddit :
But the inferred type has a name; and a name has a scope and an access.
The inferred type’s name is already used in other cases, where the auto variable is used.
For example, if I do the following:
auto x = a + b + c;
x.bar();
Then the compiler will say to me that type Foo (assuming the type inference results to Foo type) does not have member bar().
Have you brought up this problem with anyone on the standards committee?
Nope, I saw that someone raised this in 2008 and nobody paid much attention and was too meek to go to bother the committee.
Can’t you patch it up by making concat non-copyable and non-assignable? (Can’t fill in the missing details so it actually works.) One thing is the eagerness of auto, another thing is the responsibility of the author of brittle code. (You don’t need auto to store instances of concat, because template deduction works the same way.)
Good point, library writers should be more careful.Actually it’s not such a good point as pointed out by Geoff.
As for the point about template deduction working the same way, this is true but it’s much harder to get yourself into trouble using templates to deduce returned types than with auto. If you had a function accepting a template type and passed `a+b+c` all the temporary objects would still be in scope for the duration of the function call.
I don’t think this will work. First of all, the type needs to be copyable (or at least movable), or you can’t return it from operator+. You might be able to fix this by making the copy/move constructor private, but that’s getting pretty subtle.
The bigger problem is that even if you could prevent copying, assignment, and moving, you still have the following problem:
const auto& x = a + b + c;
Since x is a reference, there’s no copying, assigning, or moving going on. Since it’s a const reference, the lifetime of the temporary returned by the expression is extended to be the lifetime of the reference, but I don’t think the lifetime of the temporary returned by (a + b) is similarly extended, so you get the same dangling-reference issue.
Excellent point! I’ve edited my post accordingly.
After reading this post I am motivated to go back and redouble my efforts at learning OCaml – a language which had type-inferrence from the beginning. It also seems like a good candidate for use in place of C++ for a lot of the projects I’m going to be working on (has the required speed, etc.)
While §7.1.6.4 of the FCD does not constrain the deduced type as if it had appeared in the original source, §11/1 still applies so that a private nested type’s “name can be used only by members and friends.” IOW, I think you found a compiler bug rather than a defect in the standard.
Funny. I had the exact same train of thought one or two years ago and I came even up with “operator auto” in the search for a fix. I started a thread to that effect in one of the c++-related usenet groups if I remember correctly. But nobody seemed to care/understand. Just recently I wrote a little library that employs expression templates. But this time I used reference-counted copy-on-write to avoid unnecessary copying and to guarantee that data was kept alive long enough. So, writing “auto expr = …;” would be at least save.
Are you slavkop? In (or around) July 2009 was the first I thought of this and was surprised to find a posting in comp.lang.c++.moderated with my thoughts stolen and taken back to December 2008…
No, but I was the first one to answer slavkop in that thread
Seems like there is more awareness of this issue than I expected/remembered. Cheers! – p
if you have function returning private object, in c++0x you can instance that object on your own:
http://codepad.org/KjpFj5rA
As for your string code, it could be fixed using simple private copy constructor and befriending string class from concat. Then auto s = a + b; will no longer be valid.
In addition to techniques like reference-counted copy-on-write one might want to try to make use of move semantics where it makes sense. std::string is a good example for that. Alternativly, one could make expression templates explicit. By that I mean that the user would have to explicitly enable them using function templates named “expr” and “eval” (or something like that). Example: auto x = eval(expr(a)+b+c+d);
This problem actually exist before the auto keyword. When you pass the expression template to a template function it deduces the type in the same way, unfortunately. I actually had the same idea of having an operator auto() for developing proxy types. This will make it capable of doing a psuedo-dot overloading also.
I use proxy types to separate getting and setting in maps(because they could be referencing memory from disk or locking memory for concurrent access). In this situation overloading the dot operator helps when you want to chain methods. Now since i cant do that i have a get and set property on each element. So I access an element like this:
auto x = map[i].get;
map[i+1].set = x;
This still lets you set values without having to use a storm of parenthesis. The same can be done for expression templates using an eval function like what pizer said.
The proxy object has a private constructor so it cant be pass through auto type deduction, but it still can be referenced in a template function. And this can still be a potential problem. One idea i’m working on around this is kinda of an operator auto() hack. A template function goes through two calls and uses a metafuntion to adjust the deduction types:
template
void foo(T& x)
{
foo_deduced<typename adjust_deduction::type>(x);
}
template
void foo_deduced(T& x)
{
//Put my logic here
}
So then you would just override the adjust_deduction metafunction, or maybe provide some kind of tagging mechanism for it. I’m trying to think of some kind of macro that would help in creating these functions.
Anyways it seems that if several people are coming up with this idea independently for an operator auto() overload, then it would make sense for the c++ committee to have a serious look at it.
Actually the line should be:
foo_deduced<typename adjust_deduction::type>(x);
cords to the rescue: https://github.com/baryluk/cords
Unfortunately I do not updated them for some time, so probably not compile on current 2,x compiler.
OH. Sorry, I somehow missed it is about C++, not D. Any way there is a similar library for C++, even in STL.
I admit not taking the time to read the implementation of your example in detail. auto though, is just supposed to be syntactic sugar. Its not supposed to change the semantics. Are you sure you could not write out the same type (that auto evaluates to) in the C++03 example? If that is possible (even if it is cumbersome) then one could argue that the example has always had this latent bug in it.
It seems to me that move semantics would probably allow you to make this work without the use of a special type. If the string’s operator+ had a version that took a std::string&& on the left-hand, then it could simply build the string in the std::string&& and therefore avoid creating extra temporaries.
I haven’t studied the deep magic around move semantics, but I think that this should work.
@Alfonse: The use of rvalue references as a function’s return type to save some temporary object creations is generally frowned upon. See http://stackoverflow.com/questions/6006527/
I wasn’t thinking about returning an r-value reference. The regular operator+ for std::string looks something like this:
std::string operator+(const std::string &lhs, const std::string &rhs)
{
std::string result(lhs);
result += rhs;
return result;
}
That first command is a copy operation. It allocates heap memory will it will then potentially reallocate thanks to rhs’s data. Instead, one could have:
std::string operator+(std::string &&lhs, const std::string &rhs)
{
std::string result(std::move(lhs));
result += rhs;
return result;
}
Yes, you have created a temporary by doing this. But thanks to decent return value optimizations in compilers, returning a std::string won’t create a new copy. You will have saved one memory allocation and a copy operation.
It ultimately won’t be as efficient as your code, as it cannot calculate the final string length. But it is still an improvement.
@Alfonse (since I can’t reply directly to his reply to pizer’s post)
The proper C++03 implementation of the operator is:
std::string
operator +(std::string _lhs, std::string const & _rhs)
{
_lhs += _rhs;
return _lhs;
}
See http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/.
While it’s true that using an rvalue for the left hand side would give you better performance than the naive implementation it will still be less optimal than the expression templates implementation. The implementation I show computes the size in advance and allocates memory only once, if we do the lhs rvalue then we reallocate the buffer for lhs every time the capacity is used up.
In any case the point I tried to make is general and the string example is just that, an example. How would you solve matrix multiplication?
I thought by “avoiding temporaries” you guys meant returning an rvalue reference to an older temporary instead of creating a new one. For a decent compiler with NRVO and other copy elisions one could apply “swaptimization”:
~~~~~~~~~~
string operator+(string x, string const& y)
{
string result; // default ctor is expected to be fast
result.swap(x); // swapping is expected to be fast
result += y;
return result; // NRVO aplicable
}
~~~~~~~~~~
Of course, I’m aware of Dave’s “want speed? pass by value” article.