Towards the end of last millennium I went to my first job interview and was asked the following question:
Given that
a,b,canddare of typestring, what does the following line of code do?string x = a + b + c + d;
I answered that (a + b) will return a temporary string object which will be concatenated to c creating a second temporary object which in turn will be concatenated to d. The final temporary object will then be used to initialize x. This means that there will be at least 3 memory allocations.
The follow-up question was how can I make this same line perform no more than one memory allocation[1].
That was my introduction to expression templates. If you are familiar with expression templates you can skip to the point (yes there actually is a point for a change).
The idea behind expression templates is that instead of performing an expensive operation eagerly you can return a proxy object which will be able to compute the required value in a cheaper way than naïvely performing the operation.
Examples for this can be:
- Matrix and vector arithmetics, e.g. given vectors
AandBthe expression(A+B)[2]can compute only one cell of the resulting vector by addingA[2]toB[2]rather than calculating the value of all the cells. - In the string example above[2] do not create temporary string objects. Instead calculate the length of the resulting string and perform one allocation to create it when it is used.
Lets pretend that we’re writing a string class (code here). In order to facilitate the expression template I gave the string class a copy_to(buff) method which copies the string to a char* buffer and a way to construct a string by giving it a buffer so it doesn’t need to allocate anything.
Our helper class (concat) will be returned from operator+, it is templated on its first parameter[3] in order to support arbitrary long concatenations. concat exposes a similar static interface to that of string (it has functions with the same names) so that when we say lhs.length() it will work whether lhs is a string, a concat<string> or a concat<concat<concat<string>>>.
private: template <class T> class concat { const T& lhs; const string& rhs; public: concat(const T& left, const string& right) : lhs(left) , rhs(right) {} size_t length() const { return lhs.length() + rhs.length(); } void copy_to(char* dest) const { helper::copy(lhs, dest); helper::copy(rhs, dest + lhs.length()); } operator string() const { char* buf = new char[length() + 1]; copy_to(buf); return helper::create(buf, length()); } concat<concat<T> > operator+(const string& str) const { return concat<concat<T> >(*this, str); } }; public: concat<string> operator+(const string& str) const { return concat<string>(*this, str); }
Since concat has a cast operator to string it will not break existing code that uses string concatenation.
string a("hello"), b(""), c("world"), d("!"); string s = a + b + c + d; // OK std::cout << a + b + c + d; // Also OK
If we place a breakpoint in concat::operator string() we will see the following picture:
We can see that the type of this encodes the expression which created it (hence the name expression templates).
Since concat is a private class inside string we know that a variable of its type will never be created making it safe to hold on to references to temporary objects. We’re guaranteed that an expression template is only used within an expression (typically up to the next semicolon) and any temporaries it references will be valid as long as it exists.
string::concat x = a + b + c; // compilation error
Enter Type Inference
Wait, what was that last part again? Hasn’t C++0x added type inference? What happens if I write auto x = a + b + c;?
Now I’m screwed, not only isn’t this a compilation error, when building with no optimizations (in both VC10 and g++4.5) it seems to behave correctly even thought this is the mother of undefined behaviour, calling an object after it has been destructed is rarely never a Good Thing™.
When compiling it with optimizations in VS10 the output was rather ironic (who knew the optimizer had a sense of humour)
In order to prove to myself that this is really what’s happening (also in debug mode) I added a valid boolean member to concat which is set to true in the constructor and to false in the destructor, I then added the following lines to the beginning of copy_to
if (!valid) throw "Here there be dragons^H^H^H undefined behaviour";
Sure enough now when I use auto instead of string (and only then) the program terminates with an unhanded exception.
What does this mean?
This means that library writers can no longer depend on the fact that their private classes are never stored for longer than the expression in which they were created. This problem isn’t confined to outlandish optimizations. The standard defines that std::vector<bool> is a special case which packs the elements into bits and not real bools, this means that operator[] of vector<bool> must return a proxy type.
The following program:
#include <vector> #include <iostream> #include <limits> std::vector<bool> to_bits(unsigned int n) { const int bits = std::numeric_limits<unsigned int>::digits; std::vector<bool> ret(bits); for(int i = 0, mask = 1; i < bits; ++i, mask *= 2) ret[i] = (n & mask) != 0; return ret; } int main() { bool b = to_bits(42)[3]; auto a = to_bits(42)[3]; std::cout << std::boolalpha << b << std::endl; std::cout << std::boolalpha << a << std::endl; }
Produces the same undefined behaviour, when tested in g++ and VC10 it outputs:
true
false
And when it’s compiled in debug mode in VC10 an assert is fired from debug iterators
Unfortunately (in this context) if C#’s var keyword is any indicator people will start using auto everywhere, I know people who use var for loop variables rather than int. The fact that tools like ReSharper recommend this line of action will add to the trend.
What can we do?
I don’t think there’s much we can do, it seems too late in the standardization process to actually change anything. Perhaps future versions of C++ will allow specifying how a type behaves in a type inference (auto/decltype) situation.
string operator auto() const { return *this; }
Edit
After posting a link to this on reddit I got a lot of comments and I would like to address some of them.
Had I madeconcatnon-copy constructable theautoline would fail to compile (just like the C++03 code) and the problem wouldn’t appear. I hadn’t thought of this and it certainly alleviates the severity of the “problem” I describe.- Actually this is not true, as Geoff points out if
concathas a deleted copy constructor it can’t be returned fromoperator+.
- Actually this is not true, as Geoff points out if
- Even
ifthough the problemwasis as bad as I originally thought it was it’s still just one more gotcha in the language. All languages have these pitfalls and I don’t think that one more is a reason to invalidate the entire language (as some people seem to think).
Footnotes
- Assuming RVO which is out of scope for this post. ↩
- This optimization can’t be used on
std::stringsince the C++ standard mandates that the return value ofoperator+will be anstd::stringand not some other type. ↩ - Ignoring the fact that
a + (b+c)will have aconcatas a second parameter allows me to significantly simplify the code in this post. ↩


I think that the compilers are wrong if they accept the instantiation of a private class through type inference. Auto or not, if a class is private, it should not be available outside of its scope.
As vtj explained on reddit :
But the inferred type has a name; and a name has a scope and an access.
The inferred type’s name is already used in other cases, where the auto variable is used.
For example, if I do the following:
auto x = a + b + c;
x.bar();
Then the compiler will say to me that type Foo (assuming the type inference results to Foo type) does not have member bar().
Have you brought up this problem with anyone on the standards committee?
Nope, I saw that someone raised this in 2008 and nobody paid much attention and was too meek to go to bother the committee.
Can’t you patch it up by making concat non-copyable and non-assignable? (Can’t fill in the missing details so it actually works.) One thing is the eagerness of auto, another thing is the responsibility of the author of brittle code. (You don’t need auto to store instances of concat, because template deduction works the same way.)
Good point, library writers should be more careful.Actually it’s not such a good point as pointed out by Geoff.
As for the point about template deduction working the same way, this is true but it’s much harder to get yourself into trouble using templates to deduce returned types than with auto. If you had a function accepting a template type and passed `a+b+c` all the temporary objects would still be in scope for the duration of the function call.
I don’t think this will work. First of all, the type needs to be copyable (or at least movable), or you can’t return it from operator+. You might be able to fix this by making the copy/move constructor private, but that’s getting pretty subtle.
The bigger problem is that even if you could prevent copying, assignment, and moving, you still have the following problem:
const auto& x = a + b + c;
Since x is a reference, there’s no copying, assigning, or moving going on. Since it’s a const reference, the lifetime of the temporary returned by the expression is extended to be the lifetime of the reference, but I don’t think the lifetime of the temporary returned by (a + b) is similarly extended, so you get the same dangling-reference issue.
Excellent point! I’ve edited my post accordingly.
After reading this post I am motivated to go back and redouble my efforts at learning OCaml – a language which had type-inferrence from the beginning. It also seems like a good candidate for use in place of C++ for a lot of the projects I’m going to be working on (has the required speed, etc.)
While §7.1.6.4 of the FCD does not constrain the deduced type as if it had appeared in the original source, §11/1 still applies so that a private nested type’s “name can be used only by members and friends.” IOW, I think you found a compiler bug rather than a defect in the standard.
Funny. I had the exact same train of thought one or two years ago and I came even up with “operator auto” in the search for a fix. I started a thread to that effect in one of the c++-related usenet groups if I remember correctly. But nobody seemed to care/understand. Just recently I wrote a little library that employs expression templates. But this time I used reference-counted copy-on-write to avoid unnecessary copying and to guarantee that data was kept alive long enough. So, writing “auto expr = …;” would be at least save.
Are you slavkop? In (or around) July 2009 was the first I thought of this and was surprised to find a posting in comp.lang.c++.moderated with my thoughts stolen and taken back to December 2008…
No, but I was the first one to answer slavkop in that thread 🙂 Seems like there is more awareness of this issue than I expected/remembered. Cheers! – p
if you have function returning private object, in c++0x you can instance that object on your own:
http://codepad.org/KjpFj5rA
As for your string code, it could be fixed using simple private copy constructor and befriending string class from concat. Then auto s = a + b; will no longer be valid.
In addition to techniques like reference-counted copy-on-write one might want to try to make use of move semantics where it makes sense. std::string is a good example for that. Alternativly, one could make expression templates explicit. By that I mean that the user would have to explicitly enable them using function templates named “expr” and “eval” (or something like that). Example: auto x = eval(expr(a)+b+c+d);
This problem actually exist before the auto keyword. When you pass the expression template to a template function it deduces the type in the same way, unfortunately. I actually had the same idea of having an operator auto() for developing proxy types. This will make it capable of doing a psuedo-dot overloading also.
I use proxy types to separate getting and setting in maps(because they could be referencing memory from disk or locking memory for concurrent access). In this situation overloading the dot operator helps when you want to chain methods. Now since i cant do that i have a get and set property on each element. So I access an element like this:
auto x = map[i].get;
map[i+1].set = x;
This still lets you set values without having to use a storm of parenthesis. The same can be done for expression templates using an eval function like what pizer said.
The proxy object has a private constructor so it cant be pass through auto type deduction, but it still can be referenced in a template function. And this can still be a potential problem. One idea i’m working on around this is kinda of an operator auto() hack. A template function goes through two calls and uses a metafuntion to adjust the deduction types:
template
void foo(T& x)
{
foo_deduced<typename adjust_deduction::type>(x);
}
template
void foo_deduced(T& x)
{
//Put my logic here
}
So then you would just override the adjust_deduction metafunction, or maybe provide some kind of tagging mechanism for it. I’m trying to think of some kind of macro that would help in creating these functions.
Anyways it seems that if several people are coming up with this idea independently for an operator auto() overload, then it would make sense for the c++ committee to have a serious look at it.
Actually the line should be:
foo_deduced<typename adjust_deduction::type>(x);
cords to the rescue: https://github.com/baryluk/cords
Unfortunately I do not updated them for some time, so probably not compile on current 2,x compiler.
OH. Sorry, I somehow missed it is about C++, not D. Any way there is a similar library for C++, even in STL.
I admit not taking the time to read the implementation of your example in detail. auto though, is just supposed to be syntactic sugar. Its not supposed to change the semantics. Are you sure you could not write out the same type (that auto evaluates to) in the C++03 example? If that is possible (even if it is cumbersome) then one could argue that the example has always had this latent bug in it.
It seems to me that move semantics would probably allow you to make this work without the use of a special type. If the string’s operator+ had a version that took a std::string&& on the left-hand, then it could simply build the string in the std::string&& and therefore avoid creating extra temporaries.
I haven’t studied the deep magic around move semantics, but I think that this should work.
@Alfonse: The use of rvalue references as a function’s return type to save some temporary object creations is generally frowned upon. See http://stackoverflow.com/questions/6006527/
I wasn’t thinking about returning an r-value reference. The regular operator+ for std::string looks something like this:
std::string operator+(const std::string &lhs, const std::string &rhs)
{
std::string result(lhs);
result += rhs;
return result;
}
That first command is a copy operation. It allocates heap memory will it will then potentially reallocate thanks to rhs’s data. Instead, one could have:
std::string operator+(std::string &&lhs, const std::string &rhs)
{
std::string result(std::move(lhs));
result += rhs;
return result;
}
Yes, you have created a temporary by doing this. But thanks to decent return value optimizations in compilers, returning a std::string won’t create a new copy. You will have saved one memory allocation and a copy operation.
It ultimately won’t be as efficient as your code, as it cannot calculate the final string length. But it is still an improvement.
@Alfonse (since I can’t reply directly to his reply to pizer’s post)
The proper C++03 implementation of the operator is:
std::string
operator +(std::string _lhs, std::string const & _rhs)
{
_lhs += _rhs;
return _lhs;
}
See http://cpp-next.com/archive/2009/08/want-speed-pass-by-value/.
While it’s true that using an rvalue for the left hand side would give you better performance than the naive implementation it will still be less optimal than the expression templates implementation. The implementation I show computes the size in advance and allocates memory only once, if we do the lhs rvalue then we reallocate the buffer for lhs every time the capacity is used up.
In any case the point I tried to make is general and the string example is just that, an example. How would you solve matrix multiplication?
I thought by “avoiding temporaries” you guys meant returning an rvalue reference to an older temporary instead of creating a new one. For a decent compiler with NRVO and other copy elisions one could apply “swaptimization”:
~~~~~~~~~~
string operator+(string x, string const& y)
{
string result; // default ctor is expected to be fast
result.swap(x); // swapping is expected to be fast
result += y;
return result; // NRVO aplicable
}
~~~~~~~~~~
Of course, I’m aware of Dave’s “want speed? pass by value” article.
I think this is a bit of a flaw in the standard, which unfortunately now will be hard to fix.
I think the best way would have been is to disallow:
const auto x& = f();
AND
auto x&& = f();
Unless f() had a valid move or copy constructor.
That would mean that they would only be valid when
auto x = f() was valid.
Expression template classes could have then had private copy/move constructors.
Perhaps now, the best way forward to prevent issues with future code is the operator auto() like you suggested, or simply allowing classes to be tagged as “temporary”, which means they can not be copied or moved to anything which outlives their own lifetime.
Keep in mind, with move constructors now, I don’t really see the need to extend the lifetime of temporaries bound to rvalue references.
Why write:
auto x&& = f();
When you can write
auto x = f();
Which is probably clearer, and doesn’t need a move or a copy due to RVO.
I understand how const blah& = f() was useful for non-copyable objects, but I can’t really think of an object which is not copyable or movable, and even if it was, why would you want to assign it to an rvalue reference (which strongly suggests it’s moveable)?!
At the very least, blah&& x = f() should require a move or copy constructor (even if it’s not used in the end). Implementing rvalue references like that in the first place wouldn’t have broke backward compatibility, and would have allowed expression templates to be type safe in the following manner:
(1) Private move/copy constructors.
(2) Conversion operator that only takes an rvalue reference.
Then
auto x = f() will fail (no copy constructor).
auto x = eval(f()) will succeed.
auto x&& = f() would fail.
const auto& x = f() would succeed but
auto x2 = eval(f()) would fail (eval only takes an rvalue reference).
And
auto x2 = eval(std::move(f()) would fail (std::move on a const reference is still const).
However, sadly with the current standard, you can subvert the type system in the following manner.
auto x&& = f()
auto x2 = eval(std::move(f()));
And then you’ve got references to expired temporary issues. I can’t work out a way to avoid this problem.
Thanks for your detailed comments.
First thing to keep in mind is that move constructors aren’t a magic bullet, in some cases a move constructor is just as expensive as a copy constructor (e.g. std::array).
In any case I’m not sure that your examples capturing rvalue references are correct, I think that using rvalue references in places other than function parameters is usually not what one would want.
The standard actually dictates assigning to rvalue references in some places, see here:
http://stackoverflow.com/questions/9657708/c11-the-range-based-for-statement-range-init-lifetime
Personally, I don’t think assigning to rvalue references should be allowed, as assigning to a value is just as efficient due to RVO (no move or copy required).
blah x = f(), while not requiring a move or copy due to RVO, at least requires a move or copy constructor to exist (even though it will not be used).
However blah&& x = f() can be applied if f() returns a non-movable or non-copyable type, so the trick of making expression templates non-movable and non-copyable doesn’t work here.
I didn’t know you could bind a temporary to any kind of reference, I thought it was only to a const reference. You learn something new every day!
[…] Yes, and even exactly that spelling has been suggested. I’ll take that as a +1 for discoverability if we name it that! (For people reading this comment, if it doesn’t make any sense I wrote about it last year in my blog https://lanzkron.wordpress.com/2011/02/21/inferring-too-much/) […]
[…] Lanzkron, “Inferring too much.” Rate this: Like this:LikeBe the first to like this post. This entry was posted in […]
I think there’s an additional issue besides the references to objects that have expired. Consider
string prefix("Mr."); string first("Michael"); string last("Stipe"); string separator(" "); auto fullName = prefix + separator + first + separator + last; string n1 = fullName; last = string("Mills"); string n2 = fullName;Lets assume that string::concat’s references lifespan’s are extended for the duration of fullName’s. Since fullName has references to the strings, modifying ‘last’ would lead to n1 and n2 being different, not what we’d expect. The change to last would have the surprising effect of changing the value of n2 in the following line.
So this leads me to a question. Which constructor is used for fullName? is it concat<concat >(concat<concat > oldVal) (i.e. copy constructor)? or is it just assigned the last concat<concat > returned from the last operator+(conact, string) in the expression? If it’s a constructor with a single argument, the string concatenation could take place in that constructor and store that in concat instead of setting up the references to the building pieces. (same for operator=() ) But I might be missing something.
Good point DK, this will also generate different results that writing string fullName =....
As for which constructor is used of fullName, I don’t understand the question, fullName will be initialized according to the regular rules of C++, i.e. it will be either copy constructed from the last concat object returned by the + expression or the copy may be elided. In any case since we didn’t specify a copy constructor the default implementation would just copy references and not perform a deep copy.
Even if we did define a copy constructor, to perform deep copy, I don’t think we could depend on it since the compiler is allowed to elide copy constructor/destructor pairs.
Sorry for being less than clear with my question. (Didn’t help that the ‘<T>’s got treated as html). Yes, I was wondering if copy constructors could be used to enforce the string concatenation as fullName was being created, or copy elision could still happen.
To my understanding the compiler can assume that a copy constructor/destructor pair do nothing and can therefore elide both (RVO). I don’t think that C++11 has made any changes in this aspect. In fact I think I saw people say that a compiler can elide a move-constructor for RVO since such elision is more performant even than move construction.
[…] Fuente: https://lanzkron.wordpress.com/2011/02/21/inferring-too-much/ […]