One of the most powerful features that a programming language can provide is the ability to handle heterogeneous data at run time, that is, values that can assume different types during program execution. This allows for a great deal of flexibility and is typical of dynamically typed languages, where a variable can be rebound to a value of any type while the application is running. This is possible thanks to dynamic type binding.
# Python
var = 1 # type(var) is 'int'
var = 1.5 # type(var) is now 'float'
var = "ciao" # type(var) is now 'str'
var = myobject() # type(var) is now 'myobject'
C++, being a statically typed language, does not allow that. The closest equivalent is a type-erasing container, either a custom one or a ready-made one such as std::any (part of the standard library since C++17). But what if we want a variable to represent not any type but just a fixed set of them? If we know in advance the types we’ll deal with, this is a better solution because it avoids the overhead of dynamic allocations.
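For comparison with the Python snippet, here is a minimal sketch of the "any type" approach using std::any. The helper names (holds_int, int_or, any_demo) are made up for this illustration and are not part of any library.

```cpp
#include <any>
#include <cassert>
#include <string>
#include <typeinfo>

// Returns true when 'a' currently holds an int.
bool holds_int(const std::any& a) {
    return a.type() == typeid(int);
}

// Reads an int out of 'a', or returns 'fallback' when another type is stored.
int int_or(const std::any& a, int fallback) {
    try {
        return std::any_cast<int>(a);
    } catch (const std::bad_any_cast&) {
        return fallback;
    }
}

void any_demo() {
    std::any var = 1;               // holds an int
    assert(holds_int(var));
    var = std::string{"ciao"};      // now holds a std::string
    assert(!holds_int(var));
    assert(int_or(var, -1) == -1);  // wrong type requested: the fallback is used
}
```

Note that the set of types is open here: nothing tells the compiler (or the reader) which types var is expected to hold.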
The concept of a value that can hold one of several different types at a time is present in several languages. Some of them have specific types for this purpose, typically called “variant”. For example, the ML language (and its variants, pun not intended) implements the notion of Algebraic Data Type to create composite types. In C++ there is an old feature that implements a similar concept: the union type.
Unions
Unions are a special kind of structure that can hold different types of data at the same memory address. That is, all the members of the union share the same memory space. The size of the shared space is at least the size of the largest member. So, for example, in the following union holding a character type and an integer type
union U {
char c;
int i;
};
both of them reside in the same memory space, the size of which is the size of an int. The values can then be assigned to the union by setting its individual members, as with a standard struct/class. By assigning a value to a member, it is actually “selected” (i.e. it becomes the active one) and can be safely referenced. Reading a member that is not the active one is generally undefined behavior, although many implementations tolerate it without side effects if the members are PODs.
U u{};
u.c = 1; // The union is a 'char'
u.i = 2; // The union is now an 'int'
Note that in a union initialized with empty braces the first alternative is selected, but we wanted to be more explicit with the assignments for the sake of clarity. Such an implementation is not really useful for what we want to achieve. Furthermore, if the members of the union are not PODs (i.e. they are non-trivial objects) things start getting more baroque. For example, the following union containing an integer and a string cannot be treated the same way as in the above POD case
union U {
int i;
std::string s;
};
Trying to create an object of this union will result in compilation errors. This is because the union is now non-trivial due to s (which is a non-trivial type): the compiler implicitly deletes every special member function of the union for which a member has a non-trivial counterpart. This means that at least the destructor must be explicitly defined in order for the union to be destroyed.
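The deleted special members can be observed directly with type traits. The following is a small sketch of that check, reusing the union from the text; it only verifies properties at compile time and creates no objects.

```cpp
#include <string>
#include <type_traits>

union U {
    int i;
    std::string s;
};

// 's' has a non-trivial default constructor, destructor and copy operations,
// so the corresponding special members of U are implicitly deleted.
static_assert(!std::is_default_constructible_v<U>);
static_assert(!std::is_destructible_v<U>);
static_assert(!std::is_copy_constructible_v<U>);
static_assert(!std::is_copy_assignable_v<U>);

// The shared storage must still be large enough for the biggest member.
static_assert(sizeof(U) >= sizeof(std::string));
```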
Also, making a member active is a quirky operation. Unlike PODs, non-trivial objects must be explicitly constructed each time they are activated (using “placement new”), because assigning to an object that has not been constructed is undefined behavior, and their destructors must be explicitly called before activating another member, since they are neither run automatically nor invoked through delete.
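#include <new>    // placement new
#include <string>
union U {
    int i;
    std::string s;
    ~U() {} // must be defined because deleted by default due to 's'
};
// The below code is wrong (undefined behavior)
U u{};         // 'i' is the active member
u.i = 2;
u.s = "CIAO!"; // 's' was never constructed, so assigning to it is undefined behavior
// Correct code
U u{};
u.i = 2;
new (&u.s) std::string{}; // construct 's' in place, making it the active member
u.s = "CIAO!";
u.s.~basic_string();      // explicitly call the destructor of 's'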
Another big issue is the fact that there is no trace whatsoever of which type is currently active in the union, making this solution incredibly error-prone.
Tagged Unions
Tagged unions are basically unions with a state, the state holding information on which type is currently active. They are commonly used to implement “variant” data types. This mechanism provides some sort of type-safety. Following is the tagged version of the union used above
struct tunion {
    enum {
        CHAR,
        INT
    } tag;
    union U {
        char c;
        int i;
    } val;
};
Using the tag, it is possible to implement some type-checking by implementing accessors and signaling the error, for example by throwing an exception
#include <stdexcept>
char get_c(const tunion& u) {
    if (u.tag != tunion::CHAR) {
        throw std::runtime_error("Invalid type access");
    }
    return u.val.c;
}
void set_c(tunion& u, char c) {
    u.tag = tunion::CHAR;
    u.val.c = c;
}
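To see the tag check at work, here is a self-contained sketch that repeats the tagged union and adds the int accessors. The names get_i, set_i and wrong_access_is_caught are hypothetical additions for this illustration.

```cpp
#include <cassert>
#include <stdexcept>

struct tunion {
    enum { CHAR, INT } tag;
    union U { char c; int i; } val;
};

char get_c(const tunion& u) {
    if (u.tag != tunion::CHAR) {
        throw std::runtime_error("Invalid type access");
    }
    return u.val.c;
}

int get_i(const tunion& u) {
    if (u.tag != tunion::INT) {
        throw std::runtime_error("Invalid type access");
    }
    return u.val.i;
}

void set_i(tunion& u, int i) {
    u.tag = tunion::INT;  // record which member becomes active
    u.val.i = i;
}

// Stores an int, then asks for a char: the tag mismatch is reported as an exception.
bool wrong_access_is_caught() {
    tunion u{};
    set_i(u, 42);
    try {
        (void)get_c(u);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

The safety only holds as long as every access goes through the accessors; nothing prevents code from touching val directly.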
There are other solutions to implement alternative value types that do not use unions, but they are either very type-unsafe (such as those using the dreaded void* and reinterpret_cast pair) or inefficient (e.g. those based on run-time type identification).
C++17 has introduced a new template type, std::variant<T...>, that implements the concept of “variant”, sparing the programmer the headache of implementing clumsy and unsafe solutions.
The modern C++ solution: std::variant
Starting from C++17 programmers have been given yet another tool that will make their lives easier: the std::variant type. This class template provides C++ with a type that allows working with heterogeneous values in a consistent and type-safe manner. A std::variant object can hold one of several alternative (predetermined) types, defined by its template arguments. So, for the examples mentioned earlier we would simply write
#include <string>
#include <variant>
class myobject{};
std::variant<int, float, std::string, myobject> v{};
v = 1; // v holds an 'int' (as it already did after default construction)
v = 1.5f; // v is now 'float'
v = "ciao"; // v is now 'std::string'
v = myobject(); // v is now 'myobject'
To get the value of the variant, the std::get<T>(v) function template can be used to access it in a safe manner. A check is performed on the active type to verify that it matches the requested one, and an exception is thrown if the check fails, thus preventing any unexpected behavior
std::variant<int, float> v{};
v = 1;
auto iv{ std::get<int>(v) }; // ok, 'int' is the active type
auto fv{ std::get<float>(v) }; // error, 'float' is not the active type
// std::bad_variant_access is thrown
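As a sketch of how the failing access can be handled, the first function below catches std::bad_variant_access, while the second uses std::get_if, the standard non-throwing counterpart that returns a null pointer on a mismatch. The function names are made up for this example.

```cpp
#include <cassert>
#include <variant>

// Returns the float stored in 'v', or 'fallback' if another alternative is active.
float float_or(const std::variant<int, float>& v, float fallback) {
    try {
        return std::get<float>(v);
    } catch (const std::bad_variant_access&) {
        return fallback;
    }
}

// Same result without exceptions: std::get_if yields nullptr on a mismatch.
float float_or_noexcept(const std::variant<int, float>& v, float fallback) {
    const float* p = std::get_if<float>(&v);
    return p ? *p : fallback;
}

void get_demo() {
    std::variant<int, float> v{1};              // 'int' is the active type
    assert(float_or(v, -1.0f) == -1.0f);        // mismatch: fallback returned
    assert(float_or_noexcept(v, -1.0f) == -1.0f);
    v = 2.5f;                                   // 'float' is now the active type
    assert(float_or(v, -1.0f) == 2.5f);
    assert(float_or_noexcept(v, -1.0f) == 2.5f);
}
```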
Another way to access the variant alternatives is by their index. In the above example, the int alternative would have index 0 and float index 1, so we can also use the get function like so
std::variant<int, float> v{};
v = 1;
auto iv{ std::get<0>(v) };
v = 1.5f;
auto fv{ std::get<1>(v) };
To check which alternative the variant is currently holding, the variant::index() method is provided. Alternatively, and better for clarity, the function template std::holds_alternative<T> can be used
if (v.index() == 0) {
// The variant is an 'int'
}
// or
if (std::holds_alternative<int>(v)) {
// The variant is an 'int'
}
Additionally, there is an even safer way to access the variant values: the visitor pattern, through std::visit. This function accepts a callable visitor object as its first argument and applies it to the variant passed as the second argument. Unlike std::get, it is checked at compile time: the visitor must be able to handle every alternative, which makes for a safer (and cleaner) programming experience.
To make things clearer, consider the following example where we want to perform different actions based on the type of alternatives in a variant
std::variant<int, std::string> v{};
...
if (std::holds_alternative<int>(v)) {
auto val{ std::get<int>(v) };
// Do something if v is an int
}
else if (std::holds_alternative<std::string>(v)) {
auto val{ std::get<std::string>(v) };
// Do something if v is a string
}
This code, aside from not being pretty (but that’s just my personal opinion), suffers from a more serious problem: if an additional alternative is added to the variant, we need to remember to handle its case. If we don’t, there is no warning issued, and bugs can start sprouting without notice. Instead, we could rewrite it as follows
struct visitor {
void operator()(std::string& a){
// Do something if v is a string
}
void operator()(int a){
// Do something if v is an int
}
};
...
std::variant<int, std::string> v{};
...
std::visit(visitor{}, v);
The active type of the variant is matched against the types handled by the visitor, like in a switch statement. However, if we add a new alternative to the variant, then we also need to add a handler for it in the visitor class. Not doing so will cause a compilation error. This idiom implements a form of “pattern matching” between data types, which benefits from compile-time type checks.
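A visitor can also return a value, as long as every overload returns the same type. Here is a minimal runnable sketch; the names value, describe and to_text are chosen for this illustration only.

```cpp
#include <cassert>
#include <string>
#include <variant>

using value = std::variant<int, std::string>;

// One overload per alternative; removing either one makes std::visit fail to compile.
struct describe {
    std::string operator()(int i) const { return "int: " + std::to_string(i); }
    std::string operator()(const std::string& s) const { return "string: " + s; }
};

// Dispatches on the active alternative and returns the visitor's result.
std::string to_text(const value& v) {
    return std::visit(describe{}, v);
}

void visit_demo() {
    assert(to_text(value{42}) == "int: 42");
    assert(to_text(value{std::string{"ciao"}}) == "string: ciao");
}
```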
A variant should always hold a value of one of its alternative types. However, if an exception is thrown during an assignment (or move) operation, the variant may be left in an invalid state where it doesn’t hold anything. This can be verified using the valueless_by_exception() method. Note that an exception does not always leave the variant valueless, so this state has to be checked rather than assumed.
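The following sketch provokes an exception while the variant is switching alternatives. The throws_on_conversion type is a hypothetical helper invented for the purpose. Whether the variant actually ends up valueless depends on the standard library implementation, so the code checks for both outcomes instead of assuming one.

```cpp
#include <cassert>
#include <variant>

// Hypothetical helper type: converting it to int always throws.
struct throws_on_conversion {
    operator int() const { throw 42; }
};

// Tries to switch 'v' from float to int and reports whether it ended up valueless.
bool try_failing_emplace(std::variant<float, int>& v) {
    try {
        v.emplace<int>(throws_on_conversion{});
    } catch (int) {
        // the conversion threw while the new alternative was being constructed
    }
    return v.valueless_by_exception();
}

void valueless_demo() {
    std::variant<float, int> v{12.f};
    const bool valueless = try_failing_emplace(v);
    // Either the variant lost its value, or the implementation kept the old float.
    assert(valueless || std::holds_alternative<float>(v));
    if (valueless) {
        assert(v.index() == std::variant_npos);
    }
    v = 1;  // assigning a new value makes the variant usable again
    assert(std::holds_alternative<int>(v));
}
```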
It is possible to construct an “empty” variant explicitly by using std::monostate as one of the alternatives
std::variant<int, std::monostate> v{};
std::cout << v.index() << "\n"; // prints '0'
v = std::monostate{}; // v is now 'empty'
std::cout << v.index() << "\n"; // prints '1'
It’s also important to know that in order to create a default-constructed variant, its first alternative must be default-constructible. If it is not, then a compilation error will occur, as shown in the following code
class myobject{
public: myobject(int v) {}
};
std::variant<myobject, int> v{}; // Error: 'myobject' is not default-constructible
This can be solved by adding std::monostate as the first alternative.
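A minimal sketch of that fix follows. The value member of myobject and the maybe_object alias are additions made for this example so that the result can be inspected.

```cpp
#include <cassert>
#include <variant>

class myobject {
public:
    myobject(int v) : value{v} {}  // no default constructor, as in the text
    int value;                     // added here only to make the result observable
};

// std::monostate comes first, so the variant is default-constructible
// even though 'myobject' is not.
using maybe_object = std::variant<std::monostate, myobject, int>;

void monostate_demo() {
    maybe_object v{};                                   // holds std::monostate
    assert(std::holds_alternative<std::monostate>(v));
    v.emplace<myobject>(7);                             // build a myobject in place
    assert(std::get<myobject>(v).value == 7);
}
```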
At this point, one may have noted some similarity with the concept of “polymorphism” in inheritance, where an object can assume the different forms of the derived classes. Variants do allow a sort of generic polymorphism that is not restricted to only the members of a class hierarchy but can be extended to any type.
Conclusion
The variant type introduced with C++17 provides a safe and optimized way to consistently handle heterogeneous data. This does not mean, however, that it should be used to create all-purpose variables, like in a dynamically typed language. This is still C++, and it’s also not the intent of the present article. For that purpose, and if dynamic allocations don’t bother you, std::any is probably all you need.
Instead, a variant is very useful in all those situations where there is the need to efficiently handle a value that can assume different (but predetermined) types at runtime, such as parsing input values (e.g. in a REST API, command line application, etc.) or processing events in an event-driven system by using the visitor to match events to their handlers.
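As a rough sketch of the event-driven case, the code below dispatches a list of events to their handlers with std::visit. The event types and the logging visitor are hypothetical and exist only for this illustration; a real system would do actual work in the handlers.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical event types, invented for this illustration.
struct key_pressed { char key; };
struct mouse_moved { int x; int y; };
struct quit_requested {};

using event = std::variant<key_pressed, mouse_moved, quit_requested>;

// The visitor plays the role of the dispatcher: one handler per event type.
struct event_logger {
    std::vector<std::string>& log;
    void operator()(const key_pressed& e) const {
        log.push_back(std::string{"key "} + e.key);
    }
    void operator()(const mouse_moved& e) const {
        log.push_back("mouse " + std::to_string(e.x) + "," + std::to_string(e.y));
    }
    void operator()(const quit_requested&) const { log.push_back("quit"); }
};

// Runs every event through the visitor and returns what the handlers recorded.
std::vector<std::string> process(const std::vector<event>& events) {
    std::vector<std::string> log;
    for (const auto& e : events) {
        std::visit(event_logger{log}, e);
    }
    return log;
}

void event_demo() {
    const std::vector<event> events{key_pressed{'a'}, mouse_moved{3, 4}, quit_requested{}};
    const auto log = process(events);
    assert(log.size() == 3);
    assert(log[0] == "key a");
    assert(log[1] == "mouse 3,4");
    assert(log[2] == "quit");
}
```

Adding a new event type to the variant without a matching handler in event_logger stops the code from compiling, which is exactly the safety net described earlier.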