The efficiency of C++17 in embedded development, Part 2
In the first article I showed that in C++17 it is efficient to return objects by value and to use std::optional instead of a success flag. Another very common case is a function that either returns a value or fails with one of several errors. On "big" systems the usual approach is "return the result, and throw an exception if an error happens". It is a good approach that helps keep the code clean, but in the embedded world exceptions are disabled in the majority of applications. So the standard scenario, again, is to return an error code and to assign the actual result to a parameter passed by reference. And again, C++17 provides a better way: std::variant. It is a type-safe union that holds exactly one of several alternative types at a time; for this scenario only two are needed: the error type and the result type. The following example shows how it can be used:
#include <variant>
enum class ErrorCode {
    Ok,
    Error,
    AnotherError
};
using ResultType = int;
bool externalCondition();
int externalValue();
void externalConsumer(int value);
void externalLog(ErrorCode result);
// Classic style: error code as the return value, result via an out-parameter
ErrorCode getResultOld(ResultType& value)
{
    if (externalCondition()) {
        value = externalValue();
        return ErrorCode::Ok;
    }
    return ErrorCode::Error;
}
void consumerOld()
{
    ResultType value;
    if (const auto result = getResultOld(value); result == ErrorCode::Ok)
    {
        externalConsumer(value);
    }
    else
    {
        externalLog(result);
    }
}
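// C++17 style: the variant carries either the error or the result
std::variant<ErrorCode, ResultType> getResultNew()
{
    // Default to the error alternative; it is overwritten on success
    std::variant<ErrorCode, ResultType> result = ErrorCode::Error;
    if (externalCondition()) {
        result = externalValue();
    }
    return result;
}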
void consumerNew()
{
    if (const auto result = getResultNew(); std::holds_alternative<ResultType>(result))
    {
        externalConsumer(std::get<ResultType>(result));
    }
    else
    {
        externalLog(std::get<ErrorCode>(result));
    }
}
Compiled with GCC 12.2 for ESP32, this produces:
getResultOld(int&):
entry sp, 32
call8 _Z17externalConditionv
movi.n a8, 1
beqz.n a10, .L6
call8 _Z13externalValuev
s32i.n a10, a2, 0
movi.n a8, 0
.L6:
mov.n a2, a8
retw.n
consumerOld():
entry sp, 32
call8 _Z17externalConditionv
bnez.n a10, .L11
movi.n a10, 1
call8 _Z11externalLog9ErrorCode
j .L10
.L11:
call8 _Z13externalValuev
call8 _Z16externalConsumeri
.L10:
retw.n
getResultNew():
entry sp, 48
call8 _Z17externalConditionv
movi.n a8, 0
movi.n a2, 1
beq a10, a8, .L14
call8 _Z13externalValuev
mov.n a2, a10
movi.n a8, 1
.L14:
s8i a8, sp, 4
l32i.n a3, sp, 4
retw.n
consumerNew():
entry sp, 32
call8 _Z17externalConditionv
beqz.n a10, .L18
call8 _Z13externalValuev
call8 _Z16externalConsumeri
j .L17
.L18:
movi.n a10, 1
call8 _Z11externalLog9ErrorCode
.L17:
retw.n
The final assembler code is very similar for both styles. The C++ code of the getResultNew() function is somewhat verbose because current compilers, GCC in particular, don't handle non-mandatory copy elision well enough on the target embedded platforms. But if you are not fighting for every byte of ROM and every CPU cycle, the shorter form also works well:
std::variant<ErrorCode, ResultType> getResultNew()
{
    if (externalCondition()) {
        return externalValue();
    }
    return ErrorCode::Error;
}
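On the consumer side, std::get_if is an alternative to the std::holds_alternative plus std::get pair: it combines the check and the access in one call and never throws, which suits builds with exceptions disabled. The following is only a minimal sketch and is not taken from the original project. The alias ResultOrError, the function consumerGetIf(), the stub bodies of the external functions and the g_ globals used to observe them are all assumptions, added so that the fragment is self-contained.

```cpp
#include <variant>

enum class ErrorCode { Ok, Error, AnotherError };
using ResultType = int;
// Hypothetical alias, introduced here only for brevity
using ResultOrError = std::variant<ErrorCode, ResultType>;

// Hypothetical stand-ins for the external functions used in the article
static bool g_condition = true;
static int g_consumed = 0;
static ErrorCode g_logged = ErrorCode::Ok;

bool externalCondition() { return g_condition; }
int externalValue() { return 42; }
void externalConsumer(int value) { g_consumed = value; }
void externalLog(ErrorCode error) { g_logged = error; }

ResultOrError getResultNew()
{
    if (externalCondition()) {
        return externalValue();
    }
    return ErrorCode::Error;
}

void consumerGetIf()
{
    const auto result = getResultNew();
    // std::get_if returns nullptr when the variant holds another alternative
    if (const auto* value = std::get_if<ResultType>(&result)) {
        externalConsumer(*value);
    } else if (const auto* error = std::get_if<ErrorCode>(&result)) {
        externalLog(*error);
    }
}
```

Whether this form compiles to the same instructions as consumerNew() on a given target has not been measured here, so it is worth checking the generated code before relying on it.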
And finally, it's time to look at the other side of a function call: parameter passing. Once again, conventional wisdom says that objects should be passed by reference; otherwise they are pushed onto the stack and read back from it. But current C++ ABIs allow small objects to be passed by value in registers. The limit depends on the ABI of the target platform; for both ESP32 and STM32 an object of up to about four 32-bit words can be passed in registers. Common and obvious examples of such objects are std::string_view and std::span. Both contain a pointer and a size and fit in two 32-bit words. Of course, using std::span directly requires C++20, but no language restriction prevents us from creating the same thing in C++17. Here is an example from my GitHub repository:
#include <cstddef>
#include <cstdint>
#include <array>
#include <string_view>
#include <type_traits>

// Non-owning view over a contiguous range of T: a pointer plus a size
template <typename T>
class MemoryView
{
public:
    typedef T value_type;
    typedef T* pointer;
    typedef typename std::add_const<typename std::remove_const<T>::type>::type* const_pointer;
    typedef T &reference;
    typedef typename std::add_const<typename std::remove_const<T>::type>::type &const_reference;
    typedef pointer iterator;
    typedef const_pointer const_iterator;
    typedef std::size_t size_type;
    typedef std::ptrdiff_t difference_type;

    constexpr MemoryView() : begin_(nullptr), size_(0) {}
    constexpr MemoryView(T* p, size_type s) : begin_(p), size_(s) {}

    // Implicit conversion from a view of the same or the non-const element type
    template<typename C = std::remove_const_t<T>, std::enable_if_t<std::is_same_v<T, std::add_const_t<C>> || std::is_same_v<T, C>, bool> = true>
    constexpr MemoryView(const MemoryView<C> &other) : begin_(other.begin()), size_(other.size()) {}

    // Implicit conversion from std::array; data() yields a raw pointer on every implementation
    template<std::size_t N, typename C = std::remove_const_t<T>, std::enable_if_t<std::is_same_v<T, std::add_const_t<C>> || std::is_same_v<T, C>, bool> = true>
    constexpr MemoryView(std::array<C, N> &ar) : begin_(ar.data()), size_(ar.size()) {}

    // Implicit conversion from a C-style array
    template<std::size_t N>
    constexpr MemoryView(T(&ar)[N]) : begin_(ar), size_(N) {}

    // Implicit conversion to std::string_view, enabled only for char elements
    template<typename C = T, std::enable_if_t<std::is_same_v<char, std::remove_const_t<C>>, bool> = true>
    constexpr operator std::string_view() const { return { begin_, size_ }; }

    T &operator[](size_type i) { return begin_[i]; }
    constexpr T &operator[](size_type i) const { return begin_[i]; }
    constexpr auto begin() const { return begin_; }
    constexpr auto end() const { return begin_ + size_; }
    constexpr auto size() const { return size_; }

private:
    T* begin_;
    size_type size_;
};
The conversions from C-style arrays and std::array are implicit to simplify usage, and the same holds for the conversion from a view of non-const elements to a view of const elements. An implicit conversion to std::string_view is also provided when the element type is compatible. The efficiency can be shown with the following trivial case:
// Parameter order (size first) matches the signature shown in the listing below
const char* getEnd(std::size_t size, const char* begin)
{
    return begin + size;
}
const char* getEnd(std::string_view str)
{
    return str.begin() + str.size();
}
GCC 12.2 for ESP32:
getEnd(unsigned int, char const*):
entry sp, 32
add.n a2, a3, a2
retw.n
getEnd(std::basic_string_view<char, std::char_traits<char> >):
entry sp, 48
add.n a2, a3, a2
retw.n
And that is the point. While the generated code is effectively the same, there is a significant difference in logic. A (properly constructed) view object is internally consistent: you cannot accidentally lose the size or calculate the end incorrectly. You can find other examples of embedded code written in a modern style on my GitHub page: https://github.com/gdex1974
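To make the consistency argument concrete, here is a small sketch that uses std::string_view from C++17. The helper names firstTokenRaw() and firstToken() are hypothetical and exist only for this comparison; they do not come from the repository. In the pointer-and-size version the caller receives two values that must be kept in sync by hand, while in the view version the length always travels with the pointer.

```cpp
#include <cstddef>
#include <string_view>

// Pointer-plus-size style: the result is split between the return value and
// an out-parameter, and nothing stops a caller from pairing them incorrectly.
const char* firstTokenRaw(const char* begin, std::size_t size, std::size_t& tokenSize)
{
    std::size_t i = 0;
    while (i < size && begin[i] != ' ') {
        ++i;
    }
    tokenSize = i;
    return begin;
}

// View style: pointer and length are returned together as one small object.
constexpr std::string_view firstToken(std::string_view text)
{
    const auto pos = text.find(' ');
    // substr clamps the count to the remaining length, so npos is safe here
    return text.substr(0, pos);
}

// The view version can also be checked at compile time
static_assert(firstToken("AT+RST now") == std::string_view("AT+RST"));
```

The view-based helper takes and returns objects of two machine words each, so under the register-passing rules described above it should cost no more than the raw version, although that has not been measured for this sketch.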