Saturday, March 8, 2014

Algorithms should read like pseudocode

Algorithms should read like pseudocode
When an algorithm is described like pseudo-code, there is hardly ever any mention about the underlying data-structures. No statement is made on whether a list, a dynamic array, or a binary tree should be used.
With C++, it is possible to code in such a way.

Let’s compare how you would write an algorithm in C, that accumulates numbers in an array:
int accumulate (int numbers[], int sz, int sum)
{
    int i;
    for (i=0; i<sz; i++) {
        sum += numbers[i];
    }
    return sum;
}
now, what would you do if you had to accumulate the entries of a binary tree? or a linked list?
  • would you re-write a version of accumulate() for the tree or the list? or
  • would you copy the tree or list into an array and then call accumulate()?
well, people usually choose the path of least pain:
– if you’re in a hurry, you copy the data into suitable data structure, and then move to the next task.
– if you have time, you rewrite the algorithm, find a clumsy name, like accumulate_list(), or accumulate2(), taking the risk of introducing bugs.
– If the algorithm is simple enough, you rewrite it in-place, where you need it.
a really conscious programmer might even consider writing a macro for it – for reuse. Try writing a macro for as simple an algorithm as this one! Then, try to provide a nice API to call it!
Done that? Now what would you do if you needed to accumulate over a container of doubles?
But really, if you wanted to describe the algorithm to your colleague, you would write it like:
T accumulate (IT from, IT to, T sum) 
{
    foreach entry in [from:to]
        sum += entry;
    return sum
}
well, the equivalent in C++ is:
template <typename T, typename IT>
inline
T accumulate (IT from, IT to, T sum) 
{
    for (IT iter = from; iter != to; ++iter)
        sum += *iter;
    return sum;
}
Look how you can call it:
vector<int>  v;   int    sum = 5;   sum = accumulate (v.begin(), v.end(), sum);
list<double> v;   double sum = 5;   sum = accumulate (v.begin(), v.end(), sum);
set<float>   v;   float  sum = 5;   sum = accumulate (v.begin(), v.end(), sum);
this allows you to replace the container types without changing user code.

How does that compare to a C function ?

  • readability
    apart from the first line that declares the template types, the rest is good-old C/C++ code – there are no special syntax, or characters like \, # or ## that you see in macros.
  • speed
    execution speed is definitely much faster than copying the container to an array. The function is marked as inline, allowing the optimizer to decide whether it should be inlined, or not.
    In C,
    • if accumulate() was an extern function, the optimizer would not be able to optimize.
    • if a macro was used, the function would be inlined by force, but that might not necessarily be the best action to take, especially for functions longer than 2-3 lines.
  • size of executable
    if accumulate() is implemented as an extern function:
    • in C, the code remains in the executable, even if it is not called.
    • in C++, the template function is instantiated only when it is used.
    if accumulate() is implemented as macro:
    • in C, the code is copied in every macro use.
    • In C++, some calls of the template function may be inlined, while others will share a single instance of the function.
  • code reuse
    only by using a macro (a well known bad practice) can you reuse an algorithm.
    Template provides a much better option.
Written with StackEdit.

No comments:

Post a Comment