The late, great Dennis Ritchie once famously quipped, “C has the power of assembly language and the convenience of … assembly language.” I will leave a discussion of the genius of Dennis Ritchie and his manifold contributions to the world of computing for another time. My point today is to muse somewhat on why back-end systems are typically built using scripting languages or Java. Or more accurately (since I ultimately don’t know why other people make the decisions they do), to discuss my personal experiences with various backend languages, and then to explicate some of the reasons why we decided to write our clean-sheet backend in C++.
According to the most recent statistics readily proffered by a simple Google search, PHP is used in 75% of websites. Java is used in the vast majority of complex high-traffic websites. Other popular choices are of course Ruby, Python, and Node.js. And I won’t even mention the (far too many) people who use the .NET stack. I have used most of these languages/stacks/frameworks on projects of various complexities in my lifetime. In fact, the ZipZap 1.0 backend was written in Python. However, every time I dive into some usually entropy-ridden scripting codebase, I ask myself what the value of Python, or PHP or whatever, is over C. The standard interpreters for Python, PHP, and Ruby are written in C to begin with, and even the Java Virtual Machine and Node.js are written in C and C++. And so, some dissenting opinions notwithstanding, it is a tautology that, for the exact same task, C will always be faster than PHP, Java, or any other language that is implemented in C. So why don’t people use C? There are a couple of obvious answers, and some not so obvious:
- Expertise: At some times and in some places, it is much easier to find a good PHP programmer than a good C programmer.
- Library Support: Java has a ton of good libraries. Frameworks abound for PHP. Many of the libraries that scripting languages call are in fact written in C, so the bulk of the executing code is not at a disadvantage.
- Development Time: It’s way faster to write a program in a scripting language, both because of the library support above and because of conveniences like the “for..in” construct, or support for string-subscriptable arrays (i.e. dictionaries).
- Database: The database is often the rate-limiting step in a web system, so optimizing code isn’t all that helpful.
- The Obvious: Character arrays (strings) and pointers.
I’ll take these concerns in reverse order. A good programmer will not have an issue with pointers. In fact they’re extremely powerful, and if you’re careful, they will significantly increase the speed and memory efficiency of your program. String limitations are, alas, a serious consideration. Even if you’re careful, you are likely to create opportunities for buffer overflows. The lack of higher-level constructs (e.g. dictionaries) is also a limiting factor. These two limitations were enough to convince us that C is not the ideal choice for our new project. However, two newish developments in the world of C++ (C++11 and the Boost libraries), combined with the tried and true STL and the stalwart jsoncpp library, almost entirely mitigate these limitations. We therefore decided that C++ is a better choice than C. There is a small runtime price to be paid for using these libraries over straight C, but it still outperforms the alternatives in almost all cases, and usually by a wide margin.
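To make that concrete, here is a minimal sketch of how C++11 and the standard library cover the two gaps mentioned above: `std::string` in place of fixed-size character arrays, and `std::unordered_map` plus the range-based `for` in place of a scripting language's dictionaries and “for..in”. The function names `count_words` and `total_words` are hypothetical, invented for this illustration; they are not taken from our backend.

```cpp
#include <sstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: count how often each whitespace-separated word occurs.
std::unordered_map<std::string, int> count_words(const std::string& text) {
    std::unordered_map<std::string, int> counts;  // string-keyed "dictionary"
    std::istringstream in(text);
    std::string word;  // grows as needed, so there is no fixed buffer to overflow
    while (in >> word) {
        ++counts[word];  // operator[] inserts a missing key with value 0 first
    }
    return counts;
}

// Hypothetical helper: sum all the counts using the C++11 range-based for.
int total_words(const std::unordered_map<std::string, int>& counts) {
    int total = 0;
    for (const auto& entry : counts) {  // entry.first is the key, entry.second the count
        total += entry.second;
    }
    return total;
}
```

Calling `count_words("get post get put get")` yields a map in which `"get"` maps to 3, and `total_words` on that map returns 5. Nothing in the sketch allocates or frees memory by hand, which is most of the point.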
So what do you get for writing in C++ as a backend language? Primarily, you get runtime improvements. Almost as importantly, you get more powerful tools to manage increasing entropy. C++ is a very well structured language, and it has a rich set of OO capabilities. When properly used it is a very powerful language for organizing complex ideas. Much more so (in my experience) than PHP or Python, and even more so than Java. It is still true that writing “hello world” in PHP is way faster than writing it in C++. In fact, the first thousand lines or so of a typical backend program will be more painstaking to write in C++ than they will be to write in the scripting language of your choice. However, once you have the structure established, C++ code rolls off your keyboard like little red balls in a Verizon commercial. The development-time cost really is non-linear: it’s a little harder to get started, but the curve as a function of lines of code is not nearly as steep.
So that leaves two issues: expertise and the SQL (or NoSQL) bottleneck. Finding expertise “is what it is”, as they say. ZipZap is based in Southern Arizona, where there is a lot of defense contracting. Ergo C/C++ programmers are relatively common. Finally, the SQL bottleneck. I had taken its existence as a given until a few months ago, when I was debugging a slow web service. This particular service uses Python/MySQL. The SQL call took 14 seconds (it’s a pretty big database, for a different project). Each processing iteration took 0.3-0.5 seconds. No big deal if you have five results. If you have 200 results (which I had), then processing takes 60 to 100 seconds and software runtime overwhelmingly dominates! That experience validated what most software engineers intuitively know: runtime always matters.
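The arithmetic behind that anecdote can be written down as a small sketch. It assumes one processing iteration per result and a fixed query time, which is my simplification of the situation rather than a measured model; the function names `total_seconds` and `processing_share` are hypothetical.

```cpp
// Hypothetical helper: total request time, in seconds, for one query
// followed by one processing iteration per result (an assumption).
double total_seconds(double query_s, double per_result_s, int results) {
    return query_s + per_result_s * results;
}

// Hypothetical helper: fraction of the total time spent in application
// code rather than in the database call.
double processing_share(double query_s, double per_result_s, int results) {
    const double processing_s = per_result_s * results;
    return processing_s / (query_s + processing_s);
}
```

With the numbers above, five results cost 14 + 5 × 0.3 = 15.5 seconds at best, so the query dominates. Two hundred results cost between 14 + 200 × 0.3 = 74 and 14 + 200 × 0.5 = 114 seconds, which puts roughly 81% to 88% of the time in the application code.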
Solomon said that the end of a matter is better than its beginning. We are encouraged by the early results of our architectural decisions. But in the spirit of Solomon, we’ll remain cautiously optimistic, knowing that there is still a long road ahead of us.