Archive for January, 2009|Monthly archive page
How to code – again
Well, those were some really busy weeks. Despite all that was said in my previous post, I ultimately did do truckloads of work over the last couple of months. A sense of responsibility keeps me from leaving jobs that I’ve taken up, though I sure ain’t getting any younger.
Contrary to the widespread delusion that working in a team means working less, with someone to cover your back whenever you need it covered, I found that teamwork can be a really tough task in itself. I took some time to analyze the partnerships I’ve been in, and thought I’d put out a few pointers that might help people code better, and may serve as a reference to any future teammates I may have.
Most of the instances cited in this post are real. I name no names, but some references or examples may identify the person or persons I’m talking about. Please keep in mind that the post is not targeted at the people, but at the practices, or the lack thereof. The post is intended to be nothing but constructive, and I sincerely hope it doesn’t discourage anybody from doing what they do best. I’m just exploring better ways to do a few things. Few turn into coding gurus overnight.
Code in English
You become a better programmer when you realize that, like any other language, programming languages are simply a means of expressing yourself, albeit to rather dim-witted contraptions. Whether you use C, C++, Visual Basic, Python or Ruby, you need to find ways to express your intent to the fullest. A few examples might throw some light on the issue.
Consider a function named quizcomplete, which takes a Quiz ID and a User ID as arguments. What does the function do? Does it check whether the user has completed the quiz, or does it mark a user’s quiz as completed? Or, as a third alternative, is it meant to be called when a user completes a quiz? Simply put, the name of the function tells you absolutely nothing you might want to know about the function. Next time, ask yourself whether you’re naming things right. A function that checks whether a quiz has been completed needs to be called something along the lines of checkQuizCompleted (or isQuizCompleted), and one that marks a quiz as completed needs to be called markQuizCompleted (or setQuizCompleted).
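To make the naming point concrete, here is a rough C++ sketch. The class QuizProgress and its in-memory set are assumptions made up for illustration; only the two function names come from the paragraph above.

```cpp
#include <set>
#include <utility>

// Hypothetical in-memory record of which user has finished which quiz.
class QuizProgress {
public:
    // Query: reports state and changes nothing.
    bool isQuizCompleted(int quizId, int userId) const {
        return completed_.count({quizId, userId}) > 0;
    }

    // Command: changes state and returns nothing.
    void markQuizCompleted(int quizId, int userId) {
        completed_.insert({quizId, userId});
    }

private:
    std::set<std::pair<int, int>> completed_;
};
```

With these names, a call site such as progress.isQuizCompleted(quizId, userId) reads like the question it asks, and nobody has to open the function body to find out whether it modifies anything.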
The same applies to variables. It’s generally good practice to keep global variable names (that is, if you really, really need global variables in the first place) long and ugly, and the names of local variables that last for no longer than five to ten lines of code short and sweet. Don’t be scared of long names: a few extra keystrokes today will save you hours’ worth of debugging time later, and longer variable names don’t contribute to the memory footprint of your application.
To take it a step further, the rule even applies to the loops you write, and the way you structure code. For example, prefer a for loop over a while when, say, you’re iterating through a list of items, or performing some computation a predetermined number of times, and prefer a while over a for when you keep doing something until some condition breaks. do...while loops have their own set of uses.
Not so good:
    int i = 0;
    while (i < n) {
        // Do things
        ++i;
    }
Better:
    for (int i = 0; i < n; ++i) {
        // Do things
    }
Not so good:
    for ( ; a == b; ) {
        // Do things
    }
Better:
    while (a == b) {
        // Do things
    }
Not so good:
    // Do things with a
    while (next_permutation(a.begin(), a.end())) {
        // Do things with a
    }
Better:
    do {
        // Do things with a
    } while (next_permutation(a.begin(), a.end()));
As another example of how you express your intents, consider how you’d reverse bits in a number:
Snippet 1:
    int res = 0;
    while (m) {
        res *= 2;
        res = res + (m % 2);
        m /= 2;
    }
Snippet 2:
    int res = 0;
    while (m) {
        res <<= 1;
        res |= m & 1;
        m >>= 1;
    }
One of my friends said she preferred the first version, and that the second looked complicated. To me, the second one expresses your intentions more clearly.
Consider how a programmer (who is well acquainted with bitwise operations and operators, mind you) would read the two snippets of code:
Snippet 2: “Set res to be 0, keep shifting res left, appending the last bit from m to res, and shifting m right, till m becomes 0. Hmm, keep removing the last bit from m and attaching them to res. Reverse m’s bits! Yay!”
Snippet 1: “Set res to be 0, keep multiplying res by 2, add 1 to res if m is odd, and divide m by 2 till m becomes 0. WTF?”
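For anyone who wants to try the bitwise version, here it is wrapped in a function. The function name and the choice of an unsigned 32-bit type are my assumptions: the snippets above use a plain int, but with a negative value the right shift usually keeps the sign bit and the loop never ends. Note that only the significant bits are reversed; leading zeros are not part of the result.

```cpp
#include <cstdint>

// Reverses the significant bits of m (leading zeros are ignored),
// so binary 1011 becomes 1101 and binary 110 becomes 011.
std::uint32_t reverseSignificantBits(std::uint32_t m) {
    std::uint32_t res = 0;
    while (m) {
        res <<= 1;      // make room for one more bit
        res |= m & 1;   // append the lowest bit of m
        m >>= 1;        // drop that bit from m
    }
    return res;
}
```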
If something is repetitive, it probably deserves to be in a loop
Loop is rather metaphoric here. Make use of language constructs to handle collections of objects, and operations that occur repeatedly, and represent their inter-relationships.
Just so you don’t limit this rule to loops and arrays, I’ll cite a slightly deeper example. I had to work on an online treasure hunt not too many days (hours) ago. My task was to simply go through the code, and look for things that might go wrong, cases where the code might fail.
An online treasure hunt is a rather simplistic application where a user is given some kind of a puzzle, and is expected to make his way to the next one by solving this one. Effectively, it all boils down to keeping track of users and the level they’re on, showing them the problem, and validating their solutions. The existing system consisted of one file for each level in the game, with each file checking whether the submitted answer was correct, redirecting to the next file if it was, and displaying the current file’s contents again if it wasn’t.
What’s wrong with this approach? For one thing, you couldn’t add or remove levels, or reorder them, without considerable effort. For another, if you were to find a bug in any of the duplicated code, or think of a better way to do some of it, you couldn’t get away with anything less than changing all twenty-five or so files you’d made.
How do we come up with a better solution? We try and characterize all the data that would define a level comprehensively, and try to develop a system where the levels and the process of traversing them remain independent. This way, we can add and remove levels without ever bothering about the code that guides a user through these levels. When you find something going wrong with a user’s progress through the game, you know you’re supposed to look at the user management code, rather than the level definitions. Both of our problems are solved.
It turned out that each level could be defined by just a few fields, such as the name of the level, the content to be displayed to the user viewing the level, some kind of function to check whether the answer submitted was correct, and a pointer to the next level to which the user must be directed. Moreover, most levels either posted the user’s submissions to the server, or expected the user to type in a url directly to go to the next level. An associative array that represents the field names, and their expected values could be tied to every level that submitted solutions via POST, and a string representing the url of the right answer could be tied to levels that expected urls, and two simple functions could be written to check if the user has found the right solution.
Ordinarily, I would’ve made a table with these fields, and added levels there. Because I had only one night to get the whole thing running, I turned to simple PHP arrays to hold all the data. In the end, the whole system was reduced to just two files weighing in at less than a hundred and fifty lines each, and one file to hold information about levels as a PHP array, all written in just one night’s time.
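The real thing was written in PHP and I’m not reproducing it here. What follows is only a C++ sketch of the same idea, with made-up level data and hypothetical names (Level, makeLevels, submitAnswer). One simplification to note: instead of storing a pointer to the next level, the sketch lets the order of the list decide what comes next.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Everything that defines one level. The next level is implied by the
// position in the list rather than stored as a separate field.
struct Level {
    std::string name;
    std::string content;                                 // what the player sees
    std::function<bool(const std::string&)> isCorrect;   // answer checker
};

// Hypothetical level data. Adding, removing or reordering levels means
// editing this list and nothing else.
std::vector<Level> makeLevels() {
    return {
        {"Warm-up", "What is 2 + 2?",
         [](const std::string& answer) { return answer == "4"; }},
        {"Riddle", "Which word reads the same backwards: level or lever?",
         [](const std::string& answer) { return answer == "level"; }},
    };
}

// The traversal code knows nothing about any particular level.
// Returns the index of the level the player should see next.
std::size_t submitAnswer(const std::vector<Level>& levels,
                         std::size_t current,
                         const std::string& answer) {
    if (current < levels.size() && levels[current].isCorrect(answer)) {
        return current + 1;   // equals levels.size() once the hunt is finished
    }
    return current;           // wrong answer: stay on the same level
}
```

If a player gets stuck somewhere they shouldn’t, the place to look is submitAnswer; if a puzzle accepts the wrong answer, the place to look is the list.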
Copy and Paste is not a form of Code Reuse
From the previous example, I’d gather that if you ever needed to write the same piece of code twice, you know you’re doing something wrong. Copy and paste is not code reuse, it’s quite the opposite. Code reuse is when you have just one piece of code that serves your purpose at a variety of different places. When you copy and paste something, that’s called redundancy.
Before getting to code reuse, I’m going to ask, how bad is redundancy?
Here’s the deal: in a relatively small web application I was developing a while ago, we had a registration form. So what? All web applications have registration forms. Simply define a few fields that need to be filled in, validate them, and insert them into a database. I didn’t write the form or the handler, but I was asked to add asterisks (*) to all fields that are mandatory. Hardly a few seconds’ task for anybody. Unfortunately, despite making the change, and refreshing my browser window several times, the asterisks were simply refusing to show up.
Not understanding what the problem was, I tried typing some arbitrary text into the form’s code. Still nothing. I checked whether I was editing the same copy that I was looking at, whether the pages were getting cached because of URL rewriting, and whether the file was write-protected and my editor hadn’t realized that it couldn’t write to it. After exploring many such possibilities, it turned out that there were two copies of the registration form: one for the user’s first view, and one for when the user had submitted some content that the server rejected because it failed validation. For a small change such as adding an asterisk, you probably wouldn’t mind doing it in two places. But what if you were adding three fields, removing two, and adding JavaScript validation rules for all fields in the form? Now that’s double the amount of work you need to do.
Coincidentally, at one point, the application also ended up with a copy of common.php in every folder, till I painstakingly cleaned it up. After all, why would you call something common if there were going to be several different copies of it, all of which would be changed in different ways over time?
Frankly, having several different copies of the same code in your application is going to be one heck of a maintenance nightmare. Avoid it at all costs. When you start off, generalizing a piece of code might seem like a very tedious task, but it will be well worth the time and effort when you need to make changes later on.
And how exactly can you reuse code? You don’t need to understand templates, inheritance, or even classes to make sure you write the least amount of code possible. Besides, different languages differ in the mechanisms they provide or encourage for code reuse. In most cases, smart use of functions alone can take you a long way in this regard.
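As an illustration of how far a plain function can go, here is a sketch of the registration form problem from earlier, solved with one function that serves both the first view and the rejected submission. It is written in C++ and renders plain text rather than HTML; the Field struct, renderForm and its output format are all assumptions of mine, not the application’s actual code.

```cpp
#include <map>
#include <string>
#include <vector>

struct Field {
    std::string name;
    bool mandatory;
};

// Renders the form as plain text. `values` holds what the user has typed so
// far (empty on the first view); `errors` holds one validation message per
// rejected field (empty unless a submission failed validation).
std::string renderForm(const std::vector<Field>& fields,
                       const std::map<std::string, std::string>& values = {},
                       const std::map<std::string, std::string>& errors = {}) {
    std::string out;
    for (const Field& field : fields) {
        out += field.name;
        if (field.mandatory) out += " *";   // the asterisk lives in one place
        out += ": ";
        auto value = values.find(field.name);
        if (value != values.end()) out += value->second;
        auto error = errors.find(field.name);
        if (error != errors.end()) out += "  <-- " + error->second;
        out += "\n";
    }
    return out;
}
```

The first view calls renderForm(fields); the handler for a rejected submission calls renderForm(fields, values, errors). Adding the asterisks, or three new fields, is then a change in exactly one spot.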
Whitespace Matters
When you write code, write it neatly, or don’t write it at all. There is nothing I hate more than having to read through horribly indented (or, well, non-indented) code that looks like an egg splattered on your screen. Free-formatting wasn’t a feature any language developer worked so hard on, that you should feel obliged to make full use of it.
“Eek! What is wrong with you?”:
    int N, R;
    cin >> N >> R;
    vector< vector<int> > graph(N, vector<int>(N, INT_MAX));
    for (int i = 0; i < R; ++i) {
    int a, b, l;
    cin >> a >> b >> l;
    --a;--b;
    graph[a][b] = graph[b][a] = l;
    }
“Where does the } start?”:
    int N, R;
    cin >> N >> R;
    vector< vector<int> > graph(N, vector<int>(N, INT_MAX));
    for (int i = 0; i < R; ++i) {int a, b, l;
        cin >> a >> b >> l;
        --a;--b;
        graph[a][b] = graph[b][a] = l;}
“Ok, now clean up that whitespace”:
    int N, R;
    cin >> N >> R;
    vector< vector<int> > graph(N, vector<int>(N, INT_MAX));
    for (int i = 0; i < R; ++i) {
        int a, b, l;
        cin >> a >> b >> l;
        --a;
        --b;
        graph[a][b] = graph[b][a] = l;
    }
And that’s how you write neat code.
Set standards for yourself: either use tabs consistently, or use spaces, but not both. I personally prefer using tabs to indent my code, because most editors let you choose the width of a tab, so you can change the level of indentation at any time. Since tab width varies from editor to editor, or with editor settings, you can never rely on spaces to indent code right if you’ve already used tabs somewhere.
Think Ahead
Think about all the functionality your code needs to implement before starting to write code. This way, you’ll start seeing similarities that you can exploit to write only as much code as is necessary.
For example, let’s say you’re implementing a quiz system. A quiz is divided into sections, and each section holds a set of questions. Ordering of sections and questions is important, so you need to provide Move Up and Move Down buttons for the administrator.
You might think of writing four functions, moveSectionUp, moveSectionDown, moveQuestionUp and moveQuestionDown for this, but you’ll soon see that moving something up or down is a very general problem that you can solve with a very generic piece of code. Once your generic function is ready, you can probably write some single-line wrappers to use the generic function to make your code more readable.
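Here is roughly what that could look like in C++. The Section and Question structs, the Direction enum and the moveItem template are hypothetical names of mine; only the four wrapper names come from the paragraph above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class Direction { Up, Down };

// Generic part: swaps the item at `index` with its neighbour.
// Returns false (and changes nothing) if the move would leave the list.
template <typename T>
bool moveItem(std::vector<T>& items, std::size_t index, Direction direction) {
    if (index >= items.size()) return false;
    if (direction == Direction::Up) {
        if (index == 0) return false;
        std::swap(items[index], items[index - 1]);
    } else {
        if (index + 1 == items.size()) return false;
        std::swap(items[index], items[index + 1]);
    }
    return true;
}

// Hypothetical types for the quiz example.
struct Question { std::string text; };
struct Section  { std::string title; std::vector<Question> questions; };

// Single-line wrappers that keep the call sites readable.
bool moveSectionUp(std::vector<Section>& s, std::size_t i)     { return moveItem(s, i, Direction::Up); }
bool moveSectionDown(std::vector<Section>& s, std::size_t i)   { return moveItem(s, i, Direction::Down); }
bool moveQuestionUp(std::vector<Question>& q, std::size_t i)   { return moveItem(q, i, Direction::Up); }
bool moveQuestionDown(std::vector<Question>& q, std::size_t i) { return moveItem(q, i, Direction::Down); }
```

All the boundary checks sit in moveItem, so a bug in "move down on the last item" gets fixed once, for sections and questions alike.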
Extensibility
Unlike in most other trades, when it comes to developing a system, present-day needs ought to be the least of your worries. Before beginning to type away to glory, ask yourself what your system needs to do today, and what it might be expected to do in the future.
If you’re writing a CMS, you might want to extend it later by adding more types of pages, layouts, templates, support for a variety of database systems, etc. Ask yourself how probable each of these kinds of additions is, and how easy or hard it would be to incorporate at that time. The amount of time you spend on making a system extensible should depend on how probable you think such an extension is.
Ideally, extending an application should mean no more than adding your new code at a certain, well-documented location, and perhaps changing a configuration setting somewhere.
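One common way to get such a "well-documented location" is a registry that maps a name to the code handling it. The C++ sketch below assumes a CMS where each page type is a function that turns a body into output; PageTypeRegistry and PageRenderer are hypothetical names, and a real CMS would of course carry far more than a string.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// A page type is anything that can turn a page body into rendered output.
using PageRenderer = std::function<std::string(const std::string& body)>;

class PageTypeRegistry {
public:
    // The one place where new page types get plugged in.
    void add(const std::string& typeName, PageRenderer renderer) {
        renderers_[typeName] = std::move(renderer);
    }

    // The core code dispatches by name and never lists concrete types.
    std::string render(const std::string& typeName, const std::string& body) const {
        auto it = renderers_.find(typeName);
        if (it == renderers_.end()) {
            throw std::invalid_argument("unknown page type: " + typeName);
        }
        return it->second(body);
    }

private:
    std::map<std::string, PageRenderer> renderers_;
};
```

Adding a gallery page later would then mean writing one renderer and one add call, without touching render or any of the existing page types.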
Stick to Conventions
Make rules for yourself, and stick to them. These rules ought to cover everything from the way you indent code to the kinds of names you give to your types and objects, to how you split code into files. Be a fanatic. Being predictable is a virtue in this regard.
Don’t Reinvent the Wheels
What else is a virtue? Laziness.
Make use of existing libraries and frameworks to build your application. There are several plusses in doing so.
A library simply provides an abstraction over an existing API and gives you functionality that you either use regularly, or that would have been quite tedious to implement using the existing API. It lets you write clean code, with the complexities hidden away under the more pleasant interface that it provides to your application. This is especially true of JavaScript libraries, where a lot of browser-dependent code is buried deep in the bowels of the library, and you get to write a single piece of code that works well on all the most common browsers. It is also true of some cross-platform C or C++ libraries such as Boost, where you get a version of the same library for different platforms, so you don’t need to maintain different versions of your code for different operating systems.
The library would probably have been tested more extensively than you can afford for your application, and it would have seen several mistakes and fixes in its development cycle, all of which you are bound to reproduce, along with your own few innovative bugs when you rewrite the entire thing for yourself.
You get to focus your attention on your application logic, rather than having to waste time with implementation details someone else has already worked so hard to take care of for you.
Epilogue
I’ve seen many good coders crash and burn when it comes to writing a real application that people are actually going to use. Getting an application design right the very first time is something that takes a great deal of skill, vision, and a whole lot of luck, and I’ve had the chance of knowing just one guy in all these years, who could.
Designs need to bear a close correlation with the problem they’re trying to solve, while being down to earth, and addressing all the fine implementation details. Designs can make what might be very trivial tasks outrageously difficult, or even outright impossible. You need to keep the big picture in mind, while making sure that you make room for the grass-roots level components that form the veins of your system.
Sad as it may seem, I learnt most of what I know today solely from mistakes, mine or otherwise. I guess that’s the only way to learn, in this business.
- Rugged Rat