Thursday, October 2, 2008

The Py-t(h)onic

A few weeks ago, out of sheer curiosity, I started looking at a scripting-cum-object-oriented language that was new to me, capable of doing almost everything that C++ can do, and more. And I can't tell you how mesmerized I am by this fantastic language. Thank god I thought about giving Python a go. A whole new world has opened up for me. Because of the versatility of Python, I now feel confident and capable of actually acting on the various ideas that I keep thinking about. Previously, with C++, this was a bit difficult because the core functionality of the language is limited, and it is a pain to go searching for good third-party libraries for each and every piece of desired functionality, be it text processing, regexes, web development, XML processing, or simply connecting to, say, a MySQL database. Count the overhead of typing and thinking so much just for some small snippet (maybe a PoC), and you quietly lose the all-important enthusiasm to get it going.

With Python, it all becomes very much attainable. The language comes with tons of important 'batteries' of functionality already included. You can do almost everything conceivable with Python in a very small amount of time, and in a concise and clean way at that. Because of this, you tend to spend almost all your time on the solution, rather than wasting it wrestling with the finer aspects of the base language. If you are a C++ programmer who has developed a huge liking for the language (like I have), and hence are too adamant to even think about any other language (like I was), I would strongly recommend that you spend some time with Python. You won't regret it.

Here I don't intend to project Python as a replacement for C++ (haah!), as there will generally be a performance penalty in an interpreted language compared to a compiled language (which happens to be closer to the system). But it also depends on how many of our ideas are actually performance-centric. If you think a little performance (think quad-core) can be traded for an impressive line-up of features, Python is for you. A few of the features offered by Python are: very clean syntax, loads of functionality already included with the language, strong support for the OOP paradigm, and better portability. Python offers an all-in-one development platform that can be used to develop everything from the lower-level database persistence mechanism to the web-based GUI. All this, in a very graceful way. That's one of its biggest advantages.

As you might already be aware, the upcoming C++0x standard brings a host of 'new' features to C++, like lambda functions, tuples, regex processing, multi-threading support etc. But for Python, none of this is new: it is already there, and is very easy to put to work. That is one more advantage of using Python: heavy-duty functionality is available with no setup overhead and with minimal error handling. For example, if you want to use a dictionary in Python, all you have to do is:

Mydict = {<key1>:<value1>, <key2>:<value2>}

That's it. As simple as that. Now you can index the dictionary by key, much as you would index an array in C/C++.
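As a minimal sketch of what working with such a dictionary looks like (the keys and values below are hypothetical sample data, not taken from any real program):

```python
# Hypothetical sample data: method names mapped to call counts.
call_counts = {"load": 3, "save": 1}

call_counts["parse"] = 7      # add a new key-value pair
call_counts["load"] += 1      # update the value stored under an existing key

print(call_counts["load"])            # look up a value by its key
print(call_counts.get("missing", 0))  # fall back to a default instead of raising KeyError
print(sorted(call_counts))            # iterate over the keys in sorted order
```

No declarations, no container headers, and no explicit memory management are needed; the dictionary grows as keys are assigned.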

What is more, you can pack a punch in your code in a very concise manner, by combining a lot of functionality together. Take a look at the following code:

While most of the code is easy to read, for those finding it difficult to understand, let me state my intention behind writing this small snippet. Generally, in the Unix world, most utilities expose their (powerful) functionality through command-line arguments. Hence, processing these arguments is one of the most basic requirements for a Unix utility. Now neither the C nor the C++ standard library offers any help in getting this done quickly. So what this requirement entails is the headache of string processing, multilevel switches, and a lot of error handling. If you have worked in such a situation before, I am sure you will empathize with me.

Now in that context, have a look at the Python code snippet above. The program defines a couple of functions to handle different command-line arguments. Then it defines a dictionary with each command-line argument and its corresponding handler as a key-value pair. After that, there is no switch, no string processing, nothing. All it does is pass the first argument as a key to the dictionary, get the corresponding handler function, and call it. If required, it passes the remaining command-line arguments (if any) on to the handler function. All of this in one single statement! It is not only quick, it is elegant too, as you can see. If you come back to your code even after a few months, you won't get lost in a quagmire of code that has very little to do with the actual intention of writing that program.
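The dispatch idea described here can be sketched roughly as follows. This is an illustration and not the original snippet: the option names `help` and `echo`, the handler functions, and the fallback to the help handler for unknown options are all assumptions made for the example.

```python
import sys

def show_help(args):
    """Hypothetical handler for the 'help' option."""
    return "usage: tool.py <help|echo> [args...]"

def echo(args):
    """Hypothetical handler for the 'echo' option: joins the remaining arguments."""
    return " ".join(args)

# Each command-line option is a key; its handler function is the value.
HANDLERS = {"help": show_help, "echo": echo}

def dispatch(argv):
    """Look up argv[1] in the dictionary and call its handler with the rest."""
    if len(argv) < 2:
        return show_help([])
    # The single statement doing the work: look up, then call.
    # Unknown options fall back to the help handler (a choice made for this sketch).
    return HANDLERS.get(argv[1], show_help)(argv[2:])

if __name__ == "__main__":
    print(dispatch(sys.argv))
```

Adding a new option then means writing one function and adding one entry to the dictionary; the dispatch line itself never changes.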

It is this utter simplicity and elegance offered by Python that hooks you on this language.

I can tell you about one practical example from my workplace, where I could leverage Python to analyze a log file very effectively, and very quickly too.

I wanted to analyze a log file generated by a time-profiler class which I had written to keep a watch on the performance gain achieved through various tuning activities. The time-profiler utility logged the method name, the number of records processed and the time taken for it all. Now since the number of calls was in the range of around 40K, visual analysis was not on the cards. So I had to write a utility which would find, for a given method name, the five calls taking the most time and the corresponding number of records processed.

For this, I needed to process the command line to understand which method was to be analyzed. Then I needed to read the log file, find all the entries which correspond to the given method name, and store the records-processed and time-taken figures with a one-to-one relationship. After that, one sort operation would be required to get the five records with the largest time-taken figures. After this sort, the one-to-one relation between the number of records processed and the time taken shouldn't be lost at any cost.

Had I implemented this in C++, I would have wasted a lot of time writing a lot of code dealing with the nitty-gritty of the above-mentioned things. Instead, I decided to give Python a try, even though I wasn't very familiar with Python back then.

I came up with a working utility in almost half an hour doing everything mentioned above!!!

This became possible because, as I have been stating again and again, Python offers an impressive set of built-in functions and modules, which can effectively do away with the need to spend time on the basic needs. For example, I didn't need to spend much time on things like:

  1. Processing the command-line
  2. File reading and string processing to find out which lines belong to given method name
  3. Storing the time-taken and number-of-records-processed as key-value pairs in a dictionary (didn't require any STL manipulation overheads)
  4. Sorting the keys (time-taken) in descending order (didn't require any operator overloading)
  5. A LOT of error handling at every stage
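The steps listed above can be sketched in a few lines. This is not the original utility: the log format (method name, records processed and seconds taken, separated by whitespace), the function name `top_calls` and the sample entries are all assumptions for illustration. The sketch also swaps the dictionary keyed on time taken for a list of (time, records) tuples, since two calls with identical times would otherwise overwrite each other.

```python
def top_calls(lines, method, n=5):
    """Return the n slowest (seconds, records) pairs logged for `method`."""
    pairs = []
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines, and lines for other methods.
        if len(fields) != 3 or fields[0] != method:
            continue
        try:
            records, seconds = int(fields[1]), float(fields[2])
        except ValueError:
            continue  # non-numeric figures: ignore the line
        pairs.append((seconds, records))
    # Sorting tuples keeps each time paired with its record count.
    return sorted(pairs, reverse=True)[:n]

# Hypothetical log entries in the assumed format.
sample_log = [
    "loadRows 1000 2.50",
    "saveRows 200 0.75",
    "loadRows 4000 9.10",
    "loadRows 50 0.20",
    "not a valid line at all",
]
print(top_calls(sample_log, "loadRows", n=2))
```

Because an open file object yields its lines one at a time, an opened log file could be passed as `lines` in place of the sample list.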

Everything was intuitive, and hence I was able to find references for the desired functionality very quickly. I didn't need to spend a lot of time reading the manuals to find the right method for the job, then checking for type compatibility (thanks to dynamic typing, another 'controversial' feature of Python), and so on.

That was when I realized how productive one can get with this powerful language. Allowing the user to focus on the solution, rather than the underlying language, is the most celebrated feature of Python. And I would strongly suggest you taste (and test) it once.