Showing posts with label python. Show all posts

Thursday, August 23, 2012

A NewConfigParser Module for Python


Python's standard library contains an excellent ConfigParser module. It lets you read configuration variables directly from an INI file, without much ado or overhead. Simple and beautiful. Here is Doug Hellmann's PyMOTW blog about the module, for more info.

As long as all you need is a set of independent configuration variables, you are good to go right away. But if you are used to shell scripting, and are spoilt by variable resolution there (one variable resolving from another previously defined variable, like this - URL=$PROTOCOL://$DOMAIN:$PORT/), you sure are not satisfied. And if you are spoilt by Python's opulence, you sure cannot just let it go and make do with whatever is available.

I couldn't either. I had a complex configuration file, and I just couldn't think of doing without variable resolution. Here is a subset of what I wanted. I had a common configuration section about a server, and a set of URLs on that server which I wanted to fire HTTP requests at (get/post/any such). The URLs would be listed in a separate 'urls' section, while the server details would be listed in the 'server' section. Something like this -
[server]
domain = www.pyarabola.in
port = 80

[request-config]
protocol = http
type = get

[urls]
konkan = ${request-config.protocol}://${server.domain}:${server.port}/coastal-konkan-day01.html
apple = ${request-config.protocol}://${server.domain}:${server.port}/applyhypepotatopoteto.html

[requests]
get_konkan = GET ${urls.konkan}
get_apple = GET ${urls.apple}

Note: I know there will be other (better) ways to get the URLs constructed, but this is a made-up example just to make the use-case clear.

The requirement is quite straightforward - if I want to change the server, the port number or the type of request I am making, I don't want to end up changing every entry in [requests].

As you can see, the parameters are constructed by back-referencing other parameters. And the back-referencing is multi-level - one parameter referencing another, which in turn is referencing yet another, and so on, as can be seen for the get_konkan parameter in [requests] section.

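As far as I know, the current ConfigParser module (or its ConfigParser.SafeConfigParser class) doesn't allow such referencing in the INI file it parses: its built-in %(name)s interpolation only looks within the same section and [DEFAULT], not across sections. Enter NewConfigParser.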
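To make that gap concrete, here is a minimal sketch of what the stock parser does with both kinds of reference. It assumes Python 3, where the module is named configparser (ConfigParser in the Python 2 used in this post); the address option is a made-up addition to the [server] section, there only to show same-section interpolation.

```python
import configparser  # named ConfigParser in Python 2

INI_TEXT = """
[server]
domain = www.pyarabola.in
port = 80
address = %(domain)s:%(port)s

[urls]
konkan = http://${server.domain}:${server.port}/coastal-konkan-day01.html
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Built-in %(name)s interpolation works inside one section.
same_section = parser.get("server", "address")

# A ${section.option} reference means nothing to the stock parser,
# so the raw text comes back untouched.
cross_section = parser.get("urls", "konkan")

print(same_section)
print(cross_section)
```

The first value comes back fully expanded, while the second still contains the literal ${...} placeholders.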


NewConfigParser

So I ended up extending the ConfigParser.SafeConfigParser class and overriding its get(...) method to make space for such back-referencing.

Features -
  1. Define-anywhere-use-anywhere freedom, while defining dependent options
     Even if the [requests] section in the above example is defined at the top of the INI file, before any other parameter definition, the parser will not complain. This is contrary to the way Linux shell variables are resolved, where a variable must have been defined before it is used.

  2. Caching of resolved parameters
     A small optimization. Any resolved parameter is cached. As a result, if the get_konkan request from the above example is resolved before the get_apple request, the NewSafeConfigParser.get("requests", "get_apple") call won't resolve the same protocol, domain or port again. It will use them directly from the cache.

  3. Any amount of depth in dependencies (limited only by the global recursion limit)
     The parameters' dependencies are resolved recursively, so the depth of dependencies is limited only by Python's global recursion limit.

  4. Detection of circular dependencies
     All circular dependencies are detected, and they result in a CircularDependancyException. It also prints the dependency graph for you to make hay. Take a look at the circular dependency example in the Examples section below.

  5. Same usage semantics as SafeConfigParser, i.e. variable resolution is transparent to the user
     Since NewSafeConfigParser is extended from SafeConfigParser, the usage semantics are the same as SafeConfigParser's. You call the NewSafeConfigParser.get(...) method in the same way you would call the SafeConfigParser.get(...) method. All the referencing, parameter resolution, caching and circular dependency detection happen behind closed curtains. User No Bother.

  6. No section means current section
     Back-references are to be specified in the ${section.parameter} manner for resolution. If no section is specified, e.g. new_param = ${param}, then the current section (the one new_param lives in) is assumed, and the back-referenced parameter - param - is resolved in that same section.
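None of the following is the actual NewConfigParser source. It is only a rough sketch of how a get(...) override with recursive resolution, a cache and the no-section-means-current-section rule could look, assuming Python 3 (where the module is called configparser). The class name ResolvingParser, its _resolved cache and the address option are all hypothetical, and circular dependency detection is left out to keep it short.

```python
import configparser  # named ConfigParser in Python 2
import re

REF = re.compile(r"\$\{([^}]+)\}")  # matches ${section.option} or ${option}


class ResolvingParser(configparser.RawConfigParser):
    """Hypothetical stand-in, not the NewConfigParser implementation."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._resolved = {}  # cache: (section, option) -> resolved string

    def get(self, section, option, **kwargs):
        key = (section, option)
        if key not in self._resolved:
            raw = super().get(section, option, **kwargs)
            # Replace every back-reference with its (recursively) resolved value.
            self._resolved[key] = REF.sub(
                lambda match: self._lookup(section, match.group(1)), raw)
        return self._resolved[key]

    def _lookup(self, current_section, ref):
        section, dot, option = ref.partition(".")
        if not dot:  # no section given: assume the current one
            section, option = current_section, ref
        return self.get(section, option)


INI_TEXT = """
[requests]
get_konkan = GET ${urls.konkan}

[urls]
konkan = ${request-config.protocol}://${server.domain}:${server.port}/coastal-konkan-day01.html

[request-config]
protocol = http

[server]
domain = www.pyarabola.in
port = 80
address = ${domain}:${port}
"""

parser = ResolvingParser()
parser.read_string(INI_TEXT)
request = parser.get("requests", "get_konkan")
address = parser.get("server", "address")
print(request)
print(address)
```

Because the whole file is parsed before anything is resolved, [requests] can sit at the top even though it depends on everything below it. A cache this simple is never invalidated, so it would go stale if options were changed after the first get(...) call.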


Check the Examples section below for more details.


Limitations
  1. Works only for strings as of now
  2. Only SafeConfigParser class extended as of now. Not implemented for ConfigParser.


Code: Where and How

The code is available on GitHub - https://github.com/shreyas/NewConfigParser
I was too lazy to pack it into a legitimate module and post it on PyPI for pip install (it's only a small tweak really). So it's just a single file (as of now). You might want to copy it into your tools folder and import it as a module to start using it.


License

This was done quite a while back, only to realise that I need it quite often. So I thought it might be useful to someone else as well. One shouldn't just consume open-source, but contribute to it too. So I am releasing it publicly under the Apache Software License version 2.0.

Do what you want with it, just maintain that copyright notice at the top, despite the (obvious) disclaimer that I won't be responsible if your rocket hits someone's ass instead of the moon just because you resolved its controller config using this parser.


Examples
shreyas@tochukasui:~/devel/python$ python
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from NewConfigParser import SafeConfigParser as NewSafeConfigParser
>>> from ConfigParser import SafeConfigParser
>>> newparser = NewSafeConfigParser()
>>> origparser = SafeConfigParser()
>>> newparser.read("config.ini")
['config.ini']
>>> origparser.read("config.ini")
['config.ini']

>>> newparser.get("request", "request_type")
'get'
>>> origparser.get("request", "request_type")
'get'

>>> newparser.get("request", "request");
'get http://pyarabola.blogspot.in'
>>> origparser.get("request", "request");
'${request_type} ${server.url}'

>>> newparser.get("server", "url");
'http://pyarabola.blogspot.in'
>>> origparser.get("server", "url");
'${protocol}://${website}'

>>> origparser.get("server", "protocol");
'http'

>>> origparser.get("server", "website");
'${subdomain}.${host}.${tld}'
>>> newparser.get("server", "website");
'pyarabola.blogspot.in'
>>>


Circular Dependency Detection
shreyas@tochukasui:~/devel/python$ cat circdep.ini
[humans]
drinking_water = ${govt.pipeline}

[animals]
drinking_water = ${natural.reservoire}

[govt]
pipeline = ${natural.reservoire}

[natural]
reservoire = ${river}
river = ${heavens.rains}

[heavens]
rains = ${clouds}
clouds = ${vapor}
vapor = ${natural.reservoire}

shreyas@tochukasui:~/devel/python$ cat cdep.py
#!/usr/bin/python

import NewConfigParser

p = NewConfigParser.SafeConfigParser()
p.read("circdep.ini")

try:
    p.get("humans", "drinking_water")
except Exception as e:
    print e

shreyas@tochukasui:~/devel/python$ ./cdep.py
Cicrular dependancy detected while resolving parameter 'natural.reservoire': natural.reservoire -> natural.river -> heavens.rains -> heavens.clouds -> heavens.vapor -> natural.reservoire
shreyas@tochukasui:~/devel/python$
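How the module tracks the chain internally is not shown here, so the following is only a sketch of one common way to do it: carry the trail of parameters currently being resolved, and report the loop as soon as a parameter shows up in its own trail. The circdep.ini contents are flattened into a plain dictionary for brevity, and the exception name CircularReference is hypothetical.

```python
import re

REF = re.compile(r"\$\{([^}]+)\}")

# circdep.ini from above, flattened to "section.option" keys
RAW = {
    "humans.drinking_water": "${govt.pipeline}",
    "animals.drinking_water": "${natural.reservoire}",
    "govt.pipeline": "${natural.reservoire}",
    "natural.reservoire": "${river}",
    "natural.river": "${heavens.rains}",
    "heavens.rains": "${clouds}",
    "heavens.clouds": "${vapor}",
    "heavens.vapor": "${natural.reservoire}",
}


class CircularReference(Exception):
    """Hypothetical name; the module raises its own exception type."""


def resolve(key, trail=()):
    """Resolve RAW[key], raising if key is already being resolved."""
    if key in trail:
        cycle = trail[trail.index(key):] + (key,)
        raise CircularReference(" -> ".join(cycle))
    section = key.partition(".")[0]

    def expand(match):
        ref = match.group(1)
        if "." not in ref:  # no section given: assume the current one
            ref = section + "." + ref
        return resolve(ref, trail + (key,))

    return REF.sub(expand, RAW[key])


try:
    resolve("humans.drinking_water")
except CircularReference as exc:
    message = str(exc)
    print(message)
```

Slicing the trail from the first occurrence of the repeated parameter is what drops the harmless lead-in (humans.drinking_water, govt.pipeline) and leaves only the loop itself in the message.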

Monday, July 9, 2012

Why People Choose Java over Python



I love Python for the freedom it gives to a developer, and for the way it allows one to concentrate purely on the business logic part of the solution, rather than on the nitty-gritties of the language. Coming over from C/C++ world, it's a breath of fresh air, no matter how much you love C/C++.

Despite all this coolness, this awesomeness of Python, and this black-magic-like problem solving ability that it offers, it's heart-breaking to see the adoption of this superb language staying much lower than it deserves to be. The industry is still banking on Java for addressing its requirements.

Over some time recently, I have been trying to understand what makes people prefer Java over Python for new development, despite Python being such an excellent choice.

So from whatever little I have understood about the technological landscape so far, here are some points (without any priority as such) that I could put together, as to why Java might still be a go-to rather than Python -
  • Java is compiled while Python is interpreted. When you want to deploy a webapp on a third-party web host, with Python you have to deploy all your source on the server, while in case of Java, you deploy compiled bytecode (classes and jars). And though bytecode reverse engineering may not be impossible or uncommon, it just makes access to source-code much more difficult, as compared to Python where you have to deploy production source-code as it is. People are paranoid about exposing their source code.

    CPython / Jython / IronPython might offset this shortcoming to a certain extent, but then this is an added layer, and it might come with its own set of drawbacks, limitations and bugs.

  • A Java webapp, when deployed under an app server like Tomcat, allows request pooling and hence will possibly be more responsive. For a Python web-app hosted as plain CGI, on the other hand, every request results in a fresh invocation of the interpreter, and it will be, I suspect, much slower in comparison with a request-pooled instance which can maintain its state. (Persistent-process setups such as mod_wsgi or FastCGI keep the interpreter alive between requests and avoid that startup cost.)

  • The biggest thing, in my opinion, that goes in favor of Java, is the huge ecosystem of proven tools (development, debugging, profiling, build-management, documentation etc) and frameworks that has been developed around it over the last decade or so. And 'proven' is really the keyword here.

  • When it comes to language constructs, Python doesn't enforce anything. It's a come-all-do-all language where even following the OOP paradigm is *voluntary*. For example, encapsulation, which is a very important OOP building block, is not enforced. You can use a _single or a __double leading underscore if you will, to mark class variables as non-public, but there is no real restriction on their being accessed from outside the class definition - private/protected/public notwithstanding. So the chances of a developer making a subtle but critical mistake are much higher than, say, in Java.

    Managers, product owners, and anybody for that matter, want to have the peace of mind that the software development process they are overseeing will have stringent checks in place at the grass root level itself, so that a subtle mistake from an inexperienced developer won't go on wreaking havoc on production.

    Java to a large extent has those checks. Pointers, or rather the lack of them, were one of the reasons why it gained acceptance over C++ after all. Java enforces OOP paradigms. It has static typing as opposed to duck typing or dynamic typing in Python. Due to the enforcement of OOP, it opens up large possibilities for development methodologies, like interfaces and contracts, which, though possible in Python, will be bypass-able, due to its weaker enforcement of OOP.
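The point about underscores not being enforced can be seen in a few lines. This is just an illustrative sketch; the Account class and its attributes are made up for the purpose.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance  # "protected" by convention only
        self.__pin = "1234"      # name-mangled to _Account__pin, not hidden


acct = Account(100)

# Nothing stops outside code from touching the "protected" attribute.
acct._balance = -5

# The double-underscore name is not reachable as written from outside...
try:
    acct.__pin
    direct_access_blocked = False
except AttributeError:
    direct_access_blocked = True

# ...but the mangled name is, so the "private" value still leaks.
leaked_pin = acct._Account__pin

print(acct._balance, direct_access_blocked, leaked_pin)
```

So the double underscore only renames the attribute to avoid accidental clashes in subclasses; it is not an access check of the kind Java's private gives you.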

Again, to reiterate, I have jotted this down as per my understanding of the technological landscape. I am nowhere near being an expert in either Java or Python, and I do not claim to be one either. So there might be some errors in the way I perceive these things, and if you spot one, please point it out.

With that, I open it up to you guys. I am sure there will be lots of viewpoints as to why Java clicks, since we have such a large experienced Java population here. And there will be counter-points as to why Python clicks despite these points or why some of these points I mentioned aren't valid anymore. So let those viewpoints flow.

Thursday, October 2, 2008

The Py-t(h)onic

A few weeks ago, out of sheer curiosity, I started to have a look at this new scripting-cum-object-oriented language capable of doing almost everything that C++ can do, and more. And I can't tell how mesmerized I am with this fantastic language. Thank god, I thought about giving Python a go. A whole new world has opened up for me. Because of the virtuosity of Python, now I am feeling very confident and capable of actually leveraging various ideas that I keep thinking about. Previously, with C++ it was a bit difficult because its core set of functionality is limited, and it's a pain to go searching for good third-party libraries for each and every aspect of desired functionality, be it text processing, regexes, web-development, xml processing, or simply, connecting to - say - a MySQL database. Count the overhead of typing and thinking so much just for some small (maybe a PoC) snippet, and you subtly lose the all-important enthusiasm to get it going.

With Python, it all becomes very much palpable. The language itself comes with tons of important 'batteries' of functionality, already included. You can do almost everything conceivable with Python, in a very small amount of time, and that too in a concise and goody-goody way in all aspects. Because of this, you tend to spend almost all your time on solutioning, rather than wasting it in that unnecessary wrestling-with-finer-aspects of the base language. If you are a C++ programmer, and have developed a huge liking for the language (like I have), and hence are too adamant to even think about any other language (like I was), I would strongly recommend you to spend some time with Python. You won't regret it.

Here I don't intend to project Python as an alternative for C++ (haah!) as there will always remain a performance penalty in an interpreted language as compared to a compiled language (which happens to be closer to the system). But it also depends upon how many of our ideas are actually performance-centric. If you think a little performance gain (think quad-core) can be squandered in favor of an impressive line-up of features, Python is for you. A few of the features offered by Python are – very clean syntax, loads of functionality already included in the language, strong support for the OOP paradigm, better portability. Python offers an all-in-one development platform which can be used to develop everything right from the lower-level database persistence mechanism to the web-based GUI. All this, in a very graceful way. That's one of its biggest advantages.

As you might already be aware, the new C++0x standard is coming up with a host of 'new' features for C++, like lambda functions, tuples, regex processing, multi-threading support etc. But if one has to compare, for Python, this is not new at all, it is already present in there, and is very easy to put to work. That is again one more advantage in using Python – no overhead involved for using heavy-duty functionality, with minimal error handling requirement. For example, if you want to use a dictionary in Python, all you have to do is –

Mydict = {<key1>:<value1>, <key2>:<value2>}

That's it. As simple as that. Now you can index the dictionary by key, much like you would index a dynamic array in C/C++ by position.

What is more, you can pack the punch in your code in a very concise manner, by clubbing a lot of functionality together. Take a look at the following code –
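With the placeholders filled in, it looks like this. The names and numbers are made up purely for illustration.

```python
ages = {"alice": 30, "bob": 25}

ages["carol"] = 41              # add a new key-value pair
bob_age = ages["bob"]           # look a value up by its key
dave_age = ages.get("dave", 0)  # a missing key can fall back to a default
names = sorted(ages)            # iterating a dictionary yields its keys

print(bob_age, dave_age, names)
```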

While most of the code is easy to read and understand, for those finding it difficult to understand, let me state my intention behind writing this small snippet of code. Generally, in the unix world, most of the utilities expose their (powerful) functionality through command-line-arguments. Hence, processing these command-line-arguments is one of the most basic requirements, we can say, for a unix utility. Now neither C nor C++ offers any help in getting this done quickly. So what this requirement entails is the headache of string processing, multilevel switches, and a lot of error handling. If you have worked in such a situation before, I am sure you would empathize with me.

Now in that context, have a look at the Python code snippet above. The program defines a couple of functions to handle different command line args. Then it defines a dictionary with the command-line-arg and its corresponding handler as the key-value pair. After that, there is no switch, no string processing, nothing. All it does is pass the 1st argument as a key to the dictionary, get the corresponding handler function, and call it. If required, it passes on the remaining command-line-args (if any) to the handler function. All of this in one single statement! It is not only quick, it is elegant too, as you can see. If you come back to your code even after a few months, you won't get lost in the quagmire of a huge amount of code that has very little to do with the actual intention of writing that program.
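The original snippet is not reproduced on this page, so here is a minimal sketch along the lines just described. The handler names, the argument names and the returned strings are all hypothetical; in a real script the list handed to main(...) would be sys.argv.

```python
def show_help(*args):
    """Handler for the 'help' argument; ignores anything after it."""
    return "usage: tool [help|count] [args...]"


def count_args(*args):
    """Handler for the 'count' argument; reports how many args follow it."""
    return "%d argument(s)" % len(args)


# Command-line-arg -> handler function, as key-value pairs
HANDLERS = {"help": show_help, "count": count_args}


def main(argv):
    # One statement: pick the handler by the 1st argument and call it
    # with the remaining arguments. No switch, no string processing.
    return HANDLERS[argv[1]](*argv[2:])


result = main(["tool", "count", "a", "b"])
print(result)
```

An unknown argument simply raises a KeyError here, which a real utility would probably catch and turn into the help text.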

It is this utter simplicity and elegance offered by Python, that hooks you to this language.

I can tell you about one practical example at my workplace, where I could leverage Python to do an important job of analyzing a log file, very effectively, and that too very quickly.

I wanted to analyze a log file, generated by a time-profiler class which I had written to keep a watch on performance gain achieved through various tuning activities. The time-profiler utility logged method-name, number of records processed and the time taken for it all. Now since the number of calls was in the range of around 40K, visual analysis was not on the cards. So I had to write a utility which would find out the first five calls taking maximum amount of time and corresponding number of records processed, for a given method-name.

For this, I needed to process the command line to understand which method is to be analyzed. Then I needed to read the log file, find out all the logs from the logfile which correspond to the given method-name and store the records-processed and time-taken figures with a one-to-one relationship. After that, one sort operation would be required to get the first five records, with maximum time-taken figures. After this sort, the one-to-one relation between the number-of-records-processed and time-taken shouldn't be lost at any cost.

Had I implemented this in C++, I would have wasted a lot of time in writing a lot of code dealing with the nitty-gritty of the above mentioned things. Instead, I decided to give Python a try, even though I wasn't very much familiar with Python back then.
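The utility itself is not shown here, so the following is only a sketch of the steps just described. The log format (method name, records processed and seconds taken, separated by whitespace) and the sample lines are assumptions made up for illustration. It also keeps each call as a (time-taken, records-processed) tuple rather than as dictionary entries, so that two calls with the same time cannot overwrite each other and the pairing survives the sort.

```python
import heapq

# Hypothetical log format: <method-name> <records-processed> <seconds-taken>
SAMPLE_LOG = """\
loadRecords 1200 3.41
saveRecords 300 0.92
loadRecords 50 0.10
loadRecords 9000 12.75
loadRecords 400 1.05
saveRecords 7000 9.80
loadRecords 2500 6.30
loadRecords 800 2.20
"""


def slowest_calls(lines, method, top=5):
    """Return the `top` slowest calls of `method` as (seconds, records) tuples."""
    calls = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[0] != method:
            continue  # skip malformed lines and other methods
        calls.append((float(fields[2]), int(fields[1])))
    # Tuples compare on their first element, so this orders by time taken,
    # slowest first, with the records figure travelling along.
    return heapq.nlargest(top, calls)


top_five = slowest_calls(SAMPLE_LOG.splitlines(), "loadRecords")
for seconds, records in top_five:
    print(seconds, records)
```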

I came up with a working utility in almost half an hour doing everything mentioned above!!!

This became possible because, as I have been stating again and again, Python offers an impressive set of inbuilt functions and modules, which can effectively do away with the need to spend time on the basic needs. For example, I needn't spend much time on things like –

  1. Processing the command-line
  2. File reading and string processing to find out which lines belong to given method name
  3. Storing the time-taken and number-of-records-processed as key-value pairs in a dictionary (didn't require any STL manipulation overheads)
  4. Sorting the keys (time-taken) in descending order (didn't require any operator overloading)
  5. A LOT of error handling at every stage

Everything was intuitive and hence I was able to find references for the desired functionality very quickly. I didn't need to spend a lot of time on reading the manuals to find the right method for the job, then check for type-compatibility (due to dynamic typing, another 'controversial' feature of Python) etc.

That was when I realized, how productive one can get with this powerful language. Allowing the user to focus on the solution, rather than the underlying language, is the most celebrated feature of Python. And I would strongly suggest you taste (and test) it once.