Thursday, August 23, 2012

A NewConfigParser Module for Python


Python's standard library contains an excellent ConfigParser module. It lets you read configuration variables directly from an INI file, without much ado or overhead. Simple and beautiful. Doug Hellmann's PyMOTW post about the module has more info.

Now, as long as all you need is a set of independent configuration variables, you are good to go right away. But if you are used to shell scripting and are spoilt by variable resolution there (one variable resolving from another previously defined variable, like this - URL=$PROTOCOL://$DOMAIN:$PORT/), you will not be satisfied. And if you are spoilt by Python's opulence, you cannot just let it go and make do with whatever is available.

I couldn't either. I had a complex configuration file, and I just couldn't think of doing without variable resolution. Here is a subset of what I wanted. I had a common configuration section about a server, and a set of URLs on that server which I wanted to fire HTTP requests at (GET, POST, or any such). The URLs would be listed in a separate 'urls' section, while the server details would be listed in the 'server' section. Something like this -
[server]
domain = www.pyarabola.in
port = 80

[request-config]
protocol = http
type = get

[urls]
konkan = ${request-config.protocol}://${server.domain}:${server.port}/coastal-konkan-day01.html
apple = ${request-config.protocol}://${server.domain}:${server.port}/applyhypepotatopoteto.html

[requests]
get_konkan = GET ${urls.konkan}
get_apple = GET ${urls.apple}

Note: I know there will be other (better) ways to get the URLs constructed, but this is a made-up example just to make the use-case clear.

The requirement is quite straightforward - if I want to change the server, the port number, or the type of request I am making, I don't want to end up changing every entry in [requests].

As you can see, the parameters are constructed by back-referencing other parameters. And the back-referencing is multi-level - one parameter references another, which in turn references yet another, and so on, as can be seen for the get_konkan parameter in the [requests] section.

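As far as I know, the current ConfigParser module (or its ConfigParser.SafeConfigParser class) doesn't allow such cross-section referencing in the INI file it parses; its built-in %(name)s interpolation only looks up names in the same section and in [DEFAULT]. Enter NewConfigParser.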
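
To make the limitation concrete, here is a small sketch of the stock behaviour. It assumes Python 3, where the module is named configparser and ConfigParser is the successor of the SafeConfigParser class discussed here; the 'hostport' option is made up for the illustration.

```python
import configparser

SAMPLE = """
[server]
domain = www.pyarabola.in
port = 80
# 'hostport' is a made-up option, added only for this illustration
hostport = %(domain)s:%(port)s

[urls]
konkan = http://%(domain)s/coastal-konkan-day01.html
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Same-section reference: resolved by the built-in interpolation.
same_section = parser.get("server", "hostport")
print(same_section)

# Cross-section reference: 'domain' is not visible from [urls].
try:
    parser.get("urls", "konkan")
    cross_section_error = None
except configparser.InterpolationMissingOptionError as exc:
    cross_section_error = exc
    print("cannot resolve:", exc.reference)
```

The first lookup succeeds because 'domain' and 'port' live in the same section as 'hostport'; the second fails because the stock interpolation never looks into [server] from [urls].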


NewConfigParser

So I ended up extending the ConfigParser.SafeConfigParser class and overriding its get(...) method to make space for such back-referencing.
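
The idea of overriding get(...) can be sketched roughly as follows. This is not the NewConfigParser source, just a minimal illustration under a few assumptions: it targets Python 3's configparser.RawConfigParser, the class name ResolvingConfigParser is hypothetical, and a reference is split into section and parameter at its last dot.

```python
import configparser
import re

# Matches ${section.parameter} or ${parameter}
_REFERENCE = re.compile(r"\$\{([^{}]+)\}")


class ResolvingConfigParser(configparser.RawConfigParser):
    """Hypothetical stand-in for NewSafeConfigParser (not the original code)."""

    def get(self, section, option, **kwargs):
        raw = super().get(section, option, **kwargs)
        if not isinstance(raw, str):  # e.g. a non-string fallback value
            return raw

        def expand(match):
            # Split at the last dot; without a dot, stay in the current section.
            ref_section, dot, ref_option = match.group(1).rpartition(".")
            return self.get(ref_section if dot else section, ref_option)

        # Each reference is replaced by its recursively resolved value.
        return _REFERENCE.sub(expand, raw)


SAMPLE = """
[requests]
get_konkan = GET ${urls.konkan}

[urls]
konkan = ${request-config.protocol}://${server.domain}:${server.port}/coastal-konkan-day01.html

[request-config]
protocol = http

[server]
domain = www.pyarabola.in
port = 80
"""

parser = ResolvingConfigParser()
parser.read_string(SAMPLE)
print(parser.get("requests", "get_konkan"))
# prints: GET http://www.pyarabola.in:80/coastal-konkan-day01.html
```

Because resolution happens lazily inside get(...), the order of sections in the file does not matter; [requests] is deliberately listed first here. This bare version has no caching and would recurse until Python's recursion limit on a circular reference.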

Features -
  1. Define-anywhere-use-anywhere freedom, while defining dependent options
     This means that even if the [requests] section in the above example is defined at the top of the INI file, before any other parameter definition, it will not complain. This is contrary to the way Linux shell variables are resolved; there, a variable must have been defined before it is used.

  2. Caching of resolved parameters
     A small optimization. It caches any resolved parameters. As a result, if the get_konkan request from the above example is resolved before the get_apple request, the NewSafeConfigParser.get("requests", "get_apple") call won't resolve the same protocol, domain or port again. It will use them directly from the cache.

  3. Any amount of depth in dependencies (*only limited by the global recursion limit)
     The parameters' dependencies are resolved recursively. So the depth of dependencies is only limited by Python's global recursion limit.

  4. Detection of circular dependencies
     All circular dependencies are detected, and they result in a CircularDependancyException. It also prints the dependency graph for you to make hay. Take a look at the circular dependency example in the Examples section below.

  5. Same usage semantics as SafeConfigParser, i.e. variable resolution is transparent to the user
     Since NewSafeConfigParser is extended from SafeConfigParser, the usage semantics are the same as for SafeConfigParser. You call NewSafeConfigParser.get(...) the same way you would call SafeConfigParser.get(...). All the referencing, parameter resolution, caching and circular dependency detection happens behind closed curtains. User No Bother.

  6. No section means current section
     Back-references are to be specified in the ${section.parameter} form for resolution. If no section is specified, e.g. new_param = ${param}, then the current section (that of new_param) is assumed, and the back-referenced parameter - param - is resolved in the same section as new_param.
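
Caching, circular dependency detection and the current-section default from the list above can be sketched together as below. Again this is only an illustration and not the actual NewConfigParser code: it assumes Python 3's configparser, and the names CachingConfigParser, CircularReferenceError, _resolved and _in_progress are all hypothetical.

```python
import configparser
import re

_REFERENCE = re.compile(r"\$\{([^{}]+)\}")


class CircularReferenceError(Exception):
    """Hypothetical name; the post's module raises CircularDependancyException."""


class CachingConfigParser(configparser.RawConfigParser):
    """Illustrative only: resolution with a cache and a cycle check."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._resolved = {}     # (section, option) -> fully resolved value
        self._in_progress = []  # parameters currently being resolved, in order

    def get(self, section, option, **kwargs):
        key = (section, self.optionxform(option))
        if key in self._resolved:
            return self._resolved[key]
        if key in self._in_progress:
            # Seeing a parameter again before it finished resolving means a loop.
            loop = self._in_progress[self._in_progress.index(key):] + [key]
            raise CircularReferenceError(" -> ".join("%s.%s" % k for k in loop))

        raw = super().get(section, option, **kwargs)
        if not isinstance(raw, str) or not self.has_option(section, option):
            return raw  # fallback values are passed through untouched

        self._in_progress.append(key)
        try:
            value = _REFERENCE.sub(lambda m: self._lookup(section, m.group(1)), raw)
        finally:
            self._in_progress.pop()
        self._resolved[key] = value
        return value

    def _lookup(self, current_section, name):
        # No dot in the reference means "same section as the referring option".
        ref_section, dot, ref_option = name.rpartition(".")
        return self.get(ref_section if dot else current_section, ref_option)


CIRCULAR = """
[natural]
reservoire = ${river}
river = ${heavens.rains}

[heavens]
rains = ${natural.reservoire}
"""

parser = CachingConfigParser()
parser.read_string(CIRCULAR)
try:
    parser.get("natural", "reservoire")
    loop_message = None
except CircularReferenceError as exc:
    loop_message = str(exc)
    print(loop_message)
```

One simplification to be aware of: the cache in this sketch is never invalidated, so values changed later through set(...) or a second read(...) would not be picked up.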


Check the Examples section below for more details.


Limitations
  1. Works only for strings as of now
  2. Only SafeConfigParser class extended as of now. Not implemented for ConfigParser.


Code: Where and How

The code is available on GitHub - https://github.com/shreyas/NewConfigParser
I was too lazy to pack it into a legitimate module and post it on PyPI for pip install (it's only a small tweak really). So it's just a single file (as of now). You might want to copy it into your tools folder and import it as a module to start using it.


License

I wrote this quite a while back, only to realise that I need it quite often. So I thought it might be useful to someone else as well. One shouldn't just consume open source, but contribute to it too. So I am releasing it publicly under the Apache License, version 2.0.

Do what you want with it, just maintain the copyright notice at the top, despite the (obvious) disclaimer that I won't be responsible if your rocket hits someone's ass instead of the moon just because you resolved its controller config using this parser.


Examples
shreyas@tochukasui:~/devel/python$ python
Python 2.7.3 (default, Aug 1 2012, 05:16:07)
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from NewConfigParser import SafeConfigParser as NewSafeConfigParser
>>> from ConfigParser import SafeConfigParser
>>> newparser = NewSafeConfigParser()
>>> origparser = SafeConfigParser()
>>> newparser.read("config.ini")
['config.ini']
>>> origparser.read("config.ini")
['config.ini']

>>> newparser.get("request", "request_type")
'get'
>>> origparser.get("request", "request_type")
'get'

>>> newparser.get("request", "request");
'get http://pyarabola.blogspot.in'
>>> origparser.get("request", "request");
'${request_type} ${server.url}'

>>> newparser.get("server", "url");
'http://pyarabola.blogspot.in'
>>> origparser.get("server", "url");
'${protocol}://${website}'

>>> origparser.get("server", "protocol");
'http'

>>> origparser.get("server", "website");
'${subdomain}.${host}.${tld}'
>>> newparser.get("server", "website");
'pyarabola.blogspot.in'
>>>
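
The config.ini used in this session is not shown. One file that would be consistent with the output above is sketched below; it is a guess inferred from the printed values (in particular the subdomain, host and tld values), not the original file. The snippet loads it with Python 3's stock parser, with interpolation switched off, to check that the raw values match what origparser printed.

```python
import configparser

# A guess at config.ini, inferred from the session output; the original file
# is not shown in the post.
CONFIG_INI = """
[server]
protocol = http
subdomain = pyarabola
host = blogspot
tld = in
website = ${subdomain}.${host}.${tld}
url = ${protocol}://${website}

[request]
request_type = get
request = ${request_type} ${server.url}
"""

# interpolation=None makes Python 3's parser hand back the raw values.
stock = configparser.ConfigParser(interpolation=None)
stock.read_string(CONFIG_INI)
for section, option in [("request", "request"), ("server", "url"), ("server", "website")]:
    print(repr(stock.get(section, option)))
```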


Circular Dependency Detection
shreyas@tochukasui:~/devel/python$ cat circdep.ini
[humans]
drinking_water = ${govt.pipeline}

[animals]
drinking_water = ${natural.reservoire}

[govt]
pipeline = ${natural.reservoire}

[natural]
reservoire = ${river}
river = ${heavens.rains}

[heavens]
rains = ${clouds}
clouds = ${vapor}
vapor = ${natural.reservoire}

shreyas@tochukasui:~/devel/python$ cat cdep.py
#!/usr/bin/python

import NewConfigParser

p = NewConfigParser.SafeConfigParser()
p.read("circdep.ini")

try:
    p.get("humans", "drinking_water")
except Exception as e:
    print e

shreyas@tochukasui:~/devel/python$ ./cdep.py
Cicrular dependancy detected while resolving parameter 'natural.reservoire': natural.reservoire -> natural.river -> heavens.rains -> heavens.clouds -> heavens.vapor -> natural.reservoire
shreyas@tochukasui:~/devel/python$