more on config imports

In a previous post I mentioned importing a Python file as a config file via ihooks.

Craig Ringer sent me a note, pasted below:

I think there's a considerably simpler approach than the one you've taken, as well as an alternative to ihooks that's quite a bit tidier to use.

First, the simple alternative. The import statement pretty much just calls __builtin__.__import__(), and you're quite free to use that in your code directly. This lets you do something like this for your config importer (based on your posted code):

    import os
    import __builtin__

    try:
        config = __builtin__.__import__('config', globals(), {}, [])
    except Exception:
        # We catch any Exception because the user could do all sorts
        # of things wrong in the import. We can't even guarantee that
        # an ImportError comes from our __import__ call and not an import
        # attempt inside the config file without delving into the backtrace
        # using the traceback module.
        print "An error occurred while importing the config file."
        print "Traceback follows."
        raise

    def config_example():
        print config.some_config_dictionary["option"]

    if __name__ == "__main__":
        config_example()

It's worth noting that, if I recall correctly, the __builtin__ module should be used; __builtins__ is specific to the CPython implementation.

If you start having to accept an explicit path for the config file from the user or something, then ihooks is probably better. For simple cases, though, __builtin__.__import__() is much quicker and easier to understand.

If you do decide you need more than __builtin__.__import__() has to offer, it may be better to look into the PEP 302 import hooks, which are much cleaner and simpler than the rather ancient ihooks module. I should probably submit a patch to the ihooks docs to make them refer to PEP 302, actually.

Some info on PEP 302:
http://www.python.org/doc/2.3.5/whatsnew/section-pep302.html

Indeed, __import__ was my first attempt at doing what I wanted, but its semantics aren't quite right. I wanted to be able to specify a file on the command line with an argument such as --config=/path/to/file/config.py and have that loaded in. __import__ takes a module name, not a path, so for example __import__("/path/to/config.py") doesn't work.
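Later versions of Python cover exactly this case. Python 2's imp.load_source(name, path) loads a source file from an arbitrary path, and on Python 3 the importlib machinery can build a module straight from a file. Below is a minimal Python 3 sketch. It assumes the config file is plain Python source. The names load_config and the default module name "config" are hypothetical, chosen for illustration.

```python
import importlib.machinery
import importlib.util
import sys


def load_config(path, module_name="config"):
    """Import the Python source file at `path` as module `module_name`.

    load_config and the default name "config" are illustrative only.
    """
    # An explicit SourceFileLoader means the file need not end in .py
    loader = importlib.machinery.SourceFileLoader(module_name, path)
    spec = importlib.util.spec_from_file_location(module_name, path,
                                                  loader=loader)
    module = importlib.util.module_from_spec(spec)
    # Register the module before running it, as a normal import would
    sys.modules[module_name] = module
    try:
        spec.loader.exec_module(module)
    except BaseException:
        # Don't leave a half-initialised module behind on error
        del sys.modules[module_name]
        raise
    return module
```

Given --config=/path/to/file/config.py, something like cfg = load_config(options.config) would then expose cfg.some_config_dictionary. Here options.config is a hypothetical parsed option.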

I also looked into import hooks. They are a possibility, but they don't quite give the semantics I wanted either. Here's a minimal example of how it might work:

import sys
import imp
import os

path_to_config_file = "./testdir/config.py"

class Config:
    def __init__(self, path):
        print path
        self.path = path

    def _get_code(self, fullname):
        # path_to_config_file might have come in via our options, etc
        f = open(path_to_config_file)
        try:
            # read in the whole file and compile it
            source = f.read()
        finally:
            f.close()
        code = compile(source, path_to_config_file, 'exec')
        # False means this is not a package; code is the compiled code
        return False, code

    def find_module(self, fullname, path=None):
        # PEP 302 passes an optional path argument; returning ourselves
        # claims every module name this importer is asked about
        print fullname
        return self

    def load_module(self, fullname):
        # get the code
        ispkg, code = self._get_code(fullname)

        # create a new module
        mod = imp.new_module(fullname)
        # add it to the modules list
        sys.modules[fullname] = mod
        # set the file
        mod.__file__ = "<%s>" % path_to_config_file
        # set the module loader
        mod.__loader__ = self

        # exec code
        exec code in mod.__dict__

        # return the module object
        return mod

sys.path_hooks.append(Config)

import aconfigfile

print aconfigfile

aconfigfile.hello()

The issue I see with this is that import aconfigfile doesn't clearly express what I want. I really want to be able to pass any arbitrary file name to be imported, e.g. something like import ./a/config/config.file. As far as I can see, import hooks are designed more for importing files held in, say, a zip or pak archive.
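One way to stop a hook from claiming every import is to have the finder answer for a single, explicit module name and decline everything else. Below is a sketch using Python 3's importlib.abc.MetaPathFinder, the successor to the find_module/load_module protocol above. ConfigFinder, install_config_finder and the module names are hypothetical. This narrows the hook but does not remove the mismatch described above: the import statement still names a fixed module rather than a path.

```python
import importlib.abc
import importlib.machinery
import importlib.util
import sys


class ConfigFinder(importlib.abc.MetaPathFinder):
    """Map exactly one module name onto an arbitrary file path."""

    def __init__(self, module_name, path):
        self.module_name = module_name
        self.path = path

    def find_spec(self, fullname, path=None, target=None):
        if fullname != self.module_name:
            # Not ours: let the normal import machinery handle it
            return None
        loader = importlib.machinery.SourceFileLoader(fullname, self.path)
        return importlib.util.spec_from_file_location(fullname, self.path,
                                                      loader=loader)


def install_config_finder(module_name, path):
    """Put a ConfigFinder at the front of sys.meta_path and return it."""
    finder = ConfigFinder(module_name, path)
    sys.meta_path.insert(0, finder)
    return finder
```

After install_config_finder("aconfigfile", some_path), a plain import aconfigfile loads some_path, while every other import behaves as usual.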

Importing a python file as a config file

In some circumstances you might like the configuration file for your Python program to actually be another Python program, especially if you want your users to be able to write really cool config files. At first glance eval() looks like a nice way to do this, but unfortunately it won't work: eval() only evaluates expressions, and import isn't an expression, it's a statement.

>>> config_file = "config"
>>> eval("import " + config_file)
Traceback (most recent call last):
  ...
SyntaxError: invalid syntax
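eval() compiles its argument as an expression, so an import statement is rejected with a SyntaxError before anything is imported. exec would accept the statement. When only the module name is dynamic, though, a function call is tidier: importlib.import_module (present in Python 2.7 and 3) takes the name as a string. It still needs the module to be findable on sys.path, which is why the rest of this post reaches for ihooks. Below is a small Python 3 sketch; both helper names are hypothetical.

```python
import importlib


def import_named(module_name):
    """Import a module whose name is held in a string, e.g. "config"."""
    return importlib.import_module(module_name)


def eval_rejects_import(module_name):
    """Return True if eval() refuses "import <name>" (import is a statement)."""
    try:
        eval("import " + module_name)
    except SyntaxError:
        return True
    return False
```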

You can get around this however with the ihooks library. First, create a config file called config.py with the following.

class Config:
    some_config_dictionary = {"option" : "Hello Config"}

Then you can import the config as per the following

import ihooks, os

def import_module(filename):
    loader = ihooks.BasicModuleLoader()
    path, file = os.path.split(filename)
    name, ext  = os.path.splitext(file)
    module = loader.find_module_in_dir(name, path)
    if not module:
        raise ImportError, name
    module = loader.load_module(name, module)
    return module

def config_example():
    config = import_module("config.py").Config
    print config.some_config_dictionary["option"]

if __name__ == "__main__":
    config_example()

All going well, this should print Hello Config.
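ihooks did not make it into Python 3. If all you need is the config file's top-level names rather than a real module object, runpy.run_path (Python 2.7 and 3) executes a file and hands back its globals. Below is a minimal sketch; read_config is a hypothetical helper. The init_globals argument lets the program pre-seed values, such as defaults, that the config file can read or override.

```python
import runpy


def read_config(path, init_globals=None):
    """Run the Python file at `path` and return its top-level names.

    The result is a plain dict; dunder entries that runpy adds
    (__name__, __file__, __builtins__, ...) are filtered out.
    """
    namespace = runpy.run_path(path, init_globals=init_globals)
    return {k: v for k, v in namespace.items() if not k.startswith("__")}
```

With the config.py above, read_config("config.py")["Config"].some_config_dictionary["option"] gives "Hello Config".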

On swig and size_t

Generally, when passing the size of something around, it's good to use size_t, and should that something be a blob (in the binary object sense), it probably wants to be a void*.

However, the cstring extension to SWIG uses int for sizes and char* for data; for example:

%cstring_output_withsize(parm, maxparm): This macro is used to handle bounded character output functions where both a char * and a pointer int * are passed. Initially, the int * parameter points to a value containing the maximum size. On return, this value is assumed to contain the actual number of bytes. As input, a user simply supplies the maximum length. The output value is a string that may contain binary data.

You could potentially create your own typemaps to handle this and rewrite large parts of the cstring SWIG interface, but the point would be moot: by the time the data gets back to the Python API it has to be an int anyway. For example, PyObject* PyString_FromStringAndSize(const char *v, int len) takes an int. Since Python strings can hold binary data, everything should be a char* too. This is less critical, but if you build with -Wall -Werror (as you should), you'll need to make sure the types are right.

I would recommend not following some of the SWIG instructions about doing your own typedef for size_t. This seems fraught with danger, and you're only going to be calling Python API functions that expect an int anyway. Be aware that if you really need to pass around something whose size doesn't fit in an int, you'll have some work to do; otherwise, design your API with the types the Python API expects.

Python SocketServer class

The SocketServer module is very nifty, but the built-in documentation is a bit obscure. Below is a simple SocketServer application that listens on port 7000. Run it and telnet localhost 7000, and you should be greeted by a message. Type HELLO and you should get a message back; QUIT closes the connection.

import SocketServer, time, select, sys
from threading import Thread

COMMAND_HELLO = 1
COMMAND_QUIT  = 2

# The SimpleRequestHandler class uses this to parse command lines.
class SimpleCommandProcessor:
    def __init__(self):
        pass

    def process(self, line, request):
        """Process a command"""
        args = line.split(' ')
        command = args[0].lower()
        args = args[1:]

        if command == 'hello':
            request.send('HELLO TO YOU TOO!\r\n')
            return COMMAND_HELLO
        elif command == 'quit':
            request.send('OK, SEE YOU LATER\r\n')
            return COMMAND_QUIT
        else:
            request.send('Unknown command: "%s"\r\n' % command)


# SimpleServer extends the TCPServer, using the threading mix in
# to create a new thread for every request.
class SimpleServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):

    # This means the main server will not do the equivalent of a
    # pthread_join() on the new threads.  With this set, Ctrl-C will
    # kill the server reliably.
    daemon_threads = True

    # By setting this we allow the server to re-bind to the address by
    # setting SO_REUSEADDR, meaning you don't have to wait for
    # timeouts when you kill the server and the sockets don't get
    # closed down correctly.
    allow_reuse_address = True

    def __init__(self, server_address, RequestHandlerClass, processor, message=''):
        SocketServer.TCPServer.__init__(self, server_address, RequestHandlerClass)
        self.processor = processor
        self.message = message

# The RequestHandler handles an incoming request.  We have extended in
# the SimpleServer class to have a 'processor' argument which we can
# access via the passed in server argument, but we could have stuffed
# all the processing in here too.
class SimpleRequestHandler(SocketServer.BaseRequestHandler):

    def __init__(self, request, client_address, server):
        SocketServer.BaseRequestHandler.__init__(self, request, client_address, server)

    def handle(self):
        self.request.send(self.server.message)

        text = ''
        done = False
        while not done:
            # block until the client has sent something (or hung up)
            ready_to_read, ready_to_write, in_error = select.select([self.request], [], [], None)
            if self.request not in ready_to_read:
                continue

            data = self.request.recv(1024)
            if not data:
                # an empty read means the client closed the connection
                break
            text += data

            # process every complete line buffered so far; a partial
            # line stays in text until the rest of it arrives
            while not done and text.find("\n") != -1:
                line, text = text.split("\n", 1)
                line = line.rstrip()

                command = self.server.processor.process(line,
                                                        self.request)
                if command == COMMAND_QUIT:
                    done = True

        self.request.close()

    def finish(self):
       """Nothing"""

def runSimpleServer():
    # Start up a server on localhost, port 7000; each time a new
    # request comes in it will be handled by a SimpleRequestHandler
    # class; we pass in a SimpleCommandProcessor class that will be
    # able to be accessed in request handlers via server.processor;
    # and a hello message.
    server = SimpleServer(('', 7000), SimpleRequestHandler,
                          SimpleCommandProcessor(), 'Welcome to the SimpleServer.\r\n')

    try:
        server.serve_forever()
    except KeyboardInterrupt:
        sys.exit(0)

if __name__ == '__main__':
    runSimpleServer()
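The fiddly part of handle() is that recv() returns whatever bytes have arrived, not whole lines. The handler therefore keeps a buffer and only acts on text up to a newline. Below is that logic pulled out into a standalone function, as a Python 3 sketch over str. On Python 3 the bytes from recv() would need decoding first. split_lines is a hypothetical helper, not part of SocketServer.

```python
def split_lines(buffered, data):
    """Add newly received text to a buffer and split off complete lines.

    Returns (lines, remainder): each complete line with trailing whitespace
    (including telnet's carriage return) stripped, plus any partial line
    still waiting for its newline.
    """
    text = buffered + data
    lines = []
    while "\n" in text:
        line, text = text.split("\n", 1)
        lines.append(line.rstrip())
    return lines, text
```

Feeding it "HELLO\r\nQU" yields ["HELLO"] with "QU" left over. The next chunk completes the QUIT line.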

Half baked Python Mutex

Suppose you find yourself in the following situation. You have to use separate processes, because Python doesn't really support sending signals with multiple threads, but you need some sort of mutual exclusion between those processes. Your additional Iron Python ingredient is that you don't really want to depend on any external modules not shipped with Python that implement SysV IPC or shared objects (though that is probably the most correct solution).

If you're a Python programmer and not a C programmer, you may not realise that mmap can map memory anonymously. This isn't mentioned in the help or, as far as I can tell, in the Python documentation. Anonymously means the mapping is not backed by a file but by system memory. Passing -1 as the file number along with MAP_SHARED|MAP_ANONYMOUS will work with most any reasonable Unix system, and a shared mapping created before fork() is visible to both parent and children.
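A quick way to convince yourself that an anonymous shared mapping really is shared: create it, fork, write from the child and read from the parent. Below is a minimal Python 3 sketch (Unix only). shared_byte_after_fork is a hypothetical name. On Python 3, indexing an mmap reads and writes plain integers.

```python
import mmap
import os


def shared_byte_after_fork():
    """Return the byte a forked child wrote into anonymous shared memory."""
    # fileno -1 asks for memory that is not backed by any file
    buf = mmap.mmap(-1, 16, mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    pid = os.fork()
    if pid == 0:
        # child: write into the shared page and exit without cleanup
        buf[0] = 42
        os._exit(0)
    # parent: wait for the child, then read what it wrote
    os.waitpid(pid, 0)
    value = buf[0]
    buf.close()
    return value
```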

Here is a Half Baked Mutex based on anonymous shared memory. This version really is half baked, because it uses the "Bakery Algorithm" designed by Lamport (which Peter Chubb taught me about in distributed systems). It differs slightly, though: our maximum ticket number must be less than 256. Python's mmap treats the mapped area as a string, so tickets are converted to and from single characters with chr and ord, and a single character only covers 0 to 255. This means heavily contested locks will be a bit "jerky" as the system waits to free up a few slots before continuing. We have a smattering of calls to sleep to attempt to reduce busy waiting. The only other caveat is "hashing" the PID down to an index into a 1024-byte array; a collision would probably be fatal to the algorithm.

It's not perfect, but something like it might get you out of a bind.

import os
import sys
from time import sleep

class HalfBakedMutex:
    def __init__(self):
        import mmap
        # C[i] is 1 while process i is choosing (taking) a ticket
        self.C = mmap.mmap(-1, 1024,
                           mmap.MAP_SHARED|mmap.MAP_ANONYMOUS)
        # N[i] is the ticket process i holds (0 means none)
        self.N = mmap.mmap(-1, 1024,
                           mmap.MAP_SHARED|mmap.MAP_ANONYMOUS)

    def take_a_number(self):
        # our slot in C and N is our PID "hashed" into 1024 entries
        i = os.getpid() % 1024

        # find the current maximum ticket
        while True:
            # indicate we are currently choosing a number
            self.C[i] = chr(1)

            highest = 0
            for j in range(1024):
                if ord(self.N[j]) > highest:
                    highest = ord(self.N[j])
            # tickets are stored as one character each, so the next
            # ticket must fit in chr(), i.e. be at most 255
            if highest + 1 < 256:
                break
            else:
                self.C[i] = chr(0)
                sleep(0.1)

        # take the next number after the maximum
        self.N[i] = chr(highest + 1)
        self.C[i] = chr(0)

    def lock(self):

        #first, take a number to see the baker
        self.take_a_number()

        i = os.getpid() % 1024

        for j in range(1024):
            #make sure the process isn't currently getting a ticket
            while ord(self.C[j]) == 1:
                sleep(0.1)
                continue

            # If process j has a ticket, i.e.
            #    N[j] > 0
            # AND either it has a lower ticket, or the same ticket and
            # a lower slot number, i.e.
            #   (N[j],j) < (N[i],i)
            # wait for it to run
            while (ord(self.N[j]) > 0) and (ord(self.N[j]),j) < (ord(self.N[i]),i) :
                sleep(0.1)
                continue

        #if we made it here, it is our turn to run!
        return

    def unlock(self):
        i = os.getpid() % 1024
        self.N[i] = chr(0)


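# the mutex must be created before forking, so that every process
# shares the same anonymous mappings
mut = HalfBakedMutex()

os.fork() # 2 processes
os.fork() # 4 processes
os.fork() # 8 processes
os.fork() # 16 processes
os.fork() # 32 processes
os.fork() # 64 processes

while True:
    mut.lock()
    print(" ------ " + str(os.getpid()) + " ------ ")
    mut.unlock()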
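The same algorithm ports to Python 3 with one wrinkle: indexing an mmap there reads and writes plain integers, so the chr and ord calls disappear. Below is a sketch under the same assumptions as above: Unix, anonymous MAP_SHARED mappings, the object created before fork(), and no PID-slot collisions. BakeryMutex and its attribute names are my own. It inherits every half baked caveat, and like the original it uses no explicit memory barriers.

```python
import mmap
import os
import time

SLOTS = 1024  # same table size as the original; PIDs are folded into it


class BakeryMutex:
    """Lamport's bakery lock over anonymous shared memory (Unix only).

    Create it before fork() so every process sees the same mappings.
    """

    def __init__(self):
        flags = mmap.MAP_SHARED | mmap.MAP_ANONYMOUS
        # choosing[i] == 1 while slot i is picking a ticket
        self.choosing = mmap.mmap(-1, SLOTS, flags)
        # tickets[i] is slot i's ticket; 0 means "not waiting"
        self.tickets = mmap.mmap(-1, SLOTS, flags)

    @staticmethod
    def _slot():
        # same PID "hash" as above; a collision still breaks the lock
        return os.getpid() % SLOTS

    def _take_a_number(self, i):
        while True:
            self.choosing[i] = 1
            # slicing an mmap gives bytes, and max() over bytes is an int
            highest = max(self.tickets[:])
            if highest < 255:  # the next ticket must fit in one byte
                break
            self.choosing[i] = 0
            time.sleep(0.1)
        self.tickets[i] = highest + 1
        self.choosing[i] = 0

    def lock(self):
        i = self._slot()
        self._take_a_number(i)
        for j in range(SLOTS):
            # wait while slot j is still choosing its ticket
            while self.choosing[j] == 1:
                time.sleep(0.01)
            # wait while j holds a (ticket, slot) pair that orders before ours
            while (self.tickets[j] != 0 and
                   (self.tickets[j], j) < (self.tickets[i], i)):
                time.sleep(0.01)

    def unlock(self):
        self.tickets[self._slot()] = 0
```

Usage mirrors the demo above: create the BakeryMutex, fork, and wrap each critical section in lock() and unlock().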