Python and --prefix

Something interesting I discovered about Python and --prefix that I can't see a lot of documentation on...

When you build Python you can use the standard --prefix flag to configure to home the installation as you require. You might expect that this would hard-code the location to look for the support libraries to the value you gave; however in reality it doesn't quite work like that.

Python will only look in the directory specified by prefix after it first searches relative to the path of the executing binary. Specifically, it looks at argv[0] and works through a few steps — is argv[0] a symlink? then dereference it. Does argv[0] have any slashes in it? if not, then search the $PATH for the binary. After this, it starts searching for dirname(argv[0])/lib/pythonX.Y/os.py, then dirname(argv[0])/../lib and so on, until it reaches the root. Only after these searches fail does the interpreter then fall back to the hard-coded path specified in the --prefix when configured.

What is the practical implications? It means you can move around a python installation tree and have it all "just work", which is nice. In my situation, I noticed this because we have a completely self-encapsulated build toolchain, but we wish to ship the same interpreter on the thing that we're building (during the build, we run the interpreter to create .pyc files for distribution, and we need to be sure that when we did this we didn't accidentally pick up any of the build hosts python; only the toolchain python).

The PYTHONHOME environment variable does override this behaviour; if it is set then the search stops there. Another interesting thing is that sys.prefix is therefore not the value passed in by --prefix during configure, but the value of the current dynamically determined prefix value.

If you run an strace, you can see this in operation.

readlink("/usr/bin/python", "python2.7", 4096) = 9
readlink("/usr/bin/python2.7", 0xbf8b014c, 4096) = -1 EINVAL (Invalid argument)
stat64("/usr/bin/Modules/Setup", 0xbf8af0a0) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/lib/python2.7/os.py", 0xbf8af090) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/lib/python2.7/os.pyc", 0xbf8af090) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/python2.7/os.py", {st_mode=S_IFREG|0644, st_size=26300, ...}) = 0
stat64("/usr/bin/Modules/Setup", 0xbf8af0a0) = -1 ENOENT (No such file or directory)
stat64("/usr/bin/lib/python2.7/lib-dynload", 0xbf8af0a0) = -1 ENOENT (No such file or directory)
stat64("/usr/lib/python2.7/lib-dynload", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0

Firstly it dereferences symlinks. Then it looks for Modules/Setup to see if it is running out of the build tree. Then it starts looking for os.py, walking its way upwards. One interesting thing that may either be a bug or a feature, I haven't decided, is that if you set the prefix to / then the interpreter will not go back to the root and then look in /lib. This is probably pretty obscure usage though!

All this is implemented in Modules/getpath.c which has a nice big comment at the top explaining the rules in detail.