Articles in the web category


I find it very useful to spend 5 minutes a day keeping a small log of what I worked on, major bugs or reviews, and a short general status report. It makes rolling up into a bigger status report easier when required, and it's a handy reference before going into meetings.

I was happily using an etherpad page until I couldn't save any more revisions, and the page got so long that it started giving javascript timeouts. For a replacement I wanted a single input file with no boilerplate, so that back-referencing and adding entries would be quick. It should be in a future-proof format, as well as being emacs, makefile and git friendly. Output should be web-based so I can refer to it easily and point people at it when required, but it should only need to be rsynced to public_html with zero setup.

rstdiary takes a flat RST-based input file and chunks it into some reasonable-looking static HTML, split by month with some minimal navigation. Copy the output directory somewhere and you're done.

It might also serve as a small example of parsing and converting RST nodes where it does the chunking; unfortunately the official documentation on that is "to be completed" and I couldn't find anything like a canonical example, so I gathered what I could from looking at the source of the transformation stuff. As the license says, the software is provided "as is" without warranty!
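rstdiary does its chunking on docutils nodes. As a much rougher illustration of the split-by-month idea, here is a line-based sketch that uses only the standard library. It assumes that each entry starts with an ISO date such as 2014-03-05 on its own line, underlined with RST section punctuation. This format is hypothetical and is not necessarily rstdiary's actual input format. chunk_by_month is also an illustrative name, not part of rstdiary.

```python
import re

# Hypothetical entry heading: an ISO date alone on a line...
HEADING = re.compile(r"^(\d{4})-(\d{2})-(\d{2})$")
# ...followed by an RST section underline such as "----------".
UNDERLINE = re.compile(r"^([=\-~^*+#])\1+$")


def chunk_by_month(rst_text):
    """Group diary entries into {"YYYY-MM": [entry_text, ...]}.

    Text before the first dated heading is ignored; months keep the
    order in which they first appear in the file.
    """
    lines = rst_text.splitlines()
    months = {}
    entry = None  # lines of the entry currently being collected
    for i, line in enumerate(lines):
        next_line = lines[i + 1] if i + 1 < len(lines) else ""
        match = HEADING.match(line)
        if match and UNDERLINE.match(next_line):
            # A dated heading starts a new entry in that month's chunk.
            entry = []
            month = f"{match.group(1)}-{match.group(2)}"
            months.setdefault(month, []).append(entry)
        if entry is not None:
            entry.append(line)
    return {month: ["\n".join(e).strip() for e in entries]
            for month, entries in months.items()}


if __name__ == "__main__":
    diary = "2014-03-30\n----------\nFixed the build.\n\n2014-04-01\n----------\nReviewed patches.\n"
    for month, entries in chunk_by_month(diary).items():
        print(month, len(entries))
```

Each month's list could then be rendered to its own HTML page. A real implementation would let docutils find the sections rather than pattern-matching lines.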

So if you've been thinking "I should keep a short daily journal in a flat-file and publish it to a web-server but I can't find any software to do just that" you now have one less excuse.

Skipping pages with django.core.paginator

Here's a little snippet for shortening long Django pagination by only showing context around the currently selected page. What I wanted was the first few and last few pages always selectable, with some context pages around the currently selected page; e.g.

Example of skip markers in paginator

If that's what you're looking for, some variation on the code below may be of use. In this approach, you build up a list of pages similar to the paginator object's page_range, but containing only the relevant pages, with the skip-markers identified.

from django.core.paginator import Paginator
import unittest

class Pages:

    def __init__(self, objects, count):
        self.pages = Paginator(objects, count)

    def pages_to_show(self, page):
        # pages_wanted stores the pages we want to see, e.g.
        #  - first and second page always
        #  - two pages before selected page
        #  - the selected page
        #  - two pages after selected page
        #  - last two pages always
        # Turning the pages into a set removes duplicates for edge
        # cases where the "context pages" (before and after the
        # selected) overlap with the "always show" pages.
        pages_wanted = set([1, 2,
                            page-2, page-1,
                            page,
                            page+1, page+2,
                            self.pages.num_pages-1, self.pages.num_pages])

        # The intersection with the page_range trims off the invalid
        # pages outside the total number of pages we actually have.
        # Note that includes the invalid zero/negative and >num_pages
        # "context pages" which we added above.
        pages_to_show = set(self.pages.page_range).intersection(pages_wanted)
        pages_to_show = sorted(pages_to_show)

        # skip_pages will keep a list of page numbers from
        # pages_to_show that should have a skip-marker inserted
        # before them.  For flexibility this is done by looking for
        # anywhere in the list that doesn't increment by 1 over the
        # previous entry.
        skip_pages = [x[1] for x in zip(pages_to_show[:-1],
                                        pages_to_show[1:])
                      if (x[1] - x[0] != 1)]

        # Each page in skip_pages should be preceded by a skip-marker
        # sentinel (e.g. -1).
        for i in skip_pages:
            pages_to_show.insert(pages_to_show.index(i), -1)

        return pages_to_show

class TestPages(unittest.TestCase):

    def runTest(self):

        objects = [x for x in range(0,1000)]
        p = Pages(objects, 10)    # 1000 objects, 10 per page: 100 pages

        self.assertEqual(p.pages_to_show(1),
                         [1, 2, 3, -1, 99, 100])
        self.assertEqual(p.pages_to_show(50),
                         [1, 2, -1, 48, 49, 50, 51, 52, -1, 99, 100])
        self.assertEqual(p.pages_to_show(100),
                         [1, 2, -1, 98, 99, 100])

if __name__ == '__main__':
    unittest.main()

Then pass pages_to_show through to your template somehow (below I attached it to the pagination object passed in) and use a template along the lines of:

<ul class="pagination">

{% if pages.has_previous %}
  <li><a href="foo.html?page={{ pages.previous_page_number }}">&laquo;</a></li>
{% else %}
  <li class="disabled"><a href="#">&laquo;</a></li>
{% endif %}

{% for page in pages.pages_to_show %}
  {% if page == -1 %}
  <li class="disabled"><a href="#">&hellip;</a></li>
  {% elif page == pages.number %}
  <li class="active"><a href="#">{{ page }}</a></li>
  {% else %}
  <li><a href="foo.html?page={{ page }}">{{ page }}</a></li>
  {% endif %}
{% endfor %}

{% if pages.has_next %}
  <li><a href="foo.html?page={{ pages.next_page_number }}">&raquo;</a></li>
{% else %}
  <li class="disabled"><a href="#">&raquo;</a></li>
{% endif %}

</ul>


Adding a javascript tracker to your Docbook document

If you convert a docbook document into chunked HTML, it can be very useful to place one of the various available javascript-based trackers on each of your pages to see what is most popular, what is being linked to, etc.

I'm assuming you've already got to the point where you have your DSSSL stylesheet going (see something like my prior entry).

It doesn't appear you can get arbitrary output into the <head> tag -- the %html-header-tags% variable is really just designed for META tags and there doesn't appear to be anything else in the standard stylesheets to override.

So the trick is to use $html-body-start$, but you have to be a little bit sneaky to get your javascript past the SGML parser. After several attempts and a bit of googling I ended up with the following for a Google Analytics tracker, using string-append to slip the javascript comment markers (<!-- and -->) in as separate pieces:

(define ($html-body-start$)
  (make element gi: "script"
        attributes: '(("language" "javascript")
                      ("type" "text/javascript"))
        (make formatting-instruction
          data: (string-append "<" "!--
  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXXXXXX-1']);
  _gaq.push(['_trackPageview']);

  (function() {
    var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
    ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '';
    var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
  })();
// --" ">"))))

I think I've just about finished my Thanksgiving project

I'm a little bit obsessive about geotagging my photos, and while I know there are many photo management solutions out there that can do it in various ways, I generally find it quicker and easier to use exiv2 and simple shell scripts to embed the location info directly into my files.
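If you want to do the same from a script, here is a minimal sketch. It is not my actual shell scripts. It converts a decimal latitude/longitude, such as you would copy out of Google Maps, into the degrees/minutes/seconds rationals that EXIF expects, then builds an exiv2 command line using its -M "set ..." modify commands. The helper names are hypothetical. Storing seconds in hundredths is an arbitrary precision choice.

```python
def to_dms_rational(decimal_degrees):
    """Convert decimal degrees to an EXIF rational triple "D/1 M/1 S/100".

    The sign is dropped; it is carried by the N/S or E/W "Ref" tag.
    Working in integer hundredths of a second avoids ever producing a
    rounded-up "60" seconds or minutes.
    """
    total = round(abs(decimal_degrees) * 3600 * 100)
    degrees, remainder = divmod(total, 3600 * 100)
    minutes, hundredths = divmod(remainder, 60 * 100)
    return f"{degrees}/1 {minutes}/1 {hundredths}/100"


def exiv2_gps_args(lat, lon, filename):
    """Build (but do not run) an exiv2 command that writes GPS tags."""
    return [
        "exiv2",
        "-M", f"set Exif.GPSInfo.GPSLatitude {to_dms_rational(lat)}",
        "-M", f"set Exif.GPSInfo.GPSLatitudeRef {'N' if lat >= 0 else 'S'}",
        "-M", f"set Exif.GPSInfo.GPSLongitude {to_dms_rational(lon)}",
        "-M", f"set Exif.GPSInfo.GPSLongitudeRef {'E' if lon >= 0 else 'W'}",
        filename,
    ]


if __name__ == "__main__":
    print(" ".join(exiv2_gps_args(-33.8568, 151.2153, "photo.jpg")))
```

Passing the returned list to subprocess.run(args, check=True) would then write the tags into the file. Building an argument list rather than a shell string avoids any quoting problems with spaces in the -M commands.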

I've tried a number of things, and none has worked out better than simply using Google Maps and panning around to find locations. I even bought a GPS tracker which would supposedly tag my photos automatically, assuming of course it could ever get a GPS lock, it hadn't run out of batteries and corrupted its filesystem, all your clocks were in sync, and you could figure out the various timezone issues, daylight saving changes, etc. I always feel safer having all my metadata embedded in the actual files, just in case Yahoo ever pulls the plug on Flickr (I use a little Python script with IPTC bindings for comments, and then back everything up, similarly obsessively, locally and to Amazon S3).

The site is fairly simple in concept: it allows you to search for locations, easily extract the geotag info, and save frequently used locations for reference.

Mostly it was an exercise for me to implement something after reading the excellent Javascript patterns, using YUI3, Google App Engine and OpenID, all of which I managed to cram in.

Although the audience may be limited (maybe just to me :) I hope someone else finds it useful for managing their memories! If you think this might be useful and would like the output in some other format, just let me know.

Logging POST requests with Apache

After getting a flood of spam, I became suspicious that there was an exploit in my blog software allowing easy robo-posts. A code audit didn't turn anything up, so I wanted to log the incoming POST requests before any local processing happened at all.

It took me a while to figure out how to do this, so hopefully this helps someone else. First install libapache-mod-security; then the magic incantation is:

SecRuleEngine On
SecAuditEngine on
SecAuditLog /var/log/apache2/website-audit.log
SecRequestBodyAccess on
SecAuditLogParts ABIFHZ

SecDefaultAction "nolog,noauditlog,allow,phase:2"

SecRule REQUEST_METHOD "^POST$" "chain,allow,phase:2"
SecRule REQUEST_URI ".*" "auditlog"

So, to break it down a little: the default action says to do nothing during phase 2 (when the request body is available for inspection), and the allow indicates that nothing further will happen in any of the remaining phases, so the module can shortcut through them. The two SecRules work together. The first says that any POST request should be tested by the next rule (i.e. the chained rule). In this case that rule matches any request URI, so every such request is sent to the audit log. The allow/phase arguments on the chain again say that nothing further is going to happen in any of the subsequent phases mod_security can work on. As for SecAuditLogParts, A (audit log header) and Z (final boundary) are mandatory. Between them we log the request headers (B), the request body (I, which omits any uploaded files), the final response headers (F) and the audit log trailer (H).

So, as it turns out, there is no exploit; it seems most likely there is an actual human behind the spam that gets through, because every time they take a guess it is correct. So I guess I'll take a glass-half-full kind of approach and rather than being annoyed at removing the spam, I'll just convince myself that I made a small donation from some spam overlord to one of their poor minions!

YUI ButtonGroup Notes

Some tips and things to check if your YUI ButtonGroup isn't behaving as you wish it would.

  • Double-check your <body> tag has class="yui-skin-sam"

  • Unlike in the documentation example, you can't just put a call to YAHOO.widget.ButtonGroup pointing to your div anywhere in your HTML and expect it to work. You've got to wait for it to be ready with something like:

    <script type="text/javascript">
    YAHOO.util.Event.onContentReady("my_button_div", function() {
      var oButtonGroup = new YAHOO.widget.ButtonGroup("my_button_div");
    });
    </script>

  • You can easily get an image in each button. For example, if your button is defined as:

    <span id="my-button-id" class="yui-button yui-radio-button yui-button-checked">
     <span class="first-child">
       <button type="button" hidefocus="true"></button>
     </span>
    </span>

    Simply add a CSS class something like:

    .yui-button#my-button-id button { background:url(http://server/image.jpg) 50% 50% no-repeat; }

Hopefully, this will save someone else a few hours!

Facebook, APIs, photos and IPTC data

As a photo management application, Facebook sucks. But it is something people actually look at (as opposed to Flickr, which is great, but getting people to log in or follow special guest-pass links is a PITA).

I like to keep all my raw photos locally, with comments stored as IPTC data (which Flickr reads; I put them in using some custom scripts and the Python bindings of libiptcdata) and geotags in the EXIF data (using my Google Maps point locator). I figure this way, if Flickr goes bust or gets bought by Microsoft, all I need to do is re-upload somewhere else.

I was waiting for Flickr to integrate with Facebook in some good way, but then I came across the very useful pyfacebook bindings. Although they are a little light on documentation, they are a great way to easily throw my photos into Facebook (the package is pending in the NEW queue in Debian, see #511279).

My script might be a useful starting point if you want to do the same thing. It batches photos into lots of 60 (the maximum number of photos in an album), automatically creates the albums and uploads the photos, reading the IPTC data for comments. The only catch is that you'll have to sign up for a developer key and create a new application to get a secret key to talk to the API (if you're still reading this, I'm sure you can figure it out!).
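Here is a minimal sketch of just the batching step: splitting a list of photo files into album-sized lots of at most 60 and giving each album a name. The "(1 of 3)" naming scheme and the function names are hypothetical. The actual album creation and upload calls through pyfacebook are deliberately left out.

```python
from itertools import islice

ALBUM_LIMIT = 60  # maximum photos per album, as mentioned above


def batches(photos, size=ALBUM_LIMIT):
    """Yield lists of at most `size` photos, preserving their order."""
    it = iter(photos)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def plan_albums(title, photos, size=ALBUM_LIMIT):
    """Return (album name, photos) pairs using a hypothetical naming scheme."""
    lots = list(batches(photos, size))
    if len(lots) == 1:
        return [(title, lots[0])]
    return [(f"{title} ({n} of {len(lots)})", lot)
            for n, lot in enumerate(lots, start=1)]


if __name__ == "__main__":
    photos = [f"IMG_{n:04d}.jpg" for n in range(1, 131)]
    for name, lot in plan_albums("Holiday", photos):
        print(name, len(lot), lot[0], lot[-1])
```

Using islice over an iterator means the input can be any iterable, such as a sorted directory listing, without first copying it into a list.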

Scrambled text

There's this thing going around on Facebook suggesting you are smart if you can read a paragraph where the middle letters of the words are scrambled.

Everything old is new again; I remember reading about this in 2003. As far as I remember, being able to read the scrambled text was not related to intelligence in any way.

Anyway, as a Friday afternoon distraction I wrote a little javascript to scramble text.

This is a paragraph that you will still be able to read even after it is scrambled, the only rule being the first and last letters remain in place. This javascript will try and be fairly smart about not moving around non-alphabetic characters too.


Ahh, the good-olde Fisher-Yates shuffle algorithm, friend of the first year tutorial!
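The scrambler itself is javascript, but the approach translates directly. Here is a Python sketch, not the page's actual script, that applies a Fisher-Yates shuffle to the middle letters of each word. This sketch assumes a "word" is a run of ASCII letters. That assumption is what keeps punctuation, digits and apostrophes where they were.

```python
import random
import re

# A "word" here is a run of ASCII letters, so punctuation, digits and
# apostrophes are never moved.
WORD = re.compile(r"[A-Za-z]+")


def fisher_yates(items, rng):
    """Shuffle `items` in place with the Fisher-Yates (Knuth) algorithm."""
    for i in range(len(items) - 1, 0, -1):
        j = rng.randint(0, i)  # inclusive: 0 <= j <= i
        items[i], items[j] = items[j], items[i]


def scramble_word(word, rng):
    """Shuffle everything except the first and last letters."""
    if len(word) <= 3:
        return word  # at most one middle letter, nothing to shuffle
    middle = list(word[1:-1])
    fisher_yates(middle, rng)
    return word[0] + "".join(middle) + word[-1]


def scramble(text, rng=None):
    """Scramble the middle letters of every word in `text`."""
    if rng is None:
        rng = random.Random()
    return WORD.sub(lambda m: scramble_word(m.group(0), rng), text)


if __name__ == "__main__":
    print(scramble("This paragraph is still readable after it is scrambled!"))
```

Passing in a seeded random.Random makes the output reproducible, which is handy for testing. Choosing j from 0 to i inclusive is what makes every permutation equally likely. Picking from the whole list each time is the classic first-year mistake.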