Shared memory on Linux

Thu 08 November 2007

You may have noticed a range of things mounted on a tmpfs file system.

$ mount | grep tmpfs
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)

tmpfs is actually more pervasive than it might appear. It was created to build a filesystem interface ontop of shared pages of RAM. This is useful because there are a number of ways to access shared RAM; notably SysV IPC Shared Memory (shmget, shmat, etc) and mmap of anonymous memory (MAP_ANONYMOUS).

In order to have a consistent infrastructure implementing all of this the kernel abstracts RAM as a filesystem and your mapping as a file. To this end the tmpfs filesystem was created to present this abstraction. It deals with issues such as caching, security and the ability to swap pages in a consistent manner (i.e. using the same infrastructure every other file system does).

Overall, tmpfs is implemented in mm/shmem.c. In the initialisation function the file system is mounted internally by the kernel with vfs_kern_mount so it is always around. It gets fairly complicated because it has to keep track of swapped out pages and the general complexity that come with implementing a full filesystem.

As described above, there are several distinct ways to get access to this file system.

The SysV IPC shared memory layer (ipc/shm.c) is a wrapper layer which manages IPC keys and internally plugs itself into the tmpfs code. Essentially creating a key with shmget() ends up calling ipc/shmem.c:newseg() which calls mm/shmem.c:shmem_file_setup() which then creates a struct file for the shared memory. This new file reuses the file operations from the tmpfs filesystem, but is not actually put into the file system (i.e. linked to the superblock). When shat() is called to attach to this memory the SysV IPC layer will internally translate this into a mmap of this file. This is what is used by the Shared Global Area of Oracle and some Java implementations, amongst many other things (see /proc/sysvipc/shm for a comprehensive list).
The second common thing to do is an anonymous mmap. By default MAP_ANONYMOUS ignores the file-descriptor, but an analagous methodology is to explicitly map /dev/zero. If you look at the implementation of /dev/zero in drivers/char/mem.c you can see code such as
```
static int mmap_zero(struct file * file, struct vm_area_struct * vma)
{
        int err;

        if (vma->vm_flags & VM_SHARED)
                return shmem_zero_setup(vma);
```
which ends up backing shared areas with tmpfs.
The third way is actually mounting tmpfs and using it from userspace as a normal file system. glibc implements the POSIX shared memory call shm_open by creating and mmaping files on a tmpsfs mount (by default looking in /dev/shm). Other utilities also use the tmpfs semantics and mount it at other points; the obvious one of course being mounting /tmp on it!

tmpfs: more than meets the eye!

Update: hopefully a little clearer, thanks to Helge Bahmann for comments.