"Get your fingers off my /proc/getpid()/*"
Closing the procfs hole in NetBSD
Hubert Feyrer <hubertf@netbsd.org>, April 2000

What is procfs?

Every Unix(-like) system keeps a lot of information about itself that you can query: which network connections are open, the settings of kernel parameters, filesystem specifics, and information related to user processes.

All this information is stored inside the kernel, and to access it, one has to know in which "variables" the kernel stores it, then dig into kernel memory to get the values. This approach has several disadvantages. It's not portable, as different operating systems store the information in different formats and under different names. Access also can't be restricted to individual items: a malicious process can read or write any information in the kernel, not only the bits it is interested in. Access to kernel memory is usually handled via /dev/kmem; on BSD-based systems a user or process has to be a member of the group "kmem" to access this device file. If such a process does something it shouldn't, the system's security can be compromised. This is what programs exploiting so-called "buffer overruns" rely on: they write past buffer boundaries to place their own program code in the process, which then executes this malicious code. The results range from harmless core dumps through Denial of Service attacks to modifications of the system, usually installing backdoors.

The problems of /dev/kmem-based access to kernel data structures have led to the design of several alternatives. One of them is the "sysctl" facility, usually found on BSD-based systems. With sysctl, one accesses information stored in a MIB structure like that of SNMP; e.g. to read a setting of the IP stack, one would specify "net.inet.tcp.keepidle" to access (only) that bit of information. MIB entries are either read-only or read-write, so you cannot modify read-only values such as the kernel's load average.

A drawback of sysctl is that the MIB must be specified as a series of numbers rather than as a string, and that a dedicated interface like this departs from the traditional "everything is a file" approach.
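To make the "series of numbers" point concrete, here is a small sketch of how a dotted name could be translated into an integer MIB. The table, the numeric values and the function name_to_mib() are all invented for illustration; on a real BSD system the numbers come from the system headers, and a program hands such an integer array to sysctl(3).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIB_MAX 4

/* One dotted name and the integer path it stands for. */
struct mib_entry {
    const char *name;
    int mib[MIB_MAX];
    size_t len;
};

/* Illustrative numbers only, NOT taken from any real <sys/sysctl.h>. */
static const struct mib_entry mib_table[] = {
    { "kern.ostype",           { 1, 1 },       2 },
    { "net.inet.tcp.keepidle", { 4, 2, 6, 3 }, 4 },
};

/*
 * Translate a dotted name into its numeric MIB.
 * On entry *len is the capacity of mib[]; on success it is set to the
 * number of components written. Returns 0 on success, -1 if the name
 * is unknown or the buffer is too small.
 */
int name_to_mib(const char *name, int *mib, size_t *len)
{
    size_t i, j;

    assert(name != NULL && mib != NULL && len != NULL);
    for (i = 0; i < sizeof(mib_table) / sizeof(mib_table[0]); i++) {
        if (strcmp(name, mib_table[i].name) != 0)
            continue;
        if (*len < mib_table[i].len)
            return -1;
        for (j = 0; j < mib_table[i].len; j++)
            mib[j] = mib_table[i].mib[j];
        *len = mib_table[i].len;
        return 0;
    }
    return -1;
}
```

A caller would declare "int mib[MIB_MAX]; size_t len = MIB_MAX;" and then call name_to_mib("net.inet.tcp.keepidle", mib, &len). The extra translation step is exactly what a filesystem interface avoids, since there the name itself is the handle.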

Following this concept more closely are several filesystems, which make certain information from the kernel visible to user space via a filesystem interface:

fdesc:
Provides access to process' file descriptors
kernfs:
Provides various kernel related parameters like OS revision, load average, ...
procfs:
Provides information related to processes, including a list of running processes (by PID) and, for each process, information such as its command line (argv), references to its executable file and process memory, CPU status registers, a facility to send signals from e.g. debuggers, etc.

These three filesystems and the contents described are specific to BSD; other Unix-like operating systems provide different filesystems, contents and/or methods to access the information stored there.

Background on the procfs Hole

Recently, a procfs related security exploit became available. We'll tell you about the technical details here, and how this was fixed in a general way in NetBSD. See also the NetBSD Security Advisory SA#2000-001.

Basically, the exploit opens /proc/<pid>/mem, seeks to a stack address, and then uses this file descriptor as stderr. Then you fork and exec two setuid programs and make one of them write to stderr, which points into the stack of the other setuid process via /proc/<pid>/mem. That way, you can manipulate the second process in an arbitrary way, just like any buffer overflow exploit does.
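The descriptor mechanics the exploit leans on are ordinary Unix behaviour and can be shown harmlessly on a regular file. The following sketch involves no procfs and no setuid program; the function name, the temporary file path and the offset are made up for illustration. The parent chooses a file offset, and a forked child that never seeks itself writes through the inherited descriptor. Its byte lands where the parent pointed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHOSEN_OFFSET 8   /* arbitrary position picked by the parent */

/*
 * Returns the offset at which the child's byte ended up in the file,
 * or -1 on any error.
 */
long inherited_offset_demo(void)
{
    char path[] = "/tmp/fdsketchXXXXXX";
    char buf[16];
    char *hit;
    pid_t pid;
    int fd, status;

    assert(CHOSEN_OFFSET < sizeof(buf));

    fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);                       /* file vanishes when fd is closed */

    memset(buf, '.', sizeof(buf));
    if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto fail;
    if (lseek(fd, CHOSEN_OFFSET, SEEK_SET) == (off_t)-1)
        goto fail;

    pid = fork();
    if (pid == -1)
        goto fail;
    if (pid == 0) {
        /* Child: shares the open file, including the parent's offset. */
        _exit(write(fd, "X", 1) == 1 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) == -1 ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        goto fail;

    /* Read the whole file back and look for the child's marker. */
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1 ||
        read(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf))
        goto fail;
    close(fd);
    hit = memchr(buf, 'X', sizeof(buf));
    return hit != NULL ? (long)(hit - buf) : -1;

fail:
    close(fd);
    return -1;
}
```

The same sharing survives exec(2) for any descriptor not marked close-on-exec, which is why a program that merely writes an error message to its stderr can end up writing wherever the opener of that descriptor aimed it.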

Writing to the other process' memory is possible because the procfs descriptor is left open after the parent process exec()s. A possible fix to this is to mark the descriptor as close-on-exec automatically from the kernel, but the process could unset this. A better fix is to invalidate the descriptor when the process it points to calls exec(2).
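Why close-on-exec alone is not enough can be seen from user space. The sketch below uses only fcntl(2) with F_GETFD and F_SETFD; the helper names are invented for this example, and a pipe stands in for the procfs descriptor. It sets the flag the way a kernel-side fix might, then clears it again without any special privilege.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Return 1 if fd is marked close-on-exec, 0 if not, -1 on error. */
int is_cloexec(int fd)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    return flags == -1 ? -1 : (flags & FD_CLOEXEC) != 0;
}

/* Turn FD_CLOEXEC on (on != 0) or off (on == 0). 0 on success, -1 on error. */
int set_cloexec(int fd, int on)
{
    int flags;

    assert(fd >= 0);
    flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    flags = on ? (flags | FD_CLOEXEC) : (flags & ~FD_CLOEXEC);
    return fcntl(fd, F_SETFD, flags) == -1 ? -1 : 0;
}

/*
 * Returns 1 if a descriptor that had close-on-exec set could have the
 * flag removed again by the same unprivileged process, 0 if the flag
 * stuck, -1 on error.
 */
int cloexec_can_be_undone(void)
{
    int fds[2];
    int result = -1;

    if (pipe(fds) == -1)
        return -1;
    if (set_cloexec(fds[0], 1) == 0 && is_cloexec(fds[0]) == 1 &&
        set_cloexec(fds[0], 0) == 0)
        result = (is_cloexec(fds[0]) == 0);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Since the flag is under the control of the very process that holds the descriptor, the kernel has to invalidate the descriptor itself if it wants a guarantee.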

Implementation of this invalidation can be done in the exec-module of the kernel, or in a more general fashion, using a generic "process-exec hook", that can be used for other purposes, should the need arise.
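As a rough picture of what a generic hook facility of this kind looks like, here is a user-space sketch of a hook list: callbacks are registered with an argument, every registered callback is run when a process "execs", and a callback can be removed again. All names (struct proc_sketch, exec_hook_establish() and so on) are hypothetical and are not meant to mirror the identifiers in the NetBSD sources. Hooks return nothing, matching the remark in the interview below that the return value isn't checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the kernel's per-process structure. */
struct proc_sketch {
    int pid;
    int hook_calls;     /* how many hooks have run for this process */
};

typedef void (*exec_hook_fn)(struct proc_sketch *p, void *arg);

struct exec_hook {
    exec_hook_fn fn;
    void *arg;
    struct exec_hook *next;
};

static struct exec_hook *exec_hooks;    /* head of the hook list */

/* Register a hook; returns an opaque cookie for later removal, or NULL. */
void *exec_hook_establish(exec_hook_fn fn, void *arg)
{
    struct exec_hook *h;

    assert(fn != NULL);
    h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    h->fn = fn;
    h->arg = arg;
    h->next = exec_hooks;
    exec_hooks = h;
    return h;
}

/* Remove a previously registered hook; unknown cookies are ignored. */
void exec_hook_disestablish(void *cookie)
{
    struct exec_hook **pp;

    for (pp = &exec_hooks; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == cookie) {
            struct exec_hook *dead = *pp;
            *pp = dead->next;
            free(dead);
            return;
        }
    }
}

/* Called from the (imaginary) exec path: run every registered hook. */
void run_exec_hooks(struct proc_sketch *p)
{
    struct exec_hook *h;

    for (h = exec_hooks; h != NULL; h = h->next)
        h->fn(p, h->arg);
}

/* Example hook: count invocations per process and in *arg. */
void count_exec(struct proc_sketch *p, void *arg)
{
    int *total = arg;

    p->hook_calls++;
    if (total != NULL)
        (*total)++;
}
```

With an empty list, run_exec_hooks() is a single pointer comparison, so the cost on the exec path stays small as long as few hooks are registered.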

The Fix - Interview with Frank van der Linden

Frank van der Linden (<fvdl>) explained to me (<hubertf>) how NetBSD solves the procfs security hole. It boils down to the kernel doing a "get your fingers off my /proc/getpid()/anything" for the exec'ing process:

<hubertf> Frank, can you tell me about the exec-hook you added?
<fvdl> It's a simple interface, the same as e.g. shutdown hooks.
<hubertf> I can imagine what shutdown hooks do, but exec hooks? Are they called before any exec ?
<fvdl> Yes.
<hubertf> This sounds slow. What sort of hooks would one add there?
<fvdl> Why would it be?
<hubertf> Traversing a list of hooks, calling a function, checking the return value - sounds slow to me (but what do I know...)
<fvdl> In this case, there is only one hook present, and only if a process is sugid, was accessed through procfs, and execs. The return value isn't checked. If you look at everything that's going on during an exec(), it's minor.
<hubertf> So what does that hook then do - check if stderr is on a procfs mem file, and bomb out if so?
<fvdl> The hook revokes all vnodes that reference the process, through procfs, if it's about to exec an suid binary.
<hubertf> Why revoke all vnodes, not only the ones for stdin/out/err, i.e. for file descriptors 0, 1 and 2?
<fvdl> The kernel should not have knowledge about their special status. Any potential problems with their special status should be solved in userspace.
<hubertf> OK. So getting back to the exploit, that evil binary will not get a maliciously set up stderr, even though it tried to do so?
<fvdl> It won't get a bad stderr, because the vnode for it was nuked.
<hubertf> Like a closed stderr?
<fvdl> The process trying to write it will get EIO, see revoke(2) (a low-level, in-kernel version of it of course)
<hubertf> Ok. So, this exec-hook basically says "get your fingers off my /proc/.../mem" ?
<fvdl> "get your fingers off my /proc/getpid()/anything"
<hubertf> Thank you for your time! :)
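
The effect Frank describes can be modelled in a few lines of user-space C. This is only a sketch under simplifying assumptions: a fixed table of "handles" stands in for procfs vnodes, and the names handle_open(), handle_write() and revoke_for_pid() are invented here. What it shows is the behaviour from the interview: once everything referring to a process has been revoked, any further write through an old handle fails with EIO, whichever descriptor number it happens to sit on.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_HANDLES 8

/* Stand-in for a procfs vnode that refers to some target process. */
struct handle {
    int in_use;
    int target_pid;
    int revoked;
};

static struct handle handles[MAX_HANDLES];

/* Open a handle on target_pid; returns its index, or -1 if the table is full. */
int handle_open(int target_pid)
{
    int i;

    for (i = 0; i < MAX_HANDLES; i++) {
        if (!handles[i].in_use) {
            handles[i].in_use = 1;
            handles[i].target_pid = target_pid;
            handles[i].revoked = 0;
            return i;
        }
    }
    return -1;
}

/* Pretend to write len bytes; fails with EIO once the handle is revoked. */
long handle_write(int h, const void *buf, size_t len)
{
    assert(buf != NULL);
    if (h < 0 || h >= MAX_HANDLES || !handles[h].in_use) {
        errno = EBADF;
        return -1;
    }
    if (handles[h].revoked) {
        errno = EIO;
        return -1;
    }
    return (long)len;
}

/* Revoke every handle that refers to pid; returns how many were hit. */
int revoke_for_pid(int pid)
{
    int i, n = 0;

    for (i = 0; i < MAX_HANDLES; i++) {
        if (handles[i].in_use && !handles[i].revoked &&
            handles[i].target_pid == pid) {
            handles[i].revoked = 1;
            n++;
        }
    }
    return n;
}
```

Note that revoke_for_pid() never asks which descriptor slot a handle occupies, which matches Frank's point that the kernel should not treat file descriptors 0, 1 and 2 specially.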

Further reading

For everyone interested, the changes can be viewed via the NetBSD AnonCVS service using the following commands:

setenv CVSROOT anoncvs@anoncvs.netbsd.org:/cvsroot
cvs rdiff -r1.106 -r1.107 syssrc/sys/kern/kern_exec.c
cvs rdiff -r1.52 -r1.53 syssrc/sys/kern/kern_subr.c
cvs rdiff -r1.100 -r1.101 syssrc/sys/sys/systm.h
cvs rdiff -r1.27 -r1.28 syssrc/sys/miscfs/procfs/procfs.h
cvs rdiff -r1.28 -r1.29 syssrc/sys/miscfs/procfs/procfs_subr.c
cvs rdiff -r1.31 -r1.32 syssrc/sys/miscfs/procfs/procfs_vfsops.c

Thanks

This article was composed from facts and hints from Frank van der Linden, Jason Thorpe and Charles Hannum. Many thanks to them for explaining things!


This text was written for DaemonNews
$Id: procfs.html,v 1.4 2000/04/29 21:20:45 feyrer Exp feh39068 $