Today I've decided to write about filesystems -- in particular, some fundamentals, and what it takes to delete a file.

First, a rough layout of pretty much any filesystem. A filesystem is a means of putting files on a disk partition and keeping track of them. As a rule, this involves at least two different elements: the metadata and the file data.

The metadata is the information about a file: its creation time, its user permissions and ownership, the filename itself, and where the file (with its data) resides on-disk. The metadata is a "map" that you can think of as a phone book listing: it gives you all the information on how to find a location, but it isn't the location itself.

Then there is the file data. This can be stored in a number of different ways: blocks and extents are the most common, though how they differ is outside the scope of this particular write-up. At the end of the day, though, the filesystem keeps track of the metadata -- the information *about* the file -- and the file itself.

Now we come to deleting a file. This can be a little trickier than it sounds, especially if you involve hard links, but we're trying to keep this simple and for illustrative purposes, so we'll happily ignore them for the moment. There's a sequence of things that have to happen in order to "delete" a file, and "delete" itself is a word with different meanings in this context:

  - removing the file's name, so that it no longer shows up in a directory listing
  - returning the space the file's data occupies to the pool of free space

The first one -- removing the name -- always happens. But returning the space to the pool -- actually deleting the file -- only happens if the file is not currently open for use by a process.

Let's walk through that by way of a little test. What we're going to do is create a file, write some data to it, keep it open (important!), delete the file... and then read the data back from the file, even though it's deleted.

First, open two terminal sessions. In the first one, do this:

# The "cat > filename" construct creates a file and opens it for writing, taking its input from the console
ken@methuselah ~ $ cat > /tmp/testfile.txt
foobar<cr>

The <cr> just means "carriage return" -- or the <ENTER> key. So type 'foobar', and hit enter. There won't be a prompt or anything, as cat is still happily waiting for more input to write to the file. Now, go to the second terminal, and verify the file is there:

ken@methuselah ~ $ ls -al /tmp/testfile.txt
-rw-rw-r-- 1 ken ken 7 Dec 23 16:52 /tmp/testfile.txt
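The "ls -al" line above is the metadata from the phone-book analogy: permissions, owner, size, timestamp and name. If you want to pick out individual fields, here is a small sketch using GNU stat (an assumption: the "-c" format option is specific to GNU coreutils, and the scratch file from mktemp is just a stand-in for /tmp/testfile.txt):

```shell
# Create a scratch file holding the same 7 bytes as in the walkthrough:
# "foobar" plus the newline from hitting <ENTER>.
demo_file=$(mktemp)
printf 'foobar\n' > "$demo_file"

# GNU stat format codes: %i = inode number, %h = number of hard links,
# %s = size in bytes.
inode=$(stat -c '%i' "$demo_file")
links=$(stat -c '%h' "$demo_file")
size=$(stat -c '%s' "$demo_file")
echo "inode=$inode links=$links size=$size"

# Clean up the scratch file.
rm -f "$demo_file"
```

The link count is the number of names that point at the file; with the single name we gave it, it is 1.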

Delete the file:

ken@methuselah ~ $ rm /tmp/testfile.txt
ken@methuselah ~ $

Verify it's gone:

ken@methuselah ~ $ ls -al /tmp/testfile.txt
ls: cannot access '/tmp/testfile.txt': No such file or directory

So! We've deleted the file, right? Well... kinda. Since the terminal session #1 still has us writing data to the file, though, it's still alive and well *for that process*. We know the name of the deleted file, so, using the "lsof" command -- list open files -- let's see what we can see:

ken@methuselah ~ $ lsof | grep /tmp/testfile.txt
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
      Output information may be incomplete.
cat 10931 ken 1w REG 0,26 7 9559001 /tmp/testfile.txt (deleted)

You can ignore the "WARNING"; that's just lsof complaining that it can't do everything it wants to. The important part is the last line, which gives us all sorts of interesting stuff:

  - "cat" is the command we're running
  - "10931" is the process ID ("PID") associated with it
  - "ken" is the owner of the process
  - "1w" (that's a number '1', not the letter 'l') means the file is file descriptor 1 (standard output) for PID 10931, and the "w" means it's open for writing
  - "7" is the size in bytes
  - And, lastly, we can see the filename -- and the fact that it's deleted... but not yet reaped.

So... how could we see what's in the file? Easy-peasy, by way of the /proc/ filesystem. If you do an 'ls' of /proc, you'll see a number of things -- but perhaps the most common "directories" listed are ones that consist of just numbers. These numbers are the PIDs of all running processes, and in each of these directories is a myriad of information about that process. In our case, let's take a look at the PID associated with our cat command, in /proc/10931/ :

ken@methuselah ~ $ ls -al /proc/10931
total 0
dr-xr-xr-x 9 ken ken 0 Dec 23 16:55 .
dr-xr-xr-x 385 root root 0 Dec 17 16:34 ..
dr-xr-xr-x 2 ken ken 0 Dec 23 17:01 attr
-rw-r--r-- 1 ken ken 0 Dec 23 17:01 autogroup
-r-------- 1 ken ken 0 Dec 23 17:01 auxv
-r--r--r-- 1 ken ken 0 Dec 23 17:01 cgroup
--w------- 1 ken ken 0 Dec 23 17:01 clear_refs
-r--r--r-- 1 ken ken 0 Dec 23 17:01 cmdline
-rw-r--r-- 1 ken ken 0 Dec 23 17:01 comm
-rw-r--r-- 1 ken ken 0 Dec 23 17:01 coredump_filter
-r--r--r-- 1 ken ken 0 Dec 23 17:01 cpuset
lrwxrwxrwx 1 ken ken 0 Dec 23 16:55 cwd -> /home/ken
-r-------- 1 ken ken 0 Dec 23 17:01 environ
lrwxrwxrwx 1 ken ken 0 Dec 23 16:55 exe -> /bin/cat
dr-x------ 2 ken ken 0 Dec 23 16:55 fd
dr-x------ 2 ken ken 0 Dec 23 16:55 fdinfo
[...]

There's LOTS of info in here, but we're interested in the "fd" directory -- file descriptors:

ken@methuselah ~ $ ls -al /proc/10931/fd
total 0
dr-x------ 2 ken ken 0 Dec 23 16:55 .
dr-xr-xr-x 9 ken ken 0 Dec 23 16:55 ..
lrwx------ 1 ken ken 64 Dec 23 16:55 0 -> /dev/pts/14
l-wx------ 1 ken ken 64 Dec 23 16:55 1 -> '/tmp/testfile.txt (deleted)'
lrwx------ 1 ken ken 64 Dec 23 16:55 2 -> /dev/pts/14
lrwx------ 1 ken ken 64 Dec 23 16:55 6 -> /dev/pts/14

Well, lookie there! File descriptor 1, as presaged by the lsof command, points to our deleted file. What happens if we cat that file?

ken@methuselah ~ $ cat /proc/10931/fd/1
foobar
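If juggling two terminals is a bother, the same experiment can be sketched in a single shell session. This is only a sketch under a few assumptions: it is meant for bash on Linux, the scratch file from mktemp stands in for /tmp/testfile.txt, and descriptor 3 is an arbitrary choice. Instead of a waiting cat, the shell itself holds the file open:

```shell
demo_file=$(mktemp)      # scratch file standing in for /tmp/testfile.txt
exec 3> "$demo_file"     # open it for writing on descriptor 3 and keep it open
echo foobar >&3          # write "foobar" plus a newline through the descriptor
rm "$demo_file"          # remove the name

# The name is gone...
if [ -e "$demo_file" ]; then name_gone=no; else name_gone=yes; fi

# ...but the data is still reachable through /proc. /proc/self refers to
# whichever process looks at it; readlink and cat inherit descriptor 3
# from the shell, so they see the same open file.
link_target=$(readlink /proc/self/fd/3)
recovered=$(cat /proc/self/fd/3)

echo "name gone: $name_gone"
echo "fd 3 -> $link_target"
echo "data: $recovered"

exec 3>&-                # close the descriptor again
```

The link target printed for descriptor 3 should carry the same "(deleted)" suffix that lsof and "ls -al /proc/10931/fd" showed.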

So. Our file is deleted -- but the data has not been reclaimed by the filesystem, and is still there taking up space. And it won't be reclaimed until every process that has it open either closes it or terminates. Now, sometimes, "for reasons," you don't want to kill a process, but you do want to reclaim the space that's being used by it. As an example, this can happen when a logfile goes bonkers, but logrotate doesn't work right. If you *just* wish to reclaim the space from a deleted file, it's easy:

ken@methuselah ~ $ echo > /proc/10931/fd/1
ken@methuselah ~ $ cat /proc/10931/fd/1
ken@methuselah ~ $
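To check how much space that redirect actually hands back, here is a single-terminal sketch. The assumptions: bash on Linux with GNU stat, a scratch file from mktemp, and descriptor 3 held open by the shell in place of the cat process. It also tries ": >", a redirect with a command that writes nothing:

```shell
demo_file=$(mktemp)      # scratch file, held open and then deleted
exec 3> "$demo_file"
echo foobar >&3
rm "$demo_file"

# stat -L follows the /proc link through to the (deleted) file itself;
# %s prints its size in bytes.
size_before=$(stat -L -c '%s' /proc/self/fd/3)

# The redirect truncates the file, then echo writes a single newline.
echo > /proc/self/fd/3
size_after_echo=$(stat -L -c '%s' /proc/self/fd/3)

# ":" writes nothing, so only the truncation happens.
: > /proc/self/fd/3
size_after_colon=$(stat -L -c '%s' /proc/self/fd/3)

echo "before=$size_before echo=$size_after_echo colon=$size_after_colon"

exec 3>&-                # close the descriptor again
```

So "echo >" leaves one byte (the newline) behind, while ": >" takes the file all the way down to zero. One caveat: a process that keeps writing and did not open the file in append mode carries on at its old offset, so the reported size can jump back up later even though the truncated part stays freed.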

*poof* It's gone! However, the more direct way, perhaps, to free up that file and completely reclaim it is simply to kill the process that has it open. In the second terminal window:

ken@methuselah ~ $ kill 10931

And now if you look at the first terminal window...

ken@methuselah ~ $ cat > /tmp/testfile.txt
foobar
Terminated
ken@methuselah ~ $

We've killed the process, and that's reflected in the first terminal window. One step further shows us that things are well and truly wrapped up:

ken@methuselah ~ $ cat /proc/10931/fd/1
cat: /proc/10931/fd/1: No such file or directory

Congrats! You've now well and truly deleted the file... albeit in a rather roundabout manner.
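In the walkthrough we knew the filename up front and could grep the lsof output for it. If you don't know which deleted files are still being held open, one option is to walk the fd directories under /proc yourself. A rough sketch, assuming Linux and a POSIX-style shell; the function name list_deleted_open_files is made up for this example, and without root it only sees your own processes:

```shell
# Print "PID FD TARGET" for every open descriptor whose link target
# ends in " (deleted)".
list_deleted_open_files() {
    for fd_link in /proc/[0-9]*/fd/*; do
        # Processes come and go while we scan, so skip links we can't read.
        target=$(readlink "$fd_link" 2>/dev/null) || continue
        case "$target" in
            *" (deleted)")
                pid=${fd_link#/proc/}   # strip the leading "/proc/"
                pid=${pid%%/*}          # keep only the PID component
                echo "$pid ${fd_link##*/} $target"
                ;;
        esac
    done
}

# Try it out: hold a scratch file open on descriptor 3, delete it,
# and look for it in the scan.
demo_file=$(mktemp)
exec 3> "$demo_file"
rm "$demo_file"
found=$(list_deleted_open_files | grep -F "$demo_file")
echo "$found"
exec 3>&-                # close the descriptor again
```

The scratch file may show up more than once, because the helper processes in the pipeline inherit descriptor 3 from the shell while the scan is running.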

Hope you found this informative! Please excuse the HandCrafted™ HTML. I know lots about Linux, but relatively little about web stuffs. Any questions or comments? Go to town and shoot me an e-mail!