Monday, 29 June 2015

Find open files on a filesystem in FreeBSD

I had a problem on an archaic FreeBSD machine whose /var filesystem had tipped over the 80% threshold of our monitoring system.

A brief investigation showed a vast discrepancy between the output of the du and df commands.

The df command showed that the file system was 81% full, whereas the du command showed it was around 5% full.

This happens when a file is deleted while a process still has it open. The difference comes from the way each command measures usage. df reports the filesystem's own block counts, so the space still reserved by the open file is included. du walks the directory tree and adds up the files it finds; since the deleted file no longer has a directory entry, du cannot see those reserved blocks. The result is a difference between the two reported figures.
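The effect can be reproduced safely in a scratch directory. The following is a minimal sketch, assuming a POSIX sh with mktemp available; the file name big.log and the 1 MiB size are made up for illustration and have nothing to do with the machine in this post.

```shell
#!/bin/sh
# Sketch: reproduce a "deleted but still open" file in a scratch directory.
tmpdir=$(mktemp -d)
f="$tmpdir/big.log"          # hypothetical file name

# Create a 1 MiB file.
dd if=/dev/zero of="$f" bs=1024 count=1024 2>/dev/null

exec 3<"$f"                  # keep the file open on descriptor 3
rm "$f"                      # remove its only directory entry

# The directory tree no longer contains the file, so du has nothing to count...
remaining=$(ls -A "$tmpdir" | wc -l | tr -d ' ')
echo "entries left in directory: $remaining"

# ...but the data is still there and readable through descriptor 3.
bytes=$(wc -c <&3 | tr -d ' ')
echo "bytes still held open: $bytes"

exec 3<&-                    # closing the descriptor lets the space be freed
rmdir "$tmpdir"
```

Until descriptor 3 is closed, the filesystem keeps the blocks allocated, which is the same state snmpd had put /var in.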

So, the next step is to find which process is holding a deleted file open. Normally I would use something like lsof for this, but on this ancient version of FreeBSD that is not an option, so a different tool is required.


# fstat -f /var 

USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     fstat      13839   wd /var      94208 drwxr-xr-x     512  r
root     csh        13582   wd /var      94208 drwxr-xr-x     512  r
operator csh         6713   wd /var     188489 dr-xr-xr-x     512  r
nobody   snmpd      83626    3 /var     259076 -rw-r--r--  2766247084  w
root     cron        1287   wd /var     376832 drwxr-x---     512  r
root     cron        1287    3 /var     164874 -rw-------       4  w
_bgpd    bgpd        1251 root /var      47104 dr-xr-xr-x     512  r
_bgpd    bgpd        1251   wd /var      47104 dr-xr-xr-x     512  r
_bgpd    bgpd        1251 jail /var      47104 dr-xr-xr-x     512  r
_bgpd    bgpd        1250 root /var      47104 dr-xr-xr-x     512  r
_bgpd    bgpd        1250   wd /var      47104 dr-xr-xr-x     512  r
_bgpd    bgpd        1250 jail /var      47104 dr-xr-xr-x     512  r
dhcpd    dhcpd       1213    6 /var     400402 -rw-r--r--   45998  w
root     syslogd      930    3 /var     164883 -rw-------       3  w
root     syslogd      930   12 /var     259338 -rw-r--r--   12844  w
root     syslogd      930   13 /var     259324 -rw-------      78  w
root     syslogd      930   14 /var     259278 -rw-------   40072  w
root     syslogd      930   15 /var     259083 -rw-------      62  w
root     syslogd      930   16 /var     259301 -rw-------   83687  w
root     syslogd      930   17 /var     259077 -rw-r-----    1953  w
root     syslogd      930   18 /var     259079 -rw-r--r--      62  w
root     syslogd      930   19 /var     259085 -rw-------      62  w
root     syslogd      930   20 /var     259087 -rw-------  103292  w
root     syslogd      930   21 /var     259284 -rw-------      78  w
root     syslogd      930   22 /var     259084 -rw-r-----      62  w
root     syslogd      930   23 /var     259082 -rw-r-----      62  w


We can see from the output of the fstat command that the snmp daemon (snmpd, PID 83626) has a file of roughly 2.7 GB (2766247084 bytes) open.

I killed and restarted snmpd, and file system usage dropped to 3%.
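Rather than scanning the listing by eye, the largest open regular file can be picked out with awk. This is only a sketch: largest_open is a hypothetical helper name, and it assumes the column layout shown in the fstat output in this post (MODE in column 7, SZ|DV in column 8). The sample lines are copied from that output so the script runs without fstat.

```shell
#!/bin/sh
# Sketch: pick the largest regular file out of fstat-style output on stdin.
# Assumes column 7 is MODE and column 8 is SZ|DV, as in the listing in this post.
largest_open() {
    awk '$7 ~ /^-/ && ($8 + 0) > max { max = $8 + 0; line = $0 }
         END { if (line != "") print line }'
}

# Sample lines copied from the fstat output in this post.
result=$(largest_open <<'EOF'
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
root     csh        13582   wd /var      94208 drwxr-xr-x     512  r
nobody   snmpd      83626    3 /var     259076 -rw-r--r--  2766247084  w
dhcpd    dhcpd       1213    6 /var     400402 -rw-r--r--   45998  w
root     syslogd      930   20 /var     259087 -rw-------  103292  w
EOF
)
echo "$result"
```

On a live system the same function could be fed directly, for example with fstat -f /var | largest_open. The MODE test skips directories, whose SZ|DV value is not interesting here.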


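Note: You could limit that output to the suspect process by grepping for its user, "nobody" in this case. Be aware that the USER column shows the owner of the process (snmpd runs as nobody here), so a match does not by itself mean the file has been deleted.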

# fstat -f /var | grep nobody
nobody   snmpd      83626    3 /var     259076 -rw-r--r--  2766247084  w
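To confirm that an open file really has been deleted, one option is to check whether its inode number (the INUM column, 259076 here) still has a name anywhere on the filesystem. A minimal sketch follows; inode_has_name is a hypothetical helper, not a standard command, and the demonstration uses a scratch directory rather than /var.

```shell
#!/bin/sh
# Sketch: report whether an inode still has a name under a given directory.
# inode_has_name is a hypothetical helper, not a standard command.
inode_has_name() {
    # $1 = directory to search, $2 = inode number
    [ -n "$(find "$1" -xdev -inum "$2" 2>/dev/null)" ]
}

# Self-contained demonstration in a scratch directory.
tmpdir=$(mktemp -d)
touch "$tmpdir/example.log"
inum=$(ls -i "$tmpdir/example.log" | awk '{print $1}')

if inode_has_name "$tmpdir" "$inum"; then before=linked; else before=unlinked; fi
rm "$tmpdir/example.log"
if inode_has_name "$tmpdir" "$inum"; then after=linked; else after=unlinked; fi

echo "before rm: $before, after rm: $after"
rmdir "$tmpdir"
```

On the machine in this post the equivalent check would be find /var -xdev -inum 259076; if that prints nothing, no directory entry refers to the inode any more and the space is only held by the open descriptor.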
