This blog post is aimed at helping you get started with the B-tree filesystem (BtrFS). The Linux kernel tree currently contains more than 55 filesystems, each with its own pros and cons. Here we'll cover in depth how to administer a BtrFS filesystem in Linux.
Some filesystems have limited or very specific uses, and the filesystems considered truly general purpose are the extN family (ext2, ext3, ext4). They are stable and powerful, but still have certain limitations.
B-tree filesystem (BtrFS), pronounced "Better FS", has been making inroads in Linux for quite some time now. As of this writing, the available stable version is 4.9. Let's now get down to the basics of managing a BtrFS filesystem in Linux.
What is BtrFS?
BtrFS is a next-generation general-purpose Linux filesystem that offers unique features such as advanced integrated device management, scalability and reliability. BtrFS scales to 16 exabytes (EB) and focuses on features that no other Linux filesystem has. Some even argue that BtrFS is the Linux answer to Sun/Oracle's ZFS, with an architecture that is more scalable than ZFS. In fact, the BtrFS filesystem is getting huge attention at the moment.
BtrFS builds a foundation for the Ceph distributed filesystem and its RADOS object store layer for "cloud" technologies. It incorporates ideas from the ext4, XFS, aufs and Reiser filesystems. BtrFS development is very active, and new features are being added at a tremendous pace.
BtrFS filesystem in Linux provides the following Features and Capabilities
- Built-in copy on write
- Powerful snapshot capabilities
- Built-in volume management with subvolumes
- Massive scalability up to 16 exabytes
- Built-in data integrity (checksums)
- SSD optimization
- Compression capabilities
- Cloud ready
- RAID built in BtrFS
- Manual defragmentation
- Online filesystem management
- Data and metadata integrity
- In-place conversion from ext2/3/4 and ReiserFS
- Quota groups
- Online expansion and reduction of filesystem size
- Object level RAID
- Seeding devices
- Support for multiple devices
- Max volume size: 16 EB (2^64 byte)
- Max file size: 16 EB
- Max file name size: 255 bytes
- Filesystem check: online and offline
- Directory lookup algorithm: B-Tree
- Characters in file name: any, except 0x00
- Hard and symbolic links
- Access Control Lists (ACLs)
- Extended Attributes (xattrs)
- POSIX file owner/permissions
- Asynchronous and Direct I/O
- Sparse files
BtrFS truly supports a maximum file size of 16 exabytes. If the term exabyte is unfamiliar, this may help put it in perspective: 16 EB is 2^64 bytes, well over sixteen million terabytes.
To check which filesystems your kernel currently supports, look at the file /proc/filesystems. Example output from my local system is shown below.
# cat /proc/filesystems
nodev	sysfs
nodev	rootfs
..............
	btrfs
	ext3
	ext2
	ext4
	vfat
	xfs
	fuseblk
nodev	fuse
nodev	fusectl
For BtrFS support, the output should contain the keyword btrfs.
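Rather than scanning the output by eye, you can let grep do the check. A minimal sketch:

```shell
# Count whole-word matches of "btrfs" in the kernel's filesystem list;
# a non-zero count means the running kernel supports BtrFS
grep -cw btrfs /proc/filesystems
```

The `-w` flag avoids false positives from names that merely contain the string.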
Installing BtrFS on Linux distros
On Debian based systems:
sudo apt-get update
sudo apt-get -y install btrfs-progs
RHEL based systems:
sudo yum -y install btrfs-progs
On Arch based systems:
sudo pacman -S btrfs-progs
On Gentoo systems:
sudo emerge --ask sys-fs/btrfs-progs
BtrFS useful mount options
| Option | Description |
|---|---|
| acl, noacl | Enable/disable support for POSIX Access Control Lists (ACLs). The default is on. |
| device=/dev/name | Tells BtrFS to scan the named device(s) for a BtrFS volume. |
| max_inline=number | Specifies the maximum amount of space, in bytes, that can be inlined in a metadata B-tree leaf. The default changed to 2048 in kernel 4.6. It can be turned off by specifying 0. |
| clear_cache | Clears all the free space caches during mount. |
| thread_pool=number | The number of worker threads to allocate. The default is min(NRCPUS + 2, 8), where NRCPUS is the number of online CPUs detected at mount time. A small number means less parallelism in processing data and metadata, while higher numbers can hurt performance due to increased locking contention, cache-line bouncing or costly data transfers between local CPU memories. |
| space_cache, space_cache=version, nospace_cache | The free space cache greatly improves performance when reading block group free space into memory. However, managing the cache consumes some resources, including a small amount of disk space. |
| user_subvol_rm_allowed | Allows subvolumes to be deleted by their respective owner. Otherwise, only the root user can do that. The default is off. |
For more details on the available options, read the btrfs man page:
$ man 5 btrfs
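As an example, several of the options above can be combined in a single mount invocation. This is only a sketch, reusing the /dev/vdb1, /dev/vdb2 and /data names from this guide:

```shell
# Mount a BtrFS volume with ACLs enabled, a larger worker-thread pool,
# and both member devices of a multi-device volume declared explicitly
sudo mount -o acl,thread_pool=8,device=/dev/vdb1,device=/dev/vdb2 \
    /dev/vdb1 /data
```

The same option string can go into the fourth field of an /etc/fstab entry.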
Working with BtrFS – Using Examples
My lab machine currently has two secondary hard drives of 1 GB each to use in the demonstrations that follow. To follow along smoothly, you can spin up a virtual machine, install the btrfs-progs package and add two secondary hard drives.
Creating and Mounting BtrFS partition
To kick off the demo, we'll start by creating a BtrFS filesystem on a single 1 GB partition and mounting it on the /data directory. We're going to create a partition on /dev/vdb that covers 30% of the block device. To make a basic BtrFS filesystem and mount it, use the following commands:
sudo parted --script /dev/vdb "mklabel gpt"
sudo parted --script /dev/vdb "mkpart primary 1 30%"
sudo parted /dev/vdb print
sudo mkdir /data
sudo mkfs.btrfs /dev/vdb1
sudo mount /dev/vdb1 /data
To confirm that the mounted partition works the way we want, let's copy some data to it as follows:
sudo find /usr/share/doc -name '*[ab].html' -exec cp {} /data \;
ls -l /data
Check the filesystem using btrfs commands as well:
sudo btrfs filesystem show /dev/vdb1
sudo btrfs filesystem df -h /data/
sudo btrfs filesystem usage /data/
From these commands, you'll see that we copied some of the existing HTML files to give us real data for the demo. The last command confirms a size close to 300MB (30% of 1 GB).
List the subvolumes of the root volume:
$ btrfs subvolume list /data/
View the disk space utilization:
$ btrfs filesystem df -h /data
$ btrfs filesystem show /dev/vdb1
Enlarging a btrfs File System
From the previous partitioning of /dev/vdb, we still have around 700MB unpartitioned. We're going to use this space to enlarge the BtrFS filesystem.
sudo parted /dev/vdb mkpart primary 30% 60%
sudo btrfs device add /dev/vdb2 /data/
btrfs filesystem show /data
df -h /data/
Removing btrfs devices
Use the btrfs device delete command to remove an online device. It redistributes any extents in use to the other devices in the filesystem so that the device can be removed safely.
sudo btrfs device delete /dev/vdb2 /data
That's all we needed to do. Running btrfs filesystem show /data again will confirm that the device has been removed.
You can also resize directly by specifying the intended size; the syntax is:
sudo btrfs filesystem resize amount /mount-point
The amount can be a set size, such as "+3g" for an increase of 3 GiB, or "max" to grow the filesystem to fill the whole block device. Use "-3g" for a decrease of 3 GiB. Consider the example below, which adds a new partition to /home and extends the filesystem.
$ sudo btrfs device add /dev/sda4 /home -f
$ sudo btrfs filesystem resize max /home
$ sudo btrfs filesystem show /home
Label: 'home'  uuid: b40ffd9b-c09d-403e-a5f3-b79b5c314505
	Total devices 2 FS bytes used 79.71GiB
	devid    1 size 88.81GiB used 88.81GiB path /dev/mapper/arch-home
	devid    2 size 8.89GiB used 1.00GiB path /dev/sda4
Note that the new device was added successfully.
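Fixed-size resizing works the same way in both directions. A sketch, reusing the /home mount from the example above (the amounts are illustrative):

```shell
# Shrink the mounted filesystem online by 1 GiB;
# BtrFS relocates any affected data as part of the operation
sudo btrfs filesystem resize -1g /home

# Grow it back by the same amount
sudo btrfs filesystem resize +1g /home
```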
Balancing the filesystem
If we run out of disk space within the original volume, we can add an extra partition, as we did above. The metadata and data, however, are still stored only on /dev/vdb1. The filesystem must now be balanced to spread usage across all partitions, using the command below:
$ sudo btrfs balance start -d -m /data
- -d: represents the data
- -m: represents the metadata
This will ensure that the disks are equally used.
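On large filesystems a full balance can take hours. Balance filters restrict the work to block groups below a given usage threshold; the 50% value here is just an illustrative starting point:

```shell
# Only rewrite data block groups that are less than 50% full,
# which is usually enough to reclaim unallocated space quickly
sudo btrfs balance start -dusage=50 /data

# Check progress from another terminal
sudo btrfs balance status /data
```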
It's time to do some testing on our BtrFS filesystem. To verify that balancing works, I'll generate two 100 MB files of random data:
sudo dd if=/dev/urandom of=/data/hugefile1 bs=1M count=100
sudo dd if=/dev/urandom of=/data/hugefile2 bs=1M count=100
sudo btrfs balance start -d -m /data
sudo btrfs filesystem show /data
You should notice that the data is well balanced across the two volumes.
If you would like the /data directory mounted at boot time, append the entry below to the /etc/fstab file:
/dev/vdb1 /data btrfs device=/dev/vdb1,device=/dev/vdb2 0 0
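After editing /etc/fstab, it's worth verifying the entry before rebooting. A sketch, assuming the /data mount from this guide:

```shell
# Unmount, then let mount(8) re-read fstab; errors surface immediately
sudo umount /data
sudo mount -a

# Confirm the mount point and the options it was mounted with
findmnt /data
```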
Multi-device File System Creation
With the BtrFS filesystem in Linux, it's possible to do multi-device management. This makes use of the -d (data) and -m (metadata) options of the mkfs.btrfs command. Valid profile specifications are:
- raid0 : Striping without redundancy
- raid1 : Disk mirroring
- raid10 : Striped mirror
The -m single option instructs that metadata not be duplicated. This may be desired when using hardware RAID.
To create a filesystem spanning several devices and mount it, use:
sudo mkfs.btrfs /dev/device1 /dev/device2 /dev/device3
sudo mount /dev/device3 /mount-point
Reload btrfs module then run:
sudo btrfs device scan
to discover all multi-device filesystems.
Let's consider the examples below for creating multi-device BtrFS filesystems. Notice that raid10 needs at least four devices to operate correctly.
Four devices with metadata mirrored, data striped
sudo mkfs.btrfs /dev/device1 /dev/device2 /dev/device3 /dev/device4
Two devices, metadata striping but no mirroring
sudo mkfs.btrfs -m raid0 /dev/device1 /dev/device2
raid10 being used for both data and metadata
sudo mkfs.btrfs -m raid10 -d raid10 /dev/device1 /dev/device2 /dev/device3 /dev/device4
Full capacity of each device being used when the drives are different sizes:
sudo mkfs.btrfs /dev/device1 /dev/device2 /dev/device3
sudo mount /dev/device1 /mount-point
To avoid duplicating metadata on a single drive:
sudo mkfs.btrfs -m single /dev/device
BtrFS device scanning
Scan all block devices under /dev and probe for BtrFS volumes with the first command below, or probe a single device with the second:
sudo btrfs device scan
sudo btrfs device scan /dev/device
Create BtrFS subvolumes
Subvolumes allow discrete management identities within the BtrFS filesystem. In this section, we'll create two subvolumes, subvolume1 and subvolume2. For this, we'll start by creating a new BtrFS filesystem on the /dev/vdb3 device, create a mount point and mount it:
sudo parted /dev/vdb mkpart primary 60% 100%
sudo mkfs.btrfs /dev/vdb3
sudo mkdir /subvol_btrfs
sudo mount /dev/vdb3 /subvol_btrfs
Now let’s create the two subvolumes on /subvol_btrfs.
sudo btrfs subvolume create /subvol_btrfs/subvolume1
sudo btrfs subvolume create /subvol_btrfs/subvolume2
When we define the subvolumes, both the directories and BtrFS subvolume entities will be created in the filesystem.
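You can inspect an individual subvolume's metadata with btrfs subvolume show. A sketch using the subvolume just created:

```shell
# Print the subvolume's UUID, creation time, flags, generation
# and parent ID
sudo btrfs subvolume show /subvol_btrfs/subvolume1
```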
Create a few files in /subvol_btrfs and in the subvolumes:
sudo touch /subvol_btrfs/btrfsmainfile.txt
sudo touch /subvol_btrfs/subvolume1/subvolume1file.txt
sudo touch /subvol_btrfs/subvolume2/subvolume2file.txt
List the currently available subvolumes in /subvol_btrfs:
$ sudo btrfs subvolume list /subvol_btrfs
ID 256 gen 9 top level 5 path subvolume1
ID 257 gen 9 top level 5 path subvolume2
Unmount the filesystem:
sudo umount /subvol_btrfs
You can mount a subvolume directly on a mount point. Let's do this and compare the results using ls:
$ sudo mount /dev/vdb3 /subvol_btrfs/
$ ls -l /subvol_btrfs/
$ sudo umount /subvol_btrfs/
$ sudo mount -o subvol=subvolume1 /dev/vdb3 /subvol_btrfs/
$ ls -l /subvol_btrfs/
$ sudo umount /subvol_btrfs/
$ sudo mount -o subvol=subvolume2 /dev/vdb3 /subvol_btrfs/
$ ls -l /subvol_btrfs/
Make a subvolume the default instead of the current root volume. Here we'll make subvolume1 the default subvolume; what we need is its ID:
$ sudo umount /subvol_btrfs/ 2>/dev/null
$ sudo mount /dev/vdb3 /subvol_btrfs/
$ ID=$(sudo btrfs subvolume list /subvol_btrfs/ | grep subvolume1 | awk '{print $2}')
$ sudo btrfs subvolume set-default $ID /subvol_btrfs
Test by re-mounting:
$ sudo umount /subvol_btrfs/
$ sudo mount /dev/vdb3 /subvol_btrfs/
$ ls -l /subvol_btrfs/
total 0
-rw-r--r--. 1 root root 0 Jan 10 11:22 subvolume1file.txt
Notice from the output above that the data we created on subvolume1 is what is available by default on mounting.
To set the default back to the root volume, use the ID of 0 or 5:
$ sudo btrfs subvolume set-default 0 /subvol_btrfs
$ sudo umount /subvol_btrfs
$ sudo mount /dev/vdb3 /subvol_btrfs/
$ ls -l /subvol_btrfs/
total 0
-rw-r--r--. 1 root root 0 Jan 10 11:22 btrfsmainfile.txt
drwxr-xr-x. 1 root root 36 Jan 10 11:22 subvolume1
drwxr-xr-x. 1 root root 36 Jan 10 11:22 subvolume2
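To confirm which subvolume is currently the default without re-mounting, use:

```shell
# Print the default subvolume of the mounted filesystem;
# the root volume is reported as ID 5 (FS_TREE)
sudo btrfs subvolume get-default /subvol_btrfs
```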
Working with BtrFS snapshots
The BtrFS snapshot feature can create read-only or read/write copies of data. Snapshots can be used in the following ways:
1. Creating the snapshot as read only and subsequently implementing a backup of the snapshot. In this way, the backup will be of the host filesystem at the point in time that the snapshot was created.
2. Using it as revert point when modifying many files. If the modifications cause negative results, you can easily revert to the snapshot copy.
The snapshot has to be created on the same filesystem as the target data, since the rapid creation of the snapshot relies on a form of internal linking within the filesystem.
NOTE: You cannot create a snapshot of the complete filesystem, because changes to the snapshot would need to be written back to itself, resulting in infinite recursion.
For the purposes of demonstration, we'll use the two subvolumes we created earlier. Our scenario: we create a read-only snapshot of the working subvolume subvolume1 inside subvolume2.
$ sudo btrfs subvolume snapshot -r /subvol_btrfs/subvolume1 /subvol_btrfs/subvolume2/backup/
Create a readonly snapshot of '/subvol_btrfs/subvolume1' in '/subvol_btrfs/subvolume2/backup'
We can list the available subvolumes with the command:
$ sudo btrfs subvolume list /subvol_btrfs/
ID 256 gen 24 top level 5 path subvolume1
ID 257 gen 24 top level 5 path subvolume2
ID 258 gen 24 top level 257 path subvolume2/backup
From the output, we can see that the snapshot appears as a new subvolume. Listing the contents of both directories should indicate that the contents are the same:
$ ls /subvol_btrfs/subvolume2/backup/
subvolume1file.txt
$ ls /subvol_btrfs/subvolume1/
subvolume1file.txt
Should we delete all the files from /subvol_btrfs/subvolume1/, the copy-on-write (COW) technology in BtrFS ensures that the snapshot in /subvol_btrfs/subvolume2/backup still holds its own copies; they are not modified when the original files change. In the event of a catastrophe, we can simply copy the files back to the original location.
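A minimal restore sketch using the snapshot created above (the file name is the one created earlier in this guide):

```shell
# Simulate data loss in the working subvolume
sudo rm /subvol_btrfs/subvolume1/subvolume1file.txt

# The read-only snapshot still holds the file; copy it back,
# preserving ownership and timestamps with -a
sudo cp -a /subvol_btrfs/subvolume2/backup/subvolume1file.txt \
    /subvol_btrfs/subvolume1/
```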
BtrFS In-Place Migration; Convert an ext4 Filesystem to BtrFS
In this example, I'll show you how to convert an ext4 filesystem to BtrFS. Since I'm running a CentOS server on KVM, I'll add a secondary hard drive, create an ext4 partition, then convert it to BtrFS so that you can get the full picture of how it is done.
Create a 1GB secondary block device; this is to be done on the host machine:
sudo virsh vol-create-as default --name btrfs-sec.qcow2 1G
sudo virsh vol-list --pool default
sudo virsh attach-disk --domain cs1 --source /var/lib/libvirt/images/btrfs-sec.qcow2 --persistent --target vdc
Confirm it's added on the VM:
$ lsblk /dev/vdc
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vdc  252:32  0   1G  0 disk
Create an ext4 partition and filesystem (note that mkfs.ext4 is needed here; partitioning alone does not create the filesystem):
sudo parted --script /dev/vdc "mklabel gpt mkpart primary 0% 100%"
sudo parted --script /dev/vdc print
sudo mkfs.ext4 /dev/vdc1
sudo lsblk -f /dev/vdc
Mount the newly created file system, create a few files and directories then unmount the filesystem:
sudo mkdir /ext4tobtrfs
sudo mount /dev/vdc1 /ext4tobtrfs/
sudo mkdir /ext4tobtrfs/test-1-4-dir
sudo touch /ext4tobtrfs/test-file{1..10}.txt
sudo ls -l /ext4tobtrfs/
sudo umount /ext4tobtrfs/
Convert the filesystem to Btrfs:
# btrfs-convert -l convertedfs /dev/vdc1
create btrfs filesystem:
	blocksize: 4096
	nodesize:  16384
	features:  extref, skinny-metadata (default)
creating btrfs metadata.
copy inodes [o] [         0/        22]
creating ext2 image file.
set label to 'convertedfs'
cleaning up system chunk.
conversion complete.
Mount the filesystem again and view the filesystem type:
# mount /dev/vdc1 /ext4tobtrfs/
# df -hT /ext4tobtrfs/
Filesystem     Type   Size  Used Avail Use% Mounted on
/dev/vdc1      btrfs 1022M   51M  643M   8% /ext4tobtrfs
Note that the filesystem on /ext4tobtrfs is now of type btrfs.
To view subvolumes, BtrFS information and content, use:
# btrfs filesystem show /ext4tobtrfs/
Label: 'convertedfs'  uuid: 3e985770-66a0-4b85-810e-2e93182696f3
	Total devices 1 FS bytes used 34.78MiB
	devid    1 size 1022.00MiB used 616.25MiB path /dev/vdc1

# btrfs subvolume list /ext4tobtrfs/
ID 256 gen 6 top level 5 path ext2_saved

# ls -l /ext4tobtrfs/
total 16
drwxr-xr-x. 1 root root 10 Jan 10 13:22 ext2_saved
drwx------. 1 root root  0 Jan 10 13:14 lost+found
drwxr-xr-x. 1 root root  0 Jan 10 13:19 test-1-4-dir
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file10.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file1.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file2.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file3.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file4.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file5.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file6.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file7.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file8.txt
-rw-r--r--. 1 root root  0 Jan 10 13:19 test-file9.txt

# file /ext4tobtrfs/ext2_saved/image
/ext4tobtrfs/ext2_saved/image: Linux rev 1.0 ext4 filesystem data, UUID=7e6849f2-8560-4b9d-add8-d344ef577650 (extents) (64bit) (large files) (huge files)

To mount the subvolume, or the saved image inside the ext2_saved subvolume, use:

# mount -o subvol=ext2_saved /dev/vdc1 /mnt/
# ls -l /mnt
# umount /mnt
# mount -o loop /ext4tobtrfs/ext2_saved/image /mnt/
# ls -la /mnt/
Roll back to the original ext4 filesystem:
# umount /ext4tobtrfs/
# btrfs-convert -r /dev/vdc1
rollback complete.
# mount /dev/vdc1 /ext4tobtrfs/
# df -hT /ext4tobtrfs/
Filesystem     Type  Size  Used Avail Use% Mounted on
/dev/vdc1      ext4  990M  2.6M  921M   1% /ext4tobtrfs
If you view the files in /ext4tobtrfs/, you'll note that anything you created on BtrFS after the conversion is gone; only the files created initially on the ext4 filesystem remain.
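Conversely, if you had decided to keep BtrFS, deleting the ext2_saved subvolume makes the conversion permanent and reclaims the space held by the saved image. A sketch:

```shell
# WARNING: after this, btrfs-convert -r can no longer roll back to ext4
sudo btrfs subvolume delete /ext4tobtrfs/ext2_saved

# Rebalance to reclaim the space that the saved image occupied
sudo btrfs balance start /ext4tobtrfs
```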
Converting an existing single-device system to raid1
To convert an existing single-device system, /dev/vdb1 in this case, into a two-device raid1 system in order to protect against a single disk failure, use the following commands:
# umount /subvol_btrfs/
# mount /dev/vdb1 /subvol_btrfs/
# btrfs device add /dev/vdb2 /subvol_btrfs/ -f
# btrfs balance start -dconvert=raid1 -mconvert=raid1 /subvol_btrfs/
BtrFS Maintenance Tasks
An administrator of a BtrFS filesystem in Linux will always need to know how to perform the following maintenance tasks.
1. Verify checksums with scrub:
Open a terminal window and run:
# watch btrfs scrub status /subvol_btrfs/
Open another terminal window and run:
# btrfs scrub start /subvol_btrfs/
The watch command at the first prompt will show the scrubbing progress.
2. Watch balance:
On one terminal run:
# watch btrfs balance status /subvol_btrfs/
On another terminal, run:
# btrfs balance start /subvol_btrfs/
3. Defragment the filesystem recursively:
# btrfs filesystem defragment -r /subvol_btrfs/
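Defragmentation can optionally recompress files as it rewrites them. A sketch using zlib compression (the choice of algorithm is illustrative):

```shell
# Recursively defragment and recompress file data with zlib
sudo btrfs filesystem defragment -r -czlib /subvol_btrfs/
```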
Replacing failed devices on a btrfs file system
If a device is missing or its super block is corrupted, the filesystem needs to be mounted in degraded mode before troubleshooting. An example is shown below:
# mkfs.btrfs -m raid1 /dev/vdb /dev/vdc /dev/vdd
# mount -o degraded /dev/vdb /mnt
# btrfs device delete missing /mnt
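When a replacement disk is available, btrfs replace is usually faster than the delete-and-add approach, since it copies data to the new device in a single pass. A sketch, where /dev/vdd is the failed device and /dev/vde the replacement (device names are illustrative):

```shell
# Rebuild onto the new device; -r reads from the other mirrors
# where possible instead of the failing source device
sudo btrfs replace start -r /dev/vdd /dev/vde /mnt

# Monitor progress of the rebuild
sudo btrfs replace status /mnt
```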
In this guide, I provided comprehensive coverage of the BtrFS filesystem in Linux, starting from the basics and moving on to hands-on configuration. BtrFS is truly something to start working with now, as it aims to be a default enterprise filesystem for years to come. We saw how BtrFS simplifies administration by bundling filesystem and volume management into a single tool. Hope you had fun working with BtrFS.
The man page btrfs(8) is a good place to start. It covers all the important management commands, including:
- Subvolume and snapshot management
- Use of the scrub, balance and defragment commands
- Filesystem device management with the device commands
To learn more about BtrFS administration, refer to the following man pages:
# man mkfs.btrfs
# man 5 btrfs
# man 8 fsck.btrfs