This would disable only weekly snapshots on the given filesystem.

== Installing NixOS on a ZFS root filesystem ==

Another guide titled "Encrypted ZFS mirror with mirrored boot on NixOS" is available at https://elis.nu/blog/2019/08/encrypted-zfs-mirror-with-mirrored-boot-on-nixos/.
An OpenZFS guide for NixOS Root on ZFS is also available:
https://openzfs.github.io/openzfs-docs/Getting%20Started/NixOS/Root%20on%20ZFS.html
This guide is based on the above OpenZFS guide and the NixOS installation instructions in the [https://nixos.org/manual/nixos/stable/index.html#sec-installation NixOS manual].

=== Pool Layout Considerations ===

<syntaxhighlight lang="none">
rpool/
       nixos/
             nix        mounted to /nix
       userdata/
             root        mounted to /
             home        mounted to /home
</syntaxhighlight>

The names <code>nixos</code> and <code>userdata</code> can be changed, but it is important that they are peers in the hierarchy.

ZFS can take consistent and atomic snapshots recursively down a dataset's hierarchy. Since Nix is good at being Nix, most users will want their server's ''data'' backed up, and don't mind reinstalling NixOS and then restoring data. If this is sufficient, only snapshot and back up the <code>userdata</code> hierarchy. Users who want to be able to restore a service with only ZFS snapshots will want to snapshot the entire tree, at the significant expense of snapshotting the Nix store.
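For example, a recursive, atomic snapshot of only the user data could be taken like this (a sketch; the snapshot name is illustrative):
<syntaxhighlight lang="console">
$ sudo zfs snapshot -r rpool/userdata@backup-2021-05-01
</syntaxhighlight>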


=== Dataset Properties ===

The following is a list of recommended dataset properties which have no drawbacks under regular use:

* <code>compression=lz4</code> (<code>zstd</code> for higher-end machines)
* <code>xattr=sa</code> for Journald
* <code>acltype=posixacl</code> also for Journald
* <code>relatime=on</code> for reduced stress on SSDs
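
These can be applied to an existing dataset with <code>zfs set</code>, for example (a sketch; the dataset name is illustrative):
<syntaxhighlight lang="console">
$ sudo zfs set compression=lz4 xattr=sa acltype=posixacl relatime=on rpool/nixos
</syntaxhighlight>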


The following is a list of dataset properties which are often useful, but do have drawbacks:


=== Environment Setup ===
For convenience, set a shell variable with the paths to your disk(s). Always use the <code>/dev/disk/by-id</code> aliases rather than names like <code>/dev/sda</code>, which are not stable across reboots and can cause ZFS to choke on import.

For multiple disks:
<syntaxhighlight lang="console">
$ disk=(/dev/disk/by-id/foo /dev/disk/by-id/bar)
</syntaxhighlight>

For a single disk:
<syntaxhighlight lang="console">
$ disk=/dev/disk/by-id/foo
</syntaxhighlight>


=== Partitioning the disks ===

<syntaxhighlight lang="bash">
# Multiple disks
for x in "${disk[@]}"; do
  sudo parted "$x" -- mklabel gpt
  sudo parted "$x" -- mkpart primary 512MiB -8GiB
  sudo parted "$x" -- mkpart primary linux-swap -8GiB 100%
  sudo parted "$x" -- mkpart ESP fat32 1MiB 512MiB
  sudo parted "$x" -- set 3 esp on

  sudo mkswap -L swap "${x}-part2"
  sudo mkfs.fat -F 32 -n EFI "${x}-part3"
done

# Single disk
sudo parted "$disk" -- mklabel gpt
sudo parted "$disk" -- mkpart primary 512MiB -8GiB
sudo parted "$disk" -- mkpart primary linux-swap -8GiB 100%
sudo parted "$disk" -- mkpart ESP fat32 1MiB 512MiB
sudo parted "$disk" -- set 3 esp on

sudo mkswap -L swap "${disk}-part2"
sudo mkfs.fat -F 32 -n EFI "${disk}-part3"
</syntaxhighlight>

=== Laying out the filesystem hierarchy ===
In this guide, we will be using a <code>tmpfs</code> for <code>/</code>, since no system state will be stored outside of the ZFS datasets we will create.
<syntaxhighlight lang="console">
$ sudo mount -t tmpfs none /mnt
</syntaxhighlight>

==== Create the ZFS pool ====
<syntaxhighlight lang="console">
$ sudo zpool create \
  -o ashift=12 \
  -o autotrim=on \
  -R /mnt \
  -O canmount=off \
  -O mountpoint=none \
  -O acltype=posixacl \
  -O compression=zstd \
  -O dnodesize=auto \
  -O normalization=formD \
  -O relatime=on \
  -O xattr=sa \
  -O encryption=aes-256-gcm \
  -O keylocation=prompt \
  -O keyformat=passphrase \
  rpool \
  mirror \
  "${disk[@]/%/-part1}"
</syntaxhighlight>

For a single disk, remove <code>mirror</code> and specify just <code>"${disk}"</code> as the device, as in the sketch below.
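
A sketch (the remaining <code>-O</code> options from the invocation above are omitted for brevity but should be kept):
<syntaxhighlight lang="console">
$ sudo zpool create \
  -o ashift=12 \
  -o autotrim=on \
  -R /mnt \
  -O canmount=off \
  -O mountpoint=none \
  rpool \
  "${disk}-part1"
</syntaxhighlight>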


If you do not want the entire pool to be encrypted, remove the <code>encryption</code>, <code>keylocation</code>, and <code>keyformat</code> options.
==== Create the ZFS datasets ====
Since ZFS is a copy-on-write filesystem, it needs free disk space even to delete files, so you should avoid running out of space entirely. Luckily, it is possible to reserve disk space for a dataset to prevent this.
<syntaxhighlight lang="bash">
sudo zfs create -o refreservation=1G -o mountpoint=none rpool/reserved
</syntaxhighlight>
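Should the pool ever fill up completely, the reservation can be released to make deletion possible again, and re-created afterwards (a sketch):
<syntaxhighlight lang="console">
$ sudo zfs set refreservation=none rpool/reserved
</syntaxhighlight>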


Create the datasets for the operating system.
<syntaxhighlight lang="bash">
sudo zfs create -o canmount=off -o mountpoint=/ rpool/nixos
sudo zfs create -o canmount=on rpool/nixos/nix
sudo zfs create -o canmount=on rpool/nixos/etc
sudo zfs create -o canmount=on rpool/nixos/var
sudo zfs create -o canmount=on rpool/nixos/var/lib
sudo zfs create -o canmount=on rpool/nixos/var/log
sudo zfs create -o canmount=on rpool/nixos/var/spool
</syntaxhighlight>


Create datasets for user home directories. If you opted to not encrypt the entire pool, you can encrypt just the user data by specifying the same encryption properties when creating <code>rpool/userdata</code>; the child datasets will then inherit the encryption (see the sketch after this block).
<syntaxhighlight lang="bash">
sudo zfs create -o canmount=off -o mountpoint=/ rpool/userdata
sudo zfs create -o canmount=on rpool/userdata/home
sudo zfs create -o canmount=on -o mountpoint=/root rpool/userdata/home/root
# Create child datasets of home for users' home directories.
sudo zfs create -o canmount=on rpool/userdata/home/alice
sudo zfs create -o canmount=on rpool/userdata/home/bob
sudo zfs create -o canmount=on rpool/userdata/home/...
</syntaxhighlight>
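
If you chose to encrypt only the user data, the first command above might instead look like this (a sketch; it reuses the encryption options from the pool-creation step):
<syntaxhighlight lang="console">
$ sudo zfs create -o canmount=off -o mountpoint=/ \
    -o encryption=aes-256-gcm -o keylocation=prompt -o keyformat=passphrase \
    rpool/userdata
</syntaxhighlight>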


==== Mount /boot ====
We are going to use the default NixOS bootloader, systemd-boot, which can install to only one device. You will want to periodically rsync <code>/mnt/boot</code> to <code>/mnt/boot2</code> so that you can still boot your system if either disk fails.
<syntaxhighlight lang="bash">
sudo mkdir /mnt/boot /mnt/boot2
sudo mount "${disk[0]}-part3" /mnt/boot
sudo mount "${disk[1]}-part3" /mnt/boot2
</syntaxhighlight>
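The periodic sync could be as simple as this (a sketch; adjust the paths, since on the running system the partitions are mounted at <code>/boot</code> and <code>/boot2</code>):
<syntaxhighlight lang="console">
$ sudo rsync -a --delete /mnt/boot/ /mnt/boot2/
</syntaxhighlight>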


Or for single-disk systems:
<syntaxhighlight lang="bash">
sudo mkdir /mnt/boot
sudo mount "${disk}-part3" /mnt/boot
</syntaxhighlight>


=== Configure the NixOS system ===
Generate the base NixOS configuration files.
<syntaxhighlight lang="console">
$ nixos-generate-config --root /mnt
</syntaxhighlight>


Open <code>/mnt/etc/nixos/configuration.nix</code> in a text editor and change <code>imports</code> to include <code>hardware-configuration-zfs.nix</code> instead of the default <code>hardware-configuration.nix</code>, as sketched below. We will be editing this file later.
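The <code>imports</code> list would then look something like this (a sketch; it assumes you copy the generated <code>hardware-configuration.nix</code> to <code>hardware-configuration-zfs.nix</code> so that a later <code>nixos-generate-config</code> run does not overwrite your edits):
<syntaxhighlight lang="nix">
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration-zfs.nix
    ];
</syntaxhighlight>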


Now add the following block of code anywhere (how you organise your <code>configuration.nix</code> is up to you):
<syntaxhighlight lang="nix">
  # ZFS boot settings.
  boot.supportedFilesystems = [ "zfs" ];
  boot.zfs.devNodes = "/dev/";
</syntaxhighlight>


Now set <code>networking.hostName</code> and <code>networking.hostId</code>. The host ID must be an eight-digit hexadecimal value. You can derive it from <code>/etc/machine-id</code> by taking the first eight characters; from the hostname, by taking the first eight characters of the hostname's md5sum,
<syntaxhighlight lang="console">
$ hostname | md5sum | head -c 8
</syntaxhighlight>
or by taking eight hexadecimal characters from <code>/dev/urandom</code>,
<syntaxhighlight lang="console">
$ tr -dc 0-9a-f < /dev/urandom | head -c 8
</syntaxhighlight>
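Both settings then go into <code>configuration.nix</code>, for example (a sketch; the values are illustrative):
<syntaxhighlight lang="nix">
  networking.hostName = "zfshost";
  networking.hostId = "8425e349";
</syntaxhighlight>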


Now add some ZFS maintenance settings:
<syntaxhighlight lang="nix">
  # ZFS maintenance settings.
  services.zfs.trim.enable = true;
  services.zfs.autoScrub.enable = true;
  services.zfs.autoScrub.pools = [ "rpool" ];
</syntaxhighlight>


You may wish to also add <code>services.zfs.autoSnapshot.enable = true;</code> and set the ZFS property <code>com.sun:auto-snapshot</code> to <code>true</code> on <code>rpool/userdata</code> to have automatic snapshots. (See [[#How to use the auto-snapshotting service]] earlier on this page.)
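Setting the property is a one-liner; because ZFS user properties are inherited, it also covers the child datasets (a sketch):
<syntaxhighlight lang="console">
$ sudo zfs set com.sun:auto-snapshot=true rpool/userdata
</syntaxhighlight>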


Now open <code>/mnt/etc/nixos/hardware-configuration-zfs.nix</code>.


* Add <code>options = [ "zfsutil" ];</code> to every ZFS <code>fileSystems</code> block.
* Add <code>options = [ "X-mount.mkdir" ];</code> to <code>fileSystems."/boot"</code> and <code>fileSystems."/boot2"</code>.
* Replace <code>swapDevices</code> with the following, replacing <code>foo</code> and <code>bar</code> with the IDs of your disks.


<syntaxhighlight lang="nix">
  swapDevices = [
    { device = "/dev/disk/by-id/foo-part2";
      randomEncryption = true;
    }
    { device = "/dev/disk/by-id/bar-part2";
      randomEncryption = true;
    }
  ];
</syntaxhighlight>
For single-disk installs, remove the second entry of this array.


==== Optional additional setup for encrypted ZFS ====
===== Unlock encrypted zfs via ssh on boot =====
In case you want to unlock a machine remotely (after an update), having an SSH service in initrd for the password prompt is handy:
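
The configuration might look something like this (a sketch; the port, host key path, and authorized key are placeholders you must replace):
<syntaxhighlight lang="nix">
  boot.initrd.network = {
    enable = true;
    ssh = {
      enable = true;
      # Use a port other than 22 so clients don't confuse the initrd host key
      # with the host key of the installed system.
      port = 2222;
      hostKeys = [ /etc/secrets/initrd/ssh_host_ed25519_key ];
      authorizedKeys = [ "ssh-ed25519 AAAA... user@host" ];
    };
    # Prompt for the pool passphrase when you log in.
    postCommands = ''
      echo "zfs load-key -a; killall zfs" >> /root/.profile
    '';
  };
</syntaxhighlight>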


* If your network card isn't started, you'll need to add the corresponding kernel module to the initrd as well, e.g. <code>boot.initrd.kernelModules = [ "r8169" ];</code>


===== Import and unlock multiple encrypted pools/datasets at boot =====

If you have multiple encrypted pools or datasets and want to import and unlock them at boot, so that they can be automounted using the <code>hardware-configuration.nix</code>, you can amend the <code>boot.initrd.network.postCommands</code> option.
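
For example (a sketch; <code>tankXXX</code> stands for an additional pool, matching the description below):
<syntaxhighlight lang="nix">
  boot.initrd.network.postCommands = ''
    # Import the extra pool(s) so that their keys can be loaded as well.
    zpool import tankXXX
    echo "zfs load-key -a; killall zfs" >> /root/.profile
  '';
</syntaxhighlight>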


When you log in by SSH into the box, or when you have physical access to the machine itself, you will be prompted to supply the unlocking password for your zroot and tankXXX pools.
=== Install NixOS ===
<syntaxhighlight lang="console">
$ nixos-install --show-trace --root /mnt
</syntaxhighlight>
<code>--show-trace</code> will show you exactly where things went wrong if <code>nixos-install</code> fails. To take advantage of all the cores on your system, also specify <code>--max-jobs n</code>, replacing <code>n</code> with the number of cores on your machine.


== ZFS Trim Support for SSDs ==