== How to use the auto-snapshotting service ==

To auto-snapshot a ZFS filesystem or a ZVol, set its <code>com.sun:auto-snapshot</code> property to <code>true</code>, like this:

<syntaxhighlight lang="console">
# zfs set com.sun:auto-snapshot=true <pool>/<fs>
</syntaxhighlight>

(Note that by default this property will be inherited by all descendant datasets, but you can set their properties to false if you prefer.)

Then, to enable the auto-snapshot service, add this to your <code>configuration.nix</code>:
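<syntaxhighlight lang="nix">
services.zfs.autoSnapshot.enable = true;
</syntaxhighlight>

This is the minimal form; the <code>services.zfs.autoSnapshot</code> options also let you tune how many frequent, hourly, daily, weekly, and monthly snapshots are kept. Snapshots of a single interval can likewise be disabled per dataset via the corresponding property, for example:

<syntaxhighlight lang="console">
# zfs set com.sun:auto-snapshot:weekly=false <pool>/<fs>
</syntaxhighlight>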
This would disable only weekly snapshots on the given filesystem.


== Installing NixOS on a ZFS root filesystem ==
 
A guide titled "Encrypted ZFS mirror with mirrored boot on NixOS" is available at https://elis.nu/blog/2019/08/encrypted-zfs-mirror-with-mirrored-boot-on-nixos/.

The OpenZFS documentation also includes a guide for NixOS Root on ZFS:
https://openzfs.github.io/openzfs-docs/Getting%20Started/NixOS/Root%20on%20ZFS.html
 
This guide is based on the above OpenZFS guide and the NixOS installation instructions in the [https://nixos.org/manual/nixos/stable/index.html#sec-installation NixOS manual].
 
=== Pool layout considerations ===
 
It is important to keep <code>/nix</code> and the rest of the filesystem in different sections of the dataset hierarchy, like this:
 
<syntaxhighlight lang="text">
rpool/
      nixos/
            nix        mounted to /nix
      userdata/
            root        mounted to /
            home        mounted to /home
            ...
</syntaxhighlight>
 
The names <code>nixos</code> and <code>userdata</code> can be changed, but it is important that they remain peers in the hierarchy.
 
ZFS can take consistent and atomic snapshots recursively down a dataset's hierarchy. Since the Nix store can always be rebuilt from your configuration, most users will want their server's ''data'' backed up, and don't mind reinstalling NixOS and then restoring that data. If this is sufficient, only snapshot and back up the <code>userdata</code> hierarchy. Users who want to be able to restore a service with only ZFS snapshots will want to snapshot the entire tree, at the significant expense of also snapshotting the Nix store.
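For example, a single recursive snapshot captures the whole data hierarchy atomically (the snapshot name here is just an example):

<syntaxhighlight lang="console">
# zfs snapshot -r rpool/userdata@backup-$(date +%F)
</syntaxhighlight>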
 
=== Dataset properties ===
 
The following is a list of recommended dataset properties which have no drawbacks under regular use:
 
* <code>compression=lz4</code> (<code>zstd</code> for higher-end machines)
* <code>xattr=sa</code> for Journald
* <code>acltype=posixacl</code> also for Journald
* <code>relatime=on</code> for reduced stress on SSDs
 
The following is a list of dataset properties which are often useful, but do have drawbacks:
 
* <code>atime=off</code> disables updating a file's access time when the file is read. This can result in significant performance gains, but might confuse some software like mailers.
 
==== Journald ====
 
Journald requires some properties for <code>journalctl</code> to work for non-root users. The dataset containing <code>/var/log/journal</code> (probably the <code>/</code> dataset for simple configurations) should be created with <code>xattr=sa</code> and <code>acltype=posixacl</code>.
 
For example:
 
<syntaxhighlight lang="console">
# zpool create -O xattr=sa -O acltype=posixacl rpool ...
</syntaxhighlight>
 
or:
<syntaxhighlight lang="console">
# zfs create -o xattr=sa -o acltype=posixacl rpool/root
</syntaxhighlight>
 
If you have already created the dataset, these properties can be set later:
 
<syntaxhighlight lang="console">
# zfs set xattr=sa acltype=posixacl rpool/root
</syntaxhighlight>
 
=== Environment setup ===
For convenience, set a shell variable with the paths to your disk(s):
 
For multiple disks:
<syntaxhighlight lang="console">
$ disk=(/dev/disk/by-id/foo /dev/disk/by-id/bar)
</syntaxhighlight>
 
For a single disk:
<syntaxhighlight lang="console">
$ disk=/dev/disk/by-id/foo
</syntaxhighlight>
 
=== Partitioning the disks ===
<syntaxhighlight lang="bash">
# Multiple disks
for x in "${disk[@]}"; do
  sudo parted "$x" -- mklabel gpt
  sudo parted "$x" -- mkpart primary 512MiB -8GiB
  sudo parted "$x" -- mkpart primary linux-swap -8GiB 100%
  sudo parted "$x" -- mkpart ESP fat32 1MiB 512MiB
  sudo parted "$x" -- set 3 esp on
 
  sudo mkswap -L swap "${x}-part2"
  sudo mkfs.fat -F 32 -n EFI "${x}-part3"
done
 
# Single disk
sudo parted "$disk" -- mklabel gpt
sudo parted "$disk" -- mkpart primary 512MiB -8GiB
sudo parted "$disk" -- mkpart primary linux-swap -8GiB 100%
sudo parted "$disk" -- mkpart ESP fat32 1MiB 512MiB
sudo parted "$disk" -- set 3 esp on
 
sudo mkswap -L swap "${disk}-part2"
sudo mkfs.fat -F 32 -n EFI "${disk}-part3"
</syntaxhighlight>
 
=== Laying out the filesystem hierarchy ===
==== Create the ZFS pool ====
<syntaxhighlight lang="bash">
sudo zpool create \
  -o ashift=12 \
  -o autotrim=on \
  -R /mnt \
  -O canmount=off \
  -O mountpoint=none \
  -O acltype=posixacl \
  -O compression=zstd \
  -O dnodesize=auto \
  -O normalization=formD \
  -O relatime=on \
  -O xattr=sa \
  -O encryption=aes-256-gcm \
  -O keylocation=prompt \
  -O keyformat=passphrase \
  rpool \
  mirror \
  "${disk[@]/%/-part1}"
</syntaxhighlight>
 
For a single disk, remove <code>mirror</code> and specify just <code>"${disk}-part1"</code> as the device.
 
If you do not want the entire pool to be encrypted, remove the <code>encryption</code>, <code>keylocation</code>, and <code>keyformat</code> options.
 
==== Create the ZFS datasets ====
Since ZFS is a copy-on-write filesystem, it needs free disk space even to delete files, so you should avoid filling the pool completely. Fortunately, you can reserve disk space for a dataset to prevent this:
<syntaxhighlight lang="console">
# zfs create -o refreservation=1G -o mountpoint=none rpool/reserved
</syntaxhighlight>
 
Create the datasets for the operating system.  (Experienced ZFS users may wish to split up the OS datasets further.)
<syntaxhighlight lang="bash">
sudo zfs create -o canmount=on -o mountpoint=/ rpool/nixos
sudo zfs create rpool/nixos/nix
</syntaxhighlight>
 
Create datasets for user home directories. If you opted not to encrypt the entire pool, you can encrypt just the user data by specifying the same encryption properties when creating <code>rpool/userdata</code>; its child datasets will then also be encrypted (see the sketch after the following block).
<syntaxhighlight lang="bash">
sudo zfs create -o canmount=off -o mountpoint=/ rpool/userdata
sudo zfs create -o canmount=on rpool/userdata/home
sudo zfs create -o canmount=on -o mountpoint=/root rpool/userdata/home/root
# Create child datasets of home for users' home directories.
sudo zfs create -o canmount=on rpool/userdata/home/alice
sudo zfs create -o canmount=on rpool/userdata/home/bob
sudo zfs create -o canmount=on rpool/userdata/home/...
</syntaxhighlight>
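If you went the per-dataset encryption route, create <code>rpool/userdata</code> with the encryption properties instead. A minimal sketch, reusing the encryption properties shown for the pool above (you will be prompted for a passphrase):

<syntaxhighlight lang="bash">
sudo zfs create -o canmount=off -o mountpoint=/ \
  -o encryption=aes-256-gcm -o keyformat=passphrase -o keylocation=prompt \
  rpool/userdata
</syntaxhighlight>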
 
==== Mount <code>/boot</code> ====
We are going to use the default NixOS bootloader systemd-boot, which can install to only one device.  You will want to periodically rsync <code>/mnt/boot</code> to <code>/mnt/boot2</code> so that you can always boot your system if either disk fails.
<syntaxhighlight lang="bash">
sudo mkdir /mnt/boot /mnt/boot2
sudo mount "${disk[0]}-part3" /mnt/boot
sudo mount "${disk[1]}-part3" /mnt/boot2
</syntaxhighlight>
 
Or for single-disk systems:
<syntaxhighlight lang="bash">
sudo mkdir /mnt/boot
sudo mount "${disk}-part3" /mnt/boot
</syntaxhighlight>
 
=== Configure the NixOS system ===
Generate the base NixOS configuration files.
<syntaxhighlight lang="console">
# nixos-generate-config --root /mnt
</syntaxhighlight>
 
Open <code>/mnt/etc/nixos/configuration.nix</code> in a text editor and change <code>imports</code> to include <code>hardware-configuration-zfs.nix</code> instead of the default <code>hardware-configuration.nix</code>.  We will be editing this file later.
 
Now add the following block of code anywhere (how you organise your <code>configuration.nix</code> is up to you):
<syntaxhighlight lang="nix">
# ZFS boot settings.
boot.supportedFilesystems = [ "zfs" ];
boot.zfs.devNodes = "/dev/";
</syntaxhighlight>
 
Now set <code>networking.hostName</code> and <code>networking.hostId</code>.  The host ID must be an eight-digit hexadecimal value.  You can derive it from <code>/etc/machine-id</code> by taking the first eight characters; from the hostname, by taking the first eight characters of the hostname's md5sum,
<syntaxhighlight lang="console">
$ hostname | md5sum | head -c 8
</syntaxhighlight>
or by taking eight hexadecimal characters from <code>/dev/urandom</code>,
<syntaxhighlight lang="console">
$ tr -dc 0-9a-f < /dev/urandom | head -c 8
</syntaxhighlight>
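The result might look like this (both values below are placeholders; substitute your own):

<syntaxhighlight lang="nix">
networking.hostName = "nixos";  # example hostname
networking.hostId = "8425e349"; # example eight-digit hexadecimal value
</syntaxhighlight>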
 
Now add some ZFS maintenance settings:
<syntaxhighlight lang="nix">
# ZFS maintenance settings.
services.zfs.trim.enable = true;
services.zfs.autoScrub.enable = true;
services.zfs.autoScrub.pools = [ "rpool" ];
</syntaxhighlight>
 
You may wish to also add <syntaxhighlight lang="nix" inline>services.zfs.autoSnapshot.enable = true;</syntaxhighlight> and set the ZFS property <code>com.sun:auto-snapshot</code> to <code>true</code> on <code>rpool/userdata</code> to have automatic snapshots.  (See [[#How to use the auto-snapshotting service]] earlier on this page.)
 
Now open <code>/mnt/etc/nixos/hardware-configuration-zfs.nix</code>.
 
* Add <syntaxhighlight lang="nix" inline>options = [ "zfsutil" ];</syntaxhighlight> to every ZFS <code>fileSystems</code> block.
* Add <syntaxhighlight lang="nix" inline>options = [ "X-mount.mkdir" ];</syntaxhighlight> to <syntaxhighlight lang="nix" inline>fileSystems."/boot"</syntaxhighlight> and <syntaxhighlight lang="nix" inline>fileSystems."/boot2"</syntaxhighlight>.
* Replace <code>swapDevices</code> with the following, replacing <code>DISK1</code> and <code>DISK2</code> with the names of your disks.
 
<syntaxhighlight lang="nix">
swapDevices = [
  { device = "/dev/disk/by-id/DISK1-part2";
    randomEncryption = true;
  }
  { device = "/dev/disk/by-id/DISK2-part2";
    randomEncryption = true;
  }
];
</syntaxhighlight>
For single-disk installs, remove the second entry of this array.
 
==== Optional additional setup for encrypted ZFS ====
===== Unlock encrypted zfs via ssh on boot =====


{{note|As of 22.05, rebuilding your config with the below directions may result in a situation where, if you want to revert the changes, you may need to do some pretty hairy nix-store manipulation to be able to successfully rebuild, see https://github.com/NixOS/nixpkgs/issues/101462#issuecomment-1172926129}}
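The approach is to run a small SSH server inside the initrd so you can log in and supply the pool passphrase remotely. A minimal sketch, assuming DHCP networking and an initrd host key generated beforehand with <code>ssh-keygen</code> (the port, key path, and authorized key are examples):

<syntaxhighlight lang="nix">
boot.kernelParams = [ "ip=dhcp" ];
boot.initrd.network = {
  enable = true;
  ssh = {
    enable = true;
    port = 2222;                                              # example port
    hostKeys = [ /etc/secrets/initrd/ssh_host_ed25519_key ];  # example key path
    authorizedKeys = [ "ssh-ed25519 AAAA... user@example" ];  # your public key
  };
  # Prompt for the passphrase as soon as someone logs in,
  # then let the stock initrd continue booting.
  postCommands = ''
    echo "zfs load-key -a; killall zfs" >> /root/.profile
  '';
};
</syntaxhighlight>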
* If your network card isn't started, you'll need to add the corresponding kernel module to the initrd as well, e.g. <syntaxhighlight lang="nix" inline>boot.initrd.kernelModules = [ "r8169" ];</syntaxhighlight>


===== Import and unlock multiple encrypted pools/datasets at boot =====
If you have not just one encrypted pool/dataset but several, and you want to import and unlock them at boot so that they can be automounted via the hardware-configuration.nix, you can simply amend the <code>boot.initrd.network.postCommands</code> option.


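A sketch of such an amendment (the pool names <code>zroot</code> and <code>tankXXX</code> are examples):

<syntaxhighlight lang="nix">
boot.initrd.network.postCommands = ''
  # Import the additional pools alongside the root pool.
  zpool import tankXXX
  # Load the keys for all imported pools, then continue booting.
  echo "zfs load-key -a; killall zfs" >> /root/.profile
'';
</syntaxhighlight>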
When you log in via SSH, or at the machine itself if you have physical access, you will be prompted to supply the unlocking passphrases for your zroot and tankXXX pools.


=== Install NixOS ===
<syntaxhighlight lang="console">
# nixos-install --show-trace --root /mnt
</syntaxhighlight>
<code>--show-trace</code> will show you exactly where things went wrong if <code>nixos-install</code> fails.  To take advantage of all cores on your system, also specify <code>--max-jobs n</code>, replacing <code>n</code> with the number of cores on your machine.
 
== ZFS trim support for SSDs ==
 
ZFS 0.8 and later feature TRIM support for SSDs.
 
=== How to use ZFS trimming ===


ZFS trimming works on one or more zpools and trims each SSD inside them. There are two modes: a manually issued trim of a specified pool, and automatic trimming of pools. The main difference is that auto-trim skips ranges it considers too small, while a manually issued trim trims all ranges.
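Both modes use standard <code>zpool</code> commands; the pool name <code>rpool</code> below is an example:

<syntaxhighlight lang="console">
# zpool trim rpool             # one-off, manually issued trim
# zpool set autotrim=on rpool  # enable automatic trimming
</syntaxhighlight>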
Pools should also be scrubbed regularly to verify checksums. To start a scrub manually:

<syntaxhighlight lang="console">
# zpool scrub $pool
</syntaxhighlight>
== Mount datasets without legacy mountpoint ==
Contrary to conventional wisdom, <code>mountpoint=legacy</code> is not required for mounting datasets. The trick is to use <code>mount -t zfs -o zfsutil path/to/dataset /path/to/mountpoint</code>.
Legacy mountpoints are also inconvenient in that the mounts cannot be handled natively by the <code>zfs mount</code> command, hence the <code>legacy</code> in the name.
An example configuration of mounting non-legacy dataset is the following:
<syntaxhighlight lang="nix">
{
  fileSystems."/tank" =
    { device = "tank_pool/data";
      fsType = "zfs"; options = [ "zfsutil" ];
    };
}
</syntaxhighlight>
An alternative is to set <syntaxhighlight lang="nix" inline>boot.zfs.extraPools = [ "pool_name" ];</syntaxhighlight>, which the documentation recommends if you have many ZFS filesystems.


== NFS share ==
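To share a dataset over NFS, enable the NixOS NFS server and set the <code>sharenfs</code> property on the dataset. A minimal sketch (the dataset name is an example; <code>sharenfs=on</code> exports with default options, and a string of <code>exports</code>-style options can be given instead):

<syntaxhighlight lang="nix">
services.nfs.server.enable = true;
</syntaxhighlight>

<syntaxhighlight lang="console">
# zfs set sharenfs=on rpool/userdata/home
</syntaxhighlight>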
For more options, see <code>man 5 exports</code>.
== See also ==
* An article on how to set up encrypted ZFS on Hetzner: https://mazzo.li/posts/hetzner-zfs.html