Proxmox Homeserver Backup & Restore

Proxmox Homeserver Backup

Backup introduction

Have you already heard about the three remarks you never want to hear from your system administrator or consultant?

      1. I have never seen this before!

      2. Oh! That's funny!

      3. You do have a backup, don't you?

Of course we must make sure that we back up our new server, because even if Proxmox is really stable, an issue or error can always happen. And don't stop your backups after a few weeks or months just because you never had such an issue, because in that case Murphy's law will strike. You have probably already heard of it: it says that there is no reason why the worst cannot happen. And remember, a good backup is the one you never need, but once you do need it your data should not be days or weeks old.

This means we need an external disk or storage outside our Proxmox server, preferably in a separate room. I use my old NAS server, which is located in the basement at home. You can use a USB disk too, but make sure it's big enough to hold all your data. We will set up four kinds of backups:

    1. Backup of the virtual machines and containers

    2. Backup of the Proxmox configuration

    3. Backup of the data on the ZFS pool

    4. ZFS auto-snapshots

But now it's time to start; let's switch to our terminal session.

Backup of the VMs and containers

NFS connection

If you want to use an NFS share, like I do, you must first set it up as a share on the NAS. Then you can mount this share on your Proxmox server.


echo "backup:/data/Backup /backup nfs rw" >> /etc/fstab

mkdir /backup

mount -a

mkdir -p /backup/pvetest/{config,dump,borgbackup}


Of course you must use your own NAS name and directory, but once that is done the share is configured.
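A quick check that the share is really mounted never hurts, for example:

df -h /backup
mount | grep /backup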

USB Disk connection

If you want to use a USB disk instead, forget the last steps; just connect the disk to the server and check whether the system recognized it. In the terminal session:


dmesg

[1034408.992911] scsi host28: usb-storage 2-5:1.0

[1034410.019036] scsi 28:0:0:0: Direct-Access WD My Passport 0820 1012 PQ: 0 ANSI: 6

[1034410.019219] scsi 28:0:0:1: Enclosure WD SES Device 1012 PQ: 0 ANSI: 6

[1034410.020505] sd 28:0:0:0: Attached scsi generic sg10 type 0

[1034410.020609] ses 28:0:0:1: Attached Enclosure device

[1034410.020671] ses 28:0:0:1: Attached scsi generic sg11 type 13

[1034410.021649] sd 28:0:0:0: [sdk] Spinning up disk…

[1034416.080627] ses 28:0:0:1: Wrong diagnostic page; asked for 1 got 8

[1034416.080632] ses 28:0:0:1: Failed to get diagnostic page 0x1

[1034416.080635] ses 28:0:0:1: Failed to bind enclosure -19

[1034417.090624] .ready

[1034417.091202] sd 28:0:0:0: [sdk] 3906963456 512-byte logical blocks: (2.00 TB/1.82 TiB)

[1034417.091508] sd 28:0:0:0: [sdk] Write Protect is off

[1034417.091509] sd 28:0:0:0: [sdk] Mode Sense: 47 00 10 08

[1034417.091800] sd 28:0:0:0: [sdk] No Caching mode page found

[1034417.091803] sd 28:0:0:0: [sdk] Assuming drive cache: write through

[1034417.180210] sdk: sdk1 sdk2

[1034417.181195] sd 28:0:0:0: [sdk] Attached SCSI disk


As you can see, our disk has been recognized as /dev/sdk; we just need to mount it. Create a new directory and mount the disk.


mkdir /backup

root@pve2:~# mount /dev/sdk2 /backup

root@pve2:~# df -h | grep backup

/dev/sdk2 1.9T 404G 1.5T 22% /backup

mkdir -p /backup/pvetest/{config,dump,borgbackup}

Setting up the backup

Now switch to the web frontend.

We must add our backup directory as storage: click on Datacenter in the left panel, then on Storage – Add in the right panel, and choose our directory (/backup/pvetest/dump).

Choose VZDump backup file as the content type.

Under Node you can leave it as is or choose our server name.

Then click on Add.

After some time the backup storage is ready for use.

Now go to Backup in the right panel.

And choose Add.

I recommend setting up two different kinds of backup jobs: once a week a backup of all your VMs and containers, and on the other days only the most important servers.

So there we have it: every VM and container is now backed up once a week.

Next we need to set up a daily backup for our important servers.

Now click on Add again and choose only the servers that you really need.


I chose only one of the two servers.

The last step is to define how many backups you want to keep.


For example here, four daily backups and one weekly.
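By the way, the same kind of job can also be run from the command line with vzdump; a small sketch (the VM ID 100, the retention values and the storage name "backup" are only examples and must match your own setup):

# back up guest 100 to the backup storage, keeping 4 daily and 1 weekly backup
vzdump 100 --storage backup --mode snapshot --compress zstd --prune-backups keep-daily=4,keep-weekly=1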

Backup of the Proxmox configuration

Why reinvent the wheel? A script for this backup already exists; many thanks here to

# Author DerDanilo

# Contributors aboutte, xmirakulix, bootsie123, phidauex

for this script.

You can download it under https://github.com/marcosgus/proxmox/blob/master/prox_config_backup.sh

Highlight the text with your mouse, copy it, switch to the terminal session and type:


mkdir -p /root/bin

cd /root/bin

vi /root/bin/pve_config_backup.sh


Press "i" to switch to insert mode, paste the copied lines, then find DEFAULT_BACK_DIR and change it to whatever you defined before, e.g. /backup/pvetest/config.

A few words about the configuration directory: /etc/pve is a virtual filesystem; in reality its content lives in an SQLite database, /var/lib/pve-cluster/config.db. The best way to back up this configuration is to use the internal sqlite3 backup tool, which means you can back up the config without stopping your Proxmox server. For that we must adapt the script.
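As an aside, if you ever want to back up the config database by hand, the command looks like this (the target path is just an example):

# create a consistent copy of the running config database
sqlite3 /var/lib/pve-cluster/config.db ".backup /root/config-manual.db"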

Between lines 68 and 69 add the following. Type Esc and then ":set nu" to turn on the line numbers, go to the beginning of line 69 and press "i" to switch back to insert mode.

Add the following:


_filename9="$_tdir/proxmoxconfigDB.$_now.db"


Then in line 135 add this:


sqlite3 /var/lib/pve-cluster/config.db ".backup ${_filename9}"


Press Esc and then ":wq" to save the script.

Make your script executable and add it into the crontab.

In the terminal session type in:


chmod 750 /root/bin/pve_config_backup.sh

crontab -e


If you are prompted for your preferred editor, choose whichever you want; I use "vi" (or vim) here.

Press "i" and type the following line:


45 23 * * * /root/bin/pve_config_backup.sh > /var/log/pve/bckp.log 2>&1

Press Esc, then ":wq".


The line above means: start the program /root/bin/pve_config_backup.sh at 23:45 every day, write the output to the file /var/log/pve/bckp.log, and if something goes wrong write the errors to the same file (2>&1).

I would check the result of the backup periodically; just type "cat /var/log/pve/bckp.log" to see the backup log.

Now just run the program once and have a look at the results.

For that, copy the line from the crontab and paste it into the shell:


crontab -l

45 23 * * * /root/bin/pve_config_backup.sh > /var/log/pve/bckp.log 2>&1


Highlight everything from after the last star to the end, copy and paste it into the shell, add a space and "&", and hit the return key.

The program will start in the background; you can monitor it with the tail command:


/root/bin/pve_config_backup.sh > /var/log/pve/bckp.log 2>&1 &


tail -300f /var/log/pve/bckp.log


At the end you will see a new file in your backup directory.


/backup/pvetest/config/proxmox_backup_pve2.netwitch.de_2022-12-04.23.45.01.tar.gz


Contents of the tar file:


/etc/./

/etc/./.pwd.lock

/etc/./adduser.conf

/etc/./aliases

/etc/./alternatives/

/etc/./alternatives/README

/etc/./alternatives/arptables

/etc/./alternatives/arptables-restore

/etc/./alternatives/arptables-save

/etc/./alternatives/awk

/etc/./alternatives/awk.1.gz

/etc/./alternatives/ebtables

/etc/./alternatives/ebtables-restore

/etc/./alternatives/ebtables-save

/etc/./alternatives/editor


The program first creates a backup of the most important directories, as well as a detailed configuration and hardware report of the server.

Backup of the ZFS pool datasets with BorgBackup

Until now we have only secured our Proxmox environment. The data stored on the ZFS pool and used as shares is still not backed up, e.g. films, photos, archives or any other kind of personal data. This must be backed up separately. Here we run into a first issue: this data takes up a lot of capacity on the disks. At least for me; I have the following datasets defined:


data/dvr

data/backup

data/bilder

data/musik

data/timemachine

data/serien

data/videos

data/videos-private

data/vm

data/youtube


Those directories are configured as NFS and/or Samba shares; I will explain in a future video why I chose this solution over a full NAS distribution like FreeNAS.

All those directories together use about 6.3 TB; if you want to back them up you need at least the same capacity in your backup storage. I am using BorgBackup, a program that compresses and deduplicates the data, which saves a lot of capacity. For example, here is the backup result for my environment:

Original size Compressed size Deduplicated size

All archives: 6.32 TB 6.08 TB 3.21 TB

As you can see, the compression gain is not that big; this comes from the type of data I have (altogether 3.2 TB of films, music and photos). All those data types are already compressed, so there is not much capacity to be saved there. But deduplication saves almost half the capacity of my data.
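Once your backups are running you can print these statistics yourself at any time (a sketch, using the repository path from this guide):

# show size, compression and deduplication statistics of the repository
borg info /backup/pvetest/borgbackup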

First we must install BorgBackup, then create a small script.


apt install -y python3-pip borgbackup

python3 -m pip install pip --upgrade


Now initialize the backup directory:


root@pve:~/bin# borg init -e repokey-blake2 /backup/pvetest/borgbackup

Enter new passphrase:

Enter same passphrase again:

Do you want your passphrase to be displayed for verification? [yN]: N

By default repositories initialized with this version will produce security

errors if written to with an older version (up to and including Borg 1.0.8).

If you want to use these older versions, you can disable the check by running:

borg upgrade --disable-tam /backup/pvetest/borgbackup

See https://borgbackup.readthedocs.io/en/stable/changes.html#pre-1-0-9-manifest-spoofing-vulnerability for details about the security implications.


IMPORTANT: you will need both KEY AND PASSPHRASE to access this repo!

Use "borg key export" to export the key, optionally in printable format.

Write down the passphrase. Store both at safe place(s).

You will be prompted for a passphrase for the encryption of your data on the disk.

Just follow the instructions. And please keep your passphrase and the key safe; without them it is not possible to retrieve your data from the backup. If you don't want any encryption, just initialize with:


borg init -e none /backup/pvetest/borgbackup
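If you did choose an encrypted mode, it is a good idea to export the repository key right away and store it somewhere outside the server, together with (but separate from) the passphrase; a sketch with the repository path used here:

# export the repository key to a file you keep off the server
borg key export /backup/pvetest/borgbackup /root/borgbackup-repokey
# or print it in a paper-friendly format
borg key export --paper /backup/pvetest/borgbackup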


Here are the different possibilities for the passphrase/encryption mode:


repokey and keyfile

use AES-CTR-256 for encryption and HMAC-SHA256 for authentication in an encrypt-then-MAC (EtM) construction. The chunk ID hash is HMAC-SHA256 as well (with a separate key). These modes are compatible with Borg 1.0.x.

repokey-blake2 and keyfile-blake2

are also authenticated encryption modes, but use BLAKE2b-256 instead of HMAC-SHA256 for authentication. The chunk ID hash is a keyed BLAKE2b-256 hash. These modes are new and not compatible with Borg 1.0.x.

authenticated

mode uses no encryption, but authenticates repository contents through the same HMAC-SHA256 hash as the repokey and keyfile modes (it uses it as the chunk ID hash). The key is stored like repokey. This mode is new and not compatible with Borg 1.0.x.

authenticated-blake2

is like authenticated, but uses the keyed BLAKE2b-256 hash from the other blake2 modes. This mode is new and not compatible with Borg 1.0.x.

none

mode uses no encryption and no authentication. It uses SHA256 as chunk ID hash. This mode is not recommended, you should rather consider using an authenticated or authenticated/encrypted mode. This mode has possible denial-of-service issues when running borg create on contents controlled by an attacker. Use it only for new repositories where no encryption is wanted and when compatibility with 1.0.x is important. If compatibility with 1.0.x is not important, use authenticated-blake2 or authenticated instead.


If you want to keep your backups on the network, a passphrase and encryption are crucial.

Now our backup script:


#!/bin/bash
##################################
### Example values:
### logDirectory="/var/log/pve/"
### backupDiscMount="/backup/pvetest/borgbackup/"
### borgBackupDirs="list of the directories that you want to back up"
##################################

# Borg needs the passphrase of the repository (leave empty for an unencrypted repo)
export BORG_PASSPHRASE='MyS3cr3tP@ssWort'
export BORG_UNKNOWN_UNENCRYPTED_REPO_ACCESS_IS_OK=yes
export BORG_RELOCATED_REPO_ACCESS_IS_OK=yes

startTime=$(date +%s)
currentDate=$(date --date @"$startTime" +"%Y%m%d_%H%M%S")
currentDateReadable=$(date --date @"$startTime" +"%d.%m.%Y - %H:%M:%S")

logDirectory="/var/log/pve"
logFile="${logDirectory}/${currentDate}.log"
backupDiscMount="/backup/pvetest/borgbackup"
borgRepository="${backupDiscMount}"
borgBackupDirs="/root /data/bilder /data/videos"

if [ ! -d "${logDirectory}" ]
then
    mkdir -p "${logDirectory}"
fi

errorecho() { cat <<< "$@" 1>&2; }

# send all output to the log file as well as to the terminal
exec > >(tee -i "${logFile}")
exec 2>&1

if [ "$(id -u)" != "0" ]
then
    errorecho "ERROR: This script must be run as root!"
    exit 1
fi

echo -e "\n###### Backup started: ${currentDateReadable} ######\n"
echo -e "Collecting data"

echo -e "\nBackup with borgbackup"
borg create --stats $borgRepository::"${currentDate}" $borgBackupDirs
echo

# keep everything from the last 7 days plus 2 weekly archives, then free the space
borg prune --progress --stats $borgRepository --keep-within=7d --keep-weekly=2
borg compact $borgRepository

endTime=$(date +%s)
endDateReadable=$(date --date @"$endTime" +"%d.%m.%Y - %H:%M:%S")
duration=$((endTime-startTime))
durationSec=$((duration % 60))
durationMin=$(((duration / 60) % 60))
durationHour=$((duration / 3600))
durationReadable=$(printf "%02d hours %02d minutes %02d seconds" $durationHour $durationMin $durationSec)

echo -e "\n###### Backup finished: ${endDateReadable} (${durationReadable}) ######\n"

echo -e "Disk usage:\n"
df -h ${backupDiscMount}


Save the script as /root/bin/backup.sh, make it executable, create the new directories if needed, and add one line to the crontab.


mkdir -p /var/log/pve /backup/pvetest/borgbackup

chmod 750 /root/bin/backup.sh

echo "15 0 * * * /root/bin/backup.sh > /var/log/pve/backup_data.log 2>&1" >> /var/spool/cron/crontabs/root


Then open the crontab once and simply save it, so that cron picks up the change:


crontab -e

esc :wq

crontab -l


The crontab -l at the end is just to check the result.

Of course you need to adapt the directories to your environment.

ZFS automated snapshot

One of the advantages of ZFS is its snapshot capability. Snapshots in ZFS take almost no space on disk. That means you can follow the changes on your important datasets and, in case of an error (a file erased, overwritten, etc.), you are able to restore it from a snapshot.
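For example, a snapshot of a single dataset can be created and rolled back by hand (a quick sketch, assuming a dataset named data/videos like the one used later on):

# create a snapshot before a risky change
zfs snapshot data/videos@before-cleanup
# list the snapshots of this dataset
zfs list -t snapshot data/videos
# roll the dataset back to its most recent snapshot (newer changes are lost!)
zfs rollback data/videos@before-cleanup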

We don't have to do this manually; again, some people have already done an amazing job and created such a program. We just have to download it and install a few packages on the operating system.

Log in as root:


apt install -y make gcc


then:


wget https://github.com/zfsonlinux/zfs-auto-snapshot/archive/upstream/1.2.4.tar.gz
tar -xzf 1.2.4.tar.gz
cd zfs-auto-snapshot-upstream-1.2.4
make install

The program is now installed and configured; we just need to define which datasets should be included in the snapshots.

Without any further configuration the program automatically snapshots every pool and dataset. It creates a snapshot every 15 minutes (the "frequent" snapshots) and keeps, by default:

4 frequent snapshots

24 hourly snapshots

31 daily snapshots

8 weekly snapshots

12 monthly snapshots


I would recommend deactivating auto-snapshots on the ZFS pool itself and configuring each dataset as needed; you have to decide whether and how many snapshots you need per dataset. I would avoid auto-snapshots on the data/vm dataset, as snapshots for it are already taken from the Proxmox web frontend. For our example here I will create a new dataset data/videos; we could set it up as follows:


zfs create -oxattr=sa -ocompression=lz4 -oatime=off -orecordsize=16k -omountpoint=/data/videos data/videos

zfs set com.sun:auto-snapshot=false data

zfs set com.sun:auto-snapshot=true data/videos

zfs set com.sun:auto-snapshot:monthly=false data/videos

zfs set com.sun:auto-snapshot:weekly=true,keep=8 data/videos

zfs set com.sun:auto-snapshot:daily=true data/videos

zfs set com.sun:auto-snapshot:hourly=true data/videos

zfs set com.sun:auto-snapshot:frequent=true data/videos


After a few hours you can check the result of this setup:


zfs list -t snapshot
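If you only need a single file back, you don't even have to roll anything back: every snapshot is accessible read-only under the hidden .zfs/snapshot directory of the dataset. A sketch (the snapshot name and file are just examples):

# list the available snapshots of the dataset
ls /data/videos/.zfs/snapshot/
# copy an accidentally deleted file back from a daily snapshot
cp -a /data/videos/.zfs/snapshot/zfs-auto-snapshot_daily-2022-12-11-0025/holiday.mp4 /data/videos/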


Until now we have taken care of our server, our virtual machines and our data; now we must check whether those backups can actually be used for a restore.

Restore from a complete data loss (Disaster recovery)

Restore proxmox server


We now have four different backups:

    1. VM and container backups

    2. Configuration backup

    3. BorgBackup of our ZFS pool

    4. ZFS snapshots

Now the recovery depends on the type of issue we have:

    • Complete data and/or server loss (fire, coffee mug, football (don't laugh, it happened to me), storm with a lightning strike, etc.) ==> here we will need backups 2, 1 and 3

    • A virtual machine crashes or doesn't start anymore after an unlucky update ==> backup 1

    • A file erased or destroyed ==> backup 1 or 4, or a mix of backups 1 and 4

The first step is to check whether the hardware is still working normally; if necessary, destroyed parts must be replaced or the server must be rebuilt from scratch.

Next step:

Recovery after disaster

Proxmox new installation

  1. Check whether the disks of the ZFS pool are online (open the web frontend and log in).

  2. After the server reboot we have to stop the running services:

for i in pve-cluster pvedaemon vz qemu-server; do systemctl stop $i ; done

    3. If the disks were not part of the disaster you should be able to bring your pool online without data loss. Here are the two possibilities:

      1. ZFS Pool has been recognized during boot and is mounted ==> Nothing to do

      2. The ZFS pool is not there ==> You can still try to import the pool

First check with:

zpool status -v

no pools available

The ZFS pool was not recognized by the system; you can try to import it manually:

zpool import

pool: data

id: 14024415627059556521

state: ONLINE

action: The pool can be imported using its name or numeric identifier.

config:


data ONLINE

raidz1-0 ONLINE

scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 ONLINE

scsi-0QEMU_QEMU_HARDDISK_drive-scsi3 ONLINE

scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 ONLINE

scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 ONLINE


And now just import it:

zpool import data

root@pve:~# zpool status

pool: data

state: ONLINE

scan: scrub repaired 0B in 00:00:04 with 0 errors on Sun Dec 11 00:24:05 2022

config:


NAME STATE READ WRITE CKSUM

data ONLINE 0 0 0

raidz1-0 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi4 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi3 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 ONLINE 0 0 0


errors: No known data errors


In this case everything is good; you just have to restore the configuration. If one disk is missing, after the import you will get something like this:

zpool status -v

pool: data

state: DEGRADED

status: One or more devices could not be used because the label is missing or

invalid. Sufficient replicas exist for the pool to continue

functioning in a degraded state.

action: Replace the device using 'zpool replace'.

see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J

scan: scrub repaired 0B in 00:00:04 with 0 errors on Sun Dec 11 00:24:05 2022

config:


NAME STATE READ WRITE CKSUM

data DEGRADED 0 0 0

raidz1-0 DEGRADED 0 0 0

7723795709861257045 UNAVAIL 0 0 0 was /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi4-part1

scsi-0QEMU_QEMU_HARDDISK_drive-scsi3 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 ONLINE 0 0 0


Replace the faulted disk and after that repair the ZFS pool:



zpool replace -f data 7723795709861257045 /dev/sdd

root@pve:~# zpool status -v

pool: data

state: DEGRADED

scan: resilvered 213M in 00:00:04 with 0 errors on Sun Dec 11 14:25:56 2022

config:


NAME STATE READ WRITE CKSUM

data DEGRADED 0 0 0

raidz1-0 DEGRADED 0 0 0

replacing-0 DEGRADED 0 0 0

7723795709861257045 FAULTED 0 0 0 was /dev/disk/by-id/scsi-0QEMU_QEMU_HARDDISK_drive-scsi4-part1

sdd ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi3 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 ONLINE 0 0 0


errors: No known data errors


As we can see here, the disk is being resilvered; as soon as this is finished you will have a healthy pool again.

zpool status -v

pool: data

state: ONLINE

scan: scrub repaired 0B in 00:00:03 with 0 errors on Sun Dec 11 14:26:09 2022

config:


NAME STATE READ WRITE CKSUM

data ONLINE 0 0 0

raidz1-0 ONLINE 0 0 0

sdd ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi3 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi2 ONLINE 0 0 0

scsi-0QEMU_QEMU_HARDDISK_drive-scsi1 ONLINE 0 0 0


errors: No known data errors


If you lost more than one disk (for a RAIDZ1), then you must recreate the ZFS pool completely with all its datasets; you can find some help in the last configuration backup. Once every dataset has been recreated we can go to the next step.



Restore of the data in the ZFS Pool

If your ZFS pool was destroyed too, it must be recreated. If you need some help with the ZFS and dataset configuration you can have a look at the configuration backup. Copy the last backup file from /backup/pvetest/config to /root/restore:

mkdir -p /root/restore

cp /backup/pvetest/config/proxmox_backup_pve.netwitch.de_2022-12-11.14.53.47.tar.gz /root/restore

cd /root/restore

tar -xvzf proxmox_backup_pve.netwitch.de_2022-12-11.14.53.47.tar.gz

tail -100 var/tmp/proxmox-P6ewPaJB/proxmoxreport.2022-12-11.19.18.14.txt

# zpool status

pool: data

state: ONLINE

scan: scrub repaired 0B in 00:00:04 with 0 errors on Sun Dec 11 00:24:05 2022

config:

NAME STATE READ WRITE CKSUM

data ONLINE 0 0 0

raidz1-0 ONLINE 0 0 0

sdd ONLINE 0 0 0

sde ONLINE 0 0 0

sdf ONLINE 0 0 0

sdg ONLINE 0 0 0

errors: No known data errors

 

# zpool list -v

NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT

data 79.5G 869M 78.7G – – 0% 1% 1.00x ONLINE –

raidz1-0 79.5G 869M 78.7G – – 0% 1.06% – ONLINE

sdd 20.0G – – – – – – – ONLINE

sde 20.0G – – – – – – – ONLINE

sdf 20.0G – – – – – – – ONLINE

sdg 20.0G – – – – – – – ONLINE

 

# zfs list

NAME USED AVAIL REFER MOUNTPOINT

data 650M 57.0G 32.9K /data

data/vm 649M 57.0G 34.4K /data/vm

data/vm/subvol-100-disk-0 324M 11.7G 324M /data/vm/subvol-100-disk-0

data/vm/subvol-110-disk-0 324M 11.7G 324M /data/vm/subvol-110-disk-0

 

Here you have all the necessary information about your pool and datasets and can recreate them (the how-to is in the ZFS configuration part of this guide).
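As a minimal sketch, the pool and datasets from the report above could be recreated like this (the disk names and options are only examples; use your by-id device names and the properties you originally chose):

# recreate the RAIDZ1 pool with four disks (this destroys anything on them!)
zpool create -f data raidz1 sdd sde sdf sdg
# recreate the datasets, for example:
zfs create -o xattr=sa -o compression=lz4 -o atime=off -o mountpoint=/data/vm data/vm
zfs create -o xattr=sa -o compression=lz4 -o atime=off -o mountpoint=/data/videos data/videos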

As soon as the pool has been recreated it should appear in the web frontend; if not, you will have to restore this part of the configuration too.

Now we still have to restore our data from the backup. First let's see which backups are available:

 

root@pve:~/bin# borg list /backup/pvetest/borgbackup/

Enter passphrase for key /backup/pvetest/borgbackup:

20221211_210728 Sun, 2022-12-11 21:07:29 [0c6720223308ffefc883a4ea5408157819cd69819f68743cc94e73708b7d2c00]

We can now mount this backup

root@pve:~/bin# mkdir /restore

root@pve:~/bin# borg mount /backup/pvetest/borgbackup::20221211_210728 /restore

Enter passphrase for key /backup/pvetest/borgbackup:

 

root@pve:~/bin# cd /restore

root@pve:/restore# ll

total 0

drwxr-xr-x 1 root root 0 Dec 11 21:14 data

drwx—— 1 root root 0 Dec 11 20:56 root

 

Our data is under /restore/data; the root directory is only there in case some script should be needed.

Now we just have to copy the data back:

 

rsync -av /restore/data/videos/ /data/videos/

 

You must do the same for each subdirectory under /restore/data. This restore can take a long time, many hours up to days, depending on how full the disks are. When this part is finished we can start restoring the configuration.
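Instead of typing one rsync per directory you could also loop over everything under /restore/data (a sketch, assuming every dataset is mounted under /data with the same name); don't forget to unmount the Borg archive when you are done:

# restore every dataset from the mounted Borg archive
for d in /restore/data/*/; do
    rsync -av "$d" "/data/$(basename "$d")/"
done
# unmount the archive afterwards
borg umount /restore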

Restore the configuration

Copy the last backup from /backup/pvetest/config to /root/bin, e.g.:

cp /backup/pvetest/config/proxmox_backup_pve.netwitch.de_2022-12-11.14.53.47.tar.gz /root/bin/

 

cd /root/bin

 

Create the file pve_config_restore.sh:

 

vi pve_config_restore.sh

 

Press "i" to switch to insert mode and paste the following script:

 

#!/bin/bash
# Version 0.2.3
# Date 04.18.2022
# Author razem-io
# Contributors
# Very basic restore script
# based on https://github.com/DerDanilo/proxmox-stuff/issues/5
# Restores backup from pve_config_backup.sh

set -e

if [[ $# -eq 0 ]] ; then
    echo "Argument missing -> restore.sh proxmox_backup_proxmoxhostname_2017-12-02.15.48.10.tar.gz"
    exit 0
fi

# stop the Proxmox services before touching the configuration
for i in pve-cluster pvedaemon vz qemu-server; do systemctl stop ${i} ; done || true

FOLDER_1="./$1_1"
FOLDER_2="./$1_2"

mkdir "$FOLDER_1"
mkdir "$FOLDER_2"

# first unpack the outer archive, then the inner tar files
tar -zxvf "$1" -C "$FOLDER_1"
find "$FOLDER_1" -name "*tar" -exec tar -xf '{}' -C "$FOLDER_2" \;
KONFIGDB=$(find "$FOLDER_1" -name "*db" -print)

# copy the extracted directories back into place (everything except root)
for i in $(ls $FOLDER_2 | grep -v root); do rsync -av $FOLDER_2/${i}/ /${i}/ ; done

echo "restoring sqlite3 database"
sqlite3 /var/lib/pve-cluster/config.db ".restore ${KONFIGDB}"

echo "Cleaning directories!"
rm -r "$FOLDER_1" "$FOLDER_2" || true
find /etc/pve -delete

read -p "Restore complete. Hit 'Enter' to reboot or CTRL+C to cancel."
shutdown -r now

 

Press Esc, then ":wq" to save.

 

chmod 750 pve_config_restore.sh

./pve_config_restore.sh proxmox_backup_pve.netwitch.de_2022-12-11.14.53.47.tar.gz

 

And finally reboot the server, if the script hasn't already done so:

 

init 6

 

 



Restore of the containers and VMs

Now we have to check whether everything is running again; if not, we have to restore our VMs and/or containers.

Open the web frontend in a browser, e.g. https://pve.netwitch.de:8006

If everything is there and working, the restore is finished! Time for a coffee!

If not, we still have some work to do. In the left panel click on the backup storage; if it's missing you will have to recreate it (see further above under NFS connection).

In the right panel click on Backup and search for the last backup of your VMs or containers, highlight it and click on Restore above.

Once you receive the message Task OK, the restore is finished and you can close this window.

Your server is back in the list and you can start it. Now repeat this for every server you need until everything is back online.
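If you prefer the terminal, the same restores can also be done with qmrestore (VMs) and pct restore (containers); a sketch where the IDs, file names and target storage are only examples:

# restore a VM backup as VM 110
qmrestore /backup/pvetest/dump/vzdump-qemu-110-2022_12_10-23_45_02.vma.zst 110 --storage local-zfs
# restore a container backup as CT 100
pct restore 100 /backup/pvetest/dump/vzdump-lxc-100-2022_12_10-23_45_02.tar.zst --storage local-zfs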

That's all, folks! But don't be mistaken: this procedure is very time-consuming if the ZFS pool has to be restored.

High availability

Another solution is to set up a second Proxmox server and configure the storage as highly available with CephFS. In that case you are able to move your VMs and containers from one PVE server to the other, and the crash of one PVE server would not impact your work that much, since you can bring the VMs and containers up on the second server. But such a setup is a totally different approach to Proxmox and is not part of our backup topic here. It could be the topic of a future video and/or article.
