Linux to Windows backup using Cygwin and Rsync

Linux to Windows backup using Cygwin and Rsync

This procedure may be helpful for backups of Linux VMs on the cloud. Some (most) service providers make it notoriously difficult to export VM images or snapshots out of the cloud.

While it is still possible to perform an image backup using Clonezilla, again, not all service providers allow boot on an ISO for a given VM. In these cases, a helper volume should be added to the VM and have Clonezilla installed on it, and selected as the boot device. The original boot volume would then be backed-up to an image locally on the helper volume and exported out of the cloud after the process completes. Although that is the theory, I have not tested this specific process myself. Keep in mind that some providers VM instances only support one volume attached. Also, this process is a one-shot operation, and is hardly automated, and creates downtime.

There are other block level device time consistent backup strategies you can try though if you use LVM and have space on the LVM group, it is possible to snapshot your volume, mount the snapshot, and have rsync transfer it over the network. Keep in mind that if the rsync process is initiated on the backup system rather than on the backed-up system, you’ll better have to use ssh or other remote command tools to perform snapshot creation and mount and then rsync, have a way to check for errors returned by these commands and then only initate rsync on the backup system. Then perform a post-backup remote execution to unmount the snapshot and destroy the snapshot. This whole process is a bit similar to the Windows VSS snapshot operation.

For more information about this process, check :

LVM snapshot Backup and restore on linux

If you don’t use LVM then you’ll have to resort to backups that are not time consistent (not frozen in time), but it is better than nothing. That is the object of this tutorial.

Setting up cygwin on Windows.

Download cygsetup and perform a full or a minimal install. For a minimal install, you’ll need SSH and Rsync as well as it’s dependencies.

Now you’ll have to choose between using rsync through a SSH encrypted channel, or through a rsync daemon running on the remote host to be backed up.

Using rsync through SSH for a full system backup requires a root login through SSH, and automating rsync with SSH will require a way to supply credentials automatically, unless using a public/private key non interactive (passphrase less) authentication scheme.

In the case of SSH plain password authentication, supplying it to rsync can be done through sshpass, which exists as a Cygwin package, but I have not tested it myself in conjunction with rsync

http://www.cygwin.com/packages/summary/sshpass.html

https://stackoverflow.com/questions/3299951/how-to-pass-password-automatically-for-rsync-ssh-command

However, allowing SSH password root authentication plus storing its password as cleartext in a file for sshpass to use it is a huge security risk.

At least, with a passphrase less public/private key pair, the problem amounts to securing the private key well on the filesystem. It will still be acessible (read only) in the user account context that runs the rsync/ssh script on Windows.

For all these reasons, I find it preferable to use the rsync daemon on the remote system. Which allows to use a backup account instead of root.

The downsides however are that you need to open the rsync TCP port on the remote system and configure your firewalls in the connection path accordingly; and also rsync daemon does not encrypt traffic. If it is an issue, then use a VPN or an IpSec tunnel.

Setting up the rsync daemon on linux Debian

apt-get install rsync

Edit the rsync daemon configuration file.

vi /etc/rsyncd.conf

Here is a sample of the rsyncd.conf

uid=root
gid=root

[share_system_backup]
	path = /
	comment = system root
	read only = true
	auth users = backup
	secrets file = /etc/rsyncd.secrets
	hosts allow = <ip_of_backup_system>

As per the rsync manpage, rsyncd runs as root, and uid and gid parameters, which can be global to all rsync shares or share specific, specify the impersonated context of the rsync access to the filesystem.

Since we’ll perform a system wide backup, we’ll use root.

auth users specify the rsync users authorized to access the share. These users are rsync specific, not system users.

read only is used since no writes will be performed on the backed up system, unless you want rsync to upload some state file on the remote system after backup, as we won’t be using SSH, that is a trick way to signal something to the backed up system, without any remote execution.

hosts_allow is useful for a cloud VM that does not have firewalling options provided by the service provider.

The user login password pair is specified in /etc/rsyncd.secrets.

vi /etc/rsyncd.secrets

backup:346qsfsgSAzet

Use a strong password.

Next start the rsync daemon, check it runs and check its socket is listening on TCP port 873.

rsync --daemon

ps-auwx | grep rsync && netstat -nlp | grep rsync

Then we’ll make rsync launch at system boot.

vi /etc/default/rsync

Change RSYNC_ENABLE to true to start the daemon at system startup through init.d

Rsync configuration on Windows

We’ll start by setting up the windows batch files that will prepare the cygwin environment and run rsync. Change <IP> with your remote system to be backed up IP or DNS name. This script assumes standard port 873

The default file is named CWRSYNC.CMD and should reside at the root of the cygwin64 base folder.

@ECHO OFF
REM *****************************************************************
REM
REM CWRSYNC.CMD - Batch file template to start your rsync command (s).
REM
REM *****************************************************************

REM Make environment variable changes local to this batch file
SETLOCAL

REM Specify where to find rsync and related files
REM Default value is the directory of this batch file
SET CWRSYNCHOME=%~dp0

REM Create a home directory for .ssh 
IF NOT EXIST %CWRSYNCHOME%\home\%USERNAME%\.ssh MKDIR %CWRSYNCHOME%\home\%USERNAME%\.ssh

REM Make cwRsync home as a part of system PATH to find required DLLs
SET CWOLDPATH=%PATH%
SET PATH=%CWRSYNCHOME%\bin;%PATH%

REM Windows paths may contain a colon (:) as a part of drive designation and 
REM backslashes (example c:\, g:\). However, in rsync syntax, a colon in a 
REM path means searching for a remote host. Solution: use absolute path 'a la unix', 
REM replace backslashes (\) with slashes (/) and put -/cygdrive/- in front of the 
REM drive letter:
REM 
REM Example : C:\WORK\* --> /cygdrive/c/work/*
REM 
REM Example 1 - rsync recursively to a unix server with an openssh server :
REM
REM       rsync -r /cygdrive/c/work/ remotehost:/home/user/work/
REM
REM Example 2 - Local rsync recursively 
REM
REM       rsync -r /cygdrive/c/work/ /cygdrive/d/work/doc/
REM
REM Example 3 - rsync to an rsync server recursively :
REM    (Double colons?? YES!!)
REM
REM       rsync -r /cygdrive/c/doc/ remotehost::module/doc
REM
REM Rsync is a very powerful tool. Please look at documentation for other options. 
REM

REM ** CUSTOMIZE ** Enter your rsync command(s) here

echo "start" >> c:\scripts\rsync_sys.log
date /t >> c:\scripts\rsync_sys.log
time /t >> c:\scripts\rsync_sys.log

rsync --no-perms --itemize-changes --password-file=rsync_p -lrvcD --exclude={"/dev/*","/proc/*","/sys/*","/tmp/*","/run/*","/mnt/*","/media/*","/lost+found","/root/*"} backup@<IP>::share_system_backup/ /cygdrive/d/BACKUPS_FROM_CLOUD/SYSTEM >> c:\scripts\rsync_sys.log

date /t >> c:\scripts\rsync_sys.log
time /t >> c:\scripts\rsync_sys.log
echo "stop" >> c:\scripts\rsync_sys.log

You’ll need to add a file named rsync_p in my example that contains the password specified for the backup user, matching the password defined on the rsync daemon host. This allows non interactive execution of rsync.

Place this file at the cygwin64 base folder level.

Secure that file well at the NTFS level.

itemize-changes will log in the rsync_sys.log the file operations performed on the system. It is useful for troubleshooting.

http://www.staroceans.org/e-book/understanding-the-output-of-rsync-itemize-changes.html

The -lrvc flags tells rsync to copy symbolic link information (absolute and relative). You can test that is the case by using ls -ltra through a cygwin shell, recurse directories, be verbose, and check for changes based on checksum instead of file timestamps, which could be useful if there are timestamp discrepancies because of the cygwin environment. It is worth testing it’s effect with a test run.

Also, I excluded the standard list of directories recommended to be excluded from a system backup for Debian systems using the –exclude option.

Permissions / ACL issues

Note that I use –no-perms for rsync. That is one caveat with rsync running on cygwin, since it uses POSIX to Windows ACL translation, copying perms COULD render the files unreadable by other processes on the Windows system if permissions from the source system are preserved, On the other hand, preserving source permissions could make baremetal recovery possible. It is worth experimenting with this option.

A way to circumvent this problem when using –no-perms is to back up permissions separately on the linux system using getfacl and setfacl for restore operations. keep in mind that getfacl ouput is substantial in terms of size and getfacl does not support directory exclusions. It may involve a bit of scripting.

Example of getfacl script (backupacl.sh)

cd /home/backup
getfacl -RL /bin > filesystem_acls_bin.dat
getfacl -RL /boot > filesystem_acls_boot.dat
getfacl -RL /etc > filesystem_acls_etc.dat
getfacl -RL /home > filesystem_acls_home.dat
getfacl -RL /lib > filesystem_acls_lib.dat
getfacl -RL /lib64 > filesystem_acls_lib64.dat
getfacl -RL /media > filesystem_acls_media.dat
getfacl -RL /mnt > filesystem_acls_mnt.dat
getfacl -RL /opt > filesystem_acls_opt.dat
getfacl -L /root > filesystem_acls_root.dat
getfacl -RL /sbin > filesystem_acls_sbin.dat
getfacl -RL /snap > filesystem_acls_snap.dat
getfacl -RL /srv > filesystem_acls_srv.dat
getfacl -RL /tmp > filesystem_acls_tmp.dat
getfacl -RL /usr > filesystem_acls_usr.dat
getfacl -RL /var > filesystem_acls_var.dat

tar -czvf backup_acls.tar.gz filesystem_acls_*
rm -f filesystem_acls_*

You can use xargs and a $ placeholder plus a list of folders to backup permissions from to make this script a one-liner.

the backup_acls.tar.gz will be backed up by rsync when it runs.

In this example, ACL backups and the backup script are stored in /home/backup. As said before, backup is not necessarily a linux user since rsync maintains its own authentication database, but I use this directory for convenience.

In this script backup of /root ACLs is not recursive as a dirty way to remove the .wine directory that contains symbolic links to the whole file system, and thus would make getfacl to follow them and spend a huge useless time (and filespace) for directories such as /dev, /proc /sys. So it’s possible for you to add the recurse flag if this problem does not affect your system.

Run the batch for tests and schedule it by adding it to /etc/cron.daily/ for instance. It should run in the root context.

A note about restores and symbolic links

the rsync -l options copies symbolic links, which means thar you have to use rsync too for restore operations, Don’t expect symbolic links to magically reappear if you use WinSCP or another tool to restore a folder.

More on this subject :

https://www.baeldung.com/linux/rsync-copy-symlinks

Testing the backup.

Edit the cwrsync.cmd to add the dry-run option, run the cwrsync.cmd through a cmd shell and examine the rsync_sys.log file, If everything seems OK, then you can remove the dry-run option, and run it again. The first run will take a long time, use the start and stop markers in the logfile to check how long it takes once finished. rsync will tell you the speedup due to the delta algorithm, it should increase on subsequent runs.

Scheduling the backup.

Use Windows task scheduler to run the cwrsync.cmd script. In my case, I use a privileged account. It is advised to test with the least amount of privileges for cygwin to be happy and for filesystem operations to succeed.

It is advised the turn on scheduled tasks history. And very important, make sure that the “start in” directory for the cwrsync.cmd action is set to the base cygwin64 folder.

Test the backup run using a “run once” trigger at t+1 minute in addition to the recurrent trigger.

A final note.

This backup strategy will ensure that you have an updated state of your remote system on the Windows system. It is similar to an incremental backup that patches the destination directory. It does not store any history of files besides the last backup operation.

I don’t use the –delete option that propagates deletions of files and folders from the source to the destination, has it can be very impactful for a disaster recovery use where there is no file versioning history. Note however, that this will bloat the backed up structure on the destination and make recovery efforts more complicated in the long term.

The best option for a backup if you use LVM, albeit using more space on the destination is to use cygwin + rsync to backup a tar gz file of a LVM snapshot directory structure instead of the directory tree of “/”. If you go this way though, it is advised, as already stated above, to perform the snapshot operations synchronously from the backup system through the cwrsync.cmd using SSH, before the rsync operations. That probably means using a passphrase less public/private key pair for ssh to run these commands interactively. Don’t forget to unmount and remove the snapshot after rsync is done.

R.Verissimo

Leave a Reply