The first project is to develop a small Python script that backs up files stored on NFS for a Linux-based data science / trading platform. If the work goes well, and ideally if you are based near Madrid or Budapest, follow-up projects will go deeper into our infrastructure. Besides Python, knowledge of network architectures, VPN, Cisco, AWS, and Atlassian tools is a plus.
This first task is to implement a cron/rsync-based backup framework for a given set of directories (including a data folder and user home directories). The backup should give easy access to daily snapshots for the last 31 days, weekly snapshots for the last 3 months, monthly snapshots for the last year, and yearly snapshots beyond that.
- The backup should be incremental: identical files that appear in several snapshots should be stored only once.
- It should be possible to browse the backup at the file/directory level directly on the file system, without additional tools.
- Restoring any subset of the data should be straightforward by copying.
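One way rsync can meet these three requirements is to write each snapshot into its own dated directory and hard-link unchanged files to the previous snapshot. The sketch below only builds the command line. The function name, the ISO-dated directory layout, and the example paths are illustrative assumptions, not part of the brief.

```python
import os
from datetime import date
from pathlib import Path
from typing import List, Optional


def build_rsync_command(source: Path, snapshot_root: Path, day: date,
                        previous: Optional[Path] = None) -> List[str]:
    """Return the rsync argument list for one dated snapshot.

    Assumed layout (hypothetical): <snapshot_root>/<YYYY-MM-DD>/...
    """
    target = snapshot_root / day.isoformat()
    cmd = ["rsync", "-a"]  # -a: archive mode (recursive, keeps permissions and times)
    if previous is not None:
        # Files unchanged since `previous` become hard links instead of new copies.
        # An absolute path is used because rsync resolves a relative
        # --link-dest against the destination directory.
        cmd.append(f"--link-dest={os.path.abspath(previous)}")
    # The trailing slash makes rsync copy the contents of `source`
    # rather than creating an extra directory level inside `target`.
    cmd += [f"{source}/", str(target)]
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True) from a script that cron starts once a day. Because every snapshot is an ordinary directory tree, browsing and restoring need nothing beyond ls and cp.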
The solution uses rsync and its hard-linking support (see the --link-dest option in man rsync) to create the daily snapshots. A Python script then deletes old snapshots and keeps only the relevant ones: recent daily snapshots, recent Monday snapshots, recent first-of-month snapshots, and the first snapshot of every year.
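The pruning rule can be sketched as a pure function over snapshot dates. The cut-offs below are assumptions made for illustration: "3 months" is approximated as 92 days, "the last year" as 365 days, and the first snapshot of a year is taken to be the one dated January 1. All function and constant names are hypothetical.

```python
from datetime import date
from typing import Iterable, List

DAILY_DAYS = 31     # keep every snapshot younger than this
WEEKLY_DAYS = 92    # assumption: "3 months" approximated as 92 days
MONTHLY_DAYS = 365  # assumption: "last year" approximated as 365 days


def keep_snapshot(day: date, today: date) -> bool:
    """Decide whether the snapshot taken on `day` should be kept."""
    age = (today - day).days
    if age < DAILY_DAYS:
        return True                      # recent daily snapshot
    if day.weekday() == 0 and age < WEEKLY_DAYS:
        return True                      # recent Monday (weekday() == 0)
    if day.day == 1 and age < MONTHLY_DAYS:
        return True                      # recent first of month
    # Assumption: the yearly snapshot is the one dated January 1.
    return day.month == 1 and day.day == 1


def snapshots_to_delete(days: Iterable[date], today: date) -> List[date]:
    """Return the snapshot dates that fall outside the retention rule."""
    return sorted(d for d in days if not keep_snapshot(d, today))
```

If the snapshot directories are named by ISO date, the dates can be parsed with date.fromisoformat and the expired directories removed with shutil.rmtree. Deleting one snapshot does not affect the others, since a hard-linked file stays on disk as long as any snapshot still links to it.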
The resulting tool will probably be open-sourced.