Introduction to backups for artists
by Dawid Michalczyk
Updated: 30 August 2013
Summary: An introduction to taking proper backups with emphasis on the data commonly used by artists. Choose the right backup type and the tools needed to preserve your valuable data.
A backup, in computer lingo, refers to making a copy of important data for the purpose of data recovery. The word "data" refers to anything stored on a computer system: images, programs, documents, videos, etc. Should the important data get damaged or lost, a properly made backup will restore it all. Taking backups of important data can prevent loss of valuable work and the time needed to recreate it.
In this article we'll take a look at common backup types and strategies, data compression, and common backup media types. A real life backup scenario will illustrate my own backup procedures. The article will end with general backup tips.
Common backup types
A full backup consists of making a copy of all important data. When you copy a folder of important files from, say, a hard drive to a CD, you are making a full backup of those files. Because of its simplicity, this approach is the most reliable of all backup types. Its main advantage is the ease of creating and restoring the backup. The main disadvantage is that each backup uses as much space as the important data itself. If the data is large, the backup process can be very resource intensive in terms of time, backup space, and the processing power needed to carry it out. Imagine the time needed to make a full backup of a digital library consisting of thousands of movies. Such an operation can take days.
An incremental backup works differently in that it backs up only the files modified or newly added since the last backup. With this method, a full backup is created first and incremental backups are then run on a regular basis. For large amounts of data this is often the only practical way to back up. It requires less space than taking regular full backups and is less resource intensive to run. On the other hand, unlike full backups, incremental backups need dedicated backup software to keep track of which files to back up.
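The idea of an incremental backup can be sketched in a few lines of Python. This is only a minimal illustration, not a replacement for dedicated backup software: it assumes that file modification times are trustworthy and that the time of the previous backup is known. The function name and its parameters are made up for this example.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, dest: Path, last_backup_time: float) -> list[Path]:
    """Copy files under `source` modified after `last_backup_time` into `dest`.

    `last_backup_time` is a Unix timestamp (seconds), as returned by time.time().
    Returns the relative paths of the files that were copied.
    """
    copied = []
    for path in sorted(source.rglob("*")):
        # Only regular files that changed since the previous backup are copied.
        if path.is_file() and path.stat().st_mtime > last_backup_time:
            relative = path.relative_to(source)
            target = dest / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves the modification time
            copied.append(relative)
    return copied
```

A real tool would also have to deal with deleted files, and would record the time of each run somewhere so that the next run knows where to start.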
Compressing the backup data is a popular option. Such practice lowers the amount of space needed on the backup media. Although compression adds an additional layer of complexity, it can be a good (if relied on wisely) and sometimes necessary solution.
Essential backup strategies
Backup should be taken on a regular basis. The more frequently the data changes, the more often it should be backed up. For example, some of my most frequently updated files (website files, source code, notes, etc.) are backed up daily. Files that are updated less frequently are backed up monthly.
Backup should be automatic. Except for the initial configuration of the backup program and the occasional supervision, the whole backup process should be automatic and completely transparent. That is, the backup should run by itself without demanding any attention unless something goes wrong.
Backup should be stored in a safe remote location. Should the location of the important data get damaged, destroyed, or exposed to theft, a remotely stored backup becomes invaluable. How remote? Disasters like fire, flood, tornado, or earthquake can cause widespread damage. Ideally, a backup should be stored in a low-risk location far enough away to be unaffected by the same disaster.
Backup should rely on well established hardware and software technologies. Such technologies are typically in widespread use and are thus cheaper, easier to troubleshoot, and easier to get help with in the event of failure. As established technologies are gradually replaced by new and better ones, so should the backup media and hardware be replaced, along with any software used to store and restore the data. There is no guarantee that the common backup media of today, like CD or DVD, will be usable in ten years. The same is true for software. Thus, a good data preservation strategy should include continual migration of the backup data to the mature and well established technologies of the time.
A bit about data compression
Many compression formats exist. Each format uses a particular compression method, called an algorithm. There are two types of data compression algorithms: "lossy" and "lossless". Lossless compression reduces the data size without modifying its content, so the original can be restored exactly. Lossy compression modifies the content, discarding some information, to make the data even smaller than lossless compression can.
Unfortunately, due to the nature of lossy compression, JPG, MP3, or any other lossy format degrades the original data to some extent. In other words, saving an image or music in a lossy file format makes it different from the original. Usually the difference, which shows up as compression artifacts, is so small that most of us don't see or hear it.
For the above reasons, lossy compression should never be used when saving important master data. Only lossless compression is suitable for that. PNG and TIFF are examples of image file formats that support lossless compression. Such formats are ideal for storing hi-resolution master images.
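The difference between the two kinds of compression can be shown with a small Python sketch. The lossless part uses the standard zlib module. The "lossy" part is a deliberately crude toy (rounding every byte down to a multiple of a step size) invented for this illustration; it is not how JPG or MP3 actually work, it only shows that discarded information cannot be recovered.

```python
import zlib


def lossless_round_trip(data: bytes) -> bytes:
    """Compress and decompress with zlib; the result equals the input."""
    return zlib.decompress(zlib.compress(data, 9))


def toy_lossy(samples: bytes, step: int = 16) -> bytes:
    """Toy lossy step: round every byte down to a multiple of `step`.

    The rounded-off detail is gone for good, which is the defining
    property of lossy compression.
    """
    return bytes((value // step) * step for value in samples)


original = bytes(range(256)) * 4

# Lossless: every byte survives the round trip.
assert lossless_round_trip(original) == original

# Lossy: the data has changed and the original cannot be rebuilt from it.
assert toy_lossy(original) != original
```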
Finally, compression takes time and normally uses all available processing power. Generally, the better the compression, the slower it is. Some compression algorithms are extremely good at compressing but also extremely slow. For backup purposes, one should evaluate common compression formats and settle on the most suitable one.
Consider your needs
Note the difference between "built-in" image compression, done every time you save an image in a format that supports it, and compressing the backup data - applied to all backup data regardless of what it is.
Which backup compression to use, and whether to use it at all, depends on the type of backup data. Generally, text files (TXT, HTML, XML, etc.) compress the most of all file types. Images that have already been compressed with their own algorithms (PNG, JPG, TIFF, etc.) can't be compressed much further, if at all. Images without their own compression (BMP, TGA, etc.) can often be compressed quite a bit, though this depends on the actual image data.
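How strongly the result depends on the type of data can be checked with Python's standard compression modules. The sketch below is only an illustration under stated assumptions: the "text" is one sentence repeated many times (far more compressible than real prose), and random bytes stand in for data that is already compressed. The function name is made up for this example.

```python
import bz2
import lzma
import os
import zlib


def percent_of_original(data: bytes) -> dict[str, float]:
    """Return compressed size as a percentage of the original size.

    Smaller is better; values near or above 100 mean compression did not help.
    """
    compressors = {
        "zlib": lambda d: zlib.compress(d, 9),
        "bz2": lambda d: bz2.compress(d, 9),
        "lzma": lzma.compress,
    }
    return {name: 100.0 * len(fn(data)) / len(data) for name, fn in compressors.items()}


# Highly repetitive text: compresses extremely well.
text_like = b"Backups should be taken on a regular basis.\n" * 2000

# Random bytes: a stand-in for already compressed files such as PNG or JPG.
random_like = os.urandom(len(text_like))

print("text  :", percent_of_original(text_like))
print("random:", percent_of_original(random_like))
```

Running the same function on a sample of your own files gives a quick idea of whether compressing the backup is worth the extra time.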
There are a few other things to consider when compressing backup data: which compression program to use and how to compress the files.
ZIP is the most commonly used compression format today - it's fast and compresses well. It's been around for a long time and is universally available. But there are other, lesser-known, good alternatives. For example, 7ZIP, RAR, and BZIP2 typically compress better than ZIP, usually at the cost of some speed.
Finally, how to compress backups. Basically, one can either create a single compressed archive of many files or compress each file individually. The main disadvantage of a single compressed archive is the possibility of losing every file in it if the archive gets corrupted and cannot be recovered. If files are compressed individually, on the other hand, a corrupted and unrecoverable file means losing only that one file. Additionally, since a compressed file uses less space than an uncompressed one, it's less likely to get corrupted. Thus it's safer to compress files individually.
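Compressing each file individually could look roughly like the following Python sketch. It uses the standard gzip module purely as an example; any lossless compressor would do. The function name and the choice to mirror the source directory layout are assumptions made for this illustration.

```python
import gzip
import shutil
from pathlib import Path


def compress_individually(source: Path, dest: Path) -> list[Path]:
    """Write one .gz file into `dest` for every file found under `source`.

    The directory layout is mirrored, so a damaged .gz file affects
    only the single original it was made from.
    """
    created = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        target = dest / (str(path.relative_to(source)) + ".gz")
        target.parent.mkdir(parents=True, exist_ok=True)
        # Stream the data so that large files don't have to fit in memory.
        with open(path, "rb") as original, gzip.open(target, "wb") as compressed:
            shutil.copyfileobj(original, compressed)
        created.append(target)
    return created
```

The destination directory should be outside the source directory, otherwise a later run would compress the earlier results again.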
A lot of space can be saved thanks to compression. I took one of my images and saved it in BMP, TIFF, PNG and JPG formats. I then compressed those files with a few general purpose compressors. All lossless compression was done with maximum compression settings. Since JPG is a lossy format it is only included for the sake of comparison. The Book.txt is Sun Tzu's The Art of War.
Sizes are in bytes. The percentage indicates how much the compressed size is out of the initial size. The smaller the better.
The compression times vary somewhat, but not enough to make any of the tools impractical. PNG is a clear winner among images. It uses about 58% less space than BMP! Notice that only one of the general purpose compression tools, 7ZIP, further compressed (slightly) the already compressed PNG file. The book file was compressed down to about 26-38% of its original size, which is typical for text compression.
What backup media to use
A combination of different media may often be the ideal solution. For example, some of my own backup practices include using an external hard drive to mirror (update) certain parts of my computer hard drives. Twice a year I burn all important data on several DVDs.
I recommend spending some time investigating the most suitable media and the hardware to operate it. High quality products will minimize the possibility of backup failure.
The necessity of verifying backups
I wrote a script specifically for the purpose of backup verification. If you use Linux you may find it useful.
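For readers who want to see the general idea, verification can be as simple as comparing checksums of the original files with those of their copies. The Python sketch below is not the script mentioned above; it is a minimal illustration that assumes the backup is a plain, uncompressed copy with the same directory layout as the source. The function names are made up for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Compare every file under `source` with its copy under `backup`.

    Returns a list of problems; an empty list means every file matched.
    """
    problems = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        relative = path.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(path) != file_digest(copy):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

Such a check only makes sense right after the backup is taken, before the originals change again.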
A real life backup scenario
I back up daily, monthly, and twice a year. Once a day, the files which are frequently updated (my notes, work in progress images, source code, website files, email, etc.) are backed up to another hard drive. This happens during the boot process and takes about a minute. Once a month I back up to a CD, which also includes less frequently updated files. A copy of that CD is stored in a remote location. Twice a year I take a full backup and store it on several DVDs at a friend's house. If I work on something especially important, I store it daily on a CD/DVD or a USB memory stick. My most critical data is also regularly encrypted and stored on a very remote internet host. I wrote a script to run all these backups automatically. With the exception of CD/DVD storage, no manual work is involved.
As you can see, a custom backup solution can be quite sophisticated yet simple to carry out. It can involve a combination of different media and backup procedures to optimally satisfy one's needs.
It's best to avoid products which rely on proprietary or closed solutions. For example, backup software may store the backup data in an undocumented format supported only by that particular software. Avoid that. If the company goes out of business and the backup, or the backup software, breaks, your backup data may be lost forever. Look for products that rely on well known, mature, and ideally open technologies. For example, PNG is an open format for storing image data. This means that the specification, or blueprint, for the format is publicly available for anyone to use. This increases compatibility and reduces reliance on any specific vendor or product.
Most artists' important data consists mainly of images and 3D files. To save space, rely on PNG, TIFF, or JPG for bitmap images, keeping in mind that JPG is lossy and therefore unsuited to master images. Vector images and 3D files can be compressed individually if needed. A basic backup program that simply copies specified files or directories to the backup media may be all that is needed. It's best to make two sets of the backup data and store each at a different location: one close to home, like a friend's place or a bank box, and the other far away.
Setting up a proper backup strategy may initially require a significant amount of time and cost money. There is a lot to research and consider. In the end, however, a good backup procedure will prove an exceptionally valuable investment. As you read this, your screen could go blank due to a hard drive crash. All your valuable data - years of work, reference images, business documents, photo albums, 3D files, email, etc. - could be lost forever. Unless you are prepared and have a backup.
bzip2: -9 (used on all test files)
7zip: -m0=ppmd:o=4 (used on Colony.bmp)
7zip: -m0=lzma:a=1:d=0:lc=8:LP0:PB0:mf=bt2 (used on Colony.tiff)
7zip: -m0=lzma:a=1:d=0:lc=8:LP0:PB0:mf=bt2 (used on Colony.png)
7zip: -m0=lzma:a=1:d=0:lc=8:LP0:PB0:mf=bt2 (used on Colony.jpg)
7zip: -m0=ppmd:o=20:mem=26 (used on Book.txt)
Dawid Michalczyk is a freelance illustrator and an artist. To see examples of his artwork and writings visit his website at http://www.art.eonworks.com
Copyright © 2006 Dawid Michalczyk. All Rights Reserved. This content may be copied in full, with copyright, contact, creation, information and links intact, without specific permission, when used only in a not-for-profit format.