Summary:
An introduction to taking backups, with
an emphasis on the data commonly used by artists.
A backup, in computer lingo, refers to making a copy of important data
for the purpose of data recovery.
The word "data" refers to anything stored on a computer system: images, programs, documents,
videos, etc.
Should the important data get damaged or lost, a properly made backup will restore it all.
Common backup types
A backup does not need to rely on dedicated software. Making a copy of a file is a basic form of backup.
The best backup methods rely on simple, time-proven concepts.
The simpler the procedure, the more likely it is to work correctly.
New or unnecessary technologies are best avoided until proven reliable and necessary.
A full backup consists of making a copy of all important data.
When you copy a folder of important files from, say, a hard drive to a USB memstick,
you are making a full backup of those files. Thanks to its simplicity, this approach
is the most reliable of all backup types. Its main advantage is the ease
of backup creation and restoration, since no special backup software is needed.
The main disadvantage is that each
backup uses as much space as the data that was copied. If the data is
large, the backup process can be very resource intensive in terms of
time, backup space, and the processing power needed to carry it out.
Imagine the time needed
to do a full backup of a digital library consisting of thousands of HD movies. Such an operation
can take days.
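To make the idea concrete, here is a minimal sketch of a full backup in Python. The function name full_backup and the timestamped folder naming are my own assumptions for illustration, not a standard tool. It copies the whole folder into a new folder each time it is called.

```python
import shutil
import time
from pathlib import Path


def full_backup(source_dir, backup_root):
    """Copy the whole source folder into a new, timestamped folder."""
    source = Path(source_dir)
    # Hypothetical naming scheme: <folder name>_<date>_<time>
    stamp = time.strftime("%Y-%m-%d_%H%M%S")
    target = Path(backup_root) / f"{source.name}_{stamp}"
    # copytree copies every file and subfolder and creates the target
    # folder; it raises an error if the target already exists.
    shutil.copytree(source, target)
    return target
```

Calling full_backup("Artwork", "/media/usb/backups") would create a complete copy such as /media/usb/backups/Artwork_2006-05-01_203000 (both paths are made up). Every call uses as much space as the source folder, which is exactly the disadvantage described above.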
An incremental backup works differently: it backs up only the files that were
modified or added since the last backup. With this method, a full
backup is created first and then incremental backups are run on a regular
basis. For large amounts of data, incremental backup is often the only practical
way to do backups. It requires much less space than taking full backups
and is less resource intensive to run. On the other hand, unlike
full backups, incremental backups need dedicated backup software to keep
track of which files to back up.
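The bookkeeping involved can be sketched in a few lines of Python. This is a simplified, mirror-style variant and only an assumption about how a basic tool might work: it keeps one backup copy up to date by comparing modification times, rather than storing a separate increment for every run as real backup programs usually do. The function name incremental_backup is hypothetical.

```python
import shutil
from pathlib import Path


def incremental_backup(source_dir, backup_dir):
    """Copy only files that are new or modified since the last run."""
    source = Path(source_dir)
    backup = Path(backup_dir)
    copied = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        dst_file = backup / src_file.relative_to(source)
        # A file needs copying if the backup lacks it or holds an older version.
        if (not dst_file.exists()
                or src_file.stat().st_mtime > dst_file.stat().st_mtime):
            dst_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dst_file)  # copy2 preserves the timestamp
            copied.append(dst_file)
    return copied
```

The first run copies everything, so it acts as the initial full backup. Later runs copy only what changed. Note that this sketch never deletes anything from the backup and trusts file timestamps, which a dedicated backup program would handle more carefully.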
Compressing the backup data is a popular option. It reduces the amount
of space needed on the backup media. Although compression adds a
layer of complexity, it can be a good solution if used wisely, and sometimes a
necessary one.
Essential backup strategies
Regardless of the backup type and data, the following backup strategies
should always be followed:
- backup should be taken on a regular basis
- backup should be automatic and need as little human supervision as possible
- backup should be stored in a safe remote location
- backup should rely on well established hardware and software technologies
Backup should be taken on a regular basis. The more frequently the data
changes, the more often it should be backed up.
Backup should be automatic. Except for the initial configuration of the
backup program and the occasional supervision, the whole backup process
should be automatic and completely transparent. That is, the backup
should run by itself without attracting any attention unless necessary.
Backup should be stored in a safe remote location. Should the location
of the important data get damaged, destroyed, or exposed to theft, a
remotely stored backup becomes invaluable. How remote? Disasters like fire,
flood, tornado, or earthquake can cause widespread damage. Ideally, a backup
should be stored far enough away that the same disaster is unlikely to reach it.
Backup should rely on well established hardware and software
technologies. Such technologies are typically in widespread use and are
therefore cheaper, easier to troubleshoot, and easier to get help with in the event of failure. As
established technologies are gradually replaced by new and better
ones, the backup media and hardware should be replaced too, along with any software used to
store and restore the data. There is no guarantee that the common backup media
of today, like optical discs or USB memsticks, will be in widespread use in ten years.
The same is true
for software. Thus, a good data preservation strategy should include
continual migration of the backup data to the mature and well established
technologies of the time.
A bit about data compression
Compression makes data smaller and thus is a popular option when saving files because
less space is used. The downside is the extra time needed to compress the data and
later to uncompress it when opening the compressed data.
Data compression is done by a compression algorithm, which is a method employed
to reduce data size.
There are two types of data compression algorithms: lossy and lossless.
Lossless compression reduces the size of the data without modifying its content.
Lossy compression modifies the content, which allows for even smaller files than
lossless compression. There are many different lossless algorithms and
many different lossy algorithms. For example, the PNG and TIFF image formats can both use
lossless algorithms to compress image data, but the algorithm used by PNG (DEFLATE) is different
from the LZW algorithm commonly used by TIFF.
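The difference between the two types can be shown with a small Python experiment. The lossless part uses the standard zlib module. The lossy part is only a toy stand-in that I made up for illustration (it throws away the lower four bits of every byte); it is not how JPG or MP3 actually work. The random bytes stand in for very noisy image data.

```python
import random
import zlib

rng = random.Random(1)  # fixed seed so every run uses the same data
original = bytes(rng.randrange(256) for _ in range(8000))

# Lossless: decompressing gives back exactly the original bytes.
packed = zlib.compress(original, 9)
restored = zlib.decompress(packed)
print(restored == original)

# Lossy (toy example): keep only the upper four bits of each byte.
# The simplified data compresses much better, but the discarded
# detail cannot be recovered from the result.
coarse = bytes(b & 0xF0 for b in original)
packed_coarse = zlib.compress(coarse, 9)
print(len(packed), len(packed_coarse))
```

The lossless result restores bit for bit, while the lossy result is smaller precisely because information was discarded before compressing.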
Some file formats that rely on compression, like MP3 or JPG, are highly specialized. They use lossy
algorithms and produce very small files but can only compress one particular
type of data. Other formats, like ZIP or BZIP2, rely
on lossless compression algorithms and can work on any data.
However, on music or photographs they will not come close to the small file sizes of special purpose formats like MP3 or JPG.
Because lossy compression changes the data, formats like JPG, MP3 or any
other lossy format degrade the original data to some extent. In other words, saving an
image or music in a lossy file format will make it different from the original.
Usually, the difference (called compression artifacts) is so small that most people
don't see it or hear it. However, this largely depends on the compression settings.
The more the data is compressed, the easier it is to notice the difference.
For the above reasons, lossy compression should never be used when saving
important master / original data. Only lossless compression is suitable for that. PNG and
TIFF are examples of image file formats that have lossless compression. Such
formats are ideal for storing high-resolution master images of finished artwork.
A lot of space can be saved thanks to compression.
I took one of my
images and saved it in different image formats: BMP (which has no compression),
TIFF (lossless compression), PNG (lossless compression)
and JPG (lossy compression).
All lossless compression was done with maximum compression settings [1].
I then compressed those files with three general purpose lossless compressors: ZIP, BZIP2, 7ZIP.
Since JPG is a lossy format, it is only included for the sake of comparison.
Book.txt is the text of Sun Tzu's The Art of War.
file        | size in bytes | zip'ed         | bzip2'ed       | 7zip'ed
------------+---------------+----------------+----------------+---------------
Colony.bmp  | 1 440 054     | 911 154 (63%)  | 693 481 (48%)  | 713 287 (49%)
Colony.tiff | 662 948       | 652 315 (98%)  | 655 239 (99%)  | 652 955 (98%)
Colony.png  | 611 676       | 611 923 (100%) | 613 466 (100%) | 610 711 (100%)
Colony.jpg  | 303 217       | 302 933 (100%) | 302 852 (100%) | 300 268 (99%)
Book.txt    | 343 695       | 130 340 (38%)  | 100 696 (29%)  | 91 187 (26%)
(The percentage in the table above is the compressed size as a share of
the original size. The smaller the better.)
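The percentages are simple arithmetic: compressed size divided by original size, times 100. The sketch below shows that calculation and runs a similar comparison using Python's standard zlib, bz2 and lzma modules. These modules implement the same families of algorithms as ZIP, BZIP2 and 7ZIP's LZMA method, but they are not the command line tools or the exact settings used for the table, so the numbers will differ. The function names are my own.

```python
import bz2
import lzma
import zlib


def percent_of_original(compressed_size, original_size):
    """Compressed size as a percentage of the original size."""
    return 100 * compressed_size / original_size


def compression_report(data):
    """Return {module name: (compressed size, percent of original)}."""
    compressors = (
        ("zlib", lambda d: zlib.compress(d, 9)),
        ("bz2", lambda d: bz2.compress(d, 9)),
        ("lzma", lambda d: lzma.compress(d, preset=9)),
    )
    results = {}
    for name, compress in compressors:
        size = len(compress(data))
        results[name] = (size, percent_of_original(size, len(data)))
    return results
```

For example, percent_of_original(911154, 1440054) gives roughly 63, the zip figure for Colony.bmp. To test your own files, read one with open(path, "rb").read() and pass the bytes to compression_report.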
Compression times varied somewhat, but none were long enough to be impractical.
PNG is the clear winner among the lossless image formats. It uses about 58% less space
than BMP! Notice that only one of the general purpose compression tools,
7ZIP, compressed the already compressed PNG file any further, and only slightly.
ZIP and BZIP2 actually made it marginally larger.
The book file was compressed down to about 26-38% of its original size,
which is typical for text compression.
Generally, text files (TXT, HTML,
XML, etc.) can be compressed the most of all file types. Images that have
already been compressed with their own algorithms (PNG, JPG, TIFF, etc.) can't
be compressed much further, if at all. Images
that have no compression of their own (BMP, RAW, etc.) can often be
compressed quite a bit, though this depends on the actual image data.
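The effect is easy to reproduce. The following Python sketch compresses some text-like data once and then compresses the result again. The word list is made up for the demonstration, and zlib stands in for any general purpose lossless compressor.

```python
import random
import zlib

# Build some text-like data from a small, made-up vocabulary.
words = ["backup", "data", "file", "image", "copy", "drive", "media",
         "archive", "format", "lossless", "lossy", "artist", "store",
         "restore", "verify", "remote"]
rng = random.Random(2006)  # fixed seed so every run gives the same text
text = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass: a large saving
twice = zlib.compress(once, 9)  # second pass: almost nothing left to gain

print(len(text), len(once), len(twice))
```

The first pass removes the redundancy, so the second pass has very little left to work with. This is the same reason ZIP gains so little on a PNG or JPG file.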
ZIP is a commonly used compression archive format - it's fast
and compresses well. It can be used on a single file or a whole directory
structure. It has been around for a long time and is universally
available. But there are other, lesser-known, good alternatives.
For example, 7ZIP, RAR, and BZIP2 compress significantly better than ZIP but
are a little slower.
One popular backup solution is to compress a whole directory
structure into a ZIP archive and copy the archive to a backup medium.
The problem with this approach is the possibility
of losing every file in the archive if the archive gets corrupted
and cannot be recovered. It is therefore safer to compress and store
each file individually. Damage to the backup then tends to affect only
the files it touches, and the probability that all of them become
unrecoverable at once is much smaller.
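Per-file compression can be sketched in Python as follows. This version uses gzip from the standard library as one possible choice of compressor, and the function name compress_individually is hypothetical. It assumes the backup folder lies outside the source folder.

```python
import gzip
import shutil
from pathlib import Path


def compress_individually(source_dir, backup_dir):
    """Write one .gz file per source file, mirroring the folder layout."""
    source = Path(source_dir)
    backup = Path(backup_dir)
    written = []
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        # image.bmp becomes image.bmp.gz in the matching backup subfolder
        dst_file = backup / rel.parent / (rel.name + ".gz")
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        with open(src_file, "rb") as fin, gzip.open(dst_file, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        written.append(dst_file)
    return written
```

Each compressed file can be restored on its own with any gzip-compatible tool, so a damaged file does not take its neighbors with it.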
What backup media to use
External hard drives are a popular backup medium due to their large capacity and speed.
The commonly used backup media of today are external hard drives, USB memsticks, tapes,
optical discs (DVD, Blu-ray, etc.), and online cloud storage. Every medium has its pros and cons.
External hard drives are the fastest and often the best option for
large amounts of data. They are also the most expensive and not very
durable. Tapes are slow but can store a lot of data and can last decades.
USB memsticks are physically very small but can get lost easily
and may not offer enough space for your data.
Optical discs are probably the most common backup
medium due to their very low cost. Unfortunately, they are not very reliable,
and most have a relatively short expected life span of
two to five years. Online backup is limited by the speed and availability of your
internet connection and may not offer sufficient space for your needs.
However, backing up data online is very convenient.
Reliability is important to consider when choosing the backup media.
How robust is the medium and for how long can it retain the data? The
quality of the media plays a significant role here. All media degrade over
time, but some degrade faster than others. For example, most of the low cost burnable DVDs
have a life span of around two years. Higher quality DVDs can last
up to five. Very high quality DVDs with a gold layer are expected to
last decades. Generally, if the handling and storage conditions are good,
quality media should last at least a few years without data loss.
A combination of different media may often be the ideal solution. For
example, my own backup practice includes using an external
hard drive, USB memsticks, and online storage.
Because everybody has different needs,
I recommend evaluating different backup media in order to decide which
suits your needs best. Keep in mind that using high quality products will minimize the possibility
of a backup failure.
The necessity of verifying backups
One of the most important aspects of taking backups is making sure
they are error free. The backup data may prove useless if it has been corrupted
by a media fault or some other error.
It is therefore essential to test the backup for validity immediately after it is made.
That way, errors are detected while a new backup can still be taken right away.
Any respectable backup program provides an option for data verification.
What good is a backup if its data is corrupted?
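If your backup is a plain copy of a folder, a basic verification can be sketched in Python by comparing checksums of the source and backup files. SHA-256 is one reasonable choice of checksum, and the function names are my own, not those of any particular backup program.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing or differ in the backup."""
    source = Path(source_dir)
    backup = Path(backup_dir)
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
            problems.append(rel)
    return problems
```

An empty list means every source file has an identical copy in the backup. Anything listed should be backed up again before the backup is trusted.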
Final notes
Depending on your needs, dedicated backup software may be a necessary
investment. Make sure to research this carefully. Usually, products from
reputable companies that specialize in backup solutions are best. There
are also many good open source or free software alternatives.
The quality of the backup hardware, media, and software matters equally.
It's best to avoid products which rely on proprietary or closed solutions.
For example, a commercial backup
program may store the backup data in an undocumented format supported only
by that particular program. Avoid that. If the company goes
out of business and the backup, or the backup software, breaks, your backup data may
be lost forever. Look for products that rely on well known, mature, and
ideally open technologies. For example, PNG is an open format for storing
image data. This means that the specification, or blueprint, for
that format is publicly available for anyone to use. This increases
compatibility and reduces reliance on any specific vendor or product.
For most artists the important data consists mainly of images and 3D files. To save
space, rely on the PNG or TIFF bitmap image formats for master images, and on JPG only
for copies where some quality loss is acceptable. Vector images and 3D files
can be compressed individually if needed. A basic incremental backup program
that regularly copies the important files from your hard drive to a backup medium
may be all that is needed.
It's best to make two sets of the backup data and store each at a different location.
Keep one close to home, like a friend's place or a bank box, and the other far away.
Setting up a proper backup procedure may initially require a significant
amount of time and cost money. There is a lot to research and consider.
In the end, however, a good backup procedure will prove an exceptionally valuable
investment. As you read this, your screen could go blank due to a hard drive crash.
All your valuable data - artwork, reference images, documents, photo albums,
etc. - could be lost forever. Unless you are prepared and have a backup.
RESOURCES
zip - the zip file format, a popular archiver with compression.
7zip - a file archiver supporting a variety of archive formats and compression algorithms.
bzip2 - my favorite file compression tool.
png - a highly versatile, and my favorite, image file format.
rsync - synchronizes files and directories from one location to another.
tar - an archive file format designed for tapes but commonly used for many other media.
backup software - a list of commercial and free backup programs.
storage review - a good source of hard drive reviews.
FOOTNOTES
1. The LZW algorithm was used for the TIFF image. The PNG image was compressed with maximum
compression. The JPG was saved with the lowest compression setting (100).
Zip, bzip2 and 7zip were all set to use maximum compression.
The following switches were used:
zip: -9 (used on all test files)
bzip2: -9 (used on all test files)
7zip: -m0=ppmd:o=4 (used on Colony.bmp)
7zip: -m0=lzma:a=1:d=0:lc=8:LP0:PB0:mf=bt2 (used on Colony.tiff)
7zip: -m0=lzma:a=1:d=0:lc=8:LP0:PB0:mf=bt2 (used on Colony.png)
7zip: -m0=lzma:a=1:d=0:lc=8:LP0:PB0:mf=bt2 (used on Colony.jpg)
7zip: -m0=ppmd:o=20:mem=26 (used on Book.txt)
Dawid Michalczyk is a freelance illustrator and an artist.
To see examples of his artwork and writings visit his website at
http://www.art.eonworks.com
Copyright © 2006 Dawid Michalczyk. All Rights Reserved. This content
may be copied in full, with copyright, contact, creation, information and links
intact, without specific permission, when used only in a not-for-profit
format.