Dynamic Hot Backups

By Matthew F. Reagan, Centocor



System Configuration

At the time this article was written, our system configuration consisted of two Digital Alpha 2100 Sables running under OpenVMS 6.1. They are running Oracle V 7.0.16.6, serving production, development, validation, and evaluation databases using SQL*Forms V3, SQL*Menu V5, and SQR V2.5.

Introduction

This article is the next in a series discussing the database administration responsibilities that I inherited when our DBA was dragged away and forced to take lots of money, write books, etc. In the last issue, I treated the file distribution issues that I had to deal with, having just migrated from a pair of Vaxes running Parallel Server with numerous small drives to two Alpha Sables (for which Parallel server does not work under V7.0) with a smaller number of larger drives.

In this issue, I will address the concept of hot backups. I always knew that they were an extremely important component of our disaster recovery strategy. I also had a general idea about how they worked. Once I was responsible for them, the need to fully understand what they are and how they work became a high priority. From the responses I got to the last article, there are a fair number of you who have received battlefield promotions and are attempting to deal with these issues. Since I need to do the research anyway, I'll be publishing the results. If you enjoy these articles, please let me know. If there is a topic you would like me to address, also let me know. And if I make any glaring errors, DEFINITELY let me know!

What is a Hot Backup?

According to the Oracle7 Server Concepts Manual, page 24-3, a "partial backup" is any operating system backup short of a full [cold] backup, taken while the database is open or shut down. According to their definition, this could include a single data file, all of the data files of a tablespace, just a control file, or the entire database and its support files.

For the purposes of this article, we will be discussing a backup of all of the data files that make up a database, any redo log files written during the backup, and its control file, performed while the database is open and in use. At our site, and at most sites I know, this is referred to as a "Hot Backup". It is not necessary to back up all of the files during a hot backup, but I don't want to get ahead of myself.

Partial or hot backups can only be performed on a database operating in Archive Log mode. Any backups taken of a database while it is open and NOT in archive log mode are of no value. They cannot be used to recover a database.

Why perform a Hot Backup?

There are three basic ways to protect the data in your database: 1) A cold backup, which is a backup of the database data files, online redo log files, and the control files after a clean, normal shutdown of the database. 2) A full database export, which extracts all of the objects from the database and places them in an external file in a proprietary format. 3) A hot backup, as mentioned above.

By far, the most convenient, easiest, and usually fastest way to back up a database is with a cold backup. The end result is a complete, self-contained database. All of the files within the backup are consistent to a particular point in time (the time of the shutdown prior to the backup). Unfortunately, to perform a cold backup, the database must be down (i.e., unavailable for use). In most installations, this is becoming less and less of an option: The downtime required is just not available most of the time. It is still a valuable tool, and should be used when an opportunity presents itself, but it cannot be expected to carry the bulk of the database recovery strategy for most installations (unless you feel really lucky/confident about your archive log storage systems!).

An export of the data is another tool that can aid in protecting your database. However, it does not lend itself to recovery from most recovery situations (a lost drive, for example). Its most common use, outside of Oracle version migration and database defragmentation, is a recovery mechanism for accidentally dropped or truncated tables or deleted records. The data in the export file will only be consistent with a particular time within each table (if you use the CONSISTENT=NO command line option, available with Oracle7). With CONSISTENT=YES, the data will be consistent with a particular time across tables. Unfortunately, neither export can be rolled forward using archived redo logs, like the hot and cold backups can. In other words, you can only recover to the time the export was taken.

The hot backup provides you with the same recoverability as a cold backup, but removes the restriction of having to shut down the database before backing up the database data files.

How do you perform a Hot backup?

In order to back up a database while it is still open and in use, you need to back up the tablespace data files, the archive logs that were created during the backup, the online redo logs that were written to during the backup, and one of the control files.

First, the data files

The tablespaces that make up a database can be retrieved from the DBA_TABLESPACES view:

select tablespace_name from sys.dba_tablespaces;

The data files that make up a tablespace can be retrieved from the DBA_DATA_FILES view:

select file_name from sys.dba_data_files where tablespace_name = TSNAME;

substituting the desired tablespace name for TSNAME.

At this point, the database needs to be informed that the tablespace is about to be backed up. The syntax to do this in SQL*DBA is:

sqldba lmode=y
connect internal;
alter tablespace TSNAME begin backup;
exit;

The data files can now be backed up using the appropriate operating system commands. Once the backup is complete, the tablespace needs to be taken out of backup mode with:

sqldba lmode=y
connect internal;
alter tablespace TSNAME end backup;
exit;

These steps must be performed for each tablespace and its constituent data files.

Second, the archive and online redo logs

A hot backup of a data file is a copy of possibly inconsistent data; it is not guaranteed to be consistent with respect to a specific point in time since activity could have occurred while the data file was being backed up. When the backup of an online tablespace (or its constituent data files) starts (by issuing the alter tablespace tsname begin backup command), Oracle stops recording the occurrence of checkpoints in the headers of the data files being backed up. This means that when a data file is restored, the last checkpoint it knows about was the one just prior to the backup. As a result, Oracle asks for the appropriate set of redo log files to apply should recovery be needed. Once the backup is complete, Oracle records the current database checkpoint in the file headers.

In order for this to be possible, we need to back up the redo logs that were being written while the hot backup was being performed. If a log switch had occurred, the archiver would have copied the online redo log file to the archive destination and started a new file. To get all of the relevant redo information, we need to force a log switch and then back up all of the files that are sitting in the archive destination. The syntax to do this in SQL*DBA is:

sqldba lmode=y
connect internal;
alter system switch logfile;
exit;

This forces the log writer to close the current online redo log file and pass it to the archiver. At this point, we need to wait long enough for the archiver to finish moving the file to the archive destination before backing up the archive logs to tape. Unfortunately, I don't have a particularly clean way to do this. I just wait 10 minutes. It normally takes 2 minutes, 4 if we are under a heavy network load, so the 10 minute allowance has a big enough safety margin. However, each site varies in the length of time it takes to copy archive logs. If anyone has a more elegant way of handling this, let me know (and I'll put it in the next issue).

Once the archiver is done, all of the logs in the archive destination should be backed up using the appropriate operating system commands. This will also back up any archive logs that were in the directory prior to the hot backup.

Finally, the control file

Last, but not least, is a backup of the control file. The database control file should be backed up after making any structural modification to a database operating in archive log mode. This is a simple task, but many sites just don't do it. If you have a failure, you MUST have a copy of the latest control file. If not, you must be able to recreate it using the create controlfile command, not a task for the faint of heart. At the very least, you should back up the control file to the same medium as your hot backup. That way, at least you have the control file that was in use at the time. You may be able to use it as a basis for reconstructing the correct control file (i.e., add the tablespaces/data files that you created after the hot backup). The syntax to do this in SQL*DBA is:

sqldba lmode=y
connect internal;
alter database backup controlfile to ''filename' reuse;
exit;

This copies a current snapshot of the control file to where it can be backed up using the appropriate operating system commands.

Another good move is to use the trace option:

alter database backup controlfile to trace noresetlogs;

This option causes Oracle to write SQL commands to the database trace file, rather than copy a snapshot of the control file. These commands start up the database, recreate the control file as it is currently configured, and recover and open the database using the current control file. Just copy the commands from the trace file, remove the commenting, modify the structure to reflect the latest version, and use the script to recover the database. Again, you will only need this if you lose all of your current copies of the control file, which is highly unlikely since you have multiple copies spread across your disks, controllers, and maybe even across your network, right? Back up a copy of your trace file to the same media as your hot backup using the appropriate operating system commands.

The hot backup script

The resulting hot backup script to back up a simple two tablespace database is listed in Code Sample # 1. Since the code is commented, I won't reiterate the commands listed.

Maintenance issues

Once you have created this wonderful hot backup script, you need to update it every time you modify your database's physical structure: Add a new data file, consolidate two data files for a tablespace into one, drop a tablespace (and its data files), etc. If not, you won't be backing up the entire database. This is really not something you want to discover when you have a media failure. "What do you mean, that tablespace wasn't backed up?" can kill, if spoken with the right conviction. Avoiding this problem is the purpose of the next part of this article: Making your script dynamic.

Unlike the database's physical structure, the archive destination and control files do not have any maintenance issues: Under Open VMS, the archive destination is designated using a logical. If your backup script uses the logical, it will repoint to any new destination automatically. Since control files are never backed up directly, moving them does not impact your script. The backup controlfile command will place a copy in the location specified in the command line regardless of where the original files are.

Dynamic?

Take a moment and think about what we have discussed so far: What part of the backup script is subject to change? The list of data files that make up the tablespaces. Hmm... This information is stored in the view dba_tablespaces and dba_data_files. We were using these tables at the beginning of the article when we were putting together the script. Hmm... Maybe we can write a program that will query the database and build the backup script on demand. Yeah! We can do that, right?

To create this utility, you will need a tool that 1) can access the database, retrieving the tablespace names and the data file names, and 2) can create a batch program. Many tools will do this: All of Oracle's precompilers, several report writers, etc. My code is written using SQR V 2.5, by MITI of Long Beach, CA. SQR classifies as a report writer, but, as you can tell from this article, it can do a bit more than that. Since this code was written for a Digital Alpha running Open VMS V6.1, the batch programming language that will be used is DCL. If you are on a different platform, or have a different operating system, you should be able to substitute the equivalent commands with little difficulty.

The dynamic hot backup script utility

The dynamic hot backup script utility code is listed in Code Sample # 2. This code is much simpler than I thought it would be when I started. There are only three sections to worry about: Startup, Get_Tablespaces, and Shutdown.

The Startup procedure is responsible for opening the COM file for writing and entering the beginning (non-dynamic code). This code sets the database and instance to use, sets up the backup command, initializes the tape, puts a timestamp into the file, and enters SQL*DBA.

The Get_Tablespaces procedure is where the dynamic code comes in. It is responsible for retrieving the tablespace names from the dba_tablespaces view, writing the script to put the tablespace in backup mode and exit SQL*DBA. It then retrieves the data file names from the dba_data_files view, writing the script to back up the data files, and writing the script to enter SQL*DBA and take the tablespace out of backup mode.

The Shutdown procedure is responsible for writing the scripts to: cause a log switch, exit from SQL*DBA, back up the archive logs, and enter SQL*DBA.. It then requests that the database back up a static copy of the control file to the current directory and writes the script to exit from SQL*DBA, back up the control file, and put a timestamp into the file.

Extensions

Once the basic code has been written, it is relatively easy to extend the code to meet almost any site requirements. My first modifications will be to add support for more than one tape drive. Since another drive of equal capacity is available at that time, taking advantage of it would cut the current backup time almost in half. The only concern that I (and you) need to be aware of, is you will need to synchronize the backup of the data files. The redo log switch cannot occur before all data files have been backed up, or you will not get all of the redo log information you need to restore the data files to a time consistent state.

Another option I am planning on evaluating is whether I can use code like this to handle copying the data file to a temporary place on disk, instead of backing up to a relatively slower medium, like a tape drive. The data file will only be in backup mode during the copy to the fast disk, minimizing downtime problems. The code would then handle backing up the static image of the data file to tape, without having to be concerned about concurrent access to it.

Wrap Up

The goal of this article was to define and discuss the concept of a hot backup, and how it fit into the general database recovery strategy. Once the behind-the-scenes logic was reviewed, a script was built that would handle the safe backup of a database while it was open and in use. These concepts were then taken one step further and a program, written in SQR, was written that would use the database DBA views and build the hot backup script dynamically, allowing the hot backup procedures to adapt to changes in the physical structure of the data base.

I hope that I have been successful in introducing you to hot backups (or reviewing what you have heard/learned). I also hope I have planted some seeds that will allow you to design a dynamic hot backup utility to take care of the annoying maintenance issues caused by modifying the physical database structure. This article is not intended to be a complete, authoritative work on dynamic (or static) hot backup strategies. Please consult your system and Oracle documentation for a more complete picture of database recovery tools and techniques.

If you find any problems with this information, please, bring it to my attention and I'll buy you dinner at IOUW/ECO (or at least print your clarification in the next newsletter).

References

Oracle7 Server Concepts Manual
Oracle7 Server Administrator's Guide
and John, Joe, Joseph, and Kelly, whose proofreading and technical reviews helped tremendously in creating this article.


Volume III Issue 1 Table of Contents


File maintained by Matt Reagan; last modified 10-FEB-1996.