Data Lost Protection
By Cindy Elliott
When it comes to data protection, do you think you’re compliant? Answer correctly: Corrupt metadata could land you in jail.
The amount of data that companies must store is growing at an enormous pace as regulations (including HIPAA, SEC 17a-4, the Federal Rules of Civil Procedure and Sarbanes-Oxley, to name just a few) place an increasing burden on enterprise storage, backup and recovery needs. Failure to protect information in a recoverable state can lead to fines, other penalties and jail time.
Matters are made more complex when enterprise content management, or ECM, systems are in use. The benefits of ECM—workflow, collaboration, and integrated management of information assets—bring an extra level of responsibility when it comes to protecting the content and metadata contained in ECM systems. That extra level of care required primarily relates to ECM metadata (audit trails, digital signatures, workflows, renditions and other “data about data”) that must be backed up with the content it supports in a synchronous manner to ensure full recoverability and compliance with numerous regulations.
Most C-level executives believe the information in their ECM systems is already protected by their enterprise backup systems. “We’re compliant,” they say. “We have an enterprise content management system in place and my IT team assures me it’s being backed up.”
Asking the right questions
If you ask your IT department if your ECM information is backed up, the answer you will inevitably get back is “yes.” But you need to ask much deeper, probing questions of your IT team if you are to protect your organization and yourself from the risks of data loss. ECM systems present several challenges to recoverability: complex data structures, large databases and critical metadata that cannot be lost.
Today’s regulatory environment is forcing many companies to re-evaluate their company-wide processes and procedures, including backup and recovery strategies. Section 404 of the Sarbanes-Oxley Act of 2002, specifically, makes C-level executives accountable for misrepresentation of corporate, operational or financial information. Without a sound backup strategy that protects content and metadata in ECM systems, the risk of non-compliance grows.
The importance of metadata
Most of the time, the metadata associated with contracts, supplier bids and other documents are more important than the content within them. It’s important to understand the complexity of information stored in ECM repositories to get the full picture on the relationship between content and metadata. ECM systems have applications, such as credit and loan processing applications and contract management applications, that integrate with them. Business process management tools and applications are also often integrated with ECM systems.
The ECM system acts as a tool for creating compliant workflows and processes, manages all transactions and is ultimately the warehouse for information and the relationships between different pieces of that information.
As new business applications, documents and transactions are added to the ECM repository, an infrastructure of content (say, a loan application) and metadata (say, loan numbers, annotations, revisions and digital signatures) is built around the original document. This infrastructure metadata is lifecycle information about the original content that creates complex interrelationships among different pieces of information.
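To make the relationship concrete, here is a minimal sketch of content and metadata kept in separate stores and joined only by an identifier. The class names, field names and sample values are illustrative assumptions, not the schema of any particular ECM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MetadataRecord:
    """One piece of lifecycle metadata (hypothetical schema)."""
    kind: str        # e.g. "revision", "annotation", "digital_signature"
    value: str
    content_id: str  # the link back to the content this record describes


@dataclass
class Repository:
    """Toy repository: a content store plus a separate metadata store."""
    content: Dict[str, bytes] = field(default_factory=dict)
    metadata: List[MetadataRecord] = field(default_factory=list)

    def add_content(self, content_id: str, data: bytes) -> None:
        self.content[content_id] = data

    def attach(self, kind: str, value: str, content_id: str) -> None:
        # Metadata is only meaningful if the content it points to exists.
        if content_id not in self.content:
            raise KeyError(f"unknown content id: {content_id}")
        self.metadata.append(MetadataRecord(kind, value, content_id))

    def lifecycle(self, content_id: str) -> List[MetadataRecord]:
        """All metadata linked to one item, in the order it was collected."""
        return [m for m in self.metadata if m.content_id == content_id]


repo = Repository()
repo.add_content("loan-001", b"loan application text")
repo.attach("loan_number", "LN-2291", "loan-001")
repo.attach("revision", "v2", "loan-001")
repo.attach("digital_signature", "signed-by-officer", "loan-001")
kinds = [m.kind for m in repo.lifecycle("loan-001")]
```

Because the only tie between the two stores is `content_id`, a backup that captures one store without the matching state of the other leaves records that can no longer be joined.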
Take, for example, the government procurement manager coordinating responses from defense contractors replying to an RFI for a guidance system. The procurement manager must be certain that the response documents, which may include a contracts section, an engineering section, a project management section, specific responses to questions and a number of appendices, are routed through the various steps integral to the contractor selection process.
As the responses move through these steps, they “collect” metadata such as workflows, digital signatures, approvals and renditions. Any loss or corruption of this metadata could significantly delay the contractor selection process, because, for reasons explained below, metadata are not easily recreated or recovered.
Metadata are also important because preservation is required by numerous industry and government regulatory mandates holding IT and business managers liable for the retention of original, unaltered information down to the individual record. Without proper safeguards, the loss of ECM system metadata carries the liability of not complying with regulations, the result of which can include significant fines, negative brand exposure and even jail time. Preserving ECM content and metadata such as audit trails while ensuring granular recoverability and creating a demonstrable “chain of custody” can help organizations avoid many of the risks associated with the loss or corruption of complex ECM system information.
Depending upon which constituency your company serves, any single piece of ECM information could be subject to a number of compliance, records management and regulatory conditions. Not ensuring the integrity and recoverability of this information is, at a minimum, risky behavior.
- For financial services companies, SEC 17a-4 sets out which records must be retained and for what time periods. The regulation also calls for companies to maintain a system to show the audit trail of each record, to provide verification that those records were not altered, and to store records in such a way that they cannot be altered, overwritten or erased.
- For all companies doing business in the United States, the Federal Rules of Civil Procedure apply. The recent e-discovery amendment to the FRCP says that metadata—including audit trails and renditions, regardless of how complex—must be preserved and produced on demand, and it specifies a default form for producing electronically stored information in a “reasonably usable” form.
It is important to note that while there are records management solutions that help companies meet data retention requirements associated with the above regulations, they do not protect against the information loss and corruption which can result in an inability to comply with them.
What puts metadata at risk?
The key to protecting ECM system content is ensuring that the relationship between content and its related metadata is not lost during either backup or recovery or, as is more likely to happen, during a partial information loss incident. According to Strategic Research Inc. & AIIM International, these incidents account for more than 80 percent of ECM information loss and are caused by common occurrences such as user, logical or programmatic errors; malfeasance; viruses; or metadata corruption. Regardless of the cause, if the relationship between content and metadata is broken, critical workflow data, audit trails, digital signatures and other metadata from the document’s complete lifecycle can be lost, and companies can suffer compliance risk, the inability to respond to e-discovery demands, lost productivity and loss of potential revenue.
Anything that disrupts operation of an ECM application can corrupt content and its metadata. If an administrator applies a bug patch but uses the wrong date range or if a virus in an application manifests, permanent data loss may result. The only way to ensure that original metadata, content and the links between them are recoverable is to capture both the metadata and content in a synchronous (simultaneous) manner or capture both the content and database servers while they are offline.
However, it’s also important to keep in mind that most existing solutions do not validate the integrity of ECM system information during the capture process. This means that if corrupt metadata or content exist in the repository (which they typically do), they are captured in this corrupt state and therefore will be recovered in this same state, rendering the information useless. So a solution is needed that not only will capture content and metadata in their original state, but that also will capture only clean, uncorrupted information and flag any corruptions or inconsistencies for proactive correction.
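As a rough illustration of validating integrity at capture time, the sketch below assumes each metadata record stores a SHA-256 checksum of its content. That checksum field, the function names and the sample data are assumptions made for the example. Only items that pass the check are captured, and everything else is flagged for correction.

```python
import hashlib
from typing import Dict, List, Tuple


def checksum(data: bytes) -> str:
    """SHA-256 digest of the content, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def validate_and_capture(
    content: Dict[str, bytes],
    metadata: Dict[str, dict],
) -> Tuple[Dict[str, Tuple[bytes, dict]], List[str]]:
    """Capture only clean content/metadata pairs; flag everything else."""
    captured: Dict[str, Tuple[bytes, dict]] = {}
    flagged: List[str] = []
    for content_id, meta in metadata.items():
        data = content.get(content_id)
        if data is None:
            flagged.append(f"{content_id}: metadata has no matching content")
        elif checksum(data) != meta["sha256"]:
            flagged.append(f"{content_id}: content does not match recorded checksum")
        else:
            captured[content_id] = (data, dict(meta))
    for content_id in content:
        if content_id not in metadata:
            flagged.append(f"{content_id}: content has no metadata")
    return captured, flagged


content = {
    "bid-1": b"bid one",
    "bid-2": b"bid two (corrupted)",   # no longer matches its checksum
    "bid-4": b"orphaned content",      # metadata was lost
}
metadata = {
    "bid-1": {"sha256": checksum(b"bid one"), "status": "approved"},
    "bid-2": {"sha256": checksum(b"bid two"), "status": "in review"},
    "bid-3": {"sha256": checksum(b"bid three"), "status": "draft"},  # content was lost
}
captured, flagged = validate_and_capture(content, metadata)
```

In this run only `bid-1` is captured, and the other three items are reported rather than silently copied in a corrupt state.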
Traditional enterprise solutions are structured for system-level backup and recovery and are designed for response to full-system failures or disasters. They do not provide granular recovery for response to partial information loss incidents, and require companies to either suffer the repercussions of the partial loss or recover from the last full backup.
Traditional “cold” recovery disrupts business operations by requiring the ECM system to be brought down for a rollback, which takes employees offline and causes the loss of all additions and changes made to repository information since the last cold backup. This can involve extremely large amounts of information if the loss is not noticed until weeks or months after it occurs.
“Hot” backup practices using traditional enterprise solutions have become popular because they eliminate the need to take the ECM system offline for the system backup. However, this method is not foolproof. Most hot backup solutions capture content and metadata in an unsynchronized manner, one after the other, while users may be making changes or additions to the information in the repository. This can cause the relationships between content and metadata to become disassociated, rendering information lost or inaccessible (corrupt) upon recovery.
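The failure mode can be reproduced in a few lines. In this simplified sketch, the `between` hook stands in for a user adding a document while the backup is running, and a single lock stands in for whatever mechanism a real product would use to take a consistent view. Both are assumptions for illustration only.

```python
import copy
import threading
from typing import Callable, Dict, List, Optional


class Repo:
    """Toy repository with separate content and metadata stores."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.content: Dict[str, bytes] = {}
        self.metadata: Dict[str, dict] = {}

    def add(self, doc_id: str, data: bytes, meta: dict) -> None:
        with self.lock:
            self.content[doc_id] = data
            self.metadata[doc_id] = meta


def unsynchronized_capture(repo: Repo, between: Optional[Callable[[], None]] = None) -> dict:
    """Copy content first and metadata second, with no coordination."""
    content_copy = dict(repo.content)
    if between is not None:
        between()  # user activity that lands in the middle of the backup
    metadata_copy = copy.deepcopy(repo.metadata)
    return {"content": content_copy, "metadata": metadata_copy}


def synchronous_capture(repo: Repo) -> dict:
    """Copy both stores while holding the lock, so writers must wait."""
    with repo.lock:
        return {
            "content": dict(repo.content),
            "metadata": copy.deepcopy(repo.metadata),
        }


def orphans(backup: dict) -> List[str]:
    """Metadata entries whose content is missing from the same backup."""
    return sorted(set(backup["metadata"]) - set(backup["content"]))


repo = Repo()
repo.add("rfi-1", b"contractor response", {"workflow": "review"})
bad = unsynchronized_capture(
    repo, between=lambda: repo.add("rfi-2", b"late response", {"workflow": "intake"})
)
good = synchronous_capture(repo)
```

Here `orphans(bad)` reports `rfi-2`, whose metadata was captured without its content, while `orphans(good)` is empty.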
When the integrity of information is compromised in this way, the result is not only disruption of business operations (when a user attempts to access it), but also compliance-related risks since audit trails and other metadata may be inaccessible. ECM-specific solutions that provide synchronized hot backups do exist, but while they are ideal for recovery from a full-system failure or natural disaster, they don’t enable recovery of information at a granular level.
Continuous data protection
Continuous data protection (CDP) is a form of always-on, continuous backup that captures changes as they are made. CDP creates a ‘snapshot’ at the point in time data is modified, capturing a record of transactions or changes while systems stay online. However, this solution does require the ECM system to be offline during recovery, and information cannot be recovered at a granular level. In addition, if the CDP technology used in an ECM repository doesn’t capture changes to both metadata and content simultaneously, all bets are off that integrity of information will be maintained.
Because these methods capture changes frequently or even as they happen, recovery typically doesn’t require a rollback. In cases of partial data loss, however, these methods can compromise the integrity of documents and their related metadata because the loss is copied to the secondary location. If data is corrupted, damaged or deleted in the ECM system, it will be in the same state in the secondary copy.
Clearly, ensuring the integrity and recoverability of ECM metadata and content so they can easily be retrieved and authenticated in response to business demands, e-discovery requirements, audits and inspections is a critical business competence. To accomplish this, an ECM-specific granular recovery and integrity solution should be deployed. The goal is to ensure the integrity of information when it’s captured, and ensure its quick recovery with minimal effort and negative impact.
Look for an ECM-specific protection solution with the following capabilities:
- Integrity checks: Performs proactive integrity checks so that corrupt information is not captured by the system, ensuring, for example, that contract metadata can be restored to their original state.
- Granular recovery: The ability to rapidly identify and recover only the lost or corrupted ECM content and metadata affected by a partial information loss incident with integrity ensured.
- Hot capture and recovery: Performs information capture and recovery while the ECM system remains online to avoid disruption to work processes and maintain productivity.
- Incremental capture: At frequent intervals, captures only the information that has been changed or added to the repository.
- Synchronous capture: Captures both content and metadata in a single, synchronous pass while the ECM system is online to prevent data corruption and inaccessibility upon recovery.
- Adherence to records-retention policies: The solution should comply with ECM system and enterprise-wide records retention policies by destroying captured information in accordance with those policies.
- Minimize data loss windows: The solution should enable ECM information captures at a near real-time frequency that will help meet or exceed recovery point objectives.
- Fast recovery: Recovery from a partial information loss incident should take only minutes, enabling you to meet or exceed recovery time objectives.
- Minimal required recovery resources: The recovery process should be manageable by a single administrator.
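Two of these capabilities, incremental capture and granular recovery, can be sketched together. The example below assumes each repository item carries a version number that changes whenever its content or metadata changes. The `CaptureStore` class and the item layout are hypothetical and are not drawn from any specific product.

```python
import copy
from typing import Dict, Iterable, List


class CaptureStore:
    """Keeps captured copies of each item (content plus metadata), newest last."""

    def __init__(self) -> None:
        self.history: Dict[str, List[dict]] = {}
        self.last_seen_version: Dict[str, int] = {}

    def incremental_capture(self, repo: Dict[str, dict]) -> List[str]:
        """Capture only items that are new or whose version has changed."""
        captured = []
        for item_id, item in repo.items():
            if self.last_seen_version.get(item_id) != item["version"]:
                self.history.setdefault(item_id, []).append(copy.deepcopy(item))
                self.last_seen_version[item_id] = item["version"]
                captured.append(item_id)
        return captured

    def recover(self, repo: Dict[str, dict], item_ids: Iterable[str]) -> List[str]:
        """Restore only the named items; the rest of the repository is untouched."""
        restored = []
        for item_id in item_ids:
            versions = self.history.get(item_id)
            if versions:
                repo[item_id] = copy.deepcopy(versions[-1])
                restored.append(item_id)
        return restored


repo = {
    "bid-1": {"version": 1, "content": b"bid one", "metadata": {"status": "draft"}},
    "bid-2": {"version": 1, "content": b"bid two", "metadata": {"status": "draft"}},
}
store = CaptureStore()
first = store.incremental_capture(repo)    # both items are new

repo["bid-1"]["version"] = 2
repo["bid-1"]["metadata"]["status"] = "approved"
second = store.incremental_capture(repo)   # only the changed item

del repo["bid-1"]                          # accidental delete
repo["bid-3"] = {"version": 1, "content": b"bid three", "metadata": {"status": "draft"}}
restored = store.recover(repo, ["bid-1"])  # no rollback of the whole repository
```

After recovery, `bid-1` is back with its approved status and `bid-3`, which was added after the last capture, is still in place. A full rollback would have discarded it.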
Data recovery cannot be treated as the ugly stepsister of enterprise backup, and the special needs that ECM systems place on backup must not be ignored. Regulatory authorities and industry experts are beginning to demand more ECM- and compliance-savvy recovery management strategies, thereby setting new industry-wide legal precedents. One misstep can lead to disaster; however, there are approaches and ECM solutions that help avoid noncompliance, downtime and other incidents.
From the executive suite to the data center, maintaining critical information—and avoiding missteps that can lead to censure, fines and even jail time—requires an understanding of the relationship between ECM content and metadata. Taking steps to deploy a solution that ensures the integrity, recoverability and authenticity of information in ECM repositories is possible with proven technology, which works with leading enterprise backup and recovery solutions. Protecting ECM information is an investment that cannot be overlooked.
Cindy Elliott is vice president, business development and marketing of CYA Technologies. She can be reached at celliott@cya.com.
Information loss in the real world
An outsourced IT services administrator at a Global 250 civil engineering company, believing she was in the ECM development system, began to clean up the repository, issuing a ‘delete’ command. She was actually in the production environment, and wiped out more than 8,500 complex, highly linked supplier bids with just one keystroke.
It was determined that using the enterprise backup and recovery solution to roll back the ECM system to a point in time prior to the incident was not a viable option as it would cause unacceptable ECM system downtime and result in the loss of all additions and changes made to the repository since the time of the backup. The company opted instead to manually retrieve the bids and attempt to recreate the metadata associated with them.
The impact of this incident was significant and resulted in:
- more than four months spent running multiple daily shifts to manually retrieve individual files (content only) from tape;
- countless hours spent attempting to recreate the document workflow and lifecycle information (metadata), which was permanently lost;
- negative brand exposure as the company had to resend new bid IDs and other information to each supplier; and
- the loss of more than eight months of productivity and millions of dollars in business opportunities from project delays.
A year after this incident occurred, the organization still had not recovered all of the lost content and metadata. If the company had had an ECM-specific granular recovery solution, just one person would have been able to recover all the lost information, in its original state, in just a few hours without any system downtime.