Why an open archive API is needed to transfer data between cloud services

I know that it might seem like I have spent a lot of time discussing archives recently, but it is an unassailable fact that the more data we transfer to the cloud, the more we might have to retrieve at some point in the future. Not to bring data back on-premises - at least not in the majority of cases - but perhaps to be able to move to a different cloud service. Right now, the cloud vendors have a solid lock on customer data once it's in their repositories, and that's surely not a good thing. An Open Archive API agreed by the industry would make life much easier all round. I don't think there would be a rush to move (because such an operation will remain expensive), but at least the option would exist.

Microsoft is listed as one of the “Challengers” in Gartner’s Magic Quadrant for Enterprise Information Archiving (29 October 2015), which is testimony to the effort the company has put into building out its compliance features since their debut in Exchange 2010.

Gartner’s market definition is “Enterprise information archiving (EIA) incorporates products and solutions for archiving user messaging content (such as email, IM, and public and business social media data) and other data types (such as files, enterprise file synchronization and sharing [EFSS] and Microsoft SharePoint documents, some structured data, and website content).”

The scope for archiving is much wider than email, yet it seems to me that few who have moved workloads into Office 365 have considered how they might extract their data. I covered this topic in November 2015, and some further consideration brings me to a list of the content types I might have to extricate if I wanted to move data out of my Office 365 tenant.

Moving Exchange mailbox data is a relatively easy task because hybrid connectivity allows mailboxes to be moved back to on-premises servers if the need arises.

SharePoint documents, lists, and other metadata are more problematic, but companies like Sharegate can help. Remember that SharePoint Online spans traditional sites as well as providing storage for OneDrive for Business and Office 365 group libraries.

Then there’s newer information, such as the videos uploaded to the Office 365 Video Portal, which holds its metadata in SharePoint and the transcoded video content in Azure Media Services. Hopefully, you have kept all the original videos that were uploaded to the portal.

And if Yammer is used, the small matter of how to transfer information held in its groups to some other repository comes into play.

There’s more to consider too, such as the shared notebooks used by Office 365 Groups plus all the configuration information for the tenant and user settings. In short, moving away from Office 365 (or any other cloud service) is a bear.

The costs involved in such an activity can be staggering. For instance, HP’s Digital Safe is a cloud-based archiving solution used by large enterprises. Let’s say that you want to move your archives to a competitor solution, such as those offered by Veritas, Proofpoint, or Mimecast. A large price tag is usually associated with the work necessary to extract the data from the source archive, package it into a format suitable for the target, and ingest it into the new archive. The effort is likely to require a lot of manual intervention that will drive the cost well into six or even seven figures.

Which brings me to the need for something like an Open Archive API to allow for high-fidelity transfer from one archive to another (or one cloud service to another). Ideally, the API would accommodate transfer via “drive shipping” or network uploads, just like the Office 365 Import Service does today. In fact, the Office 365 Import Service is probably further along the path than other archive vendors because it has a specification for ingestion packages that ISVs can use to create feeds from non-Microsoft data sources.
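To make the idea more concrete, here is a minimal sketch (in Python) of what the contract for such an API might look like. No Open Archive API specification exists today, so every name here (ArchiveItem, export_items, ingest_item) is a hypothetical illustration rather than anything published by Microsoft or the archive vendors.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Protocol

@dataclass
class ArchiveItem:
    """One item in transit between archives (hypothetical schema)."""
    item_id: str
    content_type: str   # e.g. "email/message" or "sharepoint/document"
    payload: bytes      # the item itself, in its native format
    metadata: Dict[str, str] = field(default_factory=dict)  # author, dates, ACLs, etc.

class OpenArchive(Protocol):
    """The contract that both source and target archives would implement."""

    def export_items(self, query: str) -> Iterator[ArchiveItem]:
        """Stream items out of the archive with full-fidelity metadata."""
        ...

    def ingest_item(self, item: ArchiveItem) -> None:
        """Write an item into the archive as if it were created there natively."""
        ...

def transfer(source: OpenArchive, target: OpenArchive, query: str) -> int:
    """Move every item matching the query from one archive to another."""
    moved = 0
    for item in source.export_items(query):
        target.ingest_item(item)
        moved += 1
    return moved
```

The design point that matters is the ingest side: items should land in the target with their metadata intact, as if they had been created there in the first place.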

Because of the volume of data held in archives, the API needs to be optimized for high-capacity transfer of information and have sophisticated error-checking and reporting capabilities. It would also need to ensure that the chain of custody for data is maintained during a transfer so that legal challenges could not be mounted on the basis that information could have been interfered with while being transferred.
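One way to meet the chain-of-custody requirement is to hash every item at export, record the hashes in a manifest, and verify each item again at ingestion. The sketch below is a rough illustration in Python, reusing the hypothetical ArchiveItem from the previous sketch; a production system would also digitally sign the manifest so it is tamper-evident.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Dict, Iterable

def fingerprint(payload: bytes) -> str:
    """Hash an item's content so any change in transit is detectable."""
    return hashlib.sha256(payload).hexdigest()

def build_manifest(items: Iterable) -> str:
    """Record what left the source archive, and when, for chain of custody."""
    entries = {item.item_id: fingerprint(item.payload) for item in items}
    return json.dumps({
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "items": entries,
    }, indent=2)

def verify_on_ingest(item, manifest_items: Dict[str, str]) -> None:
    """Refuse to ingest an item whose content no longer matches the manifest."""
    expected = manifest_items[item.item_id]
    actual = fingerprint(item.payload)
    if actual != expected:
        raise ValueError(
            f"Chain of custody broken for {item.item_id}: "
            f"expected {expected}, got {actual}"
        )
```

Any verification failure would feed the error-reporting side of the API, giving both parties an audit trail with which to answer a legal challenge.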

An Open Archive API would mean lower cost and greater customer convenience when the time came to move data between repositories. It would eliminate the need for specialized connectors to extract or ingest data. Provided, of course, that the archiving vendors agreed to support the API.

Some might say that the EDRM model would be a good basis because it’s already in use in the eDiscovery space. However, I think we need an API that has much better coverage of the data now in use in cloud services, together with the ability to restore items in the target repository as if they were created there initially. And eDiscovery operations often export data to PSTs to move it around, which seems like a pretty retrograde step.

Cloud services have been with us for a decade. We solved the problem of email interoperability years ago when SMTP won over X.400 to become the de facto standard. Given the increasing amount of data held in cloud services today, isn’t it about time that the industry worked together to give tenants true control over their information and fulfil all those promises that “it is yours to take with you if you decide to leave the service”?

I think the time is ripe for archive vendors to do a better job of interoperability. Do you agree?

Follow Tony @12Knocksinna

Comments (1)

Posted on Apr 8, 2016:

Hi Tony

I couldn't agree more. I think any organization that has gone through the pain of migrating from a legacy archive will certainly look for better, more convenient extraction capabilities in their future archiving strategy.

My company, HubStor, just came out of stealth mode and we're tackling this problem upfront. We're an Azure-based cloud archive, and the same on-prem software we provide for extending in-house primary storage to our cloud archive service also has data extraction capabilities that can handle export of eDiscovery cases, fine-grained data recovery, and complete tenant extraction.

I think it's a healthy exercise for an archive vendor to run through their own extraction process, because it forces them to look harder at data integrity and deal with things like recovery of complete item metadata, ACLs, and folder structures (and make sure their deduplication is lossless). Plus, if an archive vendor gives you the extraction tools out of the box, making it easy to leave, it says they are confident in the value and quality of their service. In other words, it's a huge red flag if an archive or cloud service tries to lock you in without native extraction capabilities.
