Beginning with Exchange 2010, the ability to give users a Personal Archive hosted within Exchange was brought into the core product. Previously, users relied either on PST files held locally or copied to a network share, or on a third-party product like Symantec Enterprise Vault, which used software add-ons and a separate management server to archive messages and retrieve them via "stub" messages.
The personal archive feature (sometimes known as the "Online Archive") works by allocating a user a second mailbox. With SP1, the Personal Archive can be located on a separate database, a separate server or even (when Office 365 is released) in the cloud. Server-side policies control when messages are automatically moved to the Personal Archive, and users can access the archive using Outlook Web App, Outlook 2010 and, very soon, Outlook 2007. Effectively, it's a way of splitting the current, important mailbox data from the little-used reference material that is kept for compliance or convenience reasons.
However, one of the big strengths in Exchange 2010 is that you don't need to archive just to maintain performance and keep server costs down. Due to the lower disk performance required, Exchange architects can design solutions that make use of large 2TB SATA disks to give users massive mailboxes, larger than 25GB in some cases. And when it comes to mailbox server sizing, the advice for databases that will store primary mailboxes and archive mailboxes is pretty much the same.
This leads to the obvious question - if I can give users massive mailboxes, what is the point of personal archives? Wouldn't it make more sense to just give them the whole mailbox as one?
I've thought about this a little bit, and while just using a large mailbox does make some sense - especially from a client compatibility point of view (e.g. limitations of IMAP, Mac, ActiveSync), and because archiving requires Enterprise CALs - there are certainly a number of good, sound reasons to consider the use of personal archives:
Reason 1 - Reducing Outlook Client's Offline Cache Size
If you give your users a 25GB mailbox, then for most clients, that means a 25GB OST needs to be maintained on each workstation. With the rise of remote workers, "bring your own PC", virtual desktops and taking into account the amount of fragmentation that will occur on the local cache file, this might be undesirable. Although many users won't immediately make full use of these massive mailbox sizes, don't underestimate how many users may take the opportunity to (wisely?) import their old PST files into their larger mailboxes.
By splitting the full quota between the primary mailbox and personal archive, you have a predictable offline file size for clients and can still grant users the larger quotas.
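To put rough numbers on that, here is a minimal sketch of the arithmetic in Python. It assumes the OST can grow to roughly the size of the primary mailbox quota and that the personal archive is not cached offline; the function name and the example figures are mine, purely for illustration.

```python
def offline_cache_plan(total_quota_gb, primary_quota_gb, workstations_per_user=1):
    """Compare the worst-case OST footprint with and without a personal archive.

    Assumption: the OST can grow to roughly the primary mailbox quota,
    and the personal archive is not held in the offline cache.
    """
    if not 0 < primary_quota_gb <= total_quota_gb:
        raise ValueError("primary quota must be positive and no larger than the total")
    return {
        # Whatever is not in the primary mailbox becomes the archive quota
        "archive_quota_gb": total_quota_gb - primary_quota_gb,
        # One big mailbox: every workstation may end up caching the full quota
        "single_mailbox_ost_gb": total_quota_gb * workstations_per_user,
        # Primary plus archive: only the primary quota is cached
        "split_ost_gb": primary_quota_gb * workstations_per_user,
    }


# Hypothetical user: 25GB total quota, 5GB primary, an office PC and a laptop
plan = offline_cache_plan(25, 5, workstations_per_user=2)
print(plan)
```

With those made-up figures the user still gets 25GB in total, but the worst-case local cache drops from 50GB across the two machines to 10GB.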
Reason 2 - Tiered Database Copy Levels
How important is the archive data? If it's previously been stored on users' local PCs, then it's probably not mission critical to the business. If it's been on file shares, then again there might only be two copies - one online and one in the file server backups.
So, why not consider having different database "collections" for primary and archive mailboxes, with a different number of copies for the archive databases? For example, databases dedicated to primary mailboxes may have two on-site copies, a lagged copy and a copy at a DR datacentre. The databases dedicated to archive mailboxes may only need one on-site copy and one at the DR datacentre. You could reduce the number of mailbox servers and the storage required to support a large mailbox infrastructure whilst still maintaining the levels of resilience you want for your most important data.
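As a back-of-the-envelope illustration, the sketch below compares raw storage for the copy layout in the example above (four copies of primary databases, two of archive databases) against keeping everything in one large mailbox at the primary copy level. The user count and quotas are hypothetical, and the sums ignore transaction logs, content indexes and database overhead, so treat the output as a rough comparison rather than a sizing result.

```python
PRIMARY_COPIES = 4  # two on-site + one lagged + one at the DR datacentre
ARCHIVE_COPIES = 2  # one on-site + one at the DR datacentre


def tiered_storage_gb(users, primary_gb, archive_gb,
                      primary_copies=PRIMARY_COPIES, archive_copies=ARCHIVE_COPIES):
    """Raw storage when primary and archive databases have different copy counts."""
    primary_total = users * primary_gb * primary_copies
    archive_total = users * archive_gb * archive_copies
    return primary_total + archive_total


def single_tier_storage_gb(users, mailbox_gb, copies=PRIMARY_COPIES):
    """Raw storage when every mailbox is one large mailbox at the primary copy level."""
    return users * mailbox_gb * copies


# Hypothetical organisation: 1,000 users, 5GB primary + 20GB archive each
tiered = tiered_storage_gb(1000, 5, 20)
single = single_tier_storage_gb(1000, 25)
print(f"tiered: {tiered} GB, single tier: {single} GB, saving: {single - tiered} GB")
```

On those assumed figures the tiered layout needs 60,000GB of raw disk against 100,000GB for the single-tier design.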
Reason 3 - Choose Different Backup Policies for Archive Databases
If you're still backing up the traditional way, then you are probably making sure the databases containing users' primary mailboxes have a reliable, up-to-date backup. It may well be direct to disk, then streamed off to tape after a certain amount of time has passed.
Does all the email need this level of recoverability? The personal archive will have messages moved to it daily as they hit the policy for removal from the primary mailbox - but those messages about to be archived are already being backed up as part of the main backups. You could consider using a different backup policy altogether for databases dedicated to archives, such as weekly if you do daily, or daily if you do hourly. Maybe back up direct to tape, or even consider relying on database copies alone. You could reduce the amount of infrastructure you need to build out for your backup systems.
Reason 4 - Total Disaster Recovery
By a total disaster, I mean total! We like to think it won't happen to us, and thankfully with Exchange 2010 the prospect of having to do a dial-tone recovery is one many Exchange admins will never face. But what if the worst does happen? At up to 2TB per mailbox database, how long will it take to restore all those massive mailboxes?
Using dedicated archive databases and personal archives could mean in such a scenario you can bring users to near full working ability by bringing their smaller primary mailboxes from backups and then bring the personal archives back afterward. Given most of the users will be able to work normally, management will hopefully stop breathing down your neck and let you get on with the (still mammoth!) task of bringing back those archives.
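To get a feel for the difference, here is a simple sketch of the restore-time arithmetic. The restore throughput is a made-up figure (yours will depend entirely on your backup system), and the calculation assumes one sequential restore stream with nothing else competing for it.

```python
def restore_hours(data_gb, throughput_gb_per_hour):
    """Hours needed to restore data_gb at a steady restore throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour


# Hypothetical figures: 1,000 users, 5GB primary + 20GB archive each,
# and a backup system that restores 250GB per hour
users = 1000
throughput = 250

primary_hours = restore_hours(users * 5, throughput)   # primary mailboxes first
archive_hours = restore_hours(users * 20, throughput)  # archives afterwards

print(f"Users working again after about {primary_hours:.0f} hours")
print(f"Archives fully back after about {primary_hours + archive_hours:.0f} hours")
```

With these assumed numbers the primary mailboxes are back in about 20 hours, while the full 25GB per user takes about 100 hours. Restore everything as single large mailboxes and nobody is back to normal until close to that second figure.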
Reason 5 - Host Archive Databases Centrally with Primary Mailboxes at Branch Offices
It may be that for branch offices, you aren't even going to consider placing Exchange servers at each location and will just have everyone hooked into HQ. But if you are placing Exchange servers at each branch office, then you will no doubt need to plan for backups. Deploying massive mailboxes could be a problem, as you then need to take into account how you will re-seed (if using DB copies) or restore from backups should the need arise.
Hosting personal archives at HQ could be an option to mitigate these risks, whilst still ensuring the primary mailbox is local to the client. You can deploy smaller servers at the remote offices and reduce the time taken to restore should the need arise.
Reason 6 - Host Archive Mailboxes in the Cloud
Finally, why not host the archives in the cloud? Early next year, after the release of Office 365, you'll be able to look at the option of letting Microsoft host your entire Exchange infrastructure. But if that isn't for you, Office 365 will also offer the ability for personal archives to be hosted in the service. Obviously there are a lot of other factors to consider, such as costs compared to hosting these in-house, regulations you need to comply with, and any internal barriers to adoption. But there could be serious savings to be made, and a chance to limit the up-front costs associated with moving to Exchange 2010, whilst still providing large total mailboxes and an on-premises Exchange system.
What are your thoughts? Do these reasons make sense to you, or not? Do you have any more ideas about why you should or shouldn't use archiving in Exchange 2010?