Monday, August 11, 2014

Red Hat Brings Software-Defined Storage to Big Data

Cache tier diagram. Cache tiering divides the data cluster so that regularly accessed hot data can be held on faster storage, typically SSDs, while erasure-coded cold data sits below on cheaper storage media. Image credit: Ceph.
Red Hat last month released the latest version of Inktank Ceph Enterprise, its object and block storage product based on the upstream open source Ceph project. The release is notable not only as the first since Red Hat acquired the two-year-old startup Inktank in April, but also for two key features that help open up a new market for Ceph.



While Ceph gained prominence as the open source software-defined storage tool commonly used on the back end of OpenStack deployments, it is not strictly software for the cloud. With its latest enterprise features, Ceph has begun to see adoption among a new class of users interested in software-defined storage for big data applications.
The new enterprise features can be used in both legacy systems and in a cloud context, “but there's almost a third category of object storage within an enterprise,” said Sage Weil, Ceph project leader, in an interview at OSCON. “They're realizing that instead of buying expensive systems to store all of this data that's relatively cold, they can use software-defined open platforms to do that.”


“It's sort of cloudy in the sense that it's scale out,” Weil said, “but it's not really related to compute; it's just storage.”

Two Important New Features

Ceph Enterprise 1.2 contains erasure coding and cache tiering, two features first introduced in the May release of Ceph Firefly (v0.80). Erasure coding protects data with less raw capacity overhead than keeping full copies, so it requires less hardware than traditional replicated storage clusters and saves money for companies that need to keep a lot of archival data around. Cache tiering divides the data cluster so that regularly accessed hot data can be held on faster storage, typically SSDs, while erasure-coded cold data sits below on cheaper storage media.
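To make the hardware savings concrete, here is a rough back-of-the-envelope sketch in Python. The figures are illustrative assumptions, not numbers from Red Hat or Inktank: three-way replication is compared against an erasure-coded layout with k = 4 data chunks and m = 2 coding chunks, and the function names are made up for this example. Both layouts can survive the loss of two copies or chunks.

```python
def raw_capacity_replicated(usable_tb: float, copies: int = 3) -> float:
    """Raw disk needed when every object is stored `copies` times."""
    return usable_tb * copies


def raw_capacity_erasure_coded(usable_tb: float, k: int, m: int) -> float:
    """Raw disk needed when each object is split into k data chunks
    plus m coding chunks, so overhead is (k + m) / k."""
    return usable_tb * (k + m) / k


usable = 100.0  # TB of data to protect (assumed figure)
replicated = raw_capacity_replicated(usable, copies=3)
erasure = raw_capacity_erasure_coded(usable, k=4, m=2)

print(f"3x replication:      {replicated:.0f} TB raw")
print(f"erasure coding 4+2:  {erasure:.0f} TB raw")
```

Under these assumptions the erasure-coded pool needs half the raw disk of the replicated one for the same usable data, at the cost of extra computation when reading and rebuilding chunks.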
Used together, erasure coding and cache tiering allow companies to combine the value of storing relatively cold, unused data in large quantities, with faster performance – all in the same cluster, said Ross Turk, director of product marketing for storage and big data at Red Hat.
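The hot and cold split can be pictured with a toy Python model. This is a simplified sketch, not how Ceph implements cache tiering: the `TieredStore` class, its least-recently-used eviction rule, and the slot count are all assumptions made for illustration. Writes and reads land in a small fast tier, and the least recently used objects are flushed down to the large, cheap tier.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small fast tier in front of a large cold tier."""

    def __init__(self, cache_slots: int):
        self.cache_slots = cache_slots
        self.hot = OrderedDict()  # fast tier, oldest access first
        self.cold = {}            # cheap base tier

    def _admit(self, key, value):
        # Place the object in the fast tier, then flush the least
        # recently used objects down until the tier fits again.
        self.hot[key] = value
        self.hot.move_to_end(key)
        while len(self.hot) > self.cache_slots:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

    def write(self, key, value):
        self.cold.pop(key, None)  # drop any stale copy in the cold tier
        self._admit(key, value)

    def read(self, key):
        if key in self.hot:
            self.hot.move_to_end(key)  # mark as recently used
            return self.hot[key]
        value = self.cold.pop(key)     # raises KeyError if unknown
        self._admit(key, value)        # promote cold data on access
        return value


store = TieredStore(cache_slots=2)
for name in ("a", "b", "c"):
    store.write(name, name.upper())
store.read("a")  # "a" was flushed to the cold tier; reading promotes it

print("hot tier: ", sorted(store.hot))
print("cold tier:", sorted(store.cold))
```

In this model, frequently touched objects stay on the fast tier while everything else settles onto cheaper media, which is the behavior described above for SSD-backed cache pools sitting over erasure-coded storage.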
The two features are useful both in a cloud platform context and in standalone storage for companies that want the scale-out capabilities the cloud has to offer but aren't entirely ready to move to the cloud.
“In theory it's great to have elastic resources and move it all to the cloud, but training organizations to adapt to that new paradigm and have their own ops teams able to run it, takes time,” Weil said.

Appealing to big data users

OpenStack was a good first use case for Ceph to target because developers and system administrators on those projects understand distributed software, Weil said. Similarly, a greenfield private cloud deployment is a good use case for Ceph because it's easy to stand up a new storage system at the same time “rather than attack legacy use cases head on,” he said.
But enterprise private and hybrid cloud adoption still lags behind public cloud use, according to two recent reports by IDC and Technology Business Research. One reason is that most companies lack the internal IT resources and expertise to move a significant portion of their resources to the cloud, according to a March 2014 enterprise cloud adoption study by Everest Group.
Storage faces an even longer road to adoption than the cloud, given the high standards and premium that companies place on retaining data and keeping it secure.
“People require their storage to be a certain level of quality and stability – you can reboot a server but not a broken disk and get your data back,” Turk said.
By providing an economic advantage to users in the growing cold storage market, Ceph has the added benefit of encouraging enterprise adoption of open source storage in the short term without relying on cloud adoption to fuel it.

The path to the open source data center

Over the long term, cloud computing and the software-defined data center – including storage, compute, and networking – will become the new paradigm for the enterprise, Weil said. And Ceph, already a dominant open source project in this space, will rise along with it.
“A couple of decades ago you had a huge transformation with Linux going from proprietary Unix OSes sold in conjunction with expensive hardware to what we have today in which you can run Linux or BSD or whatever on a huge range of hardware,” Weil said. “I think you'll see the same thing happen in storage, but that battle is just starting to happen.”
Red Hat's acquisition of Inktank will help shepherd Ceph along that path to widespread enterprise adoption, starting with this first Ceph Enterprise release. Ceph will also eventually integrate with many of the other projects Red Hat is involved with, Weil said, including the Linux kernel, provisioning tools, and OpenStack itself.
