Understanding cloud storage - Is your organization ready for it
Article by Govind Desikan
There’s a great deal of hype surrounding cloud delivery models of all flavors, with the assumption that they will usher in a new era of cheaper, better IT. But most organizations struggle to compare the capital, operational, and staffing costs of internal storage with those of cloud storage. In this post, I try to unravel the cost differences for a common workload - file storage - consumed through the public cloud versus built and deployed in-house.
Cloud Storage is workload-specific
Cloud, in general, is a way of delivering IT services, and it has its own limitations. That said, Cloud is a viable platform for specific workloads where it can deliver performance and security that are as good as or better than what can be deployed in-house. In the world of Storage, latency is a big problem because the public cloud is far away from the servers in the physical data centers. Customers should therefore do due diligence on latency-sensitive workloads before diving into the Cloud experience.
Typical Cloud Storage use-cases
- Whole in-cloud applications with their own storage
- Backup and archiving to Cloud
- File storage in the Cloud
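Customers must evaluate the underlying facilities, infrastructure and storage management applications offered by the Storage Service Provider before jumping onto the Cloud Storage bandwagon.
Evaluating Cloud-based storage
In this section, I try to provide a set of guidelines for CIOs and IT Managers on how to evaluate Cloud Storage.
- Applications that are storage-centric : These are candidates for moving to a Software-as-a-Service (SaaS) environment. In such situations, as both servers and storage are colocated in the cloud, the latency of cloud delivery is typically limited to client-to-host traffic and does not affect host-to-storage traffic. So almost all SaaS vendors tend to prefer this deployment model to limit their latency exposure. Examples include CRM-on-the-cloud and ERP-on-the-cloud.
- Less latency-sensitive data such as backup and archive data : As a backup is a secondary copy of the data and archive data is infrequently accessed, these workloads are latency-insensitive. Incremental data can be sent over the WAN throughout the day without a negative impact on applications using the primary database. It is therefore easier to archive or back up to the cloud, but restoring an entire data set over the WAN may be a challenge. Hence, IT teams are advised to agree appropriate recovery Service Levels with the business.
- Files that are generally less performance-critical : Files, also known as "unstructured data", do not depend on a particular application to make sense of their contents. Files have their own metadata, which can be read anywhere, regardless of who created them. Departmental file shares, where performance is not critical, are a viable use-case for evaluating cloud storage.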
- Applications that are storage-centric : Jumping onto a Software-as-a-Service (SaaS) environment applications. In such situations, as both servers and storage are colocated in the cloud, the latency of cloud delivery is typically limited to client-to-host traffic and not host-to-storage. So almost all SaaS vendors tend to prefer this deployment model to limit their latency exposure. For example - CRM-on-the-cloud, ERP-on-the-cloud
- Less latency sensitive data such as Backup and archive data : As backup is a secondary copy of the data and archive data are infrequently accessed, these workloads are latency insensitive. Incremental data sent over WAN can happen throughout the day without the negative impact to applications using primary database. Due to the aforesaid, it can be easier to archive/ backup on the cloud, but restoring and entire data-set back may be a challenge over the WAN. Hence, it is advised that the IT teams set appropriate Service Levels to business for recovery
- Files that are generally less performance critical : Files, aka "unstructured data" which are not needed for application to make sense of information. Files have their own metadata which can be read anywhere, regardless of who created it. Departmental file shares wherein performance is not critical makes it a viable cloud storage evaluation use-case
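To make the restore concern in the backup bullet concrete, here is a rough back-of-the-envelope sketch of how long a full restore over the WAN might take. The function name, the link speeds and the 80% link efficiency are illustrative assumptions, not figures from any provider; real restores also depend on provider throttling, protocol overhead and concurrent traffic.

```python
def restore_time_days(dataset_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the days needed to pull a data set back over a WAN link.

    efficiency is an assumed fraction of the nominal link speed that is
    actually available for the restore (hypothetical default of 80%).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    total_bits = dataset_tb * 1e12 * 8             # decimal terabytes to bits
    usable_bits_per_sec = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_sec / 86_400  # seconds to days


if __name__ == "__main__":
    # Hypothetical link sizes for a 100 TB data set
    for mbps in (100, 1000):
        print(f"{mbps} Mbps link: {restore_time_days(100, mbps):.1f} days")
```

Even under these generous assumptions a full restore of a large data set is measured in days, which is why the recovery Service Levels need to be agreed up front.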
Computing and comparing costs of in-house and Cloud Storage
Many people make the mistake of comparing the cost of buying storage with the cost of procuring public cloud storage. A simple calculation that CIOs tend to make is to estimate the $/GB of procurement, annualize it over the life expectancy of the storage being procured, and compare that with the price of cloud-based storage for a year at prevailing prices. This is not the right comparison.
This method of comparison leaves out a significant amount of cost on either side of the equation. A detailed business case needs to consider the following:
- You need more capacity than data : In an in-house environment, one has to pay for operations and redundancy, both of which are included in Cloud-based storage offerings. Typically, operational expenses for storage are estimated at 100% of the annualized cost of buying the storage. You will also need much more raw storage capacity to deliver enterprise-class reliability and performance and to provide growth buffers.
- Cloud Storage includes all, except WAN and transaction charges : A Cloud storage offering includes the operating costs and capacity redundancy, which makes measuring costs easier. Elements that are often missed in costing such an offering include:
- Impact on the network bandwidth from sending data across WAN
- Any transactional charges if usage exceeds the contracted storage capacity
- Performance capabilities of specific application requirements
- Do Cloud options have native support for data access : Cloud storage repositories are built on object repositories, which allow for massive scalability, custom metadata, geographical spread and low cost. But using object stores directly means adapting existing data, business processes or code to the vendor's Application Programming Interfaces (APIs). In such cases, the cost of this custom integration needs to be added to the base cloud price.
An accurate cost comparison requires a detailed model
The cost of storing 100 TB of data internally is much higher than the simple comparison described above suggests. Some of the key factors in determining the cost of internal storage include:
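As a small illustration of why the simple comparison understates in-house cost, the sketch below contrasts the naive annualized purchase price with one that adds the 100% operational-expense rule of thumb mentioned above. The purchase price and service life in the example are made-up numbers, and the function names are hypothetical.

```python
def naive_annual_cost(purchase_price: float, service_years: int) -> float:
    """Annualized purchase price only: the comparison many CIOs make."""
    return purchase_price / service_years


def loaded_annual_cost(purchase_price: float, service_years: int,
                       opex_ratio: float = 1.0) -> float:
    """Annualized purchase price plus operational expenses.

    opex_ratio=1.0 reflects the rule of thumb that operations cost about
    100% of the annualized purchase price.
    """
    annualized = naive_annual_cost(purchase_price, service_years)
    return annualized * (1 + opex_ratio)


if __name__ == "__main__":
    price_inr, years = 10_000_000, 4   # hypothetical purchase price and service life
    print("Naive  :", naive_annual_cost(price_inr, years))
    print("Loaded :", loaded_annual_cost(price_inr, years))
```

The operational rule of thumb alone doubles the figure, before redundancy copies, growth buffers or migration are counted.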
- Varies based on number of years of internal storage : Organizations have different policies on the service tenure of storage arrays. These vary widely, from organizations that lease storage and hence have a new storage environment almost every 3 years, to organizations that run storage until the vendor discontinues support for that model. Since we use an annualized model in our calculations for internal storage, this can have a large impact on the cost calculations.
- Acquisition cost drives a significant portion of internal storage model : Depending on the size of the storage being planned, organizations negotiate different acquisition prices with storage vendors. The type of storage (NAS/SAN) can also have a major impact on the cost calculation numbers provided below. In this calculation, the price used is assumed to include the RAID protection overhead and system resource overheads.
- Number of redundancy copies : When you send 100 TB of data to the cloud, you pay for 100 TB. However, when you store 100 TB of data internally, the storage capacity that needs to be procured is much higher. The number of redundancy copies of data can be higher or lower based on availability and other requirements, but three (3) is a reasonably conservative copy count, and that is what is used in the calculations below.
- Low utilization of storage : In a Cloud Storage model, the burden of deciding how much raw capacity to provide to meet availability needs falls on the Service Provider. With internal storage, organizations struggle with capacity management and end up with low utilization of raw storage. Typical storage environments see raw capacity utilization of 20% to 40% due to RAID protection, system resource capacity and growth buffers, resulting in much bigger allocations. Since this calculation works with usable capacity, system overhead and RAID protection are already factored in; hence utilization in this model is kept at 60%. (To address the mindset of traditional in-house storage environments :) )
- Staff costs : Staff for in-house storage environments are typically niche specialists and tend to be expensive. In contrast, the price of a Cloud storage environment includes the staff needed to deliver the committed SLAs. It might be a safe assumption to have one FTE per 50 TB of raw storage; assuming organizations require such FTEs to be available 16 hours in a working day, that makes it 2 FTEs per 50 TB.
- Facilities and Power costs : While it is difficult to ascertain the power and facility cost for the Storage environment alone, this calculation assumes a conservative value of 5% of the total acquisition cost for the power and cooling requirements of the Storage environment.
- Maintenance costs : Most storage vendors include 3 years of support and maintenance with the purchase (typically priced at 15% of the acquisition cost). Considering that the storage environment will be refreshed in 4 years, maintenance is taken as a 30% addition to the acquisition cost over the 4-year term (adjust by 15% for the year beyond the 3 years of support and maintenance included with the purchase).
- Data migration costs : Not only do you have to buy, run, maintain, and power internal storage, but you also have to migrate data from old equipment to new every few years when it’s time for a refresh. This process is onerous, risk-laden, and the bane of existence for many storage directors. When you get storage from the cloud, you get perpetual storage that doesn’t have the effort or disruption of migration associated with it. This calculation assumes migration costs of INR 25,000 per usable TB.
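The factors above can be combined into a rough annualized model. The sketch below is one possible reading of those assumptions, not the exact spreadsheet behind this article: the acquisition price per usable TB and the yearly FTE cost are hypothetical placeholders, the 5% power and facilities charge is treated as a yearly figure, and migration is charged on provisioned capacity once per refresh cycle.

```python
from dataclasses import dataclass


@dataclass
class InternalStorageAssumptions:
    data_tb: float = 100.0                 # data to be stored
    redundancy_copies: int = 3             # copy count used in the article
    utilization: float = 0.60              # share of usable capacity actually filled
    service_years: int = 4                 # refresh cycle
    maintenance_ratio: float = 0.30        # of acquisition cost, over the whole term
    power_ratio_per_year: float = 0.05     # of acquisition cost (assumed to be yearly)
    ftes_per_50_tb: float = 2.0            # staffing assumption from the article
    migration_inr_per_tb: float = 25_000.0
    # Hypothetical placeholders, replace with your own quotes and salaries:
    acquisition_inr_per_tb: float = 50_000.0
    fte_inr_per_year: float = 1_000_000.0


def provisioned_tb(a: InternalStorageAssumptions) -> float:
    """Usable capacity to buy once copies and low utilization are counted."""
    return a.data_tb * a.redundancy_copies / a.utilization


def annual_cost_breakdown(a: InternalStorageAssumptions) -> dict:
    """Annualized internal storage cost per component, in INR."""
    capacity = provisioned_tb(a)
    acquisition = capacity * a.acquisition_inr_per_tb
    return {
        "acquisition": acquisition / a.service_years,
        "maintenance": acquisition * a.maintenance_ratio / a.service_years,
        "power_and_facilities": acquisition * a.power_ratio_per_year,
        "staff": capacity / 50 * a.ftes_per_50_tb * a.fte_inr_per_year,
        "migration": capacity * a.migration_inr_per_tb / a.service_years,
    }


if __name__ == "__main__":
    assumptions = InternalStorageAssumptions()
    breakdown = annual_cost_breakdown(assumptions)
    print(f"Provisioned capacity: {provisioned_tb(assumptions):.0f} TB")
    for item, cost in breakdown.items():
        print(f"{item:>22}: INR {cost:,.0f}")
    print(f"{'total':>22}: INR {sum(breakdown.values()):,.0f}")
```

Keeping each component separate makes it easy to see which assumption dominates; with these placeholder values, staffing and the extra provisioned capacity outweigh the purchase price itself.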
Now onto the costs of internal storage environment
Calculating the cost of Cloud Storage is straightforward. The cloud vendor's published rates are the prime driver of the annualized cost in this calculation.
- Costs for writing data to Cloud : Most cloud providers charge for moving data from your data center to the Cloud. This calculation assumes that you have all the data on day one, which would be the case if you’re migrating existing data from internal storage to the cloud. Once the data is there, you wouldn’t pay for importing it again although the model just assumes this as a component of the annual charge; fortunately it’s not huge, so it doesn’t have a big impact on the end comparison. In the case of ongoing growth, cloud storage provides an advantage over internal storage in that you only pay for it when you generate the data, rather than in advance. In fact, quite a few providers like Netmagic do not charge anything for the data coming into the cloud.
- Costs for reading data from Cloud : Since Service Providers charge when data is accessed by end-users, active data tends to cost more than archival data. While it is difficult to predict how much of the data you store in the cloud will be active and how much archival, this calculation assumes 50%. One point worth highlighting: most employees won’t read half of their primary file data every month. If you read every piece of the 100 TB of data every month of the year, then it would cost you INR 99,00,000. This drops to INR 9,00,000 if you only read 10% of the data each month, so the swing is significant in relation to the total cost.
- Data redundancy costs : As the majority of Service Providers offer a very high SLA (typically 99.99%), many of them keep multiple copies of your data in order to maintain such an SLA. So it might be prudent for customers to leverage this as a redundancy mechanism, and the base published price of these providers should cover data-redundancy needs too.
- Shoring up WAN link costs : The network is a big factor when considering a move from internal storage to cloud-based storage. Moving significant quantities of data across the WAN, as opposed to within the walls of your firm, could put a strain on your links to the outside world. However, there are many reasons to increase WAN bandwidth in the era of SaaS, IT consolidation, cloud, and remote workers, so the bandwidth upgrade that a move to cloud storage calls for need not be a speed-breaker within your organization. In this calculation, I attribute INR 25,00,000 as an annual charge (a very high estimate, used for the calculation) for the additional WAN bandwidth that Cloud Storage will require.
As might be noticed from the details provided above, Cloud Storage can cost up to 42% less than an internal storage environment. Organizations should evaluate and consider Cloud Storage for file-based storage requirements, as this is a significant cost reduction that they can expect to leverage.
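The cloud-side components listed above can be sketched in the same spirit. This is only an illustration: the storage rate is a hypothetical placeholder, the ingest charge defaults to zero as some providers do not bill for inbound data, and the read rate of INR 8,250 per TB is back-calculated from the INR 99,00,000 full-read figure above on the assumption that read charges scale linearly with volume.

```python
def annual_cloud_cost(data_tb: float = 100.0,
                      storage_inr_per_tb_month: float = 5_000.0,  # hypothetical rate
                      ingest_inr_per_tb: float = 0.0,             # many providers charge nothing
                      monthly_read_fraction: float = 0.5,         # share of data read each month
                      read_inr_per_tb: float = 8_250.0,           # derived, see lead-in
                      wan_inr_per_year: float = 2_500_000.0) -> dict:
    """Annual cloud storage cost per component, in INR."""
    if not 0 <= monthly_read_fraction <= 1:
        raise ValueError("monthly_read_fraction must be between 0 and 1")
    return {
        "storage": data_tb * storage_inr_per_tb_month * 12,
        # One-time upload, counted as part of the annual charge as in the article
        "ingest": data_tb * ingest_inr_per_tb,
        "reads": data_tb * monthly_read_fraction * read_inr_per_tb * 12,
        "wan": wan_inr_per_year,
    }


if __name__ == "__main__":
    costs = annual_cloud_cost()
    for item, cost in costs.items():
        print(f"{item:>8}: INR {cost:,.0f}")
    print(f"{'total':>8}: INR {sum(costs.values()):,.0f}")
```

Varying `monthly_read_fraction` is a quick way to see how sensitive the total is to how active the data really is.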