What is a Cloud Data Warehouse?
A cloud data warehouse is a database service hosted online by a public cloud provider. It has the functionality of an on-premises database but is managed by a third party, can be accessed remotely, and its storage and compute power can be grown or shrunk on demand.
Traditional Vs. Cloud Data Warehouse Differences
A traditional data warehouse is an architecture for organising, storing and accessing ordered data, hosted in a data centre on premises owned by the organisation whose data is stored within it. It is of a finite size and power and is owned by that organisation.
A cloud data warehouse is a flexible volume of storage and compute power, which is part of a much bigger public cloud data centre and is accessed and managed online. Storage and compute power are merely rented. The physical location of the data is largely irrelevant, except for countries or industries whose regulations dictate that data must be stored in the same country.
Benefits of a Cloud Data Warehouse
The benefits of a Cloud Data Warehouse can be summarised in five main points:
Rather than being reachable only from inside a data centre, cloud data warehouses can be accessed remotely from anywhere. This is convenient for staff who live near the data centre, who can now troubleshoot out of hours from home or anywhere else, and it also means companies can hire staff based anywhere, which opens up talent pools that were previously unavailable. Cloud data warehousing is self-service, so its provision does not depend on the availability of specialist staff.
Data centres are expensive to buy and maintain. The property that houses them needs to be properly cooled, insured and expertly staffed, and the databases themselves come at a huge cost. Cloud data warehousing allows the same service to be enjoyed, but you only pay for the computing and storage power you need, when you need it. With elastic cloud services such as Snowflake, compute and storage can now be bought separately, in different amounts. So you really only have to pay for what you are using, and you can instantly close or downsize capabilities you do not need.
Cloud service providers compete to offer use of the most performant hardware for a fraction of the cost that would be incurred to reproduce such power on-premises. Upgrades are performed automatically, so you always have the latest capabilities and do not experience downtime in upgrading to the latest ‘version’. Some on-premises databases offer faster performance, but not at the cost and availability of the ‘Infrastructure-as-a-service’ that cloud providers offer.
Opening a cloud data warehouse is as simple as opening an account with a provider such as Microsoft Azure, AWS Redshift, Google BigQuery or Snowflake. The account can be grown and shrunk, or even closed, instantly. Users are aware of the costs involved before they change the amount of compute or storage they rent. This scalability has led to the coining of the phrase ‘Elastic Cloud’.
Hosting data in a cloud data warehouse means you can switch providers if and when changes in business strategy call for it. Staying database agnostic means you have the agility to upsize, downsize or switch completely. Metadata-driven automation software like WhereScape allows you to lift and shift entire data infrastructures on and off a cloud data warehouse if desired, and allows different teams within the same company to work with the database and hybrid cloud structure that best suits their needs, as seen at Legal & General.
Choosing a Cloud Data Warehouse Solution
A cost analysis is vital in estimating how much money a cloud data warehouse would save the business. Different cloud providers have different pricing structures that need to be borne in mind. More established providers such as Amazon and Microsoft rent nodes and clusters, so your company uses a defined section of the server. This makes pricing predictable and constant, but your particular node will sometimes need maintenance.
Snowflake and Google offer a ‘serverless’ system, which means the cluster locations and numbers are not defined and so are irrelevant. Instead, the customer is charged for exactly the amount of compute or processing power it consumes. However, in bigger companies it is often difficult to predict the number of users and the size of a process before it runs. It is possible for queries to be much bigger than was assumed and so cost much more than was expected.
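The trade-off between the two pricing models can be made concrete with a rough calculation. The following is a minimal sketch in Python, not any provider's actual pricing formula: the class names, the hourly rates and the 730-hour month are all hypothetical assumptions chosen for illustration. It compares a fixed, node-based bill with a consumption-based bill and finds the usage level at which the two cost the same.

```python
from dataclasses import dataclass

HOURS_IN_MONTH = 730.0  # assumption: average number of hours in a month


@dataclass
class NodePricing:
    """Provisioned model: a fixed number of nodes rented around the clock."""
    nodes: int
    price_per_node_hour: float  # hypothetical rate, not a real price list

    def monthly_cost(self) -> float:
        # The bill does not depend on how much work is actually run.
        return self.nodes * self.price_per_node_hour * HOURS_IN_MONTH


@dataclass
class UsagePricing:
    """Consumption model: pay only for the compute actually consumed."""
    price_per_compute_hour: float  # hypothetical rate, not a real price list

    def monthly_cost(self, compute_hours_used: float) -> float:
        # The bill grows in proportion to usage, so it is harder to predict.
        return self.price_per_compute_hour * compute_hours_used


def break_even_hours(node: NodePricing, usage: UsagePricing) -> float:
    """Compute-hours per month at which both models cost the same."""
    return node.monthly_cost() / usage.price_per_compute_hour


if __name__ == "__main__":
    provisioned = NodePricing(nodes=2, price_per_node_hour=1.0)
    on_demand = UsagePricing(price_per_compute_hour=4.0)

    print("Break-even compute-hours:", break_even_hours(provisioned, on_demand))
    for used in (100, 365, 600):
        print(used, provisioned.monthly_cost(), on_demand.monthly_cost(used))
```

With these made-up rates, the consumption model is cheaper below the break-even point and more expensive above it, which is why an unexpectedly large query or a surge in users matters more under usage-based billing. A real analysis would substitute each provider's published rates and the company's own workload estimates.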
Each cloud provider has its own suite of supporting tools for functions such as data management, visualisation and predictive analytics, so these needs should be factored in when deciding which provider to use.