Centralise Data Storage Using Azure Data Lake

Businesses are producing an ever-increasing volume of data across a multitude of systems and devices.

A data lake store provides a flexible, centralised, cloud-based location for bringing data together in low-cost storage.

As existing data volumes grow and the list of potential data sources expands, data lakes scale quickly with business need to accommodate all types of structured and unstructured data. Having data readily available in the cloud removes the first hurdle teams face on a data project: access to the data!

Reasons why we build data lakes

It can be a struggle working out how to bring together disparate data sprawled across a company’s wide-reaching digital footprint. A data lake takes that concern away with centralised data, stored in its native format.

Data lakes provide great flexibility for developers, analysts, data analytics teams and data scientists to build data-driven projects from a consistent foundation. By pushing schema considerations onto the consuming service (schema-on-read), we can build and maintain data lakes with relatively low effort and keep the environment agile enough to accommodate future growth and change.
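To make the idea of pushing the schema onto the consuming service more concrete, here is a minimal Python sketch. The sample records, the field names (`device`, `temp_c`) and the `read_temperatures` function are hypothetical and exist only for illustration. The raw records are stored untouched, and the consumer decides at read time which fields it needs and how to type them.

```python
import json

# Raw events kept exactly as the (hypothetical) source systems emitted them.
RAW_EVENTS = [
    '{"device": "sensor-1", "temp_c": "21.5", "site": "Reading"}',
    '{"device": "sensor-2", "temp_c": "19.0"}',
    '{"device": "sensor-3", "humidity": 40}',
]


def read_temperatures(raw_lines):
    """Apply a consumer-specific schema at read time: device (str), temp_c (float)."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        if "temp_c" not in record:
            continue  # this consumer skips records it has no use for
        rows.append({"device": str(record["device"]), "temp_c": float(record["temp_c"])})
    return rows


temperatures = read_temperatures(RAW_EVENTS)
print(temperatures)
```

A different consumer, for example one interested in humidity, could read the same raw records with its own schema without any change to what is stored.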

Typical Data Lake Questions

We are often asked the same questions at the start of a data lake project. Here are some of the common issues concerning decision makers:

Q. Should we only be loading the data we need?

A. The beauty of a data lake is that we load all data and let the consuming service pull what's necessary. This future-proofs the data lake.

Q. What is the difference between hierarchical and flat namespace storage?

A. Flat namespace storage, like Azure Blob Storage, keeps every object at a single level, with any folder structure existing only as part of the object name. Hierarchical namespace storage, like Azure Data Lake Storage Gen2, organises data into real directories and files, which makes analytical workloads on unstructured data more efficient and helps reduce total cost of ownership.
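The difference can be illustrated with a toy Python model. This is a sketch of the concept only, not a description of how Azure implements either service, and the paths and function names are hypothetical. In the flat model a "folder" is just a shared key prefix, so renaming it touches every object under it. In the hierarchical model the directory is a node in its own right, so the rename is a single operation on its parent.

```python
# Flat namespace: every object is a single key; "folders" are only a naming convention.
flat_store = {
    "raw/sales/2024/jan.csv": b"placeholder",
    "raw/sales/2024/feb.csv": b"placeholder",
    "raw/web/clicks.json": b"placeholder",
}


def rename_prefix_flat(store, old_prefix, new_prefix):
    """Renaming a 'folder' means rewriting every key under the prefix."""
    touched = 0
    for key in list(store):
        if key.startswith(old_prefix):
            store[new_prefix + key[len(old_prefix):]] = store.pop(key)
            touched += 1
    return touched


# Hierarchical namespace: directories are real nodes that contain their children.
tree_store = {
    "raw": {
        "sales": {"2024": {"jan.csv": b"placeholder", "feb.csv": b"placeholder"}},
        "web": {"clicks.json": b"placeholder"},
    }
}


def rename_dir_tree(store, parent_path, old_name, new_name):
    """Renaming a directory is one operation on the parent node."""
    node = store
    for part in parent_path:
        node = node[part]
    node[new_name] = node.pop(old_name)
    return 1


flat_ops = rename_prefix_flat(flat_store, "raw/sales/", "raw/orders/")
tree_ops = rename_dir_tree(tree_store, ["raw"], "sales", "orders")
print(flat_ops, tree_ops)
```

In this toy example the flat rename rewrites two keys while the hierarchical rename performs one operation; the gap widens as the number of files under the directory grows, which is why directory-heavy analytical jobs tend to favour a hierarchical namespace.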

Q. Where do I start with a data lake?

A. Start with data that has downstream usage, then grow outwards with related data sets.

Q. Are the types of data or frequency a limiting factor to a data lake?

A. No. A data lake often stores a mixture of structured, semi-structured and unstructured data. Data can be loaded in real time (such as streaming from IoT devices), in near real time or in batches.

Q. Is data lake storage expensive?

A. Microsoft Azure Data Lake Storage Gen2 is priced for storing data at scale, making it a cost-effective way to store big data.

How we Create Data Lakes

Data lakes will typically bring together a combination of streaming data, logs, social media, on-premises data, cloud-based data, partner data and competitor data into a central low-cost cloud storage environment. By bringing all data together in raw format, we do not limit the data we hold to today's modelled needs, but also cater for future, as yet unknown, data projects.

Data ingestion is automated so that new data arrives consistently, whether in real time, near real time or in batches, on Azure Data Lake Storage Gen2, which provides a highly scalable, highly durable repository. Cataloguing data at the time of insert and documenting data lineage helps ensure corporate knowledge is shared and a data swamp is avoided.
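As a rough illustration of cataloguing at the time of insert, the Python sketch below lands a raw file and records a catalogue entry in the same step. It uses a local temporary directory as a stand-in for the lake, and the `ingest` function, the `raw/<source>/<year>/<month>/<day>` folder layout and the catalogue fields are all assumptions made for this example rather than a prescribed design.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def ingest(lake_root, source, filename, payload, catalogue):
    """Land raw bytes under a source/date folder and record a catalogue entry."""
    loaded_at = datetime.now(timezone.utc)
    target_dir = Path(lake_root) / "raw" / source / loaded_at.strftime("%Y/%m/%d")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_bytes(payload)  # stored in native format, unmodified
    entry = {
        "path": target.relative_to(lake_root).as_posix(),
        "source": source,  # lineage: where the data came from
        "loaded_at": loaded_at.isoformat(),
        "sha256": hashlib.sha256(payload).hexdigest(),
        "size_bytes": len(payload),
    }
    catalogue.append(entry)
    return entry


catalogue = []
with tempfile.TemporaryDirectory() as lake_root:
    entry = ingest(lake_root, "crm", "customers.csv", b"id,name\n1,Ada\n", catalogue)
    landed = (Path(lake_root) / entry["path"]).read_bytes()

print(json.dumps(catalogue, indent=2))
```

Because the entry is written in the same step as the file, nothing reaches the lake without a record of its source, load time and checksum, which is the habit that keeps a lake from turning into a swamp.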

If you have data lake questions we'd love to help you. You can reach us on +44 (0)118 3575588, [email protected] or fill out the form below, and we'll be in touch shortly.
