Let us talk about the DATA ECOSYSTEM!

It all starts with how well defined your data is, i.e. whether it is structured, semi-structured, or unstructured.

Structured

Data that follows a rigid format and can be organized neatly into rows and columns, e.g. spreadsheets and databases.

Semi-structured

It's a mix of data that has consistent characteristics and data that doesn't conform to a rigid structure, e.g. emails, where fields such as sender and recipient are consistent but the message content is free-form.

Unstructured

This is the complex, mostly qualitative information that cannot easily be reduced to rows and columns, e.g. photos, videos, text files, PDFs, and social media content.
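
To make the three categories concrete, here is a minimal Python sketch. The sample records, field names, and email addresses are made up for illustration; the point is only how each kind of data behaves when you try to parse it.

```python
import csv
import io
import json

# Structured: every record has the same columns, so it parses cleanly into rows.
structured = "id,name,amount\n1,Ada,10.5\n2,Linus,7.0\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: records describe themselves with keys,
# but the set of fields can differ from one record to the next.
semi_structured = json.dumps([
    {"from": "a@example.com", "subject": "Hi", "body": "Free text here"},
    {"from": "b@example.com", "body": "No subject", "attachments": 2},
])
emails = json.loads(semi_structured)
field_sets = [sorted(e.keys()) for e in emails]

# Unstructured: just bytes with no schema to parse against
# (here, the first bytes of a hypothetical image file).
unstructured = b"\x89PNG\r\n\x1a\n"

print(rows[0]["name"])   # same columns on every row
print(field_sets)        # fields vary between emails
print(len(unstructured)) # all we know without extra tooling is the size
```

The structured rows all share one header, the emails share some keys but not others, and the image bytes carry no fields at all until some other tool interprets them.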


Data Repositories

Now let's talk about data repositories. The type, format, and sources of your data influence the type of repository used to collect, store, clean, analyze, and mine it. Overall, data repositories help to isolate data and make reporting and analytics more efficient and credible, while also serving as a data archive.

Every industry needs to process data, but the kind of data, its scope, and its use will determine whether a data mart, data warehouse, database, or data lake is the best solution for your enterprise.


Data Warehouse

It's the core data analysis tool in an organization. It retrieves data and information from various sources within the organization, then stores and manages them. Business decisions based on data reports and analysis depend on the data in the data warehouse! It stores large quantities of historical data and allows complex data retrieval.

Data Mart

It's the summarized data derived from the data warehouse, offering subject-oriented data that benefits a specific area within an organization (dedicated to one business function, one subset of the data).
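
As a rough illustration of how a data mart is derived from a warehouse, the sketch below uses Python's built-in `sqlite3` as a stand-in for a real warehouse. The table names (`warehouse_sales`, `marketing_mart`), columns, and figures are all hypothetical; a real setup would use a dedicated warehouse product and scheduled loads.

```python
import sqlite3

# In-memory database standing in for the organization's warehouse.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(region TEXT, department TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("EU", "marketing", "2023-01-05", 100.0),
        ("EU", "marketing", "2023-01-09", 50.0),
        ("US", "marketing", "2023-01-07", 200.0),
        ("US", "finance", "2023-01-07", 999.0),
    ],
)

# The "mart": a summarized, subject-oriented subset for one business function.
conn.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT region, SUM(amount) AS total_amount, COUNT(*) AS n_sales
    FROM warehouse_sales
    WHERE department = 'marketing'
    GROUP BY region
    """
)

mart_rows = conn.execute(
    "SELECT region, total_amount, n_sales FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

The marketing team queries only `marketing_mart`, which is smaller and already summarized, while the detailed history stays in the warehouse table.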

Data Lake

It holds the organization's raw and processed data at both large and small scales. Compared to a hierarchical data warehouse, which stores data in files or folders, a data lake takes a different approach: it uses a flat architecture to store the data.
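
The flat-versus-hierarchical contrast can be sketched in a few lines of Python. This is only a toy model under stated assumptions: plain dictionaries stand in for storage, and the keys, tags, and helper names (`read_hierarchical`, `find_by_tag`) are invented for the example rather than taken from any particular product.

```python
from pathlib import PurePosixPath

# Hierarchical: data sits inside nested folders and is reached level by level.
tree = {"sales": {"2023": {"jan.csv": b"id,amount\n1,10\n"}}}

def read_hierarchical(tree, path):
    """Walk the folder tree one path component at a time."""
    node = tree
    for part in PurePosixPath(path).parts:
        node = node[part]
    return node

# Flat: a single namespace where each object has a unique key plus metadata tags.
# The slashes in the keys are just characters, not real folders.
lake = {
    "sales/2023/jan.csv": {"data": b"id,amount\n1,10\n", "tags": {"stage": "raw"}},
    "images/cat.png": {"data": b"\x89PNG", "tags": {"stage": "raw"}},
    "sales/2023/jan_summary.json": {"data": b'{"total": 10}', "tags": {"stage": "processed"}},
}

def find_by_tag(lake, key, value):
    """Look up objects by metadata instead of by location."""
    return sorted(k for k, obj in lake.items() if obj["tags"].get(key) == value)

print(read_hierarchical(tree, "sales/2023/jan.csv"))
print(find_by_tag(lake, "stage", "raw"))
```

In the flat model, raw and processed objects of any format live side by side, and you find them through their keys and tags rather than by walking a folder tree.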

Big Data Stores

This includes distributed computational and storage infrastructure to store, scale, and process very large data sets.