Azure DP-900 Short Notes: Explore Core Data Concepts
Identify the need for data solutions
👉Data is a collection of facts, such as numbers, descriptions, and observations, used in decision making.
👉Three types of data
- Structured data
- Tabular data that is represented by rows and columns in a database.
- Databases that hold tables in this form are called relational databases.
- Semi-structured data
- Information that doesn't reside in a relational database but still has some structure to it.
- Ex: JSON documents, key-value stores, and graph databases
- Unstructured data
- Data with no predefined structure or schema.
- Ex: Audio, video, and binary data files
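To make the three categories concrete, here is a minimal Python sketch. The customer rows, the order document, and the byte values are all made-up illustrations, not data from any real system. Structured data has a fixed set of columns, semi-structured JSON describes itself with field names (fields can be nested or simply absent), and unstructured data is just raw bytes with no schema.

```python
import json

# Structured: every row has the same columns in the same order, like a table.
customers = [
    ("C001", "Ava", "ava@example.com"),
    ("C002", "Ben", "ben@example.com"),
]

# Semi-structured: JSON carries its own field names; fields can be nested or optional.
order_json = '{"id": "O100", "customer": "C001", "items": [{"sku": "A1", "qty": 2}], "note": "gift"}'
order = json.loads(order_json)

# Unstructured: raw bytes with no schema (hypothetical placeholder bytes from an audio file).
audio_clip = bytes([0x52, 0x49, 0x46, 0x46])

print(customers[0][1])           # column position defines meaning
print(order["items"][0]["qty"])  # nested field accessed by name
print(order.get("gift_wrap"))    # optional field that this document does not have
print(len(audio_clip))           # only the size is known without a specialised decoder
```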
👉Based on the type of data, there are multiple ways to store and access data in Azure cloud.
👉Stored data needs to be processed. There are two types of data processing solutions.
- Transaction processing systems
- Transaction processing is the primary function of business computing.
- The work performed by transactional systems is often referred to as Online Transactional Processing (OLTP).
- To support fast processing, data is divided into small pieces.
- For example, in a relational database the data is split out into separate tables, each holding a group of related columns; this is called normalization.
- Analytical systems
- Support business users who need to query data and gain a big-picture view.
- Capture raw data and use it to generate insights for future business decisions.
- Common tasks of an analytical system
- Data Ingestion - Capturing the raw data.
- Data Processing - Cleaning and converting the captured data into a common format suitable for querying.
- Data Querying - Querying the processed data to analyze it.
- Data Visualization - Generating charts, such as bar charts and line charts, from the queried data.
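The four analytical tasks can be sketched as a tiny in-memory pipeline. This is only an illustration under assumed inputs: the sales log lines, the function names (`process`, `total_by_region`, `text_bar_chart`), and the text-based "chart" are hypothetical stand-ins for what real Azure ingestion, processing, and visualization services would do.

```python
from collections import defaultdict

# 1. Ingestion: raw records captured as-is (hypothetical sales log lines).
raw_lines = [
    "2024-01-05, north , 120.50",
    "2024-01-05,SOUTH,80",
    "2024-01-06,north,not-a-number",  # a bad record that processing should drop
    "2024-01-06,south,40.25",
]

def process(lines):
    """2. Processing: clean and convert raw text into a common record format."""
    records = []
    for line in lines:
        date, region, amount = (part.strip() for part in line.split(","))
        try:
            records.append({"date": date, "region": region.lower(), "amount": float(amount)})
        except ValueError:
            continue  # skip records whose amount cannot be converted to a number
    return records

def total_by_region(records):
    """3. Querying: aggregate the sales amount per region."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

def text_bar_chart(totals, scale=10):
    """4. Visualization: a crude text bar chart, one '#' per `scale` units."""
    return [
        f"{region:<6}{'#' * int(total // scale)} {total:.2f}"
        for region, total in sorted(totals.items())
    ]

totals = total_by_region(process(raw_lines))
for row in text_bar_chart(totals):
    print(row)
```

Keeping each stage as a separate function mirrors the idea that ingestion, processing, querying, and visualization are distinct steps that can be handled by different tools.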
Identify types of data and data storage
👉Relational data and non-relational data have different characteristics.
- Relational data
- The most well-understood model for holding data.
- Data normalization helps to reduce data redundancy within the database.
- Non-relational data
- Stores data in a format that more closely matches its original structure.
- Data is often duplicated, which increases the storage required.
- Because of this duplication, modifying a piece of data may require updating it in multiple locations.
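The trade-off between normalization and duplication can be shown with plain Python dictionaries. This is a hypothetical sketch, not how any particular Azure database stores data: in the document-style model each order embeds a copy of the customer's name, so a rename touches every copy, while the normalized model stores the name once and orders refer to it by key.

```python
# Denormalized (document-style): each order embeds its own copy of the customer's name.
orders_denormalized = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Ava Smith", "total": 30},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Ava Smith", "total": 45},
    {"order_id": 3, "customer_id": "C2", "customer_name": "Ben Lee", "total": 12},
]

# Normalized (relational-style): customer details stored once; orders reference them by key.
customers = {"C1": {"name": "Ava Smith"}, "C2": {"name": "Ben Lee"}}
orders = [
    {"order_id": 1, "customer_id": "C1", "total": 30},
    {"order_id": 2, "customer_id": "C1", "total": 45},
    {"order_id": 3, "customer_id": "C2", "total": 12},
]

def rename_denormalized(docs, customer_id, new_name):
    """Every embedded copy must be updated; returns how many places changed."""
    changed = 0
    for doc in docs:
        if doc["customer_id"] == customer_id:
            doc["customer_name"] = new_name
            changed += 1
    return changed

def rename_normalized(customer_table, customer_id, new_name):
    """Only one row holds the name, so a single update is enough."""
    customer_table[customer_id]["name"] = new_name
    return 1

print(rename_denormalized(orders_denormalized, "C1", "Ava Jones"))  # two copies updated
print(rename_normalized(customers, "C1", "Ava Jones"))              # one row updated

# The price of normalization: reading a name from an order needs a lookup (a join).
print(customers[orders[0]["customer_id"]]["name"])
```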
👉Two different types of workloads.
- Transactional workloads
- A transaction is a sequence of operations that is treated as a single atomic unit.
- Most commonly use relational databases.
- A transactional database must adhere to the ACID properties.
- Atomicity = A transaction is treated as a single unit, which either succeeds completely or fails completely.
- Consistency = A transaction can only take the data in the database from one valid state to another.
- Isolation = Concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
- Durability = Once a transaction has been committed, it will remain committed even if there's a system failure.
- Analytical workloads
- Typically read-only systems that store vast volumes of historical data or business metrics.
- Used for data analysis and decision making.
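Atomicity and consistency from the ACID list can be demonstrated with Python's built-in `sqlite3` module, used here only as a convenient local stand-in for a transactional database (not an Azure service). The `accounts` table, its `CHECK` rule, and the `transfer` helper are hypothetical. The transfer credits the destination first; if debiting the source would break the `CHECK (balance >= 0)` rule, the whole transaction is rolled back, so the credit disappears too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule: no negative balances
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic transaction."""
    try:
        with conn:  # commits if the block succeeds, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the earlier credit to dst was rolled back as well

def balances(conn):
    """Return current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

print(transfer(conn, "alice", "bob", 30))   # succeeds: both updates are committed
print(balances(conn))
print(transfer(conn, "alice", "bob", 500))  # fails: neither update is kept
print(balances(conn))                       # same balances as before the failed transfer
```

This only illustrates atomicity and consistency; isolation and durability depend on the database engine's concurrency control and storage, which an in-memory example cannot show.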
Describe the difference between batch and streaming data
👉Data processing is converting data into meaningful information.
👉There are two types of data processing.
- Batch Processing
- New data elements are collected into a group and the whole group is then processed at a future time as a batch.
- Data Scope = Processes all the data in the dataset.
- Data Size = Large datasets.
- Performance = Latency is typically a few hours.
- Analysis = Complex analytics.
- Streaming and real-time data
- In stream processing, each new piece of data is processed when it arrives.
- Beneficial for scenarios where new, dynamic data is generated continually.
- Ideal for time-critical operations that require an instant real-time response.
- Data Scope = Only the most recent data received, or data within a rolling time window.
- Data Size = Individual records or micro-batches of a few records.
- Performance = Latency in the order of seconds or milliseconds.
- Analysis = Simple response functions, aggregates, or calculations such as rolling averages.
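The difference in data scope can be sketched in Python. The sensor readings, the window size of 3, and the `RollingAverage` class are hypothetical. The batch function waits for the complete dataset and processes all of it at once, while the streaming version handles each value as it arrives and keeps only a small rolling window of recent values.

```python
from collections import deque

readings = [21.0, 22.0, 23.0, 30.0, 24.0, 22.0]  # hypothetical sensor values

def batch_average(dataset):
    """Batch: runs once, after the whole dataset has been collected."""
    return sum(dataset) / len(dataset)

class RollingAverage:
    """Streaming: updated per record, remembering only the last `window_size` values."""

    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)  # the oldest value drops out automatically

    def update(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)

stream = RollingAverage(window_size=3)
rolling = [stream.update(value) for value in readings]  # one result per arriving record

print(batch_average(readings))  # a single answer over all the data
print(rolling)                  # an up-to-date answer after every record
```

The streaming version never holds more than three values, which is why it can respond immediately but cannot answer questions about the full history the way the batch version can.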