Microsoft Fabric: Inspecting 28 MILLION row dataset in Bronze Lakehouse - Part 2
Jul 24, 2023 6:02 AM


by HubSite 365 about endjin

Data Analytics, Microsoft Fabric, Learning Selection

Microsoft Fabric End to End Demo - Part 2 - Planning and Architecting a Data Project. For a data platform, we need some #data! In this series we're going to be

This is Part 2 of an end-to-end Microsoft Fabric demonstration, and it focuses on planning and architecting a data project. It inspects a 28 million row dataset that will be ingested into a Bronze Lakehouse. The dataset is provided by the UK government and relates to the Land Registry. It is around 5GB in size and is provided as several types of files, intended for either complete or incremental processing.

  • Delta Lake enables UPSERT-like functionality in Microsoft Fabric, so the full dataset does not have to be reloaded each time new data arrives.
  • The demonstration looks at where the data comes from and what format it is in, then sets out the insights the series aims to draw from the analysis.
  • Finally, a sample architecture diagram shows the data platforms involved at a high level.

More About Microsoft Fabric

Microsoft Fabric provides a scalable and reliable way to manage and analyze large datasets. Here it is used to process a dataset of around 5GB related to land registration. Delta Lake's UPSERT functionality updates existing rows and inserts new ones, so new data can be incorporated without reloading the whole dataset, which saves time and resources. A sample architecture diagram then shows, at a high level, how the data platforms involved fit together.
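To make the UPSERT idea concrete, here is a minimal sketch of upsert semantics in plain Python. It is not the Delta Lake API (Delta Lake expresses this kind of operation as a MERGE) and it is not taken from the demo. The `upsert` helper, the `id` key, and the sample rows are all hypothetical, chosen only to show that an incoming batch updates rows that already exist and inserts the ones that do not.

```python
from typing import Dict, Iterable

Row = Dict[str, str]


def upsert(table: Dict[str, Row], updates: Iterable[Row], key: str = "id") -> Dict[str, int]:
    """Merge `updates` into `table` in place, matching rows on `key`.

    Rows whose key already exists are updated; all others are inserted.
    Returns how many rows were inserted and how many were updated.
    """
    counts = {"inserted": 0, "updated": 0}
    for row in updates:
        row_key = row[key]
        if row_key in table:
            # Existing row: overwrite its fields with the incoming values.
            table[row_key] = {**table[row_key], **row}
            counts["updated"] += 1
        else:
            # New row: add it to the table.
            table[row_key] = dict(row)
            counts["inserted"] += 1
    return counts


# Hypothetical rows standing in for an already loaded table.
existing = {
    "tx-001": {"id": "tx-001", "price": "250000", "town": "LEEDS"},
    "tx-002": {"id": "tx-002", "price": "410000", "town": "YORK"},
}

# Hypothetical incremental batch: one changed row and one new row.
incremental_batch = [
    {"id": "tx-002", "price": "415000", "town": "YORK"},
    {"id": "tx-003", "price": "180000", "town": "HULL"},
]

result = upsert(existing, incremental_batch)
print(result)
print(len(existing))
```

Only the two rows in the incremental batch are touched; the untouched row is never reloaded, which is the property the summary attributes to the Delta Lake approach.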

Learn about Microsoft Fabric: Inspecting 28 MILLION row dataset in Bronze Lakehouse - Part 2

Microsoft Fabric is a powerful tool that can be used to analyze large datasets. In Part 2 of this series, we will be exploring a 28 million row dataset from the UK Land Registry. This dataset is almost 5GB in size and is provided as several types of files for complete or incremental processing. Through the use of Delta Lake, we get UPSERT-like functionality, so we do not have to load all the data each time we receive new information. We will begin by taking a quick look at the data, where it comes from, and the format it is in.
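As a rough illustration of taking a quick look at a large delimited file without loading it all into memory, the sketch below streams rows with Python's standard `csv` module, keeps only the first few, and counts the rest. It assumes the data is CSV-like; the `peek_and_count` helper, the column layout, and the sample rows are invented for the example and do not describe the actual Land Registry files or the tooling used in the video.

```python
import csv
import io
from itertools import islice
from typing import Iterable, List, Tuple


def peek_and_count(lines: Iterable[str], n: int = 3) -> Tuple[List[List[str]], int]:
    """Return the first `n` parsed rows and the total row count.

    Rows are read one at a time, so memory use stays small even when the
    input has millions of rows.
    """
    reader = csv.reader(lines)
    head = list(islice(reader, n))              # keep only the first n rows
    total = len(head) + sum(1 for _ in reader)  # stream through the remainder
    return head, total


# Invented sample standing in for a large file. For a real file, pass
# open(path, newline="") instead of the StringIO object.
sample = io.StringIO(
    '"tx-001","250000","2023-01-15","LEEDS"\n'
    '"tx-002","410000","2023-01-17","YORK"\n'
    '"tx-003","180000","2023-02-02","HULL"\n'
    '"tx-004","325000","2023-02-09","BATH"\n'
)

head, total = peek_and_count(sample, n=2)
print(head)
print(total)
```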

We will then explore the insights we are hoping to gain from the analysis. Finally, we will step through a sample architecture diagram to visualize the data platforms involved at a high level. Across the series, we will be learning about the various components of Microsoft Fabric, including how to inspect large datasets, process data with Delta Lake, and create architecture diagrams. We will also learn about the benefits of using Fabric to analyze massive datasets, such as improved efficiency and better insights.

More links about Microsoft Fabric: Inspecting 28 MILLION row dataset in Bronze Lakehouse - Part 2

Microsoft Fabric - Inspecting 28 Million row dataset
Jun 26, 2023 — In this video Ed Freeman continues the Microsoft Fabric End-to-End demo series by looking at the dataset we'll be using, and the problem ...
Howard van Rooijen's Post - Microsoft Fabric
Part 2 of our #MicrosoftFabric end-to-end demo covers inspecting a 5GB / 28 million row data set and how we're going to ingest it into the Bronze Lakehouse ...
Lakehouse end-to-end scenario: overview and architecture
Jun 22, 2023 — Microsoft Fabric is an all-in-one analytics solution for ... It uses the medallion architecture where the bronze layer has the raw data, ...
Azure Weekly Newsletter - a free weekly news round up of all ...
Finally, Ed Freeman continues his Microsoft Fabric end-to-end demo series with Inspecting 28 Million row dataset. Read the full issue. Azure Weekly Archive. If ...
endjin
Microsoft Fabric End to End Demo - Part 3 - Ingesting 5GB into a Bronze Lakehouse using Data Factory. In this video we'll see how we can quickly ingest ~5GB ...
Fabric End to End Demo Part 2: Inspecting 28 Million row ...
Jun 27, 2023 — Everything you need to know about Microsoft Fabric: news, resources, ... Fabric End to End Demo Part 2: Inspecting 28 Million row dataset.
Microsoft Fabric, Data Warehouse first impressions
Jun 23, 2023 — This meant I had a few tables ranging from a few rows (Region) to the big table (LineItems). The first thing I did was create a pipeline to get ...
Data Warehousing and Data Science
The Bronze layer contains raw data, the silver layer contains data which has ... For example: Microsoft Word, SQL Server, SQL Data Warehouse and Power BI.

Keywords

Microsoft Fabric, Bronze Lakehouse, End to End Demo, Data Platform, Delta Lake, Insight Discovery