TWIL: October 30, 2022

This week I’m highlighting a really interesting podcast on Azure IP services, a website with technology-focused decision trees, a set of articles on Bot Framework, and three articles on Lakehouse architecture, including two about data formats used in data lakes. Finally, if you’re into Machine Learning, take a look at Hugging Face.


Podcasts

The Azure Podcast

Episode 442: Azure IP Services
The team catches up with Brian Lehr to take a look at the range of IP services available in Azure. Also, learn about “cold potato” routing and “hot potato” routing.
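
If the two routing terms are new to you, here is a minimal sketch of the general idea, not of how Azure implements it. Hot potato routing hands traffic to the next network at the nearest exit, while cold potato routing keeps traffic on your own network until the exit closest to the destination. The `ExitPoint` class, its hop counts, and the exit names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitPoint:
    """A hypothetical point where traffic can leave our network."""
    name: str
    hops_to_exit: int     # hops travelled inside our own network (made-up value)
    hops_after_exit: int  # hops left to the destination after handoff (made-up value)


def hot_potato(exits):
    # Hand the traffic off as early as possible:
    # pick the exit that is closest to the source inside our network.
    return min(exits, key=lambda e: e.hops_to_exit)


def cold_potato(exits):
    # Carry the traffic on our own network as long as possible:
    # pick the exit that is closest to the destination.
    return min(exits, key=lambda e: e.hops_after_exit)


exits = [
    ExitPoint("near-source", hops_to_exit=1, hops_after_exit=9),
    ExitPoint("near-destination", hops_to_exit=8, hops_after_exit=1),
]

print("hot potato exit: ", hot_potato(exits).name)
print("cold potato exit:", cold_potato(exits).name)
```

The trade-off the sketch hints at: hot potato minimizes the work done on your own network, while cold potato keeps more of the path under your own control.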

Episode 443: 5G Modern Connected Apps
The team talks to Cesar De la Torre Llorente, a Principal Architect in the Azure team, about the new style of applications that leverage the 5G wireless network and are managed by Azure. This enables new scenarios for running apps on factory floors, in smart buildings, stadiums, etc., and opens up new microservice-style architectures that are all controlled via Azure.


Architecture

Technology-focused Decision Trees – Azure Synapse and Azure Databricks
Selecting a technology for a specific scenario has become extremely hard with all the options available. This website comes to the rescue by providing a set of decision trees to help you choose between services that operate in the same space. Really worth a look!


Bot Framework

Introduction to Bot Framework Composer
Bot Framework Composer, built on the Bot Framework SDK, is an open-source IDE for developers to author, test, provision, and manage conversational experiences. It provides a powerful visual authoring canvas in which dialogs, language-understanding models, QnAMaker knowledge bases, and language generation responses can all be authored in one place and, crucially, it enables these experiences to be extended with code for more complex tasks such as system integration. Resulting experiences can then be tested within Composer and provisioned into Azure along with any dependent resources.

Set up continuous integration and continuous delivery for a Composer bot
You can set up continuous integration and continuous delivery (CI/CD) to deploy new versions of Bot Framework Composer bots. This article describes how to do so using Composer, Azure DevOps, and git.


Data Lake

Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics
This paper argues that the data warehouse architecture as we know it today will wither in the coming years and be replaced by a new architectural pattern, the Lakehouse, which will (i) be based on open direct-access data formats, such as Apache Parquet, (ii) have first class support for machine learning and data science, and (iii) offer state-of-the-art performance. Lakehouses can help address several major challenges with data warehouses, including data staleness, reliability, total cost of ownership, data lock-in, and limited use-case support. We discuss how the industry is already moving toward Lakehouses and how this shift may affect work in data management. We also report results from a Lakehouse system using Parquet that is competitive with popular cloud data warehouses on TPC-DS.

Hudi vs Delta vs Iceberg Lakehouse Feature Comparisons
With the growing popularity of the lakehouse, there has been rising interest in the analysis and comparison of the open-source projects at the core of this data architecture: Apache Hudi, Delta Lake, and Apache Iceberg. This article dives deeper to highlight the technical differentiators of Apache Hudi and how it is a full-fledged data lake platform steps ahead of the rest.

Delta vs Iceberg vs hudi : Reassessing Performance
After comparing Delta vs Iceberg in our previous blog, a lot of people asked us to benchmark their latest versions and to throw Apache Hudi into the mix. So, by popular demand, we did exactly that and ran TPC-DS on Delta 1.2.0, Iceberg 0.13.1, and Hudi 0.11.1 using Apache Spark 3.2.0.


Azure Databricks

Azure Databricks: Enable Azure Private Link
This article explains how to use Azure Private Link to enable private connectivity between users and their Databricks workspaces, and also between clusters on the data plane and the core services on the control plane within the Databricks workspace infrastructure.


Azure Synapse Analytics

Synapse Espresso: Introduction to Dedicated SQL Pools
Welcome to the 12th video of the Synapse Espresso series! In this video, we are joined by Stijn and Liliam to learn about dedicated SQL pool and the distributed SQL database capability inside Azure Synapse Analytics.


Cool Stuff

Hugging Face
The Hugging Face Hub is a platform with over 60K models, 6K datasets, and 6K demo apps (Spaces), all open source and publicly available, in an online platform where people can easily collaborate and build ML together. The Hub works as a central place where anyone can explore, experiment, collaborate and build technology with Machine Learning. Are you ready to join the path towards open source Machine Learning?


Have a great week!

Photo by Alexander Hafemann on Unsplash