TWIL: September 3, 2023
This week my focus has been mostly on Power BI and Microsoft Fabric, but I also had to delve a bit into Azure Cosmos DB geospatial support and performance tuning, as well as the Azure Cognitive Search guidance on chunking large documents. The latest episode of the Building the Future AI Portugal Podcast was published, in which I had the honor of being the invited guest and had an awesome conversation with the three hosts. Finally, there are three interesting episodes from the Azure Podcast and a bunch of news from the AI world. Enjoy!
Podcasts
Building the Future AI Portugal Podcast
Portuguese podcast born out of the annual Building the Future event, where technologies, ideas and initiatives that transform our world are discussed, with a particular focus on Artificial Intelligence. The episodes of this podcast are spoken in Portuguese, as are their descriptions.
Arquiteturas avançadas de Large Language Models (in Portuguese)
In this episode, the three hosts talk with André Vala, a cloud solutions architect in Data and AI at Microsoft Portugal, about advanced language model architectures. They explore how these models can generate and understand text naturally and the impact they have across many areas and sectors. They also discuss the ethical and social implications of these technologies and how they are shaping the future of artificial intelligence.
The Azure Podcast
Episode 466: Open AI
Cloud Solution Architect Dr. Linda Sheard joins Russell and Sujit for a primer on Azure OpenAI. Linda breaks down the service and provides context on how it can be used for the typical scenario where a customer wants to reason over their own data.
Episode 469: Microsoft Fabric
Azure and Data technical specialist Ian Pike joins us on the Podcast to give us a primer on Fabric and what it means for customers that use various data-related services on Azure.
Episode 471: AI Trends in Financial Services
We talk to Steve Selleny, a Global Partner Development Manager at Microsoft, about the AI and ML trends he has noticed in Financial Services ISVs. Steve has been working in the technology sector for over 21 years in various roles including software development, systems architecture, consulting, sales, and partner management. For the past two years, he has been working with Financial Services Independent Software Vendors (ISVs) to build and market solutions on the Microsoft Cloud that drive digital transformation for customers.
Azure Cognitive Search
Chunking large documents for vector search solutions in Cognitive Search
This article describes several approaches for chunking large documents so that you can generate embeddings for vector search. Chunking is only required if source documents are too large for the maximum input size imposed by models.
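As a rough illustration of the idea (not taken from the article), here is a minimal Python sketch of one simple approach: fixed-size chunks with some overlap between neighbours, so that a sentence cut at a boundary still appears whole in one of the chunks. The function name chunk_text and its parameters are made up for this example, and size is measured in characters only to keep things simple; embedding models actually limit their input by tokens.

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Consecutive chunks share `overlap` characters. Both parameters are
    illustrative assumptions, not values recommended by any service.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in the range [0, max_chars)")

    step = max_chars - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks
```

Each chunk would then be sent to the embedding model separately, with the overlap acting as a cheap way to preserve context across chunk boundaries.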
Azure Cosmos DB
Geospatial and GeoJSON location data in Azure Cosmos DB for NoSQL
Azure Cosmos DB for NoSQL has built-in geospatial functionality to represent geometric shapes or actual locations/polygons on a map. Geospatial data often involves proximity queries. For example, the question “find all retail locations near my current location” is answered using a proximity query over multiple geospatial data objects.
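To make the notion of a proximity query concrete, here is a small Python sketch that answers the "retail locations near me" question locally, over an in-memory list. It only illustrates the concept and says nothing about how Azure Cosmos DB evaluates such queries. The helper names, the location property and the sample stores are all made up; the one real convention used is that GeoJSON points list longitude first, then latitude.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def point(lon: float, lat: float) -> dict:
    """Build a GeoJSON Point (note the longitude, latitude order)."""
    return {"type": "Point", "coordinates": [lon, lat]}


def haversine_m(a: dict, b: dict) -> float:
    """Great-circle distance in metres between two GeoJSON Points."""
    lon1, lat1 = map(math.radians, a["coordinates"])
    lon2, lat2 = map(math.radians, b["coordinates"])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def locations_near(items: list[dict], here: dict, radius_m: float) -> list[str]:
    """Return the ids of items whose location lies within radius_m of here."""
    return [item["id"] for item in items
            if haversine_m(item["location"], here) <= radius_m]


# Made-up sample data: one store about 1.1 km away, one about 111 km away.
stores = [
    {"id": "store-a", "location": point(0.01, 0.0)},
    {"id": "store-b", "location": point(1.0, 0.0)},
]
nearby = locations_near(stores, point(0.0, 0.0), radius_m=5_000)
```

A database with geospatial support does the same filtering on the server, ideally with the help of a spatial index so it does not have to compute the distance to every item.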
Query performance tips for Azure Cosmos DB SDKs
Azure Cosmos DB is a fast, flexible distributed database that scales seamlessly with guaranteed latency and throughput levels. You don’t have to make major architecture changes or write complex code to scale your database with Azure Cosmos DB. Scaling up and down is as easy as making a single API call.
Tuning query performance with Azure Cosmos DB
Azure Cosmos DB provides an API for NoSQL for querying data, without requiring schema or secondary indexes. This article provides the following information for developers: high-level details on how Azure Cosmos DB’s SQL query execution works, details on query request and response headers and client SDK options, tips and best practices for query performance, and examples of how to utilize SQL execution statistics to debug query performance.
Indexing policies in Azure Cosmos DB
In Azure Cosmos DB, every container has an indexing policy that dictates how the container’s items should be indexed. The default indexing policy for newly created containers indexes every property of every item and enforces range indexes for any string or number. This allows you to get good query performance without having to think about indexing and index management upfront. In some situations, you may want to override this automatic behavior to better suit your requirements. You can customize a container’s indexing policy by setting its indexing mode, and include or exclude property paths.
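As an example of overriding the default, the sketch below builds an indexing policy document that indexes only a couple of properties and excludes everything else, which is a common way to reduce write cost when only a few fields are ever queried. The helper function and the property names name and category are hypothetical; the JSON shape (indexingMode, includedPaths, excludedPaths, and the /* and /? path suffixes) follows the indexing policy format as I understand it, so double-check it against the documentation before using it.

```python
import json


def build_indexing_policy(included_properties: list[str]) -> dict:
    """Index only the listed top-level properties and exclude the rest.

    The root path /* has to appear in either the included or the excluded
    paths; here it is excluded, and each chosen property is opted back in.
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/name/?" targets the scalar value stored at /name
        "includedPaths": [{"path": f"/{name}/?"} for name in included_properties],
        "excludedPaths": [{"path": "/*"}],
    }


# Hypothetical property names, used only for illustration.
policy = build_indexing_policy(["name", "category"])
print(json.dumps(policy, indent=2))
```

The resulting dictionary is the kind of document you would supply as the container's indexing policy through the portal, an SDK or an infrastructure template.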
Index and query GeoJSON location data in Azure Cosmos DB for NoSQL
Geospatial data in Azure Cosmos DB for NoSQL allows you to store location information and perform common queries, including but not limited to: finding if a location is within a defined area, measuring the distance between two locations, determining if a path intersects with a location or area. This guide walks through the process of creating geospatial data, indexing the data, and then querying the data in a container.
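The three steps mentioned above (creating, indexing and querying) can be sketched in Python as plain data structures, without calling any SDK. Everything here is an assumption for illustration: the location property name, the helper functions and the approximate Lisbon coordinates. ST_DISTANCE is a real function in the NoSQL query language, but passing the GeoJSON point as a query parameter is something I assume works; the point can also be written inline in the query text.

```python
def make_item(item_id: str, lon: float, lat: float) -> dict:
    """Step 1: an item carrying a GeoJSON Point (longitude first)."""
    return {
        "id": item_id,
        "location": {"type": "Point", "coordinates": [lon, lat]},
    }


# Step 2: a fragment to merge into the container's indexing policy so
# that the hypothetical /location property gets a spatial index.
SPATIAL_INDEX_FRAGMENT = {
    "spatialIndexes": [
        {"path": "/location/*", "types": ["Point", "Polygon"]},
    ]
}


def within_distance_query(lon: float, lat: float, radius_m: float) -> dict:
    """Step 3: query text plus parameters for items within radius_m metres."""
    return {
        "query": "SELECT c.id FROM c "
                 "WHERE ST_DISTANCE(c.location, @here) < @radius",
        "parameters": [
            {"name": "@here",
             "value": {"type": "Point", "coordinates": [lon, lat]}},
            {"name": "@radius", "value": radius_m},
        ],
    }


# Approximate coordinates for Lisbon, used as a made-up example.
item = make_item("office-lisbon", -9.1393, 38.7223)
spec = within_distance_query(-9.14, 38.72, radius_m=5_000)
```

The query text and parameter list in spec have the general shape that the SDKs expect for parameterized queries, so they could be handed to a query call once a container client is available.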
Microsoft Fabric
Microsoft Fabric July 2023 Update
Welcome to the July 2023 update. We have features in Core, Synapse, Data Factory, Data Activator, Community, and Power BI.
Tutorial: Fabric for Power BI users
In this tutorial, you learn how to use Dataflows Gen2 and Pipelines to ingest data into a Lakehouse and create a dimensional model. You also learn how to generate a beautiful report automatically to display the latest sales figures from start to finish in just 45 minutes.
Lakehouse end-to-end scenario: overview and architecture
This tutorial walks you through an end-to-end scenario from data acquisition to data consumption. It helps you build a basic understanding of Fabric, including the different experiences and how they integrate, as well as the professional and citizen developer experiences that come with working on this platform. This tutorial isn’t intended to be a reference architecture, an exhaustive list of features and functionality, or a recommendation of specific best practices.
Real-Time Analytics Tutorial - Introduction
This tutorial is based on sample streaming data called New York Yellow Taxi trip data. The dataset contains trip records of New York’s yellow taxis, with fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts. This data doesn’t contain latitude and longitude data, which will be loaded from a blob container and joined together with the streaming data in a later step. You’ll use the streaming and query capabilities of Real-Time Analytics to answer key questions about the trip statistics, taxi demand in the boroughs of New York and related insights, and build Power BI reports.
Data warehouse tutorial introduction
While many concepts in Microsoft Fabric may be familiar to data and analytics professionals, it can be challenging to apply those concepts in a new environment. This tutorial has been designed to walk step-by-step through an end-to-end scenario from data acquisition to data consumption to build a basic understanding of the Microsoft Fabric user experience, the various experiences and their integration points, and the Microsoft Fabric professional and citizen developer experiences.
Power BI
Real-time streaming in Power BI
Power BI with real-time streaming helps you stream data and update dashboards in real time. Any visual or dashboard created in Power BI can display and update real-time data and visuals. The devices and sources of streaming data can be factory sensors, social media sources, service usage metrics, or many other time-sensitive data collectors or transmitters.
Connect Power BI to Azure Databricks
You can connect Power BI Desktop to your Azure Databricks clusters and Databricks SQL warehouses. You can also publish Power BI reports to the Power BI service and enable users to access the underlying Azure Databricks data using single sign-on (SSO), passing along the same Azure Active Directory credentials they use to access the report.
Generative AI
Natural language to SQL in low-code platforms
One of the developers’ biggest challenges in low-code platforms is retrieving data from a database using SQL queries. Here, we propose a pipeline allowing developers to write natural language (NL) to retrieve data. In this study, we collect, label, and validate data covering the SQL queries most often performed by OutSystems users. We use that data to train an NL model that generates SQL. Alongside this, we describe the entire pipeline, which comprises a feedback loop that allows us to quickly collect production data and use it to retrain our SQL generation model. Using crowd-sourcing, we collect 26k NL and SQL pairs and obtain an additional 1k pairs from production data. Finally, we develop a UI that allows developers to input an NL query in a prompt and receive a user-friendly representation of the resulting SQL query. We use A/B testing to compare four different models in production and observe a 240% improvement in terms of adoption of the feature, 220% in terms of engagement rate, and a 90% decrease in failure rate when compared against the first model that we put into production, showcasing the effectiveness of our pipeline in continuously improving our feature.
Google DeepMind has launched a watermarking tool for AI-generated images
Google DeepMind has launched a new watermarking tool that labels whether images have been generated with AI. The tool, called SynthID, will initially be available only to users of Google’s AI image generator Imagen, which is hosted on Google Cloud’s machine learning platform Vertex. Users will be able to generate images using Imagen and then choose whether to add a watermark or not. The hope is that it could help people tell when AI-generated content is being passed off as real, or help protect copyright.
Teaching with AI
We’re sharing a few stories of how educators are using ChatGPT to accelerate student learning and some prompts to help educators get started with the tool. In addition to the examples below, our new FAQ contains additional resources from leading education organizations on how to teach with and about AI, examples of new AI-powered education tools, and answers to frequently asked questions from educators about things like how ChatGPT works, its limitations, the efficacy of AI detectors, and bias.
What’s the future of generative AI? An early view in 15 charts
Since the release of ChatGPT in November 2022, it’s been all over the headlines, and businesses are racing to capture its value. Within the technology’s first few months, McKinsey research found that generative AI (gen AI) features stand to add up to $4.4 trillion to the global economy—annually.
Cool Stuff
Animated Drawings
Children’s drawings have a wonderful inventiveness, energy, and variety. We focus on the consequence of all that variety in their drawings of human figures as we develop an algorithm to bring them to life through automatic animation. The “Animated Drawings” Demo allows parents and guardians to convert two-dimensional children’s drawings into fun animations.
Have a brilliant week!