TWIL: July 23, 2023

This was an intense and inspiring (pun intended) week. Microsoft made a host of awesome announcements, from Bing Chat Enterprise and Microsoft 365 Copilot pricing, to the availability of Llama 2 and Whisper models on Azure. There are also news around Vector Search capabilities in Azure Cognitive Search, new Document Generative AI features, and awesome new capabilities in Azure AI Speech. But that’s not all! Azure OpenAI Service now supports function calling and OpenAI has launched customized instructions for ChatGPT. Finally, more information about RetNet (Retentive Network) and LongNet, new architectures for the future of LLMs. Have fun!

Microsoft Inspire Announcements

Microsoft Reveals Monthly Cost of Microsoft 365 Copilot Licenses
Today at the Microsoft Inspire conference, the covers came off the much-anticipated price of an Microsoft 365 Copilot add-on license. If you want a digital assistant powered by artificial intelligence and large language models, you need a suitable (“eligible”) base license before shelling out an extra $30/user per month for the privilege. That’s a big chunk of change to pay for help to compose better emails, documents, and presentations.

Furthering our AI ambitions – Announcing Bing Chat Enterprise and Microsoft 365 Copilot pricing
At Microsoft Inspire, we’re excited to unveil the next steps in our journey: First, we’re significantly expanding Bing to reach new audiences with Bing Chat Enterprise, delivering AI-powered chat for work, and rolling out today in Preview – which means that more than 160 million people already have access. Second, to help commercial customers plan, we’re sharing that Microsoft 365 Copilot will be priced at $30 per user, per month for Microsoft 365 E3, E5, Business Standard and Business Premium customers, when broadly available; we’ll share more on timing in the coming months. Third, in addition to expanding to more audiences, we continue to build new value in Bing Chat and are announcing Visual Search in Chat, a powerful new way to search, now rolling out broadly in Bing Chat.

Fueling partner growth and profitability in the era of AI
Today at Microsoft Inspire we announced the new Microsoft AI Cloud Partner Program, the next generation of our partner program empowering every partner to deliver customer value by leveraging Microsoft AI and the Microsoft Cloud. In this post, I am providing more information on the benefits for our partners as well as the other key investments announced today, including new Solutions Partner designations, expanded ISV Success benefits, and investments in the commercial marketplace.

Microsoft Inspire: Accelerating AI transformation through partnership
This year’s Microsoft Inspire continues our push to make AI a transformative tool for our customers and partners. We’re excited to share even more AI-powered solutions and show how Microsoft partners can apply these AI innovations across their organizations in a variety of ways, from expansion of AI skilling to new products and services that drive customer success. Read on for some of the top announcements at this year’s event.

Microsoft and Meta expand their AI partnership with Llama 2 on Azure and Windows
Meta and Microsoft announced support for the Llama 2 family of large language models (LLMs) on Azure and Windows. Llama 2 is designed to enable developers and organizations to build generative AI-powered tools and experiences. Meta and Microsoft share a commitment to democratizing AI and its benefits and we are excited that Meta is taking an open approach with Llama 2. We offer developers choice in the types of models they build on, supporting open and frontier models and are thrilled to be Meta’s preferred partner as they release their new version of Llama 2 to commercial customers for the first time.

OpenAI Whisper is Coming Soon to Azure OpenAI Service and Azure AI Speech
Today at Microsoft Inspire, our Azure OpenAI Service and Azure AI Speech teams announced that OpenAI Whisper will be in preview soon.  The OpenAI Whisper model has multi-lingual capabilities that offer precise and efficient transcription of human speech in 57 languages, and translation into English. It also creates transcripts with enhanced readability. The benefits of running the OpenAI Whisper model in Azure include enterprise-grade security, privacy controls, and data processing capabilities that allow for customized solutions to fit specific business needs.

Announcing Vector Search in Azure Cognitive Search Public Preview
A key capability that underpins how search can both leverage and enhance Generative AI technologies is Vector search. We are delighted to announce the public preview of Vector search in Azure Cognitive Search a fundamental capability for building applications powered by large language models.

Document Generative AI: the Power of Azure AI Document Intelligence & Azure OpenAI Service Combined
Imagine being able to chat with your documents, generate captivating content from them, and access the power of Azure OpenAI models on your data. This is what Document Generative AI, a breakthrough solution from Azure AI Document Intelligence (former aka Azure Form Recognizer) and Azure OpenAI Service, can do for you.

Creating a branded AI voice that conveys emotions and speaks multiple languages
Today at Microsoft Inspire 2023, we’re excited to announce the general availability (GA) of the new multi-style and multi-lingual custom neural voice (CNV) features inside Text to Speech, part of the Azure AI Speech capability. This new technology allows you to create a natural branded voice capable of expressing different emotions and speaking different languages.

Announcing the public preview of Real-time Diarization in Azure AI Speech
Real-time diarization enables conversations to be transcribed in real-time while simultaneously identifying speakers. Diarization refers to the ability to tell who spoke and when. It differentiates speakers in mono channel audio input based on their voice characteristics. This allows for the identification of speakers during conversations and can be useful in a variety of scenarios such as doctor-patient conversations, agent-customer interactions, and court proceedings.

Microsoft Fabric

What is data warehousing in Microsoft Fabric?
Microsoft Fabric introduces a lake centric data warehouse built on an enterprise grade distributed processing engine that enables industry leading performance at scale while eliminating the need for configuration and management. Through an easy to use SaaS experience that is tightly integrated with Power BI for easy analysis and reporting, Warehouse in Microsoft Fabric converges the world of data lakes and warehouses with a goal of greatly simplifying an organizations investment in their analytics estate.

Migrating Azure Synapse Dedicated SQL to Microsoft Fabric
If all those posts about Microsoft Fabric have made you excited, you might want to consider it as your next data platform. Since it is very new, not all features are available yet and most are still in preview. You could already adopt it, but if you want to deploy this to a production scenario, you’ll want to wait a bit longer. In the meantime, you can already start preparing for the migration. Let’s dive into the steps to migrate to Microsoft Fabric. Today: starting from Synapse Dedicated SQL Pools.

Data Architecture

How to Build a Data Platform: Data Processing
How and where you process data is one of the most important decisions you’ll make when architecting a Data Platform, as it will enable / restrict what kind of features you can bring to your platform. I would consider it the core of your Data Platform, so you generally pick your Data Processing solution and then build around that.

Azure OpenAI Service

Function calling is now available in Azure OpenAI Service
Function calling is now available in Azure OpenAI Service and gives the latest 0613 versions of gpt-35-turbo and gpt-4 the ability to produce structured JSON outputs based on functions that you describe in the request. This provides a native way for these models to formulate API calls and structure data outputs, all based on the functions you specify. It’s important to note that while the models can generate these calls, it’s up to you to execute them, ensuring you remain in control.

How to use function calling with Azure OpenAI Service
The latest versions of gpt-35-turbo and gpt-4 have been fine-tuned to work with functions and are able to both determine when and how a function should be called. If one or more functions are included in your request, the model will then determine if any of the functions should be called based on the context of the prompt. When the model determines that a function should be called, it will then respond with a JSON object including the arguments for the function.


OpenAI launches customized instructions for ChatGPT
OpenAI just launched custom instructions for ChatGPT users, so they don’t have to write the same instruction prompts to the chatbot every time they interact with it — inputs like “Write the answer under 1,000 words” or “Keep the tone of response formal.” The company said this feature lets you “share anything you’d like ChatGPT to consider in its response.” For example, a teacher can say they are teaching fourth-grade math or a developer can specify the code language they prefer when asking for suggestions. A person can also specify their family size, so ChatGPT can give responses about meals, grocery and vacation planning accordingly.

Custom instructions for ChatGPT
We’re introducing custom instructions so that you can tailor ChatGPT to better meet your needs. This feature will be available in beta starting with the Plus plan today, expanding to all users in the coming weeks. Custom instructions allow you to add preferences or requirements that you’d like ChatGPT to consider when generating its responses.

Large Language Models

Retentive Network: A Successor to Transformer for Large Language Models
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models.

Microsoft Just Showed us the Future of ChatGPT with LongNet
Imagine a chatbot that could ingest the entire Internet at once. Well, that’s what Microsoft’s newest architecture, LongNet, may be able to deliver sometime in the close future, while right now it’s already promising 1-billion-token prompts, a human’s lifetime read count, in half a second.

Semantic Kernel

Semantic Kernel and PromptFlow! (a preview) | Intro to Semantic Kernel
Curious about how the Semantic Kernel works with PromptFlow? Watch Matthew Bolanos, product manager on the Semantic Kernel team, show you how you can use PromptFlow to help visually guide your LLM execution and complex plugin chaining!

Have a brilliant week!