Science-fiction visions have boldly entered reality. OpenAI had to pause the ChatGPT voice that sounded a little too much like Scarlett Johansson. And that is only the tip of the iceberg of AI-related news from May.
The month also brought a new state-of-the-art multimodal model from Meta and a new model family, named Falcon 2, that puts the United Arab Emirates on the AI map.
05.06.2024 OpenAI partners with Stack Overflow
Stack Overflow, one of the leading tech content platforms for software developers, has forged a partnership with OpenAI. The AI company will get access to a gargantuan library of code with comments and context. The details of the deal remain undisclosed, yet according to the official statement, the goal is to improve the coding performance of ChatGPT and support the development of OverflowAI.
More details can be found in the official statement.
05.07.2024 People partner with OpenAI
Of course, not all people, but People, Better Homes & Gardens, Investopedia, and Food & Wine, among others. Dotdash Meredith is the publishing giant behind all the titles listed. OpenAI will collaborate with the publisher to enrich its offerings with AI features. In exchange, the publisher will license its content, People’s list of the sexiest men alive included, to train OpenAI’s models.
More details can be found in The Verge.
05.13.2024 United Arab Emirates releases Falcon 2 AI models
A UAE government-backed research institute has released two AI models under the name Falcon 2. Both models have 11 billion parameters; the first is text-to-text and the second image-to-text. The former can interact with the user in chat form and produce text from prompts. The latter can interpret and process images to deliver a written response, for example, a short description of an image or identification of an entity in it. Both models are free to use in commercial applications.
More can be found in Reuters.
05.13.2024 OpenAI releases GPT-4o
The “o” in GPT-4o stands for “omni.” The new model brings full multimodality to the experience, enabling users to generate images or upload their own for the model to analyze. The key difference is that all types of data are processed by the same neural network rather than being analyzed by separate components. The new model is available to paid users now and will be rolled out to all users with usage limits.
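For developers, sending text and an image through the same endpoint looks roughly like this. Below is a minimal sketch using the official openai Node.js SDK; the prompt and image URL are placeholder assumptions:

```typescript
import OpenAI from "openai";

// Reads OPENAI_API_KEY from the environment.
const client = new OpenAI();

async function describeImage() {
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "Describe what is happening in this photo." },
          // Placeholder URL; point this at a real, publicly reachable image.
          { type: "image_url", image_url: { url: "https://example.com/photo.jpg" } },
        ],
      },
    ],
  });
  console.log(response.choices[0].message.content);
}

describeImage();
```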
More can be found in the official announcement.
05.14.2024 Google challenges OpenAI’s Sora with Veo
OpenAI’s Sora gained renown for its ability to deliver consistent, lifelike short videos of exceptional quality, and since its debut no other model has come close to its capabilities. The situation changed during Google’s recent annual I/O developer conference, where the company unveiled a new model named Veo, developed by Google DeepMind.
More about the model can be found on DeepMind’s page.
05.14.2024 Google rolls out AI Overviews in Search
The tech giant is bringing generative AI to its key product, the search engine. For users in the US, the engine will show AI-generated summary responses based on web content, under the name “AI Overviews.”
More can be found in The Guardian.
05.14.2024 Google unveils LearnLM to help with homework
According to the company’s statement, the model is designed to help students with their homework and is based on another of Google’s models, Gemini. LearnLM will appear throughout Google products, including Android and YouTube, so anyone looking for homework-related information will be able to get it where they already are.
More about the model can be found in The Verge.
05.14.2024 Google introduces Gemini Nano into Chrome desktop client
Google plans to make the Gemini Nano model natively accessible within its Chrome browser. By combining browser features with the built-in LLM, developers can build next-gen, AI-powered applications for tasks such as translation, transcription, or image captioning; a sketch of what that could look like follows below.
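At the time of the announcement the API was only in early preview, so its exact surface may change; the snippet below is a hypothetical sketch based on the experimental window.ai shape, and every name in it should be treated as an assumption:

```typescript
// Hypothetical sketch: prompting Chrome's built-in on-device model.
// The experimental window.ai surface used here is an assumption and
// was not finalized at the time of the announcement.
async function summarizeOnDevice(text: string): Promise<string> {
  const ai = (window as any).ai; // the preview API has no stable typings yet
  if (!ai?.createTextSession) {
    throw new Error("Built-in model API not available in this browser");
  }
  // Create a session backed by the locally bundled Gemini Nano model.
  const session = await ai.createTextSession();
  // Prompt it like any other LLM; inference runs on-device, with no server call.
  return session.prompt(`Summarize in one sentence: ${text}`);
}

summarizeOnDevice("Gemini Nano now ships inside Chrome desktop builds.")
  .then(console.log)
  .catch(console.error);
```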
More can be found in VentureBeat.
05.15.2024 OpenAI’s Ilya Sutskever leaves
Ilya Sutskever, one of OpenAI’s co-founders and its chief scientist, has officially left the company. The role was handed over to Jakub Pachocki, the company’s director of research. Sutskever did not disclose which project he is moving on to.
More can be found in The Verge.
05.16.2024 Gannett adds AI-generated summaries to its stories
The publisher behind USA Today will use generative AI to create bullet-point summaries that appear just below the headline of the company’s stories. The AI summaries aim to “enhance the reporting process” and “elevate the audience experience.”
More information can be found in The Verge.
05.20.2024 OpenAI pauses ChatGPT’s Scarlett Johansson-like voice
OpenAI added multiple voices to its apps. One of them, named Sky, closely resembled Scarlett Johansson’s voice as heard in the movie “Her,” whose plot revolves around a protagonist who slowly falls in love with his AI-powered assistant, voiced by Johansson. To make things even more complicated, Johansson issued a statement revealing that OpenAI’s Sam Altman had reached out to her with a request to voice ChatGPT, and she had refused.
More about the story can be found in VentureBeat.
05.21.2024 Meta introduces Chameleon, a new multimodal model
Chameleon is a new family of models designed to be natively multimodal, in contrast to the common approach of combining separately built components for different modalities. According to the benchmarks, the model delivers state-of-the-art performance on tasks usually assigned to multimodal models, including image captioning and visual question answering.
More about the model can be found in this arXiv paper.
05.22.2024 OpenAI partners with News Corp., Wall Street Journal publisher
Continuing down the path of forging cooperation with media vendors, OpenAI has partnered with News Corp, the media company owned by Rupert Murdoch. The partnership gives the AI company access to the archives of News Corp’s major publications, including the Wall Street Journal, MarketWatch, The Times, The Sunday Times, The Daily Telegraph, and more.
More about the partnership can be found in this blog post.
05.25.2024 Google needs to remove weird AI responses manually
Shortly after Google rolled out AI Overviews, the company witnessed a flood of weird or surprising responses. Users shared multiple examples of odd answers, from suggesting glue as a pizza ingredient to encouraging users to eat rocks.
The situation is all the stranger because the company had been testing AI-generated search responses since May 2023, when the beta version of the service first launched.
More can be found in The Verge.
05.26.2024 YouTube lets users search for a song by humming it
One of the new features of the Android YouTube Music app is a Gemini-based feature that identifies a song from a melody hummed by the user. To use it, the user taps a wave button near the microphone, says, “Hey, what’s that song that goes…,” and starts humming.
More about the new feature can be found in The Verge.
05.21.2024 Microsoft partners with Khan Academy
Microsoft is collaborating with Khan Academy to demonstrate AI’s transformative impact on education. The partnership centers on moving Khan Academy’s Khanmigo AI agent to Microsoft’s Azure OpenAI Service and providing free access to the tool for all K-12 educators in the US. Furthermore, Microsoft will use its Phi-3 model to enhance Khan Academy’s math tutoring, and the two will work together to create high-quality educational content. The partnership also aims to increase the availability of Khan Academy courses within Microsoft Copilot and Microsoft Teams for Education.
More can be found in VentureBeat.
05.21.2024 Microsoft releases Phi-3
Microsoft has made its lightweight Phi-3 model family accessible to developers, including Phi-3-medium, Phi-3-small, and Phi-3-mini, with Phi-3-mini being integrated into Azure AI. Additionally, the company is introducing Phi-3-vision, a multimodal variant with 4.2 billion parameters. Created by Microsoft Research, Phi-3-mini is a 3.8 billion parameter language model that delivers strong reasoning abilities at a lower cost than larger models. This marks the fourth generation of Microsoft’s compact language models, following the predecessors Phi-1, Phi-1.5, and Phi-2.
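Since the Phi-3 checkpoints are also published on Hugging Face, one quick way to try the family is the Inference API. Here is a minimal sketch using the @huggingface/inference client; the model id, prompt template, and token variable are assumptions based on the public Phi-3-mini release, not part of the announcement:

```typescript
import { HfInference } from "@huggingface/inference";

// Assumes a Hugging Face API token in the HF_TOKEN environment variable.
const hf = new HfInference(process.env.HF_TOKEN);

async function askPhi3(question: string) {
  const result = await hf.textGeneration({
    // Assumed model id of the public Phi-3-mini instruct checkpoint.
    model: "microsoft/Phi-3-mini-4k-instruct",
    // Phi-3's chat template wraps turns in <|user|> / <|assistant|> markers.
    inputs: `<|user|>\n${question}<|end|>\n<|assistant|>\n`,
    parameters: { max_new_tokens: 128 },
  });
  console.log(result.generated_text);
}

askPhi3("Explain in one sentence why smaller language models can be cheaper to run.");
```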
More information can be found in VentureBeat.