Tooploox CS and AI News 36

Scope: Artificial Intelligence, Generative AI
Date: December 12, 2023 | Author: Konrad Budek

This month’s edition is dominated by LLM-related news, with Large Language Models using steganography, Amazon’s grand entrance into the race, and OpenAI enabling users to build their own GPT-based agents.

Also, there is new, exciting information from Samsung and Meta, with both companies releasing AI models either to deliver movies from prompts or provide translations on the go. 

11.06.2023 OpenAI launches GPT platform

The GPT platform is a no-code environment that allows users to build multiple GPT-based agents to handle different tasks. This feature is a follow-up to the custom instructions that were launched in July of this year. 

One of the sample uses enumerated by The Verge editors was a creative writing coach GPT system, which provides users with a critique of an uploaded text. 

Depending on the system being designed, the agent can be given access to DALL-E or a code interpreter. An app built this way can also connect to external tools, for example an email service, Canva, or Zapier. Finished systems can later be published, and the creator can specify how the model should interact with its users.

More about the solution can be found on The Verge.

11.08.2023 Amazon enters the LLM race with Olympus

According to Reuters, Amazon is building a team to launch its own Large Language Model (LLM), codenamed Olympus. The system is said to have two trillion parameters, compared to the one trillion parameters reported for OpenAI’s GPT-4 model.

Olympus is not Amazon’s first venture into the LLM field. The company previously trained the Titan model and partnered with Anthropic, the company behind Claude, one of OpenAI and ChatGPT’s key competitors. 

More on the topic can be found in Reuters news

11.08.2023 Google adds generative AI to ads

The company is giving advertisers a new feature that automatically generates a new image or enriches an existing one. The feature supports the new Google performance ads, which are designed to run across multiple channels and test multiple versions to pick the one that performs best.

The challenge with such ads is to provide the system with enough creations and versions for the algorithm to experiment and gather data. The new feature can, for example, change or replace a background color or an entire setting. This means a footwear manufacturer can move a photographed product from the mountains to a forest to a muddy swamp with just a few clicks.

More about the new feature can be found on The Verge

11.09.2023 Research – LLMs can obscure their reasoning using steganography

According to researchers from Redwood Research, Large Language Models can master “encoded reasoning.” This technique subtly embeds the reasoning steps into a generated text while keeping the human reading the text utterly oblivious to it. This ability can be leveraged to boost the system’s performance at little to no cost. 

On the other hand, showing the chain of reasoning is currently one of the best ways to make a model work in a transparent way. Yet the technique is challenging – there are few examples of chains of reasoning in the dataset the systems are trained on. Also, LLMs do not necessarily work in a human-comparable way, making the source of their reasoning prone to mistakes or even pure nonsense – simply consider hallucinating systems. 

Yet the research paper reveals that there is a way to overcome this challenge: asking the model to paraphrase its reasoning. More on the matter can be found in this research paper on arXiv.
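To make the idea of hidden information and the paraphrasing defense more concrete, here is a toy sketch. It is not the scheme studied in the paper: the synonym pairs and the stand-in "paraphraser" are hypothetical and chosen only for illustration. It shows how a word choice that looks innocent to a reader can carry hidden bits, and how rewriting the text in canonical wording wipes them out.

```python
# Toy illustration only. The synonym pairs and the "paraphraser" below are
# hypothetical assumptions, not the method from the research paper.
SYNONYMS = [("big", "large"), ("quick", "fast"), ("begin", "start")]


def encode(bits):
    """Hide one bit per word by picking the first or second synonym."""
    return " ".join(pair[bit] for pair, bit in zip(SYNONYMS, bits))


def decode(text):
    """Recover the hidden bits from the synonym that was chosen."""
    return [pair.index(word) for pair, word in zip(SYNONYMS, text.split())]


def paraphrase(text):
    """Stand-in paraphraser: rewrite every word as its canonical synonym."""
    canonical = {word: pair[0] for pair in SYNONYMS for word in pair}
    return " ".join(canonical.get(word, word) for word in text.split())


hidden = [1, 0, 1]
message = encode(hidden)               # "large quick start"
recovered = decode(message)            # [1, 0, 1]: the hidden bits survive
after_paraphrase = decode(paraphrase(message))  # [0, 0, 0]: the bits are gone
print(message, recovered, after_paraphrase)
```

The meaning of the message is the same before and after paraphrasing, which is why the reader notices nothing, while the hidden channel no longer carries any information.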

11.09.2023 Adobe Research transforms 2D images into 3D models

A team of researchers from Adobe and the Australian National University has delivered a neural network that can transform a 2D image into a high-quality 3D model in no more than 5 seconds.

This tool can be used in the gaming industry, support industrial design, and deliver new augmented reality and virtual reality applications. 

More about the research can be found in this arXiv paper.

11.09.2023 Samsung joins the AI arms race with real-time translations

Samsung has announced that its new line of Galaxy phones, due to launch in the first half of 2024, will come with a new AI model that translates users’ speech on the go.

The feature will run on the device itself, so users will not need an internet connection to use it.

More about the new feature and model itself can be found on Samsung’s blog

11.16.2023 Talking faces created from an audio file and photo of a person

A team of researchers from Nanyang Technological University has created a system that generates a realistic talking face from a photograph and an audio clip. The movements and facial expressions are consistent and realistic, and they match the spoken words.

According to the team, one of the key challenges was keeping the facial expressions consistent and appropriate to what is being said, since people may either smile or frown when saying the same words, depending on the context.

More about the research can be found on ScienceDirect.

The Nanyang Technological University team is, however, one of several that have achieved this effect. The Tooploox research team has also delivered comparable results, producing believable talking faces from a single headshot and a voice sample while overcoming the challenges of facial expressions and lip sync. The research will be presented at the upcoming WACV conference in January 2024. Examples of how the Tooploox method works can be found in the accompanying video and in the research paper.

11.16.2023 Meta introduces the Emu text-to-video model

Meta, the company behind Facebook and Instagram, has unveiled two new models: Emu Edit, aimed at image editing, and Emu Video, a text-to-video model. The new models will be used to enhance the performance and user experience of Meta’s other services.

According to the company’s press release, the goal is not only to produce believable and good-looking images, but also to keep strict control over which pixels are modified. For example, the model can be asked to change the background behind a person, or to turn the person into a raccoon without modifying their clothes.

A live demo of the system can be found on the Emu Video webpage.

11.22.2023 Anthropic scales up Claude 2.1 to 200K tokens – nearly two times more than GPT-4

Claude, an LLM from Anthropic, is one of the most interesting competitors to ChatGPT, positioned as a more secure and business-friendly service. To make the model more powerful, its creators extended the context window to 200K tokens. As a result, the model is said to handle texts of up to 150 thousand words, roughly the length of a 500-page novel.
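The arithmetic behind these figures can be sketched in a few lines. The two conversion ratios below are assumptions derived from the numbers quoted above (150 thousand words per 200K tokens, and 150 thousand words per 500 pages). Real ratios vary with the language, the tokenizer, and the page layout.

```python
CONTEXT_TOKENS = 200_000

# Assumed ratios, implied by the figures quoted in the text.
WORDS_PER_TOKEN = 0.75   # 150,000 words / 200,000 tokens
WORDS_PER_PAGE = 300     # 150,000 words / 500 pages


def tokens_to_words(tokens: int) -> int:
    """Estimate how many words fit into a given number of tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def words_to_pages(words: int) -> int:
    """Estimate how many full pages a given number of words fills."""
    return words // WORDS_PER_PAGE


words = tokens_to_words(CONTEXT_TOKENS)   # 150000
pages = words_to_pages(words)             # 500
print(f"{CONTEXT_TOKENS} tokens ~ {words} words ~ {pages} pages")
```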

This long context window lets the system work on long pieces of content, for example by providing the user with a comprehensive and reliable summary of a long report. In practically any other task, too, the model can better maintain consistency and context.

Read more on Anthropic’s blog

11.22.2023 Google Bard can now watch YouTube videos

The Google-operated model, a competitor to ChatGPT, was recently enriched with the ability to watch videos on YouTube with the aim of extracting knowledge from them, preparing summaries, or using the content in any other way. This makes the system more deeply rooted in Google’s ecosystem and provides more options for users to get new information. 

A sample use case may include extracting the recipe from a video about cooking or asking Bard about the best option to pick from a comparison video. 

More can be found on The Verge

11.23.2023 Algorithm identifies nearly 200 new CRISPR systems in bacteria

Researchers from the McGovern Institute for Brain Research at MIT, the Broad Institute of MIT and Harvard, and the National Center for Biotechnology Information (NCBI) at the National Institutes of Health have produced a new search algorithm that has found 188 new CRISPR systems in bacterial genomes. 

CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeats. It is a component of the bacterial immune system that forms the basis of genome editing technology. Basically, these tools allow researchers to target specific stretches of DNA and edit them, for example, to deliver a cure or vaccine.

Finding new systems will provide the research teams with new possibilities to target particular genes and, by doing so, significantly increase the number of potential use cases. 

More can be found on the MIT website

11.24.2023 Swiss researchers have developed a new technique that significantly boosts the performance of LLMs

A team of researchers from ETH Zurich has developed a new technique that speeds up the neural networks powering LLMs. It targets the network’s feedforward layers, which normally have to stay fully active the whole time the model processes input and produces output.

The new approach uses “fast feedforward” layers that leverage a mathematical operation known as conditional matrix multiplication. With it, only a part of the layer is computed for a given input, which can significantly reduce the computational load and, thus, the time and cost of performing a given task. According to the research team, the speed-up can reach up to 300 times.
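A minimal sketch of the idea behind conditional matrix multiplication, assuming the neurons are arranged in a binary tree and the sign of each neuron's pre-activation decides which child is evaluated next. The class name, the sizes, and the random weights are hypothetical, and this is an illustration rather than the ETH Zurich implementation. The point it shows is that only one root-to-leaf path of neurons is computed per input, instead of every neuron in the layer.

```python
import math
import random


def gelu(x: float) -> float:
    """Exact GELU activation, computed with the error function."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


class FastFeedforwardSketch:
    """Toy layer: neurons stored as a complete binary tree in flat lists."""

    def __init__(self, dim_in: int, dim_out: int, depth: int, seed: int = 0):
        rng = random.Random(seed)
        self.depth = depth
        self.dim_out = dim_out
        self.n_nodes = 2 ** (depth + 1) - 1  # total neurons in the tree
        self.w_in = [[rng.gauss(0, 1) for _ in range(dim_in)]
                     for _ in range(self.n_nodes)]
        self.w_out = [[rng.gauss(0, 1) for _ in range(dim_out)]
                      for _ in range(self.n_nodes)]

    def forward(self, x):
        out = [0.0] * self.dim_out
        node, visited = 0, 0
        for _ in range(self.depth + 1):
            # Only the current neuron's weights are multiplied with the input.
            score = sum(w * xi for w, xi in zip(self.w_in[node], x))
            act = gelu(score)
            for j in range(self.dim_out):
                out[j] += act * self.w_out[node][j]
            visited += 1
            # Conditional step: the sign of the score selects the next child.
            node = 2 * node + (1 if score > 0 else 2)
        return out, visited


layer = FastFeedforwardSketch(dim_in=4, dim_out=2, depth=3)
out, visited = layer.forward([0.5, -1.0, 0.25, 2.0])
print(f"evaluated {visited} of {layer.n_nodes} neurons")
```

With a tree of depth 3, the layer holds 15 neurons but evaluates only 4 per input. The gap grows quickly with depth, which is where the claimed savings come from.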

More about this new approach can be found on VentureBeat

11.28.2023 Amazon launches an AI Assistant – Amazon Q

Amazon Q is a new offering within Amazon Web Services that aims to automate multiple tasks, mostly using the client’s own data. Amazon Q will also be added to other tools available on the AWS platform, including Connect, CodeCatalyst, and QuickSight.

The technology aims to compete directly with ChatGPT, yet will be available exclusively in the AWS ecosystem.

More about the tool can be found on The Verge.

11.27.2023 UK security guidelines signed by 18 countries 

The UK government has published a set of guidelines to ensure that AI-powered solutions are delivered safely and securely, protecting both the systems and their users against cyber attacks. The guidelines cover design, development, operations, and maintenance. They were published at a launch event in London attended by over 100 industry, government, and international partners, including Microsoft, the Alan Turing Institute, and cyber agencies from the UK, Canada, and Germany, among others.

More about the guidelines can be found in AI News