Tooploox at WACV 2024: Where Talking Heads Meet Lifelong Learning

Date: January 18, 2024 Author: Maciek Zieba 4 min read

Tooploox Research, the team responsible for our R&D endeavors and for pushing the limits of our understanding of Artificial Intelligence, has published four affiliated research papers at the recent Winter Conference on Applications of Computer Vision (WACV).

The conference took place in Waikoloa (Hawaii) and gathered researchers from around the world, who shared their knowledge and the latest discoveries in computer vision.

What is WACV?

The Winter Conference on Applications of Computer Vision is one of the biggest events in the international computer vision community. The conference has been held since 1992 with nearly no gap years. The event started with workshops on the practical applications of computers in vision-related fields. With recent technological advancements, the event has become one of the most significant in the field of CV and CV-related image processing. Also, the winter event is always held in sunshine-filled locations like Florida or, as it was this year, Hawaii. 

The Tooploox research team is a regular contributor to the event, promoting new applications of the technology and sharing cutting-edge expertise. This year, the team delivered four research papers. Tooploox-affiliated researchers performed all the research described below together with teams from various higher education institutions.

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

Generating talking heads from images is an iconic machine learning task, putting multiple capabilities of the network to the test. The produced "talking head" needs to behave in a natural way (the hair needs to follow the rules of gravity, an open mouth must contain a natural number of teeth, the eyes need to follow expected patterns, etc.). Also, facial expressions need to fit the words spoken, conveying anger, sadness, or happiness as appropriate.

The challenge becomes even greater when facing the Uncanny Valley phenomenon. Researchers and developers bringing human-like entities to life need to remember that being too close to a real human can be greatly disturbing for the user – a lifelike robot with the intent of being helpful can be scary instead. 

But the talking heads delivered by the Tooploox team overcome the Uncanny Valley, appearing convincingly natural. The research was delivered by a team consisting of Michał Stypułkowski, Konstantinos Vougioukas, Sen He, Maciej Zięba, Stavros Petridis, and Maja Pantic. Details can be found in this paper.

Use cases

Apart from its role as a commonly observed research benchmark, talking-head generation can be applied in fields like media, entertainment, and education. Einstein himself can provide students with explanations of his equations. Napoleon Bonaparte can share his experiences from the wars he fought, and Seneca can share his wisdom.

The same goes for the entertainment and movie industries, where non-existent people or entities can participate in movie roles. 

Towards More Realistic Membership Inference Attacks on Large Diffusion Models

Generative Diffusion Models are renowned for their ability to produce outstanding images from prompts, be it a painting that perfectly imitates the style of a particular painter, a photorealistic image, or a drawing. Basically, any image is possible as long as the user crafts a good prompt. 

Yet concerns around the models arise, with artists protesting against prompts that aim to imitate their style. Also, many models have been trained using data scraped from the Internet, resulting in a shady legal situation where creators didn’t give direct permission to use their work in training the system. 

A research team that includes Tooploox-affiliated scientists has developed a new tool to perform a Membership Inference Attack on Large Diffusion Models. This type of attack aims to determine whether a particular image was used in the training dataset. The team consists of Jan Dubiński, Antoni Kowalczuk, Stanisław Pawlak, Przemyslaw Rokita, Tomasz Trzciński, and Paweł Morawiecki. More details can be found in this paper.
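The general idea behind such attacks can be illustrated with a classic loss-threshold baseline, which is simpler than the method proposed in the paper: images the model saw during training tend to be reconstructed with a lower loss than unseen ones. A minimal sketch, where all loss values and the threshold are made up for illustration:

```python
import numpy as np

def threshold_mia(losses, threshold):
    """Classify samples as training-set members when their
    reconstruction loss falls below the threshold.
    Returns a boolean membership prediction per sample."""
    losses = np.asarray(losses, dtype=float)
    return losses < threshold

# Toy example: training-set members tend to have lower
# denoising loss than images the model has never seen.
member_losses = np.array([0.10, 0.12, 0.09])      # images seen in training
nonmember_losses = np.array([0.30, 0.28, 0.35])   # unseen images

preds_members = threshold_mia(member_losses, threshold=0.2)
preds_nonmembers = threshold_mia(nonmember_losses, threshold=0.2)

print(preds_members.all(), preds_nonmembers.any())  # True False
```

In practice the threshold must be calibrated, and the paper argues that realistic evaluation setups make this calibration much harder than toy examples suggest.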

Use cases

This tool can be used to determine whether an image was used in a training dataset or not – making it a decent tool to support building fair and compliant AI models. This issue will only gain traction with the increasing popularity of AI-based tools in business, education, and the personal lives of people around the world.

CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free

One of the key limitations of existing image recognition models is the fact that they are trained narrowly to solve particular problems. A model trained to recognize a cat in an image will recognize a cat only, unable to spot a tree, a fire engine, or the firefighter who is holding the cat. The situation changed with CLIP's introduction: this OpenAI-delivered model has opened the way for open-world perception. Yet CLIP struggles with denser tasks like image segmentation. It can recognize entities in an image, but assigning a label to each region of the image remains a hard challenge.

The research team with Tooploox-affiliated scientists has introduced the CLIP-DIY approach, where, without additional training or data, embeddings offered by CLIP can be applied to segmentation to achieve high-quality results.

This research was delivered by a team consisting of Monika Wysoczańska, Michaël Ramamonjisoa, Tomasz Trzciński, and Oriane Siméoni. More details can be found in this paper.
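The underlying intuition of dense CLIP inference can be sketched as follows: compare each patch embedding against text embeddings of the class names, and label the patch with the most similar class. This is a simplified illustration rather than the paper's exact pipeline, and the toy feature vectors and class prototypes below are invented:

```python
import numpy as np

def dense_clip_segmentation(patch_embeds, text_embeds):
    """Assign each image patch the label of the most similar
    class prompt, using cosine similarity of embeddings.
    patch_embeds: (H, W, D) patch features from a frozen backbone
    text_embeds:  (C, D) one embedding per class name
    Returns an (H, W) integer label map."""
    p = patch_embeds / np.linalg.norm(patch_embeds, axis=-1, keepdims=True)
    t = text_embeds / np.linalg.norm(text_embeds, axis=-1, keepdims=True)
    sims = p @ t.T                 # (H, W, C) cosine similarities
    return sims.argmax(axis=-1)    # per-patch class index

# Toy example: a 2x2 grid of patches, 3-dim features,
# and two classes, e.g. ["cat", "tree"].
patches = np.array([[[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
                    [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]])
classes = np.array([[1.0, 0.0, 0.0],   # "cat" prototype
                    [0.0, 1.0, 0.0]])  # "tree" prototype
print(dense_clip_segmentation(patches, classes))
# [[0 0]
#  [1 1]]
```

Because the backbone and text encoder stay frozen, no additional training or annotated data is needed, which is the "for free" part of the paper's title.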

Use case

Improving a model’s ability to operate safely and efficiently in the open world paves the way for robots and assistants in real-world settings where current approaches fail due to a lack of robustness.

Adapt Your Teacher: Improving Knowledge Distillation for Exemplar-Free Continual Learning

One of the key challenges in machine learning is catastrophic forgetting, a phenomenon in which a neural network’s old skills are overridden as it gains new ones. Continual learning is the field of machine learning that aims to train models that gradually accumulate knowledge, similar to how humans build their competencies over time, while retaining the skills previously mastered. Yet machines still fail miserably at this challenge.

Tooploox-delivered research shows a way to overcome this limitation. Even more importantly, it can be seamlessly integrated into existing popular approaches. The research was delivered by a team consisting of Filip Szatkowski, Mateusz Pyla, Marcin Przewięźlikowski, Sebastian Cygert, Bartłomiej Twardowski, and Tomasz Trzciński. More details can be found in this paper.
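Knowledge distillation, the mechanism named in the paper’s title, penalizes the updated model (the student) for drifting away from its pre-update copy (the teacher). Below is a minimal sketch of the standard temperature-softened distillation loss, not the paper’s specific teacher-adaptation method, with made-up logits:

```python
import numpy as np

def softmax(z, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the softened teacher and student
    distributions: the standard knowledge-distillation penalty
    that discourages the student from forgetting what the
    teacher (the model before the new task) already knows."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A student that matches its teacher pays no penalty...
same = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
# ...while one that drifted after training on a new task does.
drifted = distillation_loss([-1.0, 0.5, 2.0], [2.0, 0.5, -1.0])
print(same, drifted > same)  # 0.0 True
```

In exemplar-free continual learning this penalty is the only link to old tasks, since no samples from them are stored; the paper’s contribution lies in how the teacher is adapted during training.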

Use cases

Continual learning will only gain traction in the near future, with the increasing popularity of neural networks and Artificial Intelligence solutions in nearly every aspect of life. It is a way to leverage the growing number of users and the data they generate without retraining the whole network every time, making the training and improvement process easier and cheaper.


The WACV conference was an opportunity to share experience and exchange knowledge regarding the latest computer vision technologies. It was also a way to spot the trends and technologies that will gain traction in the research community.
