Incom is the communication platform of #SemesterHack - Wir hacken das digitale Sommersemester!



3_06_Bored or excited? Emotion detection in online meetings


A good presentation provides the basis for listeners to understand and learn. To achieve this, a good presenter adapts to the audience and picks up on non-verbal cues. In digital education and lectures, presenters often struggle with the lack of non-verbal feedback from their listeners. Such presentations can easily become (extraordinarily) one-sided and boring for the students as well as the lecturer.


The basis for ClassAct was developed during the HackaTUM C0dev1d19. Accordingly, we entered this hackathon with a web-app prototype that supports emotion recognition and the sharing of the presenter's slides, audio, and video footage. Detected emotions are summarized and presented in a diagram in real time.

What's new?

During the hackathon „Digitalisierung Hochschullehre“, we extended the ClassAct web app with audio features (ambient sound, notification sounds, speech rate feedback for the presenter) as well as the recognition of basic gestures (nodding and head-shaking) to assess (dis-)agreement in the audience.

Emotion Recognition - But Why?

Current solutions for online lectures mainly rely on participants activating their webcams, so that the lecturer can interact with learners and use their non-verbal communication cues to adapt to the learners' needs. While this might be a viable solution for small seminars, bandwidth restrictions and the inability to take in hundreds of video feeds at once make it unsuited for bigger lectures. Moreover, participants can feel uncomfortable showing themselves and their personal environments to everyone.

ClassAct attempts a novel approach by inferring the emotional state of the learners from their video footage and giving anonymized overall feedback to the lecturer and the audience. Furthermore, ClassAct overcomes the eerie feeling of speaking into the perfect silence of a web meeting in which everyone's microphone is deactivated by offering pleasant background murmurs, just like those that would occur during a real lecture.

Ambient Sound, Comfort and Influencing Speech Rate

Ambient Sound & Comfort

As we have experienced ourselves and heard from many lecturers and students, speaking into the absolute silence of an online conference can feel unnatural and uncomfortable. ClassAct meets this challenge by providing an unobtrusive ambient sound that is natural and typical for teaching environments. Moreover, ambient sound can help us concentrate on what we are doing (DeLoach, Carter & Braasch, 2015) and even enhance creativity (Mehta, Zhu & Cheema, 2012).

Notifications for Questions

Questions are announced by an appropriate notification sound. This keeps the presenter's visual channel free for the presentation slides and subtly draws attention to the questions section only when necessary. Of course, both ambient and notification sounds can be deactivated by the presenter.

Speech Rate

The literature suggests that our perception of rhythm is closely linked to our behaviour, such as walking, and that rhythm can even be used to improve gait in Parkinson's disease patients (Enzensberger, Oberländer & Stecker, 1997). Furthermore, rhythm (such as music or heard speech) can influence one's own talking speed (Jungers & Hupp, 2009; Jungers, Hupp & Dickerson, 2016; Tooley, Konopka & Watson, 2018). As we are all well aware, it can be hard to maintain an optimal speech rate (words per minute) in presentations. Too low a speech rate makes us sound boring to learners, while speaking too fast makes it hard to follow the content. We therefore extended ClassAct with a feedback option that lets listeners rate the presenter's pace of talking at the press of a button. Based on the literature cited above, we implemented an intuitive and unobtrusive way of conveying this information to the presenter: the presenter hears a low rhythmic sound and subconsciously adapts his or her speech rate to regain an optimal talking speed. This rhythm is subtly mixed in with the ambient sound and is thus unobtrusive and subliminal.
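
The feedback loop described above could be sketched as follows. This is a minimal illustration in Python, not the ClassAct implementation: the vote labels "too_fast" and "too_slow", the baseline tempo, the maximum tempo shift, and the minimum number of votes are all assumptions chosen for the example.

```python
from collections import Counter


def pacing_tempo(votes, baseline_bpm=100.0, max_shift_bpm=20.0, min_votes=3):
    """Map listener pace votes to a tempo (beats per minute) for the pacing rhythm.

    votes: iterable of "too_fast" / "too_slow" strings (hypothetical button values).
    Returns None when there is too little feedback to justify playing a rhythm.
    All numeric defaults are illustrative assumptions, not values from ClassAct.
    """
    counts = Counter(votes)
    fast, slow = counts["too_fast"], counts["too_slow"]
    total = fast + slow
    if total < min_votes:
        return None
    # Net opinion in [-1, 1]: positive means the audience finds the talk too fast.
    net = (fast - slow) / total
    # A "too fast" majority lowers the tempo to slow the presenter down,
    # a "too slow" majority raises it.
    return baseline_bpm - net * max_shift_bpm


if __name__ == "__main__":
    print(pacing_tempo(["too_fast", "too_fast", "too_fast", "too_slow"]))
```

Scaling the shift by the net share of votes, rather than switching between fixed tempi, keeps the rhythm change small when the audience is divided, which fits the goal of an unobtrusive cue.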

Advantages for Lecturers and Learners

Good teaching depends heavily on the lecturer's ability to adapt to the audience, e.g. by adjusting talking speed and giving further explanations on the subject. Lecturers often use not only explicit feedback from students, like questions, but also rely on non-verbal communication cues. Usually, a lecturer who notices a lot of confused faces or an increasingly sleepy audience will adapt accordingly.

It is one thing to go to a real-life lecture and a totally different beast to have to live-stream oneself via video in a lecture. We believe, and have experienced, that this can cause feelings of discomfort and unease, which in turn impair the students' learning and understanding.

ClassAct delivers this valuable information in a suitable, applicable, and easy-to-use way, while still respecting everyone's privacy. Hence, it combines the best of both worlds: e-learning from the comfort of our own homes and feeling connected to our classmates or audience.

What Makes a Good Presentation and How Does ClassAct Help Deliver One?

In summary, ClassAct addresses multiple challenges occurring in online lectures:

1) Audience feedback

Emotions, sleepiness rate, and agreement are unobtrusively obtained from each listener and transmitted to the presenter in an understandable, aggregated form, without the need to stream anyone's video footage.

2) Ambient sound

Many presenters (and the audience as well) feel uncomfortable speaking into the silent void when everyone has deactivated their microphones. Furthermore, research suggests that subtle ambient sound can enhance learning outcomes. Therefore, ClassAct provides an option to activate a pleasant ambient sound that is specific to learning environments.

3) Talking speed

Speaking too fast or too slow can impede how well the audience can follow your presentation. ClassAct provides an intuitive way of conveying whether the audience thinks the talk is too fast or too slow: it blends a rhythmic sound into the ambient sound, which prompts the presenter to intuitively adapt the talking speed and regain the optimal speech rate.

Technical Details

We built ClassAct around a deep-learning algorithm for emotion detection in MATLAB and added gesture recognition to it. The latter can differentiate between nodding, head shaking, and neutral head movements such as soft bobbing to music. The gestures are thus classified into three levels of agreement: „agree“, „disagree“, and „neutral“.
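
To illustrate the idea of telling nodding, head shaking, and neutral movement apart, here is a deliberately simple rule-based sketch in Python. It is not the MATLAB gesture recognition used in ClassAct. It assumes that some face tracker already supplies per-frame head pitch and yaw angles in degrees, and the thresholds (min_range, min_reversals, min_step) are hypothetical values picked for the example.

```python
def count_reversals(angles, min_step=1.0):
    """Count direction changes in a sequence of angles.

    Frame-to-frame changes smaller than min_step degrees are treated as jitter.
    """
    reversals = 0
    last_direction = 0
    for prev, cur in zip(angles, angles[1:]):
        delta = cur - prev
        if abs(delta) < min_step:
            continue
        direction = 1 if delta > 0 else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals


def classify_gesture(pitch, yaw, min_range=10.0, min_reversals=2):
    """Classify one window of head angles as "agree", "disagree", or "neutral".

    pitch: up/down head angles per frame; yaw: left/right head angles per frame.
    A gesture needs a large enough swing that changes direction repeatedly,
    and it must dominate the movement on the other axis.
    """
    if not pitch or not yaw:
        return "neutral"
    pitch_range = max(pitch) - min(pitch)
    yaw_range = max(yaw) - min(yaw)
    nodding = pitch_range >= min_range and count_reversals(pitch) >= min_reversals
    shaking = yaw_range >= min_range and count_reversals(yaw) >= min_reversals
    if nodding and pitch_range > yaw_range:
        return "agree"
    if shaking and yaw_range > pitch_range:
        return "disagree"
    return "neutral"


if __name__ == "__main__":
    nod = [0, 8, -8, 8, -8, 0]
    still = [0, 1, 0, -1, 0, 1]
    print(classify_gesture(nod, still))
```

In this sketch, small rhythmic movements such as soft bobbing stay below the amplitude threshold and therefore fall into the neutral class.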

Emotion and face recognition run on the students' hardware. Only an arbitrary, optional UserID, the classified emotional state, and the level of agreement are transmitted to the database, together with a timestamp. The information stored in the database is therefore anonymized.

In an online dashboard, the data is processed and presented in a customizable way to fit the needs of the presenters and students.
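
As a rough illustration of this data flow, the following Python sketch models one transmitted record and a simple aggregation that a dashboard could display. The field names, the JSON encoding, the emotion labels, and the 30-second window are assumptions made for the example; the source only states that an optional UserID, the emotional state, the level of agreement, and a timestamp are sent.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Observation:
    """One classification result sent from a listener's device (hypothetical schema)."""
    emotion: str                   # classified emotional state, e.g. "happy"
    agreement: str                 # "agree", "disagree", or "neutral"
    timestamp: float               # seconds since some reference point
    user_id: Optional[str] = None  # arbitrary and optional; no video is included


def to_payload(obs):
    """Serialize an observation; JSON is an assumed transport format."""
    return json.dumps(asdict(obs))


def summarize(observations, now, window_seconds=30.0):
    """Return the share of each emotion among observations in the recent window."""
    recent = [o for o in observations if now - o.timestamp <= window_seconds]
    if not recent:
        return {}
    counts = Counter(o.emotion for o in recent)
    return {emotion: n / len(recent) for emotion, n in counts.items()}


if __name__ == "__main__":
    data = [
        Observation("happy", "agree", 100.0),
        Observation("bored", "neutral", 105.0),
        Observation("happy", "disagree", 110.0, user_id="u1"),
    ]
    print(to_payload(data[0]))
    print(summarize(data, now=115.0))
```

Because only class labels and a timestamp leave the device, the aggregate can be computed without the server ever seeing any video footage.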


DeLoach, A. G., Carter, J. P., & Braasch, J. (2015). Tuning the cognitive environment: Sound masking with “natural” sounds in open-plan offices. The Journal of the Acoustical Society of America, 137(4), 2291-2291.

Enzensberger, W., Oberländer, U., & Stecker, K. (1997). Metronomtherapie bei Parkinson-Patienten. Der Nervenarzt, 68(12), 972-977.

Jungers, M. K., & Hupp, J. M. (2009). Speech priming: Evidence for rate persistence in unscripted speech. Language and Cognitive Processes, 24(4), 611-624.

Jungers, M. K., Hupp, J. M., & Dickerson, S. D. (2016). Language priming by music and speech: Evidence of a shared processing mechanism. Music Perception: An Interdisciplinary Journal, 34(1), 33-39.

Mehta, R., Zhu, R., & Cheema, A. (2012). Is noise always bad? Exploring the effects of ambient noise on creative cognition. Journal of Consumer Research, 39(4), 784-799.

Tooley, K. M., Konopka, A. E., & Watson, D. G. (2018). Assessing priming for prosodic representations: Speaking rate, intonational phrase boundaries, and pitch accenting. Memory & Cognition, 46(4), 625-641.


Thanks a lot to the Nick-o-Meter-Team for the synergetic discussions and the creative input!

Previous work on ClassAct:

turn-in for HackaTUM C0dev1d19