From d493aab19289067314b075dc34b977df242d1477 Mon Sep 17 00:00:00 2001
From: Fernando Alva-Manchego
Date: Mon, 26 Feb 2024 12:12:13 +0000
Subject: [PATCH] Update abstract

---
 content/event/2024-02-29/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/content/event/2024-02-29/index.md b/content/event/2024-02-29/index.md
index 7d65356..a4b354d 100644
--- a/content/event/2024-02-29/index.md
+++ b/content/event/2024-02-29/index.md
@@ -12,7 +12,7 @@ location: Abacws
 # postcode:
 # country:
 summary: Talk by [Hosein Mohebbi](https://hmohebbi.github.io/) (Tilburg University, Netherlands)
-abstract: "In both text and speech processing, variants of the Transformer architecture have become ubiquitous. The key advantage of this neural network topology lies in the modeling of pairwise relations between elements of the input (tokens): the representation of a token at a particular Transformer layer is a function of the weighted sum of the transformed representations of all the tokens in the previous layer. This feature of Transformers is known as 'context mixing' and understanding how it functions in specific model layers is crucial for tracing the overall information flow. In this talk, I will first introduce Value Zeroing, as measure of context mixing, and show that the token importance scores obtained through Value Zeroing offer better interpretations compared to previous analysis methods in terms of plausibility, faithfulness, and agreement with probing. Next, by applying Value Zeroing to models of spoken language, we will see how patterns of context mixing can reveal striking differences between the behavior of encoder-only and encoder-decoder speech Transformers."
+abstract: "In both text and speech processing, variants of the Transformer architecture have become ubiquitous. The key advantage of this neural network topology lies in the modeling of pairwise relations between elements of the input (tokens): the representation of a token at a particular Transformer layer is a function of the weighted sum of the transformed representations of all the tokens in the previous layer. This feature of Transformers is known as \'context mixing\' and understanding how it functions in specific model layers is crucial for tracing the overall information flow. In this talk, I will first introduce Value Zeroing, as measure of context mixing, and show that the token importance scores obtained through Value Zeroing offer better interpretations compared to previous analysis methods in terms of plausibility, faithfulness, and agreement with probing. Next, by applying Value Zeroing to models of spoken language, we will see how patterns of context mixing can reveal striking differences between the behavior of encoder-only and encoder-decoder speech Transformers."
 
 # Talk start and end times.
 # End time can optionally be hidden by prefixing the line with `#`.