<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="description"
content="Palm: Predicting Actions through Language Models">
<meta name="keywords" content="ldmoa">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Palm: Predicting Actions through Language Models</title>
<link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
rel="stylesheet">
<link rel="stylesheet" href="./static/css/bulma.min.css">
<link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
<link rel="stylesheet" href="./static/css/bulma-slider.min.css">
<link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
<link rel="stylesheet" href="./static/css/index.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
<script defer src="./static/js/fontawesome.all.min.js"></script>
<script src="./static/js/bulma-carousel.min.js"></script>
<script src="./static/js/bulma-slider.min.js"></script>
<script src="./static/js/index.js"></script>
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-72PW1FZDE4"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag(){dataLayer.push(arguments);}
gtag('js', new Date());
gtag('config', 'G-72PW1FZDE4');
</script>
</head>
<body>
<section class="hero">
<div class="hero-body">
<div class="container is-max-desktop">
<div class="columns is-centered">
<div class="column has-text-centered">
<h1 class="title is-1 publication-title">Palm: Predicting Actions through Language Models</h1>
<div class="is-size-5 publication-authors">
<span class="author-block">
<a>Daoji Huang</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://ait.ethz.ch/people/hilliges">Otmar Hilliges</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://ee.ethz.ch/the-department/faculty/professors/person-detail.OTAyMzM=.TGlzdC80MTEsMTA1ODA0MjU5.html/">Luc Van Gool</a><sup>1</sup>,</span>
<span class="author-block">
<a href="https://ait.ethz.ch/people/xiwang">Xi Wang</a><sup>1</sup>
</span>
</div>
<div class="is-size-5 publication-authors">
<span class="author-block"><sup>1</sup>ETH Zurich, Switzerland</span>
<br>
<span class="author-block"> <b>In tbd, 2023.</b></span>
</div>
<div class="column has-text-centered">
<div class="publication-links">
<!-- PDF Link. -->
<span class="link-block">
<a href="https://arxiv.org/abs/2306.16545"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-pdf"></i>
</span>
<span>Paper</span>
</a>
</span>
<span class="link-block">
<a href="https://github.com/DanDoge/Palm"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fas fa-file-code"></i>
</span>
<span>Code</span>
</a>
</span>
<!-- Video Link. -->
<!-- <span class="link-block">
<a href="https://youtu.be/YB1_xKlueUI"
class="external-link button is-normal is-rounded is-dark">
<span class="icon">
<i class="fab fa-youtube"></i>
</span>
<span>Video</span>
</a>
</span> -->
</div>
</div>
</div>
</div>
</div>
</div>
</section>
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<div class="diagram">
<img src="static/images/palm_teaser.png" alt="Overview" height="600" width="600" />
</div>
</div>
</div>
<section class="section">
<div class="container is-max-desktop">
<!-- Abstract. -->
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Abstract</h2>
<div class="content has-text-justified">
<p>
We present Palm, a solution to the Long-Term Action Anticipation (LTA) task that uses vision-language and large language models. Given an input video with annotated action periods, the LTA task is to predict possible future actions. We hypothesize that an optimal solution should capture the interdependency between past and future actions and infer future actions from the structure and dependencies encoded in the past actions. Large language models have demonstrated remarkable commonsense reasoning ability. Inspired by this, Palm chains an image captioning model and a large language model: it predicts future actions from frame descriptions and action labels extracted from the input videos. Our method outperforms other participants in the EGO4D LTA challenge and achieves the best performance in terms of action prediction. Our code is available at <a href="https://github.com/DanDoge/Palm">https://github.com/DanDoge/Palm</a>.
</p>
</div>
</div>
</div>
</div>
</section>
<section class="section_video">
<div class="container is-max-desktop">
<div class="columns is-centered has-text-centered">
<div class="column is-four-fifths">
<h2 class="title is-3">Video</h2>
<div class="publication-video">
<iframe class="embed-responsive-item" width="960" height="520" src="./static/images/lta-video-trimmed.mp4" frameborder="0" allowfullscreen></iframe>
</div>
<br/>
<h2 class="subtitle has-text-centered">
We present Palm, a solution to the Long-Term Action Anticipation (LTA) task utilizing vision-language and large language models.
</h2>
</div>
</div>
</div>
</section>
<section class="section" id="BibTeX">
<div class="container is-max-desktop content">
<h2 class="title">BibTeX</h2>
<pre><code>@inproceedings{tbd,
title={Palm: Predicting Actions through Language Models},
author={Huang, Daoji and Hilliges, Otmar and Van Gool, Luc and Wang, Xi},
booktitle={tbd},
year={2023}
}</code></pre>
</div>
</section>
</body>
</html>