<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>DPM-TSE: A DIFFUSION PROBABILISTIC MODEL FOR TARGET SOUND EXTRACTION</title>
<style>
body {
margin: 0 auto;
width: 80%; /* You can adjust this value to control the page width */
}
.results-container {
overflow-x: auto;
margin: 0 auto;
}
</style>
</head>
<body>
<div align="center">
<h1>DPM-TSE: A DIFFUSION PROBABILISTIC MODEL FOR TARGET SOUND EXTRACTION</h1>
<div>
<p>Jiarui Hai, Helin Wang, Dongchao Yang, Karan Thakkar, Najim Dehak, Mounya Elhilali</p>
</div>
<a href="https://github.com/haidog-yaqub/DPMTSE/tree/main">Repository</a> | <a href="[ArticleLink]">Article</a>
</div>
<h2>Abstract</h2>
<p style="font-size: 20px;">Common target sound extraction (TSE) approaches have primarily relied on discriminative models that separate the target sound while minimizing interference from unwanted sources, with varying success in separating the target from the background. This study introduces DPM-TSE, the first generative method for target sound extraction based on diffusion probabilistic modeling (DPM), to achieve both cleaner target renderings and improved separability from unwanted sounds. The technique also tackles the background-noise issues common in DPM outputs by introducing a correction method for noise schedules and sampling steps. The approach is evaluated with both objective and subjective quality metrics on the FSD Kaggle 2018 dataset. The results show that DPM-TSE significantly improves perceived quality in terms of target extraction and purity.</p>
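<p>This page does not spell out the noise-schedule correction. As a hedged illustration only, the sketch below assumes it resembles the zero-terminal-SNR rescaling and "trailing" sampling-step spacing described by Lin et al. in "Common Diffusion Noise Schedules and Sample Steps Are Flawed". The function names <code>linear_betas</code>, <code>rescale_zero_terminal_snr</code> and <code>trailing_timesteps</code>, and the linear schedule defaults, are hypothetical and not taken from the DPM-TSE code.</p>
<pre><code class="language-python">import math


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule (illustrative defaults only)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def rescale_zero_terminal_snr(betas):
    """Rescale betas so the last step has zero SNR (assumed correction)."""
    # sqrt of the cumulative product of alphas, i.e. sqrt(alpha_bar_t).
    sqrt_alpha_bar = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        sqrt_alpha_bar.append(math.sqrt(prod))
    first, last = sqrt_alpha_bar[0], sqrt_alpha_bar[-1]
    # Shift so the last value becomes 0, then scale so the first is unchanged.
    scaled = [(s - last) * first / (first - last) for s in sqrt_alpha_bar]
    # Convert back from alpha_bar to per-step betas.
    alpha_bar = [s * s for s in scaled]
    alphas = [alpha_bar[0]] + [
        alpha_bar[t] / alpha_bar[t - 1] for t in range(1, len(alpha_bar))
    ]
    return [1.0 - a for a in alphas]


def trailing_timesteps(num_train_steps, num_inference_steps):
    """'Trailing' spacing: sampling always starts at the last training step."""
    ratio = num_train_steps / num_inference_steps
    return [round(num_train_steps - i * ratio) - 1 for i in range(num_inference_steps)]


if __name__ == "__main__":
    betas = rescale_zero_terminal_snr(linear_betas())
    print(betas[-1])                     # 1.0: pure noise at the final step
    print(trailing_timesteps(1000, 10))  # [999, 899, ..., 99]
</code></pre>
<p>Under this assumption, the rescaled schedule reaches pure noise (zero SNR) at the final training step, and trailing spacing makes inference start from that same step, so sampling no longer begins from a state the model never saw during training.</p>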
<h2>Model Framework</h2>
<div align="center">
<img src="./media/figures/model.jpeg" alt="DPM-TSE model framework" width="500" height="500" />
</div>
<h2>Results</h2>
<div class="results-container">
<table>
<tr>
<th>Mixture</th>
<th>Target Sound (GT)</th>
<th>DPM-TSE (Ours)</th>
<th>TSENET</th>
<th>WaveFormer</th>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Applause_test_1082.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Applause_test_1082.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Applause_test_1082.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Applause_test_1082.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Applause_test_1082.wav" type="audio/wav"></audio></td>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Bark_test_625.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Bark_test_625.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Bark_test_625.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Bark_test_625.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Bark_test_625.wav" type="audio/wav"></audio></td>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Harmonica_test_1423.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Harmonica_test_1423.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Harmonica_test_1423.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Harmonica_test_1423.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Harmonica_test_1423.wav" type="audio/wav"></audio></td>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Meow_test_4.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Meow_test_4.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Meow_test_4.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Meow_test_4.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Meow_test_4.wav" type="audio/wav"></audio></td>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Shatter_test_924.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Shatter_test_924.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Shatter_test_924.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Shatter_test_924.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Shatter_test_924.wav" type="audio/wav"></audio></td>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Snare_drum_test_844.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Snare_drum_test_844.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Snare_drum_test_844.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Snare_drum_test_844.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Snare_drum_test_844.wav" type="audio/wav"></audio></td>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Squeak_test_797.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Squeak_test_797.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Squeak_test_797.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Squeak_test_797.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Squeak_test_797.wav" type="audio/wav"></audio></td>
</tr>
<tr>
<td><audio controls><source src="./media/mixture/Writing_test_1374.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/gt/Writing_test_1374.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/dpm/Writing_test_1374.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/tsenet/Writing_test_1374.wav" type="audio/wav"></audio></td>
<td><audio controls><source src="./media/waveformer/Writing_test_1374.wav" type="audio/wav"></audio></td>
</tr>
</table>
</div>
</body>
</html>