CVPR-2024-Papers

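Official website: https://cvpr.thecvf.com/

Workshops 🔔: June 17-18

Main conference 🔔: June 19-21

Click here for the categorized collection of survey papers from past years ↘️CV-Surveys (under construction)

Click here for the categorized 2024 paper collections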

↘️WACV-2024-Papers ↘️CVPR-2024-Papers ↘️ECCV-2024-Papers

Click here for the categorized 2023 paper collections

↘️CVPR-2023-Papers ↘️WACV-2023-Papers ↘️ICCV-2023-Papers

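Click here for the categorized 2022 paper collections

Click here for the categorized 2021 paper collections

Click here for the categorized 2020 paper collections

💥💥💥 All accepted papers have been added and fully categorized!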

🏆Best Papers

🏅Best Paper Runners-Up

🥇Best Student Papers

🥈Best Student Paper Runners-Up

🐱	🐶	🐯	🐺
1.Other	2.Image Segmentation	3.Image Classification	4.Image/Video Super-Resolution
5.Image/Video Compression	6.Image/Video Captioning	7.Image Processing	8.Image Synthesis
9.Face	10.Medical Image Processing	11.3D	12.Video
13.HPE(Human Pose Estimation)	14.HAR(Human Action Recognition/Detection)	15.Object Detection	16.Point Cloud
17.Automated Driving	18.SLAM/AR/VR/Robotics	19.Object Pose Estimation	20.Optical Flow Estimation
21.Few/Zero-Shot Learning/DG/A(Domain Generalization/Adaptation)	22.Deepfake Detection	23.Sound(Speech Processing)	24.ML(Machine Learning)
25.Object Tracking	26.Information Security	27.Vision-Language	28.UAV/Remote Sensing/Satellite Image
29.MC/KD/Pruning(Model Compression/Knowledge Distillation/Pruning)	30.Person Re-Id(Person Re-Identification)	31.Edge Detection	32.NLP(Natural Language Processing)
33.NeRF	34.Human–Computer Interaction	35.Scene Understanding	36.4D Reconstruction
37.OCR	38.VQA(Visual Question Answering)	39.Motion Generation	40.Scene Graph Generation
41.Graph Generative Network(GNN/GCN)	42.Image Retrieval	43.Image Matching	44.Image Fusion
45.NAS(Neural Architecture Search)	46.Industrial Anomaly Detection	47.Dense Predictions	48.Semi/Self-Supervised Learning
49.Dataset	50.OOD Detection	51.Style Transfer	52.Biomedical
53.Light-Field	54.ViT	55.REC(Referring Expression Comprehension)	56.Visual Emotion Recognition
57.Visual Relationship Detection	58.Fisheye Images	59.Clustering	60.Sketch
61.Gaze	62.All-in-One

62.All-in-One

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition
⭐code
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond

61.Gaze

60.Sketch

59.Clustering

58.Fisheye Images

Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption

57.Visual Relationship Detection

Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
⭐code

56.Visual Emotion Recognition

EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
⭐code (visual emotion understanding)
Multimodal intent recognition
- Contextual Augmented Global Contrast for Multimodal Intent Recognition

55.Referring Expression Comprehension

54.Vision Transformers

53.Light-Field

52.Biomedical

51.Style Transfer

50.OOD Detection

49.Dataset

48.Semi/Self-Supervised Learning

Weakly supervised learning
- Partial-label learning
  - CroSel: Cross Selection of Confident Pseudo Labels for Partial-Label Learning (partial-label learning, a weakly supervised learning problem)
Semi-supervised learning
- Targeted Representation Alignment for Open-World Semi-Supervised Learning
- SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder
- CDMAD: Class-Distribution-Mismatch-Aware Debiasing for Class-Imbalanced Semi-Supervised Learning
- BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
- Positive-unlabeled learning
  - Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation (positive-unlabeled learning, an important branch of semi-supervised learning)
Self-supervised learning
Unsupervised learning
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos

47.Dense Predictions

46.Industrial Anomaly Detection

45.Neural Architecture Search

44.Image Fusion

43.Image Matching

XFeat: Accelerated Features for Lightweight Image Matching
🏠project
Image-text matching
- Composing Object Relations and Attributes for Image-Text Matching

42.Image Retrieval

41.Graph Generative Network(GNN/GCN)

40.Scene Graph Generation

39.Motion Generation

38.Visual Question Answering

37.OCR

Scene text recognition
Scene text image synthesis
- Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Scene text understanding
- LayoutFormer: Hierarchical Text Detection Towards Scene Text Understanding
Chemical structure recognition
- Atom-Level Optical Chemical Structure Recognition with Limited Supervision
  ⭐code
Document chromaticity detection
- CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images
  ⭐code
Text detection
Document understanding
- LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
  ⭐code
- HRVDA: High-Resolution Visual Document Assistant
Font generation
- Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models

36.4D Reconstruction

Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
🏠project
Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking
🏠project
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
⭐code
🏠project
Text- and image-guided 4D scene generation
- A Unified Approach for Text- and Image-guided 4D Scene Generation
  🏠project
4D view synthesis
- 4K4D: Real-Time 4D View Synthesis at 4K Resolution
  ⭐code
  🏠project
Language-to-4D modeling
- L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
  ⭐code

35.Scene Understanding

34.Human–Computer Interaction

33.NeRF

32.NLP(Natural Language Processing)

31.Edge Detection

30.Person Re-Identification

29.Model Compression/Knowledge Distillation/Pruning

28.UAV/Remote Sensing/Satellite Image

27.Vision-Language

26.Information Security

25.Object Tracking

24.Machine Learning

23.Sound

Hearing Anything Anywhere
🏠project
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
AV-RIR: Audio-Visual Room Impulse Response Estimation
📺video
DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
⭐code
Weakly-Supervised Audio-Visual Video Parsing with Prototype-based Pseudo-Labeling
Audio-visual conversation
- The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective
  🏠project
Audio-visual navigation
- RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation
Audio-visual segmentation
- Audio-Visual Segmentation via Unlabeled Frame Exploitation
- Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
- Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
- Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
  ⭐code
- Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
  ⭐code
Speech recognition
- A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
  ⭐code
Sound source localization
- Learning to Visually Localize Sound Sources from Mixtures without Prior Source Knowledge
  ⭐code
Audio-visual speech representation learning
- ES³: Evolving Self-Supervised Learning of Robust Audio-Visual Speech Representations
  🏠project
  👍VILP
Text-guided sound source localization
- T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Music synthesis from image and language cues
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Binaural audio generation and localization
- Cyclic Learning for Binaural Audio Generation and Localization
Video and audio synchronization
- DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Audio-visual representation learning
- Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Speaker detection
- LoCoNet: Long-Short Context Network for Active Speaker Detection
  ⭐code
Audio description
- MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
  🏠project
Audio-visual speech translation
- AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation
  🏠project

22.Deepfake Detection

21.Few/Zero-Shot Learning/DG/A(Domain Generalization/Adaptation)

20.Optical Flow Estimation

19.Object Pose Estimation

18.SLAM/AR/VR/Robotics

17.Automated Driving

16.Point Cloud

15.Object Detection

14.Human Action Recognition

13.Human Pose Estimation

12.Video

11.3D

10.Medical Image Processing

9.Face

8.GAN/Image Synthesis

7.Image Processing

6.Image/Video Captioning

Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
⭐code
🏠project
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
⭐code
MeaCap: Memory-Augmented Zero-shot Image Captioning
⭐code
Sieve: Multimodal Dataset Pruning using Image Captioning Models
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Video description/captioning
Dense captioning
- A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
- DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Illustrated instruction generation
- Generating Illustrated Instructions
  ⭐code
  🏠project
