Research
I have strong interests in:
- A. Image, Video, and 3D generation with diffusion and auto-regressive models.
- B. Robot imitation learning from videos for general policy generation.
Representative papers are highlighted.
|
Updates
- [Feb 26, 2025] Our work on multimodal computational photography has been accepted to CVPR 2025.
- [Jan 22, 2025] Our second attempt at generative field modeling has been accepted to ICLR 2025.
- [Jan 16, 2025] Defended my PhD thesis. I am a Doctor now!!!
- [Jan 15, 2025] I gave a talk (in Chinese) about my recent efficient image generation work at TechBeat.
|
|
The Power of Context: How Multimodality Improves Image Super-Resolution
Kangfu Mei,
Hossein Talebi,
Mojtaba Ardakani,
Vishal M. Patel,
Peyman Milanfar,
Mauricio Delbracio
CVPR 2025
PDF /
project page
A novel approach that leverages the rich contextual information available in multiple modalities, including depth, segmentation, edges, and text prompts, to learn a powerful generative prior for single-image super-resolution (SISR) within a diffusion model framework.
|
|
Field-DiT: Diffusion Transformer on Unified Video, 3D, and Game Field Generation
Kangfu Mei,
Mo Zhou,
Vishal M. Patel
ICLR 2025
project page / OpenReview / Talk / Code
A generative field model that can generate image, video, and 3D data in a single unified network architecture. Through multi-task learning, the unification boosts video generation with a 3D prior, at a model size 10 times smaller than Sora.
|
|
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei,
Zhengzhong Tu,
Mauricio Delbracio,
Hossein Talebi,
Vishal M. Patel,
Peyman Milanfar
TMLR 2024
arXiv
The first work that thoroughly investigates the scaling properties of the currently popular latent diffusion models (e.g., the representative Stable Diffusion).
|
|
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei,
Mauricio Delbracio,
Hossein Talebi,
Zhengzhong Tu,
Vishal M. Patel,
Peyman Milanfar
CVPR 2024
project page /
arXiv
Faster conditional diffusion that produces high-quality images with 1-4 sampling steps.
|
|
VIDM: Video Implicit Diffusion Models
Kangfu Mei,
Vishal M. Patel
AAAI, 2023 (Oral Presentation)
project page /
arXiv
Diffusion models for video generation that use an implicit motion condition.
|
|
Deep Semantic Statistics Matching (D2SM) Denoising Network
Kangfu Mei,
Vishal M. Patel,
Rui Huang
ECCV, 2022
project page /
arXiv /
poster
A new general plug-and-play component for denoising.
|
|
Latent Feature-Guided Diffusion Models for Shadow Removal
Kangfu Mei, Luis Figueroa, Zhe Lin, Zhihong Ding, Scott Cohen,
Vishal M. Patel
WACV, 2023
project page /
code /
demo /
arXiv
We (together with Adobe) conducted this very early exploration of applying diffusion models to shadow removal. This work also proposes the first instance-level shadow removal task.
|
|
LTT-GAN: Looking Through Turbulence by Inverting GANs
Kangfu Mei,
Vishal M. Patel
IEEE Journal of Selected Topics in Signal Processing [IF: 7.695]
arXiv
The first turbulence mitigation algorithm that can clearly recover face images captured at a range of 300 meters.
|
|
AttaNet: Attention-Augmented Network for Fast and Accurate Scene Parsing
Qi Song, Kangfu Mei,
Rui Huang
AAAI, 2021
code /
arXiv
Strip Attention Module (SAM) and Attention Fusion Module (AFM) are proposed for enhancing
the accuracy of semantic segmentation networks with limited computational complexity.
|
|
Higher-resolution network for image demosaicing and enhancing
Kangfu Mei, Juncheng Li, Jiajie Zhang, Haoyu Wu, Jie Li, Rui Huang
ICCV AIM RAW to RGB mapping challenge, 2019
code /
arXiv
For the first time, a neural ISP has outperformed a traditional ISP (like Huawei's mobile ISP) and achieved visual quality comparable to that of a DSLR.
|
|
Progressive Feature Fusion Network for Realistic Image Dehazing
Kangfu Mei, Aiwen Jiang, Juncheng Li, Mingwen Wang
ACCV, 2019
code /
arXiv
PFFNet was the first dehazing network to use a fully end-to-end neural network architecture without a physical gating unit. More than 100 works (as of 2024) have adopted our strategy and shown its effectiveness.
|
|
Multi-scale Residual Network for Image Super-resolution
Juncheng Li, Faming Fang,
Kangfu Mei, Guixu Zhang
ECCV, 2018
code /
bibtex
We built MSRN in 2018. It quickly became a fundamental component of image restoration works and has more than 800 citations as of 2024.
|
Service
Reviewer for CVPR, ICCV, ECCV, WACV, ICPR
Reviewer for International Journal of Computer Vision (IJCV)
Reviewer for IEEE Transactions on Image Processing (TIP)
Reviewer for IEEE Transactions on Multimedia (TMM)
Reviewer for IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
Reviewer for Computer Vision and Image Understanding (CVIU)
|
Prizes
AIM2019 Mobile Raw to DSLR RGB Image Mapping Challenge (ICCV2019 Workshop):
Top 1
Alibaba Youku Video Enhancement and Super-Resolution Challenge 2019: Top 4
NTIRE2018 Image Dehazing Challenge (CVPR2018 Workshop): Honorable Mention Award & Top 6
University Computer Software Programming Challenge 2018 in the Pearl River Delta: Gold Award & Best Innovative Award
|
|