LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning

Overview of LL3DA.
Xin Chen
陈欣 | Senior Research Scientist

My research interests include large-scale video generation, audio-visual foundation models, and unified multi-modal models.