2025-07-03 |
Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory |
Yuqi Wu et al. |
2507.02863v1 |
null |
2025-07-03 |
LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans |
Zhening Huang et al. |
2507.02861v1 |
null |
2025-07-03 |
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation |
Jiaer Xia et al. |
2507.02859v1 |
null |
2025-07-03 |
AnyI2V: Animating Any Conditional Image with Motion Control |
Ziye Li et al. |
2507.02857v1 |
null |
2025-07-03 |
MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis |
Kunyu Zhang et al. |
2507.02847v1 |
null |
2025-07-03 |
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection |
Ziqi Miao et al. |
2507.02844v1 |
null |
2025-07-03 |
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion |
Fangfu Liu et al. |
2507.02813v1 |
null |
2025-07-03 |
Multimodal Mathematical Reasoning with Diverse Solving Perspective |
Wenhao Shi et al. |
2507.02804v1 |
null |
2025-07-03 |
No time to train! Training-Free Reference-Based Instance Segmentation |
Miguel Espinosa et al. |
2507.02798v1 |
null |
2025-07-03 |
RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation |
Liheng Zhang et al. |
2507.02792v1 |
null |
2025-07-03 |
From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images |
Danrong Zhang et al. |
2507.02781v1 |
null |
2025-07-03 |
Discovery and Preliminary Characterization of a Third Interstellar Object: 3I/ATLAS |
Darryl Z. Seligman et al. |
2507.02757v1 |
null |
2025-07-03 |
Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics |
Alex Colagrande et al. |
2507.02748v1 |
null |
2025-07-03 |
DexVLG: Dexterous Vision-Language-Grasp Model at Scale |
Jiawei He et al. |
2507.02747v1 |
null |
2025-07-03 |
Prompt learning with bounding box constraints for medical image segmentation |
Mélanie Gaillochet et al. |
2507.02743v1 |
null |
2025-07-03 |
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models |
Yuxuan Wang et al. |
2507.02714v1 |
null |
2025-07-03 |
UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation |
Qin Guo et al. |
2507.02713v1 |
null |
2025-07-03 |
SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment |
Qi Xu et al. |
2507.02705v1 |
null |
2025-07-03 |
APT: Adaptive Personalized Training for Diffusion Models with Limited Data |
JungWoo Chae et al. |
2507.02687v1 |
null |
2025-07-03 |
Learning few-step posterior samplers by unfolding and distillation of diffusion models |
Charlesquin Kemajou Mbakam et al. |
2507.02686v1 |
null |
2025-07-03 |
Real-time Image-based Lighting of Glints |
Tom Kneiphof et al. |
2507.02674v1 |
null |
2025-07-03 |
Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs |
Francesco Di Salvo et al. |
2507.02671v1 |
null |
2025-07-03 |
MEGANet-W: A Wavelet-Driven Edge-Guided Attention Framework for Weak Boundary Polyp Detection |
Zhe Yee Tan et al. |
2507.02668v1 |
null |
2025-07-03 |
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models |
Ziyin Zhou et al. |
2507.02664v1 |
null |
2025-07-03 |
Insights into Chromospheric Large-Scale Flows using Nobeyama 17 GHz Radio Observations I. The Differential Rotation Profile |
Srinjana Routh et al. |
2507.02630v1 |
null |
2025-07-03 |
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation |
François Rozet et al. |
2507.02608v1 |
null |
2025-07-03 |
Addressing Camera Sensors Faults in Vision-Based Navigation: Simulation and Dataset Development |
Riccardo Gallon et al. |
2507.02602v1 |
null |
2025-07-03 |
ArtGS: 3D Gaussian Splatting for Interactive Visual-Physical Modeling and Manipulation of Articulated Objects |
Qiaojun Yu et al. |
2507.02600v1 |
null |
2025-07-03 |
AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models |
Chenhao Xue et al. |
2507.02598v1 |
null |
2025-07-03 |
Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning |
Tan Pan et al. |
2507.02581v1 |
null |