Image Caption

Publish Date	Title	Authors	PDF	Code
2025-07-03	Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory	Yuqi Wu et al.	2507.02863v1	null
2025-07-03	LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans	Zhening Huang et al.	2507.02861v1	null
2025-07-03	Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation	Jiaer Xia et al.	2507.02859v1	null
2025-07-03	AnyI2V: Animating Any Conditional Image with Motion Control	Ziye Li et al.	2507.02857v1	null
2025-07-03	MvHo-IB: Multi-View Higher-Order Information Bottleneck for Brain Disorder Diagnosis	Kunyu Zhang et al.	2507.02847v1	null
2025-07-03	Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection	Ziqi Miao et al.	2507.02844v1	null
2025-07-03	LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion	Fangfu Liu et al.	2507.02813v1	null
2025-07-03	Multimodal Mathematical Reasoning with Diverse Solving Perspective	Wenhao Shi et al.	2507.02804v1	null
2025-07-03	No time to train! Training-Free Reference-Based Instance Segmentation	Miguel Espinosa et al.	2507.02798v1	null
2025-07-03	RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation	Liheng Zhang et al.	2507.02792v1	null
2025-07-03	From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images	Danrong Zhang et al.	2507.02781v1	null
2025-07-03	Discovery and Preliminary Characterization of a Third Interstellar Object: 3I/ATLAS	Darryl Z. Seligman et al.	2507.02757v1	null
2025-07-03	Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics	Alex Colagrande et al.	2507.02748v1	null
2025-07-03	DexVLG: Dexterous Vision-Language-Grasp Model at Scale	Jiawei He et al.	2507.02747v1	null
2025-07-03	Prompt learning with bounding box constraints for medical image segmentation	Mélanie Gaillochet et al.	2507.02743v1	null
2025-07-03	FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models	Yuxuan Wang et al.	2507.02714v1	null
2025-07-03	UniMC: Taming Diffusion Transformer for Unified Keypoint-Guided Multi-Class Image Generation	Qin Guo et al.	2507.02713v1	null
2025-07-03	SIU3R: Simultaneous Scene Understanding and 3D Reconstruction Beyond Feature Alignment	Qi Xu et al.	2507.02705v1	null
2025-07-03	APT: Adaptive Personalized Training for Diffusion Models with Limited Data	JungWoo Chae et al.	2507.02687v1	null
2025-07-03	Learning few-step posterior samplers by unfolding and distillation of diffusion models	Charlesquin Kemajou Mbakam et al.	2507.02686v1	null
2025-07-03	Real-time Image-based Lighting of Glints	Tom Kneiphof et al.	2507.02674v1	null
2025-07-03	Embedding-Based Federated Data Sharing via Differentially Private Conditional VAEs	Francesco Di Salvo et al.	2507.02671v1	null
2025-07-03	MEGANet-W: A Wavelet-Driven Edge-Guided Attention Framework for Weak Boundary Polyp Detection	Zhe Yee Tan et al.	2507.02668v1	null
2025-07-03	AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models	Ziyin Zhou et al.	2507.02664v1	null
2025-07-03	Insights into Chromospheric Large-Scale Flows using Nobeyama 17 GHz Radio Observations I. The Differential Rotation Profile	Srinjana Routh et al.	2507.02630v1	null
2025-07-03	Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation	François Rozet et al.	2507.02608v1	null
2025-07-03	Addressing Camera Sensors Faults in Vision-Based Navigation: Simulation and Dataset Development	Riccardo Gallon et al.	2507.02602v1	null
2025-07-03	ArtGS: 3D Gaussian Splatting for Interactive Visual-Physical Modeling and Manipulation of Articulated Objects	Qiaojun Yu et al.	2507.02600v1	null
2025-07-03	AC-Refiner: Efficient Arithmetic Circuit Optimization Using Conditional Diffusion Models	Chenhao Xue et al.	2507.02598v1	null
2025-07-03	Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning	Tan Pan et al.	2507.02581v1	null