2025-07-03 | MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real | Renhao Wang et al. | 2507.02864v1 | null
2025-07-03 | Point3R: Streaming 3D Reconstruction with Explicit Spatial Pointer Memory | Yuqi Wu et al. | 2507.02863v1 | null
2025-07-03 | LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans | Zhening Huang et al. | 2507.02861v1 | null
2025-07-03 | RefTok: Reference-Based Tokenization for Video Generation | Xiang Fan et al. | 2507.02862v1 | null
2025-07-03 | Less is Enough: Training-Free Video Diffusion Acceleration via Runtime-Adaptive Caching | Xin Zhou et al. | 2507.02860v1 | null
2025-07-03 | Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation | Jiaer Xia et al. | 2507.02859v1 | null
2025-07-03 | AnyI2V: Animating Any Conditional Image with Motion Control | Ziye Li et al. | 2507.02857v1 | null
2025-07-03 | Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Ziqi Miao et al. | 2507.02844v1 | null
2025-07-03 | Neutrino mixing parameters and masses from $\Delta(96)\rtimes H_{CP}$ in the tri-direct CP approach | Li-Na Yan et al. | 2507.02840v1 | null
2025-07-03 | USAD: An Unsupervised Data Augmentation Spatio-Temporal Attention Diffusion Network | Ying Yu et al. | 2507.02827v1 | null
2025-07-03 | Confidence-driven Gradient Modulation for Multimodal Human Activity Recognition: A Dynamic Contrastive Dual-Path Learning Approach | Panpan Ji et al. | 2507.02826v1 | null
2025-07-03 | DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift | Po-Heng Chou et al. | 2507.02824v1 | null
2025-07-03 | LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion | Fangfu Liu et al. | 2507.02813v1 | null
2025-07-03 | HyperGaussians: High-Dimensional Gaussian Splatting for High-Fidelity Animatable Face Avatars | Gent Serifi et al. | 2507.02803v1 | null
2025-07-03 | AREE-Based Decoupled Design of Hybrid Beamformers in mmWave XL-MIMO Systems | Jiazhe Li et al. | 2507.02802v1 | null
2025-07-03 | Time-Masked Transformers with Lightweight Test-Time Adaptation for Neural Speech Decoding | Ebrahim Feghhi et al. | 2507.02800v1 | null
2025-07-03 | No time to train! Training-Free Reference-Based Instance Segmentation | Miguel Espinosa et al. | 2507.02798v1 | null
2025-07-03 | A Highly Carbon-Rich Dayside and Disequilibrium Chemistry in the Ultra-Hot Jupiter WASP-19b | Suman Saha et al. | 2507.02797v1 | null
2025-07-03 | Ultrafast optical excitation of magnons in 2D antiferromagnets via spin torque exerted by photocurrent of excitons: Signatures in charge pumping and THz emission | Jalil Varela-Manjarres et al. | 2507.02793v1 | null
2025-07-03 | RichControl: Structure- and Appearance-Rich Training-Free Spatial Control for Text-to-Image Generation | Liheng Zhang et al. | 2507.02792v1 | null
2025-07-03 | From Long Videos to Engaging Clips: A Human-Inspired Video Editing Framework with Multimodal Narrative Understanding | Xiangfeng Wang et al. | 2507.02790v1 | null
2025-07-03 | From Pixels to Damage Severity: Estimating Earthquake Impacts Using Semantic Segmentation of Social Media Images | Danrong Zhang et al. | 2507.02781v1 | null
2025-07-03 | Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs | Ken Tsui et al. | 2507.02778v1 | null
2025-07-03 | Grounding Intelligence in Movement | Melanie Segado et al. | 2507.02771v1 | null
2025-07-03 | Fast and Simplex: 2-Simplicial Attention in Triton | Aurko Roy et al. | 2507.02754v1 | null
2025-07-03 | Partial Weakly-Supervised Oriented Object Detection | Mingxin Liu et al. | 2507.02751v1 | null
2025-07-03 | Linear Attention with Global Context: A Multipole Attention Mechanism for Vision and Physics | Alex Colagrande et al. | 2507.02748v1 | null
2025-07-03 | DexVLG: Dexterous Vision-Language-Grasp Model at Scale | Jiawei He et al. | 2507.02747v1 | null
2025-07-03 | Prompt learning with bounding box constraints for medical image segmentation | Mélanie Gaillochet et al. | 2507.02743v1 | null
2025-07-03 | Leveraging Transformer Models to Capture Multi-Scale Dynamics in Biomolecules by nano-GPT | Wenqi Zeng et al. | 2507.02734v1 | null