2025-07-03 | MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real | Renhao Wang et al. | 2507.02864v1 | null
2025-07-03 | AnyI2V: Animating Any Conditional Image with Motion Control | Ziye Li et al. | 2507.02857v1 | null
2025-07-03 | Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Ziqi Miao et al. | 2507.02844v1 | null
2025-07-03 | Confidence-driven Gradient Modulation for Multimodal Human Activity Recognition: A Dynamic Contrastive Dual-Path Learning Approach | Panpan Ji et al. | 2507.02826v1 | null
2025-07-03 | LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion | Fangfu Liu et al. | 2507.02813v1 | null
2025-07-03 | Quasinormal modes of Floquet media slabs | Benjamin Vial et al. | 2507.02784v1 | null
2025-07-03 | Grounding Intelligence in Movement | Melanie Segado et al. | 2507.02771v1 | null
2025-07-03 | DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Ke-Han Lu et al. | 2507.02768v1 | null
2025-07-03 | A Proof-Theoretic View of Basic Intuitionistic Conditional Logic (Extended Version) | Tiziano Dalmonte et al. | 2507.02767v1 | null
2025-07-03 | Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms | Shiyi Liu et al. | 2507.02724v1 | null
2025-07-03 | Optimizing Start Locations in Ergodic Search for Disaster Response | Ananya Rao et al. | 2507.02708v1 | null
2025-07-03 | Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search | Jiajie Jin et al. | 2507.02652v1 | null
2025-07-03 | Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data | Irena Girshovitz et al. | 2507.02628v1 | null
2025-07-03 | Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning | Tan Pan et al. | 2507.02581v1 | null
2025-07-03 | MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention | Zunhui Xia et al. | 2507.02488v1 | null
2025-07-03 | Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization | De Cheng et al. | 2507.02288v1 | null
2025-07-03 | NLP4Neuro: Sequence-to-sequence learning for neural population decoding | Jacob J. Morra et al. | 2507.02264v1 | null
2025-07-02 | Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach | Elena Ryumina et al. | 2507.02205v1 | null
2025-07-02 | PAL: Designing Conversational Agents as Scalable, Cooperative Patient Simulators for Palliative-Care Training | Neil K. R. Sehgal et al. | 2507.02122v1 | null
2025-07-02 | The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems | Reza Yousefi Maragheh et al. | 2507.02097v1 | null
2025-07-02 | Energy-Based Transformers are Scalable Learners and Thinkers | Alexi Gladstone et al. | 2507.02092v1 | null
2025-07-02 | TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation | Yubeen Lee et al. | 2507.02080v1 | null
2025-07-02 | AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation | Sixiang Chen et al. | 2507.01961v2 | null
2025-07-02 | IC-Custom: Diverse Image Customization via In-Context Learning | Yaowei Li et al. | 2507.01926v1 | null
2025-07-02 | Modality-agnostic, patient-specific digital twins modeling temporally varying digestive motion | Jorge Tapias Gomez et al. | 2507.01909v2 | null
2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et al. | 2507.01908v1 | null
2025-07-02 | TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types | Yuhao Lin et al. | 2507.01857v1 | null
2025-07-02 | How Do Vision-Language Models Process Conflicting Information Across Modalities? | Tianze Hua et al. | 2507.01790v1 | null
2025-07-02 | DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | Ming Dai et al. | 2507.01738v1 | null
2025-07-02 | Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach | Hao Wei et al. | 2507.01728v1 | null