Multi-modal

| Publish Date | Title | Authors | PDF | Code |
|---|---|---|---|---|
| 2025-07-03 | MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real | Renhao Wang et al. | 2507.02864v1 | null |
| 2025-07-03 | AnyI2V: Animating Any Conditional Image with Motion Control | Ziye Li et al. | 2507.02857v1 | null |
| 2025-07-03 | Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection | Ziqi Miao et al. | 2507.02844v1 | null |
| 2025-07-03 | Confidence-driven Gradient Modulation for Multimodal Human Activity Recognition: A Dynamic Contrastive Dual-Path Learning Approach | Panpan Ji et al. | 2507.02826v1 | null |
| 2025-07-03 | LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion | Fangfu Liu et al. | 2507.02813v1 | null |
| 2025-07-03 | Quasinormal modes of Floquet media slabs | Benjamin Vial et al. | 2507.02784v1 | null |
| 2025-07-03 | Grounding Intelligence in Movement | Melanie Segado et al. | 2507.02771v1 | null |
| 2025-07-03 | DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Ke-Han Lu et al. | 2507.02768v1 | null |
| 2025-07-03 | A Proof-Theoretic View of Basic Intuitionistic Conditional Logic (Extended Version) | Tiziano Dalmonte et al. | 2507.02767v1 | null |
| 2025-07-03 | Hierarchical Multi-Label Contrastive Learning for Protein-Protein Interaction Prediction Across Organisms | Shiyi Liu et al. | 2507.02724v1 | null |
| 2025-07-03 | Optimizing Start Locations in Ergodic Search for Disaster Response | Ananya Rao et al. | 2507.02708v1 | null |
| 2025-07-03 | Decoupled Planning and Execution: A Hierarchical Reasoning Framework for Deep Search | Jiajie Jin et al. | 2507.02652v1 | null |
| 2025-07-03 | Medical Data Pecking: A Context-Aware Approach for Automated Quality Evaluation of Structured Medical Data | Irena Girshovitz et al. | 2507.02628v1 | null |
| 2025-07-03 | Structure-aware Semantic Discrepancy and Consistency for 3D Medical Image Self-supervised Learning | Tan Pan et al. | 2507.02581v1 | null |
| 2025-07-03 | MedFormer: Hierarchical Medical Vision Transformer with Content-Aware Dual Sparse Selection Attention | Zunhui Xia et al. | 2507.02488v1 | null |
| 2025-07-03 | Prompt Disentanglement via Language Guidance and Representation Alignment for Domain Generalization | De Cheng et al. | 2507.02288v1 | null |
| 2025-07-03 | NLP4Neuro: Sequence-to-sequence learning for neural population decoding | Jacob J. Morra et al. | 2507.02264v1 | null |
| 2025-07-02 | Team RAS in 9th ABAW Competition: Multimodal Compound Expression Recognition Approach | Elena Ryumina et al. | 2507.02205v1 | null |
| 2025-07-02 | PAL: Designing Conversational Agents as Scalable, Cooperative Patient Simulators for Palliative-Care Training | Neil K. R. Sehgal et al. | 2507.02122v1 | null |
| 2025-07-02 | The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems | Reza Yousefi Maragheh et al. | 2507.02097v1 | null |
| 2025-07-02 | Energy-Based Transformers are Scalable Learners and Thinkers | Alexi Gladstone et al. | 2507.02092v1 | null |
| 2025-07-02 | TAGF: Time-aware Gated Fusion for Multimodal Valence-Arousal Estimation | Yubeen Lee et al. | 2507.02080v1 | null |
| 2025-07-02 | AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation | Sixiang Chen et al. | 2507.01961v2 | null |
| 2025-07-02 | IC-Custom: Diverse Image Customization via In-Context Learning | Yaowei Li et al. | 2507.01926v1 | null |
| 2025-07-02 | Modality-agnostic, patient-specific digital twins modeling temporally varying digestive motion | Jorge Tapias Gomez et al. | 2507.01909v2 | null |
| 2025-07-02 | Reasoning to Edit: Hypothetical Instruction-Based Image Editing with Visual Reasoning | Qingdong He et al. | 2507.01908v1 | null |
| 2025-07-02 | TypeTele: Releasing Dexterity in Teleoperation by Dexterous Manipulation Types | Yuhao Lin et al. | 2507.01857v1 | null |
| 2025-07-02 | How Do Vision-Language Models Process Conflicting Information Across Modalities? | Tianze Hua et al. | 2507.01790v1 | null |
| 2025-07-02 | DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy | Ming Dai et al. | 2507.01738v1 | null |
| 2025-07-02 | Token Communication in the Era of Large Models: An Information Bottleneck-Based Approach | Hao Wei et al. | 2507.01728v1 | null |