LLaVA-OneVision-2: Towards Next-Generation Perceptual Intelligence Paper • 2605.25979 • Published 8 days ago
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation Paper • 2605.23271 • Published 11 days ago
GEM: Generative Supervision Helps Embodied Intelligence Paper • 2605.28548 • Published 6 days ago
ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation Paper • 2605.28293 • Published 6 days ago
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 6 days ago
SpatialBench: Is Your Spatial Foundation Model an All-Round Player? Paper • 2605.27367 • Published 7 days ago
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 7 days ago
MolmoAct2: Action Reasoning Models for Real-world Deployment Paper • 2605.02881 • Published 29 days ago
Generated Reality: Human-centric World Simulation using Interactive Video Generation with Hand and Camera Control Paper • 2602.18422 • Published Feb 20
RISE: Self-Improving Robot Policy with Compositional World Model Paper • 2602.11075 • Published Feb 11
GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning Paper • 2602.12099 • Published Feb 12
Xiaomi-Robotics-0: An Open-Sourced Vision-Language-Action Model with Real-Time Execution Paper • 2602.12684 • Published Feb 13
RLinf-Co: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models Paper • 2602.12628 • Published Feb 13
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence Paper • 2602.08683 • Published Feb 9
Olaf-World: Orienting Latent Actions for Video World Modeling Paper • 2602.10104 • Published Feb 10
WorldCompass: Reinforcement Learning for Long-Horizon World Models Paper • 2602.09022 • Published Feb 9
Recurrent-Depth VLA: Implicit Test-Time Compute Scaling of Vision-Language-Action Models via Latent Iterative Reasoning Paper • 2602.07845 • Published Feb 8
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents Paper • 2602.02474 • Published Feb 2