  1. Qwen-VL: A Versatile Vision-Language Model for Understanding ...

    Sep 19, 2023 · In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models (LVLMs) designed to perceive and understand both texts and images. Starting from the …

  2. Gated Attention for Large Language Models: Non-linearity, Sparsity,...

    Sep 18, 2025 · The authors respond that they will add experiments on the Qwen architecture, provide the hyperparameters, and promise to open-source one of the models. Reviewer bMKL is the only …

  3. LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation

    Jan 22, 2025 · Superior Performance: LLaVA-MoD surpasses larger models like Qwen-VL-Chat-7B on various benchmarks, demonstrating the effectiveness of its knowledge distillation approach.

  4. MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context ...

    Jan 22, 2025 · (a) Summary of Scientific Claims and Findings The paper presents MagicDec, a speculative decoding technique aimed at improving throughput and reducing latency for long-context …

  5. In this paper, we explore a way out and present the newest members of the open-sourced Qwen families: Qwen-VL series. Qwen-VLs are a series of highly performant and versatile vision-language …

  6. ADIFF: Explaining audio difference using natural language

    Jan 22, 2025 · We evaluate our model using objective metrics and human evaluation, and show that our model enhancements lead to significant improvements in performance over the naive baseline and SoTA …

  7. LiveVQA: Assessing Models with Live Visual Knowledge

    May 6, 2025 · We introduce LiveVQA, an automatically collected dataset of the latest visual knowledge from the Internet with synthesized VQA problems. LiveVQA consists of 3,602 single- and multi-hop visual …

  8. Towards Interpretable Time Series Foundation Models - OpenReview

    Jun 9, 2025 · Leveraging a synthetic dataset of mean-reverting time series with systematically varied trends and noise levels, we generate natural language annotations using a large multimodal model …

  9. pi-Flow: Policy-Based Few-Step Generation via Imitation Distillation

    Sep 1, 2025 · Few-step diffusion or flow-based generative models typically distill a velocity-predicting teacher into a student that predicts a shortcut towards denoised data. This format mismatch has led to …

  10. Towards Understanding Distilled Reasoning Models: A...

    Mar 5, 2025 · To explore this, we train a crosscoder on Qwen-series models and their fine-tuned variants. Our results suggest that the crosscoder learns features corresponding to various types of …