Qwen 3.5 Flash

Qwen 3.5 Flash is Alibaba's production-hosted multimodal model built on a hybrid linear-attention MoE architecture, offering a context window of 1M tokens and sub-second responsiveness for high-throughput agentic workloads.

Vision (Image)Explicit CachingFile InputReasoningTool Use

index.ts

import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3.5-flash',
  prompt: 'Why is the sky blue?'
})

Overview Playground About Providers Throughput Latency Uptime Status Similar FAQ

More models by Alibaba

Model

Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Capabilities	Providers	ZDR	No Training	Release Date

2.6s

68tps

$1.25/M

$3.75/M

Read:$0.25/M

Write:$1.56/M

—

05/21/2026

240K

2.5s

52tps

$1.30/M

$7.80/M

Read:

$0.26/M

Write:

$1.63/M

—

04/20/2026

0.7s

107tps

$0.50/M

$3.00/M

Read:

$0.1/M

Write:

$0.63/M

—

04/02/2026

1.2s

110tps

$0.40/M

$2.40/M

Read:

$0.04/M

Write:

$0.5/M

—

02/16/2026

256K

2.0s

87tps

$0.50/M

$1.20/M

—

07/22/2025

33K

$0.02/M

—

06/05/2025

AI Cloud

Core Platform

Security

Company

Learn

Open Source

Use Cases

Tools

Users

Qwen 3.5 Flash

More models by Alibaba