#llm

2 posts found

DeepSeek V4: 1.6T MoE Model with 1M Context on EU Server
ai · deepseek · llm


DeepSeek V4 launches Pro (1.6T) and Flash (284B) MoE models with 1M token context, hybrid attention architecture, and three reasoning modes for EU self-hosting.

TurboQuant: Google's AI Compression That Now Runs on CPU
ai · compression · quantization · llm · google · cloud


Google's TurboQuant achieves 6x KV cache compression with no measurable accuracy loss, making AI inference on ordinary CPUs practical in production.