#llm

2 posts found

DeepSeek V4: 1.6T MoE Model with 1M Context on EU Server
ai · deepseek · llm


DeepSeek V4 launches Pro (1.6T) and Flash (284B) MoE models with 1M token context, hybrid attention architecture, and three reasoning modes for EU self-hosting.

TurboQuant: Google's AI Compression That Now Runs on CPU
ai · compression · quantization · llm · google · cloud


Google's TurboQuant achieves 6x KV cache compression with no measurable accuracy loss, making AI inference on ordinary CPUs practical in production.