Naive Speculate

Project Structure Overview

The source code lives under src/naive_speculate/.

This page briefly introduces the main sub-packages and their responsibilities.

infer

This is the core sub-package. It provides abstractions and implementations for language-model inference and KV-cache management. Other sub-packages, such as autoregress and speculate, depend on interfaces defined in infer/interface and their implementations.

In infer/interface, three main protocols are defined: LanguageModel, Inferencer, and KVCache. Implementations are organized under lm, inferencer, and kvcache:

  • LanguageModel is the foundational protocol. It specifies the behavior required of a model used for inference. Current implementations mainly wrap the corresponding models from Hugging Face transformers.
  • Inferencer defines prefill and decode behaviors. The protocol itself does not depend on LanguageModel, but concrete inferencers call LanguageModel.forward internally.
  • KVCache defines cache behavior used during inference. Inferencer.prefill and Inferencer.decode both accept a KVCache instance and update it as a side effect.
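The relationship between the three protocols can be sketched with typing.Protocol. This is a minimal illustration, not the project's actual interface: every method signature below (including the arguments of forward, prefill, and decode), as well as the toy classes ListKVCache, CountingModel, and GreedyInferencer, are assumptions made for the example.

```python
from typing import Protocol


class KVCache(Protocol):
    """Hypothetical cache protocol; the real method names may differ."""

    def append(self, token_ids: list[int]) -> None: ...
    def __len__(self) -> int: ...


class LanguageModel(Protocol):
    """Hypothetical model protocol: one forward pass returns the next token id."""

    def forward(self, token_ids: list[int], kv_cache: KVCache) -> int: ...


class Inferencer(Protocol):
    """Hypothetical inferencer protocol with separate prefill and decode phases."""

    def prefill(self, prompt_ids: list[int], kv_cache: KVCache) -> int: ...
    def decode(self, last_id: int, num_tokens: int, kv_cache: KVCache) -> list[int]: ...


class ListKVCache:
    """Toy cache that only records which tokens have been processed."""

    def __init__(self) -> None:
        self.seen: list[int] = []

    def append(self, token_ids: list[int]) -> None:
        self.seen.extend(token_ids)

    def __len__(self) -> int:
        return len(self.seen)


class CountingModel:
    """Toy model: always predicts (last token + 1) modulo the vocabulary size."""

    def __init__(self, vocab_size: int = 10) -> None:
        self.vocab_size = vocab_size

    def forward(self, token_ids: list[int], kv_cache: KVCache) -> int:
        kv_cache.append(token_ids)  # the cache is updated as a side effect
        return (token_ids[-1] + 1) % self.vocab_size


class GreedyInferencer:
    """Concrete inferencer that calls LanguageModel.forward internally."""

    def __init__(self, model: LanguageModel) -> None:
        self.model = model

    def prefill(self, prompt_ids: list[int], kv_cache: KVCache) -> int:
        # Process the whole prompt in one pass and return the first new token.
        return self.model.forward(prompt_ids, kv_cache)

    def decode(self, last_id: int, num_tokens: int, kv_cache: KVCache) -> list[int]:
        # Generate tokens one at a time, feeding each one back into the model.
        generated: list[int] = []
        for _ in range(num_tokens):
            last_id = self.model.forward([last_id], kv_cache)
            generated.append(last_id)
        return generated


cache = ListKVCache()
inferencer = GreedyInferencer(CountingModel())
first = inferencer.prefill([1, 2, 3], cache)
rest = inferencer.decode(first, 3, cache)
```

Because the protocols are structural, the toy classes satisfy them without inheriting from them, which is one way other sub-packages could depend only on the interface.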

speculate

This sub-package implements speculative decoding.

The primary class is SpeculativeDecoder in decoder.py. It relies on Drafter and Scorer (from drafter.py and scorer.py). Drafter and Scorer wrap an Inferencer or LanguageModel instance to run the underlying inference process.
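As a rough illustration of how a drafter and a scorer cooperate, the sketch below implements the greedy-verification variant of speculative decoding. It is not the SpeculativeDecoder implementation: the function names, the callable-based model interface, and the acceptance rule (accept the longest prefix on which draft and target agree) are all assumptions for the example, and the project may use a sampling-based acceptance rule instead.

```python
from collections.abc import Callable

# Hypothetical model interface: maps a token context to the greedy next token.
NextToken = Callable[[list[int]], int]


def speculative_step(
    context: list[int],
    draft_next: NextToken,
    target_next: NextToken,
    num_draft: int,
) -> list[int]:
    """Draft num_draft tokens, then keep the prefix the target model agrees with."""
    # Drafting: the cheap model proposes tokens one by one.
    draft: list[int] = []
    for _ in range(num_draft):
        draft.append(draft_next(context + draft))

    # Scoring: compare each drafted token with the target model's own choice.
    # A real system would obtain all of these from one batched forward pass.
    accepted: list[int] = []
    for i, token in enumerate(draft):
        expected = target_next(context + draft[:i])
        if token != expected:
            accepted.append(expected)  # replace the first mismatch and stop
            return accepted
        accepted.append(token)

    # Every draft token was accepted, so the target contributes one bonus token.
    accepted.append(target_next(context + draft))
    return accepted


def speculative_decode(
    prompt: list[int],
    draft_next: NextToken,
    target_next: NextToken,
    max_new_tokens: int,
    num_draft: int = 4,
) -> list[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(tokens, draft_next, target_next, num_draft))
    # A step may overshoot the budget, so truncate to max_new_tokens.
    return tokens[len(prompt): len(prompt) + max_new_tokens]


def target_next(context: list[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def draft_next(context: list[int]) -> int:
    """Toy draft model: agrees with the target except right after token 2."""
    return 0 if context[-1] == 2 else (context[-1] + 1) % 10


output = speculative_decode([0], draft_next, target_next, max_new_tokens=8)
```

With greedy verification the output matches what the target model alone would produce; the draft model only affects how many target passes are needed.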

autoregress

This sub-package implements auto-regressive decoding for comparison with speculative decoding.

The main class is AutoregressiveDecoder in decoder.py. It reuses Drafter from speculate to drive the inference loop.

config

This sub-package defines both user-facing and internal configuration models.

The user-facing format specifies how config.toml is structured. It is converted into an internal format with the concrete details required to initialize LanguageModel, Inferencer, KVCache, and related components.

dependency

This sub-package defines how runtime objects are assembled.

It provides DependencyContainer, which holds initialized SpeculativeDecoder, AutoregressiveDecoder, and Tokenizer instances ready for execution.
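One plausible shape for such a container is a frozen dataclass with a factory method that wires everything together. The stub classes, their constructor arguments, and the from_config method below are assumptions for illustration; only the three held object types come from the text.

```python
from dataclasses import dataclass


class Tokenizer:
    """Stub standing in for the real tokenizer."""

    def __init__(self, name: str) -> None:
        self.name = name


class SpeculativeDecoder:
    """Stub standing in for the real speculative decoder."""

    def __init__(self, draft_model: str, target_model: str) -> None:
        self.draft_model = draft_model
        self.target_model = target_model


class AutoregressiveDecoder:
    """Stub standing in for the real auto-regressive decoder."""

    def __init__(self, model: str) -> None:
        self.model = model


@dataclass(frozen=True)
class DependencyContainer:
    """Holds fully initialized objects so callers never assemble them by hand."""

    speculative_decoder: SpeculativeDecoder
    autoregressive_decoder: AutoregressiveDecoder
    tokenizer: Tokenizer

    @classmethod
    def from_config(cls, config: dict[str, str]) -> "DependencyContainer":
        # Hypothetical wiring: both decoders share the same target model name.
        return cls(
            speculative_decoder=SpeculativeDecoder(
                config["draft_model"], config["target_model"]
            ),
            autoregressive_decoder=AutoregressiveDecoder(config["target_model"]),
            tokenizer=Tokenizer(config["target_model"]),
        )


container = DependencyContainer.from_config(
    {"draft_model": "small-model", "target_model": "large-model"}
)
```

Keeping assembly in one place means the decoders themselves do not need to know how configuration is loaded.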

utils

This sub-package contains shared utilities, such as Tokenizer and Timer.
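A timing utility of this kind is often written as a context manager. The sketch below is an assumption about what such a Timer could look like, not the project's actual class; the elapsed attribute is a hypothetical name.

```python
import time


class Timer:
    """Hypothetical context manager that measures wall-clock time in seconds."""

    def __enter__(self) -> "Timer":
        self.elapsed = 0.0
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.elapsed = time.perf_counter() - self._start
        return False  # do not suppress exceptions raised inside the block


with Timer() as timer:
    total = sum(range(1000))
```

After the block exits, timer.elapsed holds the measured duration, which could be used to compare speculative and auto-regressive decoding runs.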

More Information

For the API reference, see the subsections under reference/naive_speculate or the docstrings in the source code.

For setup and CLI usage, see the project README.