Naive Speculate¶
Project Structure Overview¶
The source code lives under src/naive_speculate/.
Here I briefly introduce the main sub-packages and their responsibilities.
infer¶
This is the core sub-package. It provides abstractions and implementations for language-model inference and KV-cache management. Other sub-packages, such as autoregress and speculate, depend on interfaces defined in infer/interface and their implementations.
In infer/interface, three main protocols are defined: LanguageModel, Inferencer, and KVCache. Implementations are organized under lm, inferencer, and kvcache:
- LanguageModel is the foundational protocol. It specifies the behavior required from a model used for inference. Current implementations mainly wrap corresponding models from Hugging Face transformers.
- Inferencer defines prefill and decode behaviors. While the protocol itself is decoupled, concrete inferencers use LanguageModel.forward internally.
- KVCache defines cache behavior used during inference. Inferencer.prefill and Inferencer.decode both accept a KVCache instance and update it as a side effect.
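To make the relationship between the three protocols concrete, here is a minimal sketch using typing.Protocol. Only the names LanguageModel, Inferencer, KVCache, forward, prefill, and decode come from the text above. Every signature, the ListKVCache, ToyModel, and GreedyInferencer classes, and the choice to have forward write to the cache are assumptions made for illustration and may differ from the actual code in infer/interface.

```python
from dataclasses import dataclass, field
from typing import Protocol


class KVCache(Protocol):
    """Assumed cache interface: the real protocol may expose different methods."""

    def length(self) -> int: ...
    def extend(self, token_ids: list[int]) -> None: ...


class LanguageModel(Protocol):
    """Assumed signature: returns one logit per vocabulary entry for the next token."""

    def forward(self, token_ids: list[int], kv_cache: KVCache) -> list[float]: ...


class Inferencer(Protocol):
    """Assumed signatures: both methods return the next token id."""

    def prefill(self, prompt_ids: list[int], kv_cache: KVCache) -> int: ...
    def decode(self, token_id: int, kv_cache: KVCache) -> int: ...


@dataclass
class ListKVCache:
    """Toy cache that records token ids instead of key/value tensors."""

    tokens: list[int] = field(default_factory=list)

    def length(self) -> int:
        return len(self.tokens)

    def extend(self, token_ids: list[int]) -> None:
        self.tokens.extend(token_ids)


class ToyModel:
    """Stand-in model that always predicts (last token + 1) mod vocab_size."""

    def __init__(self, vocab_size: int = 8) -> None:
        self.vocab_size = vocab_size

    def forward(self, token_ids: list[int], kv_cache: KVCache) -> list[float]:
        kv_cache.extend(token_ids)  # the cache is updated as a side effect
        target = (token_ids[-1] + 1) % self.vocab_size
        return [1.0 if i == target else 0.0 for i in range(self.vocab_size)]


class GreedyInferencer:
    """Concrete inferencer that calls LanguageModel.forward internally."""

    def __init__(self, model: LanguageModel) -> None:
        self.model = model

    def _argmax(self, logits: list[float]) -> int:
        return max(range(len(logits)), key=logits.__getitem__)

    def prefill(self, prompt_ids: list[int], kv_cache: KVCache) -> int:
        # Process the whole prompt in one call and return the first new token.
        return self._argmax(self.model.forward(prompt_ids, kv_cache))

    def decode(self, token_id: int, kv_cache: KVCache) -> int:
        # Process a single token, reusing whatever the cache already holds.
        return self._argmax(self.model.forward([token_id], kv_cache))
```

Because the protocols are structural, ToyModel and ListKVCache satisfy them without inheriting from anything, which is one way a wrapper around a Hugging Face model could be swapped for a test double.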
speculate¶
This sub-package implements speculative decoding.
The primary class is SpeculativeDecoder in decoder.py. It relies on Drafter and Scorer (from drafter.py and scorer.py). Drafter and Scorer wrap an Inferencer or LanguageModel instance to run the underlying inference process.
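The draft-then-verify idea behind speculative decoding can be sketched as follows. This is a simplified greedy variant written for illustration, not the code in decoder.py: the Drafter and Scorer are reduced to plain next-token callables, all function names are hypothetical, and the verifier is called once per position for readability, whereas a real scorer would typically check all drafted tokens in a single forward pass.

```python
from typing import Callable

# Hypothetical stand-in for a Drafter or Scorer: maps a context to the next token id.
NextToken = Callable[[list[int]], int]


def speculative_step(
    context: list[int],
    draft_next: NextToken,
    target_next: NextToken,
    num_draft: int,
) -> list[int]:
    """Run one draft-then-verify round and return the newly accepted tokens."""
    # 1. The cheap drafter proposes num_draft tokens autoregressively.
    drafted: list[int] = []
    for _ in range(num_draft):
        drafted.append(draft_next(context + drafted))

    # 2. The target checks each drafted token against its own greedy choice.
    accepted: list[int] = []
    for token in drafted:
        expected = target_next(context + accepted)
        if token != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)

    # 3. Every draft was accepted, so the target contributes one extra token.
    accepted.append(target_next(context + accepted))
    return accepted


def speculative_decode(
    prompt: list[int],
    draft_next: NextToken,
    target_next: NextToken,
    max_new_tokens: int,
    num_draft: int = 4,
) -> list[int]:
    """Repeat speculative rounds until max_new_tokens have been produced."""
    output = list(prompt)
    while len(output) - len(prompt) < max_new_tokens:
        output.extend(speculative_step(output, draft_next, target_next, num_draft))
    return output[: len(prompt) + max_new_tokens]


def toy_target(context: list[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (context[-1] + 1) % 10


def toy_draft(context: list[int]) -> int:
    """Toy draft model: agrees with the target except after token 4."""
    return 0 if context[-1] == 4 else (context[-1] + 1) % 10
```

In this greedy variant the output always matches what the target alone would have generated; the drafter only affects how many tokens each round yields.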
autoregress¶
This sub-package implements auto-regressive decoding for comparison with speculative decoding.
The main class is AutoregressiveDecoder in decoder.py. It reuses Drafter to drive the inference loop.
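For reference, the baseline loop amounts to generating one token per model call. The sketch below is a hypothetical illustration and not the AutoregressiveDecoder implementation; autoregressive_decode, count_up, and the eos_token parameter are names assumed for this example.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a model wrapper: maps a context to the next token id.
NextToken = Callable[[list[int]], int]


def autoregressive_decode(
    prompt: list[int],
    next_token: NextToken,
    max_new_tokens: int,
    eos_token: Optional[int] = None,
) -> list[int]:
    """Generate tokens one at a time, feeding each result back as context."""
    output = list(prompt)
    for _ in range(max_new_tokens):
        token = next_token(output)
        output.append(token)
        if token == eos_token:
            break  # stop early once the end-of-sequence token appears
    return output


def count_up(context: list[int]) -> int:
    """Toy model: counts upward modulo 10."""
    return (context[-1] + 1) % 10
```

Each generated token costs one call to the model here, which is the cost that speculative decoding tries to amortize.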
config¶
This sub-package defines both user-facing and internal configuration models.
The user-facing format specifies how config.toml is structured. It is converted into an internal format with the concrete details required to initialize LanguageModel, Inferencer, KVCache, and related components.
dependency¶
This sub-package defines how runtime objects are assembled.
It provides DependencyContainer, which holds initialized SpeculativeDecoder, AutoregressiveDecoder, and Tokenizer instances ready for execution.
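As a rough picture of what such a container could look like, the sketch below wires placeholder objects together in one place. Only the names DependencyContainer, SpeculativeDecoder, AutoregressiveDecoder, and Tokenizer come from the text. Their fields, the from_names constructor, the character-level tokenizer, and the choice to give the baseline decoder the target model are assumptions for illustration.

```python
from dataclasses import dataclass


class Tokenizer:
    """Placeholder tokenizer: maps characters to code points and back."""

    def encode(self, text: str) -> list[int]:
        return [ord(ch) for ch in text]

    def decode(self, token_ids: list[int]) -> str:
        return "".join(chr(i) for i in token_ids)


@dataclass(frozen=True)
class SpeculativeDecoder:
    """Placeholder: only records which models it would use."""

    drafter_model: str
    scorer_model: str


@dataclass(frozen=True)
class AutoregressiveDecoder:
    """Placeholder: only records which model it would use."""

    model: str


@dataclass(frozen=True)
class DependencyContainer:
    """Holds fully initialized objects so callers never construct them directly."""

    speculative_decoder: SpeculativeDecoder
    autoregressive_decoder: AutoregressiveDecoder
    tokenizer: Tokenizer

    @classmethod
    def from_names(cls, draft_model: str, target_model: str) -> "DependencyContainer":
        # Hypothetical assembly step: all wiring decisions live in this one method.
        return cls(
            speculative_decoder=SpeculativeDecoder(draft_model, target_model),
            # Assumption: the baseline runs the target model for a fair comparison.
            autoregressive_decoder=AutoregressiveDecoder(target_model),
            tokenizer=Tokenizer(),
        )
```

A caller would then ask the container for a decoder and the tokenizer rather than building models itself, for example `DependencyContainer.from_names("draft-model-name", "target-model-name").tokenizer`.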
utils¶
This sub-package contains shared utilities, such as Tokenizer and Timer.
More Information¶
For the API reference, see the subsections under reference/naive_speculate or the docstrings in the source code.
For setup and CLI usage, see the project README.