naive_speculate.infer.inferencer.basic
Define BasicInferencer, implementing Inferencer.
BasicInferencer
Bases: Inferencer
BasicInferencer implements the Inferencer protocol.
It delegates the forward computation to LanguageModel
and uses it to provide simple implementations of the prefill and decode methods.
Attributes:
| Name | Type | Description |
|---|---|---|
| language_model | LanguageModel | The language model used for forwarding. |
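The delegation described above can be illustrated with a minimal sketch. Everything here is hypothetical: `LanguageModelLike`, its `forward` signature, `SketchInferencer`, and `CountingModel` are stand-ins invented for illustration and are not the actual naive_speculate API. Only the `language_model` attribute name is taken from the table above.

```python
from typing import List, Protocol, Sequence


class LanguageModelLike(Protocol):
    """Hypothetical stand-in for LanguageModel (the real interface may differ)."""

    def forward(self, token_ids: Sequence[int], kv_cache: list) -> List[float]: ...


class SketchInferencer:
    """Illustrative only: holds a language model and delegates to it."""

    def __init__(self, language_model: LanguageModelLike) -> None:
        # Same attribute name as in the table above.
        self.language_model = language_model

    def prefill(self, query_token_ids: Sequence[int], kv_cache: list) -> List[float]:
        # Assumed shape of prefill: one forward pass over the whole query.
        return self.language_model.forward(query_token_ids, kv_cache)


class CountingModel:
    """Toy model: records tokens in the cache and counts forward calls."""

    def __init__(self) -> None:
        self.calls = 0

    def forward(self, token_ids: Sequence[int], kv_cache: list) -> List[float]:
        self.calls += 1
        kv_cache.extend(token_ids)
        return [float(len(kv_cache))]
```

In this sketch the inferencer owns no model logic of its own; swapping in a different object with a compatible `forward` method changes the behavior without touching `SketchInferencer`.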
decode(query_token_ids, kv_cache, max_new_tokens, sample_strategy)
Process query_token_ids and auto-regressively generate new tokens.
The EOS token is checked after each generation iteration, so a device synchronization happens at every iteration.
Refer to the interface Inferencer.decode for more details.
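A rough sketch of such a loop, in plain Python, may help show where the per-iteration EOS check sits. The names `decode_sketch`, `greedy_sample`, `toy_forward`, and the `eos_token_id` parameter are assumptions made for illustration, as is the choice to include the EOS token in the returned list. The real method takes a `sample_strategy` and may behave differently.

```python
from typing import Callable, List, Sequence


def greedy_sample(logits: Sequence[float]) -> int:
    """Pick the index of the largest logit (first one wins on ties)."""
    return max(range(len(logits)), key=logits.__getitem__)


def decode_sketch(
    forward: Callable[[Sequence[int], list], List[float]],
    query_token_ids: Sequence[int],
    kv_cache: list,
    max_new_tokens: int,
    eos_token_id: int,  # hypothetical parameter, not in the documented signature
    sample: Callable[[Sequence[float]], int] = greedy_sample,
) -> List[int]:
    new_tokens: List[int] = []
    inputs = list(query_token_ids)
    for _ in range(max_new_tokens):
        logits = forward(inputs, kv_cache)
        token = sample(logits)
        new_tokens.append(token)
        # Comparing the sampled token on the host is the step that, on an
        # accelerator, would force a device synchronization every iteration.
        if token == eos_token_id:
            break
        # Later iterations feed only the newest token; the cache holds the rest.
        inputs = [token]
    return new_tokens


def toy_forward(token_ids: Sequence[int], kv_cache: list) -> List[float]:
    """Toy model over a 5-token vocabulary: predicts (last token + 1) mod 5."""
    kv_cache.extend(token_ids)
    next_id = (kv_cache[-1] + 1) % 5
    return [1.0 if i == next_id else 0.0 for i in range(5)]
```

The trade-off in this design is that checking EOS every step stops generation as early as possible, at the cost of one host-side read per generated token.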