naive_speculate.autoregress.decoder

Defines `AutoregressiveDecoder`, which performs autoregressive decoding using a drafter.

`AutoregressiveDecodeOut`

The output of the `AutoregressiveDecoder.decode` method.

Attributes:

Name	Type	Description
`token_ids`	`Tensor`	The generated token ids. Shape: `[batch_size, num_generated_tokens]`, where `num_generated_tokens <= max_new_tokens`.

`AutoregressiveDecoder`

Performs autoregressive decoding.

`AutoregressiveDecoder` is essentially a wrapper around `Drafter`, since `Drafter` already implements autoregressive decoding.

Attributes:

Name	Type	Description
`drafter`	`Drafter`	The drafter used for generating tokens.
`drafter_kvcache`	`KVCache`	The key-value cache for the drafter.
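
The wrapping relationship described above can be pictured with a minimal delegation sketch. It does not show the library's actual code: `DrafterLike`, `generate`, `WrapperDecoder`, and `EchoDrafter` are hypothetical names invented for illustration, plain Python lists stand in for `Tensor`, and a `dict` stands in for `KVCache`.

```python
from dataclasses import dataclass, field
from typing import List, Protocol


class DrafterLike(Protocol):
    """Hypothetical interface; the real Drafter API may look different."""

    def generate(self, token_ids: List[int], num_tokens: int, cache: dict) -> List[int]: ...


@dataclass
class WrapperDecoder:
    """Holds a drafter and its cache, and forwards decoding to the drafter."""

    drafter: DrafterLike
    drafter_kvcache: dict = field(default_factory=dict)  # stand-in for KVCache

    def decode(self, query_token_ids: List[int], max_new_tokens: int) -> List[int]:
        # All generation work is delegated; the wrapper only supplies the cache.
        return self.drafter.generate(query_token_ids, max_new_tokens, self.drafter_kvcache)


class EchoDrafter:
    """Toy drafter that repeats the last query token and counts its calls."""

    def generate(self, token_ids: List[int], num_tokens: int, cache: dict) -> List[int]:
        cache["calls"] = cache.get("calls", 0) + 1
        return [token_ids[-1]] * num_tokens


decoder = WrapperDecoder(drafter=EchoDrafter())
out = decoder.decode([7, 8], max_new_tokens=3)
```

In this sketch the decoder adds no generation logic of its own, which is the sense in which a wrapper can be thin when the wrapped object already does the work.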

`decode`

Perform autoregressive decoding using the underlying `Drafter`.

Currently supports only `batch_size=1`.

Decoding stops when `<eos>` is generated, or when the number of returned tokens would exceed `max_new_tokens`.

Parameters:

Name	Type	Description	Default
`query_token_ids`	`Tensor`	Ids of the query tokens. Shape: `[batch_size, num_query_tokens]`.	required
`max_new_tokens`	`int`	The maximum number of new tokens to generate.	required
`sample_strategy`	`SampleStrategy`	Sampling strategy for drafting tokens.	required

Returns:

Name	Type	Description
`AutoregressiveDecodeOut`	`AutoregressiveDecodeOut`	The output of autoregressive decoding.
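
The stopping rule for `decode` can be illustrated with a toy loop. This is a sketch under stated assumptions, not the library's implementation: `toy_decode`, `countdown`, and `EOS_ID` are hypothetical, plain lists replace tensors (matching the `batch_size=1` case), and including the `<eos>` token in the returned ids is an assumption the source does not confirm.

```python
from typing import Callable, List

EOS_ID = 0  # hypothetical <eos> id, chosen only for this sketch


def toy_decode(
    query_token_ids: List[int],
    max_new_tokens: int,
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Generate one token at a time until <eos> or the token budget is reached."""
    context = list(query_token_ids)
    generated: List[int] = []
    while len(generated) < max_new_tokens:
        token = next_token(context)  # stands in for a model forward pass plus sampling
        generated.append(token)
        context.append(token)  # each new token is fed back as input
        if token == EOS_ID:
            break  # assumption: <eos> is kept in the output
    return generated


def countdown(context: List[int]) -> int:
    """Toy 'model': emit the last token minus one, reaching EOS_ID at zero."""
    return max(context[-1] - 1, 0)


stopped_by_eos = toy_decode([5, 3], max_new_tokens=10, next_token=countdown)
stopped_by_budget = toy_decode([5, 3], max_new_tokens=2, next_token=countdown)
```

With the toy model, the first call ends on `<eos>` after three tokens and the second is cut off by the budget after two, so in both cases the number of generated tokens stays at or below `max_new_tokens`.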