naive_speculate.autoregress.decoder

Defines AutoregressiveDecoder, which performs autoregressive decoding using a drafter.

AutoregressiveDecodeOut

Bases: NamedTuple

The output of the AutoregressiveDecoder.decode method.

Attributes:

token_ids (Tensor): The generated token ids. Shape: [batch_size, num_generated_tokens], where num_generated_tokens <= max_new_tokens.
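
As an illustration of the output shape described above, here is a minimal sketch of a NamedTuple with a single token_ids field. The class name DecodeOutSketch is hypothetical, and a nested Python list stands in for the Tensor so the example has no dependencies; it is not the library's actual definition.

```python
from typing import NamedTuple

# Assumption: a nested list [batch][position] stands in for a 2-D Tensor.
TokenIds = list[list[int]]


class DecodeOutSketch(NamedTuple):
    """Hypothetical stand-in for AutoregressiveDecodeOut."""

    token_ids: TokenIds


out = DecodeOutSketch(token_ids=[[11, 42, 7]])

# Shape is [batch_size, num_generated_tokens].
batch_size = len(out.token_ids)
num_generated_tokens = len(out.token_ids[0])

# A NamedTuple can be read by field name or unpacked like a plain tuple.
(token_ids,) = out
```

Because the output is a NamedTuple, further fields could be added later without breaking callers that access token_ids by name.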

AutoregressiveDecoder

Performs autoregressive decoding.

AutoregressiveDecoder is essentially a thin wrapper around Drafter, which already implements autoregressive decoding.

Attributes:

drafter (Drafter): The drafter used for generating tokens.

drafter_kvcache (KVCache): The key-value cache for the drafter.
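
The wrapper relationship can be sketched roughly as follows. Everything here is a toy under stated assumptions: ToyDrafter, its draft method, ToyKVCache, and ToyDecoder are hypothetical names, and the real Drafter and KVCache interfaces may look quite different. The point is only that the decoder owns a drafter and its cache and delegates generation to it.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Hypothetical cache: records every token the drafter has processed."""

    seen: list[int] = field(default_factory=list)


class ToyDrafter:
    """Hypothetical drafter: each new token is the previous token plus one."""

    def draft(self, token_ids: list[int], num_tokens: int, kvcache: ToyKVCache) -> list[int]:
        kvcache.seen.extend(token_ids)
        drafted = []
        last = token_ids[-1]
        for _ in range(num_tokens):
            last += 1
            drafted.append(last)
            kvcache.seen.append(last)
        return drafted


class ToyDecoder:
    """Hypothetical wrapper holding a drafter and the drafter's cache."""

    def __init__(self, drafter: ToyDrafter) -> None:
        self.drafter = drafter
        self.drafter_kvcache = ToyKVCache()

    def decode(self, query_token_ids: list[int], max_new_tokens: int) -> list[int]:
        # All generation work is delegated to the drafter.
        return self.drafter.draft(query_token_ids, max_new_tokens, self.drafter_kvcache)


decoder = ToyDecoder(ToyDrafter())
generated = decoder.decode([5], max_new_tokens=3)
print(generated)  # [6, 7, 8]
```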

decode(query_token_ids, max_new_tokens, sample_strategy)

Perform autoregressive decoding using the underlying Drafter.

Currently supports batch_size=1 only.

Decoding stops when <eos> is generated, or when the number of returned tokens would exceed max_new_tokens.
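
The stopping rule above can be sketched as a plain loop. This is a minimal illustration, not the library's implementation: decode_sketch, toy_next_token, and EOS_ID are hypothetical, a flat list of ints stands in for a batch_size=1 tensor, and whether the real decoder includes the <eos> token in its output is an assumption made here.

```python
from typing import Callable

EOS_ID = 2  # Assumption: a hypothetical id for the <eos> token.


def decode_sketch(
    next_token: Callable[[list[int]], int],
    query_token_ids: list[int],
    max_new_tokens: int,
    eos_id: int = EOS_ID,
) -> list[int]:
    """Generate one token at a time until <eos> or the token budget is reached."""
    context = list(query_token_ids)
    generated: list[int] = []
    # Never return more than max_new_tokens tokens.
    while len(generated) < max_new_tokens:
        token = next_token(context)
        generated.append(token)
        context.append(token)
        # Assumption: the <eos> token itself is kept in the output.
        if token == eos_id:
            break
    return generated


def toy_next_token(context: list[int]) -> int:
    """Toy model: the next token is the last token plus one, modulo 5."""
    return (context[-1] + 1) % 5


print(decode_sketch(toy_next_token, [3], max_new_tokens=10))  # [4, 0, 1, 2]
print(decode_sketch(toy_next_token, [3], max_new_tokens=3))   # [4, 0, 1]
```

The first call stops early because <eos> is produced; the second stops because the budget of three tokens is exhausted first.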

Parameters:

query_token_ids (Tensor, required): Ids of the query tokens. Shape: [batch_size, num_query_tokens].

max_new_tokens (int, required): The maximum number of new tokens to generate.

sample_strategy (SampleStrategy, required): Sampling strategy for drafting tokens.

Returns:

AutoregressiveDecodeOut: The output of autoregressive decoding.