naive_speculate.autoregress.decoder
Defines AutoregressiveDecoder, which performs autoregressive decoding using a drafter.
AutoregressiveDecodeOut
Bases: NamedTuple
The output of the AutoregressiveDecoder.decode method.
Attributes:
| Name | Type | Description |
|---|---|---|
| token_ids | Tensor | The generated token ids. Shape: |
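
Because the output type derives from NamedTuple, its single field can be read by name or unpacked like an ordinary tuple. The sketch below illustrates that behaviour with a hypothetical stand-in class, `DecodeOutSketch`, which is not part of the library; it stores a plain list of ints where the real `token_ids` is a Tensor, purely so the example runs without extra dependencies.

```python
from typing import List, NamedTuple


class DecodeOutSketch(NamedTuple):
    """Hypothetical stand-in for AutoregressiveDecodeOut (illustration only)."""

    # Assumption: a list of ints replaces the real Tensor field.
    token_ids: List[int]


out = DecodeOutSketch(token_ids=[2, 1, 0])

# Access the field by name.
print(out.token_ids)

# Or unpack the result like a regular one-element tuple.
(token_ids,) = out
print(token_ids)
```

Reading the field by name keeps call sites stable if further fields are ever added, whereas positional unpacking would need to be updated.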
AutoregressiveDecoder
Performs autoregressive decoding.
AutoregressiveDecoder is essentially a thin wrapper around Drafter, since Drafter
already implements autoregressive decoding.
Attributes:
| Name | Type | Description |
|---|---|---|
| drafter | Drafter | The drafter used for generating tokens. |
| drafter_kvcache | KVCache | The key-value cache for the drafter. |
decode(query_token_ids, max_new_tokens, sample_strategy)
Perform autoregressive decoding using the underlying Drafter.
Currently supports batch_size=1 only.
Decoding stops when <eos> is generated, or when the number of returned tokens
would exceed max_new_tokens.
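
The two stopping conditions can be sketched as a plain loop. This is a minimal illustration under stated assumptions, not the library's implementation: `decode_loop`, `countdown`, and `EOS_ID` are hypothetical names, token ids are plain Python ints rather than a Tensor, the `<eos>` token is assumed to be included in the returned ids, and a simple callable stands in for the drafter and its sampling strategy.

```python
from typing import Callable, List

EOS_ID = 0  # hypothetical <eos> id, chosen only for this sketch


def decode_loop(
    query_token_ids: List[int],
    max_new_tokens: int,
    next_token: Callable[[List[int]], int],
) -> List[int]:
    """Return newly generated ids, stopping at <eos> or at the token budget."""
    context = list(query_token_ids)
    generated: List[int] = []
    # Never return more than max_new_tokens tokens.
    while len(generated) < max_new_tokens:
        token = next_token(context)
        generated.append(token)
        context.append(token)
        # Assumption: <eos> is kept in the output before stopping.
        if token == EOS_ID:
            break
    return generated


def countdown(context: List[int]) -> int:
    """Toy stand-in for a model: emit the last id minus one, floored at EOS_ID."""
    return max(context[-1] - 1, EOS_ID)


print(decode_loop([5, 3], max_new_tokens=10, next_token=countdown))  # [2, 1, 0]
print(decode_loop([5, 3], max_new_tokens=2, next_token=countdown))   # [2, 1]
```

In the first call the loop ends because `<eos>` is produced; in the second it ends because the budget of two new tokens is exhausted first.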
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| query_token_ids | Tensor | Ids of the query tokens. Shape: | required |
| max_new_tokens | int | The maximum number of new tokens to generate. | required |
| sample_strategy | SampleStrategy | Sampling strategy for drafting tokens. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| AutoregressiveDecodeOut | AutoregressiveDecodeOut | The output of autoregressive decoding. |