
naive_speculate.speculate.scorer

Defines Scorer, which implements token scoring (logit computation).

ScoreOut

Bases: NamedTuple

Output of Scorer.score method.

Contains the logits at each position of the query tokens. The logits at position i are used to predict the token at position i+1.

Attributes:

Name Type Description
token_logits Tensor

Logits at the query token positions. Shape [batch_size, num_query_tokens, vocab_size].
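The index alignment described above (logits at position i predict the token at position i+1) can be illustrated with a small sketch. It uses nested Python lists as a stand-in for a tensor, and ScoreOutSketch and greedy_next_tokens are hypothetical names introduced only for this illustration; they are not part of the module.

```python
from typing import NamedTuple

# Nested lists stand in for a tensor of shape
# [batch_size, num_query_tokens, vocab_size].
Logits3D = list[list[list[float]]]


class ScoreOutSketch(NamedTuple):
    """Hypothetical stand-in for ScoreOut, not the library's class."""

    token_logits: Logits3D


def greedy_next_tokens(out: ScoreOutSketch) -> list[list[int]]:
    """For each query position i, return the argmax token predicted for position i+1."""
    return [
        [max(range(len(row)), key=row.__getitem__) for row in sequence]
        for sequence in out.token_logits
    ]


# One sequence, three query tokens, vocabulary of four tokens.
out = ScoreOutSketch(
    token_logits=[[
        [0.1, 2.0, -1.0, 0.0],  # position 0: scores for the token at position 1
        [0.0, 0.0, 3.0, 0.5],   # position 1: scores for the token at position 2
        [1.5, 0.2, 0.1, 0.0],   # position 2: scores for the token at position 3
    ]]
)
predictions = greedy_next_tokens(out)  # shape [batch_size, num_query_tokens]
```

Greedy argmax is used here only to make the alignment visible; the logits can equally be turned into probabilities and sampled from.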

Scorer

Scorer processes the given tokens and produces the corresponding token logits (scores).

Scorer delegates scoring to a LanguageModel instance.

Scorer scores only the given query tokens. It does not generate new tokens, even though the scores could be used to do so.

In speculative decoding, scoring is part of the verification process: the speculative decoder performs speculative sampling based on the token logits produced by the scorer.

Attributes:

Name Type Description
language_model LanguageModel

The language model used for scoring.
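To make the verification step mentioned above more concrete, here is a minimal sketch of the commonly used speculative sampling acceptance rule, assuming the scorer's logits for one sequence are available as plain lists. The function names (softmax, verify_draft_tokens) and the argument layout are assumptions made for illustration; the package's own decoder may be organized differently.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Convert logits to probabilities in a numerically stable way."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verify_draft_tokens(
    target_logits: list[list[float]],
    draft_probs: list[list[float]],
    draft_tokens: list[int],
    rng: random.Random,
) -> list[int]:
    """Accept or reject draft tokens one by one (hypothetical helper).

    target_logits[k] are the scorer's logits used to predict draft_tokens[k];
    draft_probs[k] is the distribution the drafter sampled draft_tokens[k] from.
    """
    accepted: list[int] = []
    for logits, q, token in zip(target_logits, draft_probs, draft_tokens):
        p = softmax(logits)
        # Accept the draft token with probability min(1, p(token) / q(token)).
        if rng.random() < min(1.0, p[token] / q[token]):
            accepted.append(token)
            continue
        # On rejection, resample from the normalized residual max(0, p - q)
        # and stop verifying the remaining draft tokens.
        residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
        accepted.append(rng.choices(range(len(p)), weights=residual)[0])
        break
    return accepted


rng = random.Random(0)
target_logits = [[0.0, 5.0, 0.0], [4.0, 0.0, 0.0]]
# When the drafter's distribution matches the target's, every draft token is accepted.
matching_draft_probs = [softmax(row) for row in target_logits]
all_accepted = verify_draft_tokens(target_logits, matching_draft_probs, [1, 0], rng)
```

This sketch omits the extra token that is usually sampled from the target distribution when every draft token is accepted.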

score(query_token_ids, kv_cache)

Score the given query tokens using the provided key-value cache.

kv_cache will be updated internally as a side effect of this method.

The returned output includes logits for all query token positions, where position i gives the logits for predicting token i+1.

Parameters:

Name Type Description Default
query_token_ids Tensor

Query tokens to be scored. Shape [batch_size, num_query_tokens].

required
kv_cache KVCache

Key and value tensors for the past context tokens.

required

Returns:

Name Type Description
ScoreOut ScoreOut

The scoring result. Its token_logits field has shape [batch_size, num_query_tokens, vocab_size].
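The call pattern of score can be sketched with toy stand-ins. Everything below is hypothetical: ToyLanguageModel, its forward method, ToyKVCache, and ScorerSketch are invented for illustration and do not reflect the real LanguageModel or KVCache interfaces. The sketch only shows the delegation, the output shape, and the cache being updated as a side effect.

```python
from typing import NamedTuple


class ToyKVCache:
    """Hypothetical cache that only counts how many tokens it has seen."""

    def __init__(self) -> None:
        self.num_cached_tokens = 0


class ToyLanguageModel:
    """Hypothetical model whose logits favor token (t + 1) % vocab_size after token t."""

    vocab_size = 4

    def forward(self, query_token_ids: list[list[int]], kv_cache: ToyKVCache):
        # Side effect: the cache grows by the number of query tokens.
        kv_cache.num_cached_tokens += len(query_token_ids[0])
        return [
            [
                [1.0 if v == (tok + 1) % self.vocab_size else 0.0
                 for v in range(self.vocab_size)]
                for tok in sequence
            ]
            for sequence in query_token_ids
        ]


class ScoreOutSketch(NamedTuple):
    """Hypothetical stand-in for ScoreOut."""

    token_logits: list


class ScorerSketch:
    """Hypothetical stand-in for Scorer: delegates scoring to the language model."""

    def __init__(self, language_model: ToyLanguageModel) -> None:
        self.language_model = language_model

    def score(self, query_token_ids: list[list[int]], kv_cache: ToyKVCache) -> ScoreOutSketch:
        logits = self.language_model.forward(query_token_ids, kv_cache)
        return ScoreOutSketch(token_logits=logits)


scorer = ScorerSketch(ToyLanguageModel())
kv_cache = ToyKVCache()
out = scorer.score([[0, 1, 2]], kv_cache)  # batch of 1, three query tokens
```

After the call, out.token_logits holds one row of vocab_size scores per query token, and kv_cache reflects the three tokens that were just scored. No new tokens are generated.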