vllm.entrypoints.pooling.embed.protocol ¶
EmbeddingRequest module-attribute ¶
EmbeddingRequest: TypeAlias = (
EmbeddingCompletionRequest | EmbeddingChatRequest
)
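Since the alias is a plain union, Pydantic can resolve an incoming payload to whichever variant matches its shape. A minimal dispatch sketch (the TypeAdapter usage is illustrative, not vLLM's actual server code; the model name is a placeholder):

from pydantic import TypeAdapter

from vllm.entrypoints.pooling.embed.protocol import (
    EmbeddingCompletionRequest,
    EmbeddingRequest,
)

adapter = TypeAdapter(EmbeddingRequest)

# A completion-style payload (raw "input") resolves to EmbeddingCompletionRequest;
# a chat-style payload ("messages") would resolve to EmbeddingChatRequest.
request = adapter.validate_python(
    {"model": "intfloat/e5-small-v2", "input": "Hello, world!"}
)
assert isinstance(request, EmbeddingCompletionRequest)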
EmbeddingBytesResponse ¶
Bases: OpenAIBaseModel
Source code in vllm/entrypoints/pooling/embed/protocol.py
EmbeddingChatRequest ¶
Bases: OpenAIBaseModel
Source code in vllm/entrypoints/pooling/embed/protocol.py
add_generation_prompt class-attribute instance-attribute ¶
add_generation_prompt: bool = Field(
default=False,
description="If true, the generation prompt will be added to the chat template. This is a parameter used by chat template in tokenizer config of the model.",
)
add_special_tokens class-attribute instance-attribute ¶
add_special_tokens: bool = Field(
default=False,
description="If true, special tokens (e.g. BOS) will be added to the prompt on top of what is added by the chat template. For most models, the chat template takes care of adding the special tokens so this should be set to false (as is the default).",
)
chat_template class-attribute instance-attribute ¶
chat_template: str | None = Field(
default=None,
description="A Jinja template to use for this conversion. As of transformers v4.44, default chat template is no longer allowed, so you must provide a chat template if the tokenizer does not define one.",
)
chat_template_kwargs class-attribute instance-attribute ¶
chat_template_kwargs: dict[str, Any] | None = Field(
default=None,
description="Additional keyword args to pass to the template renderer. Will be accessible by the chat template.",
)
embed_dtype class-attribute instance-attribute ¶
embed_dtype: EmbedDType = Field(
default="float32",
description="What dtype to use for encoding. Default to using float32 for base64 encoding to match the OpenAI python client behavior. This parameter will affect base64 and binary_response.",
)
endianness class-attribute instance-attribute ¶
endianness: Endianness = Field(
default="native",
description="What endianness to use for encoding. Default to using native for base64 encoding to match the OpenAI python client behavior.This parameter will affect base64 and binary_response.",
)
mm_processor_kwargs class-attribute instance-attribute ¶
mm_processor_kwargs: dict[str, Any] | None = Field(
default=None,
description="Additional kwargs to pass to the HF processor.",
)
normalize class-attribute instance-attribute ¶
normalize: bool | None = Field(
default=None,
description="Whether to normalize the embeddings outputs. Default is True.",
)
priority class-attribute instance-attribute ¶
priority: int = Field(
default=0,
description="The priority of the request (lower means earlier handling; default: 0). Any priority other than 0 will raise an error if the served model does not use priority scheduling.",
)
request_id class-attribute instance-attribute ¶
request_id: str = Field(
default_factory=random_uuid,
description="The request_id related to this request. If the caller does not set it, a random_uuid will be generated. This id is used through out the inference process and return in response.",
)
truncate_prompt_tokens class-attribute instance-attribute ¶
check_generation_prompt classmethod ¶
Source code in vllm/entrypoints/pooling/embed/protocol.py
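A chat-style request renders messages through the model's chat template before pooling. A minimal request sketch against a running server (base URL and model name are placeholders):

import requests

payload = {
    "model": "intfloat/e5-small-v2",
    "messages": [
        {"role": "user", "content": "Represent this sentence for retrieval."}
    ],
    # Defaults shown explicitly: no generation prompt; the chat template
    # already adds special tokens, so add_special_tokens stays False.
    "add_generation_prompt": False,
    "add_special_tokens": False,
}
resp = requests.post("http://localhost:8000/v1/embeddings", json=payload)
print(resp.json()["data"][0]["embedding"][:4])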
EmbeddingCompletionRequest ¶
Bases: OpenAIBaseModel
Source code in vllm/entrypoints/pooling/embed/protocol.py
add_special_tokens class-attribute instance-attribute ¶
add_special_tokens: bool = Field(
default=True,
description="If true (the default), special tokens (e.g. BOS) will be added to the prompt.",
)
embed_dtype class-attribute instance-attribute ¶
embed_dtype: EmbedDType = Field(
default="float32",
description="What dtype to use for encoding. Default to using float32 for base64 encoding to match the OpenAI python client behavior. This parameter will affect base64 and binary_response.",
)
endianness class-attribute instance-attribute ¶
endianness: Endianness = Field(
default="native",
description="What endianness to use for encoding. Default to using native for base64 encoding to match the OpenAI python client behavior.This parameter will affect base64 and binary_response.",
)
normalize class-attribute instance-attribute ¶
normalize: bool | None = Field(
default=None,
description="Whether to normalize the embeddings outputs. Default is True.",
)
priority class-attribute instance-attribute ¶
priority: int = Field(
default=0,
description="The priority of the request (lower means earlier handling; default: 0). Any priority other than 0 will raise an error if the served model does not use priority scheduling.",
)
request_id class-attribute instance-attribute ¶
request_id: str = Field(
default_factory=random_uuid,
description="The request_id related to this request. If the caller does not set it, a random_uuid will be generated. This id is used through out the inference process and return in response.",
)
truncate_prompt_tokens class-attribute instance-attribute ¶
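The completion-style request mirrors the OpenAI embeddings API, so the official client works as-is; vLLM-specific fields such as normalize or priority can ride along in extra_body. A sketch, assuming a local server (base URL and model name are placeholders):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.embeddings.create(
    model="intfloat/e5-small-v2",
    input=["first sentence", "second sentence"],
    extra_body={"normalize": True, "priority": 0},  # vLLM-specific fields
)
print(len(resp.data), len(resp.data[0].embedding))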
EmbeddingResponse ¶
Bases: OpenAIBaseModel
Source code in vllm/entrypoints/pooling/embed/protocol.py
created class-attribute instance-attribute ¶
id class-attribute instance-attribute ¶
id: str = Field(
default_factory=lambda: f"embd-{random_uuid()}"
)
EmbeddingResponseData ¶
Bases: OpenAIBaseModel
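When base64 encoding is requested, each data entry's embedding is a base64 string whose raw bytes follow the request's embed_dtype and endianness. A decoding sketch with numpy, assuming the defaults (float32, native byte order):

import base64

import numpy as np

def decode_embedding(b64: str, dtype: str = "float32") -> np.ndarray:
    # Bytes are interpreted with the dtype requested via embed_dtype;
    # native byte order matches the default endianness="native".
    return np.frombuffer(base64.b64decode(b64), dtype=np.dtype(dtype))

# vec = decode_embedding(response.data[0].embedding)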