Embeddings
Users have a few options to choose from when it comes to embeddings.
- OpenAIEmbedding: the default embedding class. Defaults to "text-embedding-ada-002".
- LangchainEmbedding: a wrapper around Langchain's embedding models.
OpenAI embeddings file.
- llama_index.embeddings.openai.OAEMM
- llama_index.embeddings.openai.OAEMT
- class llama_index.embeddings.openai.OpenAIEmbedding(mode: str = OpenAIEmbeddingMode.TEXT_SEARCH_MODE, model: str = OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002, deployment_name: Optional[str] = None, embed_batch_size: int = 10, tokenizer: Optional[Callable] = None, callback_manager: Optional[CallbackManager] = None, **kwargs: Any)
OpenAI class for embeddings.
- Parameters
mode (str) -- Mode for embedding. Defaults to OpenAIEmbeddingMode.TEXT_SEARCH_MODE. Options are:
OpenAIEmbeddingMode.SIMILARITY_MODE
OpenAIEmbeddingMode.TEXT_SEARCH_MODE
model (str) -- Model for embedding. Defaults to OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002. Options are:
OpenAIEmbeddingModelType.DAVINCI
OpenAIEmbeddingModelType.CURIE
OpenAIEmbeddingModelType.BABBAGE
OpenAIEmbeddingModelType.ADA
OpenAIEmbeddingModelType.TEXT_EMBED_ADA_002
deployment_name (Optional[str]) -- Optional deployment of the model. Defaults to None. If this value is not None, mode and model are ignored. Only available when using AzureOpenAI.
- async aget_queued_text_embeddings(text_queue: List[Tuple[str, str]]) -> Tuple[List[str], List[List[float]]]
Asynchronously get a list of text embeddings.
Call async embedding API to get embeddings for all queued texts in parallel. Argument text_queue must be passed in to avoid updating it async.
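The parallel pattern described above can be sketched with asyncio.gather. This is a minimal illustration, not the library's implementation: fake_embed_batch is a hypothetical stand-in for the async embedding API, and embed_queue_async and its batch_size parameter are names invented for this sketch. Passing the queue in as an argument means the coroutine only reads a list it was handed, so nothing else can change it while calls are in flight.

```python
import asyncio
from typing import List, Tuple


async def fake_embed_batch(texts: List[str]) -> List[List[float]]:
    """Hypothetical stand-in for an async embedding API call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [[float(len(t)), float(t.count(" "))] for t in texts]


async def embed_queue_async(
    text_queue: List[Tuple[str, str]], batch_size: int = 10
) -> Tuple[List[str], List[List[float]]]:
    """Embed every (text_id, text) pair, one concurrent call per batch."""
    batches = [
        text_queue[i : i + batch_size]
        for i in range(0, len(text_queue), batch_size)
    ]
    # The queue is passed in and only read here, never mutated.
    tasks = [fake_embed_batch([text for _, text in batch]) for batch in batches]
    results = await asyncio.gather(*tasks)  # results come back in batch order

    ids = [text_id for text_id, _ in text_queue]
    embeddings = [emb for batch_result in results for emb in batch_result]
    return ids, embeddings
```

Because asyncio.gather returns results in the order the awaitables were passed, the returned ids and embeddings stay aligned with the input queue.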
- get_agg_embedding_from_queries(queries: List[str], agg_fn: Optional[Callable[[...], List[float]]] = None) -> List[float]
Get aggregated embedding from multiple queries.
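As a rough sketch of what aggregating over several queries can look like, the snippet below embeds each query and combines the vectors with agg_fn. It assumes that an element-wise mean is used when agg_fn is None, which the reference above does not state. The embed_fn parameter and the helper names are hypothetical and exist only to keep the example self-contained.

```python
from typing import Callable, List, Optional


def mean_agg(embeddings: List[List[float]]) -> List[float]:
    """Element-wise mean of equal-length vectors."""
    count = len(embeddings)
    return [sum(column) / count for column in zip(*embeddings)]


def agg_embedding_from_queries(
    queries: List[str],
    embed_fn: Callable[[str], List[float]],  # hypothetical: one query -> one vector
    agg_fn: Optional[Callable[..., List[float]]] = None,
) -> List[float]:
    """Embed each query, then combine the vectors with agg_fn (mean if None)."""
    query_embeddings = [embed_fn(q) for q in queries]
    agg_fn = agg_fn or mean_agg  # assumed default, see the note above
    return agg_fn(query_embeddings)
```

A custom agg_fn receives the full list of query embeddings, so an element-wise maximum or a weighted average can be swapped in without changing the caller.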
- get_query_embedding(query: str) -> List[float]
Get query embedding.
- get_queued_text_embeddings() -> Tuple[List[str], List[List[float]]]
Get queued text embeddings.
Call embedding API to get embeddings for all queued texts.
- get_text_embedding(text: str) -> List[float]
Get text embedding.
- property last_token_usage: int
Get the last token usage.
- queue_text_for_embedding(text_id: str, text: str) -> None
Queue text for embedding.
Used for batching texts during embedding calls.
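The queue-then-batch workflow formed by queue_text_for_embedding and get_queued_text_embeddings can be illustrated with a toy class. This is a sketch under assumptions, not the library's code: QueuedEmbedder and embed_batch_fn are hypothetical, and clearing the queue after each call is an assumption about the intended behavior.

```python
from typing import Callable, List, Tuple


class QueuedEmbedder:
    """Toy embedder that queues texts and embeds them in fixed-size batches."""

    def __init__(
        self,
        embed_batch_fn: Callable[[List[str]], List[List[float]]],  # hypothetical backend
        embed_batch_size: int = 10,
    ) -> None:
        self._embed_batch_fn = embed_batch_fn
        self._embed_batch_size = embed_batch_size
        self._text_queue: List[Tuple[str, str]] = []

    def queue_text_for_embedding(self, text_id: str, text: str) -> None:
        """Remember the text; no API call happens yet."""
        self._text_queue.append((text_id, text))

    def get_queued_text_embeddings(self) -> Tuple[List[str], List[List[float]]]:
        """Embed everything queued so far, one backend call per batch."""
        ids: List[str] = []
        embeddings: List[List[float]] = []
        for start in range(0, len(self._text_queue), self._embed_batch_size):
            batch = self._text_queue[start : start + self._embed_batch_size]
            ids.extend(text_id for text_id, _ in batch)
            embeddings.extend(self._embed_batch_fn([text for _, text in batch]))
        self._text_queue = []  # assumed: the queue is emptied after embedding
        return ids, embeddings
```

With embed_batch_size set to 10, queuing 25 texts leads to three backend calls (10, 10, and 5 texts) instead of 25 separate ones.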
- similarity(embedding1: List, embedding2: List, mode: SimilarityMode = SimilarityMode.DEFAULT) -> float
Get embedding similarity.
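For intuition, here is a plain-Python sketch of one common similarity measure, cosine similarity. It assumes SimilarityMode.DEFAULT corresponds to cosine similarity, which the reference above does not state. The function name is hypothetical, and other modes would use a different formula.

```python
import math
from typing import List


def cosine_similarity(embedding1: List[float], embedding2: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(embedding1, embedding2))
    norm1 = math.sqrt(sum(a * a for a in embedding1))
    norm2 = math.sqrt(sum(b * b for b in embedding2))
    return dot / (norm1 * norm2)
```

Vectors pointing in the same direction score close to 1 regardless of their length, and orthogonal vectors score 0.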
- property total_tokens_used: int
Get the total tokens used so far.
- class llama_index.embeddings.openai.OpenAIEmbeddingModeModel(value)
OpenAI embedding mode model.
- class llama_index.embeddings.openai.OpenAIEmbeddingModelType(value)
OpenAI embedding model type.
- async llama_index.embeddings.openai.aget_embedding(text: str, engine: Optional[str] = None, **kwargs: Any) -> List[float]
Asynchronously get embedding.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- async llama_index.embeddings.openai.aget_embeddings(list_of_text: List[str], engine: Optional[str] = None, **kwargs: Any) -> List[List[float]]
Asynchronously get embeddings.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- llama_index.embeddings.openai.get_embedding(text: str, engine: Optional[str] = None, **kwargs: Any) -> List[float]
Get embedding.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- llama_index.embeddings.openai.get_embeddings(list_of_text: List[str], engine: Optional[str] = None, **kwargs: Any) -> List[List[float]]
Get embeddings.
NOTE: Copied from OpenAI's embedding utils: https://github.com/openai/openai-python/blob/main/openai/embeddings_utils.py
Copied here to avoid importing unnecessary dependencies like matplotlib, plotly, scipy, sklearn.
- llama_index.embeddings.openai.get_engine(mode: str, model: str, mode_model_dict: Dict[Tuple[OpenAIEmbeddingMode, str], OpenAIEmbeddingModeModel]) -> OpenAIEmbeddingModeModel
Get engine.
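The signature above suggests a lookup keyed on a (mode, model) pair. A minimal sketch of that idea follows. Every name in it (Mode, MODE_MODEL_TO_ENGINE, lookup_engine, and the placeholder model and engine strings) is hypothetical, and raising ValueError for an unknown pair is an assumption about error handling.

```python
from enum import Enum
from typing import Dict, Tuple


class Mode(str, Enum):
    """Hypothetical stand-in for OpenAIEmbeddingMode."""

    SIMILARITY = "similarity"
    TEXT_SEARCH = "text_search"


# Hypothetical table: the model and engine strings are placeholders,
# not real OpenAI engine names.
MODE_MODEL_TO_ENGINE: Dict[Tuple[Mode, str], str] = {
    (Mode.SIMILARITY, "model-a"): "engine-similarity-a",
    (Mode.TEXT_SEARCH, "model-a"): "engine-search-a",
}


def lookup_engine(mode: str, model: str, table: Dict[Tuple[Mode, str], str]) -> str:
    """Resolve a (mode, model) pair to an engine, rejecting unknown pairs."""
    key = (Mode(mode), model)  # Mode(...) itself raises ValueError on a bad mode
    if key not in table:
        raise ValueError(f"Invalid mode, model combination: {key}")
    return table[key]
```

Keeping the valid combinations in a single dictionary makes unsupported pairs fail early with a clear error rather than at request time.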
We also introduce a LangchainEmbedding class, which is a wrapper around Langchain's embedding models.
A full list of embeddings can be found in Langchain's embeddings documentation.
Langchain Embedding Wrapper Module.
- class llama_index.embeddings.langchain.LangchainEmbedding(langchain_embedding: Embeddings, **kwargs: Any)
External embeddings (taken from Langchain).
- Parameters
langchain_embedding (langchain.embeddings.Embeddings) -- Langchain embeddings class.
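The wrapper idea can be sketched as a small adapter. In this illustration, FakeLangchainEmbeddings is a hypothetical stand-in that mimics the embed_query and embed_documents methods of a Langchain embeddings object, and WrappedEmbedding is an invented name. The actual LangchainEmbedding class may delegate differently.

```python
from typing import List


class FakeLangchainEmbeddings:
    """Hypothetical stand-in exposing embed_query and embed_documents."""

    def embed_query(self, text: str) -> List[float]:
        return [float(len(text))]

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        return [[float(len(t))] for t in texts]


class WrappedEmbedding:
    """Adapter that exposes the get_*_embedding methods listed in this reference."""

    def __init__(self, langchain_embedding: FakeLangchainEmbeddings) -> None:
        self._langchain_embedding = langchain_embedding

    def get_query_embedding(self, query: str) -> List[float]:
        # Queries go through the query-specific method.
        return self._langchain_embedding.embed_query(query)

    def get_text_embedding(self, text: str) -> List[float]:
        # Documents are embedded as a one-element batch.
        return self._langchain_embedding.embed_documents([text])[0]
```

Routing queries and documents through separate methods matters for models that embed the two differently.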
- async aget_queued_text_embeddings(text_queue: List[Tuple[str, str]]) -> Tuple[List[str], List[List[float]]]
Asynchronously get a list of text embeddings.
Call async embedding API to get embeddings for all queued texts in parallel. Argument text_queue must be passed in to avoid updating it async.
- get_agg_embedding_from_queries(queries: List[str], agg_fn: Optional[Callable[[...], List[float]]] = None) -> List[float]
Get aggregated embedding from multiple queries.
- get_query_embedding(query: str) -> List[float]
Get query embedding.
- get_queued_text_embeddings() -> Tuple[List[str], List[List[float]]]
Get queued text embeddings.
Call embedding API to get embeddings for all queued texts.
- get_text_embedding(text: str) -> List[float]
Get text embedding.
- property last_token_usage: int
Get the last token usage.
- queue_text_for_embedding(text_id: str, text: str) -> None
Queue text for embedding.
Used for batching texts during embedding calls.
- similarity(embedding1: List, embedding2: List, mode: SimilarityMode = SimilarityMode.DEFAULT) -> float
Get embedding similarity.
- property total_tokens_used: int
Get the total tokens used so far.