Vector Store

Vector stores.

class llama_index.vector_stores.ChatGPTRetrievalPluginClient(endpoint_url: str, bearer_token: Optional[str] = None, retries: Optional[Retry] = None, batch_size: int = 100, **kwargs: Any)

ChatGPT Retrieval Plugin Client.

In this client, we make use of the endpoints defined by ChatGPT.

Parameters

endpoint_url (str) -- URL of the ChatGPT Retrieval Plugin.
bearer_token (Optional[str]) -- Bearer token for the ChatGPT Retrieval Plugin.
retries (Optional[Retry]) -- Retry object for the ChatGPT Retrieval Plugin.
batch_size (int) -- Batch size for the ChatGPT Retrieval Plugin.

add(embedding_results: List[NodeWithEmbedding]) → List[str]: Add embedding_results to index.

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Get nodes for response.
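The batch_size parameter above suggests that records are sent to the plugin in chunks rather than in one request. How the client actually groups them is not stated here, so the following is only a minimal sketch of the general idea. The helper name iter_batches and the placeholder records are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical payload: 250 placeholder records, batched with the default size of 100.
records = [f"node-{i}" for i in range(250)]
batch_sizes = [len(batch) for batch in iter_batches(records, batch_size=100)]
print(batch_sizes)
```

A smaller batch_size means more requests with smaller payloads, and a larger one means fewer, bigger requests.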

class llama_index.vector_stores.ChromaVectorStore(chroma_collection: Any, **kwargs: Any)

Chroma vector store.

In this vector store, embeddings are stored within a ChromaDB collection.

During query time, the index uses ChromaDB to query for the top k most similar nodes.

Parameters: chroma_collection (chromadb.api.models.Collection.Collection) -- ChromaDB collection instance

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Return client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.DeepLakeVectorStore(dataset_path: str = 'llama_index', token: Optional[str] = None, read_only: Optional[bool] = False, ingestion_batch_size: int = 1024, ingestion_num_workers: int = 4, overwrite: bool = False)

The DeepLake Vector Store.

In this vector store we store the text, its embedding and a few pieces of its metadata in a deeplake dataset. This implementation allows the use of an already existing deeplake dataset if it is one that was created by this vector store. It also supports creating a new one if the dataset doesn't exist or if overwrite is set to True.

Parameters

dataset_path (str, optional) -- Path to the deeplake dataset, where data will be stored. Defaults to "llama_index".
overwrite (bool, optional) -- Whether to overwrite an existing dataset with the same name. Defaults to False.
token (str, optional) -- The deeplake token that allows you to access the dataset with proper access. Defaults to None.
read_only (bool, optional) -- Whether to open the dataset in read-only mode. Defaults to False.
ingestion_batch_size (int, optional) -- Batch size used for batched data ingestion into the deeplake dataset. Defaults to 1024.
ingestion_num_workers (int, optional) -- Number of workers to use during data ingestion. Defaults to 4.

Raises

ImportError -- Unable to import deeplake.
UserNotLoggedinException -- When user is not logged in with credentials or token.
TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.
InvalidTokenException -- If the specified token is invalid.

Returns

Vector store that supports add, delete, and query.

Return type

DeepLakeVectorStore

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add the embeddings and their nodes into DeepLake.

Parameters

embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.

Raises

UserNotLoggedinException -- When user is not logged in with credentials or token.
TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.
InvalidTokenException -- If the specified token is invalid.

Returns

List of ids inserted.

Return type

List[str]

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.FaissVectorStore(faiss_index: Any)

Faiss Vector Store.

Embeddings are stored within a Faiss index.

During query time, the index uses Faiss to query for the top k embeddings, and returns the corresponding indices.

Parameters: faiss_index (faiss.Index) -- Faiss index instance

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

NOTE: in the Faiss vector store, we do not store text in Faiss.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Return the faiss index.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) → None

Save to file.

This method saves the vector store to disk.

Parameters: persist_path (str) -- The path of the file to save to.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes
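Because Faiss stores only vectors and a query returns index positions, the caller has to map those positions back to nodes or text kept elsewhere. The sketch below illustrates that flow with a brute-force search standing in for Faiss, assuming squared L2 distance as a flat L2 index would use. The helper names and the texts_by_position mapping are hypothetical.

```python
from typing import Dict, List, Tuple


def squared_l2(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k_indices(
    vectors: List[List[float]], query: List[float], k: int
) -> List[Tuple[int, float]]:
    """Return (position, distance) pairs for the k nearest vectors, closest first."""
    scored = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Vectors live in the "index"; the text lives in a separate mapping keyed by position.
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
texts_by_position: Dict[int, str] = {0: "origin", 1: "unit x", 2: "two up"}

hits = top_k_indices(vectors, query=[0.9, 0.1], k=2)
hit_texts = [texts_by_position[position] for position, _ in hits]
print(hit_texts)
```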

class llama_index.vector_stores.LanceDBVectorStore(uri: str, table_name: str = 'vectors', nprobes: int = 20, refine_factor: Optional[int] = None, **kwargs: Any)

The LanceDB Vector Store.

Stores text and embeddings in LanceDB. The vector store will open an existing LanceDB dataset or create the dataset if it does not exist.

Parameters

uri (str, required) -- Location where LanceDB will store its files.
table_name (str, optional) -- The table name where the embeddings will be stored. Defaults to "vectors".
nprobes (int, optional) -- The number of probes used. A higher number makes search more accurate but also slower. Defaults to 20.
refine_factor (int, optional) -- Refine the results by reading extra elements and re-ranking them in memory. Defaults to None.

Raises

ImportError -- Unable to import lancedb.

Returns

VectorStore that supports creating LanceDB datasets and querying them.

Return type

LanceDBVectorStore

add(embedding_results: List[NodeWithEmbedding]) → List[str]: Add embedding results to vector store.

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Query index for top k most similar nodes.

class llama_index.vector_stores.MetalVectorStore(api_key: str, client_id: str, index_id: str)

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Return Metal client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Query vector store.

class llama_index.vector_stores.MilvusVectorStore(collection_name: str = 'llamalection', index_params: Optional[dict] = None, search_params: Optional[dict] = None, dim: Optional[int] = None, host: str = 'localhost', port: int = 19530, user: str = '', password: str = '', use_secure: bool = False, overwrite: bool = False, **kwargs: Any)

The Milvus Vector Store.

In this vector store we store the text, its embedding and a few pieces of its metadata in a Milvus collection. This implementation allows the use of an already existing collection if it is one that was created by this vector store. It also supports creating a new one if the collection doesn't exist or if overwrite is set to True.

Parameters

collection_name (str, optional) -- The name of the collection where data will be stored. Defaults to "llamalection".
index_params (dict, optional) -- The index parameters for Milvus. If none are provided, an HNSW index will be used. Defaults to None.
search_params (dict, optional) -- The search parameters for a Milvus query. If none are provided, default params will be generated. Defaults to None.
dim (int, optional) -- The dimension of the embeddings. If it is not provided, collection creation will be done on first insert. Defaults to None.
host (str, optional) -- The host address of Milvus. Defaults to "localhost".
port (int, optional) -- The port of Milvus. Defaults to 19530.
user (str, optional) -- The username for RBAC. Defaults to "".
password (str, optional) -- The password for RBAC. Defaults to "".
use_secure (bool, optional) -- Use https. Required for Zilliz Cloud. Defaults to False.
overwrite (bool, optional) -- Whether to overwrite existing collection with same name. Defaults to False.

Raises

ImportError -- Unable to import pymilvus.
MilvusException -- Error communicating with Milvus; more details can be found in the debug logs.

Returns

Vector store that supports add, delete, and query.

Return type

MilvusVectorStore

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add the embeddings and their nodes into Milvus.

Parameters: embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.
Raises: MilvusException -- Failed to insert data.
Returns: List of ids inserted.
Return type: List[str]

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.
Raises: MilvusException -- Failed to delete the doc.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes
doc_ids (Optional[List[str]]) -- list of doc_ids to filter by
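The doc_ids option narrows the search to nodes that belong to the listed documents before the top k are chosen. The sketch below shows that filter-then-rank behaviour in plain Python. It is an illustration only: the row layout, the dot-product score, and the function name query_rows are assumptions, not how Milvus or this class implements it.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (node_id, doc_id, embedding).
Row = Tuple[str, str, List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def query_rows(
    rows: List[Row],
    query_embedding: List[float],
    similarity_top_k: int,
    doc_ids: Optional[List[str]] = None,
) -> List[str]:
    """Return node ids of the top-k rows, optionally restricted to `doc_ids`."""
    allowed = set(doc_ids) if doc_ids is not None else None
    candidates = [row for row in rows if allowed is None or row[1] in allowed]
    candidates.sort(key=lambda row: dot(row[2], query_embedding), reverse=True)
    return [row[0] for row in candidates[:similarity_top_k]]


rows = [
    ("n1", "docA", [1.0, 0.0]),
    ("n2", "docB", [0.9, 0.1]),
    ("n3", "docA", [0.0, 1.0]),
]
unfiltered = query_rows(rows, [1.0, 0.0], similarity_top_k=2)
filtered = query_rows(rows, [1.0, 0.0], similarity_top_k=2, doc_ids=["docA"])
print(unfiltered, filtered)
```

With the filter in place, a lower-scoring node from an allowed document can displace a higher-scoring node from an excluded one.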

class llama_index.vector_stores.MyScaleVectorStore(myscale_client: Optional[Any] = None, table: str = 'llama_index', database: str = 'default', index_type: str = 'IVFFLAT', metric: str = 'cosine', batch_size: int = 32, index_params: Optional[dict] = None, search_params: Optional[dict] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any)

MyScale Vector Store.

In this vector store, embeddings and docs are stored within an existing MyScale cluster.

During query time, the index uses MyScale to query for the top k most similar nodes.

Parameters

myscale_client (httpclient) -- clickhouse-connect httpclient of an existing MyScale cluster.
table (str, optional) -- The name of the MyScale table where data will be stored. Defaults to "llama_index".
database (str, optional) -- The name of the MyScale database where data will be stored. Defaults to "default".
index_type (str, optional) -- The type of the MyScale vector index. Defaults to "IVFFLAT".
metric (str, optional) -- The metric type of the MyScale vector index. Defaults to "cosine".
batch_size (int, optional) -- the size of documents to insert. Defaults to 32.
index_params (dict, optional) -- The index parameters for MyScale. Defaults to None.
search_params (dict, optional) -- The search parameters for a MyScale query. Defaults to None.
service_context (ServiceContext, optional) -- Vector store service context. Defaults to None

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

drop() → None: Drop MyScale Index and table

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters: query (VectorStoreQuery) -- query

class llama_index.vector_stores.OpensearchVectorClient(endpoint: str, index: str, dim: int, embedding_field: str = 'embedding', text_field: str = 'content', extra_info_field: str = 'extra_info', method: Optional[dict] = None, auth: Optional[dict] = None)

Object encapsulating an Opensearch index that has vector search enabled.

If the index does not yet exist, it is created during init. Therefore, the underlying index is assumed to either: 1) not exist yet or 2) be created due to previous usage of this class.

Parameters

endpoint (str) -- URL (http/https) of elasticsearch endpoint
index (str) -- Name of the elasticsearch index
dim (int) -- Dimension of the vector
embedding_field (str) -- Name of the field in the index to store embedding array in.
text_field (str) -- Name of the field to grab text from
method (Optional[dict]) -- Opensearch "method" JSON obj for configuring the KNN index. This includes engine, metric, and other config params. Defaults to: {"name": "hnsw", "space_type": "l2", "engine": "faiss", "parameters": {"ef_construction": 256, "m": 48}}

delete_doc_id(doc_id: str) → None

Delete a document.

Parameters: doc_id (str) -- document id

do_approx_knn(query_embedding: List[float], k: int) → VectorStoreQueryResult: Do approximate knn.

index_results(results: List[NodeWithEmbedding]) → List[str]: Store results in the index.
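To make the method parameter more concrete, the sketch below builds an index-creation body in the shape OpenSearch expects for a k-NN index: a knn_vector field carrying dimension and method. The default method dict is copied from the parameter description above. The function name build_knn_index_body is hypothetical, and the exact body this client sends at init time is an assumption.

```python
import json
from typing import Any, Dict, Optional

# Default "method" object, as listed in the parameter description.
DEFAULT_METHOD: Dict[str, Any] = {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "faiss",
    "parameters": {"ef_construction": 256, "m": 48},
}


def build_knn_index_body(
    dim: int,
    embedding_field: str = "embedding",
    method: Optional[dict] = None,
) -> Dict[str, Any]:
    """Build an index body with k-NN enabled and one knn_vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                embedding_field: {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": method if method is not None else DEFAULT_METHOD,
                }
            }
        },
    }


body = build_knn_index_body(dim=1536)
print(json.dumps(body, indent=2))
```

Passing a different method dict, for example one with another space_type, changes only the field mapping and leaves the rest of the body untouched.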

class llama_index.vector_stores.OpensearchVectorStore(client: OpensearchVectorClient)

Elasticsearch/Opensearch vector store.

Parameters: client (OpensearchVectorClient) -- Vector index client to use for data insertion/querying.

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.PineconeVectorStore(pinecone_index: Optional[Any] = None, index_name: Optional[str] = None, environment: Optional[str] = None, namespace: Optional[str] = None, insert_kwargs: Optional[Dict] = None, add_sparse_vector: bool = False, tokenizer: Optional[Callable] = None, **kwargs: Any)

Pinecone Vector Store.

In this vector store, embeddings and docs are stored within a Pinecone index.

During query time, the index uses Pinecone to query for the top k most similar nodes.

Parameters

pinecone_index (Optional[pinecone.Index]) -- Pinecone index instance
insert_kwargs (Optional[Dict]) -- insert kwargs during upsert call.
add_sparse_vector (bool) -- whether to add sparse vector to index.
tokenizer (Optional[Callable]) -- tokenizer to use to generate sparse vectors.

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Return Pinecone client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes
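The add_sparse_vector and tokenizer parameters point to a hybrid setup where a sparse vector is stored next to the dense embedding. One common way to build such a vector is to count token ids, which the sketch below does. The toy tokenizer, its vocabulary, and the helper to_sparse_vector are all hypothetical, and the tokenizer signature this class actually expects may differ. Only the indices/values layout follows Pinecone's sparse vector format.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical fixed vocabulary; unknown words map to id 3.
VOCAB: Dict[str, int] = {"vector": 0, "store": 1, "query": 2}
UNKNOWN_ID = 3


def toy_tokenizer(text: str) -> List[int]:
    """Map each whitespace-separated word to an integer token id."""
    return [VOCAB.get(word, UNKNOWN_ID) for word in text.lower().split()]


def to_sparse_vector(text: str, tokenizer: Callable[[str], List[int]]) -> Dict[str, list]:
    """Count token ids and emit them as parallel indices/values lists."""
    counts = Counter(tokenizer(text))
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}


sparse = to_sparse_vector("vector store vector query", toy_tokenizer)
print(sparse)
```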

class llama_index.vector_stores.QdrantVectorStore(collection_name: str, client: Optional[Any] = None, **kwargs: Any)

Qdrant Vector Store.

In this vector store, embeddings and docs are stored within a Qdrant collection.

During query time, the index uses Qdrant to query for the top k most similar nodes.

Parameters

collection_name (str) -- name of the Qdrant collection
client (Optional[Any]) -- QdrantClient instance from qdrant-client package

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Return the Qdrant client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters: query (VectorStoreQuery) -- query

class llama_index.vector_stores.RedisVectorStore(index_name: str, index_prefix: str = 'llama_index', index_args: Optional[Dict[str, Any]] = None, redis_url: str = 'redis://localhost:6379', overwrite: bool = False, **kwargs: Any)

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to the index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- List of embedding results to add to the index.
Returns: List of ids of the documents added to the index.
Return type: List[str]
Raises: ValueError -- If the index already exists and overwrite is False.

property client: RedisType: Return the redis client instance

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

delete_index() → None: Delete the index and all documents.

persist(persist_path: str, fs: Optional[AbstractFileSystem] = None, in_background: bool = True) → None

Persist the vector store to disk.

Parameters

persist_path (str) -- Path to persist the vector store to. (doesn't apply)
in_background (bool, optional) -- Persist in background. Defaults to True.
fs (fsspec.AbstractFileSystem, optional) -- Filesystem to persist to. (doesn't apply)

Raises

redis.exceptions.RedisError -- If there is an error persisting the index to disk.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query the index.

Parameters

query (VectorStoreQuery) -- query object

Returns

query result

Return type

VectorStoreQueryResult

Raises

ValueError -- If query.query_embedding is None.
redis.exceptions.RedisError -- If there is an error querying the index.
redis.exceptions.TimeoutError -- If there is a timeout querying the index.

class llama_index.vector_stores.SimpleVectorStore(data: Optional[SimpleVectorStoreData] = None, fs: Optional[AbstractFileSystem] = None, **kwargs: Any)

Simple Vector Store.

In this vector store, embeddings are stored within a simple, in-memory dictionary.

Parameters: data (Optional[SimpleVectorStoreData]) -- data containing the embeddings and doc_ids. See SimpleVectorStoreData for more details.
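As a rough picture of what a dictionary-backed store involves, the sketch below keeps embeddings in a plain dict keyed by node id and ranks them by cosine similarity at query time. The class TinyVectorStore and its method signatures are hypothetical and simplified. It is not the library's implementation, and the choice of cosine similarity is an assumption.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical dict-backed store, for illustration only."""

    def __init__(self) -> None:
        self.embeddings: Dict[str, List[float]] = {}  # node id -> embedding
        self.ref_doc_ids: Dict[str, str] = {}         # node id -> source doc id

    def add(self, node_id: str, ref_doc_id: str, embedding: List[float]) -> str:
        self.embeddings[node_id] = embedding
        self.ref_doc_ids[node_id] = ref_doc_id
        return node_id

    def get(self, node_id: str) -> List[float]:
        return self.embeddings[node_id]

    def delete(self, ref_doc_id: str) -> None:
        """Remove every node that came from the given source document."""
        doomed = [n for n, d in self.ref_doc_ids.items() if d == ref_doc_id]
        for node_id in doomed:
            del self.embeddings[node_id]
            del self.ref_doc_ids[node_id]

    def query(self, query_embedding: List[float], similarity_top_k: int) -> List[str]:
        """Return the ids of the top-k nodes, most similar first."""
        ranked = sorted(
            self.embeddings,
            key=lambda n: cosine(self.embeddings[n], query_embedding),
            reverse=True,
        )
        return ranked[:similarity_top_k]


store = TinyVectorStore()
store.add("n1", "docA", [1.0, 0.0])
store.add("n2", "docA", [0.0, 1.0])
store.add("n3", "docB", [1.0, 1.0])
top_two = store.query([1.0, 0.0], similarity_top_k=2)
store.delete("docA")
remaining = sorted(store.embeddings)
print(top_two, remaining)
```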

add(embedding_results: List[NodeWithEmbedding]) → List[str]: Add embedding_results to index.

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

classmethod from_persist_dir(persist_dir: str = './storage', fs: Optional[AbstractFileSystem] = None) → SimpleVectorStore: Load from persist dir.

classmethod from_persist_path(persist_path: str, fs: Optional[AbstractFileSystem] = None) → SimpleVectorStore: Create a SimpleVectorStore from a persist path.

get(text_id: str) → List[float]: Get embedding.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) → None: Persist the SimpleVectorStore to a directory.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Get nodes for response.
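The persist and from_persist_path methods above describe a save-and-reload cycle through a JSON file. The sketch below shows such a round trip for a plain dict of embeddings using only the standard library. The function names, the "embedding_dict" key, and the on-disk layout are assumptions rather than the library's actual format, and the fs (fsspec) option is left out.

```python
import json
import os
import tempfile
from typing import Dict, List


def save_embeddings(embeddings: Dict[str, List[float]], persist_path: str) -> None:
    """Write the embeddings to a JSON file, creating parent directories if needed."""
    os.makedirs(os.path.dirname(persist_path) or ".", exist_ok=True)
    with open(persist_path, "w", encoding="utf-8") as f:
        json.dump({"embedding_dict": embeddings}, f)


def load_embeddings(persist_path: str) -> Dict[str, List[float]]:
    """Read the embeddings back from a file written by save_embeddings."""
    with open(persist_path, "r", encoding="utf-8") as f:
        return json.load(f)["embedding_dict"]


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "storage", "vector_store.json")
    save_embeddings({"n1": [0.1, 0.2]}, path)
    restored = load_embeddings(path)
print(restored)
```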

class llama_index.vector_stores.SupabaseVectorStore(postgres_connection_string: str, collection_name: str, dimension: int = 1536, **kwargs: Any)

Supabase Vector.

In this vector store, embeddings are stored in a Postgres table using pgvector.

During query time, the index uses pgvector/Supabase to query for the top k most similar nodes.

Parameters

postgres_connection_string (str) -- postgres connection string
collection_name (str) -- name of the collection to store the embeddings in

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete doc.

Parameters: ref_doc_id (str) -- document id

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters: query (VectorStoreQuery) -- query containing the query embedding

class llama_index.vector_stores.WeaviateVectorStore(weaviate_client: Optional[Any] = None, class_prefix: Optional[str] = None, **kwargs: Any)

Weaviate vector store.

In this vector store, embeddings and docs are stored within a Weaviate collection.

During query time, the index uses Weaviate to query for the top k most similar nodes.

Parameters

weaviate_client (weaviate.Client) -- WeaviateClient instance from weaviate-client package
class_prefix (Optional[str]) -- prefix for Weaviate classes

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Query index for top k most similar nodes.

向量存储

Vector stores.

class llama_index.vector_stores.ChatGPTRetrievalPluginClient(endpoint_url: str, bearer_token: Optional[str] = None, retries: Optional[Retry] = None, batch_size: int = 100, **kwargs: Any)

ChatGPT Retrieval Plugin Client.

In this client, we make use of the endpoints defined by ChatGPT.

参数

endpoint_url (str) -- URL of the ChatGPT Retrieval Plugin.
bearer_token (Optional[str]) -- Bearer token for the ChatGPT Retrieval Plugin.
retries (Optional[Retry]) -- Retry object for the ChatGPT Retrieval Plugin.
batch_size (int) -- Batch size for the ChatGPT Retrieval Plugin.

add(embedding_results: List[NodeWithEmbedding]) → List[str]: Add embedding_results to index.

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using with ref_doc_id.

参数: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Get nodes for response.

class llama_index.vector_stores.ChromaVectorStore(chroma_collection: Any, **kwargs: Any)

Chroma vector store.

In this vector store, embeddings are stored within a ChromaDB collection.

During query time, the index uses ChromaDB to query for the top k most similar nodes.

参数: chroma_collection (chromadb.api.models.Collection.Collection) -- ChromaDB collection instance

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Args: embedding_results: List[NodeWithEmbedding]: list of embedding results

property client: Any: Return client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using with ref_doc_id.

参数: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

参数

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

The DeepLake Vector Store.

参数

deeplake_path (str, optional) -- Path to the deeplake dataset, where data will be
"llama_index". (stored. Defaults to) --
overwrite (bool, optional) -- Whether to overwrite existing dataset with same name. Defaults to False.
token (str, optional) -- the deeplake token that allows you to access the dataset with proper access. Defaults to None.
read_only (bool, optional) -- Whether to open the dataset with read only mode.
ingestion_batch_size (bool, 1024) -- used for controlling batched data injestion to deeplake dataset. Defaults to 1024.
injestion_num_workers (int, 1) -- number of workers to use during data injestion. Defaults to 4.
overwrite -- Whether to overwrite existing dataset with the new dataset with the same name.

抛出

ImportError -- Unable to import deeplake.
UserNotLoggedinException -- When user is not logged in with credentials or token.
TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.
InvalidTokenException -- If the specified token is invalid

返回

Vectorstore that supports add, delete, and query.

返回类型

DeepLakeVectorstore

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add the embeddings and their nodes into DeepLake.

参数

embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.

抛出

UserNotLoggedinException -- When user is not logged in with credentials or token.
TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.
InvalidTokenException -- If the specified token is invalid

返回

List of ids inserted.

返回类型

List[str]

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using with ref_doc_id.

参数: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

参数

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.FaissVectorStore(faiss_index: Any)

Faiss Vector Store.

Embeddings are stored within a Faiss index.

During query time, the index uses Faiss to query for the top k embeddings, and returns the corresponding indices.

参数: faiss_index (faiss.Index) -- Faiss index instance

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

NOTE: in the Faiss vector store, we do not store text in Faiss.

Args: embedding_results: List[NodeWithEmbedding]: list of embedding results

property client: Any: Return the faiss index.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using with ref_doc_id.

参数: ref_doc_id (str) -- The doc_id of the document to delete.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) → None

Save to file.

This method saves the vector store to disk.

参数: persist_path (str) -- The save_path of the file.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

参数

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.LanceDBVectorStore(uri: str, table_name: str = 'vectors', nprobes: int = 20, refine_factor: Optional[int] = None, **kwargs: Any)

The LanceDB Vector Store.

Stores text and embeddings in LanceDB. The vector store will open an existing: LanceDB dataset or create the dataset if it does not exist.

参数

uri (str, required) -- Location where LanceDB will store its files.
table_name (str, optional) -- The table name where the embeddings will be stored. Defaults to "vectors".
nprobes (int, optional) -- The number of probes used. A higher number makes search more accurate but also slower. Defaults to 20.
refine_factor -- (int, optional): Refine the results by reading extra elements and re-ranking them in memory. Defaults to None

抛出

ImportError -- Unable to import lancedb.

返回

VectorStore that supports creating LanceDB datasets and: querying it.

返回类型

LanceDBVectorStore

add(embedding_results: List[NodeWithEmbedding]) → List[str]: Add embedding results to vector store.

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using with ref_doc_id.

参数: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Query index for top k most similar nodes.

class llama_index.vector_stores.MetalVectorStore(api_key: str, client_id: str, index_id: str)

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Args: embedding_results: List[NodeEmbeddingResult]: list of embedding results

property client: Any: Return Metal client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using with ref_doc_id.

参数: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Query vector store.

The Milvus Vector Store.

参数

collection_name (str, optional) -- The name of the collection where data will be stored. Defaults to "llamalection".
index_params (dict, optional) -- The index parameters for Milvus, if none are provided an HNSW index will be used. Defaults to None.
search_params (dict, optional) -- The search parameters for a Milvus query. If none are provided, default params will be generated. Defaults to None.
dim (int, optional) -- The dimension of the embeddings. If it is not provided, collection creation will be done on first insert. Defaults to None.
host (str, optional) -- The host address of Milvus. Defaults to "localhost".
port (int, optional) -- The port of Milvus. Defaults to 19530.
user (str, optional) -- The username for RBAC. Defaults to "".
password (str, optional) -- The password for RBAC. Defaults to "".
use_secure (bool, optional) -- Use https. Required for Zilliz Cloud. Defaults to False.
overwrite (bool, optional) -- Whether to overwrite existing collection with same name. Defaults to False.

抛出

ImportError -- Unable to import pymilvus.
MilvusException -- Error communicating with Milvus, more can be found in logging under Debug.

返回

Vectorstore that supports add, delete, and query.

返回类型

MilvusVectorstore

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add the embeddings and their nodes into Milvus.

参数: embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.
抛出: MilvusException -- Failed to insert data.
返回: List of ids inserted.
返回类型: List[str]

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes using with ref_doc_id.

参数: ref_doc_id (str) -- The doc_id of the document to delete.
抛出: MilvusException -- Failed to delete the doc.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

参数

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes
doc_ids (Optional[List[str]]) -- list of doc_ids to filter by

MyScale Vector Store.

In this vector store, embeddings and docs are stored within an existing MyScale cluster.

During query time, the index uses MyScale to query for the top k most similar nodes.

参数

myscale_client (httpclient) -- clickhouse-connect httpclient of an existing MyScale cluster.
table (str, optional) -- The name of the MyScale table where data will be stored. Defaults to "llama_index".
database (str, optional) -- The name of the MyScale database where data will be stored. Defaults to "default".
index_type (str, optional) -- The type of the MyScale vector index. Defaults to "IVFFLAT".
metric (str, optional) -- The metric type of the MyScale vector index. Defaults to "cosine".
batch_size (int, optional) -- the size of documents to insert. Defaults to 32.
index_params (dict, optional) -- The index parameters for MyScale. Defaults to None.
search_params (dict, optional) -- The search parameters for a MyScale query. Defaults to None.
service_context (ServiceContext, optional) -- Vector store service context. Defaults to None

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes with the given ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

drop() → None: Drop MyScale Index and table

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters: query (VectorStoreQuery) -- query

Object encapsulating an Opensearch index that has vector search enabled.

If the index does not yet exist, it is created during init. Therefore, the underlying index is assumed to either: 1) not exist yet or 2) be created due to previous usage of this class.

Parameters

endpoint (str) -- URL (http/https) of elasticsearch endpoint
index (str) -- Name of the elasticsearch index
dim (int) -- Dimension of the vector
embedding_field (str) -- Name of the field in the index to store embedding array in.
text_field (str) -- Name of the field to grab text from
method (Optional[dict]) -- Opensearch "method" JSON obj for configuring the KNN index. This includes engine, metric, and other config params. Defaults to: {"name": "hnsw", "space_type": "l2", "engine": "faiss", "parameters": {"ef_construction": 256, "m": 48}}
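
To show where dim, embedding_field, and method end up, here is a sketch of an index-creation body in the shape the OpenSearch k-NN plugin expects (a knn_vector field inside an index with knn enabled). The function name build_knn_index_body and the default field name "embedding" are assumptions for this illustration; the body this client actually sends may include additional settings.

```python
import copy
from typing import Any, Dict, Optional

# Default "method" object quoted from the parameter description above.
DEFAULT_METHOD: Dict[str, Any] = {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "faiss",
    "parameters": {"ef_construction": 256, "m": 48},
}


def build_knn_index_body(
    dim: int,
    embedding_field: str = "embedding",  # hypothetical default field name
    method: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a create-index body with a single knn_vector field."""
    # Copy so callers can modify the result without touching DEFAULT_METHOD.
    chosen = copy.deepcopy(method if method is not None else DEFAULT_METHOD)
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                embedding_field: {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": chosen,
                }
            }
        },
    }
```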

delete_doc_id(doc_id: str) → None

Delete a document.

Parameters: doc_id (str) -- document id

do_approx_knn(query_embedding: List[float], k: int) → VectorStoreQueryResult: Do approximate knn.

index_results(results: List[NodeWithEmbedding]) → List[str]: Store results in the index.

class llama_index.vector_stores.OpensearchVectorStore(client: OpensearchVectorClient)

Elasticsearch/Opensearch vector store.

Parameters: client (OpensearchVectorClient) -- Vector index client to use for data insertion/querying.

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes with the given ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

Pinecone Vector Store.

In this vector store, embeddings and docs are stored within a Pinecone index.

During query time, the index uses Pinecone to query for the top k most similar nodes.

Parameters

pinecone_index (Optional[pinecone.Index]) -- Pinecone index instance
insert_kwargs (Optional[Dict]) -- insert kwargs during upsert call.
add_sparse_vector (bool) -- whether to add sparse vector to index.
tokenizer (Optional[Callable]) -- tokenizer to use to generate sparse
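
The add_sparse_vector and tokenizer parameters suggest that token output is turned into a sparse vector alongside the dense embedding. One plausible way to do that, sketched under assumptions, is to count token ids and emit them in the indices/values layout that Pinecone uses for sparse values. Both toy_tokenizer and build_sparse_vector are hypothetical names for this illustration, and the counting scheme is an assumption rather than a description of what this store does.

```python
from collections import Counter
from typing import Callable, Dict, List, Union


def toy_tokenizer(text: str) -> List[int]:
    """Hypothetical tokenizer: map each lowercase word to a small integer id."""
    return [sum(ord(ch) for ch in word) % 1000 for word in text.lower().split()]


def build_sparse_vector(
    text: str, tokenizer: Callable[[str], List[int]]
) -> Dict[str, Union[List[int], List[float]]]:
    """Count token ids and return them as parallel indices/values lists."""
    counts = Counter(tokenizer(text))
    indices = sorted(counts)
    values = [float(counts[index]) for index in indices]
    return {"indices": indices, "values": values}
```

In practice the tokenizer would come from a real vocabulary, so that the same token always maps to the same index at insert time and at query time.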

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Return Pinecone client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes with the given ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query_embedding (List[float]) -- query embedding
similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.QdrantVectorStore(collection_name: str, client: Optional[Any] = None, **kwargs: Any)

Qdrant Vector Store.

In this vector store, embeddings and docs are stored within a Qdrant collection.

During query time, the index uses Qdrant to query for the top k most similar nodes.

Parameters

collection_name (str) -- name of the Qdrant collection
client (Optional[Any]) -- QdrantClient instance from qdrant-client package

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Return the Qdrant client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes with the given ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters: query (VectorStoreQuery) -- query

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to the index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- List of embedding results to add to the index.
Returns: List of ids of the documents added to the index.
Return type: List[str]
Raises: ValueError -- If the index already exists and overwrite is False.

property client: RedisType: Return the redis client instance

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes with the given ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

delete_index() → None: Delete the index and all documents.

persist(persist_path: str, fs: Optional[AbstractFileSystem] = None, in_background: bool = True) → None

Persist the vector store to disk.

Parameters

persist_path (str) -- Path to persist the vector store to. (doesn't apply)
in_background (bool, optional) -- Persist in background. Defaults to True.
fs (fsspec.AbstractFileSystem, optional) -- Filesystem to persist to. (doesn't apply)

Raises

redis.exceptions.RedisError -- If there is an error persisting the index to disk.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query the index.

Parameters

query (VectorStoreQuery) -- query object

Returns

query result

Return type

VectorStoreQueryResult

Raises

ValueError -- If query.query_embedding is None.
redis.exceptions.RedisError -- If there is an error querying the index.
redis.exceptions.TimeoutError -- If there is a timeout querying the index.

class llama_index.vector_stores.SimpleVectorStore(data: Optional[SimpleVectorStoreData] = None, fs: Optional[AbstractFileSystem] = None, **kwargs: Any)

Simple Vector Store.

In this vector store, embeddings are stored within a simple, in-memory dictionary.

Parameters: simple_vector_store_data_dict (Optional[dict]) -- data dict containing the embeddings and doc_ids. See SimpleVectorStoreData for more details.

add(embedding_results: List[NodeWithEmbedding]) → List[str]: Add embedding_results to index.

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes with the given ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.
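
Deleting by ref_doc_id means removing every node that was derived from the given source document, not a single entry. For an in-memory store this can be sketched with two plain dictionaries. The function name delete_by_ref_doc_id and both dictionary names are assumptions for this illustration and are not claimed to match the store's internal fields.

```python
from typing import Dict, List


def delete_by_ref_doc_id(
    embedding_dict: Dict[str, List[float]],   # hypothetical: node id -> embedding
    text_id_to_ref_doc_id: Dict[str, str],    # hypothetical: node id -> source doc id
    ref_doc_id: str,
) -> List[str]:
    """Remove all nodes whose source document is ref_doc_id; return the removed ids."""
    # Collect ids first so the dictionary is not mutated while iterating over it.
    to_remove = [
        text_id
        for text_id, source_id in text_id_to_ref_doc_id.items()
        if source_id == ref_doc_id
    ]
    for text_id in to_remove:
        embedding_dict.pop(text_id, None)
        del text_id_to_ref_doc_id[text_id]
    return to_remove
```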

classmethod from_persist_dir(persist_dir: str = './storage', fs: Optional[AbstractFileSystem] = None) → SimpleVectorStore: Load from persist dir.

classmethod from_persist_path(persist_path: str, fs: Optional[AbstractFileSystem] = None) → SimpleVectorStore: Create a SimpleVectorStore from a persist path.

get(text_id: str) → List[float]: Get embedding.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) → None: Persist the SimpleVectorStore to the given path.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Get nodes for response.
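
The persist and from_persist_path pair describes a save-and-reload round trip for an in-memory dictionary of embeddings. The sketch below imitates that round trip with the standard library only. TinyVectorStore is a hypothetical stand-in written for this illustration, and the JSON layout it writes is an assumption; it is not the on-disk format of SimpleVectorStore.

```python
import json
import os
from typing import Dict, List, Optional


class TinyVectorStore:
    """Hypothetical in-memory store: a dict from text id to embedding."""

    def __init__(self, embedding_dict: Optional[Dict[str, List[float]]] = None) -> None:
        self.embedding_dict: Dict[str, List[float]] = dict(embedding_dict or {})

    def add(self, text_id: str, embedding: List[float]) -> None:
        self.embedding_dict[text_id] = list(embedding)

    def get(self, text_id: str) -> List[float]:
        return self.embedding_dict[text_id]

    def persist(self, persist_path: str) -> None:
        """Write the store to persist_path as JSON, creating parent directories."""
        directory = os.path.dirname(persist_path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        with open(persist_path, "w", encoding="utf-8") as handle:
            json.dump({"embedding_dict": self.embedding_dict}, handle)

    @classmethod
    def from_persist_path(cls, persist_path: str) -> "TinyVectorStore":
        """Rebuild a store from a file previously written by persist()."""
        with open(persist_path, "r", encoding="utf-8") as handle:
            data = json.load(handle)
        return cls(data["embedding_dict"])
```

Persisting to a single JSON file keeps the example simple; a store of any real size would likely want a binary or database-backed format instead.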

class llama_index.vector_stores.SupabaseVectorStore(postgres_connection_string: str, collection_name: str, dimension: int = 1536, **kwargs: Any)

Supabase Vector.

In this vector store, embeddings are stored in a Postgres table using pgvector.

During query time, the index uses pgvector/Supabase to query for the top k most similar nodes.

Parameters

postgres_connection_string (str) -- postgres connection string
collection_name (str) -- name of the collection to store the embeddings in

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: None: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete the nodes belonging to a document.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters: query (VectorStoreQuery) -- query object containing the query embedding

class llama_index.vector_stores.WeaviateVectorStore(weaviate_client: Optional[Any] = None, class_prefix: Optional[str] = None, **kwargs: Any)

Weaviate vector store.

In this vector store, embeddings and docs are stored within a Weaviate collection.

During query time, the index uses Weaviate to query for the top k most similar nodes.

Parameters

weaviate_client (weaviate.Client) -- WeaviateClient instance from weaviate-client package
class_prefix (Optional[str]) -- prefix for Weaviate classes

add(embedding_results: List[NodeWithEmbedding]) → List[str]

Add embedding results to index.

Parameters: embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any: Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) → None

Delete nodes with the given ref_doc_id.

Parameters: ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) → VectorStoreQueryResult: Query index for top k most similar nodes.