Vector Store

Vector stores.

class llama_index.vector_stores.ChatGPTRetrievalPluginClient(endpoint_url: str, bearer_token: Optional[str] = None, retries: Optional[Retry] = None, batch_size: int = 100, **kwargs: Any)

ChatGPT Retrieval Plugin Client.

In this client, we make use of the endpoints defined by ChatGPT.

Parameters
  • endpoint_url (str) -- URL of the ChatGPT Retrieval Plugin.

  • bearer_token (Optional[str]) -- Bearer token for the ChatGPT Retrieval Plugin.

  • retries (Optional[Retry]) -- Retry object for the ChatGPT Retrieval Plugin.

  • batch_size (int) -- Batch size for the ChatGPT Retrieval Plugin.
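The batch_size parameter suggests that nodes are sent to the plugin in chunks rather than in one request. The following is a minimal sketch of that idea using only the standard library; the helper name iter_batches is hypothetical and is not part of llama_index, and the client's actual request logic may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 250 items with the default batch size of 100 give three batches.
batches = list(iter_batches(list(range(250)), batch_size=100))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch would then map to one upsert request, so a larger batch_size means fewer but heavier requests.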

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding_results to index.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Get nodes for response.

class llama_index.vector_stores.ChromaVectorStore(chroma_collection: Any, **kwargs: Any)

Chroma vector store.

In this vector store, embeddings are stored within a ChromaDB collection.

During query time, the index uses ChromaDB to query for the top k most similar nodes.

Parameters

chroma_collection (chromadb.api.models.Collection.Collection) -- ChromaDB collection instance

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Return client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.DeepLakeVectorStore(dataset_path: str = 'llama_index', token: Optional[str] = None, read_only: Optional[bool] = False, ingestion_batch_size: int = 1024, ingestion_num_workers: int = 4, overwrite: bool = False)

The DeepLake Vector Store.

In this vector store we store the text, its embedding and a few pieces of its metadata in a deeplake dataset. This implementation allows the use of an already existing deeplake dataset if it is one that was created by this vector store. It also supports creating a new one if the dataset doesn't exist or if overwrite is set to True.

Parameters
  • dataset_path (str, optional) -- Path to the deeplake dataset, where data will be stored. Defaults to "llama_index".

  • overwrite (bool, optional) -- Whether to overwrite an existing dataset with the same name. Defaults to False.

  • token (str, optional) -- The deeplake token that allows you to access the dataset with proper access. Defaults to None.

  • read_only (bool, optional) -- Whether to open the dataset in read-only mode. Defaults to False.

  • ingestion_batch_size (int, optional) -- Batch size used for batched data ingestion into the deeplake dataset. Defaults to 1024.

  • ingestion_num_workers (int, optional) -- Number of workers to use during data ingestion. Defaults to 4.

Raises
  • ImportError -- Unable to import deeplake.

  • UserNotLoggedinException -- When user is not logged in with credentials or token.

  • TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.

  • InvalidTokenException -- If the specified token is invalid

Returns

Vectorstore that supports add, delete, and query.

Return type

DeepLakeVectorStore

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add the embeddings and their nodes into DeepLake.

Parameters

embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.

Raises
  • UserNotLoggedinException -- When user is not logged in with credentials or token.

  • TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.

  • InvalidTokenException -- If the specified token is invalid.

Returns

List of ids inserted.

Return type

List[str]

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.FaissVectorStore(faiss_index: Any)

Faiss Vector Store.

Embeddings are stored within a Faiss index.

During query time, the index uses Faiss to query for the top k embeddings, and returns the corresponding indices.

Parameters

faiss_index (faiss.Index) -- Faiss index instance

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

NOTE: in the Faiss vector store, we do not store text in Faiss.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Return the faiss index.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) None

Save to file.

This method saves the vector store to disk.

Parameters

persist_path (str) -- The path of the file to save to.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.LanceDBVectorStore(uri: str, table_name: str = 'vectors', nprobes: int = 20, refine_factor: Optional[int] = None, **kwargs: Any)

The LanceDB Vector Store.

Stores text and embeddings in LanceDB. The vector store will open an existing LanceDB dataset or create the dataset if it does not exist.

Parameters
  • uri (str, required) -- Location where LanceDB will store its files.

  • table_name (str, optional) -- The table name where the embeddings will be stored. Defaults to "vectors".

  • nprobes (int, optional) -- The number of probes used. A higher number makes search more accurate but also slower. Defaults to 20.

  • refine_factor (int, optional) -- Refine the results by reading extra elements and re-ranking them in memory. Defaults to None.

Raises

ImportError -- Unable to import lancedb.

Returns

VectorStore that supports creating LanceDB datasets and querying them.

Return type

LanceDBVectorStore

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to vector store.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

class llama_index.vector_stores.MetalVectorStore(api_key: str, client_id: str, index_id: str)

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Return Metal client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query vector store.

class llama_index.vector_stores.MilvusVectorStore(collection_name: str = 'llamalection', index_params: Optional[dict] = None, search_params: Optional[dict] = None, dim: Optional[int] = None, host: str = 'localhost', port: int = 19530, user: str = '', password: str = '', use_secure: bool = False, overwrite: bool = False, **kwargs: Any)

The Milvus Vector Store.

In this vector store we store the text, its embedding and a few pieces of its metadata in a Milvus collection. This implementation allows the use of an already existing collection if it is one that was created by this vector store. It also supports creating a new one if the collection doesn't exist or if overwrite is set to True.

Parameters
  • collection_name (str, optional) -- The name of the collection where data will be stored. Defaults to "llamalection".

  • index_params (dict, optional) -- The index parameters for Milvus. If none are provided, an HNSW index will be used. Defaults to None.

  • search_params (dict, optional) -- The search parameters for a Milvus query. If none are provided, default params will be generated. Defaults to None.

  • dim (int, optional) -- The dimension of the embeddings. If it is not provided, collection creation will be done on first insert. Defaults to None.

  • host (str, optional) -- The host address of Milvus. Defaults to "localhost".

  • port (int, optional) -- The port of Milvus. Defaults to 19530.

  • user (str, optional) -- The username for RBAC. Defaults to "".

  • password (str, optional) -- The password for RBAC. Defaults to "".

  • use_secure (bool, optional) -- Use https. Required for Zilliz Cloud. Defaults to False.

  • overwrite (bool, optional) -- Whether to overwrite existing collection with same name. Defaults to False.

Raises
  • ImportError -- Unable to import pymilvus.

  • MilvusException -- Error communicating with Milvus; more details can be found in the debug-level logs.

Returns

Vectorstore that supports add, delete, and query.

Return type

MilvusVectorStore

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add the embeddings and their nodes into Milvus.

Parameters

embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.

Raises

MilvusException -- Failed to insert data.

Returns

List of ids inserted.

Return type

List[str]

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

Raises

MilvusException -- Failed to delete the doc.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

  • doc_ids (Optional[List[str]]) -- list of doc_ids to filter by
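The doc_ids entry restricts a search to nodes from specific documents. One way such a restriction could be expressed is as a Milvus boolean filter expression passed alongside the vector search. The sketch below only builds that expression string; the helper name and the "doc_id" field name are assumptions for illustration, and the expression this class actually sends may differ.

```python
import json
from typing import List, Optional


def build_doc_id_filter(doc_ids: Optional[List[str]], field: str = "doc_id") -> Optional[str]:
    """Return an `in [...]` filter expression, or None when there is nothing to filter on.

    The field name "doc_id" is an assumption, not taken from the library.
    """
    if not doc_ids:
        return None
    # json.dumps produces a double-quoted, comma-separated list literal.
    return f"{field} in {json.dumps(list(doc_ids), ensure_ascii=False)}"


print(build_doc_id_filter(["doc-1", "doc-2"]))  # doc_id in ["doc-1", "doc-2"]
print(build_doc_id_filter(None))                # None
```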

class llama_index.vector_stores.MyScaleVectorStore(myscale_client: Optional[Any] = None, table: str = 'llama_index', database: str = 'default', index_type: str = 'IVFFLAT', metric: str = 'cosine', batch_size: int = 32, index_params: Optional[dict] = None, search_params: Optional[dict] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any)

MyScale Vector Store.

In this vector store, embeddings and docs are stored within an existing MyScale cluster.

During query time, the index uses MyScale to query for the top k most similar nodes.

Parameters
  • myscale_client (httpclient) -- clickhouse-connect httpclient of an existing MyScale cluster.

  • table (str, optional) -- The name of the MyScale table where data will be stored. Defaults to "llama_index".

  • database (str, optional) -- The name of the MyScale database where data will be stored. Defaults to "default".

  • index_type (str, optional) -- The type of the MyScale vector index. Defaults to "IVFFLAT".

  • metric (str, optional) -- The metric type of the MyScale vector index. Defaults to "cosine".

  • batch_size (int, optional) -- The number of documents to insert per batch. Defaults to 32.

  • index_params (dict, optional) -- The index parameters for MyScale. Defaults to None.

  • search_params (dict, optional) -- The search parameters for a MyScale query. Defaults to None.

  • service_context (ServiceContext, optional) -- Vector store service context. Defaults to None

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

drop() None

Drop the MyScale index and table.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) -- query

class llama_index.vector_stores.OpensearchVectorClient(endpoint: str, index: str, dim: int, embedding_field: str = 'embedding', text_field: str = 'content', extra_info_field: str = 'extra_info', method: Optional[dict] = None, auth: Optional[dict] = None)

Object encapsulating an Opensearch index that has vector search enabled.

If the index does not yet exist, it is created during init. Therefore, the underlying index is assumed to either: 1) not exist yet or 2) be created due to previous usage of this class.

Parameters
  • endpoint (str) -- URL (http/https) of elasticsearch endpoint

  • index (str) -- Name of the elasticsearch index

  • dim (int) -- Dimension of the vector

  • embedding_field (str) -- Name of the field in the index to store embedding array in.

  • text_field (str) -- Name of the field to grab text from

  • method (Optional[dict]) -- Opensearch "method" JSON obj for configuring the KNN index. This includes engine, metric, and other config params. Defaults to: {"name": "hnsw", "space_type": "l2", "engine": "faiss", "parameters": {"ef_construction": 256, "m": 48}}
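To make the role of dim, embedding_field, and method concrete, here is a sketch of an index-creation body in the shape used by the OpenSearch k-NN plugin (a knn_vector field with a method definition). The function name build_index_body is hypothetical, and the exact body this class sends during init is an assumption; only the default method values are taken from the description above.

```python
from typing import Any, Dict, Optional

# Default "method" object as documented for this client.
DEFAULT_METHOD: Dict[str, Any] = {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "faiss",
    "parameters": {"ef_construction": 256, "m": 48},
}


def build_index_body(
    dim: int,
    embedding_field: str = "embedding",
    method: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a k-NN index body (hypothetical helper, not the library's code)."""
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on the index
        "mappings": {
            "properties": {
                embedding_field: {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": method if method is not None else DEFAULT_METHOD,
                }
            }
        },
    }


body = build_index_body(dim=1536)
print(body["mappings"]["properties"]["embedding"]["method"]["name"])  # hnsw
```

Passing a custom method dict replaces the default as a whole, which keeps engine, space type, and parameters consistent with each other.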

delete_doc_id(doc_id: str) None

Delete a document.

Parameters

doc_id (str) -- document id

do_approx_knn(query_embedding: List[float], k: int) VectorStoreQueryResult

Do approximate knn.

index_results(results: List[NodeWithEmbedding]) List[str]

Store results in the index.

class llama_index.vector_stores.OpensearchVectorStore(client: OpensearchVectorClient)

Elasticsearch/Opensearch vector store.

Parameters

client (OpensearchVectorClient) -- Vector index client to use for data insertion/querying.

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.PineconeVectorStore(pinecone_index: Optional[Any] = None, index_name: Optional[str] = None, environment: Optional[str] = None, namespace: Optional[str] = None, insert_kwargs: Optional[Dict] = None, add_sparse_vector: bool = False, tokenizer: Optional[Callable] = None, **kwargs: Any)

Pinecone Vector Store.

In this vector store, embeddings and docs are stored within a Pinecone index.

During query time, the index uses Pinecone to query for the top k most similar nodes.

Parameters
  • pinecone_index (Optional[pinecone.Index]) -- Pinecone index instance

  • insert_kwargs (Optional[Dict]) -- insert kwargs during upsert call.

  • add_sparse_vector (bool) -- whether to add sparse vector to index.

  • tokenizer (Optional[Callable]) -- tokenizer to use to generate sparse
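The add_sparse_vector and tokenizer parameters relate to storing a sparse vector next to each dense embedding. A common way to derive one, sketched below, is to count token ids and emit them in the indices/values layout that Pinecone uses for sparse values. The toy tokenizer and the helper names are hypothetical; the tokenizer signature the class expects and the way it builds sparse vectors are assumptions here.

```python
from collections import Counter
from typing import Dict, List, Union

VOCAB: Dict[str, int] = {}  # toy vocabulary, grown on first sight of each word


def toy_tokenizer(text: str) -> List[int]:
    """Map each whitespace-separated word to a stable integer id (illustrative only)."""
    return [VOCAB.setdefault(word, len(VOCAB)) for word in text.lower().split()]


def build_sparse_vector(token_ids: List[int]) -> Dict[str, List[Union[int, float]]]:
    """Turn token ids into term-frequency sparse values: parallel indices and values lists."""
    counts = Counter(token_ids)
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}


sparse = build_sparse_vector(toy_tokenizer("the cat saw the dog"))
print(sparse)  # {'indices': [0, 1, 2, 3], 'values': [2.0, 1.0, 1.0, 1.0]}
```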

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Return Pinecone client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.QdrantVectorStore(collection_name: str, client: Optional[Any] = None, **kwargs: Any)

Qdrant Vector Store.

In this vector store, embeddings and docs are stored within a Qdrant collection.

During query time, the index uses Qdrant to query for the top k most similar nodes.

Parameters
  • collection_name (str) -- name of the Qdrant collection

  • client (Optional[Any]) -- QdrantClient instance from qdrant-client package

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Return the Qdrant client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) -- query

class llama_index.vector_stores.RedisVectorStore(index_name: str, index_prefix: str = 'llama_index', index_args: Optional[Dict[str, Any]] = None, redis_url: str = 'redis://localhost:6379', overwrite: bool = False, **kwargs: Any)

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to the index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- List of embedding results to add to the index.

Returns

List of ids of the documents added to the index.

Return type

List[str]

Raises

ValueError -- If the index already exists and overwrite is False.

property client: RedisType

Return the Redis client instance.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

delete_index() None

Delete the index and all documents.

persist(persist_path: str, fs: Optional[AbstractFileSystem] = None, in_background: bool = True) None

Persist the vector store to disk.

Parameters
  • persist_path (str) -- Path to persist the vector store to. (doesn't apply)

  • in_background (bool, optional) -- Persist in background. Defaults to True.

  • fs (fsspec.AbstractFileSystem, optional) -- Filesystem to persist to. (doesn't apply)

Raises

redis.exceptions.RedisError -- If there is an error persisting the index to disk.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query the index.

Parameters

query (VectorStoreQuery) -- query object

Returns

query result

Return type

VectorStoreQueryResult

Raises
  • ValueError -- If query.query_embedding is None.

  • redis.exceptions.RedisError -- If there is an error querying the index.

  • redis.exceptions.TimeoutError -- If there is a timeout querying the index.

class llama_index.vector_stores.SimpleVectorStore(data: Optional[SimpleVectorStoreData] = None, fs: Optional[AbstractFileSystem] = None, **kwargs: Any)

Simple Vector Store.

In this vector store, embeddings are stored within a simple, in-memory dictionary.

Parameters

simple_vector_store_data_dict (Optional[dict]) -- data dict containing the embeddings and doc_ids. See SimpleVectorStoreData for more details.
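As an illustration of what "a simple, in-memory dictionary" can mean in practice, the sketch below keeps embeddings in a dict keyed by node id and answers queries with a brute-force cosine-similarity scan. TinyVectorStore and its method signatures are hypothetical stand-ins that mirror the add/delete/query shape of this page; they are not the library's implementation.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store; not the SimpleVectorStore class itself."""

    def __init__(self) -> None:
        self.embedding_dict: Dict[str, List[float]] = {}  # node id -> embedding
        self.ref_doc_ids: Dict[str, str] = {}             # node id -> source doc id

    def add(self, node_id: str, ref_doc_id: str, embedding: List[float]) -> str:
        self.embedding_dict[node_id] = embedding
        self.ref_doc_ids[node_id] = ref_doc_id
        return node_id

    def delete(self, ref_doc_id: str) -> None:
        # Remove every node that came from the given source document.
        for node_id in [n for n, d in self.ref_doc_ids.items() if d == ref_doc_id]:
            del self.embedding_dict[node_id]
            del self.ref_doc_ids[node_id]

    def query(self, query_embedding: List[float], similarity_top_k: int) -> List[Tuple[str, float]]:
        # Score every stored embedding, then keep the k best.
        scored = [
            (node_id, cosine_similarity(query_embedding, emb))
            for node_id, emb in self.embedding_dict.items()
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:similarity_top_k]


store = TinyVectorStore()
store.add("a", "doc1", [1.0, 0.0])
store.add("b", "doc1", [0.0, 1.0])
store.add("c", "doc2", [1.0, 1.0])
top = store.query([1.0, 0.0], similarity_top_k=2)
print([node_id for node_id, _ in top])  # ['a', 'c']
```

A linear scan like this is fine for small collections; the other stores on this page delegate the search to an external index instead.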

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding_results to index.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

classmethod from_persist_dir(persist_dir: str = './storage', fs: Optional[AbstractFileSystem] = None) SimpleVectorStore

Load from persist dir.

classmethod from_persist_path(persist_path: str, fs: Optional[AbstractFileSystem] = None) SimpleVectorStore

Create a SimpleVectorStore from a persist path.

get(text_id: str) List[float]

Get embedding.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) None

Persist the SimpleVectorStore to a directory.
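The default persist_path ends in .json, which suggests the store is serialized as a JSON file. The round trip below is a sketch of that idea with the standard library only; the function names and the "embedding_dict" key are assumptions for illustration, not the file layout the library writes.

```python
import json
import os
import tempfile
from typing import Dict, List


def persist_embeddings(embedding_dict: Dict[str, List[float]], persist_path: str) -> None:
    """Write the embeddings to a JSON file, creating parent directories if needed."""
    dirpath = os.path.dirname(persist_path)
    if dirpath:
        os.makedirs(dirpath, exist_ok=True)
    with open(persist_path, "w", encoding="utf-8") as f:
        json.dump({"embedding_dict": embedding_dict}, f)


def load_embeddings(persist_path: str) -> Dict[str, List[float]]:
    """Read the embeddings back from a file written by persist_embeddings."""
    with open(persist_path, "r", encoding="utf-8") as f:
        return json.load(f)["embedding_dict"]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "storage", "vector_store.json")
    original = {"node-1": [0.1, 0.2], "node-2": [0.3, 0.4]}
    persist_embeddings(original, path)
    restored = load_embeddings(path)

print(restored == original)  # True
```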

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Get nodes for response.

class llama_index.vector_stores.SupabaseVectorStore(postgres_connection_string: str, collection_name: str, dimension: int = 1536, **kwargs: Any)

Supabase Vector.

In this vector store, embeddings are stored in a Postgres table using pgvector.

During query time, the index uses pgvector/Supabase to query for the top k most similar nodes.

Parameters
  • postgres_connection_string (str) -- postgres connection string

  • collection_name (str) -- name of the collection to store the embeddings in

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete doc.

Parameters

ref_doc_id (str) -- document id

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (List[float]) -- query embedding

class llama_index.vector_stores.WeaviateVectorStore(weaviate_client: Optional[Any] = None, class_prefix: Optional[str] = None, **kwargs: Any)

Weaviate vector store.

In this vector store, embeddings and docs are stored within a Weaviate collection.

During query time, the index uses Weaviate to query for the top k most similar nodes.

Parameters
  • weaviate_client (weaviate.Client) -- WeaviateClient instance from weaviate-client package

  • class_prefix (Optional[str]) -- prefix for Weaviate classes

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

向量存储

Vector stores.

class llama_index.vector_stores.ChatGPTRetrievalPluginClient(endpoint_url: str, bearer_token: Optional[str] = None, retries: Optional[Retry] = None, batch_size: int = 100, **kwargs: Any)

ChatGPT Retrieval Plugin Client.

In this client, we make use of the endpoints defined by ChatGPT.

参数
  • endpoint_url (str) -- URL of the ChatGPT Retrieval Plugin.

  • bearer_token (Optional[str]) -- Bearer token for the ChatGPT Retrieval Plugin.

  • retries (Optional[Retry]) -- Retry object for the ChatGPT Retrieval Plugin.

  • batch_size (int) -- Batch size for the ChatGPT Retrieval Plugin.

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding_results to index.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using with ref_doc_id.

参数

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Get nodes for response.

class llama_index.vector_stores.ChromaVectorStore(chroma_collection: Any, **kwargs: Any)

Chroma vector store.

In this vector store, embeddings are stored within a ChromaDB collection.

During query time, the index uses ChromaDB to query for the top k most similar nodes.

参数

chroma_collection (chromadb.api.models.Collection.Collection) -- ChromaDB collection instance

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Args

embedding_results: List[NodeWithEmbedding]: list of embedding results

property client: Any

Return client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using with ref_doc_id.

参数

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

参数
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.DeepLakeVectorStore(dataset_path: str = 'llama_index', token: Optional[str] = None, read_only: Optional[bool] = False, ingestion_batch_size: int = 1024, ingestion_num_workers: int = 4, overwrite: bool = False)

The DeepLake Vector Store.

In this vector store we store the text, its embedding and a few pieces of its metadata in a deeplake dataset. This implemnetation allows the use of an already existing deeplake dataset if it is one that was created this vector store. It also supports creating a new one if the dataset doesnt exist or if overwrite is set to True.

参数
  • deeplake_path (str, optional) -- Path to the deeplake dataset, where data will be

  • "llama_index". (stored. Defaults to) --

  • overwrite (bool, optional) -- Whether to overwrite existing dataset with same name. Defaults to False.

  • token (str, optional) -- the deeplake token that allows you to access the dataset with proper access. Defaults to None.

  • read_only (bool, optional) -- Whether to open the dataset with read only mode.

  • ingestion_batch_size (bool, 1024) -- used for controlling batched data injestion to deeplake dataset. Defaults to 1024.

  • injestion_num_workers (int, 1) -- number of workers to use during data injestion. Defaults to 4.

  • overwrite -- Whether to overwrite existing dataset with the new dataset with the same name.

抛出
  • ImportError -- Unable to import deeplake.

  • UserNotLoggedinException -- When user is not logged in with credentials or token.

  • TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.

  • InvalidTokenException -- If the specified token is invalid

返回

Vectorstore that supports add, delete, and query.

返回类型

DeepLakeVectorstore

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add the embeddings and their nodes into DeepLake.

参数

embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.

抛出
  • UserNotLoggedinException -- When user is not logged in with credentials or token.

  • TokenPermissionError -- When dataset does not exist or user doesn't have enough permissions to modify the dataset.

  • InvalidTokenException -- If the specified token is invalid

返回

List of ids inserted.

返回类型

List[str]

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using with ref_doc_id.

参数

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

参数
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.FaissVectorStore(faiss_index: Any)

Faiss Vector Store.

Embeddings are stored within a Faiss index.

During query time, the index uses Faiss to query for the top k embeddings, and returns the corresponding indices.

参数

faiss_index (faiss.Index) -- Faiss index instance

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

NOTE: in the Faiss vector store, we do not store text in Faiss.

Args

embedding_results: List[NodeWithEmbedding]: list of embedding results

property client: Any

Return the faiss index.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using with ref_doc_id.

参数

ref_doc_id (str) -- The doc_id of the document to delete.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) None

Save to file.

This method saves the vector store to disk.

参数

persist_path (str) -- The save_path of the file.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

参数
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.LanceDBVectorStore(uri: str, table_name: str = 'vectors', nprobes: int = 20, refine_factor: Optional[int] = None, **kwargs: Any)

The LanceDB Vector Store.

Stores text and embeddings in LanceDB. The vector store will open an existing

LanceDB dataset or create the dataset if it does not exist.

参数
  • uri (str, required) -- Location where LanceDB will store its files.

  • table_name (str, optional) -- The table name where the embeddings will be stored. Defaults to "vectors".

  • nprobes (int, optional) -- The number of probes used. A higher number makes search more accurate but also slower. Defaults to 20.

  • refine_factor -- (int, optional): Refine the results by reading extra elements and re-ranking them in memory. Defaults to None

抛出

ImportError -- Unable to import lancedb.

返回

VectorStore that supports creating LanceDB datasets and

querying it.

返回类型

LanceDBVectorStore

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to vector store.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using with ref_doc_id.

参数

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

class llama_index.vector_stores.MetalVectorStore(api_key: str, client_id: str, index_id: str)
add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Args

embedding_results: List[NodeEmbeddingResult]: list of embedding results

property client: Any

Return Metal client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes using with ref_doc_id.

参数

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query vector store.

class llama_index.vector_stores.MilvusVectorStore(collection_name: str = 'llamalection', index_params: Optional[dict] = None, search_params: Optional[dict] = None, dim: Optional[int] = None, host: str = 'localhost', port: int = 19530, user: str = '', password: str = '', use_secure: bool = False, overwrite: bool = False, **kwargs: Any)

The Milvus Vector Store.

In this vector store we store the text, its embedding and a few pieces of its metadata in a Milvus collection. This implemnetation allows the use of an already existing collection if it is one that was created this vector store. It also supports creating a new one if the collection doesnt exist or if overwrite is set to True.

参数
  • collection_name (str, optional) -- The name of the collection where data will be stored. Defaults to "llamalection".

  • index_params (dict, optional) -- The index parameters for Milvus, if none are provided an HNSW index will be used. Defaults to None.

  • search_params (dict, optional) -- The search parameters for a Milvus query. If none are provided, default params will be generated. Defaults to None.

  • dim (int, optional) -- The dimension of the embeddings. If it is not provided, collection creation will be done on first insert. Defaults to None.

  • host (str, optional) -- The host address of Milvus. Defaults to "localhost".

  • port (int, optional) -- The port of Milvus. Defaults to 19530.

  • user (str, optional) -- The username for RBAC. Defaults to "".

  • password (str, optional) -- The password for RBAC. Defaults to "".

  • use_secure (bool, optional) -- Use https. Required for Zilliz Cloud. Defaults to False.

  • overwrite (bool, optional) -- Whether to overwrite existing collection with same name. Defaults to False.

抛出
  • ImportError -- Unable to import pymilvus.

  • MilvusException -- Error communicating with Milvus, more can be found in logging under Debug.

返回

Vectorstore that supports add, delete, and query.

返回类型

MilvusVectorstore

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add the embeddings and their nodes into Milvus.

参数

embedding_results (List[NodeWithEmbedding]) -- The embeddings and their data to insert.

抛出

MilvusException -- Failed to insert data.

返回

List of ids inserted.

返回类型

List[str]

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

Raises

MilvusException -- Failed to delete the doc.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

  • doc_ids (Optional[List[str]]) -- list of doc_ids to filter by
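When doc_ids is supplied, the search has to be restricted to those documents. One way to express that in Milvus is a boolean filter expression using the in operator. The sketch below only builds such a string; the field name doc_id and the helper name build_doc_id_filter are assumptions, not taken from this reference.

```python
import json
from typing import List, Optional


def build_doc_id_filter(doc_ids: Optional[List[str]], field: str = "doc_id") -> Optional[str]:
    """Build a Milvus-style filter such as: doc_id in ["a", "b"].

    Hypothetical helper; returns None when there is nothing to filter by.
    """
    if not doc_ids:
        return None
    # json.dumps adds the double quotes and escapes embedded quotes.
    quoted = ", ".join(json.dumps(doc_id, ensure_ascii=False) for doc_id in doc_ids)
    return f"{field} in [{quoted}]"
```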

class llama_index.vector_stores.MyScaleVectorStore(myscale_client: Optional[Any] = None, table: str = 'llama_index', database: str = 'default', index_type: str = 'IVFFLAT', metric: str = 'cosine', batch_size: int = 32, index_params: Optional[dict] = None, search_params: Optional[dict] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any)

MyScale Vector Store.

In this vector store, embeddings and docs are stored within an existing MyScale cluster.

During query time, the index uses MyScale to query for the top k most similar nodes.

Parameters
  • myscale_client (httpclient) -- clickhouse-connect httpclient of an existing MyScale cluster.

  • table (str, optional) -- The name of the MyScale table where data will be stored. Defaults to "llama_index".

  • database (str, optional) -- The name of the MyScale database where data will be stored. Defaults to "default".

  • index_type (str, optional) -- The type of the MyScale vector index. Defaults to "IVFFLAT".

  • metric (str, optional) -- The metric type of the MyScale vector index. Defaults to "cosine".

  • batch_size (int, optional) -- The number of documents to insert per batch. Defaults to 32.

  • index_params (dict, optional) -- The index parameters for MyScale. Defaults to None.

  • search_params (dict, optional) -- The search parameters for a MyScale query. Defaults to None.

  • service_context (ServiceContext, optional) -- Vector store service context. Defaults to None.
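The batch_size parameter implies that inserts are split into fixed-size groups. A minimal sketch of that idea follows; iter_batches is a hypothetical helper and does not reflect how the store actually issues its inserts.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 32) -> Iterator[List[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        # The final batch may be shorter than batch_size.
        yield list(items[start:start + batch_size])
```

With the default of 32, a list of 70 documents would be sent as three batches.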

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

drop() None

Drop the MyScale index and table.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) -- query

class llama_index.vector_stores.OpensearchVectorClient(endpoint: str, index: str, dim: int, embedding_field: str = 'embedding', text_field: str = 'content', extra_info_field: str = 'extra_info', method: Optional[dict] = None, auth: Optional[dict] = None)

Object encapsulating an Opensearch index that has vector search enabled.

If the index does not yet exist, it is created during init. Therefore, the underlying index is assumed to either: 1) not exist yet or 2) be created due to previous usage of this class.

Parameters
  • endpoint (str) -- URL (http/https) of the Elasticsearch endpoint

  • index (str) -- Name of the Elasticsearch index

  • dim (int) -- Dimension of the vector

  • embedding_field (str) -- Name of the field in the index to store embedding array in.

  • text_field (str) -- Name of the field to grab text from

  • method (Optional[dict]) -- Opensearch "method" JSON obj for configuring the KNN index. This includes engine, metric, and other config params. Defaults to: {"name": "hnsw", "space_type": "l2", "engine": "faiss", "parameters": {"ef_construction": 256, "m": 48}}
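To show where dim, embedding_field, and method end up, here is a sketch of an index-creation body in the shape used by the OpenSearch k-NN plugin (a knn_vector field with a method object). The helper name build_index_body is hypothetical, and the exact body this client sends may contain additional settings.

```python
import copy
from typing import Any, Dict, Optional

# Default "method" object as documented for this client.
DEFAULT_METHOD: Dict[str, Any] = {
    "name": "hnsw",
    "space_type": "l2",
    "engine": "faiss",
    "parameters": {"ef_construction": 256, "m": 48},
}


def build_index_body(
    dim: int,
    embedding_field: str = "embedding",
    method: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a k-NN index definition (illustrative, not the client's code)."""
    if method is None:
        # Copy so callers cannot mutate the shared default.
        method = copy.deepcopy(DEFAULT_METHOD)
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                embedding_field: {
                    "type": "knn_vector",
                    "dimension": dim,
                    "method": method,
                }
            }
        },
    }
```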

delete_doc_id(doc_id: str) None

Delete a document.

Parameters

doc_id (str) -- document id

do_approx_knn(query_embedding: List[float], k: int) VectorStoreQueryResult

Do approximate knn.
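An approximate k-NN search in OpenSearch is expressed as a knn query clause on the vector field. The sketch below builds such a request body from the same two arguments; build_knn_query is a hypothetical helper, and the field name default mirrors the embedding_field default above.

```python
from typing import Any, Dict, List


def build_knn_query(
    query_embedding: List[float], k: int, embedding_field: str = "embedding"
) -> Dict[str, Any]:
    """Build an approximate k-NN search body (illustrative only)."""
    return {
        # Return at most k hits.
        "size": k,
        "query": {
            "knn": {
                embedding_field: {"vector": list(query_embedding), "k": k}
            }
        },
    }
```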

index_results(results: List[NodeWithEmbedding]) List[str]

Store results in the index.

class llama_index.vector_stores.OpensearchVectorStore(client: OpensearchVectorClient)

Elasticsearch/Opensearch vector store.

Parameters

client (OpensearchVectorClient) -- Vector index client to use for data insertion/querying.

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.PineconeVectorStore(pinecone_index: Optional[Any] = None, index_name: Optional[str] = None, environment: Optional[str] = None, namespace: Optional[str] = None, insert_kwargs: Optional[Dict] = None, add_sparse_vector: bool = False, tokenizer: Optional[Callable] = None, **kwargs: Any)

Pinecone Vector Store.

In this vector store, embeddings and docs are stored within a Pinecone index.

During query time, the index uses Pinecone to query for the top k most similar nodes.

Parameters
  • pinecone_index (Optional[pinecone.Index]) -- Pinecone index instance

  • insert_kwargs (Optional[Dict]) -- insert kwargs during upsert call.

  • add_sparse_vector (bool) -- whether to add sparse vector to index.

  • tokenizer (Optional[Callable]) -- tokenizer used to generate sparse vectors
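The add_sparse_vector and tokenizer options suggest that a sparse vector is derived from token ids. One common construction counts how often each id occurs and emits the indices/values pair that Pinecone uses for sparse values. The sketch below assumes the tokenizer maps a string to a list of integer ids; to_sparse_vector and toy_tokenizer are hypothetical names, and the weighting used by the library may differ.

```python
from collections import Counter
from typing import Callable, Dict, List, Union

SparseVector = Dict[str, Union[List[int], List[float]]]


def to_sparse_vector(text: str, tokenizer: Callable[[str], List[int]]) -> SparseVector:
    """Turn token-id counts into an {"indices", "values"} sparse vector."""
    counts = Counter(tokenizer(text))
    # Sort so the output is deterministic; each index appears once.
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}


def toy_tokenizer(text: str) -> List[int]:
    # Hypothetical stand-in: one id per word, derived from its characters.
    return [sum(ord(c) for c in word) % 1000 for word in text.lower().split()]


sparse = to_sparse_vector("the cat saw the other cat", toy_tokenizer)
```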

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Return Pinecone client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters
  • query_embedding (List[float]) -- query embedding

  • similarity_top_k (int) -- top k most similar nodes

class llama_index.vector_stores.QdrantVectorStore(collection_name: str, client: Optional[Any] = None, **kwargs: Any)

Qdrant Vector Store.

In this vector store, embeddings and docs are stored within a Qdrant collection.

During query time, the index uses Qdrant to query for the top k most similar nodes.

Parameters
  • collection_name (str) -- name of the Qdrant collection

  • client (Optional[Any]) -- QdrantClient instance from qdrant-client package

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Return the Qdrant client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) -- query

class llama_index.vector_stores.RedisVectorStore(index_name: str, index_prefix: str = 'llama_index', index_args: Optional[Dict[str, Any]] = None, redis_url: str = 'redis://localhost:6379', overwrite: bool = False, **kwargs: Any)

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to the index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- List of embedding results to add to the index.

Returns

List of ids of the documents added to the index.

Return type

List[str]

Raises

ValueError -- If the index already exists and overwrite is False.

property client: RedisType

Return the Redis client instance.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

delete_index() None

Delete the index and all documents.

persist(persist_path: str, fs: Optional[AbstractFileSystem] = None, in_background: bool = True) None

Persist the vector store to disk.

Parameters
  • persist_path (str) -- Path to persist the vector store to. (doesn't apply)

  • in_background (bool, optional) -- Persist in background. Defaults to True.

  • fs (fsspec.AbstractFileSystem, optional) -- Filesystem to persist to. (doesn't apply)

Raises

redis.exceptions.RedisError -- If there is an error persisting the index to disk.
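Because persist_path and fs do not apply, persistence here is a matter of asking the Redis server to write its own snapshot. A plausible mapping, stated as an assumption rather than the confirmed implementation, is that in_background selects between the redis-py client methods bgsave() and save(). The sketch uses a recording stub in place of a live client so it runs without a server; persist_redis and RecordingClient are hypothetical names.

```python
from typing import Any, List


def persist_redis(client: Any, in_background: bool = True) -> None:
    """Ask Redis to snapshot to disk (assumed mapping, for illustration)."""
    if in_background:
        # BGSAVE: the server forks and saves without blocking clients.
        client.bgsave()
    else:
        # SAVE: synchronous snapshot; blocks the server until done.
        client.save()


class RecordingClient:
    """Stand-in for a redis-py client that only records the calls made."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def bgsave(self) -> bool:
        self.calls.append("bgsave")
        return True

    def save(self) -> bool:
        self.calls.append("save")
        return True
```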

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query the index.

Parameters

query (VectorStoreQuery) -- query object

Returns

query result

Return type

VectorStoreQueryResult

Raises
  • ValueError -- If query.query_embedding is None.

  • redis.exceptions.RedisError -- If there is an error querying the index.

  • redis.exceptions.TimeoutError -- If there is a timeout querying the index.

class llama_index.vector_stores.SimpleVectorStore(data: Optional[SimpleVectorStoreData] = None, fs: Optional[AbstractFileSystem] = None, **kwargs: Any)

Simple Vector Store.

In this vector store, embeddings are stored within a simple, in-memory dictionary.

Parameters

simple_vector_store_data_dict (Optional[dict]) -- data dict containing the embeddings and doc_ids. See SimpleVectorStoreData for more details.
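The idea of an in-memory dictionary store can be illustrated with a few lines of plain Python. This is a minimal sketch under stated assumptions, not the SimpleVectorStore implementation: TinyVectorStore, its attribute names, and the use of cosine similarity for ranking are all hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Dictionary-backed store keyed by node id (illustrative only)."""

    def __init__(self) -> None:
        self.embeddings: Dict[str, List[float]] = {}
        self.ref_doc_ids: Dict[str, str] = {}

    def add(self, node_id: str, embedding: List[float], ref_doc_id: str) -> str:
        self.embeddings[node_id] = embedding
        self.ref_doc_ids[node_id] = ref_doc_id
        return node_id

    def get(self, node_id: str) -> List[float]:
        return self.embeddings[node_id]

    def delete(self, ref_doc_id: str) -> None:
        # Remove every node that came from the given source document.
        doomed = [n for n, d in self.ref_doc_ids.items() if d == ref_doc_id]
        for node_id in doomed:
            del self.embeddings[node_id]
            del self.ref_doc_ids[node_id]

    def query(self, query_embedding: List[float], similarity_top_k: int = 2) -> List[Tuple[str, float]]:
        scored = [(n, cosine(query_embedding, e)) for n, e in self.embeddings.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:similarity_top_k]
```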

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding_results to index.

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

classmethod from_persist_dir(persist_dir: str = './storage', fs: Optional[AbstractFileSystem] = None) SimpleVectorStore

Load from persist dir.

classmethod from_persist_path(persist_path: str, fs: Optional[AbstractFileSystem] = None) SimpleVectorStore

Create a SimpleVectorStore from a persist path.

get(text_id: str) List[float]

Get embedding.

persist(persist_path: str = './storage/vector_store.json', fs: Optional[AbstractFileSystem] = None) None

Persist the SimpleVectorStore to the file at persist_path.
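The default persist_path ends in .json, which suggests a JSON file on disk. The round trip below sketches that pattern with the standard library only. The helper names and the embedding_dict key are hypothetical; the actual on-disk schema is not described in this reference, and the fs argument (an fsspec filesystem) is not modeled here.

```python
import json
import os
import tempfile
from typing import Any, Dict


def persist_dict(data: Dict[str, Any], persist_path: str) -> None:
    """Write data as JSON, creating the parent directory if needed."""
    dirpath = os.path.dirname(persist_path)
    if dirpath:
        os.makedirs(dirpath, exist_ok=True)
    with open(persist_path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def load_dict(persist_path: str) -> Dict[str, Any]:
    """Read back what persist_dict wrote."""
    with open(persist_path, encoding="utf-8") as f:
        return json.load(f)


def round_trip(data: Dict[str, Any]) -> Dict[str, Any]:
    """Persist to a temporary storage directory and load it again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "storage", "vector_store.json")
        persist_dict(data, path)
        return load_dict(path)
```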

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Get nodes for response.

class llama_index.vector_stores.SupabaseVectorStore(postgres_connection_string: str, collection_name: str, dimension: int = 1536, **kwargs: Any)

Supabase Vector Store.

In this vector store, embeddings are stored in a Postgres table using pgvector.

During query time, the index uses pgvector/Supabase to query for the top k most similar nodes.

Parameters
  • postgres_connection_string (str) -- postgres connection string

  • collection_name (str) -- name of the collection to store the embeddings in

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: None

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.

Parameters

query (VectorStoreQuery) -- query object containing the query embedding

class llama_index.vector_stores.WeaviateVectorStore(weaviate_client: Optional[Any] = None, class_prefix: Optional[str] = None, **kwargs: Any)

Weaviate vector store.

In this vector store, embeddings and docs are stored within a Weaviate collection.

During query time, the index uses Weaviate to query for the top k most similar nodes.

Parameters
  • weaviate_client (weaviate.Client) -- WeaviateClient instance from weaviate-client package

  • class_prefix (Optional[str]) -- prefix for Weaviate classes

add(embedding_results: List[NodeWithEmbedding]) List[str]

Add embedding results to index.

Parameters

embedding_results (List[NodeWithEmbedding]) -- list of embedding results

property client: Any

Get client.

delete(ref_doc_id: str, **delete_kwargs: Any) None

Delete nodes with the given ref_doc_id.

Parameters

ref_doc_id (str) -- The doc_id of the document to delete.

query(query: VectorStoreQuery, **kwargs: Any) VectorStoreQueryResult

Query index for top k most similar nodes.