Redis Vector Store

This notebook gives a quick demo of the RedisVectorStore.

import os
import sys
import logging
import textwrap

import warnings
warnings.filterwarnings("ignore")

# stop huggingface warnings
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Uncomment to see debug logs
#logging.basicConfig(stream=sys.stdout, level=logging.INFO)
#logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader, Document
from llama_index.vector_stores import RedisVectorStore
from IPython.display import Markdown, display

Start Redis

The easiest way to start Redis as a vector database is to use the redis-stack Docker image.

To follow every step of this tutorial, launch the image as follows:

docker run --name redis-vecdb -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest

This also launches the RedisInsight UI on port 8001, which you can view at http://localhost:8001.
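Before moving on, it can help to confirm that the container is actually accepting connections. The following is a minimal sketch using only the Python standard library. The helper name is_port_open is hypothetical and not part of Redis or LlamaIndex, and it only checks that a TCP port is open, not that the server behind it is Redis.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# 6379 (Redis) and 8001 (RedisInsight) are the ports published by the
# docker command above.
for name, port in [("redis", 6379), ("redisinsight", 8001)]:
    print(name, "reachable:", is_port_open("localhost", port))
```

If either line prints False, check that the container is running with docker ps before continuing.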

Setup OpenAI

Let's begin by adding the OpenAI API key. This allows us to access OpenAI for embeddings and to use ChatGPT.

import os
os.environ["OPENAI_API_KEY"] = "sk-<your key here>"
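Hard-coding a key in a notebook makes it easy to leak. As an alternative, the sketch below reads the key from the environment and fails early with a clear message if it is missing or still set to the placeholder. The helper name require_api_key is hypothetical and is not part of the openai or llama_index packages.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in the environment, or raise a clear error."""
    key = os.environ.get(name, "")
    # Treat an empty value or the tutorial placeholder as "not configured".
    if not key or "<your key here>" in key:
        raise RuntimeError(f"{name} is not set; export it before starting the notebook")
    return key


try:
    require_api_key()
    print("API key found")
except RuntimeError as err:
    print(err)
```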

Read in a dataset

Here we use a set of Paul Graham essays as the source text: we turn it into embeddings, store them in a RedisVectorStore, and query them to find context for our LLM Q&A loop.

# load documents
documents = SimpleDirectoryReader('../data/paul_graham').load_data()
print('Document ID:', documents[0].doc_id, 'Document Hash:', documents[0].doc_hash)

Document ID: 09206095-be73-4069-b9f6-ff76d1f03343 Document Hash: 77ae91ab542f3abb308c4d7c77c9bc4c9ad0ccd63144802b7cbe7e1bb3a4094e

Initialize the Redis Vector Store

Now that we have read in our documents, we can initialize the Redis Vector Store. This allows us to store our vectors in Redis and create an index.

Here is the docstring for the RedisVectorStore:

class RedisVectorStore(VectorStore):
    
    def __init__(
        self,
        index_name: Optional[str],
        index_prefix: Optional[str] = "gpt_index",
        index_args: Optional[Dict[str, Any]] = None,
        redis_url: Optional[str] = "redis://localhost:6379",
        overwrite: bool = False,
        **kwargs: Any,
    ) -> None:
        """Initialize RedisVectorStore.

        Args:
            index_name (str): Name of the index.
            index_prefix (str): Prefix for the index. Defaults to "gpt_index".
            index_args (Dict[str, Any]): Arguments for the index. Defaults to None.
            redis_url (str): URL for the redis instance. Defaults to "redis://localhost:6379".
            overwrite (bool): Whether to overwrite the index if it already exists. Defaults to False.
            kwargs (Any): Additional arguments to pass to the redis client.

        Raises:
            ValueError: If redis-py is not installed
            ValueError: If RediSearch is not installed

        Examples:
            >>> from gpt_index.vector_stores.redis import RedisVectorStore
            >>> # Create a RedisVectorStore
            >>> vector_store = RedisVectorStore(
            >>>     index_name="my_index",
            >>>     index_prefix="gpt_index",
            >>>     index_args={"algorithm": "HNSW", "m": 16, "efConstruction": 200, "distance_metric": "cosine"},
            >>>     redis_url="redis://localhost:6379/",
            >>>     overwrite=True)

        """

from llama_index.storage.storage_context import StorageContext


vector_store = RedisVectorStore(
    index_name="pg_essays",
    index_prefix="llama",
    redis_url="redis://localhost:6379",
    overwrite=True
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = GPTVectorStoreIndex.from_documents(documents, storage_context=storage_context)

Query the data

Now that our documents are stored in the index, we can ask questions against it. The index uses the data it stores as the knowledge base for ChatGPT.

query_engine = index.as_query_engine()
response = query_engine.query("What did the author learn?")
print(textwrap.fill(str(response), 100))

 The author learned that the AI programs of the time were not capable of understanding natural
language, and that the field of AI was a hoax. He also learned that he could make art, and that he
could pass the entrance exam for the Accademia di Belli Arti in Florence. He also learned Lisp
hacking and wrote his dissertation on applications of continuations.

response = query_engine.query("What was a hard moment for the author?")
print(textwrap.fill(str(response), 100))

 A hard moment for the author was when he realized that the AI programs of the time were a hoax and
that there was an unbridgeable gap between what they could do and actually understanding natural
language. He had invested a lot of time and energy into learning about AI and was disappointed to
find out that the field was not as promising as he had thought.
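Conceptually, answering a query means embedding the question, finding the stored chunks whose vectors are closest to it, and handing those chunks to the LLM as context. The toy sketch below illustrates only the nearest-neighbour step, using brute-force cosine similarity over hand-made 3-dimensional vectors. The vectors, chunk names, and the top_k helper are invented for illustration, and this is not how Redis or LlamaIndex implement the search internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, store, k=2):
    """Return the k keys whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]


# Made-up "embeddings"; real ones have hundreds or thousands of dimensions.
store = {
    "chunk_ai": [0.9, 0.1, 0.0],
    "chunk_art": [0.1, 0.9, 0.0],
    "chunk_lisp": [0.7, 0.0, 0.7],
}
query = [1.0, 0.0, 0.1]
print(top_k(query, store, k=2))
```

A vector database avoids comparing the query against every stored vector by using an index such as the HNSW option mentioned in the docstring above.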

Saving and Loading

Redis allows the user to perform backups either in the background or synchronously. With LlamaIndex, the RedisVectorStore.persist() function can be used to trigger such a backup.

!docker exec -it redis-vecdb ls /data

redis  redisinsight

vector_store.persist(persist_path="")  # persist_path has no effect for RedisVectorStore

!docker exec -it redis-vecdb ls /data

dump.rdb  redis  redisinsight
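The two backup modes mentioned above correspond to the Redis BGSAVE and SAVE commands, which redis-py exposes as bgsave() and save(). The sketch below calls them directly on a client. It assumes, without having confirmed it from the source here, that persist() does something similar internally. The helper name trigger_backup is hypothetical.

```python
def trigger_backup(client, in_background=True):
    """Ask Redis to write an RDB snapshot.

    `client` is expected to be a redis-py client, e.g. vector_store.client.
    """
    if in_background:
        return client.bgsave()  # BGSAVE: snapshot in a forked child process
    return client.save()  # SAVE: block until the snapshot is written


if __name__ == "__main__":
    try:
        import redis

        client = redis.Redis.from_url("redis://localhost:6379")
        print("snapshot requested:", trigger_backup(client))
        print("last successful save:", client.lastsave())
    except Exception as exc:  # redis-py not installed or server not running
        print("skipping live demo:", exc)
```

The background mode is usually preferable on a server that is handling queries, since the synchronous mode blocks other clients until the dump is complete.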

Deleting documents or index completely

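Sometimes it is useful to delete individual documents or the entire index. This can be done with the delete and delete_index methods. Note that the counts printed below are the number of keys in the Redis database, not the number of source documents.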

document_id = documents[0].doc_id
document_id

'09206095-be73-4069-b9f6-ff76d1f03343'

redis_client = vector_store.client
print("Number of documents", len(redis_client.keys()))

Number of documents 24

vector_store.delete(document_id)

print("Number of documents", len(redis_client.keys()))

Number of documents 14

# now let's delete the index entirely (this happens in the background and may take a second)
# this deletes all the documents and the index
vector_store.delete_index()

print("Number of documents", len(redis_client.keys()))

Number of documents 0
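The counts above come from redis_client.keys(), which returns every key in the database and can be slow on a large instance. If the Redis server is shared, a prefix-scoped count may be more meaningful. The sketch below uses redis-py's scan_iter for that. The helper name count_keys is hypothetical, and the llama* pattern assumes that the store's keys begin with the index_prefix chosen earlier, which is worth confirming in RedisInsight.

```python
def count_keys(client, pattern):
    """Count keys matching a glob pattern using incremental SCAN."""
    return sum(1 for _ in client.scan_iter(match=pattern))


if __name__ == "__main__":
    try:
        import redis

        client = redis.Redis.from_url("redis://localhost:6379")
        # ASSUMPTION: keys written by the store start with index_prefix ("llama").
        print("keys under prefix:", count_keys(client, "llama*"))
    except Exception as exc:  # redis-py not installed or server not running
        print("skipping live demo:", exc)
```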