Question Generation

import logging
import sys
import pandas as pd

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.evaluation import DatasetGenerator, QueryResponseEvaluator
from llama_index import SimpleDirectoryReader, GPTVectorStoreIndex, ServiceContext, LLMPredictor, Response
from langchain.chat_models import ChatOpenAI
reader = SimpleDirectoryReader("../paul_graham_essay/data")
documents = reader.load_data()
data_generator = DatasetGenerator.from_documents(documents)
WARNING:llama_index.llm_predictor.base:Unknown max input size for gpt-3.5-turbo, using defaults.
eval_questions = data_generator.generate_questions_from_nodes()
eval_questions
['What were the two main things the author worked on before college?',
 'What language did the author use to write programs on the IBM 1401?',
 "What was the author's first microcomputer and what did they use it for?",
 'What did the author study in college before switching to AI?',
 'What made the author want to work on AI?',
 'What language was commonly used for AI in the mid-1980s?',
 'What did the author realize about AI during their first year of grad school?',
 'What did the author decide to focus on instead of AI?',
 'What was the problem with systems work according to the author?',
 'What did the author realize about making art while looking at a painting at the Carnegie Institute?',
 "What was the topic of the author's dissertation?",
 'Which art schools did the author apply to and which one did they end up attending?',
 "What was the author's experience like at the Accademia di Belli Arti in Florence?",
 'What did the author learn about low-end software while working at Interleaf?',
 "What is the author's opinion on signature styles in painting?",
 'How did the author manage to save enough money to pay off their college loans?',
 'What did the author learn about technology companies while working at Interleaf?',
 "What was the author's experience like in the color class they took at RISD?",
 'What did the author learn about painting still lives?',
 "What was the author's experience like at RISD and why did they end up dropping out?",
 'What is the difference between the tribe of signature style seekers and the earnest students at RISD?',
 'How did the author end up in New York?',
 "What was the author's initial plan to make money after dropping out of RISD?",
 'Who was Idelle Weber and how did she help the author?',
 "What was the author's startup idea and why did it fail?",
 'What was the main goal of an online store builder and why was it important?',
 'How did Viaweb differentiate itself from its competitors?',
 'What did the author learn about retail while building stores for users?',
 "How did the author's attitude towards business change after getting users?",
 "What was the author's vision for software development in the future?",
 "What lesson did the author learn about scanning images of men's shirts?",
 'Why did the author initially think they needed a "business person" to be in charge?',
 'What is the ultimate test of a startup according to the author?',
 'Why did the author hire lots more people for their startup?',
 "What was the author's experience like working at Yahoo after their company was bought?",
 'What advice does the author give to founders who are leaving after selling their companies?',
 "What was the author's idea for a new company after leaving Yahoo?",
 'Why did the author decide to build a new dialect of Lisp?',
 'How did the author realize the potential of publishing essays on the web?',
 "What was the author's plan for writing essays on the web?",
 "What was the author's involvement with building the infrastructure of the web?",
 'In the print era, who were the only people allowed to publish essays?',
 'What did the author realize about online essays and their social perception?',
 'According to the author, what is a danger for the ambitious?',
 'What was the turning point for the author in figuring out what to work on?',
 "What was the idea behind the big party at the author's house in October 2003?",
 "What was Jessica Livingston's job before she started compiling a book of interviews with startup founders?",
 'What was the most distinctive thing about Y Combinator?',
 'How did YC solve one of the biggest problems faced by founders?',
 "What was the author's original plan for YC and how did it change over time?",
 'What is the "YC GDP" and how has it evolved over time?',
 'What was the original intention for Y Combinator and how did it change over time?',
 'What was the purpose of Hacker News and how did it impact YC?',
 'What were some of the challenges faced by Paul Graham while working at YC?',
 'What advice did Robert Morris give to Paul Graham and how did it impact his decision to leave YC?',
 'How did Paul Graham decide to spend his time after leaving YC?',
 'What is Lisp and how did it originate?',
 'What was the goal of creating Bel and how was it achieved?',
 "How did working on Bel impact Paul Graham's life?",
 'Why did Paul Graham and his family move to England in 2016?',
 "What was the author's experience with time-sharing machines?",
 'Where did the author live while attending the Accademia in Florence?',
 "What is the significance of the Y combinator in the author's work?",
 'What is Bel and how was it developed?',
 "What was the author's experience with writing essays in 2020?",
 'How did the author choose what to work on in the past?',
 'What is the difference between rent-controlled and rent-stabilized apartments?',
 "What was the author's experience with launching an online store builder?",
 "What is the author's opinion on customs in rapidly changing fields?",
 "What was the author's experience with leaving Y Combinator?"]
# use gpt-4 (temperature 0) as the LLM behind the evaluator and the index
llm_predictor_gpt4 = LLMPredictor(llm=ChatOpenAI(temperature=0, model_name="gpt-4"))
service_context_gpt4 = ServiceContext.from_defaults(llm_predictor=llm_predictor_gpt4)
evaluator_gpt4 = QueryResponseEvaluator(service_context=service_context_gpt4)
# create vector index
vector_index = GPTVectorStoreIndex.from_documents(
    documents, 
    service_context=service_context_gpt4
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 17617 tokens
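# define jupyter display function
# (display is injected by Jupyter; the explicit import also makes this work outside a notebook cell)
from IPython.display import display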
def display_eval_df(query: str, response: Response, eval_result: str) -> None:
    eval_df = pd.DataFrame(
        {
            "Query": query,
            "Response": str(response), 
            "Source": response.source_nodes[0].source_text[:1000] + "...",
            "Evaluation Result": eval_result
        },
        index=[0]
    )
    eval_df = eval_df.style.set_properties(
        **{
            'inline-size': '600px',
            'overflow-wrap': 'break-word',
        }, 
        subset=["Response", "Source"]
    )
    display(eval_df)
# query the index with one generated question, then have the gpt-4 evaluator judge the response
query_engine = vector_index.as_query_engine()
eval_question = eval_questions[1]
response_vector = query_engine.query(eval_question)
eval_result = evaluator_gpt4.evaluate(eval_question, response_vector)
display_eval_df(eval_question, response_vector, eval_result)
Row 0
Query: What language did the author use to write programs on the IBM 1401?
Response: The author used an early version of Fortran to write programs on the IBM 1401.
Source: What I Worked On February 2021 Before college the two main things I worked on, outside of school, were writing and programming. I didn't write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep. The first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district's 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain's lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights. The language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in...
Evaluation Result: YES
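The run above checks a single question. The same loop could be applied to the whole question list and summarized as a pass rate. The following is only a minimal sketch under stated assumptions: the names run_evaluations, pass_rate, fake_query and fake_evaluate are hypothetical and not part of llama_index, the stubs stand in for query_engine.query and evaluator_gpt4.evaluate, and the evaluator is assumed to return the string "YES" or "NO" as in the output shown above.

```python
from typing import Callable, Dict, List


def run_evaluations(
    questions: List[str],
    query_fn: Callable[[str], object],
    evaluate_fn: Callable[[str, object], str],
) -> List[Dict[str, str]]:
    """Query each question, judge the response, and collect one record per question."""
    records: List[Dict[str, str]] = []
    for question in questions:
        response = query_fn(question)
        verdict = evaluate_fn(question, response)
        records.append(
            {
                "Query": question,
                "Response": str(response),
                "Evaluation Result": verdict,
            }
        )
    return records


def pass_rate(records: List[Dict[str, str]]) -> float:
    """Fraction of records whose verdict is YES (assumed verdict format)."""
    if not records:
        return 0.0
    passed = sum(
        1 for r in records if r["Evaluation Result"].strip().upper() == "YES"
    )
    return passed / len(records)


# Hypothetical stand-ins so the sketch runs without API access.
def fake_query(question: str) -> str:
    return f"stub answer to: {question}"


def fake_evaluate(question: str, response: object) -> str:
    return "YES" if "IBM 1401" in question else "NO"


demo_questions = [
    "What language did the author use to write programs on the IBM 1401?",
    "What made the author want to work on AI?",
]
demo_records = run_evaluations(demo_questions, fake_query, fake_evaluate)
demo_rate = pass_rate(demo_records)
print(f"{demo_rate:.0%} of {len(demo_records)} responses passed")
```

With the notebook's objects, the call would presumably be run_evaluations(eval_questions, query_engine.query, evaluator_gpt4.evaluate). Each question then costs one query call plus one GPT-4 evaluation call, so trying a small slice of eval_questions first may be prudent.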