Lets Code Them Up!

Tag: reinforcement learning using human feedback

Generative AI for Everyone | Notes | Week 2

January 20th, 2024
Recently I finished Andrew Ng’s course on Coursera, Generative AI for Everyone. These are my notes for week 2 of that course.

The second week focused on using generative AI to build software applications, including how much it costs and how long it takes.

Let’s consider building a software application for restaurant reputation monitoring using traditional machine learning.
1. Collect a few hundred to a few thousand data points and label them. (approx. 1 month)
2. Find the right AI model and train it on the data so that it learns to output positive/negative depending on the input. (approx. 3 months)
3. Find a cloud service to deploy and use the model. (approx. 3 months)
This process generally takes months to complete. However, you can do the same using generative AI within weeks. Here are the steps involved –
1. Scope the project.
2. Build the initial prototype and then work on improving it.
3. Evaluate the outputs to increase the system’s accuracy.
4. Deploy using a cloud service and monitor.
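To make the prototype step concrete, here is a minimal sketch of what a prompt-based version of the reputation monitor could look like. The function names (`build_sentiment_prompt`, `classify_review`) and the prompt wording are my own assumptions, and `fake_llm` is a hypothetical stand-in for a real LLM call so that the example runs offline.

```python
from typing import Callable


def build_sentiment_prompt(review: str) -> str:
    """Wrap a restaurant review in instructions asking for a one-word label."""
    return (
        "Classify the following restaurant review as positive or negative.\n"
        "Answer with one word.\n\n"
        f"Review: {review}\n"
        "Sentiment:"
    )


def classify_review(review: str, llm: Callable[[str], str]) -> str:
    """Send the prompt to the LLM and normalize its reply to a label."""
    reply = llm(build_sentiment_prompt(review)).strip().lower()
    if reply.startswith("positive"):
        return "positive"
    if reply.startswith("negative"):
        return "negative"
    return "unknown"  # unexpected reply: flag it for manual review


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call (assumption, not a real model)."""
    # Look only at the review text, because the instructions also contain
    # the words "positive" and "negative".
    review = prompt.split("Review:", 1)[1].lower()
    return "Negative" if "cold" in review or "rude" in review else "Positive"


reviews = [
    "The pasta was amazing!",
    "My soup arrived cold and the waiter was rude.",
]
labels = [classify_review(r, fake_llm) for r in reviews]
print(labels)
```

In a real prototype you would swap `fake_llm` for a call to whichever model you choose, and the "improve" and "evaluate" steps would mostly consist of adjusting the prompt and checking the labels against reviews you have judged yourself.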
Retrieval Augmented Generation –

As we know, an LLM takes a prompt as input, and the length of the prompt is limited. So if we provide additional context to improve the answer, the length of that context is limited too.

RAG is a technique that lets you give the LLM additional context without fitting all of your documents into the prompt: only the text relevant to the question is added. It works in three steps –
1. Given a prompt, search a collection of documents for text relevant to answering it.
2. Add that retrieved text to the prompt.
3. Then feed the new prompt, with the additional context, to the LLM.
Many applications use this technique. One example is Coursera Coach, which uses information from the Coursera website to answer specific questions that students ask. Many companies are also building chatbots for their own offerings, and they use RAG to provide the additional context.
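The three RAG steps can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the retrieval here is a simple word-overlap ranking (real systems often use more capable search, such as embeddings), and the documents, function names and prompt template are made up for the example.

```python
import re
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], top_k: int = 1) -> List[str]:
    """Step 1: rank documents by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_rag_prompt(question: str, documents: List[str], top_k: int = 1) -> str:
    """Step 2: add the retrieved text to the prompt."""
    context = "\n".join(retrieve(question, documents, top_k))
    return (
        "Use the context below to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical company documents, invented for this example.
docs = [
    "Parking is available for employees in the north lot.",
    "The cafeteria is open from 8am to 3pm on weekdays.",
    "Expense reports are due on the last Friday of each month.",
]
question = "Is there parking for employees?"
prompt = build_rag_prompt(question, docs)
print(prompt)
# Step 3: send `prompt` to the LLM of your choice (not shown here).
```

Only the parking document ends up in the prompt, so the prompt stays short no matter how many documents are in the collection.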

Fine-tuning –

Most generative AI models are trained on data from the web, which makes them general-purpose LLMs. We can take such an LLM and fine-tune it on our domain-specific data, so that it learns to give the niche output we want.

For example, a general-purpose LLM may not give correct output on medical or legal records. We can first fine-tune the LLM on such records so that it gives better output for new ones.

BloombergGPT is one such solution, which was built specifically for financial data.

How to choose a model?
1. Based on Model size –
  - 1B parameters – pattern matching and basic knowledge of the world
  - 10B parameters – greater world knowledge and can follow basic instructions
  - 100B parameters – rich world knowledge and complex reasoning.
2. Open source/closed source
  - Closed source models – available through a cloud programming interface (API), easy to use, relatively inexpensive
  - Open source models – full control over the model and data, can be run on your own systems
LLM, Instruction Tuning and RLHF –

Instruction tuning means training your LLM on a set of questions and answers, or other examples of an LLM following instructions, so that it learns how to respond to those types of questions (instructions).
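As a rough illustration, an instruction tuning dataset is just a list of instruction/response pairs turned into training text. The examples and the `### Instruction:` / `### Response:` template below are my own assumptions, not a format prescribed by the course. The actual training step is not shown.

```python
from typing import Dict, List

# Hypothetical examples of an LLM following instructions.
examples: List[Dict[str, str]] = [
    {
        "instruction": "Summarize: The restaurant was crowded but the food was worth the wait.",
        "response": "Busy place, great food.",
    },
    {
        "instruction": "Translate to French: Good morning",
        "response": "Bonjour",
    },
]


def format_example(example: Dict[str, str]) -> str:
    """Join one instruction and its desired response into a single training text."""
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Response:\n{example['response']}"
    )


training_texts = [format_example(e) for e in examples]
print(training_texts[1])
```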

Reinforcement Learning from Human Feedback (RLHF) – to further improve the LLM, we can use supervised learning to train a model that rates its answers.
1. Train an answer quality model. Given a prompt, we get multiple answers from the LLM and store these answers, along with their ratings (scores given by humans), in a dataset. Then we train an ML model on this data to rate answers automatically.
2. Have the LLM generate more responses for different prompts, and train the LLM to generate answers with higher ratings.
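Here is a toy sketch of the two steps, just to show how the data flows. It is heavily simplified and based on my own assumptions: the "answer quality model" is a per-word average of human scores (a real one would be a trained neural network), and step 2 only picks the higher-rated candidate instead of actually updating the LLM, which is what real RLHF does. The rated answers are invented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def words(text: str) -> List[str]:
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def train_quality_model(rated: List[Tuple[str, float]]) -> Dict[str, float]:
    """Step 1 (toy): learn the average human score of answers containing each word."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for answer, score in rated:
        for w in set(words(answer)):
            totals[w] += score
            counts[w] += 1
    return {w: totals[w] / counts[w] for w in totals}


def rate(answer: str, model: Dict[str, float], default: float = 3.0) -> float:
    """Predict a score as the mean learned score of the answer's words."""
    ws = words(answer)
    if not ws:
        return default
    return sum(model.get(w, default) for w in ws) / len(ws)


# Hypothetical dataset: LLM answers with scores given by humans (1 = bad, 5 = good).
rated_answers = [
    ("I'd be glad to help you with your resume.", 5.0),
    ("Here is a clear step by step plan.", 4.0),
    ("Figure it out yourself.", 1.0),
    ("That is a dumb question.", 1.0),
]
quality_model = train_quality_model(rated_answers)

# Step 2 (toy): score new candidate answers automatically. Real RLHF would use
# these scores as the training signal for the LLM; here we only pick the best one.
candidates = ["Figure it out yourself.", "I'd be glad to help you with a plan."]
best = max(candidates, key=lambda a: rate(a, quality_model))
print(best)
```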
Thanks for stopping by!
