AI Academy

  1. Welcome to the AI Academy
  2. AI Academy
  3. Hosting our own ChatGPT
  4. RAG Session

Welcome to the AI Academy

LO: to harness the power of AI for ourselves

An AI...

👉 Completely Free

👉 Completely offline

👉 Completely yours

...is what we'll be building here

The Ancient Art of the Terminal

Before there were touchscreens and mice and browsers there was...

The Terminal

The Terminal is the most direct interface between you and the computer. Rather than clicking on buttons and icons, we can type commands straight to our CPU.

Open your terminal now

We are going to use it a lot. Get familiar with it.

The commands I'll be writing work on MacOS and Linux. If you're using Windows, you'll need to change them, or set up WSL (a mini-Linux inside Windows)
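If the terminal is new to you, try a few harmless commands first; these are safe to run anywhere and just print information back:

```shell
pwd          # print which folder you're currently in
whoami       # print your username
echo hello   # repeat a piece of text back to you
```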

Some important commands ⌨️

  • cd : change the folder you're working in
  • mkdir : create a new folder
  • ls : list all the files in a folder
  • rm : delete a file (add -r for a folder)
  • touch : create a new file
  • nano : edit a given file in the terminal
  • cat : display a text file in the terminal

Using these commands in the terminal...

  1. Make a new folder to store your work at the AI Academy
  2. Create a new file called notes.txt
  3. Open the file and copy down a couple of the commands, so you remember them for later.
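One way the exercise might look in practice. The folder name ai-academy is just an example, and I've used echo rather than nano so the whole thing runs in one go (you can open notes.txt in nano afterwards):

```shell
mkdir ai-academy      # 1. a new folder for your Academy work
cd ai-academy         # move into it
touch notes.txt       # 2. create the notes file
echo "ls : list files in a folder" >> notes.txt   # 3. note down a command
cat notes.txt         # display the file to check it saved
```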

I'm using the word 'folder' here, but the proper term is directory. (That's why cd = 'change directory'.)

What makes an AI?

  • Most computer code is deterministic: you write a set of commands, specify the inputs and the desired outputs, then the computer executes it (either successfully or unsuccessfully).

  • Every time you run the program, it'll give the same answer (unless you re-write the code).

  • The 'AIs' we'll be using are a little different. They're the product of neural networks, sort of artificial versions of the way that brains work.

  • Rather than writing programs step by step, neural networks are set up to work out their own solutions to problems.

  • That means they are non-deterministic: they won't always give the same answer from the same input.

What's a model?

  • The AIs we know and play with are made of Large Language Models (LLMs) - neural networks set up to generate text in response to inputs.

  • These networks are 'trained' on massive amounts of writing, gathered from the Internet and other digital sources.

  • A bit like how predictive text on your phone knows which words you usually write one after the other, LLMs are able to 'predict' which words best go together.

  • Different models use different training data, with different 'weightings' to make them pay attention to different things.

  • One thing they all have in common: they need incredible amounts of computing power to build - one of the reasons why GPU manufacturer Nvidia is now the most valuable company on the planet.

Models in the wild

  • Originally, models were carefully guarded secrets. They cost incredible amounts of money to build, so companies wanted to control how they were used
  • Models lived on tech company servers, with users paying to access them
  • Then, in March 2023, Meta's cutting-edge Llama model was leaked online
  • Suddenly, everyone with a powerful enough computer could run the model at home, without paying a penny
  • Meta decided to release their future models into the open as well, a pattern many other AI companies have followed (but not OpenAI):

We believe that openness drives innovation and is the right path forward, which is why we continue to share our research and collaborate with our partners and the developer community.

Meta, announcing Llama 3.2

Bigger isn't better

  • The top-of-the-range models are pretty massive:
    • The largest Llama 3.1 model is 243 gigabytes, way too big to run outside a server farm full of $40,000 GPUs
  • Over the last year or so, more and more lightweight models, or Small Language Models, have been released
  • These weigh in between 0.5 to 2.5 gigabytes, and are small enough to run on laptops and even phones
  • These are the sorts of models we'll be playing with

Getting started

We'll be using a program called Ollama to download and test out various AI models.

Ollama is an example of open-source software, meaning we can use it for free and look at the code that powers it.

Something like ChatGPT is closed-source, so called because we can't see the source-code of the software.

Install Ollama 🦙

For Windows, I'd recommend following these instructions 🪟

Downloading our first model

Lots of companies and organisations have trained their own Large Language Models.

We are going to try out one of the latest, small models from Meta (the company that owns Facebook).

Run the commands below to download the llama3.2:1b model. (Don't forget the 1b! Otherwise you'll pull the 3b version by default.)

ollama serve               # start the Ollama server (leave this running)
ollama pull llama3.2:1b    # download the model

Chatting to the model

Now you've got the model on your hard-drive, it's yours to keep.

Switch off your WiFi, to prove it's really yours (and not running on Zuckerberg's server)

Now, in your terminal, type this command to start chatting to the model:

ollama serve
ollama pull llama3.2:1b    # if you haven't downloaded it already
ollama run llama3.2:1b

Ask the model about itself! What can you learn?

Designing our own AI

Llama3.2 has been trained by Meta, with system prompts to make it a useful (and generic) assistant.

We can change it so that it meets our own needs.

Setting up the Modelfile

  1. Make a new folder for your AI projects
  2. In the folder, create a new text file called 'MyModel'
  3. Open up the text file in an editor
mkdir ai-projects  # making a new folder 
cd ai-projects     # moving into the new folder
touch MyModel      # creating the Modelfile
nano MyModel       # opening the file in text editor

Writing the Modelfile

# select the model you're basing this on
FROM llama3.2:1b

# set creativity (0-1) 
# higher means more creative
PARAMETER temperature 0.9

# sets the context window size
# how many tokens the LLM can use as context 
PARAMETER num_ctx 4096

# sets a custom system message to specify 
# the behavior of the chat assistant
SYSTEM """You are Macbeth from Shakespeare's tragedy
'Macbeth'. Answer all questions in the appropriate
style and language of the play."""
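Once the Modelfile is saved (and ollama serve is running), you can build your custom model and start chatting to it. The model name macbeth below is just an example; pick whatever you like:

```shell
ollama create macbeth -f MyModel   # build a new model from the Modelfile
ollama run macbeth                 # start chatting in character
```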

Read the documentation 📃

AI Academy

RAG and Context

Model of the Week: DeepSeek-R1

  • An open-source model capable of reasoning - thinking through an answer and showing its working
  • Similar to OpenAI's o1 series
  • Available in a tiny 1.5b parameters version

To try it out, run...

ollama pull deepseek-r1:1.5b
ollama run deepseek-r1:1.5b

Hosting our own ChatGPT

What's a 'virtual environment' in computing? Why is it useful?

(if you don't know... why not ask one of your local models?)

OpenWebUI

  • OpenWebUI is an open-source front-end for Ollama LLMs
  • It replicates many of the features of ChatGPT's website, including:
    • conversation history
    • file uploads
    • speech-to-text
    • prompt templates
    • knowledge databases
    • ...and much more!
  • But unlike ChatGPT, you can choose which models to run, all on your own device
  • Today, we'll get it up and running

Setting up our virtual environment

  • OpenWebUI runs on Python, serving a localhost site you can access in your browser
  • To avoid clashes with packages you may already have on your system, we need to create a new virtual environment.
  • Navigate to your AI Academy directory, and run:
python3.11 -m venv venv

You must run this with Python version 3.11, or it won't work!

  • This will create a new subdirectory - venv - containing binaries for python and pip, the package manager
  • This is your new virtual environment: packages installed here won't affect your existing Python install.
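You can sanity-check the new environment straight away. I'm using the generic python3 launcher here so it works on any install; for OpenWebUI itself, stick with python3.11 as above (re-running venv creation on an existing venv is harmless):

```shell
python3 -m venv venv       # create the environment (use python3.11 for OpenWebUI)
source venv/bin/activate   # switch into the environment
which python               # now points at venv/bin/python
deactivate                 # drop back out when you're done
```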

Installing OpenWebUI

  • Run this to start using your venv and install the open-webui packages:
source venv/bin/activate
pip install open-webui
  • Once it's finished installing, you can start the OpenWebUI server with:
open-webui serve
  • If all goes well, you can point your browser to the address below and start chatting!
http://localhost:8080/

RAG Session

Squeezing through the context window

  • LLMs work by looking at a group of words and deciding which ones should come next
  • We call that group of words the 'context': what the model takes into account when generating a response
  • The first models had small context windows, only a few hundred or thousand characters
  • More recent state-of-the-art models have much larger context windows, allowing them to 'read' documents the size of novels when considering their responses
  • The bigger the context window, however, the more memory needed to run the model (generally!)

How big is our window?

  • We're running Llama3.2:1b, one of the most efficient small open-source models
  • The default context window is about 8,000 tokens, though the model can handle up to 131,072
  • What does that actually mean?
  • Tokens are little chunks of data: small, common words are generally one token, while longer or rarer words can be multiple tokens
  • A good rule of thumb is 100 tokens = 70 words (check the OpenAI tokenizer for a more detailed estimate)
  • Taking that calculation, our Llama can manage about 90,000 words of context
  • That's about the length of four Shakespeare plays
  • But remember: more tokens means more memory and CPU. The larger the context window, the slower the response will be
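You can check that arithmetic straight from the terminal, using the rough 0.7 words-per-token ratio from the rule of thumb above (an estimate only, not an exact figure):

```shell
tokens=131072                  # llama3.2's maximum context window
echo $(( tokens * 7 / 10 ))    # roughly 0.7 words per token: prints 91750
```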