Practical AI: Utilizing Large Language Models

Who Am I?

Software developer with a BSc in Computer Science.
About a year of experience using LLMs (ChatGPT) every day at work.

I don't develop AI systems; rather, I use them to improve my programming.

I have been voraciously consuming the latest developments in generative AI.

Roadmap

Introduction to AI

Defining AI vs AGI and shifting perspectives

Machine Learning Fundamentals

Types of machine learning to set up discussion of LLMs

Large Language Models (LLMs)

Introduction to LLMs and how they work

LLM Prompting Techniques

Crafting effective prompts to use these tools to their full potential.

Conclusion and Call to Action

The future of LLMs and their impact

Purpose: To provide a comprehensive introduction to AI, machine learning, and LLMs, enabling the audience to understand, evaluate, and harness the power of generative AI for their work and creativity.

Why learn about AI tools?

Empowerment: Leveraging AI tools can empower you to create amazing work.

Like any skill, using AI tools improves with practice.

Rapid Improvement: AI is improving at a breathtaking pace; it seems likely that almost all 'knowledge' and artistic professions will be changed forever.

New jobs and artistic developments will challenge existing institutions.

Informed Evaluation: Understanding how these tools work enables evaluation of 'AI hype'.

Nvidia GB200

Birds with Hats 🐦🎩

I have had a lot of fun using DALL-E 3 to generate characters, specifically birds with hats, against a white background.
Using this ability, I got GPT-4 to write code to remove the background and print a sticker sheet to decorate my laptop.
AI tools empowered me to complete a project I would have otherwise never attempted.

Introduction to 'AI'

Defining Artificial Intelligence (AI)

AI encompasses the development of computer systems capable of performing tasks that traditionally require human intelligence.

Task-Specific AI:
- Designed for specialized tasks, demonstrating intelligence in a narrow domain.
- Image recognition, speech recognition, and game-playing AI are examples of task-specific AI.
General AI (AGI):
- Systems that excel across a wide range of tasks, potentially surpassing human capabilities.
- AGI could learn and adapt to new situations without explicit programming.

Figure Robot

"traditionally require human intelligence"

In 1950, British mathematician Alan Turing proposed the Imitation Game to evaluate whether a computer possesses intelligence.

The Turing Test:
- Challenges a machine to engage in dialogue so convincingly that observers cannot distinguish its responses from a human's.
Recent LLMs (Claude Opus, GPT-4) arguably have the ability to mimic human dialogue at a level that meets the criteria of the Turing Test.

There is no longer a clearly defined test for machine intelligence.

Machine Learning & Neural Networks

Machine learning is the process by which a computer system is trained to accomplish a task. A neural network is an algorithm that can learn patterns from data.

Machine Learning: Beyond Traditional Programming
- In traditional programming, tasks are solved like following a recipe: explicit, step-by-step instructions. This approach falls short when problems are complex or unpredictable.
- Machine Learning is akin to teaching a computer to cook by tasting dishes: it identifies patterns with flexible, adaptive learning.
Neural Networks: Imitating the Brain
- Neural networks simulate the brain's structure with interconnected 'neurons'.
- They can be trained using a technique called 'backpropagation', which iteratively adjusts the neurons' connections to produce the desired outcome.
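
To make "iteratively adjusts the connections" concrete, here is a minimal sketch, assuming a single neuron with one weight, a squared-error loss, and a hand-picked learning rate (all illustrative choices, not from the talk). Backpropagation applies the same idea across many layers using the chain rule; this sketch only shows the single-weight case.

```python
# Toy illustration: one "neuron" with a single weight learns y = 2 * x.
# The data, learning rate, and step count are illustrative assumptions.

def train_single_weight(examples, learning_rate=0.05, steps=200):
    weight = 0.0  # start with an uninformed guess
    for _ in range(steps):
        for x, target in examples:
            prediction = weight * x
            error = prediction - target
            # Gradient of the squared error (error ** 2) with respect to the weight
            gradient = 2 * error * x
            # Nudge the weight in the direction that reduces the error
            weight -= learning_rate * gradient
    return weight


examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_weight = train_single_weight(examples)
print(round(learned_weight, 3))
```

The weight starts at 0 and settles close to 2, the value that makes every prediction match its target.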

Machine Learning Paradigms

Supervised Learning: Learning from labeled data, akin to a guided teaching process.
- Example: A child learns the names of colors through explicit instruction, e.g., being told "This is red" while being shown a red object.
Unsupervised Learning: Identifying patterns in data without labels, discovering structure autonomously.
- Example: A child categorizes toys by shapes or colors independently, recognizing patterns without direct teaching.
Reinforcement Learning: Learning via interaction, adapting through trial and feedback.
- Example: A toddler improves walking skills through trial and error, motivated by encouragement and adapting based on feedback.

Supervised learning

Unsupervised learning

Reinforcement Learning

Recognizing Cats and Dogs

Let's explore a concrete example of machine learning: teaching a model to recognize images of cats and dogs.
By understanding this process, you'll gain insights into how AI systems learn from data and make predictions.

Step 1: Collect and Label a Dataset

The first step is to gather a dataset of cat and dog images and label each image accordingly.
Labeling involves assigning the correct category ("cat" or "dog") to each image in the dataset.
Key Point: The quality and diversity of the dataset are crucial for the model's performance. The dataset should include various breeds, angles, and backgrounds to ensure the model learns robust features.
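
A labeled dataset can be pictured as a list of (image, label) pairs. The sketch below assumes hypothetical file paths and a tiny dataset purely for illustration; it also counts the labels, which is one quick way to check that neither class dominates.

```python
from collections import Counter

# Hypothetical file paths; a real dataset would contain thousands of images.
labeled_images = [
    ("photos/img_001.jpg", "cat"),
    ("photos/img_002.jpg", "dog"),
    ("photos/img_003.jpg", "cat"),
    ("photos/img_004.jpg", "dog"),
    ("photos/img_005.jpg", "cat"),
    ("photos/img_006.jpg", "dog"),
]


def count_labels(dataset):
    """Count how many examples carry each label."""
    return Counter(label for _path, label in dataset)


label_counts = count_labels(labeled_images)
print(label_counts)
```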

[Figure: ten example images, labeled alternately Cat and Dog]

Step 2: Neural Network Training

A guessing game between a teacher and a toddler is a simple analogy for understanding neural network training.

Initial Guesses: The neural network, like a toddler, makes initial guesses when shown labeled images of cats and dogs. It may not always be correct at first.
Learning from Mistakes: Just as a teacher corrects a toddler's mistakes, the network learns from the differences between its guesses and the actual labels, adjusting its understanding accordingly.
Repetition and Improvement: With repeated exposure to more examples and continuous feedback, the neural network gradually improves its ability to classify images accurately, similar to how a toddler's skills develop through practice and guidance.

Key Point: This process of guessing, receiving feedback, and learning from mistakes is the foundation of how neural networks learn to recognize and classify patterns in data.
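
The guess, feedback, adjust loop can be sketched in a few lines. This is a simplified stand-in, not a real image model: it assumes each image has already been reduced to two made-up numeric features (ear pointiness and snout length) and uses logistic regression, a single-layer relative of a neural network.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_cat_probability(weights, bias, features):
    """Return the model's current guess that the image is a cat (0 to 1)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


def train(examples, learning_rate=0.5, epochs=500):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for features, is_cat in examples:
            guess = predict_cat_probability(weights, bias, features)
            mistake = guess - is_cat  # feedback: how far off was the guess?
            # Adjust each weight in proportion to its share of the mistake
            weights = [w - learning_rate * mistake * f
                       for w, f in zip(weights, features)]
            bias -= learning_rate * mistake
    return weights, bias


# Made-up features: (ear_pointiness, snout_length); label 1 = cat, 0 = dog
training_examples = [
    ((0.9, 0.2), 1), ((0.8, 0.3), 1), ((0.7, 0.1), 1),
    ((0.3, 0.8), 0), ((0.2, 0.9), 0), ((0.4, 0.7), 0),
]
weights, bias = train(training_examples)
print(predict_cat_probability(weights, bias, (0.85, 0.15)))
```

Before training, every guess is 50/50. After many rounds of correction, the model assigns a high cat probability to cat-like features and a low one to dog-like features.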

[Figure: training image labeled Dog → Prediction: Cat: 70%, Dog: 30% ✖]

Step 3: Testing the Trained Network

After training, the network is tested on new, unseen cat and dog images to evaluate its performance.
The network uses its learned features to classify the new images as either cats or dogs.
Key Point: Testing on unseen data helps assess the network's ability to generalize beyond the training examples. A well-trained model should be able to accurately classify new images it hasn't encountered before.
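
Evaluating on unseen data usually boils down to counting how often the model's answer matches the true label. The sketch below assumes a hypothetical stand-in model and four made-up test examples; a real evaluation would use the trained network and a much larger held-out set.

```python
def accuracy(model, test_examples):
    """Fraction of held-out examples the model classifies correctly."""
    correct = sum(1 for features, label in test_examples
                  if model(features) == label)
    return correct / len(test_examples)


def stand_in_model(features):
    # Hypothetical stand-in for a trained network: compares two made-up features
    ear_pointiness, snout_length = features
    return "cat" if ear_pointiness > snout_length else "dog"


# Examples the model never saw during training
unseen_examples = [
    ((0.85, 0.25), "cat"),
    ((0.15, 0.95), "dog"),
    ((0.60, 0.40), "cat"),
    ((0.55, 0.45), "dog"),  # a cat-like dog that the stand-in model gets wrong
]
test_accuracy = accuracy(stand_in_model, unseen_examples)
print(f"Accuracy on unseen images: {test_accuracy:.0%}")
```

Here the stand-in model gets three of the four unseen examples right.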

[Figure: unseen test images → Prediction: Cat: 95%, Dog: 5% ✓; Prediction: Cat: 10%, Dog: 90% ?]

AI Basics: Key Takeaways 🔑

AI systems, like the cat and dog recognizer, learn from data by identifying patterns and relationships. 🐱🐶
The quality of the training data significantly impacts the model's performance. 📊
- LLMs face additional challenges due to the complexity of human language, and biased or unrepresentative training data can lead to biased outcomes.
Generalization is crucial, enabling models to make predictions on unseen data. 🔮
- A well-trained model demonstrates its ability to learn general patterns by accurately classifying new examples, rather than memorizing specific instances. 🎓
These concepts lay the foundation for understanding Large Language Models (LLMs), which apply similar principles to natural language processing tasks. 💬

Large Language Model background

Introducing Large Language Models

LLMs are computer programs that output coherent and relevant text in response to an input prompt.

OpenAI's release of ChatGPT in November 2022 brought LLM technology into the public consciousness.
LLMs can be trained in a variety of sizes, measured by parameter count.
The largest LLMs have demonstrated an unprecedented ability to 'reason' with language.

Just like how the cat and dog recognizer learned from labeled images, LLMs learn from vast amounts of text data to capture the intricacies of human language.

Increasing LLM size seems to improve performance on 'reasoning' tasks.

How LLMs Generate Text

LLMs generate text by playing a guessing game, similar to the cat and dog classification example we saw earlier.

Instead of guessing the correct label for an image, LLMs predict the most probable next word based on the previous words in the sequence.
This process is repeated, with each new predicted word added to the input sequence 🔄

LLMs are a highly advanced version of autocomplete.
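
The predict-and-append loop can be sketched with a toy model. This sketch assumes a tiny made-up corpus and a simple word-pair counter that only looks at the last word; real LLMs use a neural network that considers the whole preceding sequence and usually sample from the predicted probabilities rather than always taking the top choice.

```python
from collections import Counter, defaultdict


def build_bigram_counts(text):
    """Count which word follows which in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def generate(counts, prompt, max_new_words=5):
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = counts.get(words[-1])
        if not candidates:
            break  # the model has never seen this word, so it cannot continue
        # Pick the most frequent follower, then feed it back in as input
        next_word = candidates.most_common(1)[0][0]
        words.append(next_word)
    return " ".join(words)


# Made-up training text for illustration
corpus = ("the cat sat on the mat . the cat sat on the sofa . "
          "the cat ate the fish .")
counts = build_bigram_counts(corpus)
print(generate(counts, "the cat", max_new_words=3))
```

Each predicted word becomes part of the input for the next prediction, which is the same loop an LLM runs at a vastly larger scale.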

Compressing the Internet

LLMs learn from vast amounts of diverse online data, essentially "compressing" the knowledge available on the internet into their parameters.

During training, LLMs capture patterns, relationships, and real-world knowledge from this data.

This is similar to how the cat and dog classifier learned from labeled images.

Training top-of-the-line LLMs, like GPT-4, is speculated to cost upwards of $100 million.

A good language model is also a model of the world.

Transformers and Attention: The Backbone of LLMs 🤖🚗

The transformer architecture, introduced in 2017, has been crucial to the development of LLMs.

Transformers utilize an attention mechanism that allows LLMs to focus on the most relevant parts of the input data.

(no attention): "I went to the Calgary Stampede and met a nice [day]" (phone keyboard)
(w/ attention): "I went to the Calgary Stampede and met a nice [cowboy]." (GPT-3.5)
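
The core attention calculation can be sketched as follows. It assumes tiny made-up two-number vectors for three words from the sentence above; in a real transformer these vectors are learned during training and have hundreds or thousands of dimensions.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # How relevant is each word to what we are looking for?
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Blend the value vectors, weighted by relevance
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(len(values[0]))]
    return weights, output


# Made-up vectors for illustration only
tokens = ["went", "Stampede", "nice"]
keys = [[0.1, 0.9], [1.0, 0.2], [0.3, 0.3]]
values = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
query = [1.0, 0.0]  # what the blank after "nice" is "looking for"

weights, output = attention(query, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

With these made-up numbers, "Stampede" receives the largest weight, so it has the most influence on what comes next.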

This architecture takes input data and generates appropriate outputs, such as translations, summaries, or responses.
Transformers are highly versatile and can be adapted to handle a wide array of inputs, including text, audio, images, and video.

Transformer Architecture

Completion Models

We have described LLMs as autocomplete for internet documents. This is different from the helpful assistant presented by a tool like ChatGPT.

Prompt: Can you suggest some dog names?

(davinci-002):

Please?

Name your favorite black and tan male and female dog names,

Name your favorite black and tan Great Dane male and female dog names. Please give a variety. Thank you!

Prompt: list of dog names: Randy,

(davinci-002):

Taffy / Tiffy, Tonto, Risa / Sheena, Sid, Sandy / Sandy, Spike, Scruffy, Teddy, Bodi, Lady, Maxx, Sandy, Roxy, Bella, Fifi, Lucky, Nobi, Marvin, Buddy, Ted, Oscar, Cookie, Cleo, Champ, Cheko, Sam, Pickles, Kevin, Toby, Shadow, Colin, Noro, Brian, Scotty, Kimichi, Pepper

LLM personas

LLMs utilize a compression of the knowledge available on the internet. By fine-tuning this compressed knowledge, we can create models that excel at specific tasks.

Models like ChatGPT are fine-tuned for conversational interactions, with the model playing the role of a helpful assistant.

Prompting is an effective way to invoke a persona in an assistant LLM.

Prompt: please describe large language models to me in 3 sentences

(GPT-4): Large language models (LLMs) like ChatGPT are advanced AI systems trained on vast amounts of text data to generate human-like responses to textual inputs. (...) These models use techniques from deep learning, specifically transformer architectures, to understand and produce text contextually, adapting their responses based on the input they receive.

Prompt: As a kindergarten teacher, please describe large language models to me in 3 sentences

(GPT-4): Large language models are like very smart robots that can read and write. They learn by studying a huge amount of books, articles, and websites to understand and use language just like people do. They can help answer questions, write stories, and even explain things in different ways depending on who they're talking to!
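
If you reuse personas often, the pattern in the prompts above can be wrapped in a small helper. This is a minimal sketch; the function name `with_persona` is hypothetical, and it only builds the prompt text without calling any model.

```python
def with_persona(persona, request):
    """Prefix a request with a persona, mirroring the 'As a ...' prompt above."""
    if not persona:
        return request
    return f"As a {persona}, {request}"


request = "please describe large language models to me in 3 sentences"
print(with_persona("kindergarten teacher", request))
print(with_persona(None, request))
```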

Reinforcement Learning from Human Feedback

Users of ChatGPT participate in reinforcement learning when the model asks them to select the better of two responses.
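
One common way to turn "A is better than B" choices into a training signal is to keep a score for each response and nudge the scores after every human pick. The sketch below is a heavily simplified illustration, assuming one scalar score per response and a made-up learning rate; in practice a neural reward model is trained on such comparisons and then used to tune the LLM.

```python
import math


def preference_probability(score_chosen, score_rejected):
    """Probability the model assigns to the human's pick being preferred."""
    return 1.0 / (1.0 + math.exp(-(score_chosen - score_rejected)))


def update_scores(scores, chosen, rejected, learning_rate=0.5):
    p = preference_probability(scores[chosen], scores[rejected])
    # The less the current scores agree with the human, the bigger the nudge
    step = learning_rate * (1.0 - p)
    scores[chosen] += step
    scores[rejected] -= step
    return scores


scores = {"response_a": 0.0, "response_b": 0.0}
for _ in range(10):  # ten users all pick response_a
    update_scores(scores, chosen="response_a", rejected="response_b")
print(scores)
```

After repeated picks, the preferred response ends up with the higher score.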

Limitations of LLMs

Hallucinations:

LLMs are trained on internet text, which contains inaccurate information.
Ex. Generating convincing-sounding but fictional news articles or scientific discoveries

Biased outputs:

Ex. Gender stereotyping in generated stories, such as defaulting to female pronouns for nurses or teachers

Difficulty handling complex situations:

Ex. Customer service chat-bots struggling with unique situations requiring human judgment

Unnecessary Refusals

her fleece as white as copyright infringement