ChatGPT Clone With Ollama & Gradio

In this blog, I'll guide you through using Ollama to build a fully local, open-source version of ChatGPT from the ground up. Ollama can also run multiple models concurrently, which opens up plenty of possibilities to explore.

If you are new to Ollama, check out the following blogs first to get your basics clear with Ollama.

Alright, let's get started. We will use the llama2 model for this blog. First of all, verify that the llama2 model is available locally:

$ ollama list

NAME                 	ID          	SIZE  	MODIFIED    
gemma:latest         	430ed3535049	5.2 GB	2 hours ago	
hogwards-model:latest	4484e4280376	3.8 GB	3 weeks ago	
llama2:latest        	78e26419b446	3.8 GB	3 weeks ago	
superman:latest      	eab3b004339f	5.2 GB	2 hours ago
$ ollama run llama2

Let's call our chat application Superchat.

Create a superchat folder and create app.py in it.
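In a Unix-like terminal, that looks like this:

$ mkdir superchat
$ cd superchat
$ touch app.py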

import requests
import json

# Ollama exposes a local REST API on port 11434 by default
url = "http://localhost:11434/api/generate"

headers = {
    'Content-Type': 'application/json'
}

data = {
    "model": "llama2",
    "stream": False,  # return the full response as a single JSON object
    "prompt": "Why does the Earth revolve around the Sun?"
}

response = requests.post(url, headers=headers, data=json.dumps(data))

if response.status_code == 200:
    response_txt = response.text
    data = json.loads(response_txt)
    actual_response = data["response"]
    print(actual_response)
else:
    print("Error:", response.status_code, response.text)

Let's run app.py

$ python app.py

The Earth revolves around the sun due to a combination of gravitational forces and the conservation of angular momentum.

Gravitational forces: The Earth is attracted to the sun by gravity, which pulls it inward. The strength of this attraction depends on the mass of the Earth and the distance between it and the sun. As the Earth orbits the sun, it follows an elliptical path due to the balance between this gravitational force and the centrifugal force (which arises from the Earth's rotation).

Alright, our API is working.
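If you want to sanity-check the endpoint outside Python, the same request works with curl (assuming Ollama is running on its default port, 11434):

$ curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "Why does the Earth revolve around the Sun?",
  "stream": false
}'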

Next, we will use Gradio to build the frontend:

$ pip install gradio

Modify your program to use a Gradio Interface. We will wrap our API call in a function that accepts a prompt as input and returns the model's response as output.

import requests
import json
import gradio as gr

url = "http://localhost:11434/api/generate"

headers = {
    'Content-Type': 'application/json'
}

def generate_response(prompt):
    data = {
        "model": "llama2",
        "stream": False,
        "prompt": prompt
    }

    response = requests.post(url, headers=headers, data=json.dumps(data))

    if response.status_code == 200:
        response_txt = response.text
        data = json.loads(response_txt)
        actual_response = data["response"]
        return actual_response
    else:
        # surface the error in the UI instead of returning None
        return f"Error: {response.status_code} {response.text}"


iface = gr.Interface(
    fn=generate_response,
    inputs=["text"],
    outputs=["text"]
)

if __name__ == "__main__":
    iface.launch()

Run the program

$ python app.py
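Gradio prints a local URL (typically http://127.0.0.1:7860) that you can open in your browser. If you want to share the app with someone temporarily, Gradio can also generate a public link; a one-line change, assuming a recent Gradio version:

iface.launch(share=True)  # creates a temporary public gradio.live URL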

Here we go!! Our local chatbot is working.

But there is a problem: if we ask the model to tell us our last message, it will not be able to remember it. So let's implement chat history.

import requests
import json
import gradio as gr

url = "http://localhost:11434/api/generate"

headers = {
    'Content-Type': 'application/json'
}

# running list of everything said so far (user prompts and model replies)
conversation_history = []

def generate_response(prompt):
    conversation_history.append(prompt)

    # send the whole conversation so the model has context
    full_prompt = "\n".join(conversation_history)

    data = {
        "model": "llama2",
        "stream": False,
        "prompt": full_prompt
    }

    response = requests.post(url, headers=headers, data=json.dumps(data))

    if response.status_code == 200:
        response_txt = response.text
        data = json.loads(response_txt)
        actual_response = data["response"]
        conversation_history.append(actual_response)
        return actual_response
    else:
        return f"Error: {response.status_code} {response.text}"


iface = gr.Interface(
    fn=generate_response,
    inputs=["text"],
    outputs=["text"]
)

if __name__ == "__main__":
    iface.launch()

Output

So now our chatbot remembers the history and sends it along with each new prompt.
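Joining the history into one plain-text prompt is the simplest approach. Note that Ollama also exposes a dedicated /api/chat endpoint that takes a list of role-tagged messages, which keeps user and assistant turns separate. Here is a minimal sketch of the same idea using that endpoint:

import requests

chat_url = "http://localhost:11434/api/chat"
messages = []  # each entry: {"role": "user" | "assistant", "content": ...}

def chat(prompt):
    messages.append({"role": "user", "content": prompt})
    response = requests.post(chat_url, json={
        "model": "llama2",
        "stream": False,
        "messages": messages
    })
    reply = response.json()["message"]["content"]
    messages.append({"role": "assistant", "content": reply})
    return reply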

FAQs about ChatGPT Clones

What is a ChatGPT Clone?

A ChatGPT clone is a replica or imitation of the original ChatGPT application created by OpenAI. It uses similar language models and techniques to generate human-like responses in a conversational format.

How can I build a ChatGPT Clone?

There are several ways to build a ChatGPT clone. One is to use Ollama: you can run open-source models such as Llama 2, Mistral, or Gemma locally and build a chatbot application on top of them.
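For example, pulling any of these models is a single command:

$ ollama pull mistral
$ ollama pull gemma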

Can I customize the responses of my ChatGPT Clone?

Yes, you have the ability to tailor responses from your ChatGPT Clone. By customizing prompts and providing specific guidelines, you can shape the generated responses to meet your needs. For instance, if you require HTML markup in the response, you can structure your prompts accordingly. This flexibility allows for personalized interactions that align with your preferences and objectives.
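As a quick illustration, you can prepend a formatting instruction to every prompt before sending it. This sketch reuses the generate_response function from earlier:

def generate_html_response(prompt):
    # prepend an instruction so the model answers in HTML markup
    styled_prompt = "Answer the following in valid HTML markup:\n" + prompt
    return generate_response(styled_prompt)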

How can I add styling to the frontend of my ChatGPT Clone?

To enhance the aesthetics of your ChatGPT Clone's frontend, leverage the capabilities of Gradio and Streamlit. Both platforms offer a range of templates and customization options to craft visually appealing user interfaces. Explore their diverse template libraries and harness their flexibility to design an interface that suits your preferences and engages your audience effectively.
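For example, Gradio's Interface accepts a title, a description, and a theme out of the box. A minimal sketch, assuming a recent Gradio version (built-in theme names can vary between releases):

import gradio as gr

iface = gr.Interface(
    fn=generate_response,
    inputs=["text"],
    outputs=["text"],
    title="Superchat",
    description="A local ChatGPT clone powered by Ollama and llama2",
    theme=gr.themes.Soft()  # one of Gradio's built-in themes
)
iface.launch()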