OpenAI just released its newest model, GPT-4o, with multimodal capabilities, which means a single model can process text, images and audio. According to OpenAI, it is 2x faster and half the price of GPT-4 Turbo, and it beats GPT-4 on most benchmarks.
To start, you need an OpenAI developer account with some credits on it. You can set this up at platform.openai.com. Once billing is enabled, create an API key.
Link to Colab notebook.
First, we import the required libraries and specify the path to the image.
from google.colab import userdata  # Colab's secrets store, used here to hold the API key
import base64
import requests

image_path = "<image path>"  # replace with the path to your image
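`userdata` only exists inside Google Colab. If you run the code elsewhere, a minimal sketch like the one below can fall back to an environment variable. The helper name `get_api_key` is hypothetical, and it assumes the key is stored under the name `OPENAI_API_KEY` in both places.

```python
import os


def get_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the OpenAI API key from Colab secrets or, failing that, the environment.

    Hypothetical helper; assumes the secret/variable is called OPENAI_API_KEY.
    """
    try:
        # Only importable inside Google Colab
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        return userdata.get(name)

    # Outside Colab: read the key from an environment variable instead
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

With this helper, the Authorization header could be written as `f"Bearer {get_api_key()}"` and the code would run both in Colab and locally.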
Then we need a helper function that encodes the image as a base64 string.
def encode_image(image_path):
    # Read the raw bytes and return them as a base64-encoded UTF-8 string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
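The request further down labels the image as `image/jpeg`. If your file might be a PNG or another format, one option is to derive the MIME type from the file extension with Python's standard `mimetypes` module. This is a minimal sketch: the helper name `to_data_url` is hypothetical, and it assumes the file extension matches the actual image format.

```python
import base64
import mimetypes


def to_data_url(image_path: str) -> str:
    """Encode an image file as a base64 data URL, e.g. data:image/png;base64,...

    Hypothetical helper; the MIME type is guessed from the file extension only.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string could be used directly as the `url` value under `image_url`, in place of the hard-coded `data:image/jpeg;base64,` prefix.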
If we look at the image, it's a right-angled triangle with one other known angle of 25 degrees and a missing angle, C. Since the angles of a triangle add up to 180 degrees, C = 180 - 90 - 25 = 65 degrees.

To use the API, you need to create the request headers and a JSON payload.
In the headers, set the content type to application/json and pass your API key in the Authorization header as a Bearer token.
In the payload, you specify the model (gpt-4o in our case). Under messages, you can include both text and images. Note that you can pass multiple images, and the model will consider all of them when answering the question. Here, we pass a single image along with a prompt asking for the missing angle and to return only the answer.
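Passing several images only means adding more `image_url` parts to the same `content` list. The sketch below uses a hypothetical helper, `build_messages`, that takes a question and a list of data URLs (for example, strings built from `encode_image` output with the `data:image/jpeg;base64,` prefix, as in the request that follows).

```python
def build_messages(question: str, image_urls: list[str]) -> list[dict]:
    """Build a single user message with one text part followed by one part per image.

    Hypothetical helper; mirrors the message structure used in the request payload.
    """
    content = [{"type": "text", "text": question}]
    for url in image_urls:
        # Each image becomes its own content part
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]
```

The returned list could replace the `messages` value in the payload; the rest of the request stays the same.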
# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {userdata.get('OPENAI_API_KEY')}"  # Pass API key here
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Calculate the missing angle C in the image, only return the answer"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 200
}
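response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
response.raise_for_status()  # fail loudly on HTTP errors (e.g. invalid key, no credits)
print(response.json()["choices"][0]["message"]["content"])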
The max_tokens parameter sets the maximum number of tokens the model may generate in its reply. Let's see if GPT-4o gets this right.
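Because `max_tokens` caps the completion, a reply can be cut short; the Chat Completions response signals this by setting the choice's `finish_reason` to `"length"`. A minimal sketch of pulling the answer out of the parsed JSON while checking for that case and for an API error object (the helper name `extract_answer` is hypothetical):

```python
def extract_answer(response_json: dict) -> str:
    """Return the text of the first choice.

    Hypothetical helper: raises if the API returned an error object
    or if the reply was truncated by max_tokens.
    """
    if "error" in response_json:
        raise RuntimeError(response_json["error"].get("message", "Unknown API error"))
    choice = response_json["choices"][0]
    if choice.get("finish_reason") == "length":
        raise RuntimeError("Reply was cut off by max_tokens; consider raising the limit")
    return choice["message"]["content"]
```

It would be called as `extract_answer(response.json())`. For a one-word answer like this one, 200 tokens is plenty, so the truncation check mainly matters for longer prompts.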
It returns 65 degrees, as expected. This is a simple example, but it should give you an idea of how to use GPT-4o in your own development projects via the API.
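If you want to check replies like this automatically, for instance across several test images, one option is to pull the first number out of the reply and compare it with the expected value. This is a minimal sketch that assumes the model answers with a number somewhere in its text, as it did here; the helper name `parse_degrees` is hypothetical.

```python
import re
from typing import Optional


def parse_degrees(reply: str) -> Optional[float]:
    """Return the first number in the model's reply (e.g. '65 degrees' -> 65.0), or None."""
    match = re.search(r"-?\d+(?:\.\d+)?", reply)
    return float(match.group()) if match else None


# Expected value for the triangle: the angles of a triangle sum to 180 degrees
expected = 180 - 90 - 25

print(parse_degrees("65 degrees") == expected)
```

Matching only the first number keeps the check simple, but it would misfire if the model restated the given angles before its answer, which is why the prompt asks it to return only the answer.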
