How to use vision capabilities with Gemini and PHP

August 17, 2024
Posted by: Editor
Categories: Gemini, Google, Uncategorized

Although there’s no official Google Gemini API library for PHP, it’s easy to connect to the API using plain PHP and cURL. We’ve previously covered how to get a response from Gemini using a text prompt. This time we’ll show how to send an image to the API and ask questions about it.

Prerequisites

An IDE or your favorite text editor, e.g. Visual Studio Code or Notepad++
A Gemini API key
PHP 7.4+ with the cURL and fileinfo extensions enabled

To create an API key, go to https://aistudio.google.com/app/u/1/apikey and click Create API key.

If you already have a project in Google Cloud, you can use the search bar to select it.

Alternatively, you can create an API key by clicking the blue Create API key in new project.
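Treat the key like a password. Rather than pasting it into index.php, where it can easily end up in version control, you may prefer to keep it in an environment variable. Here is a minimal sketch, assuming you export a variable named GEMINI_API_KEY before starting PHP. The getGeminiApiKey() helper is our own hypothetical name, not part of any Google library.

```php
<?php
// Hypothetical helper: reads the Gemini API key from the environment
// instead of hard-coding it. Assumes the variable is named GEMINI_API_KEY.
function getGeminiApiKey(): string
{
    $key = getenv('GEMINI_API_KEY');

    // getenv() returns false when the variable is not set
    if ($key === false || trim($key) === '') {
        throw new RuntimeException('GEMINI_API_KEY is not set in the environment.');
    }

    return trim($key);
}
```

With this in place you could write define('GEMINI_API_KEY', getGeminiApiKey()); instead of pasting the key into the define call. Depending on your web server setup (PHP-FPM, for example), you may need to pass the variable through the server configuration before PHP can see it.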

Now create your project folder. We can call this folder gemini-vision for testing.

Create an index.php file in that folder and add the code below, pasting your API key into the GEMINI_API_KEY definition. Gemini offers several models to suit different needs; Google's Gemini API model documentation lists them all. For this example we’ll use Gemini 1.5 Flash (gemini-1.5-flash-latest).

Our core function will look like this.

Our generateContent function takes a text prompt, an image prompt and an image type. The image prompt is the base64-encoded image, while the image type is the file’s MIME type. It’s important to note that Gemini only supports specific image types.

Images must use one of the following MIME types:

PNG – image/png
JPEG – image/jpeg
WEBP – image/webp
HEIC – image/heic
HEIF – image/heif

Each image is equivalent to 258 tokens.

<?php
define('GEMINI_API_KEY', 'your api key here');
define('MODEL', 'gemini-1.5-flash-latest');
define('BASEURL', 'https://generativelanguage.googleapis.com/v1beta');

function generateContent($textPrompt, $imagePrompt, $imageType)
{
    // FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, so strip HTML tags instead;
    // json_encode() below takes care of escaping the text for the request body
    $text = trim(strip_tags($textPrompt));
    // combine the base URL with the preferred model and API key
    $url = sprintf("%s/models/%s:generateContent?key=%s", BASEURL, MODEL, GEMINI_API_KEY);

    // "contents" is a list of Content objects, each holding its own "parts"
    $data = [
        "contents" => [
            [
                "parts" => [
                    [
                        "inlineData" => [
                            "mimeType" => $imageType,
                            "data" => $imagePrompt // base64-encoded image
                        ]
                    ],
                    [
                        "text" => $text
                    ]
                ]
            ]
        ]
    ];

    $jsonData = json_encode($data);
    $ch = curl_init($url);

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HTTPHEADER, ['Content-Type: application/json']);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $jsonData);

    $response = curl_exec($ch);

    if (curl_errno($ch)) {
        echo 'Request error: ' . curl_error($ch);
    } else {
        $response = json_decode($response);

        if (isset($response->candidates[0]->content->parts[0]->text)) {
            // escape the model output before printing it into the HTML page
            echo nl2br(htmlspecialchars($response->candidates[0]->content->parts[0]->text));
        } elseif (isset($response->error->message)) {
            // the API reports problems (invalid key, bad request, ...) in an "error" object
            echo 'API error: ' . htmlspecialchars($response->error->message);
        } else {
            echo "No response returned";
        }
    }

    curl_close($ch);
}

Let’s break this down a bit. For our example, we’re using an inlineData part, which lets us send the image (as base64 data) directly inside the request to Gemini. We then pass our text prompt in a separate text part. This could be something like, “Analyze the following image and provide feedback.”
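If you want to reuse the part-building logic (for example, to put more than one image in the same "parts" list), it can help to pull the inlineData part into its own function. The sketch below is a hypothetical helper of our own (buildImagePart and GEMINI_IMAGE_MIME_TYPES are names we made up). It rejects anything outside the MIME types listed earlier and base64-encodes the raw image bytes.

```php
<?php
// Hypothetical list mirroring the supported types above: PNG, JPEG, WEBP, HEIC, HEIF
const GEMINI_IMAGE_MIME_TYPES = ['image/png', 'image/jpeg', 'image/webp', 'image/heic', 'image/heif'];

// Hypothetical helper: builds one "inlineData" part for a generateContent request
function buildImagePart(string $bytes, string $mimeType): array
{
    // refuse types Gemini does not accept before making any API call
    if (!in_array($mimeType, GEMINI_IMAGE_MIME_TYPES, true)) {
        throw new InvalidArgumentException("Unsupported image type: $mimeType");
    }

    return [
        'inlineData' => [
            'mimeType' => $mimeType,
            // the image bytes are sent as a base64 string
            'data' => base64_encode($bytes),
        ],
    ];
}
```

For example, buildImagePart(file_get_contents($filePath), $fileMimeType) returns an array you can place in the "parts" list next to the ["text" => ...] part.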

To upload our image, let’s create a simple HTML form that takes some text and an image file.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Content Submission Form</title>
</head>
<body>
    <form action="" method="post" enctype="multipart/form-data">
        <label for="text">Enter Text:</label>
        <input type="text" name="text" id="text" required><br><br>

        <label for="image">Upload Image:</label>
        <input type="file" name="image" id="image" accept="image/*" required><br><br>

        <button type="submit">Submit</button>
    </form>
</body>
</html>

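Now we need to handle the image upload and pass it into our function. Add the following PHP code to the same file, below the generateContent function (and close the PHP block with ?> before the HTML form). It checks that the upload is one of the allowed image types and, if it is, saves the image to an uploads folder on our server.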

if ($_SERVER["REQUEST_METHOD"] == "POST") {
    $text = $_POST['text'] ?? '';
    $image = null;
    $fileMimeType = null;

    if (isset($_FILES['image']) && $_FILES['image']['error'] === UPLOAD_ERR_OK) {
        // only allow the image types Gemini supports, mapped to a safe file extension
        $allowedMimeTypes = [
            'image/jpeg' => 'jpg',
            'image/png'  => 'png',
            'image/webp' => 'webp',
            'image/heic' => 'heic',
            'image/heif' => 'heif',
        ];
        // detect the real type from the file contents (requires the fileinfo extension)
        $fileMimeType = mime_content_type($_FILES['image']['tmp_name']);

        if (isset($allowedMimeTypes[$fileMimeType])) {
            // generate a random file name; take the extension from the detected MIME type,
            // never from the user-supplied file name (which could be e.g. "shell.php")
            $newFileName = uniqid() . '-' . bin2hex(random_bytes(8)) . '.' . $allowedMimeTypes[$fileMimeType];
            $uploadDirectory = __DIR__ . DIRECTORY_SEPARATOR . 'uploads';

            if (!is_dir($uploadDirectory)) {
                mkdir($uploadDirectory, 0755, true);
            }
            $filePath = $uploadDirectory . DIRECTORY_SEPARATOR . $newFileName;

            // move the uploaded file to the target directory, then base64-encode it
            if (move_uploaded_file($_FILES['image']['tmp_name'], $filePath)) {
                $image = base64_encode(file_get_contents($filePath));
            }
        }
    }

    // only call the API once we have a valid, encoded image
    if ($image !== null) {
        generateContent($text, $image, $fileMimeType);
    } else {
        echo 'Please upload a PNG, JPEG, WEBP, HEIC or HEIF image.';
    }
}

Once the file is saved, we use file_get_contents to read the image and base64-encode it. We then pass the encoded image and its MIME type into our generateContent function to get a response. Here’s an example prompt and the response we got for a hummingbird photo.

Text Prompt

Analyze the following image and provide feedback.

Response from Gemini

The image is a beautiful shot of a hummingbird feeding from a red flower. The colors are vibrant and the hummingbird’s feathers are perfectly captured. The background is slightly out of focus, which helps draw the viewer’s attention to the subject. The image is well-composed and the lighting is good. Overall, this is a great picture!
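Our generateContent function echoes only the first text part of the first candidate. If you would rather get the answer back as a string, for instance to return it from an AJAX endpoint, a sketch like the one below could parse the raw JSON instead. extractGeminiText is a hypothetical name, and the code assumes the response shape used above (candidates[0].content.parts[].text) plus the top-level "error" object Google APIs use to report failures.

```php
<?php
// Hypothetical helper: returns the model's answer as a string instead of echoing it
function extractGeminiText(string $json): string
{
    $response = json_decode($json, true);

    if (!is_array($response)) {
        throw new RuntimeException('Response was not valid JSON.');
    }

    // failures (invalid key, bad request, ...) come back in a top-level "error" object
    if (isset($response['error']['message'])) {
        throw new RuntimeException('Gemini API error: ' . $response['error']['message']);
    }

    $parts = $response['candidates'][0]['content']['parts'] ?? [];
    $text = '';

    // a candidate can contain several parts; join all text parts in order
    foreach ($parts as $part) {
        if (isset($part['text'])) {
            $text .= $part['text'];
        }
    }

    return $text;
}
```

An empty string here means the API answered but returned no text, which you can then handle however suits your page.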

Gemini is an exciting alternative to OpenAI and other competing products, and its generous free tier makes it a compelling option for quick hobby projects or for testing ideas. Its vision capabilities let us upload images for analysis. In our follow-up article, we’ll look at how to enhance this with AJAX and how to use vision capabilities to compare images.
