
Generate from image (Streamed)

POST /image_generate_stream

The /image_generate_stream endpoint is used to communicate with the LLM. Use this endpoint when you want to send an image to a multimodal LLM, and receive a stream of responses from the LLM, token by token. If you want your response to be returned all at once, see the /image_generate endpoint.

This endpoint takes a multipart input, with two required fields:

  1. 'json_data': JSON data matching the format used for the /generate and /generate_stream endpoints.
  2. 'image_data': a stream of bytes representing an image file.

Multipart request support is built into most common HTTP clients.
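For example, a minimal Python sketch using the requests library might look like the following. The server address, image filename, and payload values are placeholder assumptions; the multipart field names json_data and image_data are those described above.

import json
import requests

# Hypothetical server address; substitute your own deployment's host and port.
URL = "http://localhost:8080/image_generate_stream"

# JSON payload in the same format used by /generate and /generate_stream.
payload = {
    "text": "Describe this image.",
    "max_new_tokens": 64,
    "sampling_temperature": 0.7,
}

with open("example.jpg", "rb") as image_file:
    files = {
        # Multipart field names from the endpoint description above.
        "json_data": (None, json.dumps(payload), "application/json"),
        "image_data": ("example.jpg", image_file, "application/octet-stream"),
    }
    # stream=True keeps the connection open so tokens can be read as they arrive.
    with requests.post(URL, files=files, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines(decode_unicode=True):
            # Server-sent event data lines are prefixed with "data:".
            if line and line.startswith("data:"):
                event = json.loads(line[len("data:"):])
                print(event["text"], end="", flush=True)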

To send a batch of requests with the same image, the text field of the JSON payload can be either a string or an array of strings. Only one image can be supplied per request; to run a set of generation requests against different images, send them in quick succession and rely on automatic batching.

The response is a stream of server-sent events, where each event is a token generated by the LLM. If you've supplied a batch of inputs:

{
"text": ["1 2 3 4", "a b c d"]
}

The data field of each server-sent event will be a JSON payload with a text field containing the token and a batch_id field containing the index of the batch item that the token belongs to.

data:{"text": "5", "batch_id": 0}

data:{"text": "e", "batch_id": 1}

data:{"text": "6", "batch_id": 0}

data:{"text": "f", "batch_id": 1}

The order in which tokens from different batch_ids are interleaved is not guaranteed.
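Because the interleaving is arbitrary, a client will typically group tokens by batch_id as they arrive. Below is a minimal sketch of that demultiplexing step, using the example SSE lines above; whether tokens carry their own whitespace depends on the model's tokenizer, so the plain concatenation here is an assumption.

import json
from collections import defaultdict

def demux_stream(lines):
    # Group streamed tokens by batch_id. `lines` is assumed to be an iterable
    # of raw SSE data lines such as 'data:{"text": "5", "batch_id": 0}'.
    outputs = defaultdict(list)
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip anything that is not an SSE data line
        event = json.loads(line[len("data:"):])
        outputs[event["batch_id"]].append(event["text"])
    # Tokens within a single batch_id arrive in generation order, so they can
    # simply be concatenated per batch item.
    return {batch_id: "".join(tokens) for batch_id, tokens in outputs.items()}

events = [
    'data:{"text": "5", "batch_id": 0}',
    'data:{"text": "e", "batch_id": 1}',
    'data:{"text": "6", "batch_id": 0}',
    'data:{"text": "f", "batch_id": 1}',
]
print(demux_stream(events))  # {0: '56', 1: 'ef'}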

Request

Body (required)

    image_data (binary, required)

    json_data (object, required)

    JSON generation payload, used in /generate, /generate_stream, /image_generate, and /image_generate_stream.

    constrained_decoding_backend (string, nullable)
    consumer_group (string, nullable)
    image_paths (string[], nullable)
    json_schema (nullable)
    lora_id (string, nullable)
    max_new_tokens (int64, nullable)
    min_new_tokens (int64, nullable)
    no_repeat_ngram_size (int64, nullable)
    prompt_max_tokens (int64, nullable)
    regex_string (string, nullable)
    repetition_penalty (float, nullable)
    sampling_temperature (float, nullable)
    sampling_topk (int64, nullable)
    sampling_topp (float, nullable)

    text (object, required)

    Input text, provided so users do not have to construct the clunkier PayloadText directly; the mapping from InputText to PayloadText is described below.

    oneOf: string | string[]

    use_chat_template (boolean, nullable)
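For illustration, a json_data payload that combines the batched text form with several of the optional sampling fields might look like the following; all values are arbitrary examples, not server defaults.

{
  "text": ["1 2 3 4", "a b c d"],
  "max_new_tokens": 32,
  "sampling_temperature": 0.8,
  "sampling_topk": 50,
  "sampling_topp": 0.95,
  "repetition_penalty": 1.1,
  "use_chat_template": false
}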

Responses

Takes in a JSON payload and returns the response token by token, as a stream of server-sent events.

Schema

    text (object, required)

    Input text, provided so users do not have to construct the clunkier PayloadText directly; the mapping from InputText to PayloadText is described below.

    oneOf: string