MetaVision API Documentation

Introduction

MetaVision is a high-performance multimodal AI analysis engine.

Unlike traditional computer vision APIs that return static labels, MetaVision lets you define flexible Analysis Categories or use Custom Prompts to extract structured intelligence from media (images, video, and audio).


Authentication

All API requests require authentication using a Bearer token. Include your API key in the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY
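
As a minimal sketch of the header described above, the helper below builds the request headers in Python. The function name build_headers and the environment variable METAVISION_API_KEY are hypothetical names chosen for illustration; only the Authorization: Bearer format comes from this documentation.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return the HTTP headers for an authenticated JSON request."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }


# METAVISION_API_KEY is a hypothetical variable name, not defined by the API.
headers = build_headers(os.environ.get("METAVISION_API_KEY", "YOUR_API_KEY"))
```

Reading the key from the environment keeps it out of source control; any other secret store would work equally well.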

Core Analysis

The primary endpoint handles image, video, and audio analysis.

POST /api/v1/analyze

Request Structure

Send a JSON body with the media content and configuration.

Required Parameters

Parameter | Type | Description
media | string | The content to analyze. Must be a valid HTTP URL or a Base64-encoded string (image, video, or audio/mp3).
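
The media value can be prepared along these lines. This is a sketch under assumptions: the helper name to_media_value is hypothetical, and it assumes the API accepts a plain Base64 string without a data: URI prefix, which this documentation does not specify.

```python
import base64
from pathlib import Path


def to_media_value(source: str) -> str:
    """Return a value for the "media" field.

    URLs are passed through unchanged; anything else is treated as a
    local file path and Base64-encoded.
    """
    if source.startswith(("http://", "https://")):
        return source
    raw = Path(source).read_bytes()
    # Assumption: plain Base64 text, no "data:...;base64," prefix.
    return base64.b64encode(raw).decode("ascii")
```

For example, to_media_value("https://example.com/image.jpg") returns the URL as-is, while a local path yields the encoded file contents.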

Optional Parameters

Parameter | Type | Default | Description
categories | array[string] | (All) | List of Category IDs to execute (e.g., ["title", "objects"]).
custom_prompt | string | null | A custom analysis instruction (max 2000 chars). The result appears under the key custom_analysis.
model | string | null | Forces the use of a specific Agent. Must be a Model UUID from the Available Models section.
agent_strategy | string | "random" | Applies when no model is specified: random (load balancing) or best (highest reliability score).
detail | string | "high" | Analysis depth: low (faster, less token usage) or high (more detailed analysis).
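
The parameters above can be assembled and checked client-side before sending. The following is an illustrative sketch, not part of the API: build_payload is a hypothetical helper, and the choice to drop agent_strategy when model is set reflects the table's note that the strategy applies only when no model is specified.

```python
from typing import Optional, Sequence

MAX_PROMPT_CHARS = 2000          # limit stated for custom_prompt
STRATEGIES = ("random", "best")  # documented agent_strategy values
DETAIL_LEVELS = ("low", "high")  # documented detail values


def build_payload(
    media: str,
    categories: Optional[Sequence[str]] = None,
    custom_prompt: Optional[str] = None,
    model: Optional[str] = None,
    agent_strategy: Optional[str] = None,
    detail: Optional[str] = None,
) -> dict:
    """Build a request body, including only the options that were set."""
    if not media:
        raise ValueError("media is required")
    if custom_prompt is not None and len(custom_prompt) > MAX_PROMPT_CHARS:
        raise ValueError("custom_prompt exceeds 2000 characters")
    if agent_strategy is not None and agent_strategy not in STRATEGIES:
        raise ValueError(f"agent_strategy must be one of {STRATEGIES}")
    if detail is not None and detail not in DETAIL_LEVELS:
        raise ValueError(f"detail must be one of {DETAIL_LEVELS}")

    payload = {"media": media}
    if categories is not None:
        payload["categories"] = list(categories)
    if custom_prompt is not None:
        payload["custom_prompt"] = custom_prompt
    if model is not None:
        payload["model"] = model
    elif agent_strategy is not None:
        # The strategy only matters when no model is forced.
        payload["agent_strategy"] = agent_strategy
    if detail is not None:
        payload["detail"] = detail
    return payload
```

Omitting unset options lets the server apply its documented defaults.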

Response Structure

The API returns a JSON object with two top-level keys: meta (request metadata) and data (the analysis results, keyed by category).

{
  "meta": {
    "timestamp": "2024-03-20T10:00:00.000Z",
    "model_id": "8f32e9...",
    "model_name": "general-v2",
    "execution_time": 2.45,
    "successful_categories": 2,
    "failed_categories": 0,
    "total_tokens_in": 1500,
    "total_tokens_out": 300,
    "estimated_cost": 0.0045
  },
  "data": {
    "title": {
      "result": "Sunset over a mountain range"
    },
    "custom_analysis": {
      "result": "The mood is serene and peaceful."
    }
  }
}

Meta Object

Field | Type | Description
model_id | string | The UUID of the model that processed the request.
model_name | string | The display name of the model.
execution_time | float | Total processing time in seconds.
total_tokens_in | integer | Total input tokens consumed by the agents.
total_tokens_out | integer | Total output tokens generated.
estimated_cost | float | Estimated cost of the request in USD.
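
A response of the shape shown above might be unpacked as follows. This sketch uses an abbreviated copy of the sample response; the helper name summarize is hypothetical, and it assumes every entry under data is an object with a result key, as in the sample. The shape of a failed category is not documented here, so such entries simply map to None.

```python
import json

# Abbreviated copy of the sample response from this section.
SAMPLE = json.loads("""
{
  "meta": {
    "model_id": "8f32e9...",
    "model_name": "general-v2",
    "execution_time": 2.45,
    "estimated_cost": 0.0045
  },
  "data": {
    "title": {"result": "Sunset over a mountain range"},
    "custom_analysis": {"result": "The mood is serene and peaceful."}
  }
}
""")


def summarize(response: dict) -> dict:
    """Flatten a response into per-category results plus key metadata."""
    meta = response.get("meta", {})
    data = response.get("data", {})
    results = {
        key: value.get("result") if isinstance(value, dict) else None
        for key, value in data.items()
    }
    return {
        "results": results,
        "model": meta.get("model_name"),
        "seconds": meta.get("execution_time"),
        "cost_usd": meta.get("estimated_cost"),
    }


summary = summarize(SAMPLE)
print(summary["results"]["title"])
```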

Analysis Categories

Categories define specific tasks for the engine. You can select specific categories in your request to tailor the analysis output.


Available Models

Pass a model's UUID in the model parameter to force a specific agent.

Advanced Options

Agent Consistency

MetaVision ensures consistency by using a single Agent for all categories within a single request. This prevents conflicting interpretations of the media across different analysis tasks.

Audio Analysis

MetaVision supports audio files (MP3, WAV, OGG). The engine analyzes speech, tone, and background audio. Maximum file size: 10MB.

Video Analysis

MetaVision supports video files (MP4, MOV, WebM). The engine samples frames from the video to perform the analysis.

Performance Note: Video and audio analysis is significantly more computationally expensive than image analysis. Execution times may range from 15 to 45 seconds.
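
Given these execution times and the 10MB limit, a client may want to classify media and pick a timeout before sending. The sketch below is illustrative only: the helper names, the extension-based detection, the 60-second and 30-second timeouts, and the reading of 10MB as 10 * 1024 * 1024 bytes are all assumptions, not documented behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

AUDIO_EXTENSIONS = {".mp3", ".wav", ".ogg"}   # formats listed under Audio Analysis
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm"}  # formats listed under Video Analysis
MAX_MEDIA_BYTES = 10 * 1024 * 1024            # assumption: 10MB means 10 MiB


def media_kind(name_or_url: str) -> str:
    """Guess "audio", "video", or "other" from the file extension."""
    suffix = PurePosixPath(urlparse(name_or_url).path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return "other"


def suggested_timeout(name_or_url: str) -> float:
    """Pick a client timeout in seconds (values are assumptions)."""
    # Audio and video may take 15 to 45 seconds, so leave headroom.
    return 60.0 if media_kind(name_or_url) in ("audio", "video") else 30.0


def fits_size_limit(num_bytes: int) -> bool:
    """Check a raw file size against the assumed 10MB limit."""
    return num_bytes <= MAX_MEDIA_BYTES
```

For example, suggested_timeout("https://example.com/audio.mp3") picks the longer timeout, while an image URL falls back to the shorter one.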

Error Handling

Errors can occur at the request level (4xx/5xx status codes) or at the individual category level.

API Errors

Status | Code | Description
400 | invalid_request | Malformed JSON, missing 'media', or invalid 'model' UUID.
400 | invalid_categories | Requested categories do not exist or are disabled.
401 | invalid_api_key | Missing or incorrect Authorization header.
403 | account_disabled | The user account has been disabled by an admin.
413 | payload_too_large | Media exceeds the maximum allowed size (10MB).
500 | internal_error | Unexpected server or upstream provider error.
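
The table above can be mirrored in client code as a lookup. This is a sketch under assumptions: the format of the error response body is not documented here, so the error code is taken as a plain argument, and the helper names and the rule that only 5xx responses are worth retrying are illustrative choices rather than API guarantees.

```python
from typing import Optional

# Status/code pairs copied from the API Errors table.
API_ERRORS = {
    (400, "invalid_request"): "Malformed JSON, missing 'media', or invalid 'model' UUID.",
    (400, "invalid_categories"): "Requested categories do not exist or are disabled.",
    (401, "invalid_api_key"): "Missing or incorrect Authorization header.",
    (403, "account_disabled"): "The user account has been disabled by an admin.",
    (413, "payload_too_large"): "Media exceeds the maximum allowed size (10MB).",
    (500, "internal_error"): "Unexpected server or upstream provider error.",
}


def describe_error(status: int, code: Optional[str] = None) -> str:
    """Return a readable message for a documented status/code pair."""
    message = API_ERRORS.get((status, code))
    if message is not None:
        return f"{status} {code}: {message}"
    return f"{status}: undocumented error"


def is_retryable(status: int) -> bool:
    """Assumption: only server-side (5xx) failures are worth retrying."""
    return status >= 500
```

Client errors such as 401 or 413 will not succeed on retry without changing the request, which is why the sketch retries only on 5xx.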

Code Examples

Python

import requests

API_KEY = "your_api_key"
API_URL = "https://metavision.umbrosus.com/api/v1/analyze"

payload = {
    "media": "https://example.com/image.jpg",
    "categories": ["title", "description"],
    "custom_prompt": "Describe the emotion.",
    "model": "8f32e9...",  # Optional model UUID
    "agent_strategy": "best",  # Only applies when "model" is omitted
}

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}

# Video and audio requests may take 15 to 45 seconds, so allow a generous timeout.
response = requests.post(API_URL, json=payload, headers=headers, timeout=60)
response.raise_for_status()  # Raise an exception on 4xx/5xx responses
print(response.json())

JavaScript

const API_KEY = 'your_api_key';
const API_URL = 'https://metavision.umbrosus.com/api/v1/analyze';

async function analyzeMedia() {
  const response = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      media: "https://example.com/audio.mp3", // Audio supported
      categories: ["title"],
      custom_prompt: "Transcribe the speech and describe tone.",
      model: "8f32e9..." // Optional model UUID
    })
  });

  // fetch() does not reject on HTTP errors, so check the status explicitly.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  const result = await response.json();
  console.log(result);
}

analyzeMedia().catch(console.error);

cURL

curl -X POST "https://metavision.umbrosus.com/api/v1/analyze" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "media": "https://example.com/image.jpg",
    "custom_prompt": "List all colors present.",
    "agent_strategy": "best"
  }'