OmniParser API

The OmniParser API parses screens and detects UI elements. It analyzes screenshots or HTML content to identify UI elements and extract information about them, their relationships, and their hierarchical structure.

Examples


import requests
import base64

# Convert image to Base64
with open('screenshot.png', 'rb') as image_file:
    image_data = base64.b64encode(image_file.read()).decode('utf-8')

headers = {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
}

payload = {
    "model": "omniparser2",
    "base64_image": image_data,
}

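response = requests.post(
    'https://api.inferenceapis.com',
    json=payload,
    headers=headers,
    timeout=60  # seconds; adjust as needed
)

# Raise an exception on HTTP error statuses (4xx/5xx) before reading the body
response.raise_for_status()
print(response.json())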

Parameters

Parameter       Type     Required  Description
--------------  -------  --------  -----------
model           string   Yes       Model name; use "omniparser2".
base64_image    string   Yes       Base64-encoded image to process.
box_threshold   float    No        Confidence threshold for detecting UI elements (0.01 to 1.0).
iou_threshold   float    No        Intersection over Union (IoU) threshold for overlapping elements (0.1 to 1.0).
text_threshold  float    No        Confidence threshold for text detection (0.1 to 1.0).
use_paddleocr   boolean  No        Use PaddleOCR for improved text detection.
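
The optional parameters can be combined with the required ones in a single request body. The following is a minimal sketch, assuming the optional parameters are sent as top-level JSON keys alongside model and base64_image (the page does not show this explicitly). The helper name build_payload and the example values are hypothetical; the ranges are copied from the parameter list above.

```python
import base64

# Allowed ranges, copied from the parameter list above.
THRESHOLD_RANGES = {
    "box_threshold": (0.01, 1.0),
    "iou_threshold": (0.1, 1.0),
    "text_threshold": (0.1, 1.0),
}


def build_payload(image_bytes, use_paddleocr=None, **thresholds):
    """Build a request body (hypothetical helper, not part of the API)."""
    payload = {
        "model": "omniparser2",
        "base64_image": base64.b64encode(image_bytes).decode("utf-8"),
    }
    for name, value in thresholds.items():
        if name not in THRESHOLD_RANGES:
            raise ValueError(f"unknown parameter: {name}")
        low, high = THRESHOLD_RANGES[name]
        # Reject out-of-range values locally instead of sending them.
        if not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
        payload[name] = value
    # Assumption: omitted optional keys fall back to server-side defaults.
    if use_paddleocr is not None:
        payload["use_paddleocr"] = bool(use_paddleocr)
    return payload


# Illustrative values only, not recommended settings.
payload = build_payload(
    b"fake image bytes",
    use_paddleocr=True,
    box_threshold=0.05,
    iou_threshold=0.5,
)
```

The resulting dictionary can be passed as json=payload to requests.post, as in the example request earlier on this page.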