OmniParser API
The OmniParser API provides powerful screen parsing and UI element detection capabilities. It can analyze screenshots or HTML content to identify and extract information about UI elements, their relationships, and hierarchical structure.
Examples
import requests
import base64
# Convert image to Base64
with open('screenshot.png', 'rb') as image_file:
image_data = base64.b64encode(image_file.read()).decode('utf-8')
headers = {
'Authorization': 'Bearer YOUR_API_KEY',
'Content-Type': 'application/json'
}
payload = {
"model": "omniparser2",
"base64_image":image_data,
}
response = requests.post(
'https://api.inferenceapis.com',
json=payload,
headers=headers
)
print(response.json())
Parameters
Parameter | Type | Required | Description |
---|---|---|---|
model | string | Yes | omniparser2 |
base64_image | string | Yes | Base64-encoded image for processing. |
box_threshold | float | No | Confidence threshold for detecting UI elements (0.01 - 1.0). |
iou_threshold | float | No | Intersection over Union (IoU) threshold for overlapping elements (0.1 - 1.0). |
text_threshold | float | No | Confidence level for text detection (0.1 - 1.0). |
use_paddleocr | boolean | No | Use PaddleOCR for improved text detection. |