Llama 3.3 70B Instruct API

The Llama API wraps the Llama 3.3 70B Instruct model and lets you submit structured chat prompts and receive conversational responses. It is well suited to natural dialogue, character-based prompting, and general-purpose language generation.

Base URL: https://api.inferenceapis.com

Endpoints

Chat Completion

POST /

Submit a conversation and get a response from the model.

Examples


import requests

# A minimal chat payload: the system message sets the persona,
# the user message asks the question.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
        {"role": "system", "content": "Act like you're a cowboy."},
        {"role": "user", "content": "What did you do today?"}
    ]
}

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# POST the conversation to the base URL.
response = requests.post(
    'https://api.inferenceapis.com',
    json=payload,
    headers=headers
)

print(response.json())
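If the service follows the common OpenAI-style completion schema (an assumption, not confirmed by this page), the assistant's reply can be extracted from the JSON instead of printing the whole object:

# Continues the example above. Assumes an OpenAI-compatible
# response shape: {"choices": [{"message": {"content": ...}}]}.
# Adjust the keys if the actual payload differs.
response.raise_for_status()  # surface HTTP errors early
data = response.json()
reply = data["choices"][0]["message"]["content"]
print(reply)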
        

Parameters

Parameter           Type     Required  Description
model               string   Yes       Must be meta-llama/Llama-3.3-70B-Instruct-Turbo.
messages            array    Yes       List of objects with role (system/user/assistant) and content.
max_tokens          integer  No        Maximum number of tokens to generate (default is model-defined).
temperature         float    No        Controls randomness; higher values produce more varied output.
top_p               float    No        Nucleus sampling: cumulative probability cutoff for candidate tokens.
min_p               float    No        Minimum token probability, scaled by the probability of the most likely token.
top_k               integer  No        Limits sampling to the k most likely tokens.
repetition_penalty  float    No        Penalizes tokens that have already appeared, discouraging repetition.
presence_penalty    float    No        Penalizes tokens already present in the text, encouraging new topics.
frequency_penalty   float    No        Penalizes tokens in proportion to how often they have appeared.
seed                integer  No        Random seed for deterministic outputs.
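
As an illustrative sketch, the optional sampling controls are simply extra fields on the same request payload. The values below are arbitrary examples chosen for demonstration, not recommended defaults:

import requests

# Sampling values here are hypothetical, for illustration only.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
        {"role": "system", "content": "You are a laconic poet."},
        {"role": "user", "content": "Write a haiku about the desert."}
    ],
    "max_tokens": 128,          # cap the length of the reply
    "temperature": 0.7,         # moderate randomness
    "top_p": 0.9,               # nucleus sampling cutoff
    "repetition_penalty": 1.1,  # mildly discourage repeated tokens
    "seed": 42                  # fixed seed for reproducible output
}

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

response = requests.post(
    'https://api.inferenceapis.com',
    json=payload,
    headers=headers
)
print(response.json())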