Llama 3.3 70B Instruct API

The Llama API wraps the Llama 3.3 70B Instruct model and lets you submit structured chat prompts to receive conversational responses. It is well suited to natural dialogue, character-based prompting, and general-purpose text generation.

Base URL: https://api.inferenceapis.com

Endpoints

Chat Completion

POST /

Submit a conversation and get a response from the model.

Examples


import requests

# Conversation payload: a system message to set the persona, then the user turn.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
        {"role": "system", "content": "Act like you're a cowboy."},
        {"role": "user", "content": "What did you do today?"}
    ],
}

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

response = requests.post(
    "https://api.inferenceapis.com",
    json=payload,
    headers=headers,
    timeout=30  # avoid hanging indefinitely on a slow connection
)
response.raise_for_status()  # surface HTTP errors early

print(response.json())
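On success the endpoint returns JSON. Assuming an OpenAI-compatible response schema (a choices array whose entries carry a message object — verify this against the response your deployment actually returns), the assistant's reply can be pulled out with a small helper:

```python
def extract_reply(response_json):
    """Extract the assistant's text from a chat completion response.

    Assumes an OpenAI-compatible shape (choices[0].message.content);
    check the actual response structure returned by the API.
    """
    return response_json["choices"][0]["message"]["content"]

# Demonstrated with a mock response of the assumed shape:
mock = {
    "choices": [
        {"message": {"role": "assistant", "content": "Howdy, partner!"}}
    ]
}
print(extract_reply(mock))
```

In a real script you would call extract_reply(response.json()) after the POST above.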

Parameters

Parameter           Type     Required  Description
model               string   Yes       Must be meta-llama/Llama-3.3-70B-Instruct-Turbo.
messages            array    Yes       List of objects with role (system, user, or assistant) and content.
max_tokens          integer  No        Maximum number of tokens to generate (default is model-defined).
temperature         float    No        Controls randomness; higher values produce more varied output.
top_p               float    No        Nucleus sampling: restricts sampling to tokens within this cumulative probability mass.
min_p               float    No        Minimum probability threshold, relative to the most likely token, for a token to be sampled.
top_k               integer  No        Limits sampling to the k most likely tokens.
repetition_penalty  float    No        Values above 1.0 penalize tokens that have already appeared, reducing repetition.
presence_penalty    float    No        Positive values penalize tokens that have appeared at all, encouraging new topics.
frequency_penalty   float    No        Positive values penalize tokens in proportion to how often they have appeared.
seed                integer  No        Random seed for reproducible outputs.
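The optional sampling parameters above sit at the top level of the same request payload, alongside model and messages. A minimal sketch of assembling such a payload (the helper name and the specific parameter values are illustrative, not recommendations):

```python
def build_payload(user_prompt,
                  system_prompt="You are a helpful assistant.",
                  **sampling):
    """Build a chat-completion payload for the Llama 3.3 70B endpoint.

    Extra keyword arguments become optional sampling parameters
    (temperature, top_p, max_tokens, seed, ...).
    """
    payload = {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }
    payload.update(sampling)  # optional parameters merge in at the top level
    return payload

payload = build_payload(
    "What did you do today?",
    temperature=0.7,   # moderate randomness
    top_p=0.9,         # nucleus sampling cutoff
    max_tokens=256,    # cap the response length
    seed=42,           # reproducible output across runs
)
```

The resulting dict can be passed directly as the json argument of requests.post in the example above.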