Llama 3.3 70B Instruct API

The Llama API wraps the Llama 3.3 70B Instruct model and lets you submit structured chat prompts to receive conversational responses. It is well suited to natural dialogue, character-based prompting, and general-purpose text generation.

Base URL: https://api.inferenceapis.com

Endpoints

Chat Completion

POST /

Submit a conversation and get a response from the model.

Examples


import requests

# Conversation payload: a system message to set the persona, then the user turn.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
        {"role": "system", "content": "Act like you're a cowboy."},
        {"role": "user", "content": "What did you do today?"}
    ],
}

headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

response = requests.post(
    "https://api.inferenceapis.com",
    json=payload,
    headers=headers,
    timeout=30  # avoid hanging indefinitely on a slow connection
)
response.raise_for_status()  # surface HTTP errors early

print(response.json())
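On success the endpoint returns JSON. Assuming an OpenAI-compatible response schema (a choices array whose entries carry a message object — verify this against the response your deployment actually returns), the assistant's reply can be pulled out with a small helper:

```python
def extract_reply(response_json):
    """Extract the assistant's text from a chat completion response.

    Assumes an OpenAI-compatible shape (choices[0].message.content);
    check the actual response structure returned by the API.
    """
    return response_json["choices"][0]["message"]["content"]

# Demonstrated with a mock response of the assumed shape:
mock = {
    "choices": [
        {"message": {"role": "assistant", "content": "Howdy, partner!"}}
    ]
}
print(extract_reply(mock))
```

In a real script you would call extract_reply(response.json()) after the POST above.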

Parameters

Parameter           Type     Required  Description
model               string   Yes       Must be meta-llama/Llama-3.3-70B-Instruct-Turbo.
messages            array    Yes       List of objects with role (system, user, or assistant) and content.
max_tokens          integer  No        Maximum number of tokens to generate (default is model-defined).
temperature         float    No        Controls randomness; higher values produce more varied output.
top_p               float    No        Nucleus sampling: restricts sampling to tokens within this cumulative probability mass.
min_p               float    No        Minimum probability threshold, relative to the most likely token, for a token to be sampled.
top_k               integer  No        Limits sampling to the k most likely tokens.
repetition_penalty  float    No        Values above 1.0 penalize tokens that have already appeared, reducing repetition.
presence_penalty    float    No        Positive values penalize tokens that have appeared at all, encouraging new topics.
frequency_penalty   float    No        Positive values penalize tokens in proportion to how often they have appeared.
seed                integer  No        Random seed for reproducible outputs.
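The optional sampling parameters above sit at the top level of the same request payload, alongside model and messages. A minimal sketch of assembling such a payload (the helper name and the specific parameter values are illustrative, not recommendations):

```python
def build_payload(user_prompt,
                  system_prompt="You are a helpful assistant.",
                  **sampling):
    """Build a chat-completion payload for the Llama 3.3 70B endpoint.

    Extra keyword arguments become optional sampling parameters
    (temperature, top_p, max_tokens, seed, ...).
    """
    payload = {
        "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }
    payload.update(sampling)  # optional parameters merge in at the top level
    return payload

payload = build_payload(
    "What did you do today?",
    temperature=0.7,   # moderate randomness
    top_p=0.9,         # nucleus sampling cutoff
    max_tokens=256,    # cap the response length
    seed=42,           # reproducible output across runs
)
```

The resulting dict can be passed directly as the json argument of requests.post in the example above.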