llama-3.3-70b-instruct-fp8-fast
Text Generation • Meta
Llama 3.3 70B quantized to fp8 precision, optimized to be faster.
| Model Info | |
|---|---|
| Context Window ↗ | 24,000 tokens | 
| Terms and License | link ↗ | 
| Function calling ↗ | Yes | 
| Batch | Yes | 
| Unit Pricing | $0.29 per M input tokens, $2.25 per M output tokens | 
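
As a rough illustration of the Unit Pricing row, the sketch below estimates the cost of a single request from its token counts. The prices are copied from the table; the function name `estimate_cost_usd` is hypothetical, and actual billing may round or aggregate differently.

```python
from decimal import Decimal

# Prices copied from the Unit Pricing row (USD per million tokens).
INPUT_PRICE_PER_M = Decimal("0.29")
OUTPUT_PRICE_PER_M = Decimal("2.25")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper)."""
    if prompt_tokens < 0 or completion_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(prompt_tokens) / TOKENS_PER_M * INPUT_PRICE_PER_M
    output_cost = Decimal(completion_tokens) / TOKENS_PER_M * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


if __name__ == "__main__":
    # 1,000 input tokens plus 256 output tokens (256 is the default max_tokens).
    print(estimate_cost_usd(1_000, 256))
```

The `prompt_tokens` and `completion_tokens` values can be taken from the `usage` object in the model output, when it is present.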
Playground
Try out this model with the Workers AI LLM Playground. It requires no setup or authentication and is an instant way to preview and test a model directly in the browser.
Launch the LLM Playground
Usage
Worker - Streaming

    export interface Env {
      AI: Ai;
    }

    export default {
      async fetch(request, env): Promise<Response> {
        const messages = [
          { role: "system", content: "You are a friendly assistant" },
          {
            role: "user",
            content: "What is the origin of the phrase Hello, World",
          },
        ];

        const stream = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", {
          messages,
          stream: true,
        });

        return new Response(stream, {
          headers: { "content-type": "text/event-stream" },
        });
      },
    } satisfies ExportedHandler<Env>;

Worker

    export interface Env {
      AI: Ai;
    }

    export default {
      async fetch(request, env): Promise<Response> {
        const messages = [
          { role: "system", content: "You are a friendly assistant" },
          {
            role: "user",
            content: "What is the origin of the phrase Hello, World",
          },
        ];
        const response = await env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", { messages });

        return Response.json(response);
      },
    } satisfies ExportedHandler<Env>;

Python

    import os
    import requests

    ACCOUNT_ID = "your-account-id"
    AUTH_TOKEN = os.environ.get("CLOUDFLARE_AUTH_TOKEN")

    prompt = "Tell me all about PEP-8"
    response = requests.post(
        f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast",
        headers={"Authorization": f"Bearer {AUTH_TOKEN}"},
        json={
            "messages": [
                {"role": "system", "content": "You are a friendly assistant"},
                {"role": "user", "content": prompt},
            ]
        },
    )
    result = response.json()
    print(result)

curl

    curl https://api.cloudflare.com/client/v4/accounts/$CLOUDFLARE_ACCOUNT_ID/ai/run/@cf/meta/llama-3.3-70b-instruct-fp8-fast \
      -X POST \
      -H "Authorization: Bearer $CLOUDFLARE_AUTH_TOKEN" \
      -d '{ "messages": [{ "role": "system", "content": "You are a friendly assistant" }, { "role": "user", "content": "Why is pizza so good" }]}'

Parameters
Fields marked "required" must be provided. The input is one of the numbered options below.
Input
- 0 (object)
  - prompt (string, required, min length 1): The input text prompt for the model to generate a response.
  - lora (string): Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model.
  - response_format (object)
    - type (string)
    - json_schema
  - raw (boolean): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
  - stream (boolean): If true, the response will be streamed back incrementally using SSE, Server Sent Events.
  - max_tokens (integer, default 256): The maximum number of tokens to generate in the response.
  - temperature (number, default 0.6, min 0, max 5): Controls the randomness of the output; higher values produce more random results.
  - top_p (number, min 0.001, max 1): Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
  - top_k (integer, min 1, max 50): Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
  - seed (integer, min 1, max 9999999999): Random seed for reproducibility of the generation.
  - repetition_penalty (number, min 0, max 2): Penalty for repeated tokens; higher values discourage repetition.
  - frequency_penalty (number, min -2, max 2): Decreases the likelihood of the model repeating the same lines verbatim.
  - presence_penalty (number, min -2, max 2): Increases the likelihood of the model introducing new topics.
- 1 (object)
  - messages (array, required): An array of message objects representing the conversation history.
    - items (object)
      - role (string, required): The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool').
      - content (string, required): The content of the message as a string.
  - functions (array)
    - items (object)
      - name (string, required)
      - code (string, required)
  - tools (array): A list of tools available for the assistant to use.
    - items (one of)
      - 0 (object)
        - name (string, required): The name of the tool. More descriptive the better.
        - description (string, required): A brief description of what the tool does.
        - parameters (object, required): Schema defining the parameters accepted by the tool.
          - type (string, required): The type of the parameters object (usually 'object').
          - required (array): List of required parameter names.
            - items (string)
          - properties (object, required): Definitions of each parameter.
            - additionalProperties (object)
              - type (string, required): The data type of the parameter.
              - description (string, required): A description of the expected parameter.
      - 1 (object)
        - type (string, required): Specifies the type of tool (e.g., 'function').
        - function (object, required): Details of the function tool.
          - name (string, required): The name of the function.
          - description (string, required): A brief description of what the function does.
          - parameters (object, required): Schema defining the parameters accepted by the function.
            - type (string, required): The type of the parameters object (usually 'object').
            - required (array): List of required parameter names.
              - items (string)
            - properties (object, required): Definitions of each parameter.
              - additionalProperties (object)
                - type (string, required): The data type of the parameter.
                - description (string, required): A description of the expected parameter.
  - response_format (object)
    - type (string)
    - json_schema
  - raw (boolean): If true, a chat template is not applied and you must adhere to the specific model's expected formatting.
  - stream (boolean): If true, the response will be streamed back incrementally using SSE, Server Sent Events.
  - max_tokens (integer, default 256): The maximum number of tokens to generate in the response.
  - temperature (number, default 0.6, min 0, max 5): Controls the randomness of the output; higher values produce more random results.
  - top_p (number, min 0.001, max 1): Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
  - top_k (integer, min 1, max 50): Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises.
  - seed (integer, min 1, max 9999999999): Random seed for reproducibility of the generation.
  - repetition_penalty (number, min 0, max 2): Penalty for repeated tokens; higher values discourage repetition.
  - frequency_penalty (number, min -2, max 2): Decreases the likelihood of the model repeating the same lines verbatim.
  - presence_penalty (number, min -2, max 2): Increases the likelihood of the model introducing new topics.
- 2 (object)
  - requests (array)
    - items (object)
      - external_reference (string): User-supplied reference. This field will be present in the response as well it can be used to reference the request and response. It's NOT validated to be unique.
      - prompt (string, min length 1): Prompt for the text generation model
      - stream (boolean): If true, the response will be streamed back incrementally using SSE, Server Sent Events.
      - max_tokens (integer, default 256): The maximum number of tokens to generate in the response.
      - temperature (number, default 0.6, min 0, max 5): Controls the randomness of the output; higher values produce more random results.
      - top_p (number, min 0, max 2): Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses.
      - seed (integer, min 1, max 9999999999): Random seed for reproducibility of the generation.
      - repetition_penalty (number, min 0, max 2): Penalty for repeated tokens; higher values discourage repetition.
      - frequency_penalty (number, min 0, max 2): Decreases the likelihood of the model repeating the same lines verbatim.
      - presence_penalty (number, min 0, max 2): Increases the likelihood of the model introducing new topics.
      - response_format (object)
        - type (string)
        - json_schema
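
To make the messages-style input more concrete, here is a minimal sketch that assembles a request body with one tool and checks sampling options against the documented ranges before sending. The bounds are copied from the messages option above; `build_messages_request`, `SAMPLING_BOUNDS`, and the `get_weather` tool are hypothetical names introduced for illustration, not part of the Workers AI API.

```python
from typing import Any, Optional

# (min, max) bounds copied from the messages-style input parameters.
SAMPLING_BOUNDS = {
    "temperature": (0, 5),
    "top_p": (0.001, 1),
    "top_k": (1, 50),
    "seed": (1, 9999999999),
    "repetition_penalty": (0, 2),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
}

# Hypothetical tool, written in the second documented form (type + function).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "required": ["city"],
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. Lisbon."},
            },
        },
    },
}


def build_messages_request(
    messages: list,
    tools: Optional[list] = None,
    **options: Any,
) -> dict:
    """Return a JSON-serializable request body, or raise ValueError."""
    for message in messages:
        # The schema marks both role and content as required.
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs 'role' and 'content'")
    for name, value in options.items():
        bounds = SAMPLING_BOUNDS.get(name)
        # Options without documented bounds (e.g. max_tokens) pass through.
        if bounds is not None and not bounds[0] <= value <= bounds[1]:
            raise ValueError(f"{name}={value} is outside {bounds}")
    body = {"messages": list(messages), **options}
    if tools:
        body["tools"] = list(tools)
    return body


if __name__ == "__main__":
    body = build_messages_request(
        [{"role": "user", "content": "What is the weather in Lisbon?"}],
        tools=[WEATHER_TOOL],
        temperature=0.2,
        max_tokens=128,
    )
    print(sorted(body))
```

The resulting dictionary could be passed as the `json=` argument in the Python example from the Usage section. Checking ranges client-side is a design choice that surfaces mistakes before a network round trip; the service remains the authority on what it accepts.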
Output
- 0 (object)
  - response (string, required): The generated text response from the model
  - usage (object): Usage statistics for the inference request
    - prompt_tokens (number, default 0): Total number of tokens in input
    - completion_tokens (number, default 0): Total number of tokens in output
    - total_tokens (number, default 0): Total number of input and output tokens
  - tool_calls (array): An array of tool calls requests made during the response generation
    - items (object)
      - arguments (object): The arguments passed to be passed to the tool call request
      - name (string): The name of the tool to be called
- 1 (string)
- 2 (object)
  - request_id (string): The async request id that can be used to obtain the results.
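
The sketch below shows one way to consume two of the output shapes listed above: the streamed string and the object carrying `tool_calls`. It assumes each streamed event is a `data:` line holding a JSON object with a `response` field, and that the stream ends with `data: [DONE]`; neither detail is stated on this page, so treat both as assumptions. The helper names and the `get_weather` handler are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List


def iter_sse_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream marker
            break
        fragment = json.loads(data).get("response")  # assumed field name
        if fragment:
            yield fragment


def run_tool_calls(
    result: Dict[str, Any],
    registry: Dict[str, Callable[..., Any]],
) -> List[Any]:
    """Call a local handler for each entry in result['tool_calls']."""
    outputs = []
    for call in result.get("tool_calls") or []:
        name = call.get("name")
        handler = registry.get(name)
        if handler is None:
            raise KeyError(f"no handler registered for tool {name!r}")
        outputs.append(handler(**(call.get("arguments") or {})))
    return outputs


if __name__ == "__main__":
    events = ['data: {"response": "Hel"}', "", 'data: {"response": "lo"}', "", "data: [DONE]"]
    print("".join(iter_sse_text(events)))

    result = {
        "response": "",
        "tool_calls": [{"name": "get_weather", "arguments": {"city": "Lisbon"}}],
    }
    print(run_tool_calls(result, {"get_weather": lambda city: f"sunny in {city}"}))
```

With the `requests` library, `iter_sse_text(response.iter_lines(decode_unicode=True))` is one way to feed the parser when the request was sent with `stream=True`. Tool arguments come from the model, so validating them before calling a handler is advisable. The third output shape only carries a `request_id`, which can be read directly.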
API Schemas
The following schemas are based on JSON Schema
Input

    {
      "type": "object",
      "oneOf": [
        {
          "title": "Meta_Llama_3_3_70B_Instruct_Fp8_Fast_Prompt",
          "properties": {
            "prompt": { "type": "string", "minLength": 1, "description": "The input text prompt for the model to generate a response." },
            "lora": { "type": "string", "description": "Name of the LoRA (Low-Rank Adaptation) model to fine-tune the base model." },
            "response_format": {
              "title": "JSON Mode",
              "type": "object",
              "properties": {
                "type": { "type": "string", "enum": ["json_object", "json_schema"] },
                "json_schema": {}
              }
            },
            "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." },
            "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." },
            "max_tokens": { "type": "integer", "default": 256, "description": "The maximum number of tokens to generate in the response." },
            "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." },
            "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." },
            "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." },
            "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." },
            "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." },
            "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." },
            "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." }
          },
          "required": ["prompt"]
        },
        {
          "title": "Meta_Llama_3_3_70B_Instruct_Fp8_Fast_Messages",
          "properties": {
            "messages": {
              "type": "array",
              "description": "An array of message objects representing the conversation history.",
              "items": {
                "type": "object",
                "properties": {
                  "role": { "type": "string", "description": "The role of the message sender (e.g., 'user', 'assistant', 'system', 'tool')." },
                  "content": { "type": "string", "description": "The content of the message as a string." }
                },
                "required": ["role", "content"]
              }
            },
            "functions": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "name": { "type": "string" },
                  "code": { "type": "string" }
                },
                "required": ["name", "code"]
              }
            },
            "tools": {
              "type": "array",
              "description": "A list of tools available for the assistant to use.",
              "items": {
                "type": "object",
                "oneOf": [
                  {
                    "properties": {
                      "name": { "type": "string", "description": "The name of the tool. More descriptive the better." },
                      "description": { "type": "string", "description": "A brief description of what the tool does." },
                      "parameters": {
                        "type": "object",
                        "description": "Schema defining the parameters accepted by the tool.",
                        "properties": {
                          "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." },
                          "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } },
                          "properties": {
                            "type": "object",
                            "description": "Definitions of each parameter.",
                            "additionalProperties": {
                              "type": "object",
                              "properties": {
                                "type": { "type": "string", "description": "The data type of the parameter." },
                                "description": { "type": "string", "description": "A description of the expected parameter." }
                              },
                              "required": ["type", "description"]
                            }
                          }
                        },
                        "required": ["type", "properties"]
                      }
                    },
                    "required": ["name", "description", "parameters"]
                  },
                  {
                    "properties": {
                      "type": { "type": "string", "description": "Specifies the type of tool (e.g., 'function')." },
                      "function": {
                        "type": "object",
                        "description": "Details of the function tool.",
                        "properties": {
                          "name": { "type": "string", "description": "The name of the function." },
                          "description": { "type": "string", "description": "A brief description of what the function does." },
                          "parameters": {
                            "type": "object",
                            "description": "Schema defining the parameters accepted by the function.",
                            "properties": {
                              "type": { "type": "string", "description": "The type of the parameters object (usually 'object')." },
                              "required": { "type": "array", "description": "List of required parameter names.", "items": { "type": "string" } },
                              "properties": {
                                "type": "object",
                                "description": "Definitions of each parameter.",
                                "additionalProperties": {
                                  "type": "object",
                                  "properties": {
                                    "type": { "type": "string", "description": "The data type of the parameter." },
                                    "description": { "type": "string", "description": "A description of the expected parameter." }
                                  },
                                  "required": ["type", "description"]
                                }
                              }
                            },
                            "required": ["type", "properties"]
                          }
                        },
                        "required": ["name", "description", "parameters"]
                      }
                    },
                    "required": ["type", "function"]
                  }
                ]
              }
            },
            "response_format": {
              "title": "JSON Mode",
              "type": "object",
              "properties": {
                "type": { "type": "string", "enum": ["json_object", "json_schema"] },
                "json_schema": {}
              }
            },
            "raw": { "type": "boolean", "default": false, "description": "If true, a chat template is not applied and you must adhere to the specific model's expected formatting." },
            "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." },
            "max_tokens": { "type": "integer", "default": 256, "description": "The maximum number of tokens to generate in the response." },
            "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." },
            "top_p": { "type": "number", "minimum": 0.001, "maximum": 1, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." },
            "top_k": { "type": "integer", "minimum": 1, "maximum": 50, "description": "Limits the AI to choose from the top 'k' most probable words. Lower values make responses more focused; higher values introduce more variety and potential surprises." },
            "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." },
            "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." },
            "frequency_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." },
            "presence_penalty": { "type": "number", "minimum": -2, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." }
          },
          "required": ["messages"]
        },
        {
          "title": "Async Batch",
          "type": "object",
          "properties": {
            "requests": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "external_reference": { "type": "string", "description": "User-supplied reference. This field will be present in the response as well it can be used to reference the request and response. It's NOT validated to be unique." },
                  "prompt": { "type": "string", "minLength": 1, "description": "Prompt for the text generation model" },
                  "stream": { "type": "boolean", "default": false, "description": "If true, the response will be streamed back incrementally using SSE, Server Sent Events." },
                  "max_tokens": { "type": "integer", "default": 256, "description": "The maximum number of tokens to generate in the response." },
                  "temperature": { "type": "number", "default": 0.6, "minimum": 0, "maximum": 5, "description": "Controls the randomness of the output; higher values produce more random results." },
                  "top_p": { "type": "number", "minimum": 0, "maximum": 2, "description": "Adjusts the creativity of the AI's responses by controlling how many possible words it considers. Lower values make outputs more predictable; higher values allow for more varied and creative responses." },
                  "seed": { "type": "integer", "minimum": 1, "maximum": 9999999999, "description": "Random seed for reproducibility of the generation." },
                  "repetition_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Penalty for repeated tokens; higher values discourage repetition." },
                  "frequency_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Decreases the likelihood of the model repeating the same lines verbatim." },
                  "presence_penalty": { "type": "number", "minimum": 0, "maximum": 2, "description": "Increases the likelihood of the model introducing new topics." },
                  "response_format": {
                    "title": "JSON Mode",
                    "type": "object",
                    "properties": {
                      "type": { "type": "string", "enum": ["json_object", "json_schema"] },
                      "json_schema": {}
                    }
                  }
                }
              }
            }
          }
        }
      ]
    }

Output

    {
      "oneOf": [
        {
          "type": "object",
          "contentType": "application/json",
          "properties": {
            "response": { "type": "string", "description": "The generated text response from the model" },
            "usage": {
              "type": "object",
              "description": "Usage statistics for the inference request",
              "properties": {
                "prompt_tokens": { "type": "number", "description": "Total number of tokens in input", "default": 0 },
                "completion_tokens": { "type": "number", "description": "Total number of tokens in output", "default": 0 },
                "total_tokens": { "type": "number", "description": "Total number of input and output tokens", "default": 0 }
              }
            },
            "tool_calls": {
              "type": "array",
              "description": "An array of tool calls requests made during the response generation",
              "items": {
                "type": "object",
                "properties": {
                  "arguments": { "type": "object", "description": "The arguments passed to be passed to the tool call request" },
                  "name": { "type": "string", "description": "The name of the tool to be called" }
                }
              }
            }
          },
          "required": ["response"]
        },
        { "type": "string", "contentType": "text/event-stream", "format": "binary" },
        {
          "type": "object",
          "contentType": "application/json",
          "title": "Async response",
          "properties": {
            "request_id": { "type": "string", "description": "The async request id that can be used to obtain the results." }
          }
        }
      ]
    }
- © 2025 Cloudflare, Inc.