Native support is here! OpenAI introduces JSON Structured Outputs in its API

Date: 2024-08-07

OpenAI has introduced Structured Outputs in its API, which means model output can now reliably adhere to a JSON Schema supplied by the developer.

In OpenAI's evaluation of complex JSON schema following, the new model gpt-4o-2024-08-06 with Structured Outputs scores 100%. By comparison, gpt-4-0613 scores less than 40%.

The feature comes in two forms:

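Function calling: Structured Outputs for tools is enabled by setting strict: true in the function definition. This works with all models that support tools, including gpt-4-0613, gpt-3.5-turbo-0613, and all later models. When Structured Outputs is enabled, model output will match the supplied tool definition.
New option for the response_format parameter: developers can now supply a JSON Schema through json_schema, a new option for response_format. This works with the newest GPT-4o models: gpt-4o-2024-08-06 and gpt-4o-mini-2024-07-18. When response_format is supplied with strict: true, model output will match the supplied schema.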

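In short, function calling constrains the arguments the model generates to the tool definition and is available on every model that supports tools, while the response_format parameter constrains the model's reply itself to a developer-supplied JSON Schema and is available on the newest GPT-4o models. Structured Outputs also abides by OpenAI's existing safety policies, so the model can still refuse an unsafe request. A new refusal string value on the API response lets developers detect such a refusal programmatically.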
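The refusal check described above could be handled roughly as follows. This is a minimal sketch that works on a Chat Completions response already decoded into a Python dict; the helper name parse_structured_message, the ModelRefusal exception, and both sample payloads are made up for illustration and are not part of any SDK.

```python
import json
from typing import Any


class ModelRefusal(Exception):
    """Raised when the model declines the request instead of returning JSON."""


def parse_structured_message(response: dict[str, Any]) -> dict[str, Any]:
    """Return the parsed JSON payload of the first choice, or raise on refusal."""
    message = response["choices"][0]["message"]
    refusal = message.get("refusal")
    if refusal:  # a non-empty string means the model refused
        raise ModelRefusal(refusal)
    return json.loads(message["content"])


# Hand-written sample payloads (assumptions), not real API output.
ok_response = {
    "choices": [{"message": {
        "role": "assistant",
        "content": '{"final_answer": "x = -29 / 8"}',
        "refusal": None,
    }}]
}
refused_response = {
    "choices": [{"message": {
        "role": "assistant",
        "content": None,
        "refusal": "I'm sorry, I can't help with that.",
    }}]
}

print(parse_structured_message(ok_response))
try:
    parse_structured_message(refused_response)
except ModelRefusal as exc:
    print("refused:", exc)
```

Checking refusal before parsing matters because a refusal is not required to match the schema, so feeding it straight into a JSON parser would fail in a less informative way.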

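OpenAI has also added native Structured Outputs support to its Python and Node SDKs, which simplifies development. One typical use is extracting structured data from unstructured input, such as to-do items and deadlines from meeting notes. Under the hood, OpenAI uses constrained decoding based on a context-free grammar (CFG) rather than the finite state machines (FSMs) or regular expressions used by other approaches, because a CFG can express more complex nested or recursive data structures. The official blog post explains the details: https://openai.com/index/introducing-structured-outputs-in-the-api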
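To make the idea of constrained decoding concrete, here is a toy sketch, not OpenAI's implementation. It uses a tiny grammar of balanced brackets in place of JSON, and the token scores are invented. The point is that tracking arbitrarily deep nesting needs a stack, which is what a context-free grammar provides and a fixed set of FSM states does not.

```python
OPEN_TO_CLOSE = {"[": "]", "{": "}"}


def allowed_next(prefix: str) -> set[str]:
    """Return the tokens that keep `prefix` a valid start of a balanced string."""
    stack: list[str] = []
    for ch in prefix:
        if ch in OPEN_TO_CLOSE:
            stack.append(OPEN_TO_CLOSE[ch])  # remember which closer we owe
        elif stack and ch == stack[-1]:
            stack.pop()
        else:
            raise ValueError(f"invalid prefix: {prefix!r}")
    allowed = set(OPEN_TO_CLOSE)  # opening a new level is always legal
    if stack:
        allowed.add(stack[-1])  # only the matching closer may follow
    else:
        allowed.add("<end>")  # may stop only when everything is closed
    return allowed


def pick_token(prefix: str, scores: dict[str, float]) -> str:
    """Mask out illegal tokens, then take the best-scoring legal one."""
    legal = allowed_next(prefix)
    return max((tok for tok in scores if tok in legal), key=scores.__getitem__)


# Made-up scores: the "model" prefers "}" but only "]" closes the open "[".
scores = {"}": 0.9, "]": 0.4, "<end>": 0.3}
print(pick_token("{[", scores))  # prints "]"
```

A real system would apply the same masking step to the model's full token vocabulary at every sampling step, with the grammar derived from the supplied JSON Schema.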

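Structured Outputs is now generally available in the API. The function-calling form is supported on all models that support function calling, including the GPT-4o and GPT-4o-mini series and later models, while the response_format form is limited to the newer GPT-4o models listed above. The feature is also compatible with vision inputs and can be used with the Chat Completions API, the Assistants API, and the Batch API. Structured Outputs helps developers build more reliable AI applications. According to OpenAI's announcement, switching to gpt-4o-2024-08-06 also saves 50% on input tokens and 33% on output tokens compared with gpt-4o-2024-05-13.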

Here is a quick look at two examples:

1. Function calling:

POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant. The current date is August 6, 2024. You help users query for the data they are looking for by calling the query function."
    },
    {
      "role": "user",
      "content": "look up all my orders in may of last year that were fulfilled but not delivered on time"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "query",
        "description": "Execute a query.",
        "strict": true,
        "parameters": {
          "type": "object",
          "properties": {
            "table_name": {
              "type": "string",
              "enum": ["orders"]
            },
            "columns": {
              "type": "array",
              "items": {
                "type": "string",
                "enum": [
                  "id",
                  "status",
                  "expected_delivery_date",
                  "delivered_at",
                  "shipped_at",
                  "ordered_at",
                  "canceled_at"
                ]
              }
            },
            "conditions": {
              "type": "array",
              "items": {
                "type": "object",
                "properties": {
                  "column": {
                    "type": "string"
                  },
                  "operator": {
                    "type": "string",
                    "enum": ["=", ">", "<", ">=", "<=", "!="]
                  },
                  "value": {
                    "anyOf": [
                      {
                        "type": "string"
                      },
                      {
                        "type": "number"
                      },
                      {
                        "type": "object",
                        "properties": {
                          "column_name": {
                            "type": "string"
                          }
                        },
                        "required": ["column_name"],
                        "additionalProperties": false
                      }
                    ]
                  }
                },
                "required": ["column", "operator", "value"],
                "additionalProperties": false
              }
            },
            "order_by": {
              "type": "string",
              "enum": ["asc", "desc"]
            }
          },
          "required": ["table_name", "columns", "conditions", "order_by"],
          "additionalProperties": false
        }
      }
    }
  ]
}

Structured output:

{
  "table_name": "orders",
  "columns": ["id", "status", "expected_delivery_date", "delivered_at"],
  "conditions": [
    {
      "column": "status",
      "operator": "=",
      "value": "fulfilled"
    },
    {
      "column": "ordered_at",
      "operator": ">=",
      "value": "2023-05-01"
    },
    {
      "column": "ordered_at",
      "operator": "<",
      "value": "2023-06-01"
    },
    {
      "column": "delivered_at",
      "operator": ">",
      "value": {
        "column_name": "expected_delivery_date"
      }
    }
  ],
  "order_by": "asc"
}
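One way such arguments might be consumed is sketched below. The build_query helper is hypothetical and not part of the OpenAI example. Note that in the schema above the condition's column is a free-form string, so the sketch re-checks it against an allowlist and binds literal values as parameters; a matching schema guarantees the shape of the output, not that every value is safe. Since order_by carries only a direction and no column, the sketch leaves sorting out.

```python
import json

ALLOWED_COLUMNS = {
    "id", "status", "expected_delivery_date", "delivered_at",
    "shipped_at", "ordered_at", "canceled_at",
}
ALLOWED_OPERATORS = {"=", ">", "<", ">=", "<=", "!="}


def build_query(arguments_json: str) -> tuple[str, list]:
    """Turn the tool-call arguments into SQL text plus bound parameters."""
    args = json.loads(arguments_json)
    if args["table_name"] != "orders":
        raise ValueError("unexpected table")
    if not args["columns"] or not set(args["columns"]) <= ALLOWED_COLUMNS:
        raise ValueError("unsupported column list")
    clauses, params = [], []
    for cond in args["conditions"]:
        # "column" is a free-form string in the schema, so check it here.
        if cond["column"] not in ALLOWED_COLUMNS or cond["operator"] not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported condition: {cond}")
        value = cond["value"]
        if isinstance(value, dict):  # comparison against another column
            if value["column_name"] not in ALLOWED_COLUMNS:
                raise ValueError(f"unsupported column: {value}")
            clauses.append(f'{cond["column"]} {cond["operator"]} {value["column_name"]}')
        else:  # literal values are bound, never interpolated
            clauses.append(f'{cond["column"]} {cond["operator"]} ?')
            params.append(value)
    sql = f'SELECT {", ".join(args["columns"])} FROM orders'
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


# A shortened version of the arguments shown above.
arguments = json.dumps({
    "table_name": "orders",
    "columns": ["id", "status"],
    "conditions": [
        {"column": "status", "operator": "=", "value": "fulfilled"},
        {"column": "delivered_at", "operator": ">",
         "value": {"column_name": "expected_delivery_date"}},
    ],
    "order_by": "asc",
})
sql, params = build_query(arguments)
print(sql)
print(params)
```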

2. The response_format parameter:

POST /v1/chat/completions
{
  "model": "gpt-4o-2024-08-06",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful math tutor."
    },
    {
      "role": "user",
      "content": "solve 8x + 31 = 2"
    }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "math_response",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "steps": {
            "type": "array",
            "items": {
              "type": "object",
              "properties": {
                "explanation": {
                  "type": "string"
                },
                "output": {
                  "type": "string"
                }
              },
              "required": ["explanation", "output"],
              "additionalProperties": false
            }
          },
          "final_answer": {
            "type": "string"
          }
        },
        "required": ["steps", "final_answer"],
        "additionalProperties": false
      }
    }
  }
}

Structured output:

{
  "steps": [
    {
      "explanation": "Subtract 31 from both sides to isolate the term with x.",
      "output": "8x + 31 - 31 = 2 - 31"
    },
    {
      "explanation": "This simplifies to 8x = -29.",
      "output": "8x = -29"
    },
    {
      "explanation": "Divide both sides by 8 to solve for x.",
      "output": "x = -29 / 8"
    }
  ],
  "final_answer": "x = -29 / 8"
}
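Both request bodies above share two traits: every object lists all of its properties under required and sets additionalProperties to false. To my understanding strict mode expects both, so a small pre-flight check can catch a schema that would be rejected. The sketch below covers only these two rules, and the function name strict_mode_problems is made up for illustration; OpenAI's documentation lists the full set of restrictions.

```python
def strict_mode_problems(schema: dict, path: str = "$") -> list[str]:
    """Collect places where a schema breaks the two rules checked here."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not in required: {missing}")
        for name, sub in props.items():
            problems += strict_mode_problems(sub, f"{path}.{name}")
    if isinstance(schema.get("items"), dict):  # array element schema
        problems += strict_mode_problems(schema["items"], f"{path}[]")
    for i, sub in enumerate(schema.get("anyOf", [])):
        problems += strict_mode_problems(sub, f"{path}.anyOf[{i}]")
    return problems


# The math_response schema from the request above, as a Python dict.
math_schema = {
    "type": "object",
    "properties": {
        "steps": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "explanation": {"type": "string"},
                    "output": {"type": "string"},
                },
                "required": ["explanation", "output"],
                "additionalProperties": False,
            },
        },
        "final_answer": {"type": "string"},
    },
    "required": ["steps", "final_answer"],
    "additionalProperties": False,
}

# A deliberately incomplete schema, for contrast.
loose_schema = {"type": "object", "properties": {"a": {"type": "string"}}}

print(strict_mode_problems(math_schema))
print(strict_mode_problems(loose_schema))
```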

Finally, let's look at some of the structured-output frameworks currently on the market: