
Domestic Compute as the Foundation, APIs as the Blade: The Era of Building-Block Innovation in Large-Model Applications Has Arrived

Date: 2025-08-13

Let APIs become the building blocks of innovation: when a large language model orchestrates a video-generation engine, and text-to-image flows seamlessly into speech synthesis, the creative composition of multimodal APIs is giving rise to truly AI-native applications, while the joint breakthrough of domestic compute and domestic models lays a solid foundation for this transformation.

Only atop well-crafted building blocks can a towering edifice stand. This article deconstructs the stack layer by layer, from basic API calls to agent architecture design, to help you quickly build an AI application backend. Even without large-model development experience, just a few lines of code are enough to implement your own upper-layer application, putting innovation within easy reach.

We will demonstrate the API calling paradigm step by step; its convenience rivals calling NVIDIA compute resources. Beyond offering technical solutions and compute support, we also look forward to large-model application developers using APIs as bricks to jointly build the intelligent dome of the next generation of agents on the foundation of domestic compute.

AI Writing: Atomized Reconstruction of Literary Creation

AI applications are as vast as an ocean, but their core capabilities can be distilled into seven paradigms: LLM, VLM, T2I, T2V, RAG, search/advertising/recommendation, and audio processing. This article focuses on OpenAI-compatible interfaces, building an AI application engine with minimal code.

AI writing is currently the most common and most fundamental tool. With the API-based calling flow, just a few lines of code are enough to stand up the backend of an LLM application, invoked as follows:

import json
import logging

from openai import OpenAI

logger = logging.getLogger(__name__)


def test_simple_story_generation(model_name: str):
    """Test a simple story generation request"""
    try:
        # OpenAI-compatible client pointed at the domestic platform
        client = OpenAI(
            base_url="https://ai.gitee.com/v1",
            api_key="XXXXXXXXXXXXXXXXXXXXXXXXXXX",
        )

        response = client.chat.completions.create(
            messages=[
                {
                    "role": "system",
                    "content": "You are a professional literary writer engaged in professional writing, excelling in areas such as prose, poetry, fiction and music criticism. You should think step-by-step."
                },
                {
                    "role": "user",
                    "content": "Write a very short report about AI history."
                }
            ],
            model=model_name,
            stream=False,
        )

        # Serialize the pydantic response object before logging it
        data = response.model_dump()
        logger.info("Response: {}".format(json.dumps(data, ensure_ascii=False, indent=4)))

    except Exception as e:
        print(f"❌ Error testing story generation: {e}")

In this OpenAI API call, the model identifier lets you choose the large model used for text generation.

messages holds the conversation so far. role specifies who each message comes from; the system role supplies the instructions or context the model should follow, such as tone, style, and other behavioral guidance, while content carries the body of the message.

stream controls whether streaming output is enabled (it is disabled by default, and the example above sets it to False explicitly). When enabled, the model's reply is pushed to the client incrementally, token by token or chunk by chunk, instead of being delivered in one piece after the full reply has been generated.

With the code above you can build an AI writing backend: set the base URL and the required API key, select an appropriate supported model, and the model's drafted prose appears in the terminal.
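To complement the stream parameter described above, here is a minimal sketch of consuming a streaming response incrementally; it assumes a client constructed as in the example and a model identifier of your choice:

def stream_story(model_name: str):
    # assumes `client` was constructed as in the example above
    stream = client.chat.completions.create(
        model=model_name,
        messages=[{"role": "user", "content": "Write a four-line poem about autumn."}],
        stream=True,
    )
    # print the reply chunk by chunk as it arrives
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)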

For more advanced deployments, there are many excellent open-source frameworks such as WriteHERO; replacing their LLM backend API URL is enough to migrate seamlessly to a domestic compute platform, although this does take some coding and debugging experience.

AI Painting: A Dream-Visualization Engine

AI painting tools are practical and fun; in effect they give anyone the ability to share what they saw in their dreams. Simply describe the dream scene to the model and you get back an image of it, in a customizable style such as surrealism, oil painting, watercolor, anime, photorealism, or cyberpunk.

import base64
import os
import uuid

from openai import OpenAI

# OpenAI-compatible client pointed at the domestic platform
client = OpenAI(
    base_url="https://ai.gitee.com/v1",
    api_key="XXXXXXXXXXXXXXXXXXXXXXXXXXX",
)


def generate_dream_image(model_name: str, description: str, style: str = "surreal", resolution: str = "1024x1024"):
    os.makedirs("dream_images", exist_ok=True)

    try:
        # Enrich the user's dream description with style hints
        enhanced_prompt = (
            f"Surrealist Dream Illustration, {style} style, "
            f"Describe the content: {description}, "
            "High detail, dreamy colors, strong contrast of light and shadow, high resolution."
        )

        response = client.images.generate(
            model=model_name,
            prompt=enhanced_prompt,
            size=resolution,
        )

        # The image comes back base64-encoded; decode and save it to disk
        image_base64 = response.data[0].b64_json
        image_bytes = base64.b64decode(image_base64)

        filename = f"dream_{uuid.uuid4().hex}.png"
        image_path = f"dream_images/{filename}"
        with open(image_path, "wb") as f:
            f.write(image_bytes)

    except Exception as e:
        print(f"\n❌ Image generation failed: {str(e)}")

In this OpenAI API call, the model identifier lets you choose the large model used for image generation.

prompt is the text prompt: the description used to generate the image.

size is the output image size in width x height format; multiple levels from 512x512 up to 1024x1024 are supported.

With the code above you can build the backend of an AI dream generator: set the base URL and the required API key, select an appropriate supported model, and dream images appear in the dream_images folder, unlocking creative freedom.
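For illustration, a minimal invocation might look like this (the model identifier is a hypothetical placeholder; substitute any text-to-image model the platform supports):

generate_dream_image(
    model_name="your-t2i-model",  # hypothetical placeholder
    description="walking across a glass bridge above a sea of glowing clouds",
    style="cyberpunk",
    resolution="1024x1024",
)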

RAG Engine: A Knowledge Furnace

Retrieval-Augmented Generation (RAG) breaks through the knowledge boundary of large models via document retrieval and information fusion, building a precise knowledge service. RAG means that when answering a question or generating text, the system first retrieves relevant information from a document store and then uses that retrieved information to generate the response, improving prediction quality.

Retrieval-augmented generation consists of three basic components, an embedding component, a retrieval component, and a text generation component, and two main flows: building the retrieval index and online text generation.

The main steps of RAG can be summarized as follows.

Data preparation stage: knowledge preprocessing, where the source material can come from PDF, Word, TXT, Excel, and other formats. Below we cover two common ones, PDF and TXT:

import glob

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader, PyPDFLoader, TextLoader


def chunks_from_pdf(pdf_directory):
    """
    Chunks all pdfs from a directory
    :param pdf_directory: directory of pdfs
    :return: list of chunks
    """
    # fetching all pdfs from the directory and storing them as documents in a list
    docs = []
    for file in glob.glob(pdf_directory + "/*.pdf"):
        loader = PyPDFLoader(file)
        doc = loader.load()
        docs.extend(doc)

    # split texts into chunks with overlap
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=500, chunk_overlap=100)
    splits = splitter.split_documents(docs)
    return splits


def chunks_from_text(text_directory):
    """
    Chunks all text files from a directory
    :param text_directory: directory of text files
    :return: list of chunks
    """
    # fetch all txt files from the directory and store them in a list
    loader = DirectoryLoader(text_directory, loader_cls=TextLoader)  # , glob="**/*.txt")
    docs = loader.load()

    # split texts into chunks with overlap
    splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(chunk_size=500, chunk_overlap=100)
    splits = splitter.split_documents(docs)

    return splits


def chunking(data_directory):
    """
    Automatically calls the correct chunking function, either for pdfs or for txt files
    :param data_directory: directory of data, either ../pdf or ../text
    :return: result from the corresponding chunking function
    """
    if data_directory == "../pdf":
        return chunks_from_pdf(data_directory)
    else:
        return chunks_from_text(data_directory)

Domestic vectorization engine, the custom text-embedding stage: define an EmbeddingAgent class and select a suitable embedding model to vectorize the private corpus.

from typing import Dict, List

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import DirectoryLoader
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Chroma
from langchain_core.embeddings import Embeddings


class EmbeddingAgent(Embeddings):
    def __init__(self, model_name: str = "bge-m3"):
        self.api_url = API_URL  # platform endpoint, assumed to be defined elsewhere
        self.model_name = model_name

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        non_empty_texts = [text for text in texts if text.strip()]
        if not non_empty_texts:
            return [[] for _ in texts]

        batch_size = 3
        embeddings = []

        # send the texts to the embedding endpoint in small batches;
        # `client` is the OpenAI-compatible client defined earlier
        for start in range(0, len(non_empty_texts), batch_size):
            batch = non_empty_texts[start:start + batch_size]

            response = client.embeddings.create(
                input=batch,
                model=self.model_name,
            )

            for item in response.data:
                embeddings.append(item.embedding)

        return embeddings

    def embed_query(self, text: str) -> List[float]:
        return self.embed_documents([text])[0]

Data retrieval flow: define how the semantic vector store is built and queried.

def create_vector_store(db_directory, chunks, embedding):
    if not chunks:
        return None
 
    # create vector store and index
    vectorstore = Chroma.from_documents(
        documents=chunks,
        collection_name="rag_collection",
        embedding=embedding,
        persist_directory=db_directory
    )
 
    return vectorstore.as_retriever()
 
def fetch_vector_store(db_directory, embedding):
    print("Fetching vector store")
    # reopen the same collection that create_vector_store persisted
    vectorstore = Chroma(
        collection_name="rag_collection",
        embedding_function=embedding,
        persist_directory=db_directory
    )
    return vectorstore.as_retriever()
 
def retrieve(retrieving, question):
    documents = retrieving.get_relevant_documents(question)
    return documents

Finally, the text generation stage: define a domestic ChatAgent and select a model to generate the answer.

class ChatAgent:
    def __init__(self, model_name: str):
        self.api_url = API_URL
        self.model_name = model_name

    def invoke(self, messages: List[Dict[str, str]]) -> str:
        try:
            extra_headers = {}
            completion = client.chat.completions.create(
                extra_headers=extra_headers,
                model=self.model_name,
                messages=messages,
                stream=True,
                temperature=0.6,
            )
            # accumulate the streamed reply, including any reasoning content
            fullResponse = ""
            for chunk in completion:
                if len(chunk.choices) != 0:
                    delta = chunk.choices[0].delta
                    # reasoning models expose their thinking as reasoning_content
                    if hasattr(delta, 'reasoning_content') and delta.reasoning_content:
                        fullResponse += delta.reasoning_content
                    elif delta.content:
                        fullResponse += delta.content

            # Format response to match expected output
            result = [{
                "message": {
                    "content": fullResponse
                }
            }]

        except Exception as e:
            logger.error(f"Error with Gitee API: {e}")
            raise

        return result[0]['message']['content']
 
def generate(question, context, llm_model):
    system_prompt = (
        "You are an assistant for question-answering tasks. Use the "
        "following pieces of retrieved context to answer the question. If "
        "you don't know the answer, just say that you don't know. Keep the "
        "answer concise, truthful, and informative. If you decide to use a "
        "source, you must mention in which document you found specific "
        "information. Sources are indicated in the context by "
        "[doc<doc_number>]."
    )
    user_prompt = f"Question: {question} \nContext: {context}"
 
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]
 
    return ChatAgent(llm_model).invoke(messages)

With the steps above you can build a basic RAG application; the flow is fairly simple. There are also many excellent open-source RAG frameworks such as Langchain-Chatchat; replacing their LLM backend API URL is enough to seamlessly build a knowledge engine for finance, healthcare, and other domains, though this takes some coding and debugging experience.
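To tie the stages together, here is a minimal end-to-end sketch wiring the functions defined above; directory paths and model identifiers are hypothetical placeholders:

# Minimal end-to-end RAG flow, assuming the components defined above
chunks = chunking("../pdf")                      # data preparation: load and split documents
embedding = EmbeddingAgent(model_name="bge-m3")  # embedding component

retriever = create_vector_store("./chroma_db", chunks, embedding)  # build the index

question = "What does the report say about Q3 revenue?"
documents = retrieve(retriever, question)        # retrieval component
context = "\n".join(f"[doc{i}] {doc.page_content}" for i, doc in enumerate(documents))

answer = generate(question, context, llm_model="your-chat-model")  # hypothetical model id
print(answer)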

Summary

In recent years AI applications have emerged in endless, ever-changing variety, yet at heart they are combinations of different models. As Multi-Agent architectures decouple complex tasks into networks of collaborating agents, the building-block innovation of APIs enters a new era. We look forward to large-model application developers chaining together more model APIs and building more innovative, practical products on domestic compute.
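As a closing illustration of this building-block composition (model identifiers below are hypothetical placeholders), a few lines suffice to chain the writing and painting backends above: the LLM drafts a scene description, which is then handed to the image model:

# Chain two API "building blocks": the LLM drafts a prompt, the T2I model renders it.
# Assumes the `client` and `generate_dream_image` defined in earlier sections.
scene = client.chat.completions.create(
    model="your-chat-model",  # hypothetical placeholder
    messages=[{"role": "user", "content": "Describe a dream city floating above the sea in one sentence."}],
    stream=False,
).choices[0].message.content

generate_dream_image("your-t2i-model", description=scene, style="surreal")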

The blocks are in place and the edifice is rising: when every API block locks precisely into the foundation of domestic compute, we will at last raise the Tower of Babel of the intelligent age.

 
Original link: https://my.oschina.net/u/4806939/blog/18688044