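Chroma: an open-source embedding database
Chroma is an open-source embedding database that bills itself as the fastest way to build Python or JavaScript LLM applications with memory.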
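Features:
- Simple: fully typed, fully tested, fully documented
- Integrations: LangChain (Python and JS), LlamaIndex, and others
- Dev, test, prod: the same API that runs in your Python notebook scales to your cluster
- Feature-rich: queries, filtering, density estimation, and more
- Free and open source: Apache 2.0 licensed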
    pip install chromadb # python client
    # for javascript, npm install chromadb!
    # for client-server mode, docker-compose up -d --build
The core API consists of just four functions:
    import chromadb

    # setup Chroma in-memory, for easy prototyping. Can add persistence easily!
    client = chromadb.Client()

    # Create collection. get_collection, get_or_create_collection, delete_collection also available!
    collection = client.create_collection("all-my-documents")

    # Add docs to the collection. Can also update and delete. Row-based API coming soon!
    collection.add(
        # we handle tokenization, embedding, and indexing automatically.
        # You can skip that and add your own embeddings as well
        documents=["This is document1", "This is document2"],
        metadatas=[{"source": "notion"}, {"source": "google-docs"}], # filter on these!
        ids=["doc1", "doc2"], # unique for each doc
    )

    # Query/search 2 most similar results. You can also .get by id
    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        # where={"metadata_field": "is_equal_to_this"}, # optional filter
        # where_document={"$contains":"search_string"} # optional filter
    )
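To build intuition for what a similarity query with an optional metadata filter does, here is a toy sketch in plain Python. It is not Chroma's implementation: the names `toy_query` and `cosine_similarity` are hypothetical, the two-dimensional vectors are hand-written stand-ins for real embeddings, and the brute-force scan is only an assumption made to keep the example small. The idea it illustrates is the one in the snippet above: keep the records whose metadata matches `where`, rank them by similarity to the query, and return the top `n_results`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_query(records, query_embedding, n_results=2, where=None):
    """Return the ids of the n_results records most similar to the query.

    Hypothetical helper for illustration only. `where` is an exact-match
    metadata filter, loosely modeled on the `where` argument shown above.
    """
    where = where or {}
    # Step 1: keep only records whose metadata matches every filter key.
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    # Step 2: rank the remaining records by similarity, most similar first.
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r["embedding"], query_embedding),
        reverse=True,
    )
    # Step 3: return the ids of the top matches.
    return [r["id"] for r in ranked[:n_results]]


# Hand-written stand-ins for embedded documents (assumed data).
records = [
    {"id": "doc1", "embedding": [1.0, 0.0], "metadata": {"source": "notion"}},
    {"id": "doc2", "embedding": [0.8, 0.6], "metadata": {"source": "google-docs"}},
    {"id": "doc3", "embedding": [0.0, 1.0], "metadata": {"source": "notion"}},
]

print(toy_query(records, [0.9, 0.1]))                              # ['doc1', 'doc2']
print(toy_query(records, [0.9, 0.1], where={"source": "notion"}))  # ['doc1', 'doc3']
```

With the filter applied, `doc2` drops out even though it is the second-closest match, which is why metadata such as `source` is worth attaching when documents are added.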