Dart + Chroma DB: Building a Modern API Integration Solution

云信安装大师
May 10, 2025

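Introduction

Efficient, scalable API integrations matter more and more in modern software development. Dart, a modern programming language, pairs well with Chroma DB, a lightweight vector database, to form a capable data storage and retrieval stack. This article walks step by step through building an API integration system on Dart and Chroma DB.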

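Prerequisites

Requirements

  • Dart SDK 2.18 or later
  • A Chroma DB server (local or remote)
  • An HTTP client (we will use the http package)

Installing the packages

Add the following dependencies to your project's pubspec.yaml. The http package talks to Chroma; shelf and shelf_router are needed by the API service built later in this article:

Code snippet
dependencies:
  http: ^0.13.5
  shelf: ^1.4.0
  shelf_router: ^1.1.0

Then run:

Code snippet
dart pub get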

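Chroma DB basics

Chroma is an open-source vector database designed for AI applications and is particularly well suited to storing and querying embedding vectors. It exposes a simple HTTP API, which makes it easy to integrate with Dart.

Key features:
– Lightweight and easy to deploy
– Supports several distance metrics (cosine, Euclidean, and others)
– Provides semantic search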
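
To make the distance metrics concrete, here is a minimal sketch of Euclidean and cosine distance in plain Dart. The function names are illustrative, and Chroma's internal computation may differ in detail (for example in normalisation or squaring), so treat this as an explanation of the concepts rather than a description of the database's implementation.

```dart
import 'dart:math' as math;

/// Euclidean (L2) distance between two equal-length vectors.
double euclideanDistance(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    final d = a[i] - b[i];
    sum += d * d;
  }
  return math.sqrt(sum);
}

/// Cosine distance, defined here as 1 - cosine similarity.
double cosineDistance(List<double> a, List<double> b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA == 0 || normB == 0) {
    throw ArgumentError('Cosine distance is undefined for a zero vector');
  }
  return 1 - dot / (math.sqrt(normA) * math.sqrt(normB));
}

void main() {
  final a = [1.0, 0.0];
  final b = [2.0, 0.0]; // same direction as a, different magnitude
  print(euclideanDistance(a, b)); // 1.0
  print(cosineDistance(a, b)); // 0.0
}
```

The example shows why the choice matters: the two vectors point the same way, so cosine distance treats them as identical, while Euclidean distance still sees a gap.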

Implementation

1. Connecting to the Chroma DB service

Start with a Chroma client class that manages the connection to the database:

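Code snippet
// dart:convert provides jsonEncode/jsonDecode, used by the extensions that follow
import 'dart:convert';

import 'package:http/http.dart' as http;

class ChromaClient {
  final String baseUrl;

  ChromaClient({required this.baseUrl});

  // Check whether the service is reachable
  Future<bool> isAlive() async {
    try {
      final response = await http.get(Uri.parse('$baseUrl/api/v1/heartbeat'));
      return response.statusCode == 200;
    } catch (_) {
      // A connection error means the service is not available
      return false;
    }
  }
}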

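2. Creating a collection

A collection is Chroma's main data container, comparable to a table in a traditional database:

Code snippet
// Requires `import 'dart:convert';` at the top of the file for jsonEncode/jsonDecode
extension CollectionOperations on ChromaClient {

  // Create a new collection
  Future<Map<String, dynamic>> createCollection({
    required String name,
    Map<String, dynamic>? metadata,
    String distance = 'cosine',
  }) async {
    final response = await http.post(
      Uri.parse('$baseUrl/api/v1/collections'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode({
        'name': name,
        // Chroma reads the distance metric from the 'hnsw:space' metadata key
        // ('l2', 'ip' or 'cosine'), not from a top-level field
        'metadata': {...?metadata, 'hnsw:space': distance},
      }),
    );

    // Accept any 2xx status as success
    if (response.statusCode < 200 || response.statusCode >= 300) {
      throw Exception('Failed to create collection: ${response.body}');
    }

    return jsonDecode(response.body) as Map<String, dynamic>;
  }
}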
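
The later calls address a collection by its id rather than its name, so the id has to be pulled out of the creation response. The sketch below shows one way to do that defensively. The helper name collectionIdFrom is hypothetical, and the sample body is illustrative only: it assumes the response is a JSON object with a string "id" field, which you should confirm against your Chroma version.

```dart
import 'dart:convert';

/// Extracts the collection id from a create-collection response body.
/// Assumes (not confirmed by the article) that the body is a JSON object
/// carrying the id in a string field named "id".
String collectionIdFrom(String responseBody) {
  final decoded = jsonDecode(responseBody);
  if (decoded is! Map<String, dynamic>) {
    throw FormatException('Expected a JSON object');
  }
  final id = decoded['id'];
  if (id is! String || id.isEmpty) {
    throw FormatException('Response has no usable "id" field');
  }
  return id;
}

void main() {
  // Illustrative body only, not captured from a real server.
  const sample =
      '{"id": "00000000-0000-0000-0000-000000000000", "name": "docs"}';
  print(collectionIdFrom(sample));
}
```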

3. Adding data to a collection

Add documents and their embedding vectors to the collection:

Code snippet
extension DataOperations on ChromaClient {

  // Add documents and their embedding vectors to a collection
  Future<void> addDocuments({
    required String collectionId,
    required List<String> documents,
    required List<List<double>> embeddings,
    List<String>? ids,
    // One metadata map per document
    List<Map<String, dynamic>>? metadatas,
  }) async {

    final payload = {
      'documents': documents,
      'embeddings': embeddings,
      // Default ids are positional ('0', '1', ...); pass explicit ids when
      // calling this more than once, otherwise they collide across calls
      'ids': ids ?? List.generate(documents.length, (i) => i.toString()),
      if (metadatas != null) 'metadatas': metadatas,
    };

    final response = await http.post(
      Uri.parse('$baseUrl/api/v1/collections/$collectionId/add'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode(payload),
    );

    // Accept any 2xx status; the add endpoint may answer 201 rather than 200
    if (response.statusCode < 200 || response.statusCode >= 300) {
      throw Exception('Failed to add documents: ${response.body}');
    }

    print('Successfully added ${documents.length} documents');
  }
}
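
Because ids, documents and embeddings are parallel lists, a mismatch between them is an easy mistake to make and is cheaper to catch before the HTTP call. The following is a minimal sketch of such a pre-flight check; validateBatch is a hypothetical helper, not part of any Chroma client API.

```dart
/// Returns a description of the first problem found, or null if the batch
/// looks consistent. Hypothetical helper for illustration.
String? validateBatch({
  required List<String> ids,
  required List<String> documents,
  required List<List<double>> embeddings,
}) {
  if (documents.isEmpty) return 'batch is empty';
  if (ids.length != documents.length ||
      embeddings.length != documents.length) {
    return 'ids, documents and embeddings must have the same length';
  }
  if (ids.toSet().length != ids.length) return 'ids must be unique';

  // Every embedding must share the dimension of the first one.
  final dim = embeddings.first.length;
  if (dim == 0) return 'embeddings must not be empty';
  for (var i = 0; i < embeddings.length; i++) {
    if (embeddings[i].length != dim) {
      return 'embedding $i has dimension ${embeddings[i].length}, expected $dim';
    }
  }
  return null;
}

void main() {
  final problem = validateBatch(
    ids: ['a', 'b'],
    documents: ['x', 'y'],
    embeddings: [
      [1.0, 2.0],
      [3.0],
    ],
  );
  print(problem); // embedding 1 has dimension 1, expected 2
}
```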

4. Querying data

Implement semantic search:

Code snippet
extension QueryOperations on ChromaClient {

  // Query documents with similarity search
  Future<List<Map<String, dynamic>>> queryCollection({
    required String collectionId,
    required List<double> queryEmbedding,
    int nResults = 5,
  }) async {
    final response = await http.post(
      Uri.parse('$baseUrl/api/v1/collections/$collectionId/query'),
      headers: {'Content-Type': 'application/json'},
      body: jsonEncode({
        'query_embeddings': [queryEmbedding],
        'n_results': nResults,
      }),
    );

    if (response.statusCode != 200) {
      throw Exception('Query failed: ${response.body}');
    }

    final result = jsonDecode(response.body) as Map<String, dynamic>;

    // Results are nested, one inner list per query embedding; we sent one
    final ids = result['ids'][0] as List;
    final documents = result['documents'][0] as List;
    final distances = result['distances'][0] as List;

    // Fewer than nResults hits may come back, so iterate over what was returned
    return List.generate(ids.length, (i) => {
      'document': documents[i],
      'distance': distances[i],
      'id': ids[i],
    });
  }
}
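
The nested shape of the query response is easier to see on a concrete value. The sketch below flattens a hand-written sample that mirrors the fields the query code reads (ids, documents, distances); the sample values and the function name flattenFirstQuery are made up for illustration.

```dart
import 'dart:convert';

/// Flattens the results for the first (and here only) query embedding.
List<Map<String, dynamic>> flattenFirstQuery(Map<String, dynamic> result) {
  final ids = (result['ids'] as List).first as List;
  final documents = (result['documents'] as List).first as List;
  final distances = (result['distances'] as List).first as List;
  return [
    for (var i = 0; i < ids.length; i++)
      {'id': ids[i], 'document': documents[i], 'distance': distances[i]},
  ];
}

void main() {
  // Hand-written sample: two hits for a single query embedding.
  const sample = '{"ids": [["a", "b"]], '
      '"documents": [["first doc", "second doc"]], '
      '"distances": [[0.12, 0.48]]}';
  final hits = flattenFirstQuery(jsonDecode(sample) as Map<String, dynamic>);
  for (final hit in hits) {
    print('${hit['id']}: ${hit['distance']}');
  }
}
```

Each outer list has one entry per query embedding, which is why the code indexes the first element before reading individual hits.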

API integration example

Now we combine these pieces into a simple API service:

Code snippet
import 'dart:convert';
import 'dart:io';

import 'package:shelf/shelf.dart';
import 'package:shelf/shelf_io.dart';
import 'package:shelf_router/shelf_router.dart';

class ApiService {
  final ChromaClient chromaClient;

  ApiService(this.chromaClient);

  Router get router {
    final router = Router();

    // Health check endpoint: answers 503 when Chroma is unreachable
    router.get('/health', (Request request) async {
      final isAlive = await chromaClient.isAlive();
      return isAlive
          ? Response.ok('Healthy')
          : Response(503, body: 'Unhealthy');
    });

    // Search endpoint
    router.post('/search', (Request request) async {
      try {
        final body = await request.readAsString();
        final data = jsonDecode(body);

        final results = await chromaClient.queryCollection(
          collectionId: data['collection_id'],
          // JSON numbers may decode as int, so convert each one explicitly
          queryEmbedding: [
            for (final v in data['embedding'] as List) (v as num).toDouble(),
          ],
          nResults: data['n_results'] ?? 5,
        );

        return Response.ok(
          jsonEncode({'results': results}),
          headers: {'Content-Type': 'application/json'},
        );
      } catch (e) {
        return Response.internalServerError(body: e.toString());
      }
    });

    return router;
  }
}

void main() async {
  final chromaClient = ChromaClient(baseUrl: 'http://localhost:8000');

  if (!await chromaClient.isAlive()) {
    print('Error connecting to Chroma DB');
    exit(1);
  }

  print('Connected to Chroma DB successfully');

  // Create our API service and start the server.
  // Binding to all IPv4 interfaces keeps the port reachable from outside a container.
  final apiService = ApiService(chromaClient);
  final server =
      await serve(apiService.router.call, InternetAddress.anyIPv4, 8080);

  print('Server running on port ${server.port}');
}
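
The /search handler above reports every failure as a 500, including malformed client input. One way to separate the two cases is to validate the body first and map validation failures to a 400. The sketch below covers only the validation step; SearchRequest and parseSearchRequest are hypothetical names, and the field names follow the handler above.

```dart
import 'dart:convert';

/// Validated form of the /search request body (hypothetical type).
class SearchRequest {
  final String collectionId;
  final List<double> embedding;
  final int nResults;
  SearchRequest(this.collectionId, this.embedding, this.nResults);
}

Map<String, dynamic> _decodeObject(String body) {
  try {
    final data = jsonDecode(body);
    if (data is Map<String, dynamic>) return data;
  } on FormatException {
    // Fall through to the shared error below.
  }
  throw FormatException('Body must be a JSON object');
}

/// Throws FormatException with a client-facing message on invalid input.
SearchRequest parseSearchRequest(String body) {
  final data = _decodeObject(body);

  final collectionId = data['collection_id'];
  if (collectionId is! String || collectionId.isEmpty) {
    throw FormatException('"collection_id" must be a non-empty string');
  }

  final raw = data['embedding'];
  if (raw is! List || raw.isEmpty || raw.any((e) => e is! num)) {
    throw FormatException('"embedding" must be a non-empty list of numbers');
  }

  final nResults = data['n_results'] ?? 5;
  if (nResults is! int || nResults < 1) {
    throw FormatException('"n_results" must be a positive integer');
  }

  return SearchRequest(
    collectionId,
    [for (final e in raw) (e as num).toDouble()],
    nResults,
  );
}

void main() {
  final ok = parseSearchRequest('{"collection_id": "abc", "embedding": [1, 0.5]}');
  print('${ok.collectionId} ${ok.embedding.length} ${ok.nResults}'); // abc 2 5

  try {
    parseSearchRequest('{"collection_id": "abc", "embedding": []}');
  } on FormatException catch (e) {
    print(e.message);
  }
}
```

In the handler, a caught FormatException could then be answered with Response(400, body: e.message), leaving the 500 path for genuine server-side failures.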

Docker deployment

To simplify deployment, we can run the whole system with Docker:

Code snippet
# Dockerfile for our Dart API + Chroma DB setup

# First stage - build the Dart application
FROM dart:stable AS build

WORKDIR /app

COPY pubspec.* ./
RUN dart pub get

COPY . .
RUN dart compile exe bin/main.dart -o bin/server

# Second stage - create runtime image with Chroma DB and our app
FROM ubuntu:22.04

# Install dependencies for Chroma DB and our app (curl is used by start.sh)
RUN apt-get update && \
  apt-get install -y \
  python3-pip \
  libsqlite3-dev \
  curl \
  && rm -rf /var/lib/apt/lists/*

# Install Chroma DB with pip3 (using a specific version for stability)
RUN pip3 install chromadb==0.4.15 pysqlite3-binary==0.5.1.post1

# Copy our compiled Dart application from the build stage
COPY --from=build /app/bin/server /app/server

# Copy the startup script that launches both services, Chroma first
COPY start.sh /app/start.sh

# Chroma stores its database files in /app/chromadb_data
WORKDIR /app
RUN mkdir -p /app/chromadb_data

# Our API port
EXPOSE 8080
# Default Chroma port (optional)
EXPOSE 8000

CMD ["/bin/bash", "/app/start.sh"]

The accompanying startup script, start.sh:

Code snippet
#!/bin/bash

# Start Chroma in the background on port 8000 with persistence enabled
chroma run --path /app/chromadb_data --port 8000 &

# Wait for Chroma to be ready before starting our API server
while true; do
  curl --silent --fail http://localhost:8000/api/v1/heartbeat >/dev/null && break
  sleep 1
done
echo "Chroma DB is up and running!"

# Start our Dart API server on port 8080
/app/server &

# Keep the container running; exit as soon as either process exits
wait -n
exit $?
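
The same wait-until-ready idea can also live inside the Dart process, so the server tolerates a slow Chroma start instead of exiting on the first failed heartbeat. Below is a minimal sketch with exponential backoff. waitUntilReady is a hypothetical helper, and the probe in main is a stand-in; in the setup described here the probe would be chromaClient.isAlive.

```dart
import 'dart:async';

/// Calls [probe] until it returns true or [maxAttempts] is exhausted,
/// doubling the delay between attempts up to [maxDelay].
Future<bool> waitUntilReady(
  Future<bool> Function() probe, {
  int maxAttempts = 10,
  Duration initialDelay = const Duration(milliseconds: 200),
  Duration maxDelay = const Duration(seconds: 5),
}) async {
  var delay = initialDelay;
  for (var attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      if (await probe()) return true;
    } catch (_) {
      // Treat errors (for example a refused connection) as "not ready yet".
    }
    if (attempt == maxAttempts) break;
    await Future<void>.delayed(delay);
    final doubled = delay * 2;
    delay = doubled > maxDelay ? maxDelay : doubled;
  }
  return false;
}

Future<void> main() async {
  var calls = 0;
  // Stand-in probe that succeeds on the third call.
  final ready = await waitUntilReady(
    () async => ++calls >= 3,
    initialDelay: const Duration(milliseconds: 10),
  );
  print('ready=$ready after $calls calls'); // ready=true after 3 calls
}
```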

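FAQ and troubleshooting

Q1: How do I generate embedding vectors for my documents?

A: A range of NLP models can produce embeddings. In Dart, for example, the tflite_flutter package can load a pretrained model. Note that it is a Flutter plugin, so this variant runs inside a Flutter app rather than a plain Dart server:

Code snippet
import 'package:tflite_flutter/tflite_flutter.dart';

Future<List<double>> generateEmbedding(String text) async {
  final interpreter = await Interpreter.fromAsset('model.tflite');

  // Preprocess text as needed by your model.
  // preprocessText is a placeholder you must implement (tokenization, padding, ...).
  final inputTensor = preprocessText(text);

  // Run inference; the [1, 512] output shape must match your model
  final outputTensor = List.filled(512, 0.0).reshape([1, 512]);
  interpreter.run(inputTensor, outputTensor);
  interpreter.close();

  return (outputTensor[0] as List).cast<double>().toList();
}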
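
While a real model is not yet wired in, a deterministic stand-in embedding is handy for exercising the add and query paths end to end. The sketch below hashes words into a fixed number of buckets and normalises the result. It is explicitly a toy: it carries no semantic meaning, and toyEmbedding and its 16-dimension default are made up for illustration.

```dart
import 'dart:math' as math;

/// Toy bag-of-words embedding for pipeline testing only. Not semantic.
List<double> toyEmbedding(String text, {int dimensions = 16}) {
  final vector = List<double>.filled(dimensions, 0.0);
  final tokens = text
      .toLowerCase()
      .split(RegExp(r'[^a-z0-9]+'))
      .where((t) => t.isNotEmpty);

  for (final token in tokens) {
    // Hand-rolled hash so the result is stable across runs and platforms.
    var hash = 0;
    for (final unit in token.codeUnits) {
      hash = (hash * 31 + unit) % 1000003;
    }
    vector[hash % dimensions] += 1.0;
  }

  // L2-normalise so vectors are comparable under cosine distance.
  final norm = math.sqrt(vector.fold<double>(0.0, (s, v) => s + v * v));
  if (norm == 0) return vector;
  return [for (final v in vector) v / norm];
}

void main() {
  final e = toyEmbedding('Vector database for Dart');
  print(e.length); // 16
}
```

Swap it for a real embedding model before judging search quality; word order and meaning are both ignored here.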

Q2: How do I handle bulk inserts of large datasets?

A: For large datasets, insert in batches and add a short delay between them:

Code snippet
extension BatchOperations on ChromaClient {
  Future<void> batchInsert({
    required String collectionId,
    required List<String> documents,
    required List<List<double>> embeddings,
    int batchSize = 100,
  }) async {
    for (var i = 0; i < documents.length; i += batchSize) {
      final endIndex = (i + batchSize < documents.length)
          ? i + batchSize
          : documents.length;

      await addDocuments(
        collectionId: collectionId,
        documents: documents.sublist(i, endIndex),
        embeddings: embeddings.sublist(i, endIndex),
        // Use global indices so ids do not collide across batches
        ids: [for (var j = i; j < endIndex; j++) j.toString()],
      );

      // Add delay to prevent overwhelming the server
      await Future.delayed(Duration(milliseconds: 500));
    }
  }
}
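
The slicing logic at the heart of batch insertion can be isolated and checked without a running server. Here is a minimal sketch of a generic chunking helper; chunked is a hypothetical name, not a Dart SDK function.

```dart
/// Lazily splits [items] into consecutive lists of at most [size] elements.
Iterable<List<T>> chunked<T>(List<T> items, int size) sync* {
  if (size <= 0) {
    throw ArgumentError.value(size, 'size', 'must be positive');
  }
  for (var start = 0; start < items.length; start += size) {
    final end = start + size < items.length ? start + size : items.length;
    yield items.sublist(start, end);
  }
}

void main() {
  print(chunked([1, 2, 3, 4, 5], 2).toList()); // [[1, 2], [3, 4], [5]]
}
```

The last chunk is simply shorter when the input length is not a multiple of the batch size, which matches the endIndex computation in the batch insert code.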

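Q3: How can I improve query performance?

A: Consider the following:
1. Index tuning: choose a suitable distance metric (such as cosine similarity) when creating the collection
2. Batch queries: where applicable, send several queries in one request instead of one at a time
3. Caching: add a cache layer for common queries to reduce load on the database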
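
For the caching point, a small in-memory least-recently-used cache is often enough to start with. The sketch below is one possible shape, not a prescribed design: LruCache is a hypothetical class, and what to use as the key (the raw query text, or a hash of the embedding) is an assumption left to the application.

```dart
import 'dart:collection';

/// Minimal least-recently-used cache backed by an insertion-ordered map.
class LruCache<K, V> {
  LruCache(this.capacity) : assert(capacity > 0);

  final int capacity;
  final _entries = LinkedHashMap<K, V>();

  V? get(K key) {
    if (!_entries.containsKey(key)) return null;
    // Remove and re-insert to mark the entry as most recently used.
    final value = _entries.remove(key) as V;
    _entries[key] = value;
    return value;
  }

  void put(K key, V value) {
    _entries.remove(key);
    _entries[key] = value;
    if (_entries.length > capacity) {
      // The first key in insertion order is the least recently used.
      _entries.remove(_entries.keys.first);
    }
  }

  int get length => _entries.length;
}

void main() {
  final cache = LruCache<String, List<String>>(2);
  cache.put('q1', ['doc-a']);
  cache.put('q2', ['doc-b']);
  cache.get('q1'); // q1 becomes most recently used
  cache.put('q3', ['doc-c']); // evicts q2
  print(cache.get('q2')); // null
  print(cache.get('q1')); // [doc-a]
}
```

Cached results go stale when documents are added to the collection, so entries need to be cleared or given a time limit in any real deployment.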

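Summary

This article covered the key steps in building a modern API integration solution with Dart and ChromaDB:

1. Basic connectivity: communicating with ChromaDB through an HTTP client
2. Data management: core operations such as creating collections and inserting documents
3. Semantic search: semantic queries based on vector similarity
4. API wrapping: exposing the database operations as a RESTful API service
5. Deployment: a Docker-based deployment setup

This combination is a good fit for text similarity search, recommendation systems, and other semantics-driven applications. Dart's performance and cross-platform reach, together with ChromaDB's lightweight footprint, make the solution both capable and easy to deploy.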
