From Scratch: Building a Semantic Search Application with Go and LlamaHub

云信安装大师
10 May 2025

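Introduction

Semantic search is becoming an increasingly important feature in modern applications: it matches on the intent behind a query rather than only on its literal wording. This article walks you through building a simple semantic search application from scratch with Go and LlamaHub. Even if you are new to Go or to semantic search, you should be able to follow along and finish the project.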

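Prerequisites

Environment requirements

  1. Go 1.18 or later installed
  2. A LlamaHub API key (available by registering on the official website)
  3. A basic Go development environment (VS Code or GoLand recommended)

Installing the dependencies

Code snippet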
go get github.com/sashabaranov/go-openai
go get github.com/joho/godotenv

Step 1: Set up the project structure

Create the project directory structure:

Code snippet
/semantic-search
  |- /config
      |- config.go
  |- /handlers
      |- search.go
  |- main.go
  |- .env

Initialize the Go module (the go get commands above need a go.mod file, so run this first):

Code snippet
go mod init semantic-search

Step 2: Configure the LlamaHub client

Add the following to config/config.go:

Code snippet
package config

import (
    "os"
    "github.com/sashabaranov/go-openai"
)

func NewLlamaClient() *openai.Client {
    apiKey := os.Getenv("LLAMAHUB_API_KEY")
    if apiKey == "" {
        panic("LLAMAHUB_API_KEY not set in .env file")
    }

    config := openai.DefaultConfig(apiKey)
    // LlamaHub-specific endpoint (adjust according to the official documentation)
    config.BaseURL = "https://api.llamahub.ai/v1"

    return openai.NewClientWithConfig(config)
}
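
Panicking on a missing key is convenient in a tutorial, but a function that returns an error is easier to test and safer to call from a long-running service. The following is a minimal sketch of that variation. LoadAPIKey, its lookup parameter, and ErrMissingKey are hypothetical names introduced here for illustration; they are not part of the original project or of any library.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// ErrMissingKey (hypothetical) is returned when the key variable is unset or blank.
var ErrMissingKey = errors.New("LLAMAHUB_API_KEY not set")

// LoadAPIKey (hypothetical) reads the key through lookup, so a test can pass
// a fake function instead of touching the real environment.
func LoadAPIKey(lookup func(string) string) (string, error) {
	key := strings.TrimSpace(lookup("LLAMAHUB_API_KEY"))
	if key == "" {
		return "", ErrMissingKey
	}
	return key, nil
}

func main() {
	key, err := LoadAPIKey(os.Getenv)
	if err != nil {
		fmt.Println("configuration error:", err)
		return
	}
	// Never print the key itself; its length is enough to confirm it loaded.
	fmt.Println("API key loaded, length:", len(key))
}
```

In NewLlamaClient, the returned key would be passed to openai.DefaultConfig exactly as in the listing above, with the error handed back to the caller instead of triggering a panic.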

Add the following to the .env file:

Code snippet
LLAMAHUB_API_KEY=your_api_key_here

Step 3: Implement the semantic search function

Add the following to handlers/search.go:

Code snippet
package handlers

import (
    "context"
    "fmt"
    "math"
    "sort"

    "github.com/sashabaranov/go-openai"
)

type SearchRequest struct {
    Query      string   `json:"query"`
    Candidates []string `json:"candidates"`
}

func SemanticSearch(ctx context.Context, client *openai.Client, req SearchRequest) ([]string, error) {
    // Embed all candidate texts in a single request.
    // The Input field accepts a slice of strings.
    resp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
        Input: req.Candidates,
        Model: openai.LargeEmbedding3,
    })
    if err != nil {
        return nil, fmt.Errorf("failed to create embeddings: %w", err)
    }

    if len(resp.Data) != len(req.Candidates) {
        return nil, fmt.Errorf("embedding count mismatch: got %d, want %d", len(resp.Data), len(req.Candidates))
    }

    candidateEmbeddings := make([][]float32, len(resp.Data))
    for i, data := range resp.Data {
        candidateEmbeddings[i] = data.Embedding
    }

    // Embed the query text
    qResp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
        Input: []string{req.Query},
        Model: openai.LargeEmbedding3,
    })
    if err != nil {
        return nil, fmt.Errorf("failed to create query embedding: %w", err)
    }
    if len(qResp.Data) == 0 {
        return nil, fmt.Errorf("query embedding response was empty")
    }

    queryEmbedding := qResp.Data[0].Embedding

    // Score each candidate by cosine similarity to the query (simple implementation)
    type scorePair struct {
        text  string
        score float32
    }

    scored := make([]scorePair, 0, len(candidateEmbeddings))

    for i, emb := range candidateEmbeddings {
        scored = append(scored, scorePair{
            text:  req.Candidates[i],
            score: cosineSimilarity(queryEmbedding, emb),
        })
    }

    // Sort by similarity, highest first
    sort.Slice(scored, func(i, j int) bool {
        return scored[i].score > scored[j].score
    })

    // Keep the text of the top 5 results
    top := scored[:min(5, len(scored))]
    results := make([]string, 0, len(top))

    for _, pair := range top {
        results = append(results, pair.text)
    }

    return results, nil
}

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 if the lengths differ or either vector has zero norm.
func cosineSimilarity(a, b []float32) float32 {
    if len(a) != len(b) {
        return 0
    }

    var dotProduct, normA, normB float32

    for i := range a {
        dotProduct += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }

    if normA == 0 || normB == 0 {
        return 0
    }

    return dotProduct / float32(math.Sqrt(float64(normA))*math.Sqrt(float64(normB)))
}

// min returns the smaller of a and b. Go 1.21+ has a built-in min;
// this helper keeps the code compatible with Go 1.18.
func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}
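
To make the ranking step concrete without calling any API, here is a small self-contained sketch that ranks hand-made 2D vectors by cosine similarity. The names cosine and rankByCosine are illustrative only, and the vectors are invented toy values rather than real embeddings.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 if the lengths
// differ or either vector has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// rankByCosine returns the indices of vectors ordered by descending
// similarity to query.
func rankByCosine(query []float32, vectors [][]float32) []int {
	order := make([]int, len(vectors))
	scores := make([]float64, len(vectors))
	for i, v := range vectors {
		order[i] = i
		scores[i] = cosine(query, v)
	}
	sort.SliceStable(order, func(i, j int) bool {
		return scores[order[i]] > scores[order[j]]
	})
	return order
}

func main() {
	query := []float32{1, 0}
	vectors := [][]float32{
		{0, 1}, // orthogonal to the query: similarity 0
		{1, 1}, // 45 degrees from the query: similarity about 0.707
		{2, 0}, // same direction as the query: similarity 1
	}
	fmt.Println(rankByCosine(query, vectors)) // prints [2 1 0]
}
```

Note that {2, 0} ranks first even though it is twice as long as the query: cosine similarity compares direction only, which is why it suits embeddings whose magnitudes carry little meaning.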

Step 4: Create the main program

Add the following to main.go:

Code snippet
package main

import (
    "context"
    "fmt"
    "log"

    "semantic-search/config"
    "semantic-search/handlers"

    "github.com/joho/godotenv"
)

func main() {
    // Load environment variables from the .env file
    if err := godotenv.Load(); err != nil {
        log.Fatal("Error loading .env file")
    }

    client := config.NewLlamaClient()

    ctx := context.Background()

    // Example usage
    candidates := []string{
        "Go is an open source programming language developed by Google",
        "Python is a popular high-level programming language",
        "Rust is a systems programming language focused on safety and performance",
        "JavaScript is the scripting language for Web pages",
    }

    result, err := handlers.SemanticSearch(ctx, client, handlers.SearchRequest{
        Query:      "What language should I learn for web development?",
        Candidates: candidates,
    })
    if err != nil {
        log.Fatal(err)
    }

    fmt.Println("Top results for your query:")

    for i, res := range result {
        fmt.Printf("%d. %s\n", i+1, res)
    }
}

Step 5: Run and test

Run the program:

Code snippet
go run main.go

Example of the expected output (the exact ranking depends on the embedding model):

Code snippet
Top results for your query:
1. JavaScript is the scripting language for Web pages   
2. Python is a popular high-level programming language   
3. Go is an open source programming language developed by Google   
4. Rust is a systems programming language focused on safety and performance   

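Further optimization suggestions

  1. Cache embedding vectors: cache the embeddings of frequently queried texts to reduce the number of API calls.
  2. Batch requests: when there are many candidate texts, process them in batches to stay within API limits.
  3. Stronger error handling: add a retry mechanism to cope with API rate limiting or transient failures.
  4. Performance: for large datasets, consider a dedicated vector database such as Pinecone or Milvus.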
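
The first two suggestions can be combined in one small wrapper. The sketch below is one possible shape, not the project's actual code: EmbedFunc, CachedEmbedder, and NewCachedEmbedder are hypothetical names, and a fake embedding function stands in for the real API call so the example runs offline.

```go
package main

import "fmt"

// EmbedFunc stands in for a call to the embeddings API. It must return one
// vector per input text, in the same order.
type EmbedFunc func(texts []string) ([][]float32, error)

// CachedEmbedder (hypothetical) remembers vectors it has already fetched and
// sends only unseen texts to the API, batchSize texts at a time.
// It is not safe for concurrent use.
type CachedEmbedder struct {
	embed     EmbedFunc
	batchSize int
	cache     map[string][]float32
}

func NewCachedEmbedder(embed EmbedFunc, batchSize int) *CachedEmbedder {
	if batchSize < 1 {
		batchSize = 1
	}
	return &CachedEmbedder{embed: embed, batchSize: batchSize, cache: make(map[string][]float32)}
}

// Embed returns one vector per text, calling the API only for cache misses.
func (c *CachedEmbedder) Embed(texts []string) ([][]float32, error) {
	// Collect each uncached text once, preserving first-seen order.
	var missing []string
	seen := make(map[string]bool)
	for _, t := range texts {
		if _, ok := c.cache[t]; !ok && !seen[t] {
			seen[t] = true
			missing = append(missing, t)
		}
	}
	// Send the misses in batches of at most batchSize.
	for start := 0; start < len(missing); start += c.batchSize {
		end := start + c.batchSize
		if end > len(missing) {
			end = len(missing)
		}
		batch := missing[start:end]
		vecs, err := c.embed(batch)
		if err != nil {
			return nil, fmt.Errorf("embed batch starting at %d: %w", start, err)
		}
		if len(vecs) != len(batch) {
			return nil, fmt.Errorf("got %d vectors for %d texts", len(vecs), len(batch))
		}
		for i, t := range batch {
			c.cache[t] = vecs[i]
		}
	}
	out := make([][]float32, len(texts))
	for i, t := range texts {
		out[i] = c.cache[t]
	}
	return out, nil
}

func main() {
	calls := 0
	// fakeEmbed replaces the real API: each vector just holds the text length.
	fakeEmbed := func(texts []string) ([][]float32, error) {
		calls++
		vecs := make([][]float32, len(texts))
		for i, t := range texts {
			vecs[i] = []float32{float32(len(t))}
		}
		return vecs, nil
	}
	e := NewCachedEmbedder(fakeEmbed, 2)
	// Three misses with batch size 2: two API calls.
	if _, err := e.Embed([]string{"go", "rust", "python"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	// Only "java" is new: one more API call.
	if _, err := e.Embed([]string{"go", "python", "java"}); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("API calls:", calls) // prints "API calls: 3"
}
```

An in-memory map is the simplest choice and is lost on restart; a service that needs the cache to persist, or to be shared across goroutines, would need a mutex or an external store.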

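Troubleshooting

  1. API rate-limit errors

    • LlamaHub may enforce rate limits; add a delay between calls or send batched requests.
    • On a 429 Too Many Requests error, wait for a while and then retry.
  2. Embedding dimension mismatch

    • Ensure all embeddings are generated with the same model (text-embedding-3-large in our example).
  3. Environment variable problems

    • If you get API key errors, verify that the .env file is in the directory you run the program from and is properly formatted.
  4. Optimizing the cosine similarity calculation

    • For production use, consider an optimized library such as gonum for vector operations.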
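
The "wait and retry" advice for 429 errors is usually implemented as exponential backoff. The following is a minimal sketch under stated assumptions: retry, retryRateLimited, and errRateLimited are hypothetical names, and errRateLimited is a stand-in sentinel, since the way a real rate-limit error is detected depends on the client library.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errRateLimited is a hypothetical sentinel standing in for an HTTP 429 response.
var errRateLimited = errors.New("429 too many requests")

// retry calls fn up to attempts times, doubling the wait after each retryable
// failure. It stops early on success, on a non-retryable error, or when ctx is done.
func retry(ctx context.Context, attempts int, base time.Duration, retryable func(error) bool, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil || !retryable(err) {
			return err
		}
		if i == attempts-1 {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(delay):
			delay *= 2
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// retryRateLimited applies illustrative defaults: 4 attempts, 5 ms initial delay.
func retryRateLimited(fn func() error) error {
	isRateLimited := func(err error) bool { return errors.Is(err, errRateLimited) }
	return retry(context.Background(), 4, 5*time.Millisecond, isRateLimited, fn)
}

func main() {
	calls := 0
	err := retryRateLimited(func() error {
		calls++
		if calls < 3 {
			return errRateLimited // simulate two rate-limited responses
		}
		return nil
	})
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

A real service would use a much longer initial delay, often with random jitter. With go-openai, the retryable check would likely inspect the returned error with errors.As for an *openai.APIError and compare its HTTPStatusCode to 429; treat that as an assumption to confirm against the library version you use.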

Summary

In this tutorial we did the following:

  1. Set up a Go project with LlamaHub integration.
  2. Implemented text embedding generation using the LlamaHub API.
  3. Created a semantic search function that ranks texts by relevance to a query.
  4. Built a simple command-line application to demonstrate semantic search.

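The complete project code is available in the GitHub repository. You can extend this basic implementation to build more complex search applications, such as document retrieval systems or question-answering bots.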
