Building a Semantic Search Application from Scratch with Go and LlamaHub
10 May 2025
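Introduction
Semantic search is an increasingly important feature in modern applications: rather than matching only the literal words, it also captures the intent behind a query. This article walks you through building a simple semantic search application from scratch with Go and LlamaHub. Even if you are new to Go or to semantic search, you should be able to follow the tutorial and finish the project.
Preparation
Requirements
- Go 1.18 or later installed
- A LlamaHub API key (available by registering on the official website)
- A basic Go development environment (VS Code or GoLand recommended)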
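Install the required dependencies
Run these commands inside the project directory after the Go module has been initialized in Step 1; go get needs a go.mod file in which to record the dependencies.
Code snippet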
go get github.com/sashabaranov/go-openai
go get github.com/joho/godotenv
Step 1: Set up the project structure
Create the project directory structure:
Code snippet
/semantic-search
|- /config
|  |- config.go
|- /handlers
|  |- search.go
|- main.go
|- .env
Initialize the Go module:
Code snippet
go mod init semantic-search
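Step 2: Configure the LlamaHub client
Add the following to config/config.go:
Code snippet
package config

import (
	"os"

	"github.com/sashabaranov/go-openai"
)

// NewLlamaClient builds an API client from the LLAMAHUB_API_KEY environment variable.
// It panics if the variable is missing, so call it after the .env file has been loaded.
func NewLlamaClient() *openai.Client {
	apiKey := os.Getenv("LLAMAHUB_API_KEY")
	if apiKey == "" {
		panic("LLAMAHUB_API_KEY not set in .env file")
	}
	config := openai.DefaultConfig(apiKey)
	// LlamaHub-specific endpoint (adjust according to the official documentation)
	config.BaseURL = "https://api.llamahub.ai/v1"
	return openai.NewClientWithConfig(config)
}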
Add the following to the .env file:
Code snippet
LLAMAHUB_API_KEY=your_api_key_here
Step 3: Implement the semantic search function
Add the following to handlers/search.go:
Code snippet
package handlers

import (
	"context"
	"errors"
	"fmt"
	"math"
	"sort"

	"github.com/sashabaranov/go-openai"
)

// SearchRequest holds the query and the candidate texts to rank against it.
type SearchRequest struct {
	Query      string   `json:"query"`
	Candidates []string `json:"candidates"`
}

// SemanticSearch returns up to five candidates, ordered by cosine similarity to the query.
func SemanticSearch(ctx context.Context, client *openai.Client, req SearchRequest) ([]string, error) {
	if len(req.Candidates) == 0 {
		return nil, nil
	}

	// Get the embedding vectors for all candidate texts in one request.
	resp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
		Input: req.Candidates,
		Model: openai.LargeEmbedding3,
	})
	if err != nil {
		return nil, fmt.Errorf("failed to create embeddings: %w", err)
	}
	if len(resp.Data) != len(req.Candidates) {
		return nil, errors.New("embedding count mismatch")
	}

	// Get the embedding vector for the query.
	qResp, err := client.CreateEmbeddings(ctx, openai.EmbeddingRequest{
		Input: []string{req.Query},
		Model: openai.LargeEmbedding3,
	})
	if err != nil {
		return nil, fmt.Errorf("failed to create query embedding: %w", err)
	}
	if len(qResp.Data) == 0 {
		return nil, errors.New("query embedding response was empty")
	}
	queryEmbedding := qResp.Data[0].Embedding

	// Score each candidate by cosine similarity to the query (simple implementation).
	type scorePair struct {
		text  string
		score float32
	}
	scored := make([]scorePair, 0, len(resp.Data))
	for i, data := range resp.Data {
		scored = append(scored, scorePair{
			text:  req.Candidates[i],
			score: cosineSimilarity(queryEmbedding, data.Embedding),
		})
	}

	// Sort by similarity, highest first.
	sort.Slice(scored, func(i, j int) bool {
		return scored[i].score > scored[j].score
	})

	// Return the texts of the top five results.
	n := min(5, len(scored))
	results := make([]string, 0, n)
	for _, pair := range scored[:n] {
		results = append(results, pair.text)
	}
	return results, nil
}

// cosineSimilarity returns the cosine of the angle between a and b.
// It returns 0 if the lengths differ or either vector has zero magnitude.
func cosineSimilarity(a, b []float32) float32 {
	if len(a) != len(b) {
		return 0
	}
	var dotProduct, normA, normB float32
	for i := range a {
		dotProduct += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dotProduct / float32(math.Sqrt(float64(normA))*math.Sqrt(float64(normB)))
}

// min returns the smaller of a and b. Go 1.21+ has a built-in min;
// this helper keeps the code compatible with Go 1.18.
func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}
Step 4: Create the main program
Add the following to main.go:
Code snippet
package main

import (
	"context"
	"fmt"
	"log"

	"semantic-search/config"
	"semantic-search/handlers"

	"github.com/joho/godotenv"
)

func main() {
	// Load environment variables from the .env file in the working directory.
	if err := godotenv.Load(); err != nil {
		log.Fatal("Error loading .env file")
	}

	client := config.NewLlamaClient()
	ctx := context.Background()

	// Example usage
	candidates := []string{
		"Go is an open source programming language developed by Google",
		"Python is a popular high-level programming language",
		"Rust is a systems programming language focused on safety and performance",
		"JavaScript is the scripting language for Web pages",
	}

	result, err := handlers.SemanticSearch(ctx, client, handlers.SearchRequest{
		Query:      "What language should I learn for web development?",
		Candidates: candidates,
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("Top results for your query:")
	for i, res := range result {
		fmt.Printf("%d. %s\n", i+1, res)
	}
}
Step 5: Run and test
Run the program:
Code snippet
go run main.go
Example of the expected output:
Code snippet
Top results for your query:
1. JavaScript is the scripting language for Web pages
2. Python is a popular high-level programming language
3. Go is an open source programming language developed by Google
4. Rust is a systems programming language focused on safety and performance
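The order above comes from sorting the candidates by score and keeping the best few. To see that sort-and-truncate step in isolation, here is a small sketch that ranks items with made-up scores; the scored type, the topK function, and the numbers are illustrative assumptions, not real API output.

```go
package main

import (
	"fmt"
	"sort"
)

// scored pairs a candidate text with its similarity score.
type scored struct {
	Text  string
	Score float32
}

// topK returns the k highest-scoring items, best first.
// It sorts a copy, so the caller's slice is left untouched.
func topK(items []scored, k int) []scored {
	out := make([]scored, len(items))
	copy(out, items)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].Score > out[j].Score
	})
	if k > len(out) {
		k = len(out)
	}
	if k < 0 {
		k = 0
	}
	return out[:k]
}

func main() {
	// Scores are invented for illustration.
	items := []scored{
		{"Go", 0.31},
		{"JavaScript", 0.62},
		{"Rust", 0.18},
		{"Python", 0.44},
	}
	for i, s := range topK(items, 2) {
		fmt.Printf("%d. %s (%.2f)\n", i+1, s.Text, s.Score)
	}
	// Prints:
	// 1. JavaScript (0.62)
	// 2. Python (0.44)
}
```

Printing the score next to each text, as done here, is also a handy way to judge how close the top results really are.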
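Further optimizations
- Cache embeddings: cache the embedding vectors of frequently queried texts to reduce the number of API calls.
- Batching: when there are many candidate texts, process them in batches to stay within the API limits.
- Stronger error handling: add a retry mechanism to cope with API rate limiting or transient failures.
- Performance: for large datasets, consider a dedicated vector database such as Pinecone or Milvus.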
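The first two suggestions, caching and batching, can be combined in a thin wrapper around the embedding call. The following is a minimal sketch under stated assumptions: embedFunc stands in for the real API call, cachedEmbedder and its batch size are hypothetical names and values, and the cache is an unbounded in-memory map keyed by the exact text.

```go
package main

import (
	"fmt"
	"sync"
)

// embedFunc stands in for an API call that embeds a batch of texts.
type embedFunc func(texts []string) ([][]float32, error)

// cachedEmbedder memoizes embeddings by exact text and sends cache misses
// to the backend in batches of at most batchSize texts.
type cachedEmbedder struct {
	mu        sync.Mutex
	cache     map[string][]float32
	embed     embedFunc
	batchSize int
}

func newCachedEmbedder(embed embedFunc, batchSize int) *cachedEmbedder {
	if batchSize < 1 {
		batchSize = 1
	}
	return &cachedEmbedder{cache: make(map[string][]float32), embed: embed, batchSize: batchSize}
}

// Embed returns one vector per input text. The lock is held for the whole
// call, which keeps the sketch simple but serializes concurrent callers.
func (c *cachedEmbedder) Embed(texts []string) ([][]float32, error) {
	c.mu.Lock()
	defer c.mu.Unlock()

	// Collect the distinct texts that are not cached yet.
	var missing []string
	seen := make(map[string]bool)
	for _, t := range texts {
		if _, ok := c.cache[t]; !ok && !seen[t] {
			seen[t] = true
			missing = append(missing, t)
		}
	}

	// Fetch the misses batch by batch and store them.
	for start := 0; start < len(missing); start += c.batchSize {
		end := start + c.batchSize
		if end > len(missing) {
			end = len(missing)
		}
		batch := missing[start:end]
		vecs, err := c.embed(batch)
		if err != nil {
			return nil, fmt.Errorf("embed batch starting at %d: %w", start, err)
		}
		if len(vecs) != len(batch) {
			return nil, fmt.Errorf("got %d vectors for %d texts", len(vecs), len(batch))
		}
		for i, t := range batch {
			c.cache[t] = vecs[i]
		}
	}

	out := make([][]float32, len(texts))
	for i, t := range texts {
		out[i] = c.cache[t]
	}
	return out, nil
}

func main() {
	calls := 0
	// Fake backend: each "embedding" is just the text length.
	fake := func(texts []string) ([][]float32, error) {
		calls++
		vecs := make([][]float32, len(texts))
		for i, t := range texts {
			vecs[i] = []float32{float32(len(t))}
		}
		return vecs, nil
	}

	e := newCachedEmbedder(fake, 2)
	if _, err := e.Embed([]string{"go", "rust", "python"}); err != nil { // two backend calls: batches of 2 and 1
		fmt.Println("error:", err)
		return
	}
	if _, err := e.Embed([]string{"go", "python"}); err != nil { // served entirely from the cache
		fmt.Println("error:", err)
		return
	}
	fmt.Println("backend calls:", calls) // prints "backend calls: 2"
}
```

A production cache would also need an eviction policy and should key on the model name as well, since vectors from different models are not comparable.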
Troubleshooting
- API rate-limit errors:
  - LlamaHub may enforce rate limits; add a delay between requests or batch them.
  - If you receive a 429 Too Many Requests error, wait a while and then retry.
- Embedding dimension mismatch:
  - Ensure all embeddings are generated with the same model (text-embedding-3-large in our example).
- Environment variable problems:
  - If you get API key errors, verify that the .env file is in the directory you run the program from and is properly formatted.
- Cosine similarity optimization:
  - For production use, consider optimized libraries such as gonum for vector operations.
Summary
In this tutorial, we:
- Set up a Go project with LlamaHub integration.
- Implemented text embedding generation using the LlamaHub API.
- Created a semantic search function that ranks texts by relevance to a query.
- Built a simple command-line application to demonstrate semantic search.
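The complete project code is available in the GitHub repository. You can extend this basic implementation to build more sophisticated search applications, such as document retrieval systems or question-answering bots.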