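Combining Text Generation Inference with Ruby: Building a Robust API Integration

Introduction

In modern application development, text generation has become a core requirement in many scenarios, such as chatbots, content-writing assistance, and code completion. Hugging Face's Text Generation Inference (TGI) is a high-performance inference server for language models, and Ruby, an elegant programming language, is well suited to building API services. This article walks through integrating TGI with Ruby step by step to build a text generation API.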

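Prerequisites

Requirements

Ruby 2.7+ (3.0+ recommended)
Docker (to run the TGI service)
curl (for testing)
Basic knowledge of Ruby and REST APIs

Installing the required components

First make sure Docker is installed, then pull the official TGI image:

Code snippet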

docker pull ghcr.io/huggingface/text-generation-inference:1.1.0

Step 1: Start the Text Generation Inference service

We will run the TGI service with Docker. This example uses Facebook's opt-1.3b model:

Code snippet

docker run -d \
  --name tgi-server \
  -p 8080:80 \
  -e MODEL_ID=facebook/opt-1.3b \
  ghcr.io/huggingface/text-generation-inference:1.1.0

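Parameter notes:
– -d: run the container in the background
– --name: give the container a name
– -p: port mapping (host port:container port)
– -e MODEL_ID: the model to load

Practical notes:
1. The first run downloads the model (about 5 GB), which can take a while
2. For GPU acceleration, add the --gpus all flag and make sure the NVIDIA driver and the NVIDIA Container Toolkit are installed
3. For production, consider a larger model such as facebook/opt-6.7b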

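Verify that the service is running:

Code snippet

curl -i http://localhost:8080/health

A ready server answers with HTTP 200. The response body is empty, which is why -i is used to show the status line. The request fails until the model has finished loading.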
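Because the model can take a while to load, a script that depends on TGI may want to wait for the health check to pass before sending requests. The following is a minimal sketch of such a wait loop using only the Ruby standard library. The names HEALTH_PROBE and wait_until_ready are hypothetical, the URL is assumed from the port mapping above, and the probe only looks at the HTTP status code.

```ruby
require 'net/http'
require 'uri'

# Default probe: true when GET /health answers with a 2xx status.
# The URL is an assumption taken from the port mapping used above.
HEALTH_PROBE = lambda do |url = 'http://localhost:8080/health'|
  Net::HTTP.get_response(URI(url)).is_a?(Net::HTTPSuccess)
rescue SystemCallError, SocketError, Timeout::Error, EOFError
  false # server is not accepting connections yet
end

# Calls the probe up to `attempts` times, sleeping `delay` seconds between
# tries. Returns true as soon as the probe succeeds, false if it never does.
def wait_until_ready(probe: HEALTH_PROBE, attempts: 30, delay: 2,
                     sleeper: Kernel.method(:sleep))
  attempts.times do |i|
    return true if probe.call

    sleeper.call(delay) if i < attempts - 1
  end
  false
end
```

Calling wait_until_ready with the defaults polls for roughly a minute. The probe and sleeper are injectable so the loop can be exercised without a running server.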

Step 2: Create the Ruby API client

We use the httparty gem to simplify HTTP request handling. First create a Gemfile:

Code snippet

# Gemfile
source 'https://rubygems.org'

gem 'httparty'
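gem 'sinatra' # optional, for building the web interface
gem 'rackup'  # optional, lets Sinatra 4 start with `ruby app.rb`
gem 'puma'    # optional, web server that Sinatra runs on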

Install the dependencies:

Code snippet

bundle install

Create the basic client class:

Code snippet

# tgi_client.rb
require 'json'
require 'httparty'

class TextGenerationClient
  include HTTParty
  base_uri 'http://localhost:8080'

  def generate(prompt, max_new_tokens = 50, temperature = 0.9)
    options = {
      body: {
        inputs: prompt,
        parameters: {
          max_new_tokens: max_new_tokens,
          temperature: temperature,
          return_full_text: false
        }
      }.to_json,
      headers: { 'Content-Type' => 'application/json' }
    }

    self.class.post('/generate', options)
  end

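  # Streaming generation (useful for long outputs).
  # TGI answers with Server-Sent Events: lines of the form `data:{...json...}`.
  def generate_stream(prompt, max_new_tokens = 200)
    options = {
      body: {
        inputs: prompt,
        parameters: { max_new_tokens: max_new_tokens }
      }.to_json,
      headers: { 'Content-Type' => 'application/json' },
      stream_body: true # HTTParty yields the body in fragments
    }

    buffer = +''
    self.class.post('/generate_stream', options) do |fragment|
      buffer << fragment.to_s
      # A fragment may end mid-line, so only complete lines are parsed
      while (line = buffer.slice!(/\A.*?\n/))
        next unless line.start_with?('data:')

        yield JSON.parse(line.delete_prefix('data:')) if block_given?
      end
    end
  end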
end

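Code walkthrough:
1. base_uri sets the address of the TGI service
2. The generate method implements basic text generation:
– max_new_tokens: caps the number of generated tokens
– temperature: controls randomness; it must be greater than 0, and lower values give more deterministic output
3. generate_stream handles streaming responses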
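The streaming endpoint delivers its output as Server-Sent Events, and the chunks an HTTP client hands over do not necessarily line up with event boundaries. The sketch below shows one way to buffer raw chunks and emit one parsed object per complete `data:` line. SseEventBuffer is a hypothetical helper, not part of HTTParty or TGI, and it assumes each event carries a single-line JSON payload.

```ruby
require 'json'

# Hypothetical helper: accumulates raw body chunks and returns one parsed
# JSON object per complete `data:` line.
class SseEventBuffer
  def initialize
    @buffer = +''
  end

  # Feeds one chunk; returns the events completed by it (possibly none).
  def feed(chunk)
    @buffer << chunk
    events = []
    # Only lines terminated by a newline are consumed; a trailing partial
    # line stays in the buffer until the next chunk arrives.
    while (line = @buffer.slice!(/\A.*?\n/))
      line = line.strip
      next unless line.start_with?('data:')

      events << JSON.parse(line.delete_prefix('data:'))
    end
    events
  end
end
```

A caller would create one buffer per request and pass every received fragment to feed, printing event['token']['text'] for each returned event.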

Step 3: Build a Sinatra API (optional)

If you need to expose a web API to a front end or to other services:

Code snippet

# app.rb
require 'json'
require 'sinatra'
require_relative 'tgi_client'

set :port, 4567

client = TextGenerationClient.new

post '/api/generate' do
  content_type :json

  # Accept JSON bodies (API clients) as well as form posts (the demo page below)
  payload = request.media_type == 'application/json' ? JSON.parse(request.body.read) : params

  response = client.generate(
    payload['prompt'],
    (payload['max_tokens'] || 50).to_i,
    (payload['temperature'] || 0.9).to_f
  )

  { result: response['generated_text'] }.to_json
end

# Minimal demo page that posts the prompt as form data
get '/' do
  <<~HTML
    <html>
    <body>
    <h2>TGI Ruby API Demo</h2>
    <form action="/api/generate" method="post">
    <textarea name="prompt" rows="4" cols="50"></textarea><br>
    <input type="submit" value="Generate">
    </form>
    </body>
    </html>
  HTML
end

Start the Sinatra service:

Code snippet

ruby app.rb
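An endpoint that forwards user input to a model usually benefits from validating the request first. The sketch below is one possible shape for that check. parse_generate_request and MAX_TOKENS_LIMIT are hypothetical names, and the limit of 512 is an arbitrary assumption rather than a TGI constraint.

```ruby
require 'json'

MAX_TOKENS_LIMIT = 512 # hypothetical upper bound, tune for your model

# Turns a raw JSON body into validated arguments. Raises ArgumentError with
# a message that can be returned to the caller as an HTTP 400.
def parse_generate_request(raw_body)
  data = JSON.parse(raw_body)
  raise ArgumentError, 'body must be a JSON object' unless data.is_a?(Hash)

  prompt = data['prompt']
  unless prompt.is_a?(String) && !prompt.strip.empty?
    raise ArgumentError, 'prompt must be a non-empty string'
  end

  max_tokens = data.fetch('max_tokens', 50)
  unless max_tokens.is_a?(Integer) && (1..MAX_TOKENS_LIMIT).cover?(max_tokens)
    raise ArgumentError, "max_tokens must be an integer in 1..#{MAX_TOKENS_LIMIT}"
  end

  temperature = data.fetch('temperature', 0.9)
  unless temperature.is_a?(Numeric) && temperature.positive?
    raise ArgumentError, 'temperature must be a positive number'
  end

  { prompt: prompt, max_tokens: max_tokens, temperature: temperature.to_f }
rescue JSON::ParserError
  raise ArgumentError, 'body is not valid JSON'
end
```

Inside the route, the ArgumentError can be rescued and turned into a 400 response, for example with Sinatra's halt helper.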

API usage examples

Testing from the Ruby console

Code snippet

require_relative 'tgi_client'
client = TextGenerationClient.new 

# Basic generation 
response = client.generate("Once upon a time")
puts response['generated_text']

# Stream generation (print tokens as they come)
client.generate_stream("The future of AI is") do |token|
  print token['token']['text']
end

Testing with cURL

Code snippet

curl -X POST http://localhost:4567/api/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt":"Explain Ruby in simple terms","max_tokens":100}'

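Performance tuning and caveats

Batching: TGI batches concurrent requests on the server side (continuous batching) and does not expose a separate batch endpoint, so throughput improves when the client sends several prompts concurrently:

Code snippet

# Add to TextGenerationClient: one thread per prompt; the server batches
# the concurrent requests. Results come back in input order.
def batch_generate(prompts)
  prompts.map { |prompt| Thread.new { generate(prompt) } }.map(&:value)
end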
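When the list of prompts is long, it is usually safer to cap the number of requests in flight than to start one thread per prompt. The following is a minimal sketch of a bounded concurrent map built on Queue and Thread from the Ruby core. concurrent_map is a hypothetical helper, and the default of four workers is an assumption to adjust for your deployment.

```ruby
# Runs the block for every item on at most `workers` threads and returns the
# results in input order. Hypothetical helper, not part of HTTParty or TGI.
def concurrent_map(items, workers: 4)
  results = Array.new(items.length)
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close # once closed and drained, pop returns nil

  threads = Array.new([workers, items.length].min) do
    Thread.new do
      while (job = queue.pop)
        item, index = job
        results[index] = yield(item)
      end
    end
  end
  threads.each(&:join) # join re-raises an exception raised in a worker
  results
end
```

With the client from Step 2, a call could look like concurrent_map(prompts, workers: 8) { |prompt| client.generate(prompt) }.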

Timeouts: set a reasonable timeout for HTTP requests:
Code snippet
```
class TextGenerationClient
  default_timeout(10) # seconds
end
```
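A timeout on its own only makes a slow request fail; whether to try again is a separate decision. The sketch below retries a block with exponential backoff when it raises Net::OpenTimeout or Net::ReadTimeout, the standard-library timeout errors that HTTParty requests surface. with_timeout_retries is a hypothetical helper, and the attempt count and delays are assumptions.

```ruby
require 'net/http' # defines Net::OpenTimeout and Net::ReadTimeout

# Retries the block when it times out, waiting base_delay * 2**(n - 1)
# seconds after the n-th failure. Re-raises the last timeout once
# `attempts` tries have been used.
def with_timeout_retries(attempts: 3, base_delay: 0.5,
                         sleeper: Kernel.method(:sleep))
  tries = 0
  begin
    tries += 1
    yield
  rescue Net::OpenTimeout, Net::ReadTimeout
    raise if tries >= attempts

    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end
```

Retrying a generation request repeats the work on the server, so a small number of attempts is usually enough.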

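Error handling: add basic error handling. Connection failures and timeouts are not subclasses of HTTParty::Error, so they are rescued explicitly:

Code snippet

def generate(prompt, max_new_tokens = 50)
  # ... existing code ...
rescue HTTParty::Error, SocketError, SystemCallError, Timeout::Error => e
  { 'error' => "API request failed: #{e.message}" }
rescue JSON::ParserError
  { 'error' => 'Invalid response format' }
end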
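Exceptions cover only part of the picture: HTTParty returns a response object for 4xx and 5xx statuses instead of raising, so the status code needs to be checked as well. The sketch below maps a status and raw body to a uniform result. interpret_generate_response is a hypothetical helper, and it assumes that error bodies are JSON objects with an "error" field.

```ruby
require 'json'

# Maps an HTTP status and raw body to a uniform result hash.
# Assumes error bodies are JSON objects with an "error" field; anything
# else is reported with a generic message.
def interpret_generate_response(status, body)
  parsed = begin
    JSON.parse(body.to_s)
  rescue JSON::ParserError
    nil
  end

  if (200..299).cover?(status) && parsed.is_a?(Hash) && parsed.key?('generated_text')
    { ok: true, text: parsed['generated_text'] }
  elsif parsed.is_a?(Hash) && parsed['error']
    { ok: false, status: status, error: parsed['error'].to_s }
  else
    { ok: false, status: status, error: 'unexpected response from the generation service' }
  end
end
```

With an HTTParty response this would be called as interpret_generate_response(response.code, response.body).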

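Production recommendations:
- Consider Kubernetes for running multiple TGI replicas
- Put Nginx in front of the Ruby application as a reverse proxy and load balancer
- Protect the API endpoints with JWT authentication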

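Summary

In this article we:
1. Deployed the Text Generation Inference service with Docker
2. Wrapped the core generation features in a Ruby client
3. (Optional) Built a web API layer with Sinatra

The complete project layout:

Code snippet

/project-root
├── Gemfile
├── Gemfile.lock
├── tgi_client.rb # TGI client implementation
├── app.rb        # Sinatra web interface (optional)
└── README.md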

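The advantages of this architecture are:
– Decoupling: the text generation service is separate from the application logic
– Extensibility: the backend model can be swapped and the API extended with little effort
– Performance: it builds on TGI's efficient inference

Possible next steps include a caching layer, rate limiting, and more elaborate prompt engineering. Hopefully this tutorial helps you get a Ruby and TGI text generation system up and running quickly!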