Mastering Custom Tool Development with Ruby LangChain: Applications and Optimization in Web Development

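Introduction

In modern web development, LangChain is a powerful framework for integrating language models, and it helps developers build intelligent applications quickly. This article walks through building custom LangChain tools in Ruby and shows how to apply and optimize them in web development scenarios.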

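Prerequisites

Before you start, make sure your development environment meets the following requirements:

Ruby 3.0+
The Bundler gem
An OpenAI API key (or a key for another LLM provider)
Basic familiarity with the Rails or Sinatra web framework

Install the required gems (the examples below also use ruby-openai, nokogiri, and sinatra):

Code snippet

gem install langchainrb ruby-openai nokogiri sinatra dotenv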

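Step 1: Create a basic LangChain tool

Let's start with a simple custom tool that fetches the page at a given URL and summarizes its content.

Code snippet

require 'langchain'
require 'nokogiri'
require 'open-uri'

class WebSummarizerTool < Langchain::Tool::Base
  # Tool name and description (the LLM uses the description to decide whether to call this tool).
  # The tool-definition API differs between langchainrb versions, so check the one you have installed.
  name "WebSummarizer"
  description <<~DESC
    Use this tool when the user needs information from a web page or a summary of its content.
    The input should be a valid URL.
    The output is a summary of the page content.
  DESC

  # The method that does the actual work
  def execute(input)
    # 1. Fetch the page
    html = URI.open(input).read

    # 2. Parse the HTML with Nokogiri
    doc = Nokogiri::HTML(html)

    # 3. Extract the main content (simplified here)
    content = doc.css('body').text.gsub(/\s+/, ' ').strip

    # 4. Ask the LLM for a summary (a real project should do this asynchronously)
    response = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
      .complete(prompt: "Summarize the following content in 100 words or fewer: #{content[0..2000]}") # cap the input length

    # Return the result
    response.completion
  rescue => e
    "Error: #{e.message}"
  end
end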

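Code walkthrough:

Tool definition: inherit from Langchain::Tool::Base and define a name and description; the LLM uses these to decide when to call the tool.
HTML handling: Nokogiri parses the HTML and extracts the text content.
Length limit: the input sent to the LLM is truncated to avoid exceeding the token limit.
Error handling: network and parsing errors are caught and returned as a message.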
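
The length limit in the tool cuts the page text at a fixed character index, which can end the prompt in the middle of a word. The sketch below shows one way to trim at a word boundary instead. The helper name `truncate_for_prompt` and its `max_chars` parameter are assumptions made for illustration (they are not part of langchainrb), and characters are only a rough proxy for tokens.

```ruby
# Hypothetical helper: trims text to a character budget before it is
# interpolated into a prompt.
def truncate_for_prompt(text, max_chars: 2000)
  normalized = text.to_s.gsub(/\s+/, ' ').strip
  return normalized if normalized.length <= max_chars

  cut = normalized[0, max_chars]
  # Prefer to cut at the last space so the text does not end mid-word.
  last_space = cut.rindex(' ')
  cut = cut[0, last_space] if last_space && last_space > 0
  "#{cut}..."
end
```

Inside `execute`, this could stand in for `content[0..2000]` when building the prompt.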

Step 2: Integrate the tool into a web application

Next we integrate the tool into a simple Sinatra application:

Code snippet

require 'sinatra'
require 'json'
require 'uri'
require_relative 'web_summarizer_tool'

# Initialize LangChain
$llm = Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
$agent = Langchain::Agent::ReActAgent.new(llm: $llm)
$agent.add_tool(WebSummarizerTool.new)

post '/summarize' do
  content_type :json

  begin
    data = JSON.parse(request.body.read)

    # URL validation (simple version)
    uri = URI.parse(data["url"].to_s)
    raise "Invalid URL" unless uri.is_a?(URI::HTTP) && uri.host

    # Run the task through LangChain
    result = $agent.run("Please summarize the content of this web page: #{data["url"]}")

    { status: "success", summary: result }.to_json

  rescue => e
    status 400
    { status: "error", message: e.message }.to_json
  end
end

# HTML front end (simplified)
get '/' do
  <<~HTML
    <!DOCTYPE html>
    <html>
    <head><title>Web Page Summarizer</title></head>
    <body>
    <h1>Web Page Summary Service</h1>
    <form id="summaryForm">
    <input type="url" name="url" placeholder="Enter a URL" required>
    <button type="submit">Get summary</button>
    </form>
    <div id="result"></div>

    <script>
    document.getElementById('summaryForm').addEventListener('submit', async (e) => {
      e.preventDefault();
      const url = e.target.url.value;
      const response = await fetch('/summarize', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ url })
      });
      const data = await response.json();
      // textContent keeps LLM output from being interpreted as HTML
      const p = document.createElement('p');
      if (data.status !== 'success') p.className = 'error';
      p.textContent = data.status === 'success' ? data.summary : data.message;
      document.getElementById('result').replaceChildren(p);
    });
    </script>
    </body>
    </html>
  HTML
end

puts "Service started, visit http://localhost:#{settings.port}"

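Web integration highlights:

API endpoint: a /summarize endpoint handles summary requests.
Input validation: the user-supplied URL gets basic validation.
Error handling: errors are caught and returned as JSON.
Front-end interaction: a simple form demonstrates how to call the API.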
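
The validation and error-handling points can be exercised without starting a server by pulling them into a plain function. This is a minimal sketch using only the standard library; `parse_summarize_request` and the shape of its return value are assumptions for illustration, not part of Sinatra or langchainrb.

```ruby
require 'json'
require 'uri'

# Hypothetical helper: returns [http_status, payload_hash] for a raw request body.
def parse_summarize_request(raw_body)
  data = JSON.parse(raw_body)
  url = data.is_a?(Hash) ? data['url'].to_s : ''
  uri = URI.parse(url)
  # Accept only http/https URLs that actually name a host.
  unless uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
    return [400, { status: 'error', message: 'Invalid URL' }]
  end

  [200, { status: 'ok', url: url }]
rescue JSON::ParserError
  [400, { status: 'error', message: 'Request body must be valid JSON' }]
rescue URI::InvalidURIError
  [400, { status: 'error', message: 'Invalid URL' }]
end
```

A route handler could call this first and only hand the URL to the agent when the status is 200.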

Step 3: Performance optimization and practical lessons

Calling the LLM directly in a real web application can cause performance problems. Here are a few optimization suggestions:

1. Caching

Code snippet

# Example Redis cache (requires the redis gem)
require 'redis'
require 'digest'

class CachedWebSummarizer < WebSummarizerTool
  def execute(input)
    cache_key = "summary:#{Digest::MD5.hexdigest(input)}"

    cached_result = redis.get(cache_key)
    return cached_result if cached_result

    result = super
    redis.setex(cache_key, 3600, result) # cache for 1 hour
    result
  end

  private

  def redis
    @redis ||= Redis.new(url: ENV['REDIS_URL'] || 'redis://localhost:6379')
  end
end

# Switch the agent to the cached version:
$agent.add_tool(CachedWebSummarizer.new)
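
To experiment with the same cache-aside pattern without a Redis server, an in-memory stand-in is enough. The sketch below is an assumption-laden illustration: `MemoryTtlCache`, `summary_cache_key`, and the injectable `clock` are hypothetical names, and the cache is neither shared across processes nor bounded in size.

```ruby
require 'digest'

# Hypothetical in-memory stand-in for Redis, for local experiments only.
class MemoryTtlCache
  def initialize(clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for key, or runs the block and stores its result.
  def fetch(key, ttl:)
    entry = @entries[key]
    return entry[:value] if entry && entry[:expires_at] > @clock.call

    value = yield
    @entries[key] = { value: value, expires_at: @clock.call + ttl }
    value
  end
end

# Builds a cache key from the URL, in the same "summary:<digest>" shape as above.
def summary_cache_key(url)
  "summary:#{Digest::SHA256.hexdigest(url)}"
end
```

Passing the clock in makes expiry easy to test, because a test can advance time without sleeping.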

2. HTML preprocessing

The simple HTML handling in the original example may not be robust enough. An improved version:

Code snippet

def extract_main_content(doc)
  # Prefer the <article> tag
  if (article = doc.at_css('article'))
    return article.text.gsub(/\s+/, ' ').strip
  end

  # Look for Medium-style content containers
  if (container = doc.at_css('[role="main"], .post-content, .article-body'))
    return container.text.gsub(/\s+/, ' ').strip
  end

  # Fall back to <body>, with scripts, styles, navigation, and footer removed
  doc.css('script, style, nav, footer').remove
  doc.at_css('body').text.gsub(/\s+/, ' ').strip
end

# In execute, replace the extraction step with:
content = extract_main_content(doc)

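Deployment notes for the web application

Timeout settings:

Code snippet

configure do
  set :show_exceptions, false
  # Custom setting: Sinatra stores the value but does not act on it by itself, so enforce it
  # in the app server or HTTP client, or have the front end poll for results.
  set :timeout, [10, 15]
end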
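
One way to actually bound how long a slow call may run is Ruby's standard `Timeout` module. The sketch below is a minimal illustration; `with_deadline` and its `fallback` parameter are hypothetical names. `Timeout` interrupts the block at an arbitrary point, so a timeout option on the HTTP or LLM client, where one is available, is usually the safer choice.

```ruby
require 'timeout'

# Hypothetical wrapper: runs the block and returns its value, or returns
# fallback when the block takes longer than the given number of seconds.
def with_deadline(seconds, fallback:)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  fallback
end
```

A route could wrap the summarization call this way and answer with a "still processing" payload when the fallback is returned.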

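Asynchronous processing:

For long-running tasks, use a job queue:

Code snippet

# Sidekiq example (requires the sidekiq gem)
require 'sidekiq'

class SummaryJob
  include Sidekiq::Worker

  def perform(url)
    # The summary ends up in the Redis cache, where a status endpoint can look it up
    CachedWebSummarizer.new.execute(url)
  end
end

# The API endpoint becomes:
post '/summarize' do
  content_type :json
  data = JSON.parse(request.body.read)
  job_id = SummaryJob.perform_async(data["url"])
  { job_id: job_id }.to_json
end

# A /status endpoint checks for the result...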
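
The job_id and status flow can be tried out in a single process before bringing in Sidekiq. The following is only a sketch under that assumption: `InMemoryJobs` is a hypothetical class that runs each job in a thread and keeps results in a hash, so nothing survives a restart and there is no retry logic.

```ruby
require 'securerandom'

# Hypothetical in-process job store that mimics the job_id / status flow.
class InMemoryJobs
  def initialize
    @jobs = {}
    @threads = {}
    @lock = Mutex.new
  end

  # Starts the block in a background thread and returns a job id immediately.
  def enqueue(&work)
    id = SecureRandom.hex(8)
    @lock.synchronize { @jobs[id] = { state: 'queued' } }
    @threads[id] = Thread.new do
      result = work.call
      @lock.synchronize { @jobs[id] = { state: 'done', result: result } }
    rescue StandardError => e
      @lock.synchronize { @jobs[id] = { state: 'failed', error: e.message } }
    end
    id
  end

  # What a /status endpoint would return for the given job id.
  def status(id)
    @lock.synchronize { @jobs.fetch(id, { state: 'unknown' }).dup }
  end

  # Blocks until the job has finished; mainly useful in tests.
  def wait(id)
    @threads[id]&.join
  end
end
```

The front end would post the URL, receive the id, and then poll the status until the state is `done` or `failed`.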

Example application in a web development scenario

Let's look at a more complex e-commerce use case, a product review analyzer:

Code snippet

require 'json'

class ProductReviewAnalyzer < Langchain::Tool::Base
  name "ReviewAnalyzer"
  description <<~DESC
    Analyzes the sentiment and key points of product reviews. The input should be a JSON list of reviews.
    The output is an analysis report.
  DESC

  def execute(input)
    reviews = JSON.parse(input) rescue []
    # Without reviews the average rating below would be NaN
    return "No reviews to analyze." if reviews.empty?

    prompt_template = <<~PROMPT
      You are an experienced product manager. Analyze the following #{reviews.size} user reviews:

      Main strengths:
      - %{positives}

      Main weaknesses:
      - %{negatives}

      Sentiment score (0-100): %{sentiment_score}

      Suggested improvements:
      - %{suggestions}

      Sample of original reviews:
      %{sample_reviews}
    PROMPT

    analysis_prompt = format(prompt_template,
      positives: extract_keywords(reviews, positive: true).join(", "),
      negatives: extract_keywords(reviews, positive: false).join(", "),
      sentiment_score: sentiment_score(reviews),
      suggestions: generate_suggestions(reviews),
      sample_reviews: reviews.sample(3).map { |r| "#{r['rating']} stars: #{r['text'].to_s[0..50]}..." }.join("\n")
    )

    llm.complete(prompt: analysis_prompt).completion
  end

  private

  # Average star rating mapped onto a 0-100 scale
  def sentiment_score(reviews)
    avg_rating = reviews.sum { |r| r['rating'].to_i } / reviews.size.to_f
    (avg_rating / 5 * 100).round
  end

  # Reviews rated 4 stars or higher count as positive
  def extract_keywords(reviews, positive:)
    texts = reviews.select { |r| (r['rating'].to_i >= 4) == positive }.map { |r| r['text'] }

    keywords_prompt = "Extract the #{positive ? 'positive' : 'negative'} keywords from the following text as a comma-separated list:\n#{texts.join("\n")}"

    llm.complete(prompt: keywords_prompt).completion.split(/,\s*/) rescue []
  end

  # Reviews rated 2 stars or lower feed the improvement suggestions
  def generate_suggestions(reviews)
    negative_reviews = reviews.select { |r| r['rating'].to_i <= 2 }

    suggestion_prompt = <<~PROMPT
      Based on the following customer complaints, propose product improvements:

      #{negative_reviews.map { |r| "- #{r['text']}" }.join("\n")}

      List 3-5 specific, actionable suggestions:
    PROMPT

    llm.complete(prompt: suggestion_prompt).completion.split("\n").map(&:strip).join(", ") rescue ""
  end

  def llm
    @llm ||= Langchain::LLM::OpenAI.new(api_key: ENV["OPENAI_API_KEY"])
  end
end

# Example usage in a Rails controller:
def analyze_reviews
  analyzer_tool = ProductReviewAnalyzer.new

  respond_to do |format|
    format.json do
      analysis_result = analyzer_tool.execute(params[:reviews].to_json)
      render json: { analysis_report: analysis_result }
    end
  end
end

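Review analyzer key points:

Structured output: the LLM-generated report follows a fixed template, which makes it easy to display on the front end.
Multi-level analysis:

The sentiment score is computed from the actual star ratings.
The LLM is used only for keyword extraction and for generating suggestions.
The sample reviews keep the original data visible, which supports the report's credibility.

Rails integration: the JSON interface makes the tool easy to integrate with existing systems.
Performance: when there are many reviews, process them in batches or analyze a sample.
Safety: rescue clauses keep invalid input from taking down the service.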
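
The score calculation and the batching advice can be checked in isolation, since neither needs an LLM. The sketch below assumes reviews are hashes with a `'rating'` key, as in the analyzer; `review_batches` and its `batch_size` default are hypothetical, and the score returns nil for an empty list rather than dividing by zero.

```ruby
# Average star rating mapped onto a 0-100 scale; nil when there is nothing to score.
def sentiment_score(reviews)
  return nil if reviews.empty?

  avg_rating = reviews.sum { |r| r['rating'].to_i } / reviews.size.to_f
  (avg_rating / 5 * 100).round
end

# Hypothetical helper: splits reviews into batches that each fit in one prompt.
def review_batches(reviews, batch_size: 20)
  reviews.each_slice(batch_size).to_a
end
```

Each batch can then be sent through the analyzer separately and the partial reports combined afterwards.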

LangChain debugging tips

When your custom tool does not behave as expected:

1. Logging:

Code snippet

require 'delegate'

class LoggingWrapper < SimpleDelegator
  def execute(input)
    tool_name = __getobj__.class.name # name of the wrapped tool, not of the wrapper
    Rails.logger.info "[LangChain] #{tool_name} received input: #{input}"
    result = __getobj__.execute(input)
    Rails.logger.debug "[LangChain] Tool output: #{result}"
    result
  rescue => e
    Rails.logger.error "[LangChain] Error in #{__getobj__.class.name}: #{e.message}"
    raise
  end
end

# Wrap the original tool:
$agent.add_tool(LoggingWrapper.new(ProductReviewAnalyzer.new))
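
Outside Rails there is no `Rails.logger`, so the same wrapper idea can take any standard `Logger` instead. The sketch below assumes that setup; `LoggedTool` and the stand-in `EchoTool` are hypothetical names used only to make the example runnable.

```ruby
require 'delegate'
require 'logger'
require 'stringio'

# Hypothetical wrapper that logs through an injected Logger.
class LoggedTool < SimpleDelegator
  def initialize(tool, logger)
    super(tool)
    @logger = logger
  end

  def execute(input)
    name = __getobj__.class.name
    @logger.info("[LangChain] #{name} received input: #{input}")
    result = __getobj__.execute(input)
    @logger.debug("[LangChain] #{name} output: #{result}")
    result
  rescue StandardError => e
    @logger.error("[LangChain] Error in #{__getobj__.class.name}: #{e.message}")
    raise
  end
end

# Stand-in tool so the wrapper can be tried without an LLM.
class EchoTool
  def execute(input)
    "echo: #{input}"
  end
end

log = StringIO.new
tool = LoggedTool.new(EchoTool.new, Logger.new(log))
tool.execute("hi")
puts log.string
```

Writing to a `StringIO` keeps the log inspectable in tests; in production the logger would point at a file or standard output.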

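2. Testing prompts:

Test the prompt directly in the Rails console:

Code snippet

test_review = [{ rating: "5", text: "Fast shipping, packaging intact" }] # ...
puts ProductReviewAnalyzer.new.execute(test_review.to_json)

3. Token counting:

Code snippet

llm.count_tokens("your prompt text") # make sure this stays under the model's limit (e.g. GPT-3 is typically 4096 tokens)

4. Agent visualization (development environment):

Add a route that shows the agent's decision process:

Code snippet

get '/debug/agent' do
  agent_logs = $agent.instance_variable_get(:@execution_logs) || []
  content_type :json
  agent_logs.to_json
end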

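Web deployment best practices

A production deployment needs to cover:

1. Dockerizing (example Dockerfile):

Code snippet

FROM ruby:<version>-alpine

RUN apk add --no-cache build-base

WORKDIR /app

COPY Gemfile* ./

RUN bundle config set --local without 'development test' && \
    bundle install && \
    bundle clean --force

COPY . .

ENV PORT=<your_port>

EXPOSE $PORT

# The exec form of CMD does not expand $PORT, so run the command through a shell
CMD ["sh", "-c", "bundle exec rackup --host 0.0.0.0 -p $PORT"]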

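2. Health check endpoint:

Code snippet

get '/health' do
  content_type :json
  checks = {
    redis: $redis.ping == "PONG",
    openai: $llm.models.list.success?
  }

  # Hash#all? yields [key, value] pairs, which are always truthy, so test the values
  status checks.values.all? ? 200 : 503
  checks.to_json
end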
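
A probe that raises (for example, because Redis is unreachable) would turn the health endpoint into a 500 instead of a clean 503. The sketch below shows one way to guard against that; `run_checks` and `health_status` are hypothetical helpers, and the probes are passed in as lambdas so the logic does not depend on any particular client.

```ruby
# Runs each probe and treats an exception as a failed check.
def run_checks(probes)
  probes.transform_values do |probe|
    probe.call ? true : false
  rescue StandardError
    false
  end
end

# 200 only when every individual check passed.
def health_status(checks)
  checks.values.all? ? 200 : 503
end
```

In the route, the probes would wrap the Redis ping and whatever lightweight LLM request the client supports.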

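3. Rate-limiting middleware (Rack::Attack example):

Code snippet

require 'rack/attack'

class RackApp < Sinatra::Application
  use Rack::Attack

  # Limits come from the environment, defaulting to 60 requests per 60 seconds
  Rack::Attack.throttle('API requests',
                        limit: ENV.fetch("RATE_LIMIT", 60).to_i,
                        period: ENV.fetch("RATE_PERIOD", 60).to_i) do |req|
    req.ip if req.path.start_with?('/summarize')
  end
end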
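
To make the throttle's limit and period concrete, here is a plain-Ruby fixed-window counter. It is a simplified sketch of the general technique, not Rack::Attack's implementation; `FixedWindowLimiter` and its injectable `clock` are hypothetical, and the counts live in process memory only.

```ruby
# Hypothetical fixed-window limiter: at most `limit` requests per client
# in each window of `period` seconds.
class FixedWindowLimiter
  def initialize(limit:, period:, clock: -> { Time.now.to_f })
    @limit = limit
    @period = period
    @clock = clock
    @counts = Hash.new(0)
  end

  # Returns true when the request is allowed, false when it should be throttled.
  def allow?(client_id)
    window = (@clock.call / @period).floor
    # Drop counters that belong to earlier windows.
    @counts.delete_if { |key, _| key[1] < window }
    key = [client_id, window]
    @counts[key] += 1
    @counts[key] <= @limit
  end
end
```

A fixed window is simple but allows short bursts across a window boundary; a sliding window or token bucket smooths that out at the cost of more bookkeeping.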

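4. Monitoring metrics (Prometheus example, using the prometheus-client gem):

Code snippet

require 'prometheus/client'
require 'prometheus/client/formats/text'
require 'prometheus/middleware/collector'

configure do
  use Prometheus::Middleware::Collector
end

# Call LANGCHAIN_REQUESTS.increment wherever a tool handles a request
LANGCHAIN_REQUESTS = Prometheus::Client.registry.counter(
  :langchain_requests_total,
  docstring: 'Total number of LangChain requests handled'
)

get '/metrics' do
  content_type Prometheus::Client::Formats::Text::CONTENT_TYPE
  Prometheus::Client::Formats::Text.marshal(Prometheus::Client.registry)
end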

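Web security considerations

Pay particular attention to the following when exposing the API publicly:

1. Input sanitization:

Stricter URL validation:

Code snippet

require 'uri'
require 'cgi'

def valid_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

def sanitize_input(text, max_length = 2000)
  CGI.escapeHTML(text.to_s[0...max_length])
end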
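
Because the tool fetches whatever URL the user supplies, a syntactically valid URL can still point at an internal service. The sketch below shows a partial guard using the standard `ipaddr` library; `internal_address?` is a hypothetical helper, and it only recognizes literal IP addresses. Hostnames such as `localhost` would need to be resolved first and each resolved address checked the same way.

```ruby
require 'ipaddr'
require 'uri'

# Returns true when the URL's host is a literal loopback, private,
# or link-local IP address. Hostnames are not resolved here.
def internal_address?(url)
  host = URI.parse(url).hostname
  return false if host.nil?

  ip = IPAddr.new(host)
  ip.loopback? || ip.private? || ip.link_local?
rescue IPAddr::Error, URI::InvalidURIError
  # Not a literal IP address (or not a parsable URL).
  false
end
```

A request would be rejected when `valid_url?` fails or when this check returns true.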

2. Authentication middleware:

Code snippet

require 'jwt'

before '/summarize' do
  halt 401 unless request.env['HTTP_API_KEY'] == ENV.fetch("API_KEY")
end

before '/analyze' do
  verify_jwt(request.env['HTTP_AUTHORIZATION']) || halt(403)
end

def verify_jwt(token)
  JWT.decode(token, Rails.application.secret_key_base)[0]
rescue StandardError
  false
end

3. Sensitive data filtering:

Filter API keys out of the logs:

Code snippet

filtered_keys = ['password', 'api_key', 'secret']

around_action do |controller, action|
  filtered_params = controller.params.deep_dup.tap do |params|
    filtered_keys.each { |k| params[k] &&= '[FILTERED]' }
  end

  Rails.logger.info "[Request] #{controller.request.method} #{controller.request.path} params: #{filtered_params}"

  action.call
ensure
  Rails.logger.info "[Response] #{controller.response.status}"
end
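
The filter above only looks at top-level keys, so a secret nested inside another hash or array would still reach the log. A recursive version is sketched below for plain hashes and arrays; `filter_sensitive` and its `keys` parameter are hypothetical, and Rails parameter objects would need converting to a hash first.

```ruby
# Hypothetical helper: returns a copy of value with sensitive keys masked
# at any nesting depth. Keys are compared as strings.
def filter_sensitive(value, keys: %w[password api_key secret])
  case value
  when Hash
    value.each_with_object({}) do |(k, v), out|
      out[k] = keys.include?(k.to_s) ? '[FILTERED]' : filter_sensitive(v, keys: keys)
    end
  when Array
    value.map { |v| filter_sensitive(v, keys: keys) }
  else
    value
  end
end
```

The original structure is left untouched, so the request itself still carries the real values.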

Front-end optimization tips

Key points for a better user experience:

1. Loading state feedback:

Enhanced front-end JavaScript:

Code snippet

document.getElementById('summaryForm').addEventListener('submit', async (e) => {
  e.preventDefault();

  const submitBtn = e.target.querySelector('button[type="submit"]');
  const originalText = submitBtn.textContent;

  try {
    submitBtn.disabled = true;
    submitBtn.innerHTML = '<span class="spinner"></span>Processing...';

    const response = await fetch('/summarize', /*...*/);

    if (!response.ok) {
      throw new Error(`Request failed: ${response.status}`);
    }

    const data = await response.json();

    // Show the result with an animation
    document.getElementById('result').innerHTML = `
      <div class="result-animation">${data.summary}</div>`;
  } catch (error) {
    showToast(error.message, 'error');
  } finally {
    submitBtn.disabled = false;
    submitBtn.textContent = originalText;
  }
});

function showToast(message, type) { /*...*/ }

/* CSS animation example: */
.result-animation {
  animation: fadeIn 0.5s ease-out;
}

@keyframes fadeIn {
  from { opacity: 0; }
  to { opacity: 1; }
}

2. History:

Save the query history in localStorage:

Code snippet

function saveToHistory(url, summary) {
  const history = JSON.parse(localStorage.getItem('summaryHistory') || '[]');
  history.unshift({ url, summary, timestamp: Date.now() });
  localStorage.setItem('summaryHistory', JSON.stringify(history.slice(0, 10)));
}

function loadHistory() { /* render the history list */ }

// In the success callback, add:
saveToHistory(url, data.summary);
loadHistory();

3. Progressively enhanced UI:

Example of an advanced editor feature:

Code snippet

if (window.showSaveFilePicker) { // feature-detect the modern browser API
  document.getElementById('exportBtn').addEventListener('click', async () => {
    try {
      const handle = await window.showSaveFilePicker({
        suggestedName: 'summary.md',
        types: [{ description: 'Markdown file', accept: { 'text/markdown': ['.md'] } }]
      });

      const writable = await handle.createWritable();
      await writable.write(`# Page summary\n\n${currentSummary}`);
      await writable.close();
    } catch (err) {
      console.error(err);
    }
  });
} else {
  $('#exportBtn').click(() => {
    // Fallback...
  });
}

4. Performance tracking:

Front-end performance monitoring snippet:

“`
const perfObserver=new PerformanceObserver((list)=>{
list.getEntries