Building Multimodal Applications with LangChain in Python: A Hands-On Chatbot Example

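Introduction

In the 2025 AI ecosystem, multimodal applications have become a mainstream trend. This article walks through building a multimodal chatbot with Python's LangChain framework that accepts both text and image input. Along the way, you will see how to combine different AI models to create a richer interactive experience.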

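Prerequisites

Environment requirements

Python 3.9+
LangChain 0.1.0+
An OpenAI API key (or a key for another supported model API)
Access to a multimodal model (such as GPT-4 Vision)

Installing dependencies

Code snippet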

pip install langchain openai pillow python-dotenv

Configuration file (.env)

Code snippet

OPENAI_API_KEY=your_api_key_here
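
A missing or placeholder key usually only surfaces as an error at the first API call, so it can help to check for it at startup. The following is a minimal sketch using only the standard library. `require_env` is a hypothetical helper (not part of LangChain or python-dotenv), and it assumes the variables have already been loaded into the process environment, for example by `load_dotenv()`.

```python
import os


def require_env(name, placeholder="your_api_key_here"):
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: `placeholder` is the sample value from the .env
    template above and is treated as "not configured".
    """
    value = os.environ.get(name, "").strip()
    if not value or value == placeholder:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it in your shell"
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` once at program start would then fail fast with a readable message instead of a late authentication error.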

Project structure

Code snippet

multimodal-chatbot/
├── main.py          # Main program
├── utils.py         # Helper functions
├── .env             # Environment configuration
└── requirements.txt # Dependency list

Core implementation

1. Initializing the multimodal chain

Code snippet

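from langchain.chains import LLMChain
# GPT-4 is a chat model, so use the chat wrapper instead of the
# completion-style OpenAI class. On newer LangChain releases the same class
# is provided by the separate langchain-openai package
# (from langchain_openai import ChatOpenAI).
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from dotenv import load_dotenv
import os

# Load environment variables from .env
load_dotenv()

def initialize_multimodal_chain():
    # Define the prompt template for basic text handling
    text_prompt = PromptTemplate(
        input_variables=["text_input"],
        template=(
            "You are a helpful assistant. Give a helpful reply to the following input.\n"
            "Input: {text_input}\n"
            "Reply:"
        ),
    )

    # Initialize the LLM (OpenAI GPT-4 here)
    llm = ChatOpenAI(
        model_name="gpt-4",
        temperature=0.7,
        max_tokens=1000,
        openai_api_key=os.getenv("OPENAI_API_KEY"),
    )

    # Create the text-processing chain
    text_chain = LLMChain(llm=llm, prompt=text_prompt)

    return text_chain

# Note: real multimodal processing also requires integrating a vision model; the structure here is simplified.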
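
To make the "integrate a vision model" step more concrete, here is a minimal sketch of what a combined text-and-image request body can look like. It assumes the content-part layout used by OpenAI's chat completions API for vision-capable models; other providers may expect a different shape. `build_multimodal_message` is a hypothetical helper name, not a LangChain function.

```python
def build_multimodal_message(text, image_b64, mime_type="image/jpeg"):
    """Build one user message that carries both text and an inline image.

    The image is embedded as a base64 data URL, so no file hosting is needed.
    Hypothetical helper; the dict layout follows OpenAI's chat format.
    """
    if not image_b64:
        raise ValueError("image_b64 must be a non-empty base64 string")
    data_url = f"data:{mime_type};base64,{image_b64}"
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

A message built this way would be passed in the `messages` list of a chat completion request to a vision-capable model.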

2. Image processing module

Code snippet

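from PIL import Image
import base64
from io import BytesIO

def process_image(image_path):
    """
    Load an image and convert it to a base64-encoded JPEG string.

    Args:
        image_path (str): Path to the image file.

    Returns:
        str: The base64-encoded image.
    """
    with Image.open(image_path) as img:
        buffered = BytesIO()
        # JPEG has no alpha channel, so convert RGBA/palette images to RGB first
        img.convert("RGB").save(buffered, format="JPEG")
        img_str = base64.b64encode(buffered.getvalue()).decode("utf-8")

    return img_str

def get_image_description(image_str):
    """
    Return a description of the image (a real project should call a vision API).

    Args:
        image_str (str): The base64-encoded image.

    Returns:
        str: A text description of the image.
    """
    # Mock implementation; a real one would call an API such as GPT-4 Vision.

    # TODO: Replace with an actual API call in production, for example
    # (openai>=1.0 client interface; substitute a vision-capable model
    # that is available to your account):
    # client = openai.OpenAI()
    # response = client.chat.completions.create(
    #     model="gpt-4-vision-preview",
    #     messages=[
    #         {
    #             "role": "user",
    #             "content": [
    #                 {"type": "text", "text": "Describe the contents of this image"},
    #                 {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_str}"}},
    #             ],
    #         }
    #     ],
    #     max_tokens=300,
    # )
    # return response.choices[0].message.content

    return "[Mock] This is an image containing a variety of elements"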
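
`process_image` re-encodes every input as JPEG, which is lossy and discards transparency. One alternative, sketched below under the assumption that the target API accepts the original format, is to send the file bytes unchanged and guess the MIME type from the file extension. `file_to_data_url` is a hypothetical helper that uses only the standard library.

```python
import base64
import mimetypes


def file_to_data_url(path, default_mime="application/octet-stream"):
    """Read a file and return it as a base64 data URL, without re-encoding.

    Hypothetical helper: the MIME type is guessed from the file extension
    only; the file contents are not inspected.
    """
    mime_type, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime_type or default_mime};base64,{encoded}"
```

The trade-off is that nothing validates the file as a real image, so the Pillow-based version remains the safer choice when inputs are untrusted.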

3. The multimodal chatbot main class

Code snippet

class MultimodalChatbot:
    def __init__(self):
        self.text_chain = initialize_multimodal_chain()

    def process_input(self, input_data, input_type="text"):
        """
        Handle user input.

        Args:
            input_data: A text string, or an image path.
            input_type: "text" or "image".

        Returns:
            str: The AI-generated reply.
        """
        if input_type == "text":
            return self._process_text(input_data)
        elif input_type == "image":
            return self._process_image(input_data)
        else:
            raise ValueError("Unsupported input type")

    def _process_text(self, text_input):
        """Handle plain-text input."""
        return self.text_chain.run(text_input=text_input)

    def _process_image(self, image_input):
        """Handle image input (a file path)."""
        if not isinstance(image_input, str):
            # Without this check, image_str below would be undefined
            raise TypeError("Image input must be a file path string")
        if not os.path.exists(image_input):
            raise FileNotFoundError(f"Image file not found: {image_input}")

        image_str = process_image(image_input)
        description = get_image_description(image_str)

        prompt = f"The user uploaded an image. The system's description of it is: {description}\nPlease write a friendly reply."

        return self.text_chain.run(text_input=prompt)

# Note: a real project should also add error handling and logging.
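
As one possible shape for that error handling and logging, the sketch below wraps a call in a decorator that logs each failure and retries a limited number of times. It uses only the standard library. `with_retry` and the logger name are hypothetical, and which exceptions are actually worth retrying depends on the API client in use.

```python
import functools
import logging
import time

logger = logging.getLogger("multimodal_chatbot")  # hypothetical logger name


def with_retry(max_attempts=3, delay_seconds=0.0, exceptions=(Exception,)):
    """Decorator: log failures and retry the wrapped call a few times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as exc:
                    logger.warning(
                        "%s failed (attempt %d/%d): %s",
                        func.__name__, attempt, max_attempts, exc,
                    )
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay_seconds)
        return wrapper
    return decorator
```

Such a decorator could be applied to methods like `_process_text`, ideally with `exceptions` narrowed to transient network or rate-limit errors rather than every `Exception`.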

4. Example CLI interface

Code snippet

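def run_cli_chat():
    """Command-line chat interface."""

    print("Multimodal chatbot started (type 'exit' to quit)")

    bot = MultimodalChatbot()

    while True:
        user_input = input("\nYou: ")

        if user_input.lower() == 'exit':
            break

        try:
            if user_input.startswith('!img '):
                # "!img <path>" sends an image; strip whitespace around the path
                image_path = user_input[5:].strip()
                response = bot.process_input(image_path, input_type="image")
            else:
                # Anything else is treated as plain text
                response = bot.process_input(user_input, input_type="text")
            print(f"\nAI: {response}")
        except Exception as e:
            print(f"\nError: {e}")


if __name__ == "__main__":
    run_cli_chat()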
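
The command handling in the loop can also be pulled out into a small pure function, which makes it testable without a live model. This is a sketch only: `parse_user_input` is a hypothetical helper, and the `!img` prefix and `exit` keyword are taken from the CLI above.

```python
def parse_user_input(raw, image_prefix="!img"):
    """Classify one line of CLI input.

    Returns a tuple (input_type, payload), where input_type is
    "exit", "image", or "text".
    """
    stripped = raw.strip()
    if stripped.lower() == "exit":
        return ("exit", "")
    head, _, rest = stripped.partition(" ")
    if head == image_prefix:
        # Tolerate extra spaces and optional quotes around the path
        path = rest.strip().strip('"').strip("'")
        if not path:
            raise ValueError(f"usage: {image_prefix} <path-to-image>")
        return ("image", path)
    return ("text", stripped)
```

With this in place, the loop would only need to dispatch on the returned `input_type` and pass the payload to `bot.process_input`.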

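The complete example code is available in the GitHub repository: [example repository link]

This article covered the basics of building a multimodal chatbot with LangChain. As AI technology develops, multimodal interaction will become an increasingly important capability. Directions worth exploring next:

Other advanced LangChain features (memory, tool use, and so on)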
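
To illustrate what "memory" means for a chatbot, here is a plain-Python sketch that keeps the last few exchanges and prepends them to the next prompt. LangChain ships its own memory components, and this sketch deliberately does not use them. `ConversationMemory` is a hypothetical class for illustration only.

```python
from collections import deque


class ConversationMemory:
    """Keep the last few exchanges and render them as prompt context."""

    def __init__(self, max_turns=5):
        # A deque with maxlen silently drops the oldest turn when full
        self._turns = deque(maxlen=max_turns)

    def add_turn(self, user_text, ai_text):
        """Record one completed user/AI exchange."""
        self._turns.append((user_text, ai_text))

    def as_prompt(self, new_input):
        """Prefix the new input with the remembered history."""
        lines = []
        for user_text, ai_text in self._turns:
            lines.append(f"User: {user_text}")
            lines.append(f"AI: {ai_text}")
        lines.append(f"User: {new_input}")
        return "\n".join(lines)
```

The string returned by `as_prompt` could be passed as `text_input` to the text chain, with `add_turn` called after each reply. Bounding the history keeps the prompt within the model's context limit.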

Building Multimodal Applications with LangChain in Python: A Hands-On Chatbot Example (May 2025)
