A Must-Learn for May 2025: Hands-On Stable Diffusion for C++ Developers

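Introduction

Stable Diffusion has become one of the most popular models for AI image generation. As a C++ developer, you may be wondering how to integrate it into your own application. This article starts from scratch and builds a simple Stable Diffusion application in C++.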

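Preparation

Environment requirements

Operating system: Linux (Ubuntu 22.04+ recommended) or Windows (WSL2 required)
Compiler: a C++17-capable compiler (GCC 9+/Clang 10+/MSVC 19.28+)
Python environment (for model conversion)
CMake 3.20+
ONNX Runtime (1.15+)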

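Installing dependencies

Code snippet

# Ubuntu/Debian
sudo apt install build-essential cmake python3-pip git

# Python dependencies (diffusers and transformers are imported by the conversion script)
pip install torch torchvision diffusers transformers onnx onnxruntime

# The C++ build also needs the ONNX Runtime C/C++ headers and library,
# which the pip package does not provide (for example, use an official release archive)

# Windows users are advised to use WSL2 or to manage dependencies with vcpkg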

Project structure

Code snippet

stable-diffusion-cpp/
├── CMakeLists.txt
├── include/
│   └── diffusion.h
├── src/
│   ├── diffusion.cpp
│   └── main.cpp
└── models/
    └── (downloaded model files)

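Core implementation steps

1. Model preparation and conversion

First, we need to convert the PyTorch model to ONNX format so that it can be used from C++:

Code snippet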

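# convert_to_onnx.py
# NOTE: schematic only. StableDiffusionPipeline is not a torch.nn.Module and
# takes a text prompt, so torch.onnx.export cannot trace it directly as shown.
import os

import torch
from diffusers import StableDiffusionPipeline

# Load the pretrained model
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# Example inputs used for the export
dummy_input = {
    "prompt": ["a photograph of an astronaut riding a horse"],
    "num_inference_steps": torch.tensor(50, dtype=torch.long),
    "guidance_scale": torch.tensor(7.5, dtype=torch.float32),
}

# Make sure the output directory exists
os.makedirs("models", exist_ok=True)

# Export to ONNX format
torch.onnx.export(
    pipe,
    (dummy_input,),  # a trailing dict in the args tuple is passed as keyword arguments
    "models/sd_v1_5.onnx",
    opset_version=14,
    input_names=["prompt", "num_steps", "guidance"],
    output_names=["image"],
    dynamic_axes={
        "prompt": {0: "batch"},
        "image": {0: "batch"},
    },
)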

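Notes:
1. The conversion needs at least 16GB of RAM; running it on a GPU is recommended
2. The ONNX file can be large (about 5GB), so make sure there is enough disk space
3. In practice, the text encoder, UNet, and VAE decoder are usually exported as separate ONNX models (for example with Hugging Face Optimum), and the sampling loop runs in the calling code; the single-file model used in this article is a simplification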
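
The `guidance_scale` input in the export script controls classifier-free guidance. As a minimal sketch, assuming the common formulation in which each sampling step combines an unconditional and a text-conditioned noise prediction, the combination could look like this in C++. The helper name `apply_guidance` is hypothetical and not part of any library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Classifier-free guidance:
//   eps = eps_uncond + scale * (eps_cond - eps_uncond)
// Both inputs are flattened noise predictions of equal length.
// apply_guidance is an illustrative helper, not a library function.
std::vector<float> apply_guidance(const std::vector<float>& eps_uncond,
                                  const std::vector<float>& eps_cond,
                                  float guidance_scale) {
    if (eps_uncond.size() != eps_cond.size()) {
        throw std::invalid_argument("noise predictions differ in size");
    }
    std::vector<float> out(eps_uncond.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = eps_uncond[i] + guidance_scale * (eps_cond[i] - eps_uncond[i]);
    }
    return out;
}
```

With a scale of 1.0 the result equals the text-conditioned prediction; larger values such as 7.5 push the result further in the direction of the prompt.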

2. C++ interface wrapper

Create the diffusion.h header file:

Code snippet

// include/diffusion.h
#pragma once

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>
#include <onnxruntime_cxx_api.h>

class StableDiffusion {
public:
    // Load the ONNX model from model_path
    explicit StableDiffusion(const std::string& model_path);

    // Generate an image and return the raw bytes of the output tensor
    std::vector<uint8_t> generate_image(
        const std::string& prompt,
        int steps = 50,
        float guidance_scale = 7.5f);

private:
    Ort::Env env_;
    Ort::Session session_;

    // Helper: wrap a caller-owned buffer in a tensor. No copy is made,
    // so the buffer must outlive the returned value.
    Ort::Value create_tensor(const std::vector<int64_t>& shape,
                             void* data, size_t byte_count,
                             ONNXTensorElementDataType type);
};

3. Core C++ implementation

Code snippet

// src/diffusion.cpp
#include "diffusion.h"
#include <stdexcept>

// Note: on Windows, Ort::Session expects a wide-character model path
StableDiffusion::StableDiffusion(const std::string& model_path)
    : env_(ORT_LOGGING_LEVEL_WARNING, "StableDiffusion"),
      session_(env_, model_path.c_str(), Ort::SessionOptions{}) {

    if (session_.GetInputCount() != 3) {
        throw std::runtime_error("Invalid model - expected 3 inputs");
    }
}

Ort::Value StableDiffusion::create_tensor(
        const std::vector<int64_t>& shape,
        void* data, size_t byte_count,
        ONNXTensorElementDataType type) {

    auto memory_info = Ort::MemoryInfo::CreateCpu(
        OrtAllocatorType::OrtArenaAllocator,
        OrtMemType::OrtMemTypeDefault);

    // The untyped overload needs the element type passed explicitly
    return Ort::Value::CreateTensor(
        memory_info, data, byte_count,
        shape.data(), shape.size(), type);
}

std::vector<uint8_t> StableDiffusion::generate_image(
        const std::string& prompt,
        int steps,
        float guidance_scale) {

    // Prepare inputs (simplified example; real use needs fuller preprocessing)
    const char* input_names[] = {"prompt", "num_steps", "guidance"};
    const std::vector<int64_t> scalar_shape = {1};

    // String tensors cannot wrap a raw char buffer: allocate one, then fill it
    // (a real application would run a complete text-encoding step instead)
    Ort::AllocatorWithDefaultOptions allocator;
    auto prompt_tensor = Ort::Value::CreateTensor(
        allocator, scalar_shape.data(), scalar_shape.size(),
        ONNX_TENSOR_ELEMENT_DATA_TYPE_STRING);
    const char* prompt_strings[] = {prompt.c_str()};
    prompt_tensor.FillStringTensor(prompt_strings, 1);

    // The export script declares the step count as int64 (torch.long)
    int64_t steps64 = steps;
    auto steps_tensor = create_tensor(
        scalar_shape, &steps64, sizeof(steps64),
        ONNX_TENSOR_ELEMENT_DATA_TYPE_INT64);

    auto guidance_tensor = create_tensor(
        scalar_shape, &guidance_scale, sizeof(guidance_scale),
        ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);

    Ort::Value inputs[] = {
        std::move(prompt_tensor),
        std::move(steps_tensor),
        std::move(guidance_tensor)
    };

    // Run inference
    const char* output_names[] = {"image"};

    auto outputs = session_.Run(
        Ort::RunOptions{nullptr},
        input_names, inputs, 3,
        output_names, 1);

    // Process the output (simplified example)
    if (outputs.size() != 1) {
        throw std::runtime_error("Unexpected number of outputs");
    }

    auto& image_output = outputs[0];
    if (!image_output.IsTensor()) {
        throw std::runtime_error("Expected tensor output");
    }

    // Copy the raw tensor data. This assumes a uint8 output tensor;
    // a real application still has to convert it to an RGB image.
    const auto* raw_data = image_output.GetTensorData<uint8_t>();
    size_t data_len = image_output.GetTensorTypeAndShapeInfo().GetElementCount();

    return std::vector<uint8_t>(raw_data, raw_data + data_len);
}
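
The bytes returned by `generate_image` follow the memory layout of the output tensor. As a rough sketch, assuming the model emits a single 8-bit image in channel-first (CHW) order, which the article does not confirm, a hypothetical helper `chw_to_hwc` could rearrange it into the interleaved RGB order that most image writers expect:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Rearranges planar CHW data (all R, then all G, then all B) into
// interleaved HWC data (RGBRGB...). Assumes exactly 3 channels.
// chw_to_hwc is an illustrative helper, not part of ONNX Runtime.
std::vector<std::uint8_t> chw_to_hwc(const std::vector<std::uint8_t>& chw,
                                     std::size_t height, std::size_t width) {
    const std::size_t channels = 3;
    const std::size_t plane = height * width;
    if (chw.size() != channels * plane) {
        throw std::invalid_argument("buffer size does not match 3 x H x W");
    }
    std::vector<std::uint8_t> hwc(chw.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t i = 0; i < plane; ++i) {
            hwc[i * channels + c] = chw[c * plane + i];
        }
    }
    return hwc;
}
```

If the model already produces interleaved data, this step is unnecessary; inspecting the output shape reported by the session is the way to tell.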

4. Example main function

Code snippet

// src/main.cpp
#include "diffusion.h"
#include <fstream>
#include <iostream>
#include <stdexcept>

int main() {
    try {
        // Initialize the model (make sure the path is correct)
        StableDiffusion sd("models/sd_v1_5.onnx");

        // Generate an image from a text prompt
        auto image_data = sd.generate_image(
            "a beautiful sunset over mountains in anime style",
            30,    // steps
            7.5f); // guidance scale

        // Save the raw output to a file (a real application should encode it as PNG/JPG)
        std::ofstream out("output.bin", std::ios::binary);
        out.write(reinterpret_cast<const char*>(image_data.data()),
                  static_cast<std::streamsize>(image_data.size()));
        if (!out) {
            throw std::runtime_error("Failed to write output.bin");
        }

        std::cout << "Image generated successfully! Saved to output.bin\n";

    } catch (const std::exception& e) {
        std::cerr << "Error: " << e.what() << "\n";
        return 1;
    }

    return 0;
}
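
The example above writes raw bytes that no image viewer can open. As a minimal sketch of a viewable alternative, the hypothetical helper below writes a binary PPM (P6) file. PPM is used here in place of PNG/JPG only because it needs no external library. It assumes the data is already interleaved 8-bit RGB and that the image dimensions are known to the caller.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Writes interleaved 8-bit RGB data as a binary PPM (P6) file.
// Returns false if the buffer size does not match width x height x 3
// or if the file cannot be written.
// write_ppm is an illustrative helper, not part of the article's project.
bool write_ppm(const std::string& path,
               const std::vector<std::uint8_t>& rgb,
               std::size_t width, std::size_t height) {
    if (rgb.size() != width * height * 3) {
        return false;
    }
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return false;
    }
    // Header: magic number, dimensions, maximum channel value
    out << "P6\n" << width << " " << height << "\n255\n";
    out.write(reinterpret_cast<const char*>(rgb.data()),
              static_cast<std::streamsize>(rgb.size()));
    return static_cast<bool>(out);
}
```

In `main`, a call such as `write_ppm("output.ppm", image_data, width, height)` could replace the `output.bin` write once the real output dimensions have been read from the model.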

CMake configuration

Code snippet

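# CMakeLists.txt
cmake_minimum_required(VERSION 3.20)
project(StableDiffusionCPP)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Assumes a find module or package config named ONNXRuntime is visible to CMake;
# package and target names vary between ONNX Runtime distributions
find_package(ONNXRuntime REQUIRED)

add_executable(sd_app
    src/diffusion.cpp
    src/main.cpp)

target_include_directories(sd_app PRIVATE include)
target_link_libraries(sd_app PRIVATE onnxruntime)

if(WIN32) # Windows-specific setting (if used)
   target_link_libraries(sd_app PRIVATE onnxruntime.lib)
endif()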

Build & Run

Code snippet

mkdir build && cd build
cmake ..
make -j4   # use make on Linux/macOS; on Windows use cmake --build .

# Run from the project root so the relative path models/sd_v1_5.onnx resolves
cd ..
./build/sd_app

Advanced Topics & Optimization Tips

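Performance optimization:

GPU acceleration: configure ONNX Runtime to use the CUDA/ROCm backend

Code snippet

// Requires a GPU build of ONNX Runtime. Build the options first,
// then pass them to the Ort::Session constructor.
Ort::SessionOptions session_options;
OrtCUDAProviderOptions cuda_options;
session_options.AppendExecutionProvider_CUDA(cuda_options);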

Memory optimization:
- FP16 conversion: convert the model to FP16 (half) precision to reduce memory usage
  Code snippet
```
pipe.to(torch.float16)  # add before the PyTorch export
```
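
To make the saving concrete: an FP32 weight takes 4 bytes and an FP16 weight takes 2, so weight storage halves. The sketch below works through the arithmetic. The parameter count of 860 million is an illustrative assumption rather than a value measured from the exported model, and `weight_bytes` is a hypothetical helper.

```cpp
#include <cstdint>

// Bytes needed to store `param_count` weights at a given width.
// Only weights are counted; activations and runtime buffers are extra.
constexpr std::uint64_t weight_bytes(std::uint64_t param_count,
                                     std::uint64_t bytes_per_param) {
    return param_count * bytes_per_param;
}

// Illustrative parameter count (an assumption, not measured from the model).
constexpr std::uint64_t kExampleParams = 860000000ULL;

constexpr std::uint64_t kFp32Bytes = weight_bytes(kExampleParams, 4);  // 3,440,000,000
constexpr std::uint64_t kFp16Bytes = weight_bytes(kExampleParams, 2);  // 1,720,000,000
```
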
Input preprocessing:
- The CLIP text encoder should be implemented in C++ for best performance
- The tokenization step can be optimized separately
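
One small piece of the tokenization step can be sketched without any library: framing and padding the token ids. The constants below (sequence length 77, start token 49406, end token 49407, padding with the end token) are assumptions based on the CLIP tokenizer commonly paired with Stable Diffusion v1.x, and should be checked against the tokenizer configuration of the model in use. The BPE step that produces the ids is not shown, and `pad_token_ids` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed CLIP tokenizer constants for SD v1.x; verify against the
// tokenizer configuration shipped with your model.
const std::size_t kMaxTokens = 77;
const std::int64_t kStartToken = 49406;
const std::int64_t kEndToken = 49407;

// Wraps already-tokenized ids with start/end markers, truncates to
// kMaxTokens, and pads the remainder with the end token.
std::vector<std::int64_t> pad_token_ids(const std::vector<std::int64_t>& ids) {
    std::vector<std::int64_t> out;
    out.reserve(kMaxTokens);
    out.push_back(kStartToken);
    for (std::int64_t id : ids) {
        if (out.size() == kMaxTokens - 1) {
            break;  // leave room for the end token
        }
        out.push_back(id);
    }
    out.push_back(kEndToken);
    out.resize(kMaxTokens, kEndToken);
    return out;
}
```

The resulting fixed-length vector can back an int64 tensor of shape {1, 77} for a separately exported text encoder.
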
Output post-processing:
- The VAE decoder can be integrated on the C++ side to reduce the Python dependency
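
Around a C++-side VAE decoder, two small numeric steps are needed. The sketch below assumes the conventions commonly used for Stable Diffusion v1.x: latents are divided by a scaling factor of 0.18215 before decoding, and the decoder output lies in [-1, 1]. Both are assumptions to verify against the VAE configuration of the model in use. The helper names are hypothetical, and the decoder call itself is not shown.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed VAE scaling factor for SD v1.x (check the model's VAE config).
const float kVaeScale = 0.18215f;

// Undo the latent scaling before feeding latents to the VAE decoder.
std::vector<float> unscale_latents(const std::vector<float>& latents) {
    std::vector<float> out(latents.size());
    std::transform(latents.begin(), latents.end(), out.begin(),
                   [](float v) { return v / kVaeScale; });
    return out;
}

// Map decoder output from [-1, 1] to 8-bit [0, 255], clamping
// out-of-range values.
std::vector<std::uint8_t> to_uint8(const std::vector<float>& decoded) {
    std::vector<std::uint8_t> out(decoded.size());
    std::transform(decoded.begin(), decoded.end(), out.begin(), [](float v) {
        const float unit = std::min(1.0f, std::max(0.0f, v * 0.5f + 0.5f));
        return static_cast<std::uint8_t>(std::lround(unit * 255.0f));
    });
    return out;
}
```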

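Troubleshooting: common problems and fixes

Problem 1: the ONNX model fails to load
- Solution:

Code snippet

pip install --upgrade onnxruntime-gpu  # GPU build; for CPU only, use the onnxruntime package
export LD_LIBRARY_PATH=/path/to/onnxruntime/lib:$LD_LIBRARY_PATH  # Linux: set the library search path

Problem 2: tensor shape mismatch errors
- Checklist:
1. Are the PyTorch and ONNX versions compatible?
2. Do the shapes of the tensors passed at runtime match the dummy_input used at export time?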
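
One quick way to narrow down a shape mismatch is to compare the element count implied by a shape with the size of the buffer behind it. The sketch below is a hypothetical helper, not part of ONNX Runtime. It treats negative dimensions as unknown, since dynamic axes are reported as -1.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when a buffer holding `buffer_elements` elements can back
// a tensor of the given shape. Shapes containing a dynamic (negative)
// dimension cannot be checked and yield false.
// shape_matches is an illustrative helper.
bool shape_matches(const std::vector<std::int64_t>& shape,
                   std::size_t buffer_elements) {
    std::size_t count = 1;  // an empty shape denotes a scalar
    for (std::int64_t dim : shape) {
        if (dim < 0) {
            return false;
        }
        count *= static_cast<std::size_t>(dim);
    }
    return count == buffer_elements;
}
```

For example, a shape of {1, 3, 512, 512} implies 786432 elements, so a buffer of any other length points to a layout or export problem.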

Problem 3: Windows linker errors (LNK2019)
- Solution:
Make sure Visual Studio has the v143 toolset installed and that the ONNX Runtime library path is configured correctly

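Conclusion

This article covered:

✅ A basic framework for loading and running a Stable Diffusion ONNX model in C++
✅ Basic usage of the ONNX Runtime API
✅ A simplified end-to-end flow for an AI image-generation application

Directions for further improvement:

🔹Add complete text-encoding preprocessing (CLIP tokenizer)
🔹Implement a C++ version of the VAE decoder to improve performance
🔹Support batched inference to improve throughput

The complete project code is available in a GitHub repository (placeholder address): https://github.com/example/stable-diffusion-cpp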