A Guide to Fine-Tuning DeepSeek After Installing It on a Mac

Introduction

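DeepSeek is a powerful open-source large language model. Deploying and fine-tuning it locally on a Mac lets you tailor the model to your own needs. This guide walks through installing DeepSeek on a Mac and fine-tuning it, in steps that a beginner can follow.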

Preparation

System requirements

macOS 12 (Monterey) or later
Python 3.8+
16 GB of RAM or more recommended (8 GB can work, but performance is limited)
A Mac with an M1/M2 chip recommended (Intel Macs also work, but more slowly)
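
The requirements above can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; the function name check_system and its default thresholds are illustrative choices that mirror the list above, not part of any DeepSeek tooling.

```python
import platform
import sys

def check_system(min_macos=(12, 0), min_python=(3, 8)):
    """Return a dict of basic environment checks for this guide."""
    mac_ver = platform.mac_ver()[0]          # empty string on non-macOS systems
    macos_ok = False
    if mac_ver:
        parts = tuple(int(p) for p in mac_ver.split(".")[:2])
        parts = (parts + (0,))[:2]           # pad "14" to (14, 0)
        macos_ok = parts >= min_macos
    return {
        "macos_version": mac_ver or None,
        "macos_ok": macos_ok,
        "python_ok": sys.version_info[:2] >= min_python,
        # Reports False under Rosetta, where the interpreter sees x86_64
        "apple_silicon": bool(mac_ver) and platform.machine() == "arm64",
    }

for name, value in check_system().items():
    print(f"{name}: {value}")
```

A False value for macos_ok or python_ok means the corresponding requirement is not met on the machine running the script.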

Installing prerequisite software

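First, make sure the following tools are installed on your Mac:

Code snippet

# Install Homebrew (if it is not already installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Install Python and the required tools
brew install python git cmake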

DeepSeek installation steps

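1. Create a virtual environment

Code snippet

# Create the project directory
mkdir deepseek_finetune && cd deepseek_finetune

# Create a Python virtual environment (Homebrew installs the interpreter as python3)
python3 -m venv venv

# Activate the virtual environment
source venv/bin/activate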

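Note: Activate the virtual environment each time before you start working. This keeps the project's packages from conflicting with other Python installations.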
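
To confirm that the environment is actually active before installing packages, a short check like the one below can help. It is a sketch that relies on the standard-library convention that sys.prefix differs from sys.base_prefix inside a venv; the helper name in_virtualenv is illustrative.

```python
import sys

def in_virtualenv():
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)

print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path should point inside the project's venv directory.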

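2. Install PyTorch and dependencies

Install PyTorch for your Mac's chip type. The same command applies to both:

Code snippet

# M1/M2 users: the standard macOS arm64 wheels include the MPS backend (PyTorch 1.12 and later),
# so the nightly CPU index is not needed
pip install torch torchvision torchaudio

# Intel users
pip install torch torchvision torchaudio

Then install the other dependencies:

Code snippet

pip install transformers datasets accelerate sentencepiece peft bitsandbytes scipy pandas numpy tqdm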

3. Download the DeepSeek model

Code snippet

from transformers import AutoModelForCausalLM, AutoTokenizer

# Base (non-chat) 7B checkpoint on the Hugging Face Hub
model_name = "deepseek-ai/deepseek-llm-7b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype="auto")

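Notes:
– The first run downloads about 15 GB of model files, so make sure you have enough disk space and a stable network connection
– device_map="auto" automatically places the model on the available hardware (CPU/GPU)
– M-series chips use Metal acceleration (through PyTorch's MPS backend)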
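
The download size follows from simple arithmetic: the number of parameters times the bytes stored per parameter. The sketch below works through it and checks free disk space with the standard library. The round figure of 7e9 parameters and the 16-bit storage assumption are approximations, and both helper names are illustrative.

```python
import shutil

def estimate_weights_gb(n_params, bytes_per_param=2):
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

def has_free_space(path, required_gb):
    """True if the filesystem holding `path` has at least `required_gb` GB free."""
    return shutil.disk_usage(path).free / 1e9 >= required_gb

size_gb = estimate_weights_gb(7e9)   # assumed: about 7 billion parameters at 2 bytes each
print(f"~{size_gb:.0f} GB of weights at 16-bit precision")
print("enough space for the download:", has_free_space(".", 15))
```

The same arithmetic shows why loading in 32-bit precision roughly doubles the footprint: estimate_weights_gb(7e9, 4) gives about 28 GB.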

DeepSeek model fine-tuning guide

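1. Prepare the training data

Create a training file in JSON format named train_data.json:

Code snippet

[
    {"instruction": "Explain what machine learning is", "output": "Machine learning is a branch of artificial intelligence..."},
    {"instruction": "How do I write a simple HTTP server in Python", "output": "You can use Python's built-in http.server module..."},
    {"instruction": "Describe how the TCP/IP protocol works", "output": "The TCP/IP protocol suite is the foundation of the Internet..."}
]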
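
Malformed records are easier to catch before tokenization than during training. The following is a small validation sketch for the instruction/output format shown above; the helper names validate_records and validate_file are hypothetical, and the rules (both fields present, both non-empty strings) are an assumption about what this workflow needs.

```python
import json

REQUIRED_KEYS = ("instruction", "output")

def validate_records(records):
    """Return a list of problems; an empty list means the data looks fine."""
    if not isinstance(records, list):
        return ["top-level JSON value must be a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not a JSON object")
            continue
        for key in REQUIRED_KEYS:
            value = rec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"record {i}: missing or empty '{key}'")
    return problems

def validate_file(path):
    """Load a JSON file and validate its records."""
    with open(path, encoding="utf-8") as f:
        return validate_records(json.load(f))

sample = [
    {"instruction": "Explain what machine learning is", "output": "Machine learning is..."},
    {"instruction": "", "output": "orphaned answer"},
]
print(validate_records(sample))
```

Calling validate_file("train_data.json") applies the same checks to the file on disk.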

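2. Data preprocessing code

Code snippet

from datasets import load_dataset

dataset = load_dataset("json", data_files="train_data.json")

MAX_LENGTH = 512

# Some tokenizers ship without a pad token; fall back to EOS so padding works
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

def preprocess_function(examples):
    input_ids, attention_mask, labels = [], [], []
    for instruction, output in zip(examples["instruction"], examples["output"]):
        prompt = f"Instruction: {instruction}\nOutput: "
        prompt_ids = tokenizer(prompt)["input_ids"]
        # Tokenize the answer without special tokens and terminate it with EOS
        answer_ids = tokenizer(output, add_special_tokens=False)["input_ids"] + [tokenizer.eos_token_id]

        # A causal LM trains on prompt + answer as one sequence
        ids = (prompt_ids + answer_ids)[:MAX_LENGTH]
        # -100 masks the prompt so the loss covers only the answer;
        # the model shifts the labels by one position internally
        lab = ([-100] * len(prompt_ids) + answer_ids)[:MAX_LENGTH]

        pad = MAX_LENGTH - len(ids)
        input_ids.append(ids + [tokenizer.pad_token_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
        labels.append(lab + [-100] * pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

tokenized_dataset = dataset.map(
    preprocess_function, batched=True, remove_columns=dataset["train"].column_names
)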
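
The goal stated in the preprocessing comments, computing the loss only on the output and not on the instruction text, is commonly achieved by concatenating prompt and answer into one sequence and setting the prompt positions in the labels to -100. The sketch below shows that idea on made-up token IDs so the shapes are easy to inspect; build_example is an illustrative helper, not a library function.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

def build_example(prompt_ids, answer_ids, max_length, pad_id):
    """Concatenate prompt and answer, mask the prompt in the labels, pad to max_length."""
    input_ids = (prompt_ids + answer_ids)[:max_length]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + answer_ids)[:max_length]
    pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * pad,
        "attention_mask": [1] * len(input_ids) + [0] * pad,
        "labels": labels + [IGNORE_INDEX] * pad,
    }

ex = build_example(prompt_ids=[11, 12, 13], answer_ids=[21, 22], max_length=8, pad_id=0)
print(ex["input_ids"])       # [11, 12, 13, 21, 22, 0, 0, 0]
print(ex["attention_mask"])  # [1, 1, 1, 1, 1, 0, 0, 0]
print(ex["labels"])          # [-100, -100, -100, 21, 22, -100, -100, -100]
```

All three lists have the same length, which is what the Trainer's default collation needs, and only the answer tokens contribute to the loss.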

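3. LoRA fine-tuning configuration (saves memory)

Code snippet

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=8,                  # LoRA rank (smaller values use less memory)
    lora_alpha=32,        # LoRA scaling factor (often set to a multiple of r)
    target_modules=["q_proj", "v_proj"],   # Layers LoRA is applied to (query and value projections)
    lora_dropout=0.05,    # Dropout rate to reduce overfitting
    bias="none",          # Whether LoRA trains bias parameters ("none" leaves them frozen)
    task_type="CAUSAL_LM",   # Wrap the model as a causal language model
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()   # Trainable parameters should be far fewer than the full model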
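
To see why the trainable parameter count is so small, it helps to count what LoRA adds: for a linear layer with input size d_in and output size d_out, the two low-rank matrices hold r * (d_in + d_out) parameters. The sketch below applies that formula. The hidden size of 4096 and the 30 transformer layers are assumptions about the 7B model, so treat the total as an estimate and compare it with what print_trainable_parameters() reports.

```python
def lora_param_count(r, d_in, d_out):
    """Parameters LoRA adds to one linear layer: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)

HIDDEN_SIZE = 4096      # assumed hidden size of the 7B model
NUM_LAYERS = 30         # assumed number of transformer layers
TARGETS_PER_LAYER = 2   # q_proj and v_proj, as in the configuration above

per_module = lora_param_count(r=8, d_in=HIDDEN_SIZE, d_out=HIDDEN_SIZE)
total = per_module * TARGETS_PER_LAYER * NUM_LAYERS
print(f"{per_module:,} per module, {total:,} in total")
```

Under these assumptions the adapter has about 3.9 million trainable parameters, a small fraction of a percent of a 7-billion-parameter model. Halving r halves that number.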

4. Training configuration and execution

Code snippet

import torch
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,     # Keep at 1 or 2 on a Mac because of memory limits
    gradient_accumulation_steps=4,     # Gradient accumulation simulates a larger batch size
    num_train_epochs=3,
    learning_rate=3e-4,
    logging_steps=10,
    save_steps=100,
    # Mixed precision on M-series chips (MPS backend only); if your transformers
    # version rejects fp16 on MPS, set this to False
    fp16=torch.backends.mps.is_available(),
)

trainer = Trainer(
    model=peft_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
)

trainer.train()

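Practical notes:
– Training large models on a Mac is slow, so start with a small dataset to verify that the pipeline works end to end
– On M-series chips, fp16=True can speed up training, but watch for exploding gradients (for example, a loss that turns into NaN)
– Intel Mac users may need to set fp16=False and lower the learning rate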
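
Before starting a slow run, it is worth estimating how many optimizer steps the configuration implies. The sketch below does the arithmetic for a single device; the helper name training_steps and the example size of 100 records are assumptions, and the Trainer's own step count can differ slightly when the dataset size is not a multiple of the effective batch.

```python
import math

def training_steps(num_examples, per_device_batch_size, grad_accum_steps, epochs):
    """Return (effective batch size, approximate total optimizer steps) on one device."""
    effective_batch = per_device_batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs

effective, total = training_steps(
    num_examples=100, per_device_batch_size=1, grad_accum_steps=4, epochs=3
)
print(f"effective batch size: {effective}, optimizer steps: {total}")
```

With these numbers the run takes 75 optimizer steps, so a save_steps value of 100 would never trigger a checkpoint during training. For small test datasets, a lower save_steps may be more useful.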

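5. Save and use the fine-tuned model

Code snippet

# Save the adapter weights (only the LoRA part, so the files are small)
peft_model.save_pretrained("deepseek_lora_adapter")

# Example of loading and using the fine-tuned model:
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# MPS is specific to M-series chips; fall back to the CPU elsewhere
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Half precision roughly halves the memory needed for the base weights
base_model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-llm-7b-base", torch_dtype=torch.float16
)
tuned_model = PeftModel.from_pretrained(base_model, "deepseek_lora_adapter").to(device)
tuned_model.eval()

def generate_response(instruction):
    prompt = f"Instruction: {instruction}\nOutput: "

    # Inputs must live on the same device as the model
    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    with torch.no_grad():
        outputs = tuned_model.generate(
            **inputs,                 # input_ids and attention_mask
            max_new_tokens=256,
            do_sample=True,
            temperature=0.7,
            top_p=0.9,
            repetition_penalty=1.1,
            no_repeat_ngram_size=2,
            use_cache=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens instead of stripping the prompt text
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()

print(generate_response("Explain what transfer learning is"))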
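
The temperature and top_p settings passed to generate control how the next token is sampled. The following is a conceptual sketch in plain Python of what the two settings do to a probability distribution. It is not the transformers implementation, and the function names and toy numbers are illustrative.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature < 1 sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)                              # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_indices(probs, top_p):
    """Indices of the smallest set of most-probable tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

probs = [0.5, 0.3, 0.15, 0.05]
print(top_p_indices(probs, top_p=0.9))   # [0, 1, 2]

logits = [2.0, 1.0, 0.0]
print(softmax_with_temperature(logits, 1.0)[0] < softmax_with_temperature(logits, 0.7)[0])  # True
```

With top_p=0.9 the least likely token is excluded from sampling, and a temperature of 0.7 shifts probability toward the most likely token, which tends to make outputs less random.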

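Mac-specific optimization tips

Metal performance optimization:

Code snippet

import torch

# Use the MPS backend on M-series chips and fall back to the CPU otherwise,
# so that `device` is always defined
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model.to(device)                      # Move the model to the selected device

print(f"Using device: {device}")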
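
When MPS is not picked up, it helps to know why. The sketch below distinguishes a PyTorch build without MPS support from a machine where MPS is built in but unavailable, using torch.backends.mps.is_built() and is_available(). The function name describe_mps and the wording of the reasons are illustrative, and the code falls back gracefully when PyTorch is not installed.

```python
def describe_mps():
    """Return (device, reason), where device is "mps" or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu", "PyTorch is not installed"
    mps = getattr(torch.backends, "mps", None)   # missing in very old PyTorch versions
    if mps is None or not mps.is_built():
        return "cpu", "this PyTorch build was compiled without MPS support"
    if not mps.is_available():
        return "cpu", "MPS is built in but unavailable (it needs macOS 12.3+ and a supported GPU)"
    return "mps", "MPS backend is available"

device_name, reason = describe_mps()
print(f"{device_name}: {reason}")
```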

Memory management:

Code snippet

from accelerate import infer_auto_device_map

device_map = infer_auto_device_map(model)     # Compute a placement of the model's layers across devices

print(f"Device map: {device_map}")

Monitoring resource usage:

Code snippet

# Monitor system resources from Terminal (run in a separate terminal window)
top -o cpu     # Sort processes by CPU usage

vm_stat        # Memory usage statistics

iostat -w 5    # Disk I/O every 5 seconds (useful if swapping occurs)
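
vm_stat reports its counters in pages, which makes them hard to read at a glance. The sketch below converts that output to bytes. The SAMPLE text is made up for illustration and follows the usual vm_stat layout; parse_vm_stat is a hypothetical helper.

```python
import re

def parse_vm_stat(text):
    """Convert vm_stat output into a dict of byte counts keyed by counter name."""
    size_match = re.search(r"page size of (\d+) bytes", text)
    page_size = int(size_match.group(1)) if size_match else 4096   # fallback if header is missing
    stats = {}
    for line in text.splitlines():
        m = re.match(r"^(Pages [^:]+):\s+(\d+)\.?$", line.strip())
        if m:
            stats[m.group(1)] = int(m.group(2)) * page_size
    return stats

# Illustrative sample, not captured from a real machine
SAMPLE = """Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                                3000.
Pages active:                            200000.
Pages wired down:                        100000.
"""

stats = parse_vm_stat(SAMPLE)
print(f"free: {stats['Pages free'] / 1e9:.2f} GB")
print(f"active: {stats['Pages active'] / 1e9:.2f} GB")
```

On a Mac, the live output can be passed in with subprocess.run(["vm_stat"], capture_output=True, text=True).stdout.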

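FAQ: Solving common problems

Q: At runtime I get "RuntimeError: Placeholder storage has not been allocated on MPS device!"

A: This error typically appears when the model is on the MPS device but an input tensor is still on the CPU, so first check that the model and its inputs are on the same device. If it persists, try restarting the Python kernel, or add the following code before importing torch:

Code snippet

import os
os.environ['PYTORCH_ENABLE_MPS_FALLBACK'] = '1'   # Let operators that MPS does not support fall back to the CPU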

Q: How can I reduce memory usage?

A:
1. Try a smaller LoRA rank (r=4 or r=2)
2. Reduce max_length (for example, to 256)
3. Use gradient checkpointing
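
Of the three options above, gradient checkpointing is a single switch in the training configuration. The sketch below shows it as a set of keyword-argument overrides so it runs without any dependencies. gradient_checkpointing is a TrainingArguments field in transformers; the baseline values are copied from the training step of this guide, and the helper name is illustrative.

```python
# Baseline values taken from the training configuration in this guide
baseline = {
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 4,
}

# Override aimed at lowering peak memory: recompute activations during the
# backward pass instead of storing them (slower, but uses less memory)
memory_saving = {"gradient_checkpointing": True}

def merged_training_kwargs(base, overrides):
    """Return a new dict with the overrides applied on top of the base settings."""
    merged = dict(base)
    merged.update(overrides)
    return merged

kwargs = merged_training_kwargs(baseline, memory_saving)
print(kwargs)
# With transformers installed: TrainingArguments(output_dir="./results", **kwargs)
```

The other two options are changed elsewhere: the rank in the LoraConfig call and the maximum length in the preprocessing step.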

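Q: Training is too slow on an Intel Mac. What can I do?

A:
1. Consider training on a cloud service such as Colab
2. Reduce the batch size and sequence length
3. Fine-tune only the last one or two layers of the network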
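
Restricting training to the last layers comes down to selecting parameters by name. The sketch below picks the names that belong to the last N transformer blocks. It assumes Llama-style names containing ".layers.<index>.", which is an assumption about the model's naming, and both helper functions are hypothetical.

```python
import re

LAYER_PATTERN = re.compile(r"\.layers\.(\d+)\.")   # assumed naming: model.layers.<i>....

def layer_index(name):
    """Return the block index encoded in a parameter name, or None."""
    m = LAYER_PATTERN.search(name)
    return int(m.group(1)) if m else None

def names_in_last_layers(param_names, last_n):
    """Return the parameter names that belong to the last `last_n` blocks."""
    indexed = [(n, layer_index(n)) for n in param_names]
    layer_ids = [i for _, i in indexed if i is not None]
    if not layer_ids:
        return []
    cutoff = max(layer_ids) - last_n + 1
    return [n for n, i in indexed if i is not None and i >= cutoff]

names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
    "model.layers.2.self_attn.q_proj.weight",
    "lm_head.weight",
]
selected = names_in_last_layers(names, last_n=2)
print(selected)
```

With a loaded PyTorch model, the selection could be applied by looping over model.named_parameters() and setting requires_grad to True only for names in the selected list.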

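Summary

This article walked through the complete workflow for installing DeepSeek on a Mac and fine-tuning it with LoRA. The key steps are:

1. Prepare the Python environment and the required packages
2. Download the DeepSeek base model
3. Prepare training data suited to your task
4. Configure the LoRA parameters for efficient fine-tuning
5. Apply the Mac-specific optimization tips

With this approach, even an individual developer can customize a large language model on a Mac laptop. Performance falls short of a dedicated GPU server, but it is sufficient for small-scale experiments and personal projects.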