Step-by-Step Guide to Installing BERT on Apple Silicon M3: A Beginner's Tutorial (May 2025)


Introduction

BERT (Bidirectional Encoder Representations from Transformers) is a landmark natural language processing model developed by Google. As Apple Silicon M3 chips become widespread, many developers want to run BERT locally for learning and development. This tutorial walks through installing and running the BERT model on an M3 Mac.

Prerequisites

Before you begin, make sure your machine meets the following requirements:

  • A Mac with an Apple Silicon M3 chip
  • macOS Ventura (13.0) or later
  • Python 3.9 or later
  • At least 16 GB of RAM (32 GB recommended for a smoother experience)
  • Xcode Command Line Tools installed (a quick sanity check for these items is sketched below)
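A minimal sanity check for the chip and Python requirements, assuming it is run with the Python you intend to use (verify the Xcode tools separately with xcode-select -p in a terminal):

Code snippet
import platform
import sys

# Confirm we are on an Apple Silicon (arm64) Mac
assert platform.system() == "Darwin", "not running on macOS"
assert platform.machine() == "arm64", "not running on Apple Silicon"

# Confirm the interpreter is Python 3.9 or newer
assert sys.version_info >= (3, 9), f"Python 3.9+ required, found {platform.python_version()}"

print("Basic requirements look OK")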

Step 1: Install Homebrew and Base Dependencies

Homebrew is the package manager for macOS; install it first:

Code snippet
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Add Homebrew to your PATH
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc

# Install Python and base dependencies
brew install python cmake protobuf rust

Notes
1. If you hit permission errors, fix the ownership of Homebrew's directories (e.g. sudo chown -R $(whoami) /opt/homebrew) rather than running brew itself with sudo, which Homebrew refuses
2. On Apple Silicon, Homebrew installs to /opt/homebrew by default

Step 2: Create a Python Virtual Environment

To avoid polluting the system Python environment, create a dedicated virtual environment:

Code snippet
# Create a virtual environment named bert_env
python3 -m venv ~/bert_env

# Activate the virtual environment
source ~/bert_env/bin/activate

# Upgrade pip, setuptools, and wheel
pip install --upgrade pip setuptools wheel

How it works
A virtual environment isolates each project's dependencies, preventing package version conflicts between projects.
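To confirm the environment is active, compare sys.prefix with sys.base_prefix; inside a venv the two differ. A minimal sketch:

Code snippet
import sys

# In an active virtual environment, prefix and base_prefix point to different places
in_venv = sys.prefix != sys.base_prefix
print(f"Virtual environment active: {in_venv}")
print(f"Environment location: {sys.prefix}")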

Step 3: Install PyTorch for Apple Silicon

The M3 uses the arm64 architecture; current stable PyTorch wheels for Apple Silicon ship with the MPS GPU backend included, so a plain install is all that's needed:

Code snippet
pip install torch torchvision torchaudio

# Verify the installation succeeded
python -c "import torch; print(torch.__version__); print(torch.backends.mps.is_available())"

Expected output
The PyTorch version should print, followed by True (meaning the MPS backend is available).

Tip
MPS (Metal Performance Shaders) is Apple's GPU acceleration framework; it lets PyTorch tap the M3's GPU.
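To see the backend at work, the sketch below times a large matrix multiplication on the MPS device; the 4096×4096 size is an arbitrary choice for illustration:

Code snippet
import time
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# A large matmul is enough to exercise the GPU
x = torch.randn(4096, 4096, device=device)
start = time.perf_counter()
y = x @ x
if device.type == "mps":
    torch.mps.synchronize()  # MPS ops run asynchronously; wait before reading the clock
print(f"matmul on {device}: {time.perf_counter() - start:.3f}s")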

Step 4: Install Transformers and Related Libraries

Now install Hugging Face's Transformers library along with common companions:

Code snippet
pip install transformers sentencepiece numpy pandas tqdm scikit-learn matplotlib jupyterlab

# Optional: if you want to use TensorFlow instead of PyTorch
pip install tensorflow tensorflow-metal

Notes
1. sentencepiece is required by the tokenizers of several BERT-family models
2. tensorflow-metal provides Apple Silicon GPU support for TensorFlow
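A quick import check to confirm the installation, a minimal sketch:

Code snippet
# Verify the core libraries import cleanly and report their versions
import transformers
import sentencepiece

print(f"transformers {transformers.__version__}")
print(f"sentencepiece {sentencepiece.__version__}")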

Step 5: Download a Pretrained BERT Model

We'll use a pretrained BERT model hosted by Hugging Face:

Code snippet
from transformers import BertTokenizer, BertModel

# Download the bert-base-uncased model and tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertModel.from_pretrained(model_name)

# Save locally for later use (optional)
save_path = "./bert_model"
tokenizer.save_pretrained(save_path)
model.save_pretrained(save_path)

How it works
1. bert-base-uncased is the case-insensitive English BERT base model
2. Hugging Face automatically downloads the model weights and config files to a cache directory (~/.cache/huggingface)
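If you ran the optional save step above, later sessions can load from the local copy instead of the network; a minimal sketch:

Code snippet
from transformers import BertTokenizer, BertModel

# Load from the locally saved directory rather than re-downloading
save_path = "./bert_model"
tokenizer = BertTokenizer.from_pretrained(save_path)
model = BertModel.from_pretrained(save_path)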

Step 6: Test That BERT Works

Write a short test script to verify everything works:

Code snippet
import torch
from transformers import BertTokenizer, BertModel

# Check whether MPS acceleration (the M3 GPU) is available
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")

# Load the model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased').to(device)

# Prepare and tokenize the input text
text = "Hello, my name is John and I love natural language processing!"
inputs = tokenizer(text, return_tensors="pt").to(device)

# Run the model (gradients disabled to save memory)
with torch.no_grad():
    outputs = model(**inputs)

# The [CLS] token's output can be used for classification tasks
# (last_hidden_state has shape [batch_size, sequence_length, hidden_size])
print(f"Last hidden state shape: {outputs.last_hidden_state.shape}")
print(f"Pooler output shape: {outputs.pooler_output.shape}")

Expected output

Code snippet
Using device: mps
Last hidden state shape: torch.Size([1, 16, 768])
Pooler output shape: torch.Size([1, 768])
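Beyond the [CLS] vector, a common way to derive a single sentence embedding is mean pooling over the token outputs while masking padding; a minimal sketch reusing inputs and outputs from the test script:

Code snippet
# Mean-pool token embeddings, ignoring padding positions
mask = inputs["attention_mask"].unsqueeze(-1)            # [1, seq_len, 1]
summed = (outputs.last_hidden_state * mask).sum(dim=1)   # [1, 768]
counts = mask.sum(dim=1)                                 # [1, 1]
sentence_embedding = summed / counts
print(f"Sentence embedding shape: {sentence_embedding.shape}")  # [1, 768]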

M3 Performance Optimization Tips

To get the best performance on an Apple Silicon M3:

  1. Batch your inputs: process several sentences per forward pass rather than one at a time to improve GPU utilization (see the sketch after this list).
  2. Use half precision to reduce memory use and improve speed:
Code snippet
model = model.half()  # convert the weights to half precision (Float16); input_ids stay integer tensors
  3. Limit sequence length: BERT is inefficient on long text, so consider truncating or chunking it.
  4. JIT compilation (experimental; tracing a Hugging Face model cleanly requires loading it with torchscript=True):
Code snippet
model = torch.jit.trace(model, [inputs["input_ids"], inputs["attention_mask"]])
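A minimal batching sketch for tip 1, assuming the tokenizer, model, and device from Step 6 are already defined; padding=True pads the sentences to a common length:

Code snippet
# Encode several sentences as one padded batch and run a single forward pass
texts = [
    "The quick brown fox jumps over the lazy dog.",
    "BERT processes batches far more efficiently than single sentences.",
    "Short input.",
]
batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to(device)

with torch.no_grad():
    out = model(**batch)

# One row per sentence: [3, longest_sequence_in_batch, 768]
print(f"Batched output shape: {out.last_hidden_state.shape}")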

Troubleshooting Common Issues

Q1: “Could not build wheels for tokenizers”

Solution: install the Rust toolchain (tokenizers builds its native extension with Rust), then retry:

Code snippet
brew install rustup-init && rustup-init -y
source "$HOME/.cargo/env"
pip install tokenizers

Q2: “Out of memory” errors

Fixes:
1. Reduce the batch size or the sequence length (max_length)
2. Call model.eval() to put the model in inference mode
3. Wrap inference in with torch.no_grad(): to disable gradient tracking
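For instance, capping max_length bounds memory use no matter how long the raw input is (long_text here is a hypothetical placeholder):

Code snippet
# Truncate long inputs to a fixed token budget before they reach the model
inputs = tokenizer(long_text, return_tensors="pt",
                   truncation=True, max_length=128).to(device)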

Q3: The MPS backend does not support some operations

Fixes:
1. model.to("cpu") to run on the CPU temporarily
2. torch.set_default_device("cpu") to change the default device
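PyTorch also offers an environment variable that falls back to the CPU for individual unsupported ops; it must be set before torch is imported:

Code snippet
import os

# Fall back to CPU per-op wherever MPS lacks an implementation
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # the flag is read when torch initializes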

Jupyter Notebook Example (Optional)

If you want to use BERT from a Jupyter notebook:

Code snippet
pip install jupyterlab ipywidgets

jupyter notebook --port=8888

Then run the following code in a notebook to test BERT:

Code snippet
%%timeit -n1 -r1 

from transformers import BertTokenizer, BertModel 
import torch 

device = "mps" if torch.backends.mps.is_available() else "cpu" 
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') 
model = BertModel.from_pretrained('bert-base-uncased').to(device) 

texts = ["This is a test sentence.", "Another example text for BERT."]
inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True).to(device) 

with torch.no_grad(): 
    outputs = model(**inputs) 

print(f"Embedding shape: {outputs.last_hidden_state.shape}")

BERT Application Example: Text Classification

Let's implement a simple text classification task to show BERT in action:

Code snippet
from transformers import BertTokenizer, BertForSequenceClassification
from sklearn.model_selection import train_test_split
from torch.optim import AdamW
import pandas as pd
import torch
import torch.nn.functional as F

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Step 1: prepare sample data (a tiny sentiment-analysis example)
data = {
    'text': [
        'I love this movie!', 'This product is terrible.',
        'The service was excellent.', 'Would not recommend this hotel.',
        'Great experience overall.', 'Worst purchase ever.'
    ],
    'label': [1, 0, 1, 0, 1, 0]
}
df = pd.DataFrame(data)

# Step 2: split into training and validation sets
train_texts, val_texts, train_labels, val_labels = train_test_split(
    df['text'].tolist(), df['label'].tolist(), test_size=0.2)

# Step 3: load the tokenizer and encode the data
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

train_encodings = tokenizer(train_texts, truncation=True, padding=True)
val_encodings = tokenizer(val_texts, truncation=True, padding=True)

class TextDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)

train_dataset = TextDataset(train_encodings, train_labels)
val_dataset = TextDataset(val_encodings, val_labels)

# Step 4: load the pretrained model and fine-tune (binary classification)
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased', num_labels=2).to(device)

optimizer = AdamW(model.parameters(), lr=5e-5)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=16, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=16)

for epoch in range(3):  # real tasks usually need far more training
    model.train()
    total_loss = 0

    for batch in train_loader:
        optimizer.zero_grad()

        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)

        outputs = model(input_ids,
                        attention_mask=attention_mask,
                        labels=labels)

        loss = outputs.loss
        total_loss += loss.item()

        loss.backward()
        optimizer.step()

    avg_train_loss = total_loss / len(train_loader)

    # Validation step
    model.eval()
    total_eval_loss, eval_steps = 0., 0

    with torch.no_grad():
        for batch in val_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)

            outputs = model(input_ids,
                            attention_mask=attention_mask,
                            labels=labels)

            loss = outputs.loss
            total_eval_loss += loss.item()
            eval_steps += 1

    avg_val_loss = total_eval_loss / eval_steps

    print(f'Epoch: {epoch}, Train Loss: {avg_train_loss:.4f}, Val Loss: {avg_val_loss:.4f}')

print("Training complete!")

TensorFlow Alternative (Optional)

If you prefer TensorFlow to PyTorch:

Code snippet
pip uninstall torch -y   # remove PyTorch to save space (optional)
pip install tensorflow tensorflow-metal transformers

Then use the following code:

Code snippet
from transformers import TFBertForSequenceClassification, BertTokenizerFast
import tensorflow as tf

tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

def encode_examples(texts, labels, max_length=128):
    input_ids_list = []
    token_type_ids_list = []
    attention_mask_list = []
    label_list = []

    for text, label in zip(texts, labels):
        encoded = tokenizer(text, max_length=max_length,
                            padding='max_length', truncation=True)
        input_ids_list.append(encoded['input_ids'])
        token_type_ids_list.append(encoded['token_type_ids'])
        attention_mask_list.append(encoded['attention_mask'])
        label_list.append([label])

    # Package the encoded features as a tf.data.Dataset for model.fit()
    return tf.data.Dataset.from_tensor_slices((
        {'input_ids': input_ids_list,
         'token_type_ids': token_type_ids_list,
         'attention_mask': attention_mask_list},
        label_list))