Setting Up a Local Development Environment on macOS Sonoma: Ollama + LangChain from Scratch



Introduction

With the rapid progress of artificial intelligence, running large language models locally (local LLMs) has become increasingly popular. This guide walks you through setting up an Ollama + LangChain development environment on macOS Sonoma from scratch, so you can run a variety of open-source models on your own machine and build AI applications on top of them.

Prerequisites

System requirements

  • macOS Sonoma (14.0+)
  • At least 16 GB of RAM (32 GB recommended for models of 7B parameters and up)
  • A Mac with an M1/M2 chip (Intel also works, but noticeably slower) — see the quick check below
  • At least 20 GB of free disk space
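
If you are unsure what hardware you have, here is a quick check from Python. It relies on macOS's built-in sysctl and its hw.memsize key; this is standard on macOS but would not work on other platforms.

Code snippet
# check_hardware.py - quick check of chip architecture and RAM on macOS
import platform
import subprocess

# 'arm64' indicates Apple Silicon (M1/M2); 'x86_64' indicates an Intel Mac
print("Architecture:", platform.machine())

# hw.memsize reports physical RAM in bytes (macOS-specific sysctl key)
mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]))
print(f"RAM: {mem_bytes / 2**30:.0f} GB")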

Prior knowledge

  • Basic terminal/command-line usage
  • Basic Python syntax (this guide uses Python 3.9+)

Step 1: Install Homebrew

Homebrew is the most popular package manager on macOS; we will use it to install the other dependencies.

Code snippet
# Install Homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Add Homebrew to PATH (follow the installer's on-screen instructions)
echo 'eval "$(/opt/homebrew/bin/brew shellenv)"' >> ~/.zshrc
source ~/.zshrc

# Verify the installation
brew --version

Notes
1. Do not run the installer with sudo; Homebrew refuses to run as root. If you hit permission errors, fix the ownership of /opt/homebrew instead (e.g. sudo chown -R $(whoami) /opt/homebrew)
2. You may need to restart your terminal after installation

Step 2: Install a Python environment

We recommend managing Python versions with pyenv:

Code snippet
# Install pyenv
brew install pyenv

# Initialize pyenv (add it to your shell configuration)
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.zshrc
echo 'command -v pyenv >/dev/null || export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.zshrc
echo 'eval "$(pyenv init -)"' >> ~/.zshrc
source ~/.zshrc

# Install Python 3.9.13 (the version recommended in this guide)
pyenv install 3.9.13

# Set it as the global Python version
pyenv global 3.9.13

# Verify the installation
python --version
pip --version

Step 3: Install Ollama

Ollama is a powerful framework for running large models locally:

Code snippet
# Install Ollama with Homebrew (the install.sh script targets Linux;
# on macOS use Homebrew or download the app from https://ollama.com)
brew install ollama

# Start the Ollama server (the downloadable macOS app starts it automatically)
ollama serve &

# Verify the installation
ollama --version

# Common Ollama commands:
ollama list        # list downloaded models
ollama pull llama2 # download the llama2 model (the 7B version is about 4 GB)
ollama run llama2  # run the model for an interactive chat

Practical notes
1. Ollama performs very well on M1/M2 Macs; 7B models respond at near-real-time speed
2. Intel Macs may take noticeably longer to load and respond
3. ollama pull supports many models, e.g. mistral and codellama; a quick API smoke test is shown below
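
Once the server is running, you can verify it from Python by calling Ollama's local REST API directly. The /api/generate endpoint and the default port 11434 are part of Ollama's documented API; the rest is a minimal sketch that assumes llama2 has already been pulled.

Code snippet
# smoke_test.py - minimal check that the local Ollama server responds
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",   # must already be pulled with `ollama pull llama2`
        "prompt": "Say hello in one short sentence.",
        "stream": False,     # return one JSON object instead of a stream
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])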

Step 4: Create a Python virtual environment and install LangChain

Code snippet
# Create a project directory and enter it
mkdir llm-projects && cd llm-projects

# Create a virtual environment (venv is recommended)
python -m venv venv

# Activate the virtual environment (required in every new terminal session)
source venv/bin/activate

# Upgrade pip and install the core dependencies used in this guide
pip install --upgrade pip setuptools wheel
pip install langchain langchain-community python-dotenv faiss-cpu sentence-transformers jupyterlab requests

Optional extras (install only the ones you actually need):

Code snippet
pip install transformers torch chromadb beautifulsoup4 pypdf pdfplumber python-docx openpyxl fastapi uvicorn
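
A quick way to confirm the environment is usable — nothing more than imports and a version print:

Code snippet
# verify_env.py - confirm the key packages import cleanly
import langchain
import langchain_community
import faiss  # provided by the faiss-cpu package

print("langchain:", langchain.__version__)
print("faiss loaded OK")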

Example: integrating LangChain with Ollama

Create a main.py file:

Code snippet
from langchain_community.llms import Ollama
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from dotenv import load_dotenv

load_dotenv()

def main():
    # Initialize the Ollama LLM (uses the local server on port 11434 by default)
    llm = Ollama(
        model="llama2",   # name of a model already downloaded in Ollama

        # Optional sampling parameters:
        temperature=0.7,
        top_k=50,
        top_p=0.9,
        repeat_penalty=1.1,
        num_ctx=2048,

        # Advanced options:
        base_url="http://localhost:11434",
        verbose=True,
    )

    # A simple prompt template; {history} receives the conversation memory
    prompt = PromptTemplate.from_template(
        "You are a helpful AI assistant.\n\n"
        "{history}\n"
        "Question: {question}\n"
        "Answer:"
    )

    # Conversation memory; memory_key must match the {history} placeholder
    memory = ConversationBufferMemory(memory_key="history")

    conversation = LLMChain(
        llm=llm,
        prompt=prompt,
        memory=memory,
        verbose=True,
    )

    while True:
        question = input("\nYour question (type 'quit' to exit): ")

        if question.lower() == "quit":
            break

        response = conversation.run({"question": question})

        print("\nAI answer:")
        print(response)

if __name__ == "__main__":
    main()
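
Because LangChain LLMs implement the standard Runnable interface, the same Ollama object can also stream tokens as they are generated, which feels much more responsive for long answers. A minimal sketch, assuming llama2 is already pulled:

Code snippet
# stream_demo.py - print tokens as Ollama generates them
from langchain_community.llms import Ollama

llm = Ollama(model="llama2")

# .stream() yields chunks of the response as they arrive
for chunk in llm.stream("Explain what a vector database is in two sentences."):
    print(chunk, end="", flush=True)
print()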

Advanced LangChain example: a document Q&A system

Create a document_qa.py file:

Code snippet
from langchain_community.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

def document_qa_system():
    # Load a document (replace with your own text file)
    loader = TextLoader("sample.txt")
    documents = loader.load()

    # Split the text into overlapping chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
    )

    texts = text_splitter.split_documents(documents)

    # Create embeddings and build the vector store
    embeddings = OllamaEmbeddings(model="llama2")

    db = FAISS.from_documents(texts, embeddings)

    # Create the QA chain
    qa_chain = RetrievalQA.from_chain_type(
        llm=Ollama(model="llama2"),
        chain_type="stuff",
        retriever=db.as_retriever(search_kwargs={"k": 3}),
        return_source_documents=True,
        verbose=True,
    )

    while True:
        query = input("\nEnter your question (type 'quit' to exit): ")

        if query.lower() == "quit":
            break

        result = qa_chain({"query": query})

        print("\nAnswer:")
        print(result["result"])

        print("\nSource documents:")
        for doc in result["source_documents"]:
            print(f"\n{doc.page_content[:200]}...")
            print(f"Source: {doc.metadata['source']}")

if __name__ == "__main__":
    document_qa_system()
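
The script assumes a sample.txt exists in the working directory; the sketch below creates a tiny one and also shows FAISS's save_local/load_local persistence so the index doesn't have to be rebuilt on every run. The file contents and index path are made up for illustration, and note that recent LangChain versions require allow_dangerous_deserialization=True when reloading a pickled index.

Code snippet
# prepare_sample.py - create a test document and persist a FAISS index
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

# Write a small test file for document_qa.py to load (contents are made up)
with open("sample.txt", "w") as f:
    f.write("Ollama runs large language models locally. "
            "LangChain provides building blocks for LLM applications.")

embeddings = OllamaEmbeddings(model="llama2")
db = FAISS.from_texts(["Ollama runs LLMs locally."], embeddings)

# Persist the index to disk and reload it later without re-embedding
db.save_local("faiss_index")
db2 = FAISS.load_local("faiss_index", embeddings,
                       allow_dangerous_deserialization=True)
print(db2.similarity_search("What does Ollama do?", k=1)[0].page_content)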

Jupyter Notebook quick start

If you want a quick hands-on session in a Jupyter Notebook:

Code snippet
jupyter lab --port=8888 --no-browser &

Then open http://localhost:8888 in your browser and create a new notebook:

Code snippet
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")  # assumes `ollama pull mistral` has been run

response = llm.invoke("Explain the basic concepts of quantum computing")

print(response)
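
From the same notebook you can also sanity-check the embedding endpoint used by the document Q&A example above. A minimal sketch; embed_query returns one embedding vector per input string:

Code snippet
from langchain_community.embeddings import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama2")

vector = embeddings.embed_query("hello world")
print(len(vector))  # dimensionality of the embedding, e.g. 4096 for llama2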

GPU acceleration (M1/M2 chips)

If your Mac has an M1/M2 chip, PyTorch can use the GPU through Metal:

Code snippet
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
    print("✅ MPS (Metal Performance Shaders) is available!")
else:
    device = torch.device("cpu")
    print("❌ MPS not available; falling back to CPU")

# Move any tensor or model to the selected device, e.g.:
x = torch.ones(3, device=device)
# model.to(device)  # for a PyTorch model you have loaded
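
To see the speedup for yourself, here is a small benchmark comparing a matrix multiplication on CPU and MPS. Timings will vary by machine; torch.mps.synchronize() ensures the queued GPU work has actually finished before the clock stops.

Code snippet
# mps_benchmark.py - rough matmul timing on CPU vs. the Metal GPU
import time
import torch

def time_matmul(device: str, n: int = 2048) -> float:
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    start = time.time()
    for _ in range(10):
        a @ b
    if device == "mps":
        torch.mps.synchronize()  # wait for queued GPU work to complete
    return time.time() - start

print(f"CPU: {time_matmul('cpu'):.3f}s")
if torch.backends.mps.is_available():
    print(f"MPS: {time_matmul('mps'):.3f}s")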

The number of model layers Ollama offloads to the GPU can also be tuned via the num_gpu option in its API (Ollama uses Metal automatically on Apple Silicon, so this is rarely needed):

Code snippet
curl http://localhost:11434/api/generate -d '{
  "model": "llama2",
  "prompt": "hi",
  "options": {"num_gpu": 50}
}'

Running Ollama with Docker (optional)

If you prefer the Docker route:

Code snippet
# Run the official Ollama image (CPU only: Docker Desktop on macOS
# cannot pass the Apple GPU through to containers)
docker run -d -p 11434:11434 --name ollama \
  --restart always \
  -v ollama:/root/.ollama \
  ollama/ollama

# Follow the container logs
docker logs -f ollama

Since the container publishes the same port, the base_url in the example code stays http://localhost:11434.
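
A quick way to confirm the container (or the native app) is reachable before running any chains — Ollama's root endpoint returns a short status string; the rest of the snippet is a minimal sketch:

Code snippet
# ping_ollama.py - check that the Ollama server is reachable
import requests

try:
    resp = requests.get("http://localhost:11434", timeout=5)
    print(resp.text)  # prints "Ollama is running" when the server is up
except requests.ConnectionError:
    print("Ollama is not reachable - is the server/container running?")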

Troubleshooting: common problems

| Problem | Solution |
| --- | --- |
| ConnectionError | Check that Ollama is running: `ps aux \| grep ollama` |
| Out of memory | Use a smaller model; for PyTorch models, reduce the batch size or move the model with model.to('cpu') |
| ModuleNotFoundError | Check that the virtual environment is active: `which pip`, then `pip list \| grep <package>` |
| Slow performance on Intel Macs | Try a smaller or quantized model, e.g. `ollama pull tinyllama` |

More resources:
Official LangChain documentation: https://python.langchain.com
Ollama on GitHub: https://github.com/ollama/ollama


With this tutorial, you should now have a complete local LLM development environment on macOS Sonoma. From here you can explore LangChain's more advanced features, such as Agents, Tools, and Memory. Happy coding! 🚀
