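Using the LanceDB Vector Database with LlamaIndex

Author: 马育民 • 2026-05-19 12:45

# Introduction

LanceDB has several advantages that make it a good fit for desktop RAG applications:

1. Pure **local file storage**: no server process and no ports, so packaging as an EXE is straightforward.
2. Faster reads and writes than Chroma, so large batches of documents ingest more smoothly.
3. A native LlamaIndex adapter whose interface is almost identical to the Chroma one, so migration costs very little.
4. A knowledge base can be moved by copying its folder; no extra configuration is needed.

# Installing LanceDB

Two packages are required:

- `lancedb`: the database itself
- `llama-index-vector-stores-lancedb`: the official LlamaIndex adapter

### Install with pip

```bash
pip install lancedb llama-index-vector-stores-lancedb
```

### Install with uv

```bash
uv add lancedb llama-index-vector-stores-lancedb
```

### Verify the installation

If the import runs without an error, the installation succeeded.

```python
from llama_index.vector_stores.lancedb import LanceDBVectorStore

print("✅ LanceDB installed and compatible with LlamaIndex")
```

# Working with LanceDB

### Directory layout

```
project folder
├── data/                  # all documents to be parsed
├── lance_db_knowledge/    # LanceDB vector data directory
└── main.py                # the script to run
```

### Install dependencies

```bash
# uv
uv add llama-index lancedb llama-index-vector-stores-lancedb
# full set of document parsers
uv add llama-index-readers-file python-docx python-pptx openpyxl xlrd pymupdf beautifulsoup4

# pip
pip install llama-index lancedb llama-index-vector-stores-lancedb
pip install llama-index-readers-file python-docx python-pptx openpyxl xlrd pymupdf beautifulsoup4
```

### Complete example

```python
from llama_index.core import (
    SimpleDirectoryReader,
    StorageContext,
    VectorStoreIndex,
    Settings,
)
from llama_index.vector_stores.lancedb import LanceDBVectorStore
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# ===================== 1. Global configuration =====================
# Replace with your own model key/endpoint; a local model is swapped in the same way.
# The same key is passed to both the LLM and the embedding model.
API_KEY = "your-key"
Settings.llm = OpenAI(model="gpt-3.5-turbo", api_key=API_KEY)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small", api_key=API_KEY)

# ===================== 2. Initialize LanceDB =====================
# Local storage path; the directory is created automatically
db_path = "./lance_db_knowledge"
# Name the table so that knowledge bases are easy to tell apart
vector_store = LanceDBVectorStore(
    uri=db_path,
    table_name="doc_knowledge",
)
# Bind the vector store to a storage context
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# ===================== 3. Load documents in multiple formats =====================
# Reads the files under ./data: doc/docx/ppt/pptx/xls/xlsx/txt/html/md/pdf
# (formats without a dedicated reader are read as plain text)
documents = SimpleDirectoryReader(input_dir="./data").load_data()

# ===================== 4. Build the vector index and store it =====================
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    show_progress=True,  # show ingestion progress
)

# ===================== 5. Question answering =====================
query_engine = index.as_query_engine(similarity_top_k=5)

# Ask a question
question = "Summarize the core content of this document"
response = query_engine.query(question)
print("===== Answer =====")
print(response)

# Print the cited sources
print("\n===== Cited document snippets =====")
for node in response.source_nodes:
    print(f"Source snippet: {node.text[:200]}...")


# ===================== 6. Load the existing store later (no re-ingestion) =====================
def load_exist_lancedb():
    vec_store = LanceDBVectorStore(uri="./lance_db_knowledge", table_name="doc_knowledge")
    # from_vector_store builds its own storage context around the store
    exist_index = VectorStoreIndex.from_vector_store(vec_store)
    return exist_index.as_query_engine()


# Usage example
# old_engine = load_exist_lancedb()
# print(old_engine.query("your question"))
```

# Advanced operations

### 1. Clear the current knowledge base

Drop the table through the LanceDB connection, then create a new `LanceDBVectorStore` before ingesting again.

```python
import lancedb
lancedb.connect("./lance_db_knowledge").drop_table("doc_knowledge")
```

### 2. Filter retrieval by metadata

```python
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

# SimpleDirectoryReader records file_type as a MIME type, e.g. "application/pdf"
filters = MetadataFilters(
    filters=[ExactMatchFilter(key="file_type", value="application/pdf")]
)
query_engine = index.as_query_engine(filters=filters)
```

### 3. Append documents incrementally

Put the new files into `./data` and rerun the **load documents + build index** steps so they are picked up. This re-embeds every file in the folder, and whether the existing table is replaced or appended to depends on the `mode` argument of `LanceDBVectorStore`. To add only the new files without duplicating old ones, load just those files and call `index.insert()` on each resulting document.

Original source: http://malaoshi.top/show_1GW3L4uqXX2t.html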
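
For the incremental-append step, one way to avoid re-embedding files that are already in the knowledge base is to keep a small manifest of what has been ingested. The sketch below is not part of LlamaIndex or LanceDB; the helper names (`file_digest`, `load_manifest`, `find_new_files`, `mark_ingested`) and the idea of a JSON manifest file are assumptions made for illustration, and it uses only the standard library.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_manifest(manifest_path: Path) -> dict:
    """Load the {relative_path: digest} manifest, or an empty dict if it is absent."""
    if manifest_path.exists():
        return json.loads(manifest_path.read_text(encoding="utf-8"))
    return {}


def find_new_files(data_dir: Path, manifest: dict) -> list:
    """List files under data_dir that are new or whose content has changed."""
    pending = []
    for path in sorted(p for p in data_dir.rglob("*") if p.is_file()):
        key = path.relative_to(data_dir).as_posix()
        if manifest.get(key) != file_digest(path):
            pending.append(path)
    return pending


def mark_ingested(data_dir: Path, manifest_path: Path, manifest: dict, files: list) -> None:
    """Record the given files as ingested and write the manifest to disk."""
    for path in files:
        manifest[path.relative_to(data_dir).as_posix()] = file_digest(path)
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

A possible way to use it: call `find_new_files(Path("./data"), manifest)`, pass the returned paths to `SimpleDirectoryReader(input_files=[...])`, call `index.insert()` on each loaded document, and then call `mark_ingested(...)`. Keep the manifest file outside `./data` so it is not ingested itself. A file that has changed is reported again, so its old nodes would still need to be removed separately.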