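lxml: a Python library for parsing XML

Author: 马育民 • 2020-02-17 11:18

# Overview

lxml is a Python parsing library for HTML and XML. It supports **XPath queries**, and because its core is implemented in C, parsing is very fast.

Website: https://lxml.de/

Official API: https://lxml.de/api/index.html

# Installation

```
pip3 install lxml
```

To check the installation, make sure the import succeeds:

```
from lxml import etree
```

# Parsing an XML file

### First look

Contents of the XML file:

```
<student age="20" sex="男">李雷</student>
```

Python code:

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
print(type(tree))

# Serialize the tree back to bytes
content = etree.tostring(tree, encoding="utf-8")
print(content)
```

Output:

```
<class 'lxml.etree._ElementTree'>
b'<student age="20" sex="\xe7\x94\xb7">\xe6\x9d\x8e\xe9\x9b\xb7</student>'
```

**Note:** `type(tree)` is `lxml.etree._ElementTree`

### Getting the root node

```
root = tree.getroot()
print(type(root))
```

Output:

```
<class 'lxml.etree._Element'>
```

**Note:** `type(root)` is `lxml.etree._Element`

### Getting the tag name

```
print("tag:", root.tag)
```

Output:

```
tag: student
```

### Getting the tag text

```
print("text:", root.text)
```

Output:

```
text: 李雷
```

### Getting tag attributes

An attribute can be read either through the `attrib` mapping or with `get()`. `attrib["age"]` raises `KeyError` if the attribute is missing, while `get("sex")` returns `None`.

```
print("age attribute:", root.attrib["age"])
print("sex attribute:", root.get("sex"))
```

Output:

```
age attribute: 20
sex attribute: 男
```

### Iterating

Use `for ... in` to iterate over the children of an element.

The XML content is as follows (the root element name `students` is a placeholder; the code only iterates over the root's children):

```
<students>
    <student age="20" sex="男">李雷</student>
    <student age="19" sex="女">韩梅梅</student>
</students>
```

Python code:

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
root = tree.getroot()

# Iterating over an element yields its direct children
for item in root:
    print("tag:", item.tag)
    print("  name:", item.text)
    print("  age:", item.attrib["age"])
    print("  sex:", item.get("sex"))
```

Output:

```
tag: student
  name: 李雷
  age: 20
  sex: 男
tag: student
  name: 韩梅梅
  age: 19
  sex: 女
```

# Parsing an XML string

### Method 1

Use `etree.XML()`:

```
from lxml import etree

xml = """<student age="20" sex="男">李雷</student>
"""
tree = etree.XML(xml)
print(type(tree))

# Serialize to bytes
content = etree.tostring(tree, encoding="utf-8")
print(content)
```

Output:

```
<class 'lxml.etree._Element'>
b'<student age="20" sex="\xe7\x94\xb7">\xe6\x9d\x8e\xe9\x9b\xb7</student>'
```

**Note:** `type(tree)` is `lxml.etree._Element`. Unlike `etree.parse()`, which returns an `_ElementTree`, `etree.XML()` returns the root element directly.

### Method 2

Use `etree.fromstring()`:

```
from lxml import etree

xml = """<student age="20" sex="男">李雷</student>
"""
tree = etree.fromstring(xml)
print(type(tree))

content = etree.tostring(tree, encoding="utf-8")
print(content)
```

Output:

```
<class 'lxml.etree._Element'>
b'<student age="20" sex="\xe7\x94\xb7">\xe6\x9d\x8e\xe9\x9b\xb7</student>'
```

**Note:** `type(tree)` is `lxml.etree._Element`

### Parsing an XML string that declares an encoding

```
from lxml import etree

# Note the encoding declaration here
xml = """<?xml version="1.0" encoding="utf-8"?>
<student age="20" sex="男">李雷</student>
"""
tree = etree.XML(xml)
```

This raises an error:

```
ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
```

**Solution:** an XML string that carries an encoding declaration must be converted to bytes before it is parsed.

```
from lxml import etree

xml = """<?xml version="1.0" encoding="utf-8"?>
<student age="20" sex="男">李雷</student>
"""
# Convert to bytes first
xml = xml.encode("utf-8")
tree = etree.XML(xml)
print(type(tree))
```

Thanks to:

https://www.cnblogs.com/zhangxinqi/p/9210211.html#_label1

https://blog.csdn.net/oXiangDuiNiShuo1/article/details/51864503

Original source: http://malaoshi.top/show_1EF504JL0LdB.html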
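
The overview mentions XPath support, but the examples above only read tags, text, and attributes. The following is a minimal sketch of XPath queries using the `xpath()` method of an element. The sample document, its `students` root name, and the English names and attribute values are all made up for illustration. The input is passed as bytes, which also avoids the encoding-declaration error described above.

```python
from lxml import etree

# Hypothetical sample document (not from the original article).
# A bytes literal is used so the encoding declaration is accepted.
xml = b"""<?xml version="1.0" encoding="utf-8"?>
<students>
    <student age="20" sex="male">Li Lei</student>
    <student age="19" sex="female">Han Meimei</student>
</students>
"""

root = etree.fromstring(xml)

# Text content of every <student> element, in document order
names = root.xpath("//student/text()")

# Value of the age attribute of every <student> element
ages = root.xpath("//student/@age")

# Elements filtered by an attribute predicate
females = root.xpath("//student[@sex='female']")

# XPath functions return plain values; count() yields a float
total = root.xpath("count(//student)")

print(names)
print(ages)
print([e.text for e in females])
print(total)
```

`xpath()` always returns a list for node selections, even when only one node matches, so a single result is usually read with an index such as `females[0]`.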