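Parsing XML in Python with lxml: XPath

Author: 马育民 • 2020-02-17 16:12

Prerequisites: [XPath](https://www.malaoshi.top/show_1EF504RvgguN.html "XPath") and [parsing XML in Python with lxml](https://www.malaoshi.top/show_1EF504JL0LdB.html "python解析xml库lxml")

### XML content

The text content of the sample file is listed below. As the queries in this article imply, its root element is `articles`, which holds `article` elements carrying attributes such as `title` and `author`. The document contains three `abstract` elements, two of them directly under an `article`.

```
该文档是测试数据
文章摘要
文章正文
python入门到放弃摘要
python入门到放弃正文
```

# Finding nodes

### Get all x nodes, anywhere in the document

Get every `abstract` node, regardless of where it sits in the tree:

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
# "//" matches the element at any depth
article = tree.xpath("//abstract")
print(article, len(article))
for item in article:
    print(item.tag, item.text)
```

Output of the loop:

```
abstract 该文档是测试数据
abstract 文章摘要
abstract python入门到放弃摘要
```

### Get all x nodes at a specific path

Get the `abstract` nodes under the path `/articles/article/`:

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
# An absolute path matches only abstract elements that are children of an article
article = tree.xpath("/articles/article/abstract")
print(article, len(article))
for item in article:
    print(item.tag, item.text)
```

Output of the loop:

```
abstract 文章摘要
abstract python入门到放弃摘要
```

### Get all nodes

This is rarely needed in practice.

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
# "*" matches every element, so this returns the whole tree as a flat list
article = tree.xpath("//*")
print(article, len(article))
for item in article:
    print(item.tag, item.text)
```

### Get nodes by attribute value

Get the `article` nodes whose `author` attribute is 灭霸 (Thanos):

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
# The predicate [@author='...'] keeps only elements with that attribute value
article = tree.xpath("/articles/article[@author='灭霸']")
print(article, len(article))
for item in article:
    print(item.tag, item.get("title"))
```

# Attributes

### Get the value of a given attribute

Get the `title` attribute values of the `/articles/article` elements. Selecting `@title` returns the attribute values as strings rather than elements:

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
# Ending the path with @title selects the attribute values themselves
article = tree.xpath("/articles/article/@title")
print(article, len(article))
for item in article:
    print(item)
```

Output:

```
['解析xml', 'python入门到放弃'] 2
解析xml
python入门到放弃
```

# Tag text

### Get the text content of a given tag

Get the text of all `abstract` tags:

```
from lxml import etree

tree = etree.parse("/Users/mym/Desktop/python/2.xml")
# text() selects the text nodes, so the result is a list of strings
article = tree.xpath("//abstract/text()")
print(article, len(article))
for item in article:
    print(item)
```

Output:

```
['该文档是测试数据', '文章摘要', 'python入门到放弃摘要'] 3
该文档是测试数据
文章摘要
python入门到放弃摘要
```

Thanks to: https://www.cnblogs.com/zhangxinqi/p/9210211.html

Original source: http://malaoshi.top/show_1EF505k06HSS.html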
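
To try the queries from this article without a file on disk, the sketch below parses an inline document with `etree.fromstring` and runs each XPath expression against it. The XML here is an assumption: the element and attribute names are inferred from the queries above, while the `content` tag name and the `author` value `other` are hypothetical placeholders, so the real `2.xml` may differ.

```python
from lxml import etree

# Hypothetical stand-in for 2.xml. The `content` tag and the author value
# "other" are assumptions made for illustration only.
XML = """
<articles>
    <abstract>该文档是测试数据</abstract>
    <article title="解析xml" author="other">
        <abstract>文章摘要</abstract>
        <content>文章正文</content>
    </article>
    <article title="python入门到放弃" author="灭霸">
        <abstract>python入门到放弃摘要</abstract>
        <content>python入门到放弃正文</content>
    </article>
</articles>
"""

root = etree.fromstring(XML)

# Every abstract element, at any depth
all_abstracts = root.xpath("//abstract")

# Only abstract elements that are direct children of an article
article_abstracts = root.xpath("/articles/article/abstract")

# article elements filtered by attribute value
by_author = root.xpath("/articles/article[@author='灭霸']")

# Attribute values and text nodes come back as lists of strings
titles = root.xpath("/articles/article/@title")
abstract_texts = root.xpath("//abstract/text()")

print(len(all_abstracts), len(article_abstracts))
print([a.get("title") for a in by_author])
print(titles)
print(abstract_texts)
```

Element queries return `Element` objects, so `.tag`, `.text` and `.get()` apply to them, whereas the `@title` and `text()` queries return plain strings that can be printed directly.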