
Python Regular Expressions (regex)

Date: 2018-06-18


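Regular expressions

A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of those characters, form a "rule string" that expresses a filtering logic to apply to strings.

Regular expressions are not unique to Python; in Python they are implemented by the re module.

Common matching patterns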

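Pattern    Description
\w         Matches a letter, digit, or underscore
\W         Matches any character that is not a letter, digit, or underscore
\s         Matches any whitespace character; equivalent to [ \t\n\r\f\v]
\S         Matches any non-whitespace character
\d         Matches any digit; equivalent to [0-9] for ASCII text
\D         Matches any non-digit
\A         Matches the start of the string
\Z         Matches the end of the string (in Python's re, only at the very end, not before a trailing newline)
\z         Matches the end of the string in Perl-style engines; Python's re has traditionally not accepted it, so use \Z
\G         Matches the position where the previous match ended in Perl-style engines; not supported by Python's re
\n         Matches a newline
\t         Matches a tab
^          Matches the start of the string
$          Matches the end of the string
.          Matches any character except a newline; with the re.DOTALL flag it matches any character, including a newline
[...]      A set of characters: [abc] matches "a", "b", or "c"
[^...]     Any character not in the set: [^abc] matches any character other than a, b, or c
*          Matches 0 or more repetitions of the preceding expression
+          Matches 1 or more repetitions of the preceding expression
?          Matches 0 or 1 repetitions of the preceding expression; placed after another quantifier (*?, +?, {n,m}?) it makes that quantifier non-greedy
{n}        Matches exactly n repetitions of the preceding expression
{n,m}      Matches n to m repetitions of the preceding expression, greedily
a|b        Matches a or b
(...)      Matches the expression inside the parentheses and captures it as a group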
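To make the table concrete, here is a minimal sketch that exercises a few of the patterns with re.findall and re.search. The sample text and variable names are made up for illustration and are not from the original article.

```python
import re

# Hypothetical sample text, chosen only for this illustration.
text = "Order_42 shipped on 2018-06-18\tto box 7"

words = re.findall(r"\w+", text)                       # runs of letters, digits, underscores
numbers = re.findall(r"\d+", text)                     # runs of digits
date = re.search(r"\d{4}-\d{2}-\d{2}", text).group()   # {n}: exact repetition counts
spellings = re.findall(r"colou?r", "color colour")     # ?: the "u" is optional
consonants = re.findall(r"[^aeiou]", "regex")          # negated character set

print(words)
print(numbers)
print(date)
print(spellings)
print(consonants)
```

Each call uses a raw string (the r prefix) so that backslashes reach the regex engine unchanged.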

re.match

re.match tries to match a pattern at the start of the string; if the pattern does not match at the start, match() returns None.

re.match(pattern, string, flags=0)
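Because match() can return None, calling .group() on the result without a check raises AttributeError when nothing matched. A minimal sketch of the usual guard follows; the helper name leading_number is hypothetical.

```python
import re

def leading_number(text):
    """Return the digits at the start of text, or None if there are none."""
    m = re.match(r"\d+", text)
    # re.match only looks at the start of the string, so m may be None.
    return m.group() if m is not None else None

print(leading_number("2018 was the year"))
print(leading_number("the year 2018"))
```

The second call returns None even though the string contains digits, because they are not at the start.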

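Basic matching

    import re

    content = 'Hello 111 2222 World hello python'
    print(len(content))
    # Raw string so the backslashes reach the regex engine unchanged
    res = re.match(r'^Hello\s\d\d\d\s\d{4}\s\w{5}\s.*python$', content)
    print(res)
    print(res.group())
    print(res.span())

Output:

    33
    <_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
    Hello 111 2222 World hello python
    (0, 33)

Because the pattern ends with .*python$, it matches the whole 33-character string. Python 3.7 and later print re.Match instead of _sre.SRE_Match in the object repr.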

Generic matching

    import re

    content = 'Hello 111 2222 World hello python'
    # .* stands in for everything between the fixed start and end
    res = re.match(r'^Hello.*python$', content)
    print(res)
    print(res.group())
    print(res.span())

Output:

    <_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
    Hello 111 2222 World hello python
    (0, 33)

Capturing targets

    import re

    content = 'Hello 111 2222 World hello python'
    # Each pair of parentheses captures one group of digits
    res = re.match(r'^Hello\s(\d+)\s(\d+)\s.*python$', content)
    print(res)
    print(res.group(1), res.group(2))

Output:

    <_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
    111 2222
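Beyond group(1) and group(2), a match object offers a few other ways to read captured groups. The sketch below reuses the same content string; the group names first and second are hypothetical labels added for illustration.

```python
import re

content = 'Hello 111 2222 World hello python'
# (?P<name>...) gives a group a name in addition to its number.
m = re.match(r'^Hello\s(?P<first>\d+)\s(?P<second>\d+)', content)

all_groups = m.groups()      # every captured group as a tuple
first = m.group('first')     # lookup by group name
second_span = m.span(2)      # position of the second group within content
whole = m.group(0)           # group 0 is the entire match

print(all_groups, first, second_span, whole)
```

Named groups can make longer patterns easier to read than positional indexes, at the cost of a slightly noisier pattern.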

Greedy mode

    import re

    content = 'Hello 111 2222 World hello python'
    # The greedy .* consumes as much as it can, leaving only one digit for (\d+)
    res = re.match(r'^H.*(\d+)\s(\d+).*python$', content)
    print(res)
    print(res.group(1), res.group(2))

Output:

    <_sre.SRE_Match object; span=(0, 33), match='Hello 111 2222 World hello python'>
    1 2222

Non-greedy mode

    import re

    content = 'Hello 111222 World hello python'
    # The non-greedy .*? consumes as little as possible, so (\d+) gets all the digits
    res = re.match(r'^He.*?(\d+).*?python$', content)
    print(res)
    print(res.group(1))

Output:

    <_sre.SRE_Match object; span=(0, 31), match='Hello 111222 World hello python'>
    111222
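The difference between the two modes is easiest to see when the same pattern is run both ways on one string. This is a small sketch with a made-up HTML snippet; it also shows one pitfall of non-greedy matching at the end of a pattern.

```python
import re

# Hypothetical input, chosen only to show the contrast.
html = "<b>one</b> and <b>two</b>"

greedy = re.search(r"<b>(.*)</b>", html).group(1)   # runs to the last </b>
lazy = re.search(r"<b>(.*?)</b>", html).group(1)    # stops at the first </b>

# A lazy quantifier at the very end of a pattern has nothing forcing it
# to grow, so it matches the empty string.
trailing = re.match(r"Hello (.*?)", "Hello world").group(1)

print(repr(greedy))
print(repr(lazy))
print(repr(trailing))
```

So non-greedy mode is usually the safer default in the middle of a pattern, but at the end of a pattern it needs something after it (such as $ or a literal) to capture anything.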

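Matching flags

Flag    Description
re.I    Ignore case when matching
re.M    Multi-line mode: ^ and $ also match at the start and end of each line
re.L    Locale-dependent matching (bytes patterns only in Python 3)
re.U    Unicode matching (the default for str patterns in Python 3)
re.S    Make . match any character, including a newline

    import re

    # Two-line string; the second line starts with 10 spaces of indentation
    content = 'Hello 1112222 World \n          hello python'
    # re.S lets .*? cross the newline
    res = re.match(r'^H.*?(\d+).*?python$', content, re.S)
    print(res)
    print(res.group(1))

Output:

    <_sre.SRE_Match object; span=(0, 43), match='Hello 1112222 World \n          hello python'>
    1112222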
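Flags can be combined with the | operator. The following sketch, using a made-up log string, shows re.I and re.M working together; without them the same pattern finds nothing.

```python
import re

# Hypothetical three-line log, chosen only for this illustration.
log = "error: disk full\nERROR: out of memory\nwarning: low battery"

# Without flags, ^ and $ only anchor to the whole string, so nothing matches.
no_flags = re.findall(r"^error: (.*)$", log)

# re.M lets ^ and $ match at every line; re.I ignores the case of "error".
with_flags = re.findall(r"^error: (.*)$", log, re.I | re.M)

print(no_flags)
print(with_flags)
```

Note that re.S is not used here: leaving . unable to cross newlines is what keeps each capture on its own line.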

Escaping

    import re

    content = """The apple's price is $5.00"""
    # Escape $ and . so that both are matched literally
    res = re.match(r"The apple's price is \$5\.00", content, re.S)
    print(res)
    print(res.group())

Output:

    <_sre.SRE_Match object; span=(0, 26), match="The apple's price is $5.00">
    The apple's price is $5.00
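When the literal text comes from a variable, escaping it by hand is error-prone. The standard library's re.escape does it for you; below is a minimal sketch, with the variable names chosen for illustration.

```python
import re

price = "$5.00"  # literal text that happens to contain metacharacters

escaped = re.escape(price)
pattern = "The apple's price is " + escaped

exact = re.match(pattern, "The apple's price is $5.00")
# An unescaped dot matches any character, so this pattern is too permissive.
loose = re.match(r"The apple's price is \$5.00", "The apple's price is $5x00")
# The fully escaped pattern rejects the same input.
strict = re.match(pattern, "The apple's price is $5x00")

print(escaped)
print(exact is not None, loose is not None, strict is not None)
```

This also shows why the dot needs escaping: a pattern with a bare . still matches the intended string, so the bug is easy to miss.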

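Summary: prefer generic matching where possible, use parentheses to capture the target, prefer non-greedy mode, and use re.S when the string contains newlines.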

re.search

re.search scans the whole string and returns the first successful match.

    # Using re.match()
    import re

    content = """This is a string"""
    # Fails because the string does not start with 'a'
    res = re.match('a', content, re.S)
    print(res)

Output:

    None

    # Using re.search()
    import re

    content = """This is a string"""
    # search() finds the pattern anywhere in the string
    res = re.search(r'a\s\w*', content, re.S)
    print(res)
    print(res.group())

Output:

    <_sre.SRE_Match object; span=(8, 16), match='a string'>
    a string

Summary: for convenience, use search instead of match unless the pattern really must match at the start of the string.
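The relationship between the two functions can be sketched side by side. re.fullmatch, which the text above does not cover, is included as a related standard-library function for the case where the whole string must match.

```python
import re

text = "This is a string"

at_start = re.match(r"is", text)             # None: the string starts with "Th"
anywhere = re.search(r"is", text).span()     # first "is" is inside "This"
anchored = re.search(r"^This", text).span()  # with ^, search behaves like match here
whole = re.fullmatch(r"[\w\s]+", text)       # pattern must cover the entire string
partial = re.fullmatch(r"\w+", text)         # None: spaces are not matched by \w

print(at_start, anywhere, anchored)
print(whole is not None, partial is not None)
```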

re.findall

Searches the string and returns all matching substrings as a list.

    import re

    content = """This is a string"""
    res = re.findall(r'a\s\w*', content, re.S)
    print(res)

Output:

    ['a string']
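What findall returns depends on how many capture groups the pattern has, which is a common source of surprise. A minimal sketch with made-up data:

```python
import re

# Hypothetical input, chosen only for this illustration.
text = "alice=30, bob=25"

no_groups = re.findall(r"\w+=\d+", text)       # whole matches
one_group = re.findall(r"\w+=(\d+)", text)     # only the captured group
two_groups = re.findall(r"(\w+)=(\d+)", text)  # one tuple per match

print(no_groups)
print(one_group)
print(two_groups)
```

With no groups you get the full matches, with one group you get just that group, and with several groups you get a list of tuples.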

re.sub

Replaces every matching substring in the string and returns the resulting string.

    import re

    content = """This is 222211111 string"""
    # Replace each run of digits with 'a'
    res = re.sub(r'\d+', 'a', content)
    print(res)

Output:

    This is a string
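re.sub also accepts backreferences in the replacement, a count limit, and a function in place of the replacement string. The sketch below shows each; the sample strings are made up for illustration.

```python
import re

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "2018-06-18")

# count limits how many matches are replaced.
first_only = re.sub(r"\d+", "N", "a1b22c333", count=1)

# A function receives each match object and returns the replacement text.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")

print(reordered)
print(first_only)
print(doubled)
```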

re.compile

Compiles a regular expression string into a pattern object so that the same pattern can be reused.

    import re

    content = """This is 222211111 string"""
    pattern = re.compile(r'\d+')
    # A compiled pattern has its own search() method
    res = pattern.search(content)
    print(res)
    print(res.group())

Output:

    <_sre.SRE_Match object; span=(8, 17), match='222211111'>
    222211111
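The benefit of compiling shows up when one pattern is applied to many strings. A compiled pattern exposes the same operations as the module-level functions (search, findall, sub, and so on) as methods. This is a minimal sketch; the constant name DIGITS and the sample lines are hypothetical.

```python
import re

DIGITS = re.compile(r"\d+")  # compiled once, reused below

lines = ["This is 222211111 string", "no digits here", "room 101, floor 3"]

found = [DIGITS.findall(line) for line in lines]  # same pattern on every line
masked = DIGITS.sub("#", lines[2])                # pattern.sub(repl, string)
span = DIGITS.search(lines[0]).span()             # pattern.search(string)

print(found)
print(masked)
print(span)
```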

Welcome to visit

Personal blog: www.limiao.tech

Original article: https://yq.aliyun.com/articles/651684
