正则表达式
正则表达式是一种用来匹配和操作字符串的强有力的工具。通过定义一些模式来描述一个字符串,正则表达式可以快速地搜索、替换、验证、和提取字符串中的数据。在编程开发领域,正则表达式是一个非常重要的概念。在本文中,我们将介绍正则表达式的基础知识,并给出一些实例。
基础语法
正则表达式是由一些特定的字符组成。下面是正则表达式的一些基础语法。
1. 匹配单个字符
.
:匹配任意字符,除了换行符。\w
:匹配一个字母或数字或下划线或汉字。\s
:匹配任意的空白符,包括空格、制表符、换行符、回车符。\d
:匹配数字。[]
:匹配括号里面的任意一个字符。[^]
:匹配不在括号里面的任意一个字符。
示例代码:
import re
pattern = r'hello...'
match_obj1 = re.match(pattern, 'hello world')
match_obj2 = re.match(pattern, 'hello xyz')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
pattern = r'\d+'
match_obj = re.match(pattern, '2021 is a new year')
if match_obj:
print(match_obj.group())
else:
print('match failed')
pattern = r'[aeiou]'
match_obj1 = re.search(pattern, 'hello')
match_obj2 = re.search(pattern, 'world')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
pattern = r'[^aeiou]'
match_obj1 = re.search(pattern, 'hello')
match_obj2 = re.search(pattern, 'world')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
输出:
hello w
hello x
2021
e
w
h
l
2. 匹配多个字符
*
:匹配前面的字符零次或多次。+
:匹配前面的字符一次或多次。?
:匹配前面的字符零次或一次。{n}
:匹配前面的字符恰好n次。{n,}
:匹配前面的字符至少n次。{n,m}
:匹配前面的字符至少n次,但不超过m次。
示例代码:
import re
pattern = r'ab*'
match_obj1 = re.search(pattern, 'a')
match_obj2 = re.search(pattern, 'ab')
match_obj3 = re.search(pattern, 'abbb')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
if match_obj3:
print(match_obj3.group())
else:
print('match failed')
pattern = r'ab+'
match_obj1 = re.search(pattern, 'a')
match_obj2 = re.search(pattern, 'ab')
match_obj3 = re.search(pattern, 'abbb')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
if match_obj3:
print(match_obj3.group())
else:
print('match failed')
pattern = r'ab?'
match_obj1 = re.search(pattern, 'a')
match_obj2 = re.search(pattern, 'ab')
match_obj3 = re.search(pattern, 'abb')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
if match_obj3:
print(match_obj3.group())
else:
print('match failed')
pattern = r'ab{3}'
match_obj1 = re.search(pattern, 'ab')
match_obj2 = re.search(pattern, 'abb')
match_obj3 = re.search(pattern, 'abbb')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
if match_obj3:
print(match_obj3.group())
else:
print('match failed')
pattern = r'ab{2,}'
match_obj1 = re.search(pattern, 'ab')
match_obj2 = re.search(pattern, 'abb')
match_obj3 = re.search(pattern, 'abbb')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
if match_obj3:
print(match_obj3.group())
else:
print('match failed')
pattern = r'ab{1,3}'
match_obj1 = re.search(pattern, 'ab')
match_obj2 = re.search(pattern, 'abb')
match_obj3 = re.search(pattern, 'abbb')
match_obj4 = re.search(pattern, 'abbbb')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
if match_obj3:
print(match_obj3.group())
else:
print('match failed')
if match_obj4:
print(match_obj4.group())
else:
print('match failed')
输出:
a
ab
abbb
ab
ab
a
abbb
match failed
ab
abb
abbb
match failed
高级应用
正则表达式还有很多高级特性,可以帮助我们更好地进行字符串操作。下面我们将介绍一些常用的高级应用。
1. 分组和引用
正则表达式支持通过括号来进行分组。我们可以在正则表达式中用(
和)
进行分组操作。分组可以让我们更加灵活地进行匹配和替换操作。
我们还可以用\1
、\2
、\3
来引用前面的分组,这在替换操作中非常有用。\1
代表第一个分组,\2
代表第二个分组,以此类推。
示例代码:
import re
pattern = r'(good)\s(day)'
match_obj = re.search(pattern, 'Have a good day!')
if match_obj:
print(match_obj.group(0))
print(match_obj.group(1))
print(match_obj.group(2))
else:
print('match failed')
pattern = r'(\w+)\s\1'
match_obj1 = re.search(pattern, 'hello hello')
match_obj2 = re.search(pattern, 'world world')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
输出:
good day
good
day
hello hello
match failed
2. 零宽断言
零宽断言是一种高级正则表达式特性,它用来匹配一个位置,而不是匹配一个实际的字符。零宽断言在匹配一些复杂的字符串时非常有用。
(?=pattern)
:正向肯定先行断言,用来指定必须满足pattern的条件。(?!pattern)
:正向否定先行断言,用来指定必须不满足pattern的条件。(?<=pattern)
:反向肯定后行断言,用来指定前面必须满足pattern的条件。(?<!pattern)
:反向否定后行断言,用来指定前面必须不满足pattern的条件。
示例代码:
import re
pattern = r'\bapple\b'
match_obj1 = re.search(pattern, 'I like apple juice')
match_obj2 = re.search(pattern, 'I like pineapple juice')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
pattern = r'(?<=go )\w+'
match_obj1 = re.search(pattern, 'let\'s go fishing')
match_obj2 = re.search(pattern, 'let\'s go camping')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
pattern = r'\w+(?<!un)happy'
match_obj1 = re.search(pattern, 'I\'m so happy')
match_obj2 = re.search(pattern, 'I\'m unhappy')
if match_obj1:
print(match_obj1.group())
else:
print('match failed')
if match_obj2:
print(match_obj2.group())
else:
print('match failed')
输出:
apple
match failed
fishing
match failed
happy
match failed
实战案例
正则表达式在实际开发中非常常见,下面我们将用一个实际的案例来说明正则表达式的应用。
1. 匹配手机号码
匹配手机号码是很多应用场景中需要用到的功能。下面是一个示例的正则表达式,它可以匹配中国大陆的手机号码。
import re
pattern = r'^1[3-9]\d{9}$'
match_obj1 = re.match(pattern, '13912345678')
match_obj2 = re.match(pattern, '19876543210')
match_obj3 = re.match(pattern, '01234567890')
if match_obj1:
print('match success')
else:
print('match failed')
if match_obj2:
print('match success')
else:
print('match failed')
if match_obj3:
print('match success')
else:
print('match failed')
输出:
match success
match success
match failed
2. 替换字符串
正则表达式还可以用来进行字符串的替换操作。下面是一个示例的代码,它可以把一个字符串中的<img />
标签替换为<a href="">链接</a>
。
import re
string = '这是一个<img src="https://www.example.com/image.png" />标签'
pattern = r'<img\s+src="(.+?)"\s*/>'
result = re.sub(pattern, r'<a href="\1">链接</a>', string)
print(result)
输出:
这是一个<a href="https://www.example.com/image.png">链接</a>标签
结论
正则表达式是一个非常强大的工具,它可以用来对字符串进行搜索、替换、验证、提取等操作。本文介绍了正则表达式的基础知识和一些高级特性,并通过实例代码演示了正则表达式在实际开发中的应用。希望读者掌握正则表达式的基础知识,并学会如何用正则表达式来处理字符串。