正则表达式

正则表达式是一种用来匹配和操作字符串的强有力的工具。通过定义一些模式来描述一个字符串，正则表达式可以快速地搜索、替换、验证、和提取字符串中的数据。在编程开发领域，正则表达式是一个非常重要的概念。在本文中，我们将介绍正则表达式的基础知识，并给出一些实例。

基础语法

正则表达式是由一些特定的字符组成。下面是正则表达式的一些基础语法。

1. 匹配单个字符

.：匹配任意字符，除了换行符。
\w：匹配一个字母或数字或下划线或汉字。
\s：匹配任意的空白符，包括空格、制表符、换行符、回车符。
\d：匹配数字。
[]：匹配括号里面的任意一个字符。
[^]：匹配不在括号里面的任意一个字符。

示例代码：

import re

pattern = r'hello...'

match_obj1 = re.match(pattern, 'hello world')
match_obj2 = re.match(pattern, 'hello xyz')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

pattern = r'\d+'

match_obj = re.match(pattern, '2021 is a new year')
if match_obj:
    print(match_obj.group())
else:
    print('match failed')

pattern = r'[aeiou]'

match_obj1 = re.search(pattern, 'hello')
match_obj2 = re.search(pattern, 'world')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

pattern = r'[^aeiou]'

match_obj1 = re.search(pattern, 'hello')
match_obj2 = re.search(pattern, 'world')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

输出：

hello w
hello x
2021
e
w
h
l

2. 匹配多个字符

*：匹配前面的字符零次或多次。
+：匹配前面的字符一次或多次。
?：匹配前面的字符零次或一次。
{n}：匹配前面的字符恰好n次。
{n,}：匹配前面的字符至少n次。
{n,m}：匹配前面的字符至少n次，但不超过m次。

示例代码：

import re

pattern = r'ab*'

match_obj1 = re.search(pattern, 'a')
match_obj2 = re.search(pattern, 'ab')
match_obj3 = re.search(pattern, 'abbb')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

if match_obj3:
    print(match_obj3.group())
else:
    print('match failed')

pattern = r'ab+'

match_obj1 = re.search(pattern, 'a')
match_obj2 = re.search(pattern, 'ab')
match_obj3 = re.search(pattern, 'abbb')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

if match_obj3:
    print(match_obj3.group())
else:
    print('match failed')

pattern = r'ab?'

match_obj1 = re.search(pattern, 'a')
match_obj2 = re.search(pattern, 'ab')
match_obj3 = re.search(pattern, 'abb')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

if match_obj3:
    print(match_obj3.group())
else:
    print('match failed')

pattern = r'ab{3}'

match_obj1 = re.search(pattern, 'ab')
match_obj2 = re.search(pattern, 'abb')
match_obj3 = re.search(pattern, 'abbb')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

if match_obj3:
    print(match_obj3.group())
else:
    print('match failed')

pattern = r'ab{2,}'

match_obj1 = re.search(pattern, 'ab')
match_obj2 = re.search(pattern, 'abb')
match_obj3 = re.search(pattern, 'abbb')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

if match_obj3:
    print(match_obj3.group())
else:
    print('match failed')

pattern = r'ab{1,3}'

match_obj1 = re.search(pattern, 'ab')
match_obj2 = re.search(pattern, 'abb')
match_obj3 = re.search(pattern, 'abbb')
match_obj4 = re.search(pattern, 'abbbb')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

if match_obj3:
    print(match_obj3.group())
else:
    print('match failed')

if match_obj4:
    print(match_obj4.group())
else:
    print('match failed')

输出：

a
ab
abbb
ab
ab
a
abbb
match failed
ab
abb
abbb
match failed

高级应用

正则表达式还有很多高级特性，可以帮助我们更好地进行字符串操作。下面我们将介绍一些常用的高级应用。

1. 分组和引用

正则表达式支持通过括号来进行分组。我们可以在正则表达式中用(和)进行分组操作。分组可以让我们更加灵活地进行匹配和替换操作。

我们还可以用\1、\2、\3来引用前面的分组，这在替换操作中非常有用。\1代表第一个分组，\2代表第二个分组，以此类推。

示例代码：

import re

pattern = r'(good)\s(day)'

match_obj = re.search(pattern, 'Have a good day!')

if match_obj:
    print(match_obj.group(0))
    print(match_obj.group(1))
    print(match_obj.group(2))
else:
    print('match failed')

pattern = r'(\w+)\s\1'

match_obj1 = re.search(pattern, 'hello hello')
match_obj2 = re.search(pattern, 'world world')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

输出：

good day
good
day
hello hello
match failed

2. 零宽断言

零宽断言是一种高级正则表达式特性，它用来匹配一个位置，而不是匹配一个实际的字符。零宽断言在匹配一些复杂的字符串时非常有用。

(?=pattern)：正向肯定先行断言，用来指定必须满足pattern的条件。
(?!pattern)：正向否定先行断言，用来指定必须不满足pattern的条件。
(?<=pattern)：反向肯定后行断言，用来指定前面必须满足pattern的条件。
(?<!pattern)：反向否定后行断言，用来指定前面必须不满足pattern的条件。

示例代码：

import re

pattern = r'\bapple\b'

match_obj1 = re.search(pattern, 'I like apple juice')
match_obj2 = re.search(pattern, 'I like pineapple juice')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

pattern = r'(?<=go )\w+'

match_obj1 = re.search(pattern, 'let\'s go fishing')
match_obj2 = re.search(pattern, 'let\'s go camping')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

pattern = r'\w+(?<!un)happy'

match_obj1 = re.search(pattern, 'I\'m so happy')
match_obj2 = re.search(pattern, 'I\'m unhappy')

if match_obj1:
    print(match_obj1.group())
else:
    print('match failed')

if match_obj2:
    print(match_obj2.group())
else:
    print('match failed')

输出：

apple
match failed
fishing
match failed
happy
match failed

实战案例

正则表达式在实际开发中非常常见，下面我们将用一个实际的案例来说明正则表达式的应用。

1. 匹配手机号码

匹配手机号码是很多应用场景中需要用到的功能。下面是一个示例的正则表达式，它可以匹配中国大陆的手机号码。

import re

pattern = r'^1[3-9]\d{9}$'

match_obj1 = re.match(pattern, '13912345678')
match_obj2 = re.match(pattern, '19876543210')
match_obj3 = re.match(pattern, '01234567890')

if match_obj1:
    print('match success')
else:
    print('match failed')

if match_obj2:
    print('match success')
else:
    print('match failed')

if match_obj3:
    print('match success')
else:
    print('match failed')

输出：

match success
match success
match failed

2. 替换字符串

正则表达式还可以用来进行字符串的替换操作。下面是一个示例的代码，它可以把一个字符串中的<img />标签替换为<a href="">链接</a>。

import re

string = '这是一个<img src="https://www.example.com/image.png" />标签'

pattern = r'<img\s+src="(.+?)"\s*/>'

result = re.sub(pattern, r'<a href="\1">链接</a>', string)

print(result)

输出：

这是一个<a href="https://www.example.com/image.png">链接</a>标签

结论

正则表达式是一个非常强大的工具，它可以用来对字符串进行搜索、替换、验证、提取等操作。本文介绍了正则表达式的基础知识和一些高级特性，并通过实例代码演示了正则表达式在实际开发中的应用。希望读者掌握正则表达式的基础知识，并学会如何用正则表达式来处理字符串。

正则表达式

正则表达式

基础语法

1. 匹配单个字符

2. 匹配多个字符

高级应用

1. 分组和引用

2. 零宽断言

实战案例

1. 匹配手机号码

2. 替换字符串

结论

Python教程

Java教程

Web教程

数据库教程

图形图像教程

大数据教程

开发工具教程

计算机教程

正则表达式教程

回顶部