String Counting Methods in Python | 极客教程

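In Python, a string is a sequence type made up of characters. A common string operation is counting how many times a substring occurs in a string. This article covers the usual ways to do this in Python: the count() method, regular expressions, and a few practical examples.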

The `count()` Method

Python string objects have a built-in count() method that returns the number of times a given substring occurs in the string. Its syntax is:

str.count(sub[, start[, end]])

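Here, sub is the substring to search for. start and end are optional and restrict the search to str[start:end], interpreted as in slice notation (end is exclusive). count() returns the number of non-overlapping occurrences of sub in that range, and the comparison is case-sensitive. For example: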

s = "Hello, hello, hello"
print(s.count("hello"))  # Output: 2 (case-sensitive, so "Hello" does not match)
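The following sketch shows how count() behaves with respect to case, the optional start/end arguments, and overlapping matches. It uses the same string as the example above; the values in the comments were worked out by hand from the string's character indices ("hello" starts at indices 7 and 14).

```python
s = "Hello, hello, hello"

# count() is case-sensitive: only the two lowercase "hello" match
print(s.count("hello"))          # 2

# Normalize the case first to count all three occurrences
print(s.lower().count("hello"))  # 3

# start/end behave like slice indices s[start:end] (end is exclusive)
print(s.count("hello", 8))       # 1, only the match starting at index 14
print(s.count("hello", 7, 12))   # 1, because s[7:12] == "hello"

# Matches never overlap: "aa" is found at indices 0 and 2, not at 1
print("aaaa".count("aa"))        # 2
```

Lowercasing with lower() before calling count() is often the simplest way to get a case-insensitive count when no pattern matching is needed.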

Regular Expressions

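Besides count(), Python provides the re module, which uses regular expressions for more flexible matching and counting. A regular expression describes a pattern rather than a fixed substring, so a single expression can match many different strings. Here is a simple example: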

import re

s = "Hello, hello, hello"
pattern = "hello"
count = len(re.findall(pattern, s, re.IGNORECASE))
print(count)  # Output: 3

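In this example, findall() returns a list of all non-overlapping matches, and the re.IGNORECASE flag makes the match case-insensitive, so all three occurrences, including "Hello", are counted. Regular expressions are a flexible tool for string processing and handle cases that count() cannot, such as matching patterns or ignoring case.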
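A few details matter when counting with the re module. The sketch below illustrates three of them using only documented re functions: re.escape() for counting a literal substring that contains regex metacharacters (unescaped, "1+1" would mean "one or more 1s followed by 1"), re.finditer() for counting without building a list, and a zero-width lookahead for counting overlapping matches, which neither findall() nor count() does. The sample strings are made up for illustration.

```python
import re

text = "1+1=2, and 1+1 again"
# '+' is a regex metacharacter; re.escape() makes the pattern match the literal "1+1"
literal = re.escape("1+1")
print(len(re.findall(literal, text)))   # 2

# finditer() yields match objects lazily, so no intermediate list is built
print(sum(1 for _ in re.finditer("hello", "Hello, hello, hello", re.IGNORECASE)))  # 3

# findall() matches do not overlap; a zero-width lookahead finds overlapping ones
print(len(re.findall("aa", "aaaa")))      # 2
print(len(re.findall("(?=aa)", "aaaa")))  # 3, matches start at indices 0, 1 and 2
```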

Practical Applications

Counting Word Occurrences

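Text processing often requires counting how many times a word appears in a document. The following example counts the occurrences of every word in a piece of text: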

text = "Hello world, hello python, world is so beautiful"
words = text.split()
word_count = {}
for word in words:
    if word.lower() in word_count:
        word_count[word.lower()] += 1
    else:
        word_count[word.lower()] = 1

for word, count in word_count.items():
    print(f"{word}: {count}")

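The code above prints how many times each word occurs, ignoring case. Because split() only splits on whitespace, punctuation stays attached to the words, so "world," and "world" are counted as different words: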

hello: 2
world,: 1
python,: 1
world: 1
is: 1
so: 1
beautiful: 1
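One way to avoid counting "world," and "world" separately is to strip punctuation from each token. The sketch below is an alternative to the hand-written dictionary, not the article's original approach: it uses collections.Counter from the standard library, and str.strip(string.punctuation) to remove punctuation at the ends of each token. Punctuation inside a word, as in "don't", is kept.

```python
import string
from collections import Counter

text = "Hello world, hello python, world is so beautiful"

# Strip leading/trailing punctuation and lowercase each whitespace-separated token
word_count = Counter(word.strip(string.punctuation).lower() for word in text.split())

for word, count in word_count.items():
    print(f"{word}: {count}")

# most_common(n) returns the n most frequent words; ties keep first-seen order
print(word_count.most_common(2))  # [('hello', 2), ('world', 2)]
```

With punctuation stripped, "world" is now counted twice instead of appearing as two separate entries.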

Counting Distinct Words in an Email

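When processing email content, you often need to count how many times each distinct word occurs. This example uses re.findall() with the pattern \b\w+\b to extract words, so surrounding punctuation such as commas and periods is not included: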

import re

email = "Hello, thank you for contacting us. We will get back to you soon."
words = re.findall(r'\b\w+\b', email)
word_count = {}
for word in words:
    if word.lower() in word_count:
        word_count[word.lower()] += 1
    else:
        word_count[word.lower()] = 1

for word, count in word_count.items():
    print(f"{word}: {count}")
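The loop above prints each lowercase word with its count; in this email, "you" is the only word that appears twice. The sketch below wraps the same re.findall(r'\b\w+\b', ...) extraction in a function named count_words(), a hypothetical helper introduced here for illustration, and returns a collections.Counter, which also makes the number of distinct words easy to read off.

```python
import re
from collections import Counter

def count_words(text):
    """Hypothetical helper: count words case-insensitively, ignoring punctuation."""
    # \b\w+\b extracts runs of word characters (letters, digits, underscore)
    return Counter(word.lower() for word in re.findall(r"\b\w+\b", text))

email = "Hello, thank you for contacting us. We will get back to you soon."
word_count = count_words(email)

print(word_count["you"])      # 2
print(len(word_count))        # 12 distinct words
print(word_count["missing"])  # 0: Counter returns 0 for absent keys instead of raising KeyError
```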