如何在Python中找到字符串中的最长重复子序列?
您可以使用defaultdict在输入字符串的每个位置开头对每个子字符串进行计数。 getsubs方法是一个生成器方法,每次调用它都会生成较小的子字符串。
阅读更多:Python 教程
示例
from collections import defaultdict
def getsubs(loc, s):
substr = s[loc:]
i = -1
while(substr):
yield substr
substr = s[loc:i]
i -= 1
def longestRepetitiveSubstring(r):
occ = defaultdict(int)
# 对所有子字符串的所有出现次数进行计数
for i in range(len(r)):
for sub in getsubs(i,r):
occ[sub] += 1
# 过滤掉出现少于2次的所有子字符串
filtered = [k for k,v in occ.items() if v >= 2]
if filtered:
maxkey = max(filtered, key=len) # 找到最长的字符串
return maxkey
else:
raise ValueError("no repetitions of any substring of '%s' with 2 or more occurrences" % (r))
longestRepetitiveSubstring("hellopeople18654randomtexthellopeoplefromallaroundthe world")
输出
这将会输出:
'hellopeople'
极客教程