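How to Remove Duplicates from a CSV File in Python

This article shows how to remove duplicate records from a CSV file using Python. CSV is a common data storage format that typically uses commas to separate fields. Removing repeated records from such a file helps keep the data accurate and consistent.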

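1. Reading the CSV file

First, we read the CSV file with Python's built-in csv module. The module provides csv.reader, which returns an iterator that yields each row of the file as a list of strings.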

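import csv

def read_csv_file(file_name):
    """Return every row of the CSV file as a list of strings."""
    rows = []
    # newline='' lets the csv module handle line endings itself,
    # including newlines embedded in quoted fields.
    with open(file_name, 'r', newline='') as file:
        csv_reader = csv.reader(file)
        for row in csv_reader:
            rows.append(row)
    return rows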

The code above defines a read_csv_file function that takes a file name and returns a list containing every row of the CSV file, header row included.
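
To see what csv.reader actually yields, the sketch below parses a small piece of CSV text held in memory instead of a file on disk. The helper name parse_csv_text and the sample text are made up for illustration. Note that a quoted field containing a comma comes back as a single value, which is the main reason to use the csv module rather than splitting lines on commas by hand.

```python
import csv
import io

def parse_csv_text(text):
    """Parse CSV text held in memory and return its rows as lists of strings."""
    # io.StringIO wraps the string in a file-like object that csv.reader accepts.
    return list(csv.reader(io.StringIO(text, newline='')))

# Hypothetical sample: the second field of the data row contains a comma.
sample = 'Name,Note\nJohn,"likes tea, coffee"\n'
print(parse_csv_text(sample))
# [['Name', 'Note'], ['John', 'likes tea, coffee']]
```

Every value is returned as a string, so numeric columns such as Age need an explicit conversion if they are to be compared as numbers.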

2. Removing duplicates

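Next, we go through the rows and drop any row that has already appeared. A natural first choice is Python's set type, which holds only distinct elements. A set is unordered, however: converting the rows to a set and back would shuffle them, and the header row could end up anywhere in the output. Dictionary keys are also unique and, since Python 3.7, keep their insertion order, so we use dict.fromkeys instead.

def remove_duplicates(rows):
    # Lists are not hashable, so each row is converted to a tuple first.
    # dict.fromkeys keeps the first occurrence of each row, in original order.
    unique_rows = dict.fromkeys(tuple(row) for row in rows)
    return [list(row) for row in unique_rows]

The code above defines a remove_duplicates function that takes a list of CSV rows and returns a new list with the duplicates removed. Each distinct row is stored once as a dictionary key, and the keys are then converted back to lists. Two rows count as duplicates only when every field matches exactly, including case and surrounding whitespace.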
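
Sometimes two rows should count as duplicates when only certain columns match, for example the same email address with a different spelling of the name. The following is a minimal sketch of that variation, not part of the original approach. The function name remove_duplicates_by_columns and its key_indexes parameter are hypothetical, and the sample rows are invented. It keeps the first row seen for each key and preserves the original order.

```python
def remove_duplicates_by_columns(rows, key_indexes):
    """Keep the first row seen for each combination of values in key_indexes."""
    seen = set()
    unique_rows = []
    for row in rows:
        # Build a hashable key from the chosen columns only.
        key = tuple(row[i] for i in key_indexes)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows

# Hypothetical sample: two rows share an email address (column index 2).
rows = [
    ['Name', 'Age', 'Email'],
    ['John', '25', 'john@example.com'],
    ['Johnny', '26', 'john@example.com'],
    ['Alice', '30', 'alice@example.com'],
]
for row in remove_duplicates_by_columns(rows, [2]):
    print(row)
# ['Name', 'Age', 'Email']
# ['John', '25', 'john@example.com']
# ['Alice', '30', 'alice@example.com']
```

Here the set is used only to remember which keys have been seen, while the output list keeps the order, so the header stays on the first line.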

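3. Saving to a new CSV file

Finally, we save the deduplicated rows to a new CSV file. The csv.writer function returns a writer object that converts each row into a correctly delimited and quoted line.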

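def write_csv_file(file_name, rows):
    """Write the rows to file_name, overwriting any existing file."""
    # newline='' stops Python from translating the writer's own line endings,
    # which would otherwise produce blank lines between rows on Windows.
    with open(file_name, 'w', newline='') as file:
        csv_writer = csv.writer(file)
        csv_writer.writerows(rows)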

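The code above defines a write_csv_file function that takes a file name and a list of rows and writes the rows to that file. If the file already exists, it is overwritten.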
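
To inspect exactly what csv.writer produces without touching the disk, the sketch below writes to an in-memory buffer. The helper name rows_to_csv_text and the sample rows are made up for illustration. It shows two default behaviors of the writer: fields containing the delimiter are quoted, and each row ends with \r\n.

```python
import csv
import io

def rows_to_csv_text(rows):
    """Serialize rows to CSV text using csv.writer's default settings."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()

# Hypothetical sample: the second field of the data row contains a comma.
text = rows_to_csv_text([['Name', 'Note'], ['John', 'likes tea, coffee']])
print(repr(text))
# 'Name,Note\r\nJohn,"likes tea, coffee"\r\n'
```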

Example

Here is a usage example. Suppose we have a CSV file named data.csv with the following contents:

Name,Age,Email
John,25,john@example.com
Alice,30,alice@example.com
John,25,john@example.com
Bob,35,bob@example.com

We can combine the functions above to remove the duplicates and save the result to a new CSV file.

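def remove_duplicates_from_csv(input_file, output_file):
    """Read input_file, drop duplicate rows, and write the rest to output_file."""
    rows = read_csv_file(input_file)
    unique_rows = remove_duplicates(rows)
    write_csv_file(output_file, unique_rows)

# The input file is left unchanged; the result goes to output.csv.
remove_duplicates_from_csv('data.csv', 'output.csv')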
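
The three-step approach holds every row in a list before writing anything. For larger files, one possible alternative is to deduplicate in a single pass, writing each row as soon as it is known to be new. The following is a minimal sketch under stated assumptions: the name dedupe_csv_streaming is hypothetical, UTF-8 encoding is assumed, and the demo writes the sample data from this article to a temporary directory so that it can be run as is. The set of rows already seen still has to fit in memory.

```python
import csv
import os
import tempfile

def dedupe_csv_streaming(input_file, output_file):
    """Copy input_file to output_file, skipping rows that were already written.

    Returns the number of rows written, header included.
    """
    seen = set()
    written = 0
    # Assumption: both files use UTF-8 encoding.
    with open(input_file, 'r', newline='', encoding='utf-8') as src, \
         open(output_file, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(row)  # lists are not hashable, tuples are
            if key in seen:
                continue
            seen.add(key)
            writer.writerow(row)
            written += 1
    return written

# Demo on the sample data, using a temporary directory.
sample = (
    'Name,Age,Email\n'
    'John,25,john@example.com\n'
    'Alice,30,alice@example.com\n'
    'John,25,john@example.com\n'
    'Bob,35,bob@example.com\n'
)
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'data.csv')
    dst_path = os.path.join(tmp, 'output.csv')
    with open(src_path, 'w', newline='', encoding='utf-8') as f:
        f.write(sample)
    count = dedupe_csv_streaming(src_path, dst_path)
    with open(dst_path, 'r', newline='', encoding='utf-8') as f:
        output_rows = list(csv.reader(f))

print(count)  # 4
for row in output_rows:
    print(row)
# ['Name', 'Age', 'Email']
# ['John', '25', 'john@example.com']
# ['Alice', '30', 'alice@example.com']
# ['Bob', '35', 'bob@example.com']
```

Because rows are written in the order they are read, the header stays first and the second John row is the one that is dropped.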