Reading Data in Pandas with the csv Module

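CSV (Comma Separated Values) is the most common import and export format for spreadsheets and databases. The format was in use for many years before attempts to standardize it (RFC 4180). The lack of a well-defined standard means that data produced and consumed by different applications often differs in subtle ways, and those differences make it tedious to process CSV files from multiple sources. Here we use Python's csv module, which reads and writes tabular data in CSV format.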
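To make those "subtle differences" concrete, here is a minimal sketch. It uses made-up data: the same table exported once with commas and once with semicolons. The `delimiter` argument of `csv.reader` and `csv.writer` tells the module which variant to expect. The sample strings and the helper `read_rows` are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical exports of the same two-column table from two applications:
# one separates fields with commas, the other with semicolons.
comma_text = "name,mpg\ncar_a,18.0\ncar_b,26.0\n"
semicolon_text = "name;mpg\ncar_a;18.0\ncar_b;26.0\n"


def read_rows(text, delimiter=","):
    """Parse CSV text into a list of rows, each row a list of strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# With the default comma delimiter, the semicolon data is not split at all.
print(read_rows(semicolon_text))  # [['name;mpg'], ['car_a;18.0'], ['car_b;26.0']]

# Passing the right delimiter recovers the same rows as the comma version.
rows = read_rows(semicolon_text, delimiter=";")
print(rows == read_rows(comma_text))  # True

# Write the rows back out as comma-separated text.
# lineterminator="\n" overrides the writer's default of "\r\n".
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue() == comma_text)  # True
```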
Code #1: We use the csv.DictReader class to load the data file into Python.

# importing the csv module
import csv
 
# Read the file 'auto-mpg.csv'. DictReader turns each row into a
# dictionary keyed by the header line, and list() collects all rows.
# newline='' is what the csv module documentation recommends for open().
with open('auto-mpg.csv', newline='') as csvfile:
    mpg_data = list(csv.DictReader(csvfile))
 
# Inspect the data: print only the first three rows
print(mpg_data[:3])

Output:

[Output image: the first three rows of mpg_data]

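We can see that the data is stored as a list of dictionaries, one per row (in Python versions before 3.8, each row is an OrderedDict). To understand it better, let's run a few operations on the data.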
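Because auto-mpg.csv may not be at hand, here is a minimal sketch of what DictReader produces, run on an in-memory sample. The 'mpg' and 'cylinders' columns match the keys used in this article. The 'name' column and all values are made up. The sketch shows that the header line supplies the keys and that every value arrives as a string.

```python
import csv
import io

# Hypothetical sample: 'mpg' and 'cylinders' as used in this article,
# plus a made-up 'name' column; all values are invented.
sample = "mpg,cylinders,name\n18.0,8,car_a\n26.0,4,car_b\n21.0,6,car_c\n"

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

# The header line becomes the keys of every row.
print(reader.fieldnames)  # ['mpg', 'cylinders', 'name']
print(rows[0])            # {'mpg': '18.0', 'cylinders': '8', 'name': 'car_a'} on Python 3.8+

# Every value is a string; convert explicitly before doing arithmetic.
print(type(rows[0]['mpg']).__name__)  # str
print(float(rows[0]['mpg']) + 1)      # 19.0
```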
Code #2:

# Let's find all the keys (column names) in the dictionary
print(mpg_data[0].keys())
 
# Now find the distinct cylinder values in the dataset
# by collecting the 'cylinders' field of every row into a set
unique_cyl = set(data['cylinders'] for data in mpg_data)
 
# Let's print the values
print(unique_cyl)

Output:

[Output image: the keys of the first row and the set of distinct cylinder values]

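As the output shows, there are 5 distinct cylinder values in our dataset. Note that they are strings, because the csv module reads every field as text.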
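Because the cylinder values are strings such as '4' and '8', counting and ordering them takes a little care. Here is a minimal sketch on made-up rows. It uses `collections.Counter` to count how many cars share each cylinder value, then sorts the keys numerically with `key=int`. Plain string sorting only happens to agree with numeric order when every value is a single digit.

```python
from collections import Counter

# Hypothetical rows shaped like the DictReader output (all values are strings).
mpg_data = [
    {"mpg": "18.0", "cylinders": "8"},
    {"mpg": "26.0", "cylinders": "4"},
    {"mpg": "24.0", "cylinders": "4"},
    {"mpg": "21.0", "cylinders": "6"},
    {"mpg": "15.0", "cylinders": "8"},
]

# Count rows per distinct cylinder value.
cyl_counts = Counter(row["cylinders"] for row in mpg_data)

# Print each value with its count, ordered numerically.
for cyl in sorted(cyl_counts, key=int):
    print(cyl, cyl_counts[cyl])
# 4 2
# 6 1
# 8 2

# Why key=int matters once values have more than one digit:
print(sorted(["10", "4", "8"]))           # ['10', '4', '8']
print(sorted(["10", "4", "8"], key=int))  # ['4', '8', '10']
```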
Code #3: Now let's compute the average mpg for each cylinder value.

# Let's create an empty list to store the values
# of average mpg for each cylinder
avg_mpg = []
 
# c is the current cylinder size
for c in unique_cyl:
    # running total of mpg for cars with c cylinders
    mpgbycyl = 0
    # number of cars with c cylinders
    cylcount = 0
 
    # iterate over every row of the dataset
    for x in mpg_data:
        # check whether this row has c cylinders
        if x['cylinders'] == c:
            # mpg is stored as a string, so convert it before adding
            mpgbycyl += float(x['mpg'])
            # count one more car with c cylinders
            cylcount += 1
 
    # Find the average mpg for size c
    avg = mpgbycyl/cylcount
    # Append the average mpg to list
    avg_mpg.append((c, avg))
 
# Sort the list numerically by cylinder count (the values are strings)
avg_mpg.sort(key=lambda x: int(x[0]))
 
# Print the list
print(avg_mpg)

Output:

[Output image: the list of (cylinders, average mpg) pairs sorted by cylinder count]
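The nested loop in Code #3 scans the whole dataset once for every cylinder value. Here is a minimal sketch of a single-pass alternative. It keeps a running sum and count per group in `collections.defaultdict` objects and divides at the end. The helper name `average_by` and the sample rows are hypothetical. The numeric sort assumes the group values are integer strings, as the cylinder values are.

```python
from collections import defaultdict


def average_by(rows, key, value):
    """Return [(group, mean of value)] in one pass over rows.

    Assumes the group values are integer strings (such as cylinder
    counts) so that they can be sorted numerically.
    """
    sums = defaultdict(float)  # group -> running total of `value`
    counts = defaultdict(int)  # group -> number of rows seen
    for row in rows:
        group = row[key]
        sums[group] += float(row[value])
        counts[group] += 1
    averages = [(g, sums[g] / counts[g]) for g in sums]
    return sorted(averages, key=lambda pair: int(pair[0]))


# Hypothetical rows in the DictReader format (all values are strings).
mpg_data = [
    {"mpg": "18.0", "cylinders": "8"},
    {"mpg": "26.0", "cylinders": "4"},
    {"mpg": "24.0", "cylinders": "4"},
    {"mpg": "21.0", "cylinders": "6"},
    {"mpg": "15.0", "cylinders": "8"},
]

print(average_by(mpg_data, "cylinders", "mpg"))
# [('4', 25.0), ('6', 21.0), ('8', 16.5)]
```

Both versions give the same averages. The single pass matters mainly when there are many groups, since the nested loop's cost grows with the number of groups times the number of rows.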