Using the csv Module to Read Data in Pandas
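The CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. Before CSV was standardized, many variants of the format were in use. Because there was no well-defined standard, the data produced and consumed by different applications often differs in subtle ways. These differences make it tedious to process CSV files that come from multiple sources. For this reason, we will use Python's csv library to read and write tabular data in CSV format.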
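To make these subtle differences concrete, here is a minimal sketch in which the same records are written in two conventions, one comma-delimited and one semicolon-delimited. The sample text is made up for illustration and is not taken from any real file; `csv.reader` and its `delimiter` parameter belong to the standard library.

```python
import csv
import io

# Hypothetical sample data: the same records in two CSV conventions.
# The first field of the first record contains a comma, so it is quoted.
comma_text = 'name,mpg\n"sedan, 4 door",18\ncoupe,15\n'
semicolon_text = 'name;mpg\n"sedan, 4 door";18\ncoupe;15\n'

# The default dialect expects commas and double-quote quoting.
comma_rows = list(csv.reader(io.StringIO(comma_text)))

# Any other delimiter has to be stated explicitly.
semicolon_rows = list(csv.reader(io.StringIO(semicolon_text), delimiter=';'))

print(comma_rows)
print(comma_rows == semicolon_rows)
```

Splitting each line on commas by hand would break the quoted field apart, which is one reason to let the csv module do the parsing.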
Code #1: We will use the csv.DictReader() function to import the data file into the Python environment.
# Import the csv module
import csv

# Read the file named 'auto-mpg.csv'. Each row is read as a
# dictionary, and the rows are collected into a Python list.
# newline='' is what the csv documentation recommends for open().
with open('auto-mpg.csv', newline='') as csvfile:
    mpg_data = list(csv.DictReader(csvfile))

# Inspect the data by printing only the first three elements
print(mpg_data[:3])
Output:
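As we can see, the data is stored as a list of dictionaries, one per row (OrderedDict objects on Python 3.6 and 3.7, plain dict objects from Python 3.8 onward). To understand the data better, let's perform a few operations on it.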
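One detail that matters for the operations that follow: DictReader takes the keys from the header row and returns every value as a string. The sketch below illustrates this with a made-up two-row sample read from memory instead of 'auto-mpg.csv'; the column values are hypothetical.

```python
import csv
import io

# Hypothetical two-row sample standing in for 'auto-mpg.csv'.
sample = io.StringIO('mpg,cylinders,name\n18,8,car a\n24.5,4,car b\n')
rows = list(csv.DictReader(sample))

# Keys come from the header row; every value is a str.
print(rows[0])
print(type(rows[0]['mpg']))

# Convert numeric columns explicitly before doing arithmetic.
mpg_values = [float(row['mpg']) for row in rows]
print(sum(mpg_values) / len(mpg_values))
```

This is why numeric columns such as mpg need an explicit float() conversion before they can be summed or averaged.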
Code #2:
# Find all the keys of the first dictionary (the column names)
print(mpg_data[0].keys())

# Find the unique values of 'cylinders' in the dataset by
# building a set from the cylinders value of every row
unique_cyl = set(data['cylinders'] for data in mpg_data)

# Print the values
print(unique_cyl)
Output:
As the output shows, our dataset contains 5 unique cylinder values.
Code #3: Now let's find the average mpg for each cylinder value.
# Create an empty list to store the
# average mpg for each cylinder value
avg_mpg = []

# c is the current cylinder value
for c in unique_cyl:
    # Running sum of mpg for this cylinder value
    mpgbycyl = 0
    # Number of cars with this cylinder value
    cylcount = 0

    # Iterate over all rows in mpg_data
    for x in mpg_data:
        # Check whether the row's cylinder value matches c
        if x['cylinders'] == c:
            # Add the mpg value (stored as a string) to the sum
            mpgbycyl += float(x['mpg'])
            # Increment the count of cars
            cylcount += 1

    # Compute the average mpg for cylinder value c
    avg = mpgbycyl / cylcount
    # Append the (cylinder value, average mpg) pair to the list
    avg_mpg.append((c, avg))

# Sort the list by cylinder value
avg_mpg.sort(key=lambda x: x[0])

# Print the list
print(avg_mpg)
Output:
As the output shows, the program returns a list of tuples, each holding a unique cylinder value from our dataset and the average mpg for that value.
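The same averages can also be computed in a single pass over the rows instead of rescanning the data once per cylinder value. The following is a minimal sketch of that alternative, not the method used above; it groups values with `collections.defaultdict` and runs on a made-up sample instead of 'auto-mpg.csv'.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for 'auto-mpg.csv'.
sample = io.StringIO('mpg,cylinders\n18,8\n15,8\n24,4\n30,4\n21,6\n')
mpg_data = list(csv.DictReader(sample))

# Collect the mpg values of each cylinder count in one pass.
mpg_by_cyl = defaultdict(list)
for row in mpg_data:
    mpg_by_cyl[int(row['cylinders'])].append(float(row['mpg']))

# Average each group and sort numerically by cylinder count.
avg_mpg = sorted(
    (cyl, sum(vals) / len(vals)) for cyl, vals in mpg_by_cyl.items()
)
print(avg_mpg)
```

Converting the cylinder value to int makes the sort numeric. With string keys, a value such as '10' would sort before '8'.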