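Creating a DataFrame from Pandas Series
A Series is a one-dimensional, list-like structure in pandas that can hold integers, strings, floating-point numbers, and other values. Printing a Series shows its values alongside an index that by default runs from 0 to n - 1, where n is the number of values in the Series. Later in this article we discuss the DataFrame, but first we need to understand the main difference between a Series and a DataFrame. A Series holds a single list of values with an index, whereas a DataFrame can be made up of multiple Series; in other words, a DataFrame is a collection of Series that can be used to analyze data.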
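The difference between the two structures can be sketched in a few lines. The sample values and the names `names`, `counts`, and `df` are hypothetical and serve only as illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
names = pd.Series(['a', 'b', 'c'])
counts = pd.Series([10, 20, 30])

# A Series holds one column of values plus an index from 0 to n - 1
print(list(names.index))   # [0, 1, 2]

# A DataFrame groups several Series as named columns sharing one index
df = pd.DataFrame({'name': names, 'count': counts})
print(df.shape)            # (3, 2)

# Selecting a single column gives back a Series
print(type(df['name']))
```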
Code #1: Creating a simple Series
#importing pandas library
import pandas as pd
#Creating a list
author = ['Jitender', 'Purnima', 'Arpit', 'Jyoti']
#Creating a Series by passing list variable to Series() function
auth_series = pd.Series(author)
#Printing Series
print(auth_series)
Output:
0    Jitender
1     Purnima
2       Arpit
3       Jyoti
dtype: object
Let's check the type of the Series:
print(type(auth_series))
Output:
<class 'pandas.core.series.Series'>
Code #2: Creating a DataFrame from multiple Series
#Importing Pandas library
import pandas as pd
#Creating two lists
author = ['Jitender', 'Purnima', 'Arpit', 'Jyoti']
article = [210, 211, 114, 178]
#Creating two Series by passing lists
auth_series = pd.Series(author)
article_series = pd.Series(article)
#Creating a dictionary by passing Series objects as values
frame = { 'Author': auth_series, 'Article': article_series }
#Creating DataFrame by passing Dictionary
result = pd.DataFrame(frame)
#Printing elements of Dataframe
print(result)
Output:
     Author  Article
0  Jitender      210
1   Purnima      211
2     Arpit      114
3     Jyoti      178
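Explanation: We created two lists, author and article, and passed them to the Series() function to create two Series. We then built a dictionary with the Series objects as its values; the dictionary keys become the column names of the DataFrame.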
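As a quick check of this mapping, the following sketch rebuilds the same DataFrame and inspects its column labels and dtypes. The variable `columns` is introduced here only for illustration.

```python
import pandas as pd

auth_series = pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti'])
article_series = pd.Series([210, 211, 114, 178])
result = pd.DataFrame({'Author': auth_series, 'Article': article_series})

# Dictionary keys become the column labels, in insertion order
columns = list(result.columns)
print(columns)   # ['Author', 'Article']

# Each column keeps the dtype of the Series it was built from
print(result.dtypes)
```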
Code #3: Adding a new column to a DataFrame
#Importing pandas library
import pandas as pd
#Creating Series
auth_series = pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti'])
article_series = pd.Series([210, 211, 114, 178])
#Creating Dictionary
frame = { 'Author': auth_series, 'Article': article_series }
#Creating Dataframe
result = pd.DataFrame(frame)
#Creating another list
age = [21, 21, 24, 23]
#Creating a new column in the DataFrame from a Series built from the list
result['Age'] = pd.Series(age)
#Printing dataframe
print(result)
Output:
     Author  Article  Age
0  Jitender      210   21
1   Purnima      211   21
2     Arpit      114   24
3     Jyoti      178   23
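Explanation: We created an additional Series holding the authors' ages and assigned it directly to the DataFrame as a new column. Keep in mind that if any value is missing, it is filled with NaN, the default null marker.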
Code #4: Missing values in a DataFrame
#Importing pandas library
import pandas as pd
#Creating Series
auth_series = pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti'])
article_series = pd.Series([210, 211, 114, 178])
#Creating Dictionary
frame = { 'Author': auth_series, 'Article': article_series }
#Creating Dataframe
result = pd.DataFrame(frame)
#Creating another list
age = [21, 21, 24]
#Creating a new column in the DataFrame from a Series built from the shorter list
result['Age'] = pd.Series(age)
#Printing dataframe
print(result)
Output:
     Author  Article   Age
0  Jitender      210  21.0
1   Purnima      211  21.0
2     Arpit      114  24.0
3     Jyoti      178   NaN
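The ages print as 21.0 rather than 21 because NaN is a floating-point value, so pandas stores the whole column as floats. The sketch below shows one way to detect the missing entry and, optionally, to keep integer values by converting to the nullable `Int64` dtype. The names `missing` and `age_int` are hypothetical.

```python
import pandas as pd

result = pd.DataFrame({'Author': pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti'])})
result['Age'] = pd.Series([21, 21, 24])   # one value short

# NaN is a float, so the column is stored as float64
print(result['Age'].dtype)       # float64

# isna() flags the rows where a value is missing
missing = result['Age'].isna()
print(missing.tolist())          # [False, False, False, True]

# Optional: the nullable Int64 dtype keeps whole numbers and marks the gap as <NA>
age_int = result['Age'].astype('Int64')
print(age_int)
```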
Code #5: Creating a DataFrame from a dictionary of Series
#Importing pandas library
import pandas as pd
#Creating dictionary of Series
dict1={'Auth_Name':pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti']),
'Author_Book_No': pd.Series([210, 211, 114, 178]),
'Age': pd.Series([21, 21, 24, 23]) }
#Creating Dataframe
df = pd.DataFrame(dict1)
#Printing dataframe
print(df)
Output:
   Auth_Name  Author_Book_No  Age
0   Jitender             210   21
1    Purnima             211   21
2      Arpit             114   24
3      Jyoti             178   23
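Explanation: Here we built a dictionary whose values are Series and passed it to DataFrame(). When a DataFrame is created from a dictionary, the keys become the column names and the values of each Series fill the rows of that column.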
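A dictionary is not the only route. As an alternative sketch (not the approach used in this article), named Series can be placed side by side with `pd.concat` along `axis=1`, in which case each Series' `name` supplies the column label. The variable names below are hypothetical.

```python
import pandas as pd

# Each Series carries its future column label in its name attribute
auth_name = pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti'], name='Auth_Name')
book_no = pd.Series([210, 211, 114, 178], name='Author_Book_No')
age = pd.Series([21, 21, 24, 23], name='Age')

# axis=1 places the Series next to each other as columns
df = pd.concat([auth_name, book_no, age], axis=1)
print(df)
```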
Code #6: Adding an explicit index to a DataFrame
#Importing pandas library
import pandas as pd
#Creating dictionary of Series
dict1={'Auth_Name':pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti']),
'Author_Book_No': pd.Series([210, 211, 114, 178]),
'Age': pd.Series([21, 21, 24, 23]) }
#Creating Dataframe
df = pd.DataFrame(dict1,index=['SNo1','SNo2','SNo3','SNo4'])
#Printing dataframe
print(df)
Output:
      Auth_Name  Author_Book_No  Age
SNo1        NaN             NaN  NaN
SNo2        NaN             NaN  NaN
SNo3        NaN             NaN  NaN
SNo4        NaN             NaN  NaN
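Explanation: After we explicitly provide an index for the DataFrame, every cell is filled with NaN. The DataFrame was built from Series, and each Series has its own default index (0, 1, 2, 3). Because the index labels of the DataFrame and the Series do not match, all values come out as NaN. We can correct this by giving every Series the same index labels as the DataFrame. Let's see how to do that.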
#This code is provided by Sheetal Verma
#Importing pandas library
import pandas as pd
#Creating dictionary of Series
dict1={'Auth_Name':pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti'],index=['SNo1','SNo2','SNo3','SNo4']),
'Author_Book_No': pd.Series([210, 211, 114, 178],index=['SNo1','SNo2','SNo3','SNo4']),
'Age': pd.Series([21, 21, 24, 23],index=['SNo1','SNo2','SNo3','SNo4']) }
#Creating Dataframe
df = pd.DataFrame(dict1,index=['SNo1','SNo2','SNo3','SNo4'])
#Printing dataframe
print(df)
Output:
      Auth_Name  Author_Book_No  Age
SNo1   Jitender             210   21
SNo2    Purnima             211   21
SNo3      Arpit             114   24
SNo4      Jyoti             178   23
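Repeating the same index on every Series works, but it is verbose. One possible shortcut, sketched here as an alternative to the code above, is to build the DataFrame with the default index first and then assign the labels to `df.index` afterwards. Assigning the index relabels the rows by position, so no alignment takes place and no values are lost.

```python
import pandas as pd

dict1 = {'Auth_Name': pd.Series(['Jitender', 'Purnima', 'Arpit', 'Jyoti']),
         'Author_Book_No': pd.Series([210, 211, 114, 178]),
         'Age': pd.Series([21, 21, 24, 23])}

# Build the DataFrame with the default 0..3 index
df = pd.DataFrame(dict1)

# Relabel the rows in place; the number of labels must equal the number of rows
df.index = ['SNo1', 'SNo2', 'SNo3', 'SNo4']
print(df)
```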