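Split Pandas Dataframe by Column Value
Sometimes, to analyze a DataFrame more precisely, we need to split it into two or more parts. Pandas lets you split a DataFrame by column index, by row index, by column values, and so on.
Let's see how to split a Pandas DataFrame by column value in Python.
First, let's create a DataFrame.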
# importing pandas library
import pandas as pd
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns=['Name', 'Age',
                           'Weight', 'Salary'])
# show the dataframe
df
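Output:
             Name  Age  Weight   Salary
0       M.S.Dhoni   36      75  5428000
1  A.B.D Villiers   38      74  3428000
2         V.Kholi   31      70  8428000
3         S.Smith   34      80  4428000
4         C.Gayle   40     100  4528000
5          J.Root   33      72  7028000
6      K.Peterson   42      85  2528000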
Method 1: Using boolean masking.
This approach keeps only the rows of the DataFrame for which the boolean mask is True.
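To make the idea concrete, here is a minimal sketch on a small toy frame (the `toy` data and the names `mask` and `older` are made up for illustration and are not part of the player data set). It shows that comparing a column with a value produces a boolean Series, and that indexing with this Series keeps only the rows marked True.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({'Name': ['A', 'B', 'C'], 'Age': [25, 40, 31]})

# Comparing a column with a scalar yields a boolean Series, one value per row
mask = toy['Age'] >= 30

# Indexing with the mask keeps only the rows where the mask is True
older = toy[mask]

print(mask.tolist())
print(older['Name'].tolist())
```

The mask has one entry per row, so it can be stored in a variable and reused, which is what Example 2 does.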
Example 1:
# importing pandas library
import pandas as pd
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns=['Name', 'Age',
                           'Weight', 'Salary'])
# splitting the dataframe into 2 parts
# on basis of 'Age' column values
# using Relational operator
df1 = df[df['Age'] >= 37]
# printing df1
df1
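Output:
             Name  Age  Weight   Salary
1  A.B.D Villiers   38      74  3428000
4         C.Gayle   40     100  4528000
6      K.Peterson   42      85  2528000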
df2 = df[df['Age'] < 37]
# printing df2
df2
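Output:
        Name  Age  Weight   Salary
0  M.S.Dhoni   36      75  5428000
2    V.Kholi   31      70  8428000
3    S.Smith   34      80  4428000
5     J.Root   33      72  7028000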
In the example above, the DataFrame 'df' is split into two parts, 'df1' and 'df2', based on the values in the 'Age' column.
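One detail worth noticing is that each part keeps the row labels of the original DataFrame. The sketch below, which uses a four-row subset of the player data and a hypothetical name `df1_clean`, checks that the two parts do not overlap and shows how `reset_index(drop=True)` can renumber a part from zero if that is what you need.

```python
import pandas as pd

# A four-row subset of the player data, enough to show the idea
df = pd.DataFrame(
    [['M.S.Dhoni', 36], ['A.B.D Villiers', 38],
     ['V.Kholi', 31], ['C.Gayle', 40]],
    columns=['Name', 'Age'])

df1 = df[df['Age'] >= 37]
df2 = df[df['Age'] < 37]

# The parts keep the original row labels and share none of them
print(df1.index.tolist())
print(df2.index.tolist())
print(len(df1) + len(df2) == len(df))

# Optional: renumber one part from 0 (df1_clean is a hypothetical name)
df1_clean = df1.reset_index(drop=True)
print(df1_clean.index.tolist())
```

Keeping the original labels is often useful, because it lets you trace each row back to its position in the full DataFrame.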
Example 2:
# importing pandas library
import pandas as pd
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns=['Name', 'Age',
                           'Weight', 'Salary'])
# splitting the dataframe into 2 parts
# on basis of 'Weight' column values
mask = df['Weight'] >= 80
df1 = df[mask]
# invert the boolean values
df2 = df[~mask]
# printing df1
df1
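Output:
         Name  Age  Weight   Salary
3     S.Smith   34      80  4428000
4     C.Gayle   40     100  4528000
6  K.Peterson   42      85  2528000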
# printing df2
df2
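Output:
             Name  Age  Weight   Salary
0       M.S.Dhoni   36      75  5428000
1  A.B.D Villiers   38      74  3428000
2         V.Kholi   31      70  8428000
5          J.Root   33      72  7028000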
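In the example above, the DataFrame 'df' is split into two parts, 'df1' and 'df2', based on the values in the 'Weight' column. The `~` operator inverts the mask, so 'df2' contains every row that is not in 'df1'.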
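The mask-and-invert pattern can be wrapped in a small function. The following is only a sketch: `split_by_mask` is a hypothetical helper, not a pandas function, and the toy data is made up. It also illustrates one reason to prefer `~mask` over writing the opposite comparison by hand: a row with a missing value fails both `>=` and `<`, but it still ends up in the inverted part.

```python
import numpy as np
import pandas as pd


def split_by_mask(frame, mask):
    """Hypothetical helper: return (rows where mask is True, all other rows)."""
    return frame[mask], frame[~mask]


# Made-up data with one missing weight
toy = pd.DataFrame({'Name': ['A', 'B', 'C'],
                    'Weight': [85.0, np.nan, 70.0]})

heavy, rest = split_by_mask(toy, toy['Weight'] >= 80)

print(heavy['Name'].tolist())
print(rest['Name'].tolist())

# Writing the opposite comparison by hand would drop the row with NaN
print(toy[toy['Weight'] < 80]['Name'].tolist())
```

With this helper the two parts always add up to the full DataFrame, whatever the mask is.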
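Method 2: Using DataFrame.groupby().
This method splits the data into groups based on some criterion, producing one group for each distinct value of the grouping key.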
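As a rough illustration of what a "group" is, the sketch below groups a made-up frame by a hypothetical 'Team' column (the player data set has no such column) and collects the parts in a dictionary. Iterating over a groupby object yields pairs of (key, sub-DataFrame), one pair per distinct value.

```python
import pandas as pd

# Made-up data; the 'Team' column is an assumption for this sketch
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D', 'E'],
    'Team': ['Red', 'Blue', 'Red', 'Green', 'Blue'],
})

# One sub-DataFrame per distinct value in 'Team'
groups = {team: part for team, part in toy.groupby('Team')}

print(sorted(groups))
print(groups['Red']['Name'].tolist())
```

This gives as many parts as there are distinct values, so it is a natural fit when the DataFrame has to be split into more than two pieces.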
Example:
# importing pandas library
import pandas as pd
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns=['Name', 'Age',
                           'Weight', 'Salary'])
# splitting the dataframe into 2 parts
# on basis of 'Salary' column values
# using dataframe.groupby() function
df1, df2 = [x for _, x in df.groupby(df['Salary'] < 4528000)]
# printing df1
df1
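Output:
        Name  Age  Weight   Salary
0  M.S.Dhoni   36      75  5428000
2    V.Kholi   31      70  8428000
4    C.Gayle   40     100  4528000
5     J.Root   33      72  7028000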
# printing df2
df2
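Output:
             Name  Age  Weight   Salary
1  A.B.D Villiers   38      74  3428000
3         S.Smith   34      80  4428000
6      K.Peterson   42      85  2528000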
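In the example above, the DataFrame 'df' is split into two parts, 'df1' and 'df2', based on the values in the 'Salary' column. Because groupby() sorts the group keys by default and False comes before True, 'df1' holds the rows where the condition is False (Salary of 4528000 or more) and 'df2' holds the rows where it is True (Salary below 4528000).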
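Unpacking the groups straight into two names depends on the key order, and it raises an error when every row falls on the same side of the threshold, because then only one group exists. One possible alternative, sketched below with the hypothetical names `parts`, `low`, `high` and `empty`, is to store the groups in a dictionary keyed by the boolean value and look each one up explicitly.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['M.S.Dhoni', 'A.B.D Villiers', 'V.Kholi', 'S.Smith',
             'C.Gayle', 'J.Root', 'K.Peterson'],
    'Salary': [5428000, 3428000, 8428000, 4428000,
               4528000, 7028000, 2528000],
})

# Key each group by the boolean result of the condition
parts = {bool(key): group
         for key, group in df.groupby(df['Salary'] < 4528000)}

# An empty frame with the same columns, used if a group is missing
empty = df.iloc[0:0]

low = parts.get(True, empty)     # rows where Salary < 4528000
high = parts.get(False, empty)   # rows where Salary >= 4528000

print(low['Name'].tolist())
print(high['Name'].tolist())
```

Looking groups up by key makes it explicit which part is which, at the cost of a few extra lines.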