Split a Pandas DataFrame by Column Values

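Sometimes, to analyze a DataFrame more precisely, we need to split it into two or more parts. Pandas lets us split a DataFrame by column index, by row index, by column values, and so on.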
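The row-index and column-index options need no condition at all. A minimal sketch, assuming an arbitrary cut after the fourth row and an arbitrary grouping of columns: iloc slices rows by position, and a list of labels selects a subset of columns.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# Split by row position: rows 0-3 and rows 4-6 (the cut point is arbitrary)
top = df.iloc[:4]
bottom = df.iloc[4:]

# Split by columns: the name column vs. the numeric columns (grouping is arbitrary)
names = df[['Name']]
numbers = df[['Age', 'Weight', 'Salary']]

print(top)
print(numbers)
```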

Let's see how to split a Pandas DataFrame by column values in Python.

First, let's create a DataFrame.

# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns = ['Name', 'Age',
                             'Weight', 'Salary'])
 
# show the dataframe
print(df)

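Output:

             Name  Age  Weight   Salary
0       M.S.Dhoni   36      75  5428000
1  A.B.D Villiers   38      74  3428000
2         V.Kholi   31      70  8428000
3         S.Smith   34      80  4428000
4         C.Gayle   40     100  4528000
5          J.Root   33      72  7028000
6      K.Peterson   42      85  2528000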

Method 1: Using a Boolean mask

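Indexing a DataFrame with a Boolean mask (a Series of True/False values, one per row) returns only the rows for which the mask is True.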
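Printing the mask shows what df[...] actually receives. A minimal sketch on the same player data; note that the selected rows keep their original index labels rather than being renumbered.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# Comparing a column with a scalar gives a Boolean Series, one value per row
mask = df['Age'] >= 37
print(mask.tolist())          # [False, True, False, False, True, False, True]

# Indexing with the mask keeps only the rows where it is True
older = df[mask]
print(older.index.tolist())   # [1, 4, 6]
```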

Example 1:

# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns = ['Name', 'Age',
                             'Weight', 'Salary'])
 
# splitting the dataframe into 2 parts
# on basis of 'Age' column values
# using Relational operator
df1 = df[df['Age'] >= 37]
 
# printing df1
print(df1)

Output:

             Name  Age  Weight   Salary
1  A.B.D Villiers   38      74  3428000
4         C.Gayle   40     100  4528000
6      K.Peterson   42      85  2528000

df2 = df[df['Age'] < 37]
 
# printing df2
print(df2)

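Output:

        Name  Age  Weight   Salary
0  M.S.Dhoni   36      75  5428000
2    V.Kholi   31      70  8428000
3    S.Smith   34      80  4428000
5     J.Root   33      72  7028000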

In the example above, the DataFrame 'df' is split into two parts, 'df1' and 'df2', based on the values of the 'Age' column.
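Conditions on more than one column can be combined with & (and), | (or) and ~ (not). Each comparison needs its own parentheses, because & and | bind more tightly than >= and <. A minimal sketch with thresholds chosen only for illustration; DataFrame.query() expresses the same filter as a string.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# Parentheses are required: & binds more tightly than >= and <
mask = (df['Age'] >= 35) & (df['Weight'] < 80)

df1 = df[mask]    # rows meeting both conditions
df2 = df[~mask]   # every other row

# The same filter written as a query string
same_as_df1 = df.query('Age >= 35 and Weight < 80')

print(df1)
print(df2)
```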

Example 2:

# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns = ['Name', 'Age',
                             'Weight', 'Salary'])
 
# splitting the dataframe into 2 parts
# on basis of 'Weight' column values
mask = df['Weight'] >= 80
 
df1 = df[mask]
 
# invert the boolean values
df2 = df[~mask]
 
# printing df1
print(df1)

Output:

         Name  Age  Weight   Salary
3     S.Smith   34      80  4428000
4     C.Gayle   40     100  4528000
6  K.Peterson   42      85  2528000

# printing df2
print(df2)

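Output:

             Name  Age  Weight   Salary
0       M.S.Dhoni   36      75  5428000
1  A.B.D Villiers   38      74  3428000
2         V.Kholi   31      70  8428000
5          J.Root   33      72  7028000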

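In the example above, the DataFrame 'df' is split into two parts, 'df1' and 'df2', based on the values of the 'Weight' column. Because ~mask inverts the mask, 'df2' contains exactly the rows that 'df1' leaves out.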
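The choice between df[~mask] and an explicit opposite condition matters when the column has missing values. A comparison with NaN evaluates to False, so a row with a missing weight fails both 'Weight >= 80' and 'Weight < 80', while ~mask flips that False to True and sends the row to the second part. A small sketch with hypothetical data containing one missing weight:

```python
import numpy as np
import pandas as pd

# Hypothetical data: player 'B' has no recorded weight
df = pd.DataFrame({'Name': ['A', 'B', 'C'],
                   'Weight': [85, np.nan, 70]})

mask = df['Weight'] >= 80

heavy = df[mask]                        # row A only
rest_inverted = df[~mask]               # rows B and C: NaN >= 80 is False, ~ makes it True
rest_explicit = df[df['Weight'] < 80]   # row C only: NaN < 80 is also False

print(rest_inverted)
print(rest_explicit)
```

If rows with missing values should be handled separately, df['Weight'].isna() selects them directly.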

Method 2: Using DataFrame.groupby()

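groupby() splits the rows into groups according to a key. When the key is a Boolean Series, there are at most two groups: one for False and one for True.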
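groupby() is not limited to a True/False key, so the same approach can split a DataFrame into more than two parts. A minimal sketch, with age bands chosen only for illustration: pd.cut labels each row with a band, and a dict comprehension collects one DataFrame per band.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# Label each row with an age band; the bins are (30, 35], (35, 40], (40, 45]
bands = pd.cut(df['Age'], bins=[30, 35, 40, 45],
               labels=['31-35', '36-40', '41-45'])

# One DataFrame per band, keyed by the band label;
# observed=True skips bands that contain no rows
parts = {label: group for label, group in df.groupby(bands, observed=True)}

for label, part in parts.items():
    print(label, part['Name'].tolist())
```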

Example:

# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data-set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list,
                  columns = ['Name', 'Age',
                             'Weight', 'Salary'])
 
# splitting the dataframe into 2 parts
# on basis of 'Salary' column values
# using dataframe.groupby() function
df1, df2 = [x for _, x in df.groupby(df['Salary'] < 4528000)]
 
# printing df1
print(df1)

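Output:

        Name  Age  Weight   Salary
0  M.S.Dhoni   36      75  5428000
2    V.Kholi   31      70  8428000
4    C.Gayle   40     100  4528000
5     J.Root   33      72  7028000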

# printing df2
print(df2)

Output:

             Name  Age  Weight   Salary
1  A.B.D Villiers   38      74  3428000
3         S.Smith   34      80  4428000
6      K.Peterson   42      85  2528000

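In the example above, the DataFrame 'df' is split into two parts, 'df1' and 'df2', based on the values of the 'Salary' column. Because groupby() sorts its keys and False sorts before True, 'df1' holds the rows where Salary < 4528000 is False (salaries of 4528000 or more) and 'df2' holds the rows where it is True.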
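Unpacking the groups into exactly two names by position assumes that both groups exist and arrive in sorted key order; if every salary fell on one side of the threshold, the unpacking would raise a ValueError. A minimal sketch that looks the groups up by key instead, with an empty DataFrame (df.iloc[0:0]) as a fallback chosen here for illustration:

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villiers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

is_low = df['Salary'] < 4528000

# Map each key (False / True) to its sub-DataFrame
groups = {key: group for key, group in df.groupby(is_low)}

# Look groups up by key; fall back to an empty frame with the same columns
low_salary = groups.get(True, df.iloc[0:0])
high_salary = groups.get(False, df.iloc[0:0])

print(low_salary)
print(high_salary)
```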
