Converting a Pandas Crosstab to a Stacked DataFrame

In this article, we discuss how to convert a pandas crosstab into a stacked DataFrame.

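A stacked DataFrame has a multi-level row index with one or more new innermost levels compared to the original DataFrame, because stacking moves column levels into the index. If the columns have only a single level, the result is a Series.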
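The single-level case can be checked with a minimal sketch. The frame below uses made-up counts (the names `df` and `stacked` are illustrative, not from the article's examples); it only shows that stacking a frame with one column level yields a Series whose index has gained an inner level.

```python
import pandas as pd

# Hypothetical counts with a single level of columns.
df = pd.DataFrame(
    {"petrol": [3, 4], "diesel": [3, 1]},
    index=pd.Index(["benz", "bmw"], name="car_brand"),
)

# Move the only column level into the row index.
stacked = df.stack()

# The result is a Series indexed by (car_brand, former column label).
print(type(stacked).__name__)
print(stacked.index.nlevels)
print(stacked.loc[("bmw", "petrol")])
```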

The pandas crosstab function builds a frequency table: it cross-tabulates two or more variables and counts how often each combination of values occurs.

Syntax:

pandas.crosstab(index, columns, rownames=None, colnames=None)

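Parameters:

index – an array-like, a Series, or a list of arrays/Series. These values are used to group the rows.
columns – an array-like, a Series, or a list of arrays/Series. These values are used to group the columns.
rownames – a sequence of names; its length must match the number of row arrays passed.
colnames – a sequence of names; its length must match the number of column arrays passed.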
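As a quick illustration of how these parameters fit together, here is a small sketch with one row array and one column array. The arrays `brand` and `fuel` are made-up sample data, not the arrays used in the example that follows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one entry per car.
brand = np.array(["bmw", "bmw", "benz", "benz", "bmw"], dtype=object)
fuel = np.array(["petrol", "diesel", "petrol", "petrol", "petrol"], dtype=object)

# One row array and one column array, so rownames and colnames
# each take exactly one name.
table = pd.crosstab(index=brand, columns=fuel,
                    rownames=["car_brand"], colnames=["fuel_type"])

print(table)
```

Each cell holds the number of entries with that brand and fuel combination; combinations that never occur are counted as 0.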

Example:

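In this example, we create three sample arrays: car_brand, version, and fuel_type. We then pass car_brand as the index and version and fuel_type as the columns of the crosstab function, together with matching row and column names.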

Finally, the crosstab DataFrame can be visualized with the DataFrame's plot.bar() method.

# import the numpy and pandas package
import numpy as np
import pandas as pd
 
# create three separate arrays namely car_brand,
# version, fuel_type as shown
car_brand = np.array(["bmw", "bmw", "bmw", "bmw", "benz", "benz",
                      "bmw", "bmw", "benz", "benz", "benz", "benz",
                      "bmw", "bmw", "bmw", "benz", "benz", ],
                     dtype=object)
 
version = np.array(["one", "one", "one", "two", "one", "one", "one",
                    "two", "one", "one", "one", "two", "two", "two",
                    "one", "two", "one"], dtype=object)
 
fuel_type = np.array(["petrol", "petrol", "petrol", "diesel", "diesel",
                      "petrol", "diesel", "diesel", "diesel", "petrol",
                      "petrol", "diesel", "petrol", "petrol", "petrol",
                      "diesel", "diesel", ],
                     dtype=object)
 
# use pandas crosstab and pass the three arrays
# as index and columns to create a crosstab table.
cross_tab_data = pd.crosstab(index=car_brand,
                             columns=[version, fuel_type],
                             rownames=['car_brand'],
                             colnames=['version', 'fuel_type'])
 
print(cross_tab_data)
 
barplot = cross_tab_data.plot.bar()

Output:

[Figure: the printed crosstab and its bar chart]

Converting the crosstab to a stacked DataFrame

Here we specify which level (or levels) to stack. The stack() method moves the chosen column level of the DataFrame into the row index.

Syntax:

pandas.DataFrame.stack(level, dropna)

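Parameters:

level – the level or levels to move from the column axis to the index axis of the resulting DataFrame; an integer position, a level name, or a list of these. Defaults to -1, the innermost column level.
dropna – a bool. Whether to drop rows in the resulting DataFrame/Series that contain missing values.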
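The level parameter can be given by position or by name. The sketch below, on a small made-up crosstab (the arrays `brand`, `version`, and `fuel` are illustrative only), suggests that both spellings produce the same result when the column levels are named.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in which every version/fuel pair occurs.
brand = np.array(["bmw", "bmw", "benz", "benz", "bmw", "benz"], dtype=object)
version = np.array(["one", "two", "one", "two", "one", "one"], dtype=object)
fuel = np.array(["petrol", "diesel", "diesel", "petrol", "diesel", "petrol"],
                dtype=object)

table = pd.crosstab(index=brand, columns=[version, fuel],
                    rownames=["car_brand"],
                    colnames=["version", "fuel_type"])

# Column level 1 is "fuel_type", so these two calls select the same level.
by_position = table.stack(level=1)
by_name = table.stack(level="fuel_type")

print(by_position.equals(by_name))
print(by_position.index.names)
print(by_position.columns.name)
```

Referring to the level by name tends to be more robust if the order of the column arrays changes later.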

Example 1:

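Here we convert the crosstab to a stacked DataFrame. The fuel_type level (column level 1) is moved out of the columns and becomes an inner level of the row index, while version remains as the columns.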

# import the numpy and pandas package
import numpy as np
import pandas as pd
 
# create three separate arrays namely car_brand,
# version, fuel_type as shown
car_brand = np.array(["bmw", "bmw", "bmw", "bmw", "benz", "benz",
                      "bmw", "bmw", "benz", "benz", "benz", "benz",
                      "bmw", "bmw", "bmw", "benz", "benz", ],
                     dtype=object)
 
version = np.array(["one", "one", "one", "two", "one", "one", "one",
                    "two", "one", "one", "one", "two", "two", "two",
                    "one", "two", "one"], dtype=object)
 
fuel_type = np.array(["petrol", "petrol", "petrol", "diesel", "diesel",
                      "petrol", "diesel", "diesel", "diesel", "petrol",
                      "petrol", "diesel", "petrol", "petrol", "petrol",
                      "diesel", "diesel", ],
                     dtype=object)
 
# use pandas crosstab and pass the three
# arrays as index and columns
# to create a crosstab table.
cross_tab_data = pd.crosstab(index=car_brand,
                             columns=[version, fuel_type],
                             rownames=['car_brand'],
                             colnames=['version', 'fuel_type'])
 
barplot = cross_tab_data.plot.bar()
 
# use the created sample crosstab data
# to convert it to a stacked dataframe
stacked_data = cross_tab_data.stack(level=1)
 
print(stacked_data)

Output:

[Figure: the printed stacked DataFrame]
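If a flat table is more convenient than a multi-level index, one possible follow-up is to stack every column level and then reset the index. This sketch reuses the arrays from Example 1; the names `long_counts` and `flat` and the column label "count" are choices made here for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

car_brand = np.array(["bmw", "bmw", "bmw", "bmw", "benz", "benz",
                      "bmw", "bmw", "benz", "benz", "benz", "benz",
                      "bmw", "bmw", "bmw", "benz", "benz"], dtype=object)
version = np.array(["one", "one", "one", "two", "one", "one", "one",
                    "two", "one", "one", "one", "two", "two", "two",
                    "one", "two", "one"], dtype=object)
fuel_type = np.array(["petrol", "petrol", "petrol", "diesel", "diesel",
                      "petrol", "diesel", "diesel", "diesel", "petrol",
                      "petrol", "diesel", "petrol", "petrol", "petrol",
                      "diesel", "diesel"], dtype=object)

cross_tab_data = pd.crosstab(index=car_brand,
                             columns=[version, fuel_type],
                             rownames=['car_brand'],
                             colnames=['version', 'fuel_type'])

# Stack both column levels: the result is a Series of counts
# indexed by (car_brand, version, fuel_type).
long_counts = cross_tab_data.stack(level=['version', 'fuel_type'])

# Turn the index levels into ordinary columns; "count" is an
# arbitrary label chosen for the values.
flat = long_counts.reset_index(name='count')

print(flat)
```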

Example 2:

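In this example, we add a third column level, year_release, and show the result of stacking at level 1 (fuel_type) and at level 2 (year_release).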

# import the numpy and pandas package
import numpy as np
import pandas as pd
 
# create three separate arrays namely car_brand,
# version, fuel_type as shown
car_brand = np.array(["bmw", "bmw", "bmw", "bmw", "benz",
                      "benz", "bmw", "bmw", "benz", "benz",
                      "benz", "benz", "bmw", "bmw", "bmw",
                      "benz", "benz", ], dtype=object)
 
version = np.array(["one", "one", "one", "two", "one", "one",
                    "one", "two", "one", "one", "one", "two",
                    "two", "two", "one", "two", "one"],
                   dtype=object)
 
fuel_type = np.array(["petrol", "petrol", "petrol", "diesel",
                      "diesel", "petrol", "diesel", "diesel",
                      "diesel", "petrol", "petrol", "diesel",
                      "petrol", "petrol", "petrol", "diesel",
                      "diesel", ], dtype=object)
 
year_release = np.array([2000, 2005, 2000, 2007, 2000, 2005,
                         2007, 2005, 2005, 2000, 2007, 2000,
                         2007, 2005, 2005, 2007, 2000],
                        dtype=object)
 
# use pandas crosstab and pass car_brand as the index and the
# other three arrays as columns to create a crosstab table.
cross_tab_data = pd.crosstab(index=car_brand,
                             columns=[version, fuel_type, year_release],
                             rownames=['car_brand'],
                             colnames=['version', 'fuel_type', 'year_release'])
 
barplot = cross_tab_data.plot.bar()
 
# use the created sample crosstab data to
# convert it to a stacked dataframe with
# level 1
stacked_data = cross_tab_data.stack(level=1)
 
 
barplot = stacked_data.plot.bar()
 
# use the created sample crosstab data to
# convert it to a stacked dataframe with
# level 2
stacked_data = cross_tab_data.stack(level=2)
 
barplot = stacked_data.plot.bar()

Output:

[Figure: bar charts of the crosstab, of the data stacked at level 1, and of the data stacked at level 2]