Pandas: How to Flatten Hierarchical Index Columns

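This article shows how to flatten hierarchical index columns in Pandas. A hierarchical index, also called a MultiIndex, has more than one index level on a single axis and is common in data analysis. In the example below, the column labels have two levels: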

           apples       oranges
           sale  price    sale  price
city                                
New York     10     1.0      15    0.5
Los Angeles  20     0.9      25    0.6

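The first column level is the fruit type (apples, oranges), and the second column level is the metric (sale and price). city is the row index, not a column level. Sometimes you need to expand or restructure this hierarchy and turn the multi-level index into a single-level one, which makes the data easier to process and analyze.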
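A common way to flatten the column levels of the table above directly, separate from the two methods covered below, is to join each column's level labels into a single string. The following is a minimal sketch that rebuilds the first two rows of the example with pd.MultiIndex.from_product and assumes '_' as the separator; to_flat_index() turns each column label into a tuple such as ('apples', 'sale').

```python
import pandas as pd

# Rebuild the first two rows of the example: column level 0 is the fruit,
# column level 1 is the metric (sale / price), and "city" is the row index
columns = pd.MultiIndex.from_product([['apples', 'oranges'], ['sale', 'price']])
df = pd.DataFrame(
    [[10, 1.0, 15, 0.5],
     [20, 0.9, 25, 0.6]],
    index=pd.Index(['New York', 'Los Angeles'], name='city'),
    columns=columns,
)

# to_flat_index() yields one tuple per column, e.g. ('apples', 'sale');
# joining each tuple with '_' gives a single-level label such as 'apples_sale'
df.columns = ['_'.join(col) for col in df.columns.to_flat_index()]
print(df)
```

The columns become apples_sale, apples_price, oranges_sale and oranges_price, the same flat layout used as input in the examples that follow.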


Using the reset_index method

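Pandas provides the reset_index method, which moves one or more levels of the row index into ordinary columns and replaces them with a default integer index. It works on the row index, not on column levels, so it is the right tool when the structure to flatten (a single index such as city, or a row MultiIndex) sits on the rows. Example code: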

import pandas as pd

data = {'city': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
        'apples_sale': [10, 20, 30, 15],
        'apples_price': [1.0, 0.9, 0.8, 0.7],
        'oranges_sale': [15, 25, 35, 20],
        'oranges_price': [0.5, 0.6, 0.7, 0.8]}

df = pd.DataFrame(data)
df.set_index('city', inplace=True)

print(df)

# Move the city row index back into an ordinary column
df_flat = df.reset_index()
print(df_flat)

Output:

             apples_sale  apples_price  oranges_sale  oranges_price
city                                                                
New York              10           1.0            15            0.5
Los Angeles           20           0.9            25            0.6
Chicago               30           0.8            35            0.7
Houston               15           0.7            20            0.8

          city  apples_sale  apples_price  oranges_sale  oranges_price
0     New York           10           1.0            15            0.5
1  Los Angeles           20           0.9            25            0.6
2      Chicago           30           0.8            35            0.7
3      Houston           15           0.7            20            0.8

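reset_index turns the city row index back into an ordinary column, and the DataFrame gets a default integer index (0 to 3). Note that the example first makes city the index with set_index and then reverses this with reset_index. The data columns (apples_sale and so on) are single-level throughout: reset_index only acts on the row index and does not flatten column levels.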
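Because reset_index works on the row index, it also handles a row MultiIndex. The sketch below uses a hypothetical long-format arrangement of the article's figures, indexed by (city, fruit), to show that reset_index() with no arguments moves every level into a column, while the level parameter moves only the named level.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, fruit) pair
long_df = pd.DataFrame({
    'city': ['New York', 'New York', 'Los Angeles', 'Los Angeles'],
    'fruit': ['apples', 'oranges', 'apples', 'oranges'],
    'sale': [10, 15, 20, 25],
    'price': [1.0, 0.5, 0.9, 0.6],
})

# Build a two-level row index (city, fruit)
indexed = long_df.set_index(['city', 'fruit'])

# With no arguments, every row-index level becomes a column
all_levels = indexed.reset_index()

# level= moves only the named level; 'city' remains the row index
fruit_only = indexed.reset_index(level='fruit')

print(all_levels)
print(fruit_only)
```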

Using the stack method

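When the hierarchy sits in the columns, for example when apples and oranges each have sale and price columns, use the stack method. stack moves one level of the column index into the row index, where it becomes the innermost row level, so the columns lose one level; with two-level columns, stacking one level leaves single-level columns. Example code: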

import pandas as pd

data = {'city': ['New York', 'Los Angeles', 'Chicago', 'Houston'],
        'apples_sale': [10, 20, 30, 15],
        'apples_price': [1.0, 0.9, 0.8, 0.7],
        'oranges_sale': [15, 25, 35, 20],
        'oranges_price': [0.5, 0.6, 0.7, 0.8]}

df = pd.DataFrame(data)
df.set_index('city', inplace=True)

print(df)

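# Split labels such as "apples_sale" into a two-level column MultiIndex:
# level 0 = fruit (named "fruit"), level 1 = metric (sale / price)
df.columns = df.columns.str.split('_', expand=True)
df.columns.names = ['fruit', None]

# stack(level=0) moves the fruit level from the columns into the row index,
# leaving sale and price as single-level columns; selecting them explicitly
# fixes their order, which may differ between pandas versions
df_flat = df.stack(level=0)[['sale', 'price']]

# Move both row-index levels (city, fruit) into ordinary columns
df_flat = df_flat.reset_index()
print(df_flat)

Output:

             apples_sale  apples_price  oranges_sale  oranges_price
city                                                                
New York              10           1.0            15            0.5
Los Angeles           20           0.9            25            0.6
Chicago               30           0.8            35            0.7
Houston               15           0.7            20            0.8

          city    fruit  sale  price
0     New York   apples    10    1.0
1     New York  oranges    15    0.5
2  Los Angeles   apples    20    0.9
3  Los Angeles  oranges    25    0.6
4      Chicago   apples    30    0.8
5      Chicago  oranges    35    0.7
6      Houston   apples    15    0.7
7      Houston  oranges    20    0.8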

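stack moves the chosen column level into the row index as its innermost level, so the columns that remain are single-level; level=0 selects the outer column level (the fruit) instead of the default innermost one. The result has a multi-level row index, and reset_index then turns those row levels into ordinary columns. reset_index names each new column after its index level; an unnamed level in a MultiIndex becomes a column called level_1 (or level_0, and so on), which you can then rename to something meaningful with rename.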
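The choice of level matters. The sketch below rebuilds two rows of the example with named column levels fruit and metric (the name metric is an assumption for illustration) and compares the default stack(), which stacks the innermost column level (level=-1), with stack(level=0).

```python
import pandas as pd

# Two rows of the example with named column levels: fruit, then metric
columns = pd.MultiIndex.from_product(
    [['apples', 'oranges'], ['sale', 'price']], names=['fruit', 'metric'])
df = pd.DataFrame(
    [[10, 1.0, 15, 0.5],
     [20, 0.9, 25, 0.6]],
    index=pd.Index(['New York', 'Los Angeles'], name='city'),
    columns=columns,
)

# Default (level=-1) stacks the innermost column level (metric):
# rows become (city, metric) and the remaining columns are the fruits
by_metric = df.stack()

# level=0 stacks the outer level (fruit) instead:
# rows become (city, fruit) and the remaining columns are sale and price
by_fruit = df.stack(level=0)

print(by_metric)
print(by_fruit)
```

Only stack(level=0) yields one row per (city, fruit) with sale and price as columns, which is the layout the example above needs before calling reset_index.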

Summary

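This article showed how to flatten hierarchical index structures in Pandas. reset_index moves row-index levels (a single index such as city, or every level of a row MultiIndex) into ordinary columns. stack moves a column level into the row index, leaving single-level columns, after which reset_index turns the resulting multi-level row index into ordinary columns. Together these methods reshape hierarchical data into a flat table that is easier to filter, group, and analyze.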