Reindexing in Pandas DataFrame
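Reindexing in Pandas conforms a DataFrame to a new set of row or column labels. Labels that already exist keep their data, in the new order. New labels are inserted with missing values, and labels left out of the new index are dropped. An index is the label structure that pandas Series and DataFrames use to refer to their data, and the same index can be shared by several of these objects. Let's see how to reindex the rows and columns of a Pandas DataFrame.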
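The difference between reindexing and simply relabelling is easiest to see on a tiny, deterministic DataFrame. The sketch below uses made-up data (a single column named x). It shows that reindex() looks rows up by label and returns a new object, whereas assigning to .index only renames positions and leaves the data where it was.

```python
import pandas as pd

# Hypothetical 3-row DataFrame used only for illustration
df = pd.DataFrame({'x': [10, 20, 30]}, index=['A', 'B', 'C'])

# reindex() looks rows up by label: the value stored under 'C' moves to the top
by_label = df.reindex(['C', 'B', 'A'])

# Assigning to .index instead just renames positions; the data does not move
renamed = df.copy()
renamed.index = ['C', 'B', 'A']

print(by_label['x'].tolist())  # [30, 20, 10]
print(renamed['x'].tolist())   # [10, 20, 30]

# reindex() returned a new object, so the original index is unchanged
print(df.index.tolist())       # ['A', 'B', 'C']
```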
Reindexing the rows
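One can reindex a single row or multiple rows by passing the new row labels to the reindex() method. Labels in the new index that are not present in the DataFrame get NaN values by default.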
Example #1:
# import the numpy and pandas modules
import pandas as pd
import numpy as np
column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']
# create a 5x5 DataFrame of random values
df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)
print(df1)
# reorder the existing rows by passing their labels in a new order
print('\n\nDataframe after reindexing rows: \n',
      df1.reindex(['B', 'D', 'A', 'C', 'E']))
Output:
Example #2:
# import the numpy and pandas modules
import pandas as pd
import numpy as np
column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']
# create a 5x5 DataFrame of random values
df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)
# create the new index for rows; 'U' and 'Z' do not exist in df1, so their rows become NaN
new_index = ['U', 'A', 'B', 'C', 'Z']
print(df1.reindex(new_index))
Output:
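Because the examples above use random numbers, their printed output differs on every run. A minimal deterministic sketch, assuming a made-up one-column DataFrame, makes the NaN behaviour easy to check. It also shows a side effect worth knowing: NaN is a float, so an integer column is upcast to float64 when new labels are introduced.

```python
import pandas as pd

# Hypothetical integer data used only for illustration
df = pd.DataFrame({'x': [10, 20, 30]}, index=['A', 'B', 'C'])

# 'U' and 'Z' are not in the index, so their rows are filled with NaN
result = df.reindex(['U', 'A', 'B', 'C', 'Z'])

print(result['x'].isna().tolist())  # [True, False, False, False, True]

# NaN is a float, so the integer column is upcast to float64
print(result['x'].dtype)            # float64
```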
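Reindexing columns using the axis keyword
One can reindex a single column or multiple columns by passing the new column labels to the reindex() method and specifying the axis to reindex, here axis='columns'. Labels in the new index that are not present in the DataFrame get NaN values by default.
Example #1: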
# import the numpy and pandas modules
import pandas as pd
import numpy as np
column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']
# create a 5x5 DataFrame of random values
df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)
# create the new index for columns (moves column 'e' to the front)
new_columns = ['e', 'a', 'b', 'c', 'd']
print(df1.reindex(new_columns, axis='columns'))
Output:
Example #2:
# import the numpy and pandas modules
import pandas as pd
import numpy as np
column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']
# create a 5x5 DataFrame of random values
df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)
# create the new index for columns; 'g' and 'h' are new, so they are filled with NaN
new_columns = ['a', 'b', 'c', 'g', 'h']
print(df1.reindex(new_columns, axis='columns'))
Output:
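Besides axis='columns', reindex() also accepts the new labels through the index= and columns= keywords, which lets both axes be conformed in a single call. The following sketch uses a small hypothetical DataFrame. It checks that the two column forms give the same result, then reindexes rows and columns together.

```python
import pandas as pd

# Hypothetical 2x2 DataFrame used only for illustration
df = pd.DataFrame([[1, 2], [3, 4]], index=['A', 'B'], columns=['a', 'b'])

# Two equivalent ways to reindex the columns ('g' is new, so it becomes NaN)
via_axis = df.reindex(['b', 'a', 'g'], axis='columns')
via_kw = df.reindex(columns=['b', 'a', 'g'])
print(via_axis.equals(via_kw))  # True

# Rows and columns can also be reindexed together in one call
both = df.reindex(index=['B', 'A'], columns=['b', 'g'])
print(both.index.tolist())    # ['B', 'A']
print(both.columns.tolist())  # ['b', 'g']
```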
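Replacing missing values
Code #1: Missing values introduced by reindexing can be filled by passing a value to the fill_value keyword. That value is used instead of NaN for every new label; NaN values that were already present in the DataFrame are left unchanged.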
# import the numpy and pandas modules
import pandas as pd
import numpy as np
column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']
# create a 5x5 DataFrame of random values
df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)
# create the new index for columns; the new columns 'g' and 'h' are filled with 1.5
new_columns = ['a', 'b', 'c', 'g', 'h']
print(df1.reindex(new_columns, axis='columns', fill_value=1.5))
Output:
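A point that is easy to miss is that fill_value only affects the labels added by reindexing. In this sketch the data is hypothetical, with one NaN already present in column b. The new column g is filled with 1.5, while the existing NaN stays as it is.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'b' already holds a NaN before reindexing
df = pd.DataFrame({'a': [1.0, 2.0], 'b': [np.nan, 4.0]}, index=['A', 'B'])

# 'g' is a new column label; fill_value is used only for it
result = df.reindex(['a', 'b', 'g'], axis='columns', fill_value=1.5)

print(result['g'].tolist())  # [1.5, 1.5]
print(result['b'].tolist())  # [nan, 4.0]
```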
Code #2: Replacing missing data with a string.
# import the numpy and pandas modules
import pandas as pd
import numpy as np
column = ['a', 'b', 'c', 'd', 'e']
index = ['A', 'B', 'C', 'D', 'E']
# create a 5x5 DataFrame of random values
df1 = pd.DataFrame(np.random.rand(5, 5),
                   columns=column, index=index)
# create the new index for columns; the new columns 'g' and 'h' are filled with a string
new_columns = ['a', 'b', 'c', 'g', 'h']
print(df1.reindex(new_columns, axis='columns', fill_value='data missing'))
Output:
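Filling with a string puts text into the new columns, so they are no longer numeric. This matters if you later do arithmetic on them. The sketch below uses a small hypothetical DataFrame. It checks is_numeric_dtype rather than a specific dtype name, because the exact dtype pandas reports for string data can differ between versions.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical two-row DataFrame with a single float column
df = pd.DataFrame({'a': [0.1, 0.2]}, index=['A', 'B'])

# 'g' is a new column label, so it is filled with the string
result = df.reindex(['a', 'g'], axis='columns', fill_value='data missing')

print(result['g'].tolist())           # ['data missing', 'data missing']
print(is_numeric_dtype(result['a']))  # True
print(is_numeric_dtype(result['g']))  # False
```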