Counting NaN or Missing Values in a Pandas DataFrame

This article shows how to count NaN or missing values in a Pandas DataFrame using the DataFrame methods isnull() and sum().

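DataFrame.isnull() method

The Pandas isnull() function detects missing values in the given object. It returns a Boolean object of the same size that indicates whether each value is NA. Missing values (such as None or numpy.nan) are mapped to True, and all other values are mapped to False. isnull() is an alias of isna(), so the two can be used interchangeably.

Syntax: DataFrame.isnull()

Parameters: None

Return type: a DataFrame of Boolean values, True for NaN values and False otherwise.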
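
As a small illustration of the mapping described above, the sketch below builds a tiny frame and inspects the Boolean mask. The frame `demo` and its column names are made up for this example and are not part of the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame, used only for illustration.
demo = pd.DataFrame({"x": [1.0, np.nan, 3.0],
                     "y": ["a", None, "c"]})

# isnull() returns a Boolean frame with the same shape and labels as demo.
mask = demo.isnull()
print(mask)

# isna() is the same method under another name, so the masks are identical.
same = mask.equals(demo.isna())
print(same)
```

Both np.nan and None are reported as missing, which is why the second row of the mask is True in both columns.
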

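DataFrame.sum() method

The Pandas sum() function returns the sum of the values over the requested axis. With the index axis (axis=0), it adds up all the values in each column and returns a Series containing one total per column. It can also skip missing values during the calculation. Because True counts as 1 and False as 0 when summed, calling sum() on the result of isnull() gives the number of missing values.

Syntax: DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

Parameters:

  • axis : {index (0), columns (1)}. The axis to sum along.
  • skipna : Exclude NA/null values when computing the result.
  • level : If the axis is a MultiIndex (hierarchical), sum along a particular level, collapsing into a Series. This parameter was deprecated in pandas 1.3 and removed in pandas 2.0.
  • numeric_only : Include only float, int and boolean columns. If None, pandas attempts to use everything and then falls back to numeric data only. Not implemented for Series.
  • min_count : The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result is NA.

Returns: the sum as a Series, or as a DataFrame if level is specified.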
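
The effect of the axis, skipna and min_count parameters can be seen in a minimal sketch like the one below. The frame `scores` is a hypothetical example, and the calls use only parameters that are still available in current pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame, used only for illustration.
scores = pd.DataFrame({"a": [1.0, 2.0, np.nan],
                       "b": [np.nan, np.nan, np.nan]})

col_sums = scores.sum()              # axis=0: one total per column, NaN skipped
row_sums = scores.sum(axis=1)        # axis=1: one total per row
strict = scores.sum(skipna=False)    # a NaN in a column makes its total NaN
needs_one = scores.sum(min_count=1)  # an all-NaN column gives NaN instead of 0

# Summing Booleans counts the True values, which is what makes
# isnull().sum() a missing-value counter.
true_count = pd.Series([True, False, True]).sum()

print(col_sums, row_sums, strict, needs_one, true_count, sep="\n\n")
```

Note that with the default settings an all-NaN column sums to 0, so min_count=1 is useful when an empty total should stay visible as NaN.
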

Let's create a pandas DataFrame.

# import numpy library as np
import numpy as np
  
# import pandas library as pd
import pandas as pd
  
# List of Tuples
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)
            ]
  
# Create a DataFrame object from
# list of tuples with columns
# and indices.
details = pd.DataFrame(students, columns =['Name', 'Age', 
                                           'Place', 'College'],
                        index =['a', 'b', 'c', 'd', 'e', 
                                'f', 'g', 'i', 'j', 'k'])
  
details

Example 1: Count the total number of NaN values in each column of a DataFrame.

# import numpy library as np
import numpy as np
  
# import pandas library as pd
import pandas as pd
  
  
# List of Tuples
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)
            ]
  
# Create a DataFrame object from list of tuples 
# with columns and indices.
details = pd.DataFrame(students, columns =['Name', 'Age',
                                           'Place', 'College'],
                        index =['a', 'b', 'c', 'd', 'e', 
                                'f', 'g', 'i', 'j', 'k'])
  
# show the boolean dataframe            
print(" \nshow the boolean Dataframe : \n\n", details.isnull())
  
# Count total NaN at each column in a DataFrame
print(" \nCount total NaN at each column in a DataFrame : \n\n",
      details.isnull().sum())
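
The per-column count can be cross-checked in a couple of ways. The sketch below is one possible approach on a small hypothetical frame `df` (not the article's data): it compares isnull().sum() with the difference between the row count and count(), and also computes the share of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame({"Name": ["Ankit", None, "Rahul", None],
                   "Age": [22, np.nan, 16, 41]})

# Number of missing values in each column.
nan_per_column = df.isnull().sum()

# count() returns the non-missing values per column, so the
# difference from the number of rows is the same NaN count.
via_count = len(df) - df.count()

# The mean of a Boolean column is the fraction of True values,
# i.e. the share of missing entries.
share_missing = df.isnull().mean()

print(nan_per_column, via_count, share_missing, sep="\n\n")
```

The share of missing values is often more convenient than the raw count when columns are compared across frames of different lengths.
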

Example 2: Count the total number of NaN values in each row of a DataFrame.

# import numpy library as np
import numpy as np
  
# import pandas library as pd
import pandas as pd
  
  
# List of Tuples
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)
            ]
  
# Create a DataFrame object from
# list of tuples with columns
# and indices.
details = pd.DataFrame(students, columns =['Name', 'Age', 
                                           'Place', 'College'],
                        index =['a', 'b', 'c', 'd', 'e',
                                'f', 'g', 'i', 'j', 'k'])
  
# show the boolean dataframe            
print(" \nshow the boolean Dataframe : \n\n", details.isnull())
  
# index attribute of a dataframe
# gives index list 
  
# Count total NaN at each row in a DataFrame
for i in range(len(details.index)) :
    print(" Total NaN in row", i + 1, ":",
          details.iloc[i].isnull().sum())
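
As an alternative to looping over the rows, the same per-row counts can be obtained in one call by summing the Boolean mask along axis=1. The sketch below assumes a small hypothetical frame `df` rather than the article's data, and also shows how the result can be used to select the rows that contain at least one NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame({"Name": ["Ankit", np.nan, np.nan],
                   "Age": [22, 35, np.nan],
                   "Place": ["Up", "Uk", np.nan]},
                  index=["a", "i", "k"])

# axis=1 sums across the columns, giving one NaN count per row,
# labelled with the frame's own index.
nan_per_row = df.isnull().sum(axis=1)
print(nan_per_row)

# The counts double as a filter for rows with any missing value.
rows_with_nan = df[nan_per_row > 0]
print(rows_with_nan)
```

The result keeps the original index labels, so each count can be traced back to its row without relying on positional numbering.
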

Example 3: Count the total number of NaN values in a DataFrame.

# import numpy library as np
import numpy as np
  
# import pandas library as pd
import pandas as pd
  
  
# List of Tuples
# (np.nan is used because the np.NaN alias was removed in NumPy 2.0)
students = [('Ankit', 22, 'Up', 'Geu'),
            ('Ankita', np.nan, 'Delhi', np.nan),
            ('Rahul', 16, 'Tokyo', 'Abes'),
            ('Simran', 41, 'Delhi', 'Gehu'),
            ('Shaurya', np.nan, 'Delhi', 'Geu'),
            ('Shivangi', 35, 'Mumbai', np.nan),
            ('Swapnil', 35, np.nan, 'Geu'),
            (np.nan, 35, 'Uk', 'Geu'),
            ('Jeet', 35, 'Guj', 'Gehu'),
            (np.nan, np.nan, np.nan, np.nan)
            ]
  
# Create a DataFrame object from
# list of tuples with columns
# and indices.
details = pd.DataFrame(students, columns =['Name', 'Age', 
                                           'Place', 'College'],
                        index =['a', 'b', 'c', 'd', 'e',
                                'f', 'g', 'i', 'j', 'k'])
  
# show the boolean dataframe            
print(" \nshow the boolean Dataframe : \n\n", details.isnull())
  
# Count total NaN in a DataFrame
print(" \nCount total NaN in a DataFrame : \n\n",
       details.isnull().sum().sum())
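
Chaining sum() twice first counts the NaN values per column and then adds those counts together. The sketch below, on a hypothetical frame `df`, shows that step alongside two equivalent ways of reaching the same total, plus a quick check for whether anything is missing at all.

```python
import numpy as np
import pandas as pd

# Hypothetical frame, used only for illustration.
df = pd.DataFrame({"Age": [22, np.nan, np.nan],
                   "Place": ["Up", None, "Delhi"]})

# First sum(): NaN count per column. Second sum(): grand total.
total = df.isnull().sum().sum()

# Same total from the underlying Boolean array in a single reduction.
total_np = df.isnull().to_numpy().sum()

# Same total as "all cells minus non-missing cells".
total_count = df.size - df.count().sum()

# True if the frame contains at least one missing value.
any_missing = bool(df.isnull().to_numpy().any())

print(total, total_np, total_count, any_missing)
```

All three totals agree; which form to use is mostly a matter of readability.
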
