Cleaning the String Data in a Given Pandas DataFrame

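Data analysis is now used by companies of every kind. While working with data we can run into all sorts of problems, and some of them call for creative approaches. Most real-world data contains the names of entities or other nouns, and these names are often not consistently formatted. In this article, we discuss ways to clean such data.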

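Suppose we are working with data from an e-commerce website, and the product names are not properly formatted. The task is to format the data so that the names have no leading or trailing whitespace and every product name starts with a capital letter.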
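Before touching a DataFrame, the target format can be illustrated on plain Python strings. The sketch below is only an illustration: the helper name `clean_name` is hypothetical and does not come from the article. Note that `str.capitalize()` also lower-cases every character after the first, which is what normalizes a name such as ' UMbreLla'.

```python
# Hypothetical helper, used only to illustrate the target format.
def clean_name(name: str) -> str:
    # strip() removes leading and trailing whitespace;
    # capitalize() upper-cases the first character and lower-cases the rest.
    return name.strip().capitalize()

# The same product names used in the article's DataFrame.
samples = [' UMbreLla', '  maTress', 'BaDmintoN ', 'Shuttle']
cleaned = [clean_name(s) for s in samples]
print(cleaned)  # ['Umbrella', 'Matress', 'Badminton', 'Shuttle']
```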

Solution #1: Often we need to write a custom function tailored to the task at hand.

# importing pandas as pd
import pandas as pd
  
# Create the dataframe
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Product':[' UMbreLla', '  maTress', 'BaDmintoN ', 'Shuttle'],
                   'Updated_Price':[1250, 1450, 1550, 400],
                   'Discount':[10, 8, 15, 10]})
  
# Print the dataframe
print(df)

Output:

[Image: the printed DataFrame, with the Product names still unformatted]

Now we will write our own custom function to solve this problem.

def Format_data(df):
    # iterate over all the rows
    for i in range(df.shape[0]):
  
        # reassign the cleaned value to the Product column (column position 1)
        # strip() removes leading and trailing whitespace
        # capitalize() upper-cases the first letter and lower-cases the rest
        df.iat[i, 1] = df.iat[i, 1].strip().capitalize()
  
# Let's call the function
Format_data(df)
  
# Print the Dataframe
print(df)

Output:

[Image: the printed DataFrame, with cleaned Product names]
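The custom function in Solution #1 addresses the Product column by its position (1), so it would break if the column order changed. A possible variant, sketched here under the assumption that column labels are unique, looks up the position by name with `df.columns.get_loc()`. The function name `format_column` is hypothetical and not part of the article.

```python
import pandas as pd

def format_column(df, column):
    # Hypothetical variant of Format_data: find the column position by label
    # (assumes the label is unique, so get_loc returns a single integer).
    col_idx = df.columns.get_loc(column)

    # iterate over all the rows and clean each value in place
    for i in range(df.shape[0]):
        df.iat[i, col_idx] = df.iat[i, col_idx].strip().capitalize()

df = pd.DataFrame({'Date': ['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Product': [' UMbreLla', '  maTress', 'BaDmintoN ', 'Shuttle']})

format_column(df, 'Product')
print(df)
```

Passing the column name as an argument also lets the same function clean other string columns.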

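Solution #2: Now we will look at a more concise and efficient approach, using the Pandas apply() function. Because it is called on a single column here, this is Series.apply() rather than DataFrame.apply().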

# importing pandas as pd
import pandas as pd
  
# Create the dataframe
df = pd.DataFrame({'Date':['10/2/2011', '11/2/2011', '12/2/2011', '13/2/2011'],
                   'Product':[' UMbreLla', '  maTress', 'BaDmintoN ', 'Shuttle'],
                   'Updated_Price':[1250, 1450, 1550, 400],
                   'Discount':[10, 8, 15, 10]})
  
# Print the dataframe
print(df)

Output:

[Image: the printed DataFrame, with the Product names still unformatted]

Let's use apply() on the Product column to put the product names into the correct format. We pass apply() a lambda function that strips and capitalizes each value.

# Apply the cleaning lambda to every value in the Product column
df['Product'] = df['Product'].apply(lambda x: x.strip().capitalize())
  
# Print the Dataframe
print(df)

Output:

[Image: the printed DataFrame, with cleaned Product names]
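As an alternative not covered in the article, the same cleanup can be written with the `.str` accessor that pandas provides on string columns. The sketch below assumes a small example frame with one missing value added for illustration. The lambda in Solution #2 would raise an error on a missing value, while the `.str` methods leave it as missing.

```python
import pandas as pd

# Example frame; the None entry is an assumption added to show missing-value handling.
df = pd.DataFrame({'Product': [' UMbreLla', '  maTress', 'BaDmintoN ', None]})

# Series.str.strip() and Series.str.capitalize() operate on the whole column
# and skip missing values instead of raising an error.
df['Product'] = df['Product'].str.strip().str.capitalize()

print(df)
```

Chaining the `.str` methods keeps the intent readable and avoids writing a lambda for simple string operations.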