How to Read a CSV File into a DataFrame with a Custom Delimiter in Pandas

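Python is well suited to data analysis, largely because of its rich ecosystem of data-centric packages. pandas is one of those packages, and it makes importing and analyzing data straightforward.
This article shows how to load a CSV file into a DataFrame with the pandas.read_csv() function. The pandas library must be imported before the function can be used.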

Syntax:

pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal='.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, doublequote=True, delim_whitespace=False, low_memory=True, memory_map=False, float_precision=None)

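The signature above reflects an older pandas release; several of its parameters (for example squeeze, prefix, error_bad_lines and warn_bad_lines) were removed in pandas 2.0. Some of the most useful parameters are listed below.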

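Parameter	Description
filepath_or_buffer	Path or URL of the file to read.
sep	The delimiter; defaults to ',' (comma), as in comma-separated values.
index_col	Column(s) to use as the row index instead of the default 0, 1, 2, 3, ...
header	Row number(s) [int or list of int] to use as the column names.
usecols	Builds the DataFrame from only the given columns [list of column names].
squeeze	If True and the data has only one column, returns a pandas Series.
skiprows	Rows to skip at the start of the file, so they do not appear in the DataFrame.
skipfooter	Number of rows to skip at the bottom of the file.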
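As a rough illustration of how several of the parameters in the table combine, the sketch below reads a small in-memory CSV through io.StringIO. The sample data and column names are invented for this example and do not come from the article's files. skipfooter is only supported by the Python engine, which is why engine='python' is passed.

```python
import io
import pandas as pd

# Hypothetical sample data: one junk line on top, one summary line at the bottom.
raw = "\n".join([
    "exported by some tool",
    "id,name,score,city",
    "1,Ann,90,Oslo",
    "2,Bob,85,Rome",
    "3,Cid,70,Lima",
    "total,3,245,-",
])

df = pd.read_csv(
    io.StringIO(raw),
    skiprows=1,                        # skip the leading junk line
    skipfooter=1,                      # skip the trailing summary line
    header=0,                          # first remaining row holds the column names
    usecols=["id", "name", "score"],   # keep only these columns
    index_col="id",                    # use the 'id' column as the row index
    engine="python",                   # skipfooter requires the Python engine
)
print(df)
```

The 'city' column is dropped by usecols, and the 'id' column becomes the index rather than a regular column.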

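read_csv() uses a comma (',') as the default delimiter, but a custom delimiter or a regular expression can be used instead.
Example 1: Using read_csv() with the default delimiter, a comma (,).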

# Importing pandas library
import pandas as pd
 
# Load the data of example1.csv
# into a DataFrame df
df = pd.read_csv('example1.csv')

# Print the DataFrame
print(df)

Output:

[Image: the resulting DataFrame]
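Since example1.csv is not reproduced in the text, here is a minimal self-contained sketch of the same call. The CSV content is a made-up stand-in, fed to read_csv() through io.StringIO instead of a file path.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of example1.csv
csv_text = "name,age,city\nAnn,28,Oslo\nBob,35,Rome\n"

# No sep argument: read_csv splits on commas by default
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df.dtypes)
```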

Example 2: Using read_csv() with '_' as a custom delimiter.

# Importing pandas library
import pandas as pd
 
# Load the data of example2.csv
# with '_' as custom delimiter
# into a DataFrame df
# (engine='python' is optional for a
# single-character delimiter)
df = pd.read_csv('example2.csv',
                 sep='_',
                 engine='python')

# Print the DataFrame
print(df)

Output:

[Image: the resulting DataFrame]

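Note: The default C engine only supports single-character delimiters (and '\s+'). When the delimiter is longer than one character, pandas interprets it as a regular expression, falls back to the Python engine, and emits a ParserWarning. Specifying engine='python' avoids that warning. For a single-character delimiter such as '_' or a tab, engine='python' is optional.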
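The engine behaviour described in the note can be checked with a short sketch. The helper name read_and_collect and the sample strings are assumptions made for this illustration; the sketch records any ParserWarning raised while reading.

```python
import io
import warnings
import pandas as pd
from pandas.errors import ParserWarning

def read_and_collect(text, **kwargs):
    """Read text with read_csv; return the DataFrame and any ParserWarning messages."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        df = pd.read_csv(io.StringIO(text), **kwargs)
    msgs = [str(w.message) for w in caught
            if issubclass(w.category, ParserWarning)]
    return df, msgs

single = "a_b_c\n1_2_3\n"      # single-character delimiter
double = "a__b__c\n1__2__3\n"  # two-character delimiter

_, w_single = read_and_collect(single, sep="_")                    # C engine is enough
_, w_fallback = read_and_collect(double, sep="__")                 # falls back to Python engine
df_ok, w_explicit = read_and_collect(double, sep="__", engine="python")

print(len(w_single), len(w_fallback), len(w_explicit))
```

Only the second call, with a multi-character delimiter and no explicit engine, is expected to produce a warning.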

Example 3: Using read_csv() with a tab as the custom delimiter.

# Importing pandas library
import pandas as pd
 
# Load the data of example3.csv
# with tab as custom delimiter
# into a DataFrame df
# (engine='python' is optional for a
# single-character delimiter)
df = pd.read_csv('example3.csv',
                 sep='\t',
                 engine='python')

# Print the DataFrame
print(df)

Output:

[Image: the resulting DataFrame]
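A self-contained sketch of tab-delimited reading follows, using made-up data in place of example3.csv. It also shows pd.read_table(), which uses a tab as its default delimiter and so should give the same result here.

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for example3.csv
tsv_text = "name\tage\tcity\nAnn\t28\tOslo\nBob\t35\tRome\n"

# Explicit tab delimiter with read_csv
df_csv = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# read_table defaults to sep='\t'
df_table = pd.read_table(io.StringIO(tsv_text))

print(df_csv.equals(df_table))
```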

Example 4: Using read_csv() with a regular expression as the custom delimiter.
Suppose we have a CSV file that mixes several kinds of delimiters, as shown below.

totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4

To load a file like this into a DataFrame, we use a regular expression as the delimiter.

# Importing pandas library
import pandas as pd
 
# Load the data of example4.csv
# with a regular expression as
# custom delimiter into a
# DataFrame df (regex delimiters
# need the Python engine)
df = pd.read_csv('example4.csv',
                 sep='[:, |_]',
                 engine='python')

# Print the DataFrame
print(df)

Output:

[Image: the resulting DataFrame]
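One detail about the pattern is worth noting: a character class such as '[:, |_]' matches a single delimiter character at a time, so a comma followed by a space counts as two delimiters with an empty field between them. The sketch below is a variation, not the article's own pattern: it adds a '+' quantifier so that a run of delimiter characters is treated as one separator. It reads the sample rows shown above from an in-memory string.

```python
import io
import pandas as pd

# The sample rows from the article, held in memory instead of example4.csv
raw = "\n".join([
    "totalbill_tip, sex:smoker, day_time, size",
    "16.99, 1.01:Female|No, Sun, Dinner, 2",
    "10.34, 1.66, Male, No|Sun:Dinner, 3",
    "21.01:3.5_Male, No:Sun, Dinner, 3",
    "23.68, 3.31, Male|No, Sun_Dinner, 2",
    "24.59:3.61, Female_No, Sun, Dinner, 4",
    "25.29, 4.71|Male, No:Sun, Dinner, 4",
])

# Assumed variation: '+' collapses consecutive delimiter characters
# (for example ", ") into a single separator.
df = pd.read_csv(io.StringIO(raw), sep=r"[:,|_ ]+", engine="python")

print(df)
```

With this pattern every row splits into the same seven fields, so the header names line up with the data columns.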