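Randomly Split a Pandas DataFrame by a Given Ratio
Splitting a Pandas DataFrame is useful in machine learning, artificial intelligence, and similar fields, where a given dataset is divided into training data and test data. Let's see how to randomly split a pandas DataFrame by a given ratio. For this task, we will use the DataFrame.sample() and DataFrame.drop() methods together.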
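The syntax of these methods is as follows:
- DataFrame.sample()
Syntax: DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)
Return type: a new object of the same type as the caller, containing n items (or the fraction frac of the items) randomly sampled from the caller object.
- DataFrame.drop()
Syntax: DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Returns: a DataFrame with the specified rows or columns removed.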
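To make the interplay of the two methods concrete, here is a minimal sketch on a small made-up DataFrame (the `value` column and the seed `0` are illustrative assumptions, not part of the original examples). It also shows that passing `random_state` makes the random selection repeatable.

```python
import pandas as pd

# Illustrative data; the column name "value" is an assumption
df = pd.DataFrame({"value": [10, 20, 30, 40]})

# random_state fixes the seed, so the same rows are drawn on every run
sampled = df.sample(frac=0.5, random_state=0)

# drop() removes rows by index label, leaving the rows that were not sampled
remaining = df.drop(sampled.index)

# Sampling again with the same seed selects the same rows
sampled_again = df.sample(frac=0.5, random_state=0)

print(sampled)
print(remaining)
```

Without `random_state`, each run may pick different rows, which is usually what the examples in this article do.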
Example: first, let's create a DataFrame.
# Importing required libraries
import pandas as pd
record = {
'course_name': ['Data Structures', 'Python',
'Machine Learning', 'Web Development'],
'student_name': ['Ankit', 'Shivangi',
'Priya', 'Shaurya'],
'student_city': ['Chennai', 'Pune',
'Delhi', 'Mumbai'],
'student_gender': ['M', 'F',
'F', 'M'] }
# Creating a dataframe
df = pd.DataFrame(record)
# Show the DataFrame (print() also works outside a notebook)
print(df)
Output:
[Image: the resulting DataFrame]
Example 1: Randomly split a DataFrame in a 1:1 ratio.
# Importing required libraries
import pandas as pd
record = {
'course_name': ['Data Structures', 'Python',
'Machine Learning', 'Web Development'],
'student_name': ['Ankit', 'Shivangi',
'Priya', 'Shaurya'],
'student_city': ['Chennai', 'Pune',
'Delhi', 'Mumbai'],
'student_gender': ['M', 'F',
'F', 'M'] }
# Creating a dataframe
df = pd.DataFrame(record)
# Creating a dataframe with 50%
# values of original dataframe
part_50 = df.sample(frac = 0.5)
# Creating dataframe with
# rest of the 50% values
rest_part_50 = df.drop(part_50.index)
print("\n50% of the given DataFrame:")
print(part_50)
print("\nrest 50% of the given DataFrame:")
print(rest_part_50)
Output:
[Image: the DataFrame split into two 50% parts]
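One caveat worth sketching: drop() removes rows by index label, so the approach in Example 1 assumes that every row has a unique label. If the index contains repeated labels, dropping the sampled labels can remove more rows than were sampled. The snippet below is a hedged illustration on made-up data (the `value` column and the duplicated label `0` are assumptions) that resets the index before splitting.

```python
import pandas as pd

# Hypothetical data with a repeated index label (two rows labelled 0)
dup = pd.DataFrame({"value": [1, 2, 3, 4]}, index=[0, 0, 1, 2])

# Give every row its own label before splitting
clean = dup.reset_index(drop=True)

part = clean.sample(frac=0.5, random_state=1)
rest = clean.drop(part.index)

# Together the two parts contain every row exactly once
total_rows = len(part) + len(rest)
print(total_rows)
```

The DataFrame built in this article uses the default integer index, so it does not need this step.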
Example 2: Randomly split a DataFrame in a 3:1 ratio.
# Importing required libraries
import pandas as pd
record = {
'course_name': ['Data Structures', 'Python',
'Machine Learning', 'Web Development'],
'student_name': ['Ankit', 'Shivangi',
'Priya', 'Shaurya'],
'student_city': ['Chennai', 'Pune',
'Delhi', 'Mumbai'],
'student_gender': ['M', 'F',
'F', 'M'] }
# Creating a dataframe
df = pd.DataFrame(record)
# Creating a dataframe with 75%
# values of original dataframe
part_75 = df.sample(frac = 0.75)
# Creating dataframe with
# rest of the 25% values
rest_part_25 = df.drop(part_75.index)
print("\n75% of the given DataFrame:")
print(part_75)
print("\nrest 25% of the given DataFrame:")
print(rest_part_25)
Output:
[Image: the DataFrame split into a 75% part and a 25% part]
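The two examples follow the same pattern, so it can be wrapped in a small helper. The function below is only a sketch: the name `split_by_ratio` and its parameters are hypothetical and are not part of pandas. It converts a ratio such as 3:1 into the `frac` argument of sample() and assumes the DataFrame has a unique index.

```python
import pandas as pd

def split_by_ratio(df, first, second, random_state=None):
    """Randomly split df into two parts sized roughly first:second.

    Hypothetical helper; assumes df has a unique index.
    """
    # A ratio of first:second means the first part gets this fraction of rows
    frac = first / (first + second)
    part_a = df.sample(frac=frac, random_state=random_state)
    # Everything that was not sampled goes into the second part
    part_b = df.drop(part_a.index)
    return part_a, part_b

# Illustrative data with 8 rows, split 3:1
data = pd.DataFrame({"value": range(8)})
train, test = split_by_ratio(data, 3, 1, random_state=42)

print(len(train), len(test))
```

When the number of rows is not evenly divisible by the ratio, sample() rounds the number of selected rows, so the part sizes only approximate the requested ratio.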