Working with Time in Pandas DataFrames

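Pandas was created for financial modeling, so as you might expect, it includes a large set of tools for working with dates and times. The date and time format in a dataset often cannot be used directly for analysis, so we preprocess these values to obtain features such as day, month, year, hour, minute, and second.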

Let's go through the different ways of handling dates and times with a Pandas DataFrame.

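Splitting a date and time into multiple features:
Create six dates and times with pd.date_range, which generates a sequence of dates and times at a fixed frequency. Then use pandas.Series.dt to extract the features.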

# Load library
import pandas as pd
  
# calling DataFrame constructor
df = pd.DataFrame()
  
# Create 6 dates
df['time'] = pd.date_range('2/5/2019', periods = 6, freq ='2H')
print(df['time'])  # print the 'time' column
  
# Extract features - year, month, day, hour, and minute
df['year'] = df['time'].dt.year
df['month'] = df['time'].dt.month
df['day'] = df['time'].dt.day
df['hour'] = df['time'].dt.hour
df['minute'] = df['time'].dt.minute
  
# Show six rows
df.head(6)

Output:

0   2019-02-05 00:00:00
1   2019-02-05 02:00:00
2   2019-02-05 04:00:00
3   2019-02-05 06:00:00
4   2019-02-05 08:00:00
5   2019-02-05 10:00:00
Name: time, dtype: datetime64[ns]


                time  year  month  day  hour  minute
0 2019-02-05 00:00:00  2019      2    5     0       0
1 2019-02-05 02:00:00  2019      2    5     2       0
2 2019-02-05 04:00:00  2019      2    5     4       0
3 2019-02-05 06:00:00  2019      2    5     6       0
4 2019-02-05 08:00:00  2019      2    5     8       0
5 2019-02-05 10:00:00  2019      2    5    10       0
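
The same .dt accessor exposes more than the calendar fields shown above. The following is a minimal sketch, not part of the original example, that pulls a few additional features from the same six timestamps. The column names are chosen here for illustration, and the step is passed as a Timedelta so the code does not depend on any frequency alias.

```python
import pandas as pd

# Six timestamps spaced two hours apart, matching the example above.
# A Timedelta is used for the step so no frequency alias string is needed.
times = pd.Series(
    pd.date_range("2019-02-05", periods=6, freq=pd.Timedelta(hours=2))
)

# Column names below are illustrative, not taken from the original article.
features = pd.DataFrame({
    "time": times,
    "second": times.dt.second,
    "dayofweek": times.dt.dayofweek,        # Monday=0 ... Sunday=6
    "quarter": times.dt.quarter,
    "is_month_end": times.dt.is_month_end,  # True only on the last day of a month
})

print(features)
```

Numeric fields such as dayofweek and quarter are often easier to feed into a model than the raw timestamp.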

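Converting strings to timestamps:
We use pd.to_datetime to convert the given strings to datetime format; features can then be extracted from the resulting datetimes using the first method.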

# Load libraries
import numpy as np
import pandas as pd
  
# Create time Strings
dt_strings = np.array(['04-03-2019 12:35 PM',
                       '22-06-2017 11:01 AM',
                       '05-09-2009 07:09 PM'])
  
# Convert to datetimes; the format must match the strings exactly,
# including the space between the year and the hour
timestamps = [pd.to_datetime(date, format ="%d-%m-%Y %I:%M %p",
                      errors ="coerce") for date in dt_strings]
  
print(timestamps)

Output:

[Timestamp('2019-03-04 12:35:00'), Timestamp('2017-06-22 11:01:00'), Timestamp('2009-09-05 19:09:00')]
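
pd.to_datetime also accepts a whole Series or array, so the conversion can be done in one call instead of element by element. The sketch below assumes the same day-month-year format and adds one deliberately malformed string (a hypothetical entry, not from the original data) to show what errors="coerce" does: values that do not match the format become NaT instead of raising an exception.

```python
import pandas as pd

dt_strings = pd.Series([
    "04-03-2019 12:35 PM",
    "22-06-2017 11:01 AM",
    "not a date",            # hypothetical malformed entry for illustration
])

# Convert the whole Series at once; unparseable values are coerced to NaT.
parsed = pd.to_datetime(dt_strings, format="%d-%m-%Y %I:%M %p", errors="coerce")
print(parsed)

# Count how many values failed to parse.
n_bad = parsed.isna().sum()
print(n_bad)
```

Checking the NaT count after a coerced conversion is a quick way to notice a format string that does not match the data.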

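Extracting the day of the week from a given date:
We use Series.dt.day_name() to get the name of the day of the week for a given date. (Series.dt.weekday_name, which appears in older tutorials, was removed in pandas 1.0.)

# Load library
import pandas as pd
  
# Create 6 month-end dates
# ('M' is the month-end alias; pandas 2.2 and later spell it 'ME')
dates = pd.Series(pd.date_range('2/5/2019', periods = 6, freq ='M'))
  
print(dates)
  
# Extract days of week and then print
print(dates.dt.day_name())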

Output:

0   2019-02-28
1   2019-03-31
2   2019-04-30
3   2019-05-31
4   2019-06-30
5   2019-07-31
dtype: datetime64[ns]
0     Thursday
1       Sunday
2      Tuesday
3       Friday
4       Sunday
5    Wednesday
dtype: object
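
Day names are convenient for display, but a numeric weekday is usually easier to filter on. As a small sketch under that assumption, the code below takes the first three dates from the output above and derives both the name and a weekend flag. The is_weekend column is a name introduced here for illustration.

```python
import pandas as pd

# First three month-end dates from the output above.
dates = pd.Series(pd.to_datetime(["2019-02-28", "2019-03-31", "2019-04-30"]))

summary = pd.DataFrame({
    "date": dates,
    "day_name": dates.dt.day_name(),
    "dayofweek": dates.dt.dayofweek,        # Monday=0 ... Sunday=6
    "is_weekend": dates.dt.dayofweek >= 5,  # illustrative flag: Saturday or Sunday
})

print(summary)
```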

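Extracting data within a date and time range:
We can select the rows of a dataset that fall within a specific time range.

Method 1: the dataset is not indexed by time.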

# Load library
import pandas as pd
  
# Create data frame
df = pd.DataFrame()
  
# Create datetimes
df['date'] = pd.date_range('1/1/2012', periods = 1000, freq ='H')
  
print(df.head(5))
  
# Select observations between two datetimes
x = df[(df['date'] > '2012-1-1 01:00:00') &
       (df['date'] <= '2012-1-1 11:00:00')]
  
print(x)

Output:

                 date
0 2012-01-01 00:00:00
1 2012-01-01 01:00:00                // 5 rows of Timestamps out of 1000
2 2012-01-01 02:00:00
3 2012-01-01 03:00:00
4 2012-01-01 04:00:00


                 date
2  2012-01-01 02:00:00
3  2012-01-01 03:00:00
4  2012-01-01 04:00:00
5  2012-01-01 05:00:00               //Timestamps in the given range
6  2012-01-01 06:00:00
7  2012-01-01 07:00:00
8  2012-01-01 08:00:00
9  2012-01-01 09:00:00
10 2012-01-01 10:00:00
11 2012-01-01 11:00:00
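
An alternative to combining two comparisons is Series.between, which builds the same boolean mask in one call. This is a sketch rather than the original author's method. Because between includes both endpoints by default, the lower bound is set to 02:00 here to select the same ten rows as the strict "> 01:00" comparison does on hourly data.

```python
import pandas as pd

# 1000 hourly timestamps; a Timedelta step avoids frequency alias strings.
df = pd.DataFrame({
    "date": pd.date_range("2012-01-01", periods=1000, freq=pd.Timedelta(hours=1))
})

start = pd.Timestamp("2012-01-01 02:00:00")
end = pd.Timestamp("2012-01-01 11:00:00")

# between() is inclusive on both ends by default.
in_range = df[df["date"].between(start, end)]
print(in_range)
```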

Method 2: the dataset is indexed by time.

# Load library
import pandas as pd
  
# Create data frame
df = pd.DataFrame()
  
# Create datetimes
df['date'] = pd.date_range('1/1/2012', periods = 1000, freq ='H')
  
# Set index
df = df.set_index(df['date'])
  
print(df.head(5))
  
# Select observations between two datetimes
x = df.loc['2012-1-1 04:00:00':'2012-1-1 12:00:00']
  
print(x)

Output:

                                   date
date                                   
2012-01-01 00:00:00 2012-01-01 00:00:00
2012-01-01 01:00:00 2012-01-01 01:00:00
2012-01-01 02:00:00 2012-01-01 02:00:00
2012-01-01 03:00:00 2012-01-01 03:00:00                // 5 rows of Timestamps out of 1000
2012-01-01 04:00:00 2012-01-01 04:00:00
                                   date
date                                   
2012-01-01 04:00:00 2012-01-01 04:00:00
2012-01-01 05:00:00 2012-01-01 05:00:00
2012-01-01 06:00:00 2012-01-01 06:00:00
2012-01-01 07:00:00 2012-01-01 07:00:00
2012-01-01 08:00:00 2012-01-01 08:00:00
2012-01-01 09:00:00 2012-01-01 09:00:00               //Timestamps in the given range
2012-01-01 10:00:00 2012-01-01 10:00:00
2012-01-01 11:00:00 2012-01-01 11:00:00
2012-01-01 12:00:00 2012-01-01 12:00:00
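
With a DatetimeIndex in place, .loc also accepts a partial date string, which selects every row inside that day or month without spelling out both endpoints. The sketch below assumes the same 1000 hourly timestamps as above. The value column is a hypothetical payload added only so the selected rows are easy to inspect.

```python
import pandas as pd

# 1000 hourly timestamps used directly as the index.
idx = pd.date_range("2012-01-01", periods=1000, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"value": range(1000)}, index=idx)  # 'value' is illustrative

# Partial string indexing: all rows on 2 January 2012.
one_day = df.loc["2012-01-02"]
print(len(one_day))

# All rows in January 2012.
january = df.loc["2012-01"]
print(len(january))
```

Like the explicit slice in Method 2, these selections include both ends of the matched period.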