SQL 识别重叠的日期区间,如果一个员工手头现有的项目尚未结束,却又开始了另一个新项目,那么你希望筛选出相关的数据。我们先看一下 EMP_PROJECT
表的数据。
SQL 识别重叠的日期区间 问题描述
如果一个员工手头现有的项目尚未结束,却又开始了另一个新项目,那么你希望筛选出相关的数据。我们先看一下 EMP_PROJECT
表的数据。
select *
from emp_project
EMPNO ENAME PROJ_ID PROJ_START PROJ_END
----- ---------- ------- ----------- -----------
7782 CLARK 1 16-JUN-2005 18-JUN-2005
7782 CLARK 4 19-JUN-2005 24-JUN-2005
7782 CLARK 7 22-JUN-2005 25-JUN-2005
7782 CLARK 10 25-JUN-2005 28-JUN-2005
7782 CLARK 13 28-JUN-2005 02-JUL-2005
7839 KING 2 17-JUN-2005 21-JUN-2005
7839 KING 8 23-JUN-2005 25-JUN-2005
7839 KING 14 29-JUN-2005 30-JUN-2005
7839 KING 11 26-JUN-2005 27-JUN-2005
7839 KING 5 20-JUN-2005 24-JUN-2005
7934 MILLER 3 18-JUN-2005 22-JUN-2005
7934 MILLER 12 27-JUN-2005 28-JUN-2005
7934 MILLER 15 30-JUN-2005 03-JUL-2005
7934 MILLER 9 24-JUN-2005 27-JUN-2005
7934 MILLER 6 21-JUN-2005 23-JUN-2005
看一下上述查询结果,员工 KING 在 PROJ_ID 5
结束之前就开始了 PROJ_ID 8
,并且它开始 PROJ_ID 5
的时候,PROJ_ID 2
还没有完结。因此,你希望得到的结果集如下所示。
EMPNO ENAME MSG
----- ---------- --------------------------------
7782 CLARK project 7 overlaps project 4
7782 CLARK project 10 overlaps project 7
7782 CLARK project 13 overlaps project 10
7839 KING project 8 overlaps project 5
7839 KING project 5 overlaps project 2
7934 MILLER project 12 overlaps project 9
7934 MILLER project 6 overlaps project 3
SQL 识别重叠的日期区间 解决方案
这里的关键之处在于要找出那些 PROJ_START
(新项目开始的日期)等于或者大于另一个项目 PROJ_START
的行,以及等于或者小于其他项目 PROJ_END
的行。因此,首先要逐个地比较(同一个员工的)每一个项目和其他项目。通过自连接 EMP_PROJECT
表,我们为每一个员工生成全部可能的项目组合。为找到日期重叠的项目,只要在同一个员工的项目里找出 PROJ_START
介于另一个 PROJ_START
和 PROJ_END
之间的那些行即可。
DB2、PostgreSQL 和 Oracle
EMP_PROJECT
表自连接,然后使用连接运算符||
为时间上发生了重叠的项目构造出符合要求的输出结果。
1 select a.empno,a.ename,
2 'project '||b.proj_id||
3 ' overlaps project '||a.proj_id as msg
4 from emp_project a,
5 emp_project b
6 where a.empno = b.empno
7 and b.proj_start >= a.proj_start
8 and b.proj_start <= a.proj_end
9 and a.proj_id != b.proj_id
MySQL
EMP_PROJECT
表自连接,然后使用 CONCAT
函数为时间上发生了重叠的项目构造出符合要求的输出结果。
1 select a.empno,a.ename,
2 concat('project ',b.proj_id,
3 ' overlaps project ',a.proj_id) as msg
4 from emp_project a,
5 emp_project b
6 where a.empno = b.empno
7 and b.proj_start >= a.proj_start
8 and b.proj_start <= a.proj_end
9 and a.proj_id != b.proj_id
SQL Server
EMP_PROJECT
表自连接,然后使用连接运算符 +
为时间上发生了重叠的项目构造出符合要求的输出结果。
1 select a.empno,a.ename,
2 'project '+b.proj_id+
3 ' overlaps project '+a.proj_id as msg
4 from emp_project a,
5 emp_project b
6 where a.empno = b.empno
7 and b.proj_start >= a.proj_start
8 and b.proj_start <= a.proj_end
9 and a.proj_id != b.proj_id
SQL 识别重叠的日期区间 扩展知识
上述各个解决方案之间的差别仅在于字符串连接方式不同,接下来的讨论里会使用 DB2 语法,但仍然能兼顾全部 3 种解决方案。首先自连接 EMP_PROJECT
表,这样 PROJ_START
日期值就能和其他项目逐一地做比较了。下面是针对员工 KING 的自连接查询结果。可以发现,每一个项目是怎样“看见”其他项目的。
select a.ename,
a.proj_id as a_id,
a.proj_start as a_start,
a.proj_end as a_end,
b.proj_id as b_id,
b.proj_start as b_start
from emp_project a,
emp_project b
where a.ename = 'KING'
and a.empno = b.empno
and a.proj_id != b.proj_id
order by 2
ENAME A_ID A_START A_END B_ID B_START
------ ----- ----------- ----------- ----- -----------
KING 2 17-JUN-2005 21-JUN-2005 8 23-JUN-2005
KING 2 17-JUN-2005 21-JUN-2005 14 29-JUN-2005
KING 2 17-JUN-2005 21-JUN-2005 11 26-JUN-2005
KING 2 17-JUN-2005 21-JUN-2005 5 20-JUN-2005
KING 5 20-JUN-2005 24-JUN-2005 2 17-JUN-2005
KING 5 20-JUN-2005 24-JUN-2005 8 23-JUN-2005
KING 5 20-JUN-2005 24-JUN-2005 11 26-JUN-2005
KING 5 20-JUN-2005 24-JUN-2005 14 29-JUN-2005
KING 8 23-JUN-2005 25-JUN-2005 2 17-JUN-2005
KING 8 23-JUN-2005 25-JUN-2005 14 29-JUN-2005
KING 8 23-JUN-2005 25-JUN-2005 5 20-JUN-2005
KING 8 23-JUN-2005 25-JUN-2005 11 26-JUN-2005
KING 11 26-JUN-2005 27-JUN-2005 2 17-JUN-2005
KING 11 26-JUN-2005 27-JUN-2005 8 23-JUN-2005
KING 11 26-JUN-2005 27-JUN-2005 14 29-JUN-2005
KING 11 26-JUN-2005 27-JUN-2005 5 20-JUN-2005
KING 14 29-JUN-2005 30-JUN-2005 2 17-JUN-2005
KING 14 29-JUN-2005 30-JUN-2005 8 23-JUN-2005
KING 14 29-JUN-2005 30-JUN-2005 5 20-JUN-2005
KING 14 29-JUN-2005 30-JUN-2005 11 26-JUN-2005
从上面的结果集里可以看到,自连接查询使得寻找重叠日期区间的工作变得容易多了。只要筛选出 B_START
介于 A_START
和 A_END
之间的那些行即可。DB2 解决方案第 7 行和第 8 行的 WHERE
条件做的就是这件事。
and b.proj_start >= a.proj_start
and b.proj_start <= a.proj_end
找到了所需的行之后,接下来只要连接字符串构造出适当格式的输出结果即可。
如果每一个员工的最大项目个数是固定的,那么 Oracle 用户就可以利用窗口函数 LEAD OVER
来避免使用自连接查询。在某些特定情况下使用自连接查询可能显得代价过于沉重(如果自连接耗费的机器资源远大于 LEAD OVER
),这个时候 LEAD OVER
函数就不失为一种简便的方法。例如,考虑下面针对员工 KING 的 LEAD OVER
替代方案。
select empno,
ename,
proj_id,
proj_start,
proj_end,
case
when lead(proj_start,1)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
when lead(proj_start,2)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
when lead(proj_start,3)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
when lead(proj_start,4)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
end is_overlap
from emp_project
where ename = 'KING'
EMPNO ENAME PROJ_ID PROJ_START PROJ_END IS_OVERLAP
----- ------ ------- ----------- ----------- ----------
7839 KING 2 17-JUN-2005 21-JUN-2005 5
7839 KING 5 20-JUN-2005 24-JUN-2005 8
7839 KING 8 23-JUN-2005 25-JUN-2005
7839 KING 11 26-JUN-2005 27-JUN-2005
7839 KING 14 29-JUN-2005 30-JUN-2005
对于员工 KING,因为项目个数被限定为 5 个,我们就能使用 LEAD OVER
函数逐一检查全部项目的日期值,而无须自连接查询。现在,离最终的结果集只有一步之遥了。接下来只要把 IS_OVERLAP
不是 Null
的行筛选出来即可。
select empno,ename,
'project '||is_overlap||
' overlaps project '||proj_id msg
from (
select empno,
ename,
proj_id,
proj_start,
proj_end,
case
when lead(proj_start,1)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
when lead(proj_start,2)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
when lead(proj_start,3)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
when lead(proj_start,4)over(order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(order by proj_start)
end is_overlap
from emp_project
where ename = 'KING'
)
where is_overlap is not null
EMPNO ENAME MSG
----- ------ --------------------------------
7839 KING project 5 overlaps project 2
7839 KING project 8 overlaps project 5
为了让以上解决方案适用于全体员工(而不局限于 KING),要在 LEAD OVER
函数调用里加上针对 ENAME
的分区。
select empno,ename,
'project '||is_overlap||
' overlaps project '||proj_id msg
from (
select empno,
ename,
proj_id,
proj_start,
proj_end,
case
when lead(proj_start,1)over(partition by ename
order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(partition by ename
order by proj_start)
when lead(proj_start,2)over(partition by ename
order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(partition by ename
order by proj_start)
when lead(proj_start,3)over(partition by ename
order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(partition by ename
order by proj_start)
when lead(proj_start,4)over(partition by ename
order by proj_start)
between proj_start and proj_end
then lead(proj_id)over(partition by ename
order by proj_start)
end is_overlap
from emp_project
)
where is_overlap is not null
EMPNO ENAME MSG
----- ------ -------------------------------
7782 CLARK project 7 overlaps project 4
7782 CLARK project 10 overlaps project 7
7782 CLARK project 13 overlaps project 10
7839 KING project 5 overlaps project 2
7839 KING project 8 overlaps project 5
7934 MILLER project 6 overlaps project 3
7934 MILLER project 12 overlaps project 9