SQL 解析串行化的数据,你有串行化的数据(以字符串形式存储),你希望解析这些字符串并以行的形式返回。例如,你的数据如下所示。
SQL 解析串行化的数据 问题描述
你有串行化的数据(以字符串形式存储),你希望解析这些字符串并以行的形式返回。例如,你的数据如下所示。
STRINGS
-----------------------------------
entry:stewiegriffin:lois:brian:
entry:moe::sizlack:
entry:petergriffin:meg:chris:
entry:willie:
entry:quagmire:mayorwest:cleveland:
entry:::flanders:
Entry:robo:tchi:ken:
你希望把这些字符串转换成如下所示的结果集。
VAL1 VAL2 VAL3
--------------- --------------- ---------------
moe sizlack
petergriffin meg chris
quagmire mayorwest cleveland
robo tchi ken
stewiegriffin lois brian
willie
flanders
SQL 解析串行化的数据 解决方案
本例中的每个字符串最多有 3 个值构成。这些值由冒号分隔,但是不一定每个字符串都包含 3 个值。如果一个字符串包含的值不足 3 个,我们必须小心地处理这种状况,确保解析出来的值被放入正确的列里。考虑如下所示的一行数据。
entry:::flanders:
上面这行数据里缺少了前面两个值,仅剩下第 3 个值。因此,仔细观察“问题”部分给出的结果集,会发现“flanders
”这一行 VAL1
和 VAL2
的值都是 Null
。
本解决方案的关键在于遍历字符串并解析字符串,最后再执行一个简单的行列翻转操作。我们还用到了视图 V
,下面给出了该视图的定义。另外,这里使用的是 Oracle 语法,由于本实例仅涉及部分字符串解析函数的调用,相信你能很容易地修改代码使之适用于其他数据库。
create view V
as
select 'entry:stewiegriffin:lois:brian:' strings
from dual
union all
select 'entry:moe::sizlack:'
from dual
union all
select 'entry:petergriffin:meg:chris:'
from dual
union all
select 'entry:willie:'
from dual
union all
select 'entry:quagmire:mayorwest:cleveland:'
from dual
union all
select 'entry:::flanders:'
from dual
union all
select 'entry:robo:tchi:ken:'
from dual
视图 V
负责提供示例数据,解决方案如下所示。
1 with cartesian as (
2 select level id
3 from dual
4 connect by level <= 100
5 )
6 select max(decode(id,1,substr(strings,p1+1,p2-1))) val1,
7 max(decode(id,2,substr(strings,p1+1,p2-1))) val2,
8 max(decode(id,3,substr(strings,p1+1,p2-1))) val3
9 from (
10 select v.strings,
11 c.id,
12 instr(v.strings,':',1,c.id) p1,
13 instr(v.strings,':',1,c.id+1)-instr(v.strings,':',1,c.id) p2
14 from v, cartesian c
15 where c.id <= (length(v.strings)-length(replace(v.strings,':')))-1
16 )
17 group by strings
18 order by 1
SQL 解析串行化的数据 扩展知识
第一步是遍历这些字符串。
with cartesian as (
select level id
from dual
connect by level <= 100
)
select v.strings,
c.id
from v,cartesian c
where c.id <= (length(v.strings)-length(replace(v.strings,':')))-1
STRINGS ID
----------------------------------- ---
entry:::flanders: 1
entry:::flanders: 2
entry:::flanders: 3
entry:moe::sizlack: 1
entry:moe::sizlack: 2
entry:moe::sizlack: 3
entry:petergriffin:meg:chris: 1
entry:petergriffin:meg:chris: 3
entry:petergriffin:meg:chris: 2
entry:quagmire:mayorwest:cleveland: 1
entry:quagmire:mayorwest:cleveland: 3
entry:quagmire:mayorwest:cleveland: 2
entry:robo:tchi:ken: 1
entry:robo:tchi:ken: 2
entry:robo:tchi:ken: 3
entry:stewiegriffin:lois:brian: 1
entry:stewiegriffin:lois:brian: 3
entry:stewiegriffin:lois:brian: 2
entry:willie: 1
下一步是调用 INSTR
函数找出每个字符串里每个冒号的位置。因为我们要提取的每个值都被两个冒号包围,这两个冒号的位置信息分别命名为 P1
和 P2
,意思是“位置 1”和“位置 2”2。
with cartesian as (
select level id
from dual
connect by level <= 100
)
select v.strings,
c.id,
instr(v.strings,':',1,c.id) p1,
instr(v.strings,':',1,c.id+1)-instr(v.strings,':',1,c.id) p2
from v,cartesian c
where c.id <= (length(v.strings)-length(replace(v.strings,':')))-1
order by 1
STRINGS ID P1 P2
----------------------------------- --- ---------- ----------
entry:::flanders: 1 6 1
entry:::flanders: 2 7 1
entry:::flanders: 3 8 9
entry:moe::sizlack: 1 6 4
entry:moe::sizlack: 2 10 1
entry:moe::sizlack: 3 11 8
entry:petergriffin:meg:chris: 1 6 13
entry:petergriffin:meg:chris: 3 23 6
entry:petergriffin:meg:chris: 2 19 4
entry:quagmire:mayorwest:cleveland: 1 6 9
entry:quagmire:mayorwest:cleveland: 3 25 10
entry:quagmire:mayorwest:cleveland: 2 15 10
entry:robo:tchi:ken: 1 6 5
entry:robo:tchi:ken: 2 11 5
entry:robo:tchi:ken: 3 16 4
entry:stewiegriffin:lois:brian: 1 6 14
entry:stewiegriffin:lois:brian: 3 25 6
entry:stewiegriffin:lois:brian: 2 20 5
entry:willie: 1 6 7
现在我们已经知道了每个字符串里每对冒号的位置信息,剩下要做的就是把这些信息传递给 SUBSTR
函数以提取出目标值。我们想要创建一个 3 列的结果集,因此需要调用 DECODE
函数评估上述笛卡儿积里的 ID
。
with cartesian as (
select level id
from dual
connect by level <= 100
)
select decode(id,1,substr(strings,p1+1,p2-1)) val1,
decode(id,2,substr(strings,p1+1,p2-1)) val2,
decode(id,3,substr(strings,p1+1,p2-1)) val3
from (
select v.strings,
c.id,
instr(v.strings,':',1,c.id) p1,
instr(v.strings,':',1,c.id+1)-instr(v.strings,':',1,c.id) p2
from v,cartesian c
where c.id <= (length(v.strings)-length(replace(v.strings,':')))-1
)
order by 1
VAL1 VAL2 VAL3
--------------- --------------- --------------
moe
petergriffin
quagmire
robo
stewiegriffin
willie
lois
meg
mayorwest
tchi
brian
sizlack
chris
cleveland
flanders
ken
最后,基于 STRINGS
分组并针对 SUBSTR
的返回值调用聚合函数以生成更有可读性的结果集。
with cartesian as (
select level id
from dual
connect by level <= 100
)
select max(decode(id,1,substr(strings,p1+1,p2-1))) val1,
max(decode(id,2,substr(strings,p1+1,p2-1))) val2,
max(decode(id,3,substr(strings,p1+1,p2-1))) val3
from (
select v.strings,
c.id,
instr(v.strings,':',1,c.id) p1,
instr(v.strings,':',1,c.id+1)-instr(v.strings,':',1,c.id) p2
from v,cartesian c
where c.id <= (length(v.strings)-length(replace(v.strings,':')))-1
)
group by strings
order by 1
VAL1 VAL2 VAL3
--------------- --------------- ---------------
moe sizlack
petergriffin meg chris
quagmire mayorwest cleveland
robo tchi ken
stewiegriffin lois brian
willie
flanders
严格来讲,P2 是两个冒号之间的距离,而不是右侧冒号的索引位置。——译者注