SQL 识别字符串里的数字字符|极客教程

SQL 识别字符串里的数字字符，你有一列包含字符的数据。不过，这些数据里不仅有数字，还有其他字符。考虑下面的视图 V。

SQL 识别字符串里的数字字符问题

你有一列包含字符的数据。不过，这些数据里不仅有数字，还有其他字符。考虑下面的视图 V。

create view V as
select replace(mixed,' ','') as mixed
  from (
select substr(ename,1,2)||
       cast(deptno as char(4))||
       substr(ename,3,2) as mixed
  from emp
 where deptno = 10
 union all
select cast(empno as char(4)) as mixed
  from emp
 where deptno = 20
 union all
select ename as mixed
  from emp
 where deptno = 30
       ) x
　
select * from v
　
MIXED
---------------
CL10AR
KI10NG
MI10LL
7369
7566
7788
7876
7902
ALLEN
WARD
MARTIN
BLAKE
TURNER
JAMES

你希望筛选出只包含数字或至少有一个数字的行。如果既有数字又有其他字符，你希望删除非数字字符，只返回数字。对于以上示例数据而言，你希望得到下面的结果集。

SQL 识别字符串里的数字字符解决方案

函数 REPLACE 和 TRANSLATE 对于操作字符串和单个字符非常有用。关键在于把全部数字替换为某个字符，这样就能通过读取该字符的方式很方便地隔离和识别数字字符。
DB2
使用函数 TRANSLATE、REPLACE 和 POSSTR 分离出每一行的数字字符。还需要在视图 V 里调用 CAST。否则的话，会因为类型转换错误导致视图创建失败。你需要使用 REPLACE 函数删除无关的空白字符，以便转换为固定长度的 CHAR 类型。

 1 select mixed old,
 2        cast(
 3          case
 4          when
 5             replace(
 6           translate(mixed,'9999999999','0123456789'),'9','') = ''
 7          then
 8             mixed
 9          else replace(
10             translate(mixed,
11                repeat('#',length(mixed)),
12              replace(
13               translate(mixed,'9999999999','0123456789'),'9','')),
14                       '#','')
15           end as integer ) mixed
16   from V
17  where posstr(translate(mixed,'9999999999','0123456789'),'9') > 0

MySQL
MySQL 的语法略有不同，下面是视图 V 的定义。

create view V as
select concat(
         substr(ename,1,2),
         replace(cast(deptno as char(4)),' ',''),
         substr(ename,3,2)
       ) as mixed
  from emp
 where deptno = 10
 union all
select replace(cast(empno as char(4)), ' ', '')
  from emp where deptno = 20
 union all
select ename from emp where deptno = 30

由于 MySQL 不支持 TRANSLATE 函数，我们必须遍历每一行字符串，并评估每一个字符。

 1 select cast(group_concat(c order by pos separator '') as unsigned)
 2        as MIXED1
 3   from (
 4 select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
 5   from V,
 6        ( select id pos from t10 ) iter
 7  where iter.pos <= length(v.mixed)
 8    and ascii(substr(v.mixed,iter.pos,1)) between 48 and 57
 9        ) y
10  group by mixed
11  order by 1

Oracle
使用函数 TRANSLATE、REPLACE 和 INSTR 分离出每一行的数字字符。在视图 V 中调用 CAST 不是必须的。使用 REPLACE 函数删除无关的空白字符，以便转换为固定长度的 CHAR 类型。如果希望在视图的定义里使用显式类型转换，不妨转换为 VARCHAR2 类型。

 1 select to_number (
 2        case
 3        when
 4           replace(translate(mixed,'0123456789','9999999999'),'9')
 5          is not null
 6        then
 7             replace(
 8           translate(mixed,
 9             replace(
10          translate(mixed,'0123456789','9999999999'),'9'),
11                    rpad('#',length(mixed),'#')),'#')
12        else
13              mixed
14        end
15        ) mixed
16  from V
17 where instr(translate(mixed,'0123456789','9999999999'),'9') > 0

PostgreSQL
使用函数 TRANSLATE、REPLACE 和 STRPOS 分离出每一行的数字字符。在视图 V 中调用 CAST 不是必须的。使用 REPLACE 函数删除无关的空白字符，以便转换为固定长度的 CHAR 类型。如果希望在视图的定义里使用显式类型转换，建议转换为 VARCHAR2 类型。

 1 select cast(
 2        case
 3        when
 4         replace(translate(mixed,'0123456789','9999999999'),'9','')
 5         is not null
 6        then
 7           replace(
 8          translate(mixed,
 9            replace(
10          translate(mixed,'0123456789','9999999999'),'9',''),
11                    rpad('#',length(mixed),'#')),'#','')
12        else
13          mixed
14        end as integer ) as mixed
15    from V
16  where strpos(translate(mixed,'0123456789','9999999999'),'9') > 0

SQL Server
使用内置函数 ISNUMERIC 和通配符搜索能很容易识别出含有数字的字符串，但是 SQL Server 不支持 TRANSLATE 函数，我们无法高效地从字符串里提取数字字符。

SQL 识别字符串里的数字字符扩展知识

TRANSLATE 函数在这里非常有用，有了它，我们很容易把数字字符从其他字符分离识别出来。关键在于把所有数字先替换为某个字符。这样的话，我们只需要搜索一个既定字符，而不必去匹配不同的数字。
DB2、Oracle 和 PostgreSQL
这几种数据库的语法稍有不同，但方法相同。这里我选择讨论 PostgreSQL 的解决方案。
真正的工作是由函数 TRANSLATE 和 REPLACE 完成的。为了得到最终的结果集，需要多次使用这两个函数，下面的查询包含了每一次使用函数的情况。

select mixed as orig,
translate(mixed,'0123456789','9999999999') as mixed1,
replace(translate(mixed,'0123456789','9999999999'),'9','') as mixed2,
 translate(mixed,
 replace(
 translate(mixed,'0123456789','9999999999'),'9',''),
          rpad('#',length(mixed),'#')) as mixed3,
 replace(
 translate(mixed,
 replace(
translate(mixed,'0123456789','9999999999'),'9',''),
         rpad('#',length(mixed),'#')),'#','') as mixed4
  from V
 where strpos(translate(mixed,'0123456789','9999999999'),'9') > 0
　
  ORIG  | MIXED1 | MIXED2 | MIXED3 | MIXED4 | MIXED5
--------+--------+--------+--------+--------+--------
 CL10AR | CL99AR | CLAR | ##10## | 10       |     10
 KI10NG | KI99NG | KING | ##10## | 10       |     10
 MI10LL | MI99LL | MILL | ##10## | 10       |     10
 7369   | 9999   |      | 7369   | 7369     |   7369
 7566   | 9999   |      | 7566   | 7566     |   7566
 7788   | 9999   |      | 7788   | 7788     |   7788
 7876   | 9999   |      | 7876   | 7876     |   7876
 7902   | 9999   |      | 7902   | 7902     |   7902

首先要注意到，不包含任何数字的行会被删掉。仔细阅读上述结果集的每一列之后，我们就会理解其工作原理。被筛选出来的行包括 ORIG 列的值和构成最终结果集的行。首先，调用 TRANSLATE 函数把所有数字都转换为 9（事实上，我们可以选择任何数字，这里的 9 是任意选的），这一结果就是 MIXED1 列的值。现在所有数字都变成了 9，它们能被当作一种单一字符来处理。然后，通过调用 REPLACE 函数删除所有的数字。由于数字都已经被替换成了 9，REPLACE 函数只需简单地搜索 9，就可以删除它们。这一结果用 MIXED2 列来表示。下一步，为了计算 MIXED3，需要使用 MIXED2 的返回值。把 MIXED2 列和 ORIG 列进行比较。如果 ORIG 匹配到了 MIXED2 的任意字符，那么就调用 TRANSLATE 函数将其替换为 #。MIXED3 列的结果显示出这些字母（而不是数字）已经被挑出来，并转换为某个字符（#）。现在所有非数字字符都变成了 #，因而它们也能被当作一种单一字符来处理。下一步，为了得到 MIXED4，调用 REPLACE 函数找到并删除每一行里的 # 字，剩余的部分就只有数字。最后，把数值字符转换为数字形式。现在已经完成了所有步骤，我们也就明白了 WHERE 子句的工作原理。MIXED1 的结果被传递给 STRPOS 函数，如果发现了一个 9（定位字符串中第一次出现 9 的位置），那么函数返回值一定大于 0。也就是说，若返回值大于 0，那么意味着该行至少存在一个数字，所以这一行应该被保留下来。
MySQL
首先遍历每个字符串，评估每个字符并判断其是否为数字。

select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
  from V,
     ( select id pos from t10 ) iter
 where iter.pos <= length(v.mixed)
 order by 1,2
　
+--------+------+------+
| mixed  | pos  | c    |
+--------+------+------+
| 7369   |    1 | 7    |
| 7369   |    2 | 3    |
| 7369   |    3 | 6    |
| 7369   |    4 | 9    |
...
| ALLEN  |    1 | A    |
| ALLEN  |    2 | L    |
| ALLEN  |    3 | L    |
| ALLEN  |    4 | E    |
| ALLEN  |    5 | N    |
...
　
| CL10AR |    1 | C    |
| CL10AR |    2 | L    |
| CL10AR |    3 | 1    |
| CL10AR |    4 | 0    |
| CL10AR |    5 | A    |
| CL10AR |    6 | R    |
+--------+------+------+

现在可以单独评估字符串中的每个字符，接下来需要筛选出在 C 列中有一个数字的行。

select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
  from V,
      ( select id pos from t10 ) iter
 where iter.pos <= length(v.mixed)
   and ascii(substr(v.mixed,iter.pos,1)) between 48 and 57
 order by 1,2
　
+--------+------+------+
| mixed  | pos  | c    |
+--------+------+------+
| 7369   |    1 | 7    |
| 7369   |    2 | 3    |
| 7369   |    3 | 6    |
| 7369   |    4 | 9    |
...
| CL10AR |    3 | 1    |
| CL10AR |    4 | 0    |
...
+--------+------+------+

到这一步，C 列只剩下数字。下面调用 GROUP_CONCAT 函数串接这些数字，并形成 MIXED 列的数字。需要将最后的结果转换为数字类型。

select cast(group_concat(c order by pos separator '') as unsigned)
         as MIXED1
  from (
select v.mixed, iter.pos, substr(v.mixed,iter.pos,1) as c
  from V,
      ( select id pos from t10 ) iter
 where iter.pos <= length(v.mixed)
  and ascii(substr(x.mixed,iter.pos,1)) between 48 and 57
       ) y
  group by mixed
  order by 1
　
+--------+
| MIXED1 |
+--------+
|    10  |
|    10  |
|    10  |
|  7369  |
|  7566  |
|  7788  |
|  7876  |
|  7902  |
+--------+