​A selection of topics on IT and its application to finance. 
​Send me your comments, questions ​or suggestions by clicking
h​ere


elmama
Splitting a delimited column into many rows

This article is based on an answer I gave to a question on the popular Tek-Tips technical forum where my ID is taupirho. The subject of the question was normalising column data and essentially the OP wanted to split a column into many rows.

mratx (Programmer)12 Jun 12 13:44

I have an old ugly system table that I need to normalize. It contains a pipe delimited part numbers list in one field and a corresponding node in another column.

​Here is an example:

CREATE TABLE lookup
(   part_number VARCHAR2(100),   
node        NUMBER);

INSERT   INTO lookup(part_number, node)
VALUES ('aaabbb|aaaccc|aaaddd|aaaeee', 100);
INSERT   INTO lookup(part_number, node)VALUES ('aaafff', 200);
INSERT   INTO lookup(part_number, node)VALUES ('aaaggg|aaahhh|aaaiii', 300);

SELECT *   FROM lookup;

part_number                      node
=======================
aaabbb|aaaccc|aaaddd|aaaeee     100
aaafff                             200
​aaaggg|aaahhh|aaaiii             300 




I can parse a single row like I want if I use regexp_substr and connect by like this:

SELECT regexp_substr(part_number, regex, 1, LEVEL), node FROM (SELECT part_number, node, '[^|]+' regex FROM lookup WHERE node = 100) CONNECT BY LEVEL <= LENGTH(REGEXP_REPLACE(part_number, regex)) + 1; Output: aaabbb 100 aaaccc 100 aaaddd 100 aaaeee 100

But, if I remove the node = 100 condition, then the results are wrong. I know I could parse through this easily using PL/SQL, but I'd like to see if it's possible with pure SQL first. In case it's not obvious, I'm on Oracle 11g. Thanks for the replies!


taupirho (Programmer)13 Jun 12 6:45

​This is a bit clunky and there are probably better ways but at least its a start.

SQL> column part_number format a40
SQL> select * from lookup;

PART_NUMBER NODE
---------------------------------------- ----------aaabbb|aaaccc|aaaddd|aaaeee 100
aaafff 200
aaaggg|aaahhh|aaaiii 300

1 WITH T AS
2 (SELECT node,part_number c
3 FROM lookup),
4 t2 AS
5 (SELECT node,'|'||c||'|' c FROM T)
6 SELECT distinct node,SUBSTR(c, INSTR(c,'|',1,LEVEL)+1,
​7 INSTR(c, '|', 1, LEVEL+1)- INSTR (c, '|', 1, LEVEL)- 1) "String"
8 FROM t2
9 CONNECT BY LEVEL < LENGTH(c) - LENGTH(REPLACE(c,'|'))
10* order by node
SQL> /
NODE String
---------- --------------------
100 aaabbb
100 aaaccc
100 aaaddd
100 aaaeee
200 aaafff
300 aaaggg
300 aaahhh
300 aaaiii

8 rows selected.


postmratx (Programmer)13 Jun 12 14:11

Thanks for the reply. This is essentially what my regexp does, except you added a DISTINCT to get rid of the duplicates. I appreciate seeing another way to do this, but performance wise, it's just as bad as the regexp version. I guess what I need is a query that won't create duplicates to begin with, and therefore won't require a costly DISTINCT clause to clean it up. Any ideas on that? Thanks again. 


taupirho (Programmer)14 Jun 12 8:24

See if this performs better

SQL>
1 with t as
2 (
3 select node, part_number str from lookup
4 )
5 select trim(x.column_value.extract('e/text()')) cols ,node
6 from t t, table (xmlsequence(xmltype('
' ||
7* replace(t.str,'|','
')|| '
').extract('e/e'))) x

SQL> /

COLS NODE
-------------------- ----------
aaabbb 100
aaaccc 100
aaaddd 100
aaaeee 100
aaafff 200
aaaggg 300
aaahhh 300
aaaiii 300


mratx (Programmer)14 Jun 12 11:48

Wow! Blazing fast! This is night and day compared the wh
at I had, and I would give you 5 stars if I could! Thanks so much for your help!