Recently I came across an interesting collation related problem while using a Greenplum (Postgres 8.2) database setup with an UTF-8 collation.
Each character column that undergoes an order by, min() or max() will be sorted according to that collation.
But in some cases, I needed to perform ordering or min(), max() aggregations based on the ASCII content of character columns - luckily, this is quite easily achieved on Greenplum using operator classes and custom aggregates:
create temp table col_test (a text);
insert into col_test (a) values ('"quoted"'),('Abc'),('123'),('Öfföff'),('Zürich'),('Zuhause'),('Zahnpasta');
--
-- order by UTF-8 content
--
select * from col_test order by a asc;
123
Abc
Öfföff
"quoted"
Zahnpasta
Zuhause
Zürich
select * from col_test order by a desc;
Zürich
Zuhause
Zahnpasta
"quoted"
Öfföff
Abc
123
--
-- order by ASCII content
--
select * from col_test order by a using ~<~;
"quoted"
123
Abc
Zahnpasta
Zuhause
Zürich
Öfföff
select * from col_test order by a using ~>~;
Öfföff
Zürich
Zuhause
Zahnpasta
Abc
123
"quoted"
The min(), max() aggregation can be taken care of by creating two custom aggregates:
drop aggregate ascii_max(bpchar) cascade;
drop aggregate ascii_min(bpchar) cascade;
create or replace function ascii_bpchar_larger(prev bpchar, curr bpchar) returns bpchar AS $$
begin if (prev ~>~ curr) then return prev; end if; return curr; end; $$ language plpgsql immutable strict;
create or replace function ascii_bpchar_smaller(prev bpchar, curr bpchar) returns bpchar AS $$
begin if (prev ~<~ curr) then return prev; end if; return curr; end; $$ language plpgsql immutable strict;
create aggregate ascii_max(bpchar) ( sfunc = ascii_bpchar_larger, stype = bpchar, sortop = '~>~' );
create aggregate ascii_min(bpchar) ( sfunc = ascii_bpchar_smaller, stype = bpchar, sortop = '~<~' );
The custom aggregates can be used like any builtin aggregate:
--
-- aggregate by UTF-8 content
--
select min(a), max(a) from col_test;
min | max
-----+--------
123 | Zürich
--
-- aggregate by ASCII content
--
select ascii_min(a), ascii_max(a) from col_test;
ascii_min | ascii_max
-----------+-----------
"quoted" | Öfföff
See http://www.postgresql.org/docs/8.2/static/sql-createaggregate.html for more information regarding custom aggregates and http://www.postgresql.org/docs/8.2/static/indexes-opclass.html regarding operator classes.
Montag, 20. April 2015
Abonnieren
Posts (Atom)