Recently I came across an interesting collation related problem while using a Greenplum (Postgres 8.2) database setup with an UTF-8 collation.
Each character column that undergoes an order by, min() or max() will be sorted according to that collation.
But in some cases, I needed to perform ordering or min(), max() aggregations based on the ASCII content of character columns - luckily, this is quite easily achieved on Greenplum using operator classes and custom aggregates:
create temp table col_test (a text);
insert into col_test (a) values ('"quoted"'),('Abc'),('123'),('Öfföff'),('Zürich'),('Zuhause'),('Zahnpasta'); 
-- 
-- order by UTF-8 content 
-- 
select * from col_test order by a asc;
123
Abc
Öfföff
"quoted"
Zahnpasta
Zuhause
Zürich
select * from col_test order by a desc;
Zürich
Zuhause
Zahnpasta
"quoted"
Öfföff
Abc
123 
-- 
-- order by ASCII content 
--  
select * from col_test order by a using ~<~;
"quoted"
123
Abc
Zahnpasta
Zuhause
Zürich
Öfföff
select * from col_test order by a using ~>~;
Öfföff
Zürich
Zuhause
Zahnpasta
Abc
123
"quoted" 
The min(), max() aggregation can be taken care of by creating two custom aggregates:
drop aggregate ascii_max(bpchar) cascade;
drop aggregate ascii_min(bpchar) cascade;
create or replace function ascii_bpchar_larger(prev bpchar, curr bpchar) returns bpchar AS $$
begin if (prev ~>~ curr) then return prev; end if; return curr; end; $$ language plpgsql immutable strict;
create or replace function ascii_bpchar_smaller(prev bpchar, curr bpchar) returns bpchar AS $$
begin if (prev ~<~ curr) then return prev; end if; return curr; end; $$ language plpgsql immutable strict;
create aggregate ascii_max(bpchar) ( sfunc = ascii_bpchar_larger, stype = bpchar, sortop = '~>~' );
create aggregate ascii_min(bpchar) ( sfunc = ascii_bpchar_smaller, stype = bpchar, sortop = '~<~' );
The custom aggregates can be used like any builtin aggregate:
--
-- aggregate by UTF-8 content
--
select min(a), max(a) from col_test;
 min |  max  
-----+--------
 123 | Zürich
--
-- aggregate by ASCII content
--
select ascii_min(a), ascii_max(a) from col_test;
 ascii_min | ascii_max
-----------+-----------
 "quoted"  | Öfföff
See http://www.postgresql.org/docs/8.2/static/sql-createaggregate.html for more information regarding custom aggregates and http://www.postgresql.org/docs/8.2/static/indexes-opclass.html regarding operator classes.
Montag, 20. April 2015
Abonnieren
Kommentare (Atom)
 
