« Mental BST events SQL with cluster tags included | Main | PostgreSQL version of do_sql.py »
October 01, 2009
De-dupe but leave one of each with one SQL query
Using sick sub-queries:
delete from ug_abstract where ug_abstract_id in
-- Get a list of duplicated rows based upon a comparison of the title
(
select ug.ug_abstract_id
from ug_abstract ug, ug_abstract ag
where ug.title = ag.title and ug.ug_abstract_id != ag.ug_abstract_id and ug.ug_abstract_id not in
-- Get the PK from one of each of the rows that have been duplicated
-- then exclude them from the delete run with a "not in"
(select max(ug.ug_abstract_id) as ug_abstract_id
from ug_abstract ug, ug_abstract ag
where ug.title = ag.title and ug.ug_abstract_id != ag.ug_abstract_id group by ug.title)
group by ug.title, ug.ug_abstract_id order by ug.title
)
Tags: PostgreSQL
Posted by pj at October 1, 2009 04:23 PM