« Mental BST events SQL with cluster tags included | Main | PostgreSQL version of do_sql.py »

October 01, 2009

De-dupe but leave one of each with one SQL query

Using sick sub-queries:

delete from ug_abstract where ug_abstract_id in 

-- Get a list of duplicated rows based upon a comparison of the title

(

select ug.ug_abstract_id 
from ug_abstract ug, ug_abstract ag 
where ug.title = ag.title  and ug.ug_abstract_id != ag.ug_abstract_id and ug.ug_abstract_id not in 


-- Get the PK from one of each of the rows that have been duplicated
-- then exclude them from the delete run with a "not in"

(select max(ug.ug_abstract_id) as ug_abstract_id 
from ug_abstract ug, ug_abstract ag 
where ug.title = ag.title  and ug.ug_abstract_id != ag.ug_abstract_id group by ug.title) 


group by ug.title, ug.ug_abstract_id order by ug.title

)

Tags: PostgreSQL

Posted by pj at October 1, 2009 04:23 PM

Comments