In the course of trying to answer a question for my wife this evening, I ran across an interesting type of query that, though it would seem simple and common, did take me a moment to figure out the best approach.
She had a very large set of (poorly cleaned) data that she needed to know how many of each unique set were present. Essentially, the question boils down to “how do you perform a grouping query while simultaneously removing duplicates?”
So, for example, if you had the following data set and wanted to know how many dogs of each color there are?
If you were to do the standard GROUP BY:
against this table, you would get the following:
A B COUNT ---- ----- -------- dog black 2 dog white 1
However, as discussed above, my wife didn’t want that duplicate dog counted.
Here’s the query I used (in PostgreSQL) to achieve that:
A B COUNT ---- ----- -------- dog black 1 dog white 1
Of course, the reason this rarely comes up is because, with good data/models, you really shouldn’t have duplicate rows, but you don’t always control that. :)