SQL Server Group by Date
Here we look at how to create a query that groups by date in SQL Server.
This is useful when building a report that needs a daily total count, for instance. We need a way of flooring the datetime to a date, that is, removing the time part of the value so we are left with just the date.
In the examples below, I'll group the data by date, count how many items fall on each date, and then order the results so the dates with the most items come first. This is handy when you want to find out which days had the most activity, for example which day was the most popular.
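If you want to follow along, here's a minimal sketch of a table to run the examples against. The [dbo].[Pages] table and [CreatedDate] column match the queries below; the exact schema and the sample rows are my own assumptions:
-- A hypothetical sample schema; only [CreatedDate] is taken from the examples
CREATE TABLE [dbo].[Pages] (
    [PageId]      INT IDENTITY(1,1) PRIMARY KEY,
    [CreatedDate] DATETIME NOT NULL
);

-- A few sample rows spread across two days
INSERT INTO [dbo].[Pages] ([CreatedDate])
VALUES ('2024-03-14 09:15:00'),
       ('2024-03-14 17:42:00'),
       ('2024-03-15 08:03:00');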
CAST as Date
In this example, we cast our datetime value as a date, which then allows us to group a datetime by its date part:
SELECT CAST([CreatedDate] AS DATE) AS [CreatedDay], COUNT(*) AS [ItemCount]
FROM [dbo].[Pages]
GROUP BY CAST([CreatedDate] AS DATE)
ORDER BY COUNT(*) DESC
DATEDIFF and DATEADD
In this example, we use a combination of DATEDIFF and DATEADD to floor the datetime to midnight. DATEDIFF(dd, 0, [CreatedDate]) counts the whole days between the base date (0, which SQL Server treats as 1900-01-01) and the value; DATEADD then adds that many days back onto the base date, giving the same date with the time part zeroed out.
SELECT DATEADD(dd, DATEDIFF(dd, 0, [CreatedDate]), 0) AS [CreatedDay], COUNT(*) AS [ItemCount]
FROM [dbo].[Pages]
GROUP BY DATEADD(dd, DATEDIFF(dd, 0, [CreatedDate]), 0)
ORDER BY COUNT(*) DESC
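To see the two steps in isolation, here's a quick sketch using a hard-coded timestamp (the literal is just an example I've picked for illustration):
-- DATEDIFF counts whole days from the base date 0 (1900-01-01)
SELECT DATEDIFF(dd, 0, '2024-03-15 14:30:00');
-- DATEADD adds those days back onto the base date, returning 2024-03-15 00:00:00.000
SELECT DATEADD(dd, DATEDIFF(dd, 0, '2024-03-15 14:30:00'), 0);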
CONVERT to VARCHAR
In this example, we use CONVERT to turn the datetime into a VARCHAR.
SELECT CONVERT(VARCHAR(10), [CreatedDate], 112) AS [CreatedDay], COUNT(*) AS [ItemCount]
FROM [dbo].[Pages]
GROUP BY CONVERT(VARCHAR(10), [CreatedDate], 112)
ORDER BY COUNT(*) DESC
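As a quick check of what style 112 produces (GETDATE() here is just for illustration):
-- Style 112 formats the date as yyyymmdd, e.g. '20240315'
SELECT CONVERT(VARCHAR(10), GETDATE(), 112);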
So here the datetime is converted to a varchar using the ISO standard yyyymmdd format (style 112). The first problem I find with this method shows up in the execution plan, which reports a warning:
Type conversion in expression (CONVERT(varchar(10), [dbo].[Pages].[CreatedDate],112)) may affect "CardinalityEstimate" in query plan choice
Using SSMS, running 'Analyze Actual Execution Plan' and clicking for more information in the Showplan Analysis pane shows:
One of the most important inputs for the Query Optimizer to choose an optimal execution plan is the estimated number of rows to be retrieved per operator. These estimations model the amount of data to be processed by the query, and therefore drive cost estimation. The models used by the process of estimating number of rows, called Cardinality Estimation, may have limitations. The accuracy of those models depends on how closely they correspond to the actual data distribution, correlation, chosen parameters, and how closely statistics, the main input for Cardinality Estimation, model all aspects of actual data.
In the real world, I've found that an 8th-gen i7 laptop handles this quite fast, but on deploying to Azure it slows down quite substantially, so one of the other methods is probably preferable. You'll have to see what works best with your data and your platform.