Database cleanup
How to delete large amounts of data while minimizing the risk of system overload
From the administration area of a DynamicWeb solution it is possible to set up an automatic purge of both files and database data older than a given date. One reason for doing this is to shrink the overall size of the database, as you often pay for the amount of space used.
However, sometimes one or more tables may have been missed by the automatic purge, causing them to grow significantly. Tables like GeneralLog and EcomOrderDebuggingInfo can grow very fast. If you then add these tables to the automatic purge, you may encounter a SqlClient CommandTimeout, because the purge cannot complete within the timeout (default 30 seconds) due to the vast number of table rows.
In addition, delete statements run in a transaction and are written to the transaction log of the database. This means that the database log will grow significantly during the purge operation, and it may even cause disk space to run out. The operation also draws significantly on memory resources, which may lead to performance issues or failures in the frontend.
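If you want to keep an eye on transaction log growth while a purge runs, you can check log usage from another SSMS window with the standard DBCC command:

```sql
-- Show current size and percentage used of every database's transaction log
DBCC SQLPERF (LOGSPACE);
```
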
To avoid this strain on system resources, it can be useful to delete data in smaller chunks until the desired amount of data is deleted. Below are two SQL scripts which do this:
- identify: a script that lists tables in descending order of size.
- delete: a script that deletes rows from a given table in chunks of 7 days.
This approach requires a date column to be present in the table from which we want to delete rows. For executing the scripts we recommend using Microsoft SQL Server Management Studio (SSMS).
List tables by size
SELECT
    t.NAME AS TableName,
    p.rows AS RowCounter,
    SUM(a.used_pages) * 8 / 1024 / 1024 AS UsedSpaceGB,
    SUM(a.used_pages) * 8 / 1024 AS UsedSpaceMB,
    (SUM(a.total_pages) - SUM(a.used_pages)) * 8 / 1024 AS UnusedSpaceMB
FROM
    sys.tables t
INNER JOIN
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN
    sys.schemas s ON t.schema_id = s.schema_id
WHERE
    t.NAME NOT LIKE 'dt%'
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255
GROUP BY
    t.Name, s.Name, p.Rows
ORDER BY
    SUM(a.used_pages) DESC
The tables listed at the top may then be good candidates for purging.
Delete rows in chunks of 7 days
In the example below the tables 'GeneralLog' and 'EcomOrderDebuggingInfo' are used, and every row from the years 2020 through 2024 is deleted. Remember to point to the proper date column of each table, which in this example is 'LogDate' for GeneralLog and 'OrderDebuggingInfoTime' for EcomOrderDebuggingInfo. You may want to add additional tables.
-- Create a global temporary log table (accessible from other SSMS windows)
-- By querying this table, you can follow the progress of a long-running cleanup process
IF OBJECT_ID('tempdb..##LogTable') IS NOT NULL
    DROP TABLE ##LogTable;

CREATE TABLE ##LogTable (
    LogID INT IDENTITY(1,1) PRIMARY KEY,
    LogMessage NVARCHAR(255),
    LogTime DATETIME DEFAULT GETDATE()
);

DECLARE @year INT = 2020 -- Starting year. You may want to adjust this
DECLARE @month INT
DECLARE @day INT
DECLARE @maxDay INT
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @message NVARCHAR(255)
DECLARE @deletedRowsGeneral INT
DECLARE @deletedRowsDebug INT
DECLARE @monthlyDeletedGeneral INT
DECLARE @monthlyDeletedDebug INT

WHILE (@year <= 2024) -- Last year to include. You may want to adjust.
BEGIN
    SET @month = 1 -- Reset month counter
    WHILE (@month <= 12) -- Loop through months
    BEGIN
        -- Reset monthly counters
        SET @monthlyDeletedGeneral = 0
        SET @monthlyDeletedDebug = 0
        SET @day = 1 -- Reset day counter
        SET @maxDay = DAY(EOMONTH(DATEFROMPARTS(@year, @month, 1))) -- Get last day of month
        WHILE (@day <= @maxDay) -- Ensure valid days
        BEGIN
            -- Cap @endDay at the last valid day of the month
            DECLARE @endDay INT = CASE
                WHEN @day + 6 > @maxDay THEN @maxDay
                ELSE @day + 6
            END;

            SET @startDate = DATEFROMPARTS(@year, @month, @day); -- 'YYYY-MM-DD 00:00:00.000'
            SET @endDate = DATEADD(DAY, 1, DATEFROMPARTS(@year, @month, @endDay)); -- 'YYYY-MM-DD 00:00:00.000' (next day start)

            -- Delete records and track row counts
            DELETE FROM GeneralLog
            WHERE LogDate >= @startDate
                AND LogDate < @endDate;
            SET @deletedRowsGeneral = @@ROWCOUNT;

            DELETE FROM EcomOrderDebuggingInfo
            WHERE OrderDebuggingInfoTime >= @startDate
                AND OrderDebuggingInfoTime < @endDate;
            SET @deletedRowsDebug = @@ROWCOUNT;

            -- Accumulate monthly totals
            SET @monthlyDeletedGeneral = @monthlyDeletedGeneral + @deletedRowsGeneral;
            SET @monthlyDeletedDebug = @monthlyDeletedDebug + @deletedRowsDebug;

            -- Move to the next week
            SET @day = @day + 7;

            -- Pause for 1 second to give other processes time
            WAITFOR DELAY '00:00:01';
        END -- End week loop

        -- Log total deletions for the month
        SET @message = 'Deleted ' + CAST(@monthlyDeletedGeneral AS NVARCHAR(10))
            + ' from GeneralLog, ' + CAST(@monthlyDeletedDebug AS NVARCHAR(10))
            + ' from EcomOrderDebuggingInfo for '
            + CAST(@month AS NVARCHAR(2)) + '-' + CAST(@year AS NVARCHAR(4));
        INSERT INTO ##LogTable (LogMessage) VALUES (@message);

        -- Pause for 5 seconds to give other processes time
        WAITFOR DELAY '00:00:05';
        SET @month = @month + 1;
    END -- End month loop
    SET @year = @year + 1;
END -- End year loop
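While the cleanup script runs, you can follow its progress from a second SSMS window by querying the global temporary log table it maintains, for example:

```sql
-- Latest entries first; the script writes one row per processed month
SELECT LogTime, LogMessage
FROM ##LogTable
ORDER BY LogID DESC;
```
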
In some cases we have seen this take several hours to complete, so be patient. To find a suitable starting year, you may, in the example of the 'GeneralLog' table, use
SELECT MIN(LogDate) FROM GeneralLog
to find the lowest date.
Reduce database (files) disk space usage
If part of your goal is to reduce the size of the database, you may shrink it after finishing your manual purging. To shrink the database (and implicitly its files), you can execute the following SQL:
DBCC SHRINKDATABASE (YourDatabaseName);
Note, however, that this can be a resource-intensive task, especially on large databases, and may lead to high disk I/O and CPU usage. Make sure you run it at a time with the least traffic on the website.
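If shrinking the whole database in one go is too heavy, a lighter alternative is to shrink individual files, typically the transaction log, to a target size. The sketch below first looks up the logical file names; 'YourLogFileName' is a placeholder you must replace with the actual logical name from the first query:

```sql
-- Find the logical names and current sizes of the database files
SELECT name, type_desc, size * 8 / 1024 AS SizeMB
FROM sys.database_files;

-- Shrink a single file (here the log) to roughly 1024 MB
DBCC SHRINKFILE ('YourLogFileName', 1024);
```
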