Table of Contents

Database cleanup

How to delete large amounts of data while also minimizing chance of system overload

From the administration of the DynamicWeb solution it is possible to setup automatic purge of both files and database data that are older than a given date. One reason for doing this could be to shrink the overall size of the database, as oftentimes you pay for the amount of used space.

However, sometimes one or more tables may have been missed in the automatic purge, causing them to grow significantly. Tables like GeneralLog and EcomOrderDebuggingInfo may grow in size very fast. If you then apply these tables to the automatic purge, you may encounter SqlClient CommandTimeout, as the purge will not be able to complete in due time (default 30 seconds), due to the vast amount of table rows.

But also, delete statements run in a transaction and inflict the log table of the database. This means that the database log will grow significantly in size during the purge operation, and even that it may cause for disk space to run out. Also it will have a significant draw in memory resources that my lead to performance issues/failures experienced in the frontend.

To avoid this overload on system resources it can be useful to delete data in smaller chunks, until the desired amount of data is deleted. Below are two SQL scripts to which do this:

  • identify: script to list tables descending on table size.
  • delete: script that will delete rows of a given table in chunks of 7 days.

Obviously the above approach requires for a date column to be present in the table from which we want to delete rows. For executing the scripts we recommend using Microsoft SQL Server Management Studio.

List tables by size

SELECT 
    t.NAME AS TableName,
    p.rows AS RowCounter,
    SUM(a.used_pages) * 8 / 1024 / 1024 AS UsedSpaceGB,
 SUM(a.used_pages) * 8 / 1024 AS UsedSpaceMB, 
 (SUM(a.total_pages) - SUM(a.used_pages)) * 8 / 1024 AS UnusedSpaceMB
FROM 
    sys.tables t
INNER JOIN      
    sys.indexes i ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a ON p.partition_id = a.container_id
LEFT OUTER JOIN 
    sys.schemas s ON t.schema_id = s.schema_id
WHERE 
    t.NAME NOT LIKE 'dt%' 
    AND t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255 
GROUP BY 
    t.Name, s.Name, p.Rows
ORDER BY 
    SUM(a.used_pages) desc

The top-most listed tables may then be good candidates for tables that need purging.

Delete rows in chunks of 7 days

In the example below table 'GeneralLog' and EcomOrderDebuggingInfo are used, where every row where year is less than 2024 will be deleted. Remember to point to the proper date column of the table, which in this example is column 'LogDate'. You may want to add additional tables.

-- Create a global temporary log table (accessible from other SSMS windows)
-- By querying this table, you can follow the progress on a long-running cleanup process
IF OBJECT_ID('tempdb..##LogTable') IS NOT NULL
    DROP TABLE ##LogTable;


CREATE TABLE ##LogTable (
    LogID INT IDENTITY(1,1) PRIMARY KEY,
    LogMessage NVARCHAR(255),
    LogTime DATETIME DEFAULT GETDATE()
);

DECLARE @year INT = 2020 -- Starting year. You may want to adjust this
DECLARE @month INT
DECLARE @day INT
DECLARE @maxDay INT
DECLARE @startDate DATETIME
DECLARE @endDate DATETIME
DECLARE @message NVARCHAR(255)
DECLARE @deletedRowsGeneral INT
DECLARE @deletedRowsDebug INT
DECLARE @monthlyDeletedGeneral INT
DECLARE @monthlyDeletedDebug INT

WHILE (@year <= 2024) -- Last year to include. You may want to adjust.
BEGIN
    SET @month = 1 -- Reset month counter

    WHILE (@month <= 12) -- Loop through months
    BEGIN
        -- Reset monthly counters
        SET @monthlyDeletedGeneral = 0
        SET @monthlyDeletedDebug = 0

        SET @day = 1 -- Reset day counter
        SET @maxDay = DAY(EOMONTH(DATEFROMPARTS(@year, @month, 1))) -- Get last day of month

        WHILE (@day <= @maxDay) -- Ensure valid days
        BEGIN
            -- Cap @endDay at the last valid day of the month
            DECLARE @endDay INT = CASE 
                                    WHEN @day + 6 > @maxDay THEN @maxDay 
                                    ELSE @day + 6 
                                  END;

            SET @startDate = DATEFROMPARTS(@year, @month, @day); -- 'YYYY-MM-DD 00:00:00.000'
            SET @endDate = DATEADD(DAY, 1, DATEFROMPARTS(@year, @month, @endDay)); -- 'YYYY-MM-DD 00:00:00.000' (next day start)

            -- Delete records and track row counts
            DELETE FROM GeneralLog
            WHERE LogDate >= @startDate 
            AND LogDate < @endDate;
            SET @deletedRowsGeneral = @@ROWCOUNT;

            DELETE FROM EcomOrderDebuggingInfo
            WHERE OrderDebuggingInfoTime >= @startDate 
            AND OrderDebuggingInfoTime < @endDate;
            SET @deletedRowsDebug = @@ROWCOUNT;

            -- Accumulate monthly totals
            SET @monthlyDeletedGeneral = @monthlyDeletedGeneral + @deletedRowsGeneral;
            SET @monthlyDeletedDebug = @monthlyDeletedDebug + @deletedRowsDebug;

            -- Move to the next week
            SET @day = @day + 7;

	    -- Pause for 1 seconds to give time to other processes
            WAITFOR DELAY '00:00:01';

        END -- End week loop

        -- Log total deletions for the month
        SET @message = 'Deleted ' + CAST(@monthlyDeletedGeneral AS NVARCHAR(10)) 
                     + ' from GeneralLog, ' + CAST(@monthlyDeletedDebug AS NVARCHAR(10)) 
                     + ' from EcomOrderDebuggingInfo for ' 
                     + CAST(@month AS NVARCHAR(2)) + '-' + CAST(@year AS NVARCHAR(4));

        INSERT INTO ##LogTable (LogMessage) VALUES (@message);

        -- Pause for 5 seconds to give time to other processes
        WAITFOR DELAY '00:00:05';

        SET @month = @month + 1;
    END -- End month loop

    SET @year = @year + 1;
END -- End year loop

In some cases we have seen this take several hours to complete, so be patient. To find a starting year you may in the example of table 'GeneralLog' use

SELECT MIN(LogDate) FROM GeneralLog

to find the lowest date.

Reduce database (files) disk space usage

If a part of your goal is to reduce the size of the database, you may shrink the database after finishing your manual purging. To shrink the database (and implicitly the files), you can execute the following SQL:

DBCC SHRINKDATABASE (YourDatabaseName);

However, notice that this may be a resource-intensive task, especially on large databases, and may lead to high disk I/O and CPU usage. So ensure you do this at a point of time with the least traffic on the website.

To top