Monday, June 09, 2008

Purging Merge Replication History to avoid Locking/Blocking

Last week I visited a Dataware house that pulls data over the WAN from several OLTP servers setup as merge Publishers. One of the main OLTP servers I was asked to look closely, had a very high processor utilization and consistent spikes of processor queue length because of high lock escallations and blocking. The machine was configured pretty fine with adequate hardware resources. All productions tables were fairly defragmented, partitioned; tempdb on seperate SCISIs, blah blah blah. Yet over all I/O was sky high and processor spikes were shouting that there's some thing fairly wrong.
During the investigation, I found that the merge replication agent was configured to run every 5 minutes on the server. I executed the following query that showed me the potential cuase of the problem:

select min(coldate) min_repldate, max(coldate) max_repldate , count(*) rows_replicated
from msmerge_contents a
join msmerge_genhistory b on a.generation=b.generation and a.tablenick=b.art_nick
where b.genstatus<>0


________________________________________
min_repldate, max_repldate, rows_replicated
2008-01-01 ,2008-06-08 ,12010000

There were around 120 million records in the replication history table since last six months.
Since replication doesn't automatically delete old replicated transactions from these tables except some of the tables in distribution database; each time the replication agent used to kick off, queries and entries were made to these highly fragmented tables causing high I/O and processor time and the user activity involving published tables was being blocked.
Snapshot was not adequite solution because the servers were on the WAN and published tables were substantially large.
I used following command to purge the history records older than 7 days to bring relief to the server.

use publisheddb
GO
begin tran
select count(*) from msmerge_contents with(tablockx)

select count(*) from MSmerge_genhistory with(tablockx)
declare @coldate datetime
select @coldate = getdate()-7
delete MSmerge_contents from MSmerge_contents a, MSmerge_genhistory b where a.generation = b.generation and a.tablenick=b.art_nick and b.coldate < @coldate and a.generation > 0

delete MSmerge_genhistory where coldate < @coldate and pubid is not null
commit
GO
dbcc dbreindex ( MSmerge_contents)
GO
dbcc dbreindex ( MSmerge_genhistory)
GO
update statistics MSmerge_contents
GO
update statistics MSmerge_genhistory
GO

Ideally, we need to keep these two tables (msmerge_contents, msmerge_genhistory) as narrowed as possible. That helps replication to complete quickly and avoid locking/blocking for long time.
There should be a scheduled routine to purge all previously completed transaction daily/weekly depending upon the setup and environement.