Last week I visited a Dataware house that pulls data over the WAN from several OLTP servers setup as merge Publishers. One of the main OLTP servers I was asked to look closely, had a very high processor utilization and consistent spikes of processor queue length because of high lock escallations and blocking. The machine was configured pretty fine with adequate hardware resources. All productions tables were fairly defragmented, partitioned; tempdb on seperate SCISIs, blah blah blah. Yet over all I/O was sky high and processor spikes were shouting that there's some thing fairly wrong.
During the investigation, I found that the merge replication agent was configured to run every 5 minutes on the server. I executed the following query that showed me the potential cuase of the problem:
select min(coldate) min_repldate, max(coldate) max_repldate , count(*) rows_replicated
from msmerge_contents a
join msmerge_genhistory b on a.generation=b.generation and a.tablenick=b.art_nick
where b.genstatus<>0
________________________________________
min_repldate, max_repldate, rows_replicated
2008-01-01 ,2008-06-08 ,12010000
There were around 120 million records in the replication history table since last six months.
Since replication doesn't automatically delete old replicated transactions from these tables except some of the tables in distribution database; each time the replication agent used to kick off, queries and entries were made to these highly fragmented tables causing high I/O and processor time and the user activity involving published tables was being blocked.
Snapshot was not adequite solution because the servers were on the WAN and published tables were substantially large.
I used following command to purge the history records older than 7 days to bring relief to the server.
use publisheddb
GO
begin tran
select count(*) from msmerge_contents with(tablockx)
select count(*) from MSmerge_genhistory with(tablockx)
declare @coldate datetime
select @coldate = getdate()-7
delete MSmerge_contents from MSmerge_contents a, MSmerge_genhistory b where a.generation = b.generation and a.tablenick=b.art_nick and b.coldate < @coldate and a.generation > 0
delete MSmerge_genhistory where coldate < @coldate and pubid is not null
commit
GO
dbcc dbreindex ( MSmerge_contents)
GO
dbcc dbreindex ( MSmerge_genhistory)
GO
update statistics MSmerge_contents
GO
update statistics MSmerge_genhistory
GO
Ideally, we need to keep these two tables (msmerge_contents, msmerge_genhistory) as narrowed as possible. That helps replication to complete quickly and avoid locking/blocking for long time.
There should be a scheduled routine to purge all previously completed transaction daily/weekly depending upon the setup and environement.
2 comments:
You should not have to clean the history table if your retention period is set up to a value different from 0. At least with SP3 now, this seems to work fine.
But yes, you should always keep an eye on how much data are stored in these tables, because this could be the reason for some blocking/timeout issues.
Hi, thank you for your input.
The retention is handled by the sp_mshistory_cleanup stored-procedure in the Agent history clean up: distribution Job.
This doesn't touches or purges the msmerge_contents table.
Although it contains one record only for every replicated row in an article, yet it becomes way too bigger in size and causes problem and we still have to keep an eye on it and purge it manually when needed.
Howdy!
Post a Comment