Recently I came across a client that could not get their Transactional Replication data transfer to complete within a reasonable timeframe. Replication initialisation would work ok but from then on no data changes would make it through to the subscriber database.
Initially I thought that either the Log Reader or Distribution agent had a connectivity issue but why then did the initialisation work.
I then decided to send a Tracer through to determine where the holdup was occurring in the replication topology. Using the Replication Monitor, I expanded the Publication, clicked on the Subscription item and on right hand pane I clicked on the Tracer tab. I then clicked the Insert Tracer button.
As an example below, what I saw was that the Publisher to Distributor and Distributor to Subscriber data transfer status stayed at Pending and would never advance. This pointed to an issue on the Publisher end.
Transactional Replication is based on reading information from the publisher database’s transaction log file, so first I checked if the disk supporting the transaction log file had a bottleneck, no issue was found there. Next I checked the Virtual Log File count (using this script) and there it was, thousands of VLFs. The excessively high number of VLFs was preventing the Log Reader from efficiently reading the replicated records from the transaction log file.
I stopped the Log Reader agent, I then shrunk the transaction log file to 100MB using a DBCC SHRINKFILE statement and then reallocated sufficient space in the transaction log file. The VLF count went to down to 80, I then restarted the Log Reader and bang, data transfers started to flow through in a timely manner.
Root cause was the 1 MB Autogrow option on the Transaction Log file and of course not probably sizing the file initially, i.e. the file grew by 1 MB thousands of times and therefore generating an excessive number of VLFs.