Friday, April 16, 2010

Spin locks in sql 2005 crashing sql2005

how to resolve spinlocks in 2005 as you can see below this version of sql2005 has service pack 3 and cu5 on it.
I've documented in red what is going on with the system.
0:005> lmv msqlservr

start end module name

00000000`01000000 00000000`03662000 sqlservr (deferred)

Mapped memory image file: D:\SymCache\sqlservr.exe\4A7245F62662000\sqlservr.exe

Image path: C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Binn\sqlservr.exe

Image name: sqlservr.exe

Timestamp: Fri Jul 31 09:16:38 2009 (4A7245F6)

CheckSum: 025DBEE2

ImageSize: 02662000

File version: 2005.90.4230.0

Product version: 9.0.4230.0

File flags: 0 (Mask 3F)

File OS: 40000 NT Base

File type: 1.0 App

File date: 00000000.00000000

Translations: 0409.04e4

CompanyName: Microsoft Corporation

ProductName: Microsoft SQL Server

InternalName: SQLSERVR

OriginalFilename: SQLSERVR.EXE

ProductVersion: 9.00.4230.00

FileVersion: 2005.090.4230.00

FileDescription: SQL Server Windows NT - 64 Bit

LegalCopyright: © Microsoft Corp. All rights reserved.

LegalTrademarks: Microsoft® is a registered trademark of Microsoft Corporation. Windows(TM) is a trademark of Microsoft Corporation

Comments: NT AMD64



And you have 8 CPUs there installed on the server, while all of them were executing the following:

SOS_Scheduler:3BA4080 m_id:0 status:VISIBLE ONLINE

m_pCurrentWorker:9FF61C0 Debug id:147



# Child-SP RetAddr Call Site

00 00000000`48c4ba58 00000000`77d705d6 ntdll!ZwDelayExecution+0xa

01 00000000`48c4ba60 00000000`0193a631 kernel32!SleepEx+0xaf

02 00000000`48c4bb00 00000000`0168aeb7 sqlservr!SpinlockBase::Sleep+0x1566c1

03 00000000`48c4bc80 00000000`01dc31c6 sqlservr!SpinlockBase::SpinToAcquire+0x117

04 00000000`48c4bcd0 00000000`02173ddc sqlservr!TSyncHashTable_EntryAccessor,0,16,&CQSIndexIdent::UlIndexIdentHash>::TSyncHashTable_EntryAccessor,0,16,&CQSIndexIdent::UlIndexIdentHash>+0x76

05 00000000`48c4bd10 00000000`019e2cc9 sqlservr!CQSIndexStatsMgr::AddNewMissingIndex+0x7c

06 00000000`48c4bda0 00000000`016df878 sqlservr!CIdxSuggestion::Register+0x18a819

07 00000000`48c4bfb0 00000000`0173b192 sqlservr!COptExpr::PqteConvert+0x1d8

08 00000000`48c4c040 00000000`016df77d sqlservr!CPhyOp_RestrRemap::PqteConvert+0xc2

09 00000000`48c4c2b0 00000000`0173ab18 sqlservr!COptExpr::PqteConvert+0xdd

0a 00000000`48c4c340 00000000`016ef4f7 sqlservr!COptExpr::PqteConvertTree+0x78

0b 00000000`48c4c4d0 00000000`01712037 sqlservr!COptContext::PcxteOptimizeQuery+0x7c7

0c 00000000`48c4e880 00000000`0170658d sqlservr!CQuery::Optimize+0x217

0d 00000000`48c4ea10 00000000`01509e87 sqlservr!CQuery::PqoBuild+0x54d

0e 00000000`48c4ee50 00000000`014f0c0a sqlservr!CStmtQuery::InitQuery+0x167

0f 00000000`48c4eef0 00000000`01724abd sqlservr!CStmtSelect::Init+0x8a

10 00000000`48c4ef80 00000000`0157f15b sqlservr!CCompPlan::FCompileStep+0x3ed

11 00000000`48c4f040 00000000`014e8af1 sqlservr!CSQLSource::FCompile+0x6eb

12 00000000`48c4f2d0 00000000`016786f7 sqlservr!CSQLSource::FCompWrapper+0x121

13 00000000`48c4f390 00000000`01669db2 sqlservr!CSQLSource::Transform+0x377

14 00000000`48c4f450 00000000`0166ae92 sqlservr!CSQLSource::Execute+0x292

15 00000000`48c4f5a0 00000000`0166a9ad sqlservr!process_request+0x312

16 00000000`48c4f830 00000000`0167139e sqlservr!process_commands+0x3fd

17 00000000`48c4faf0 00000000`01670b89 sqlservr!SOS_Task::Param::Execute+0xee

18 00000000`48c4fc00 00000000`01676ca4 sqlservr!SOS_Scheduler::RunTask+0xc9

19 00000000`48c4fc90 00000000`017afcb7 sqlservr!SOS_Scheduler::ProcessTasks+0xb4

1a 00000000`48c4fd00 00000000`014c5a79 sqlservr!SchedulerManager::WorkerEntryPoint+0xe7

1b 00000000`48c4fda0 00000000`017b02c0 sqlservr!SystemThread::RunWorker+0x59

1c 00000000`48c4fde0 00000000`017f9b88 sqlservr!SystemThreadDispatcher::ProcessWorker+0x130

1d 00000000`48c4fe80 00000000`781337a7 sqlservr!SchedulerManager::ThreadEntryPoint+0x128

1e 00000000`48c4ff20 00000000`78133864 msvcr80!_callthreadstartex+0x17

1f 00000000`48c4ff50 00000000`77d6b6da msvcr80!_threadstartex+0x84

20 00000000`48c4ff80 00000000`00000000 kernel32!BaseThreadStart+0x3a



0:270> ub 00000000`0168aeb7

sqlservr!SpinlockBase::SpinToAcquire+0xf4:

00000000`0168ae94 e970ffffff jmp sqlservr!SpinlockBase::SpinToAcquire+0x69 (00000000`0168ae09)

00000000`0168ae99 458bcd mov r9d,r13d

00000000`0168ae9c 458bc6 mov r8d,r14d

00000000`0168ae9f 498bd4 mov rdx,r12

00000000`0168aea2 488bcf mov rcx,rdi

00000000`0168aea5 4889742428 mov qword ptr [rsp+28h],rsi

00000000`0168aeaa c744242001000000 mov dword ptr [rsp+20h],1

00000000`0168aeb2 e8b9901500 call sqlservr!SpinlockBase::Sleep (00000000`017e3f70)



0:147> u sqlservr!SpinlockBase::Sleep

sqlservr!SpinlockBase::Sleep:

00000000`017e3f70 4053 push rbx

00000000`017e3f72 55 push rbp

00000000`017e3f73 56 push rsi

00000000`017e3f74 57 push rdi

00000000`017e3f75 4154 push r12

00000000`017e3f77 4155 push r13

00000000`017e3f79 4157 push r15

00000000`017e3f7b 4881ec40010000 sub rsp,140h



0:147> dq 00000000`48c4bb00+0x140+8+8+8

00000000`48c4bc58 00000000`801c3428 00000000`03ba0508

00000000`48c4bc68 00000000`00001e68 00000000`00002710

00000000`48c4bc78 00000000`0168aeb7 00000000`00000000

00000000`48c4bc88 00000002`0f7e0a20 00000000`801b82c8

00000000`48c4bc98 00000000`01c5c471 00000000`00000001

00000000`48c4bca8 00000000`03ba0508 00000000`0009001f

00000000`48c4bcb8 00000000`48c4bdd8 00000000`00000001

00000000`48c4bcc8 00000000`01dc31c6 00000000`801c3418



So the rdi is on the stack, and we found it to be 00000000`801c3428, and it told us the spin lock was owned by the thread whose id is 7396,

0:147> ~~[0n7396]s

ntdll!ZwSignalAndWaitForSingleObject+0xa:

00000000`77ef1b5a c3 ret

0:270> kL

Child-SP RetAddr Call Site

00000000`5f3bbab8 00000000`77d44ebb ntdll!ZwSignalAndWaitForSingleObject+0xa

00000000`5f3bbac0 00000000`0165de56 kernel32!SignalObjectAndWait+0x133

00000000`5f3bbb70 00000000`01662d87 sqlservr!SOS_Scheduler::SwitchContext+0x316

00000000`5f3bbe60 00000000`0166f5a0 sqlservr!SOS_Scheduler::Suspend+0x97

00000000`5f3bbea0 00000000`019518a1 sqlservr!SOS_Event::Wait+0x150

00000000`5f3bbf00 00000000`01673a40 sqlservr!SOS_UnfairMutexPair::LongWait+0x3fef51

00000000`5f3bbf80 00000000`01676b20 sqlservr!CMemThread::Free+0x4e0

00000000`5f3bbfe0 00000000`02173e1d sqlservr!commondelete+0x30



0000000`5f3bc010 00000000`019e2cc9 sqlservr!QSIndexStatsMgr::AddNewMissingIndex+0xbd

00000000`5f3bc0a0 00000000`016df878 sqlservr!CIdxSuggestion::Register+0x18a819

00000000`5f3bc2b0 00000000`0173ab18 sqlservr!COptExpr::PqteConvert+0x1d8

00000000`5f3bc340 00000000`016ef4f7 sqlservr!COptExpr::PqteConvertTree+0x78

00000000`5f3bc4d0 00000000`01712037 sqlservr!COptContext::PcxteOptimizeQuery+0x7c7

00000000`5f3be880 00000000`0170658d sqlservr!CQuery::Optimize+0x217

00000000`5f3bea10 00000000`01509e87 sqlservr!CQuery::PqoBuild+0x54d

00000000`5f3bee50 00000000`014f0c0a sqlservr!CStmtQuery::InitQuery+0x167

00000000`5f3beef0 00000000`01724abd sqlservr!CStmtSelect::Init+0x8a

00000000`5f3bef80 00000000`0157f15b sqlservr!CCompPlan::FCompileStep+0x3ed

00000000`5f3bf040 00000000`014e8af1 sqlservr!CSQLSource::FCompile+0x6eb

00000000`5f3bf2d0 00000000`016786f7 sqlservr!CSQLSource::FCompWrapper+0x121



When the query optimizer of SQL Server 2005 generates a query plan, the query optimizer may wait for a spinlock to be released. The spinlock is used to access internal structure to record information about the missing indexes for the query plan. While the query optimizer is waiting for the spinlock, the query optimizer does not release the scheduler. The query optimizer does not obtain the spinlock and keeps holding the scheduler. All the threads blocked in the spin lock, which cause the SQL Server cannot respond to the CheckServcieAlive request from the Windows cluster, which resulted in the SQL Server resource failure and SQL group failover.



We can see it hit a known issue in this build of SQL Server:

974205 FIX: Error message in the Errorlog file of SQL Server 2005 or of SQL Server 2008 after the SQL Server service stops responding: "Timeout occurred while waiting for latch"

http://support.microsoft.com/default.aspx?scid=kb;EN-US;974205



It was fixed in the SQL Server 2005 SP3 CU6. You can download it from the following URL:

http://support.microsoft.com/kb/974648/LN/

If you get this error or similar install the SQL Server 2005 SP3 CU6 at your earliest convenience.

No comments:

Post a Comment