64-bit _alloca. How to use from FPC and Delphi?
The C/C++ _alloca function allocates size bytes on the stack. The variation of _alloca presented here also aligns the allocated block to a requested boundary, a power of 2 between 16 and 4096 bytes; requests outside that range are clamped to the nearest limit. While this _alloca can be used to advantage from C/C++ and most other programming languages, including assembly language itself, it was developed with Delphi and Free Pascal Compiler in mind, two compilers that have no similar feature.
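The alignment arithmetic can be sketched in portable C. This is only an illustration of the clamp-and-mask idea, not the routine itself: the function names `clamp_alignment` and `aligned_block_address` are hypothetical, and `top` stands in for the caller's stack pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Clamp a requested alignment to the 16..4096 range described above. */
static uint64_t clamp_alignment(uint64_t align)
{
    if (align < 16)   return 16;
    if (align > 4096) return 4096;
    return align;
}

/* Move `top` down by `size` bytes, then round down to a multiple of the
   (clamped) alignment. The mask trick only works for powers of two. */
static uint64_t aligned_block_address(uint64_t top, uint32_t size, uint64_t align)
{
    align = clamp_alignment(align);
    assert((align & (align - 1)) == 0);   /* power of two expected */
    return (top - size) & ~(align - 1);   /* same effect as AND with -align */
}
```

Because the stack grows downward, rounding down can only enlarge the allocation, so the block never overlaps memory above `top`.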
Stack reservation-to-commitment mechanism
Each new thread of an application receives a certain amount of contiguous stack space. By default, 1 MB is reserved for the thread, but only the first page (each page being 4096 bytes in size) is initially committed, and the next contiguous page is marked as a guard page. When the application reads from or writes to the guard page, an exception is triggered, causing the OS to commit that page and make the next page down the new guard page. This is the mechanism that turns reserved stack memory into committed stack memory.
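When using _alloca to allocate stack memory, we need to keep this reservation-to-commitment mechanism in mind: an allocation that moves the stack pointer down by more than one page must touch every intervening page in order, because an access that skips past the guard page lands in uncommitted memory and raises an access violation.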
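To make the page-by-page growth concrete, here is a small C model of it. It does not touch real memory: `stack_limit` is a plain variable standing in for the lowest committed stack address, and the function name `probe_down_to` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Simulates growing the committed stack down to the page holding `new_top`.
   Returns how many pages had to be touched, one at a time, top to bottom. */
static unsigned probe_down_to(uint64_t *stack_limit, uint64_t new_top)
{
    unsigned touched = 0;
    uint64_t target_page;

    assert((*stack_limit & (PAGE_SIZE - 1)) == 0);  /* limit is page aligned */

    if (new_top >= *stack_limit)
        return 0;                       /* already inside committed memory */

    target_page = new_top & ~(uint64_t)(PAGE_SIZE - 1);
    while (*stack_limit != target_page) {
        *stack_limit -= PAGE_SIZE;      /* touching the guard page commits it */
        touched++;
    }
    return touched;
}
```

In this model, an allocation that ends three pages below the current limit needs three touches, each one landing on whatever page is the guard page at that moment.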
The ASM Code
IFDEF __JWASM__
; compile with uasm64 -c -win64 -Zp8 -archSSE allocafunc64.asm
option frame:auto ;generate SEH-compatible prologues and epilogues
OPTION STACKBASE:RBP
ELSE
; Microsoft MASM: compile with ml64 -c -Zp8 allocafunc64.asm
ENDIF
.code
OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
; rcx=thesize
; rdx=alignm
; r8=accum - optional
_alloca proc public thesize:dword, alignm:dword, accum : ptr
mov r9, [rsp] ; return address
mov ecx, ecx ; zero-extend
mov edx, edx ; zero-extend
cmp rdx, 16
jge @F
mov rdx, 16 ; Minimum alignment to consider in Win 64 is 16 bytes
@@:
cmp rdx, 4096
jle @F
mov rdx, 4096
@@:
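lea rax, [rcx] ; rax = requested size
lea r10, [rsp+8] ; r10 = caller's stack pointer (just above the return address)
sub r10, rax ; move down by the requested size
neg rdx ; -alignment = mask that clears the low bits (power of 2 assumed)
and r10, rdx ; r10 = aligned address of the new block
xor r11, r11
lea rax, [rsp+8h]
sub rax, r10 ; rax = bytes to take from the stack, alignment padding included
cmovb r10,r11 ; on wrap-around, clamp the target address to 0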
mov r11,qword ptr gs:[10h] ; Register gs points to the TEB in Windows 64-bit.
; TEB's StackLimit is in gs:[10h]. See below.
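cmp r10,r11
jae @exit ; new block lies within committed pages: no probing needed
and r10w,0F000h ; round the target down to its page boundary
@@:
lea r11,[r11-1000h] ; step one page (4096 bytes) down
mov byte ptr [r11],0 ; touch it so the OS commits the guard page
cmp r10,r11
jne @B ; repeat until the target page has been touched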
@exit:
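sub rsp, rax ; rsp now points 8 bytes below the new block
cmp r8, 0
jz @F ; accum is optional
add dword ptr [r8], eax ; accumulate the bytes taken, for _dealloca
@@:
mov [rsp], r9 ; put the return address back where ret expects it
mov rax, rsp
add rax, 8 ; return value: aligned address of the new block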
ret
_alloca endp
_dealloca proc public accum : ptr
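mov rdx, [rsp] ; return address
mov r8d, dword ptr [rcx] ; total bytes accumulated by the _alloca calls
mov dword ptr [rcx], 0 ; reset the accumulator
add rsp, r8 ; release all of them at once
mov [rsp], rdx ; put the return address back where ret expects it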
ret
_dealloca endp
end
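The interplay between the optional accum argument and _dealloca may be easier to follow in a C model. The sketch below simulates the stack pointer with an ordinary variable, so it allocates nothing; `sim_alloca` and `sim_dealloca` are hypothetical names that mirror only the bookkeeping of the two routines, not the probing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated _alloca: *sp stands in for the caller's rsp. */
static uint64_t sim_alloca(uint64_t *sp, uint32_t size, uint32_t align,
                           uint32_t *accum)
{
    uint64_t a = align < 16 ? 16 : (align > 4096 ? 4096 : align);
    uint64_t block;

    assert((a & (a - 1)) == 0);          /* power of two expected */
    block = (*sp - size) & ~(a - 1);     /* aligned start of the new block */
    if (accum != NULL)                   /* accum is optional */
        *accum += (uint32_t)(*sp - block);
    *sp = block;
    return block;
}

/* Simulated _dealloca: gives back everything recorded in *accum. */
static void sim_dealloca(uint64_t *sp, uint32_t *accum)
{
    *sp += *accum;
    *accum = 0;
}
```

Judging from the listing, every _alloca call that _dealloca should undo has to pass the same accumulator, and that accumulator should start at zero; a call made with a null accum is not recorded and so is not released by _dealloca.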
TEB/TIB seen with WinDbg, showing StackLimit at offset 0x10.
+0x000 ExceptionList : Ptr64 _EXCEPTION_REGISTRATION_RECORD
+0x008 StackBase : Ptr64 Void
+0x010 StackLimit : Ptr64 Void
+0x018 SubSystemTib : Ptr64 Void
+0x020 FiberData : Ptr64 Void
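The offsets in this dump can be double-checked with a small C mirror of the first five fields. The struct `tib_head` and the helper `needs_probe` are hypothetical and exist only for illustration (the real definition is `NT_TIB` in the Windows headers); `uint64_t` stands in for each Ptr64 so the layout is the same on any platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the start of the 64-bit TIB shown above. */
typedef struct tib_head {
    uint64_t ExceptionList;   /* +0x000 */
    uint64_t StackBase;       /* +0x008 */
    uint64_t StackLimit;      /* +0x010, read by the routine as gs:[10h] */
    uint64_t SubSystemTib;    /* +0x018 */
    uint64_t FiberData;       /* +0x020 */
} tib_head;

static_assert(offsetof(tib_head, StackLimit) == 0x10,
              "StackLimit must sit at offset 0x10");

/* Mirrors the cmp/jae test in _alloca: probing is needed only when the
   new block would start below the lowest committed stack address. */
static int needs_probe(const tib_head *tib, uint64_t new_top)
{
    return new_top < tib->StackLimit;
}
```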
Download Sources and Delphi Demo