Calling Go funcs from asm and JITed code
Original (ru): https://habr.com/ru/post/489482/.
Intro
As long as your assembler code does something simple, you’re golden.
As soon as you need to call a Go function from it, the first suggestion you’ll probably get is: don’t do it. You may well want to follow that advice, especially if you’re unsure why you needed the call in the first place.
But what if you can’t avoid it? Read on, we have a cure.
You’ve run into a really hairy area of asm code. My first suggestion is not try to call from assembler into Go. – Ian Lance Taylor
The Go calling convention
The first thing we need to understand is how to pass arguments to a function and how to get its results back.
I would recommend “Go functions in assembly language” by Michael Munday.
Normally, the calling convention is very platform-dependent. We’ll dig into GOARCH=amd64 because this is what I’m most familiar with.
Here are some facts:
- All arguments are passed via the stack, except the “context” in closures, which occupies the DX (%rdx) register.
- Function results are returned via the stack.
- Callee arguments are stored on the caller frame side.
- Frame allocation and de-allocation is performed by the callee. These actions are performed by the prologues and epilogues. Go assembler inserts them automatically.
If there is not enough goroutine stack space, it will be extended. During that process, Go tries to fix pointers to the stack, so your program can work without knowing that anything changed.
This picture may become outdated if the register-based calling convention is adopted.
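To make the stack growth fact above a bit more tangible, here is a small Go-only sketch (no assembler involved). The helper names deepen, addrOf, and stackValueAfterGrowth are made up for this illustration. It takes a pointer to a local variable, forces the goroutine stack to grow with a deep recursion, and then reads the variable through the same pointer. The two printed addresses may or may not differ, depending on the Go version and on whether the stack was actually moved; the sketch only relies on the value still being readable.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addrOf returns the numeric address stored in p.
// It is kept out of line so that the address is re-read on every call.
//
//go:noinline
func addrOf(p *int) uintptr { return uintptr(unsafe.Pointer(p)) }

// deepen recurses with a roughly 1 KB frame, so a large depth needs far
// more stack than a goroutine starts with.
//
//go:noinline
func deepen(depth int) byte {
	var pad [1024]byte
	pad[depth%len(pad)] = byte(depth)
	if depth == 0 {
		return pad[0]
	}
	return deepen(depth-1) + pad[depth%len(pad)]
}

// stackValueAfterGrowth reports the address of a local variable before and
// after a deep recursion, plus the value read through the pointer afterwards.
func stackValueAfterGrowth() (before, after uintptr, value int) {
	x := 42
	p := &x
	before = addrOf(p)
	deepen(4096) // about 4 MB of frames
	after = addrOf(p)
	return before, after, *p
}

func main() {
	before, after, value := stackValueAfterGrowth()
	fmt.Printf("before=%#x after=%#x value=%d\n", before, after, value)
}
```

Assembler code gets no such help for addresses it keeps in places the runtime does not know about, which is why the following sections care so much about stackmaps.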
We’ll try to call gofunc from the asmfunc:
package main

func asmfunc(x int32) (int32, int32)

func gofunc(a1 int64, a2, a3 int32) (int32, int32) {
	return int32(a1) + a2, int32(a1) + a3
}

func main() {
	v1, v2 := asmfunc(10)
	println(v1, v2) // => 3, 11
}
The asmfunc is defined as:
//; func asmfunc(x int32) (int32, int32)
TEXT ·asmfunc(SB), 0, $24-16
  MOVL x+0(FP), AX
  MOVQ $1, 0(SP)      //; First argument (a1 int64)
  MOVL $2, 8(SP)      //; Second argument (a2 int32)
  MOVL AX, 12(SP)     //; Third argument (a3 int32)
  CALL ·gofunc(SB)
  MOVL 16(SP), AX     //; Get first result
  MOVL 20(SP), CX     //; Get second result
  MOVL AX, ret+8(FP)  //; Return first result
  MOVL CX, ret+12(FP) //; Return second result
  RET
$24-16 (locals=24 bytes, args=16 bytes)
          0     8     12    16     20      SP
locals=24 [a1:8][a2:4][a3:4][ret:4][ret:4]
(ret belongs to asmfunc frame, it stores gofunc results)
        0    4          8      12      FP
args=16 [x:4][padding:4][ret:4][ret:4]
(ret belongs to main frame, it stores asmfunc results)
Note that there is a 4-byte padding between the function arguments and its results, for alignment. Go requires function results to be pointer-aligned (8 bytes on x86-64).
Every argument is aligned, just like fields in a struct.
If the first argument is int32 and the second is int64, then the offset of the latter will be 8 instead of 4. This is more or less consistent with the reflect.TypeOf(T).Align() result.
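Since arguments are laid out like struct fields, one way to double-check offsets before writing them into assembler is to mirror the argument list with a struct and ask the compiler. The sketch below does that for gofunc and for the int32/int64 case described above. The struct and helper names (gofuncArgs, mixedArgs, resultsOffset and so on) are invented for this illustration, and the numbers in the comments assume amd64.

```go
package main

import (
	"fmt"
	"unsafe"
)

// gofuncArgs mirrors the argument list of gofunc(a1 int64, a2, a3 int32).
type gofuncArgs struct {
	a1 int64
	a2 int32
	a3 int32
}

// mixedArgs mirrors a hypothetical func(a int32, b int64).
type mixedArgs struct {
	a int32
	b int64
}

// gofuncArgOffsets returns the offsets of a1, a2 and a3.
func gofuncArgOffsets() (a1, a2, a3 uintptr) {
	var g gofuncArgs
	return unsafe.Offsetof(g.a1), unsafe.Offsetof(g.a2), unsafe.Offsetof(g.a3)
}

// mixedArgOffsets returns the offsets of a and b.
func mixedArgOffsets() (a, b uintptr) {
	var m mixedArgs
	return unsafe.Offsetof(m.a), unsafe.Offsetof(m.b)
}

// resultsOffset rounds the size of the arguments up to the pointer size,
// which is where the results area starts in the stack-based convention.
func resultsOffset(argsSize uintptr) uintptr {
	ptrSize := unsafe.Sizeof(uintptr(0))
	return (argsSize + ptrSize - 1) &^ (ptrSize - 1)
}

func main() {
	fmt.Println(gofuncArgOffsets()) // a1, a2, a3 as used in 0(SP), 8(SP), 12(SP)
	fmt.Println(mixedArgOffsets())  // b lands at 8, not 4, on amd64
	fmt.Println(resultsOffset(4))   // asmfunc: x takes 4 bytes, results follow the padding
	fmt.Println(resultsOffset(16))  // gofunc: 16 bytes of arguments, results at 16(SP)
}
```

This is only a cross-check for the stack-based layout discussed here; go vet remains the more direct tool for catching wrong FP offsets.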
Some mistakes related to the function frame size and FP register usage can be found with go vet (asmdecl).
If you try to call gofunc this way, there should be no problems. But don’t relax just yet.
Pointers and the stackmap
Let’s try to call a Go function with a pointer argument.
package foo

import (
	"fmt"
	"testing"
)

func foo(ptr *object)

type object struct {
	x, y, z int64
}

func printPtr(ptr *object) {
	fmt.Println(*ptr)
}

func TestFoo(t *testing.T) {
	foo(&object{x: 11, y: 22, z: 33})
}

TEXT ·foo(SB), 0, $8-8
  MOVQ ptr+0(FP), AX
  MOVQ AX, 0(SP)
  CALL ·printPtr(SB)
  RET
If we run that test, we’ll get a panic:
=== RUN TestFoo
runtime: frame <censored> untyped locals 0xc00008ff38+0x8
fatal error: missing stackmap
In order to successfully find pointers on the stack, GC needs the help of so-called stackmaps. For normal Go functions, stackmaps are generated by the compiler. Assembler functions don’t have this information.
Well, we can provide partial information with the help of “function stubs” (Go prototypes) with the correct types. The documentation also mentions cases where the stackmap is not necessary, but we need one in our program, as it crashes without it.
There are at least 3 approaches we can try to take from here:
- Try to make your asm function fall into the category where the stackmap is not needed (sometimes it’s impossible).
- Manually build a stackmap inside the asm function body (difficult and error-prone).
- Use the NO_LOCAL_POINTERS macro and pray that you know what you’re doing.
NO_LOCAL_POINTERS macro
If we add NO_LOCAL_POINTERS to our asm function, it’ll pass the test:
#include "funcdata.h"

TEXT ·foo(SB), 0, $8-8
  NO_LOCAL_POINTERS
  MOVQ ptr+0(FP), AX
  MOVQ AX, 0(SP)
  CALL ·printPtr(SB)
  RET
Now we need to understand the hows and whys.
Why does the GC need to know which stack slots contain pointers? Let’s assume that these pointers come from the caller and are reachable from the code that called the assembler function. Then it shouldn’t be a problem that the asm function’s local pointers are not considered to be “live”.
Pointers can point to stack objects as well as to heap objects. When a stack resize happens, all pointers to stack values need to be fixed. It is the responsibility of the runtime to adjust the affected pointers.
All pointers that are passed to the assembler function “escape to the heap” in terms of escape analysis, so it’s not that simple to have a pointer to a stack-allocated value inside the assembler function.
It’s safe to use NO_LOCAL_POINTERS if all local pointers store heap-allocated addresses and they are reachable for the GC from somewhere else.
With non-cooperative preemption, it’s good to keep in mind that assembler functions with NO_LOCAL_POINTERS are not interrupted.
Another example of safe usage can be found inside the Go runtime. Functions that are marked with go:nosplit omit the stack overflow check, so they will not trigger a stack resize themselves. Nosplit is mostly used inside the runtime package.
GO_ARGS macro
For asm functions that have a Go prototype, GO_ARGS is automatically inserted by the assembler.
GO_ARGS is another macro from funcdata.h. It specifies that the arguments stackmap can be found in the Go declaration.
It didn’t work in the past for functions defined in a different package. Nowadays you don’t need to manually place args_stackmap for the exported symbols.
GO_RESULTS_INITIALIZED macro
If an asm function has a pointer return value and makes calls to Go functions, it should begin by zeroing the result stack slots (as they may contain garbage), followed by a GO_RESULTS_INITIALIZED macro call.
Example:
//; func getg() interface{}
TEXT ·getg(SB), NOSPLIT, $32-16
  //; Interface consists of two pointers.
  //; Both of them need to be zeroed.
  MOVQ $0, ret_type+0(FP)
  MOVQ $0, ret_data+8(FP)
  GO_RESULTS_INITIALIZED
  //; The function body...
  RET
In general, it’s better to avoid asm functions that return a pointer result.
You can find more GO_RESULTS_INITIALIZED examples by using a GitHub search.
Calling Go funcs from JIT code
Now for the most exciting part: calling a Go function from dynamically generated machine code.
The Go GC expects that all code that makes function calls is available at compile time. That means that Go doesn’t play well with JITed code.
We’ll start by reproducing a fatal error.
// file jit.go
package main

import (
	"log"
	"reflect"
	"syscall"
	"unsafe"
)

func main() {
	a := funcAddr(goFunc)
	code := []byte{
		// MOVQ addr(goFunc), AX
		0xb8, byte(a), byte(a >> 8), byte(a >> 16), byte(a >> 24),
		// CALL AX
		0xff, 0xd0,
		// RET
		0xc3,
	}
	executable, err := mmapExecutable(len(code))
	if err != nil {
		log.Panicf("mmap: %v", err)
	}
	copy(executable, code)
	callJIT(&executable[0])
}

func callJIT(code *byte)

func goFunc() {
	println("called from JIT")
}

// Extra (scary) code is hidden under the spoiler for brevity.
Helper functions
func mmapExecutable(length int) ([]byte, error) {
	const prot = syscall.PROT_READ | syscall.PROT_WRITE | syscall.PROT_EXEC
	const flags = syscall.MAP_PRIVATE | syscall.MAP_ANON
	return mmapLinux(0, uintptr(length), prot, flags, 0, 0)
}

func mmapLinux(addr, length, prot, flags, fd, off uintptr) ([]byte, error) {
	ptr, _, err := syscall.Syscall6(
		syscall.SYS_MMAP,
		addr, length, prot, flags, fd, off)
	if err != 0 {
		return nil, err
	}
	// Build a slice header that views the mapped region as []byte.
	slice := *(*[]byte)(unsafe.Pointer(&reflect.SliceHeader{
		Data: ptr,
		Len:  int(length),
		Cap:  int(length),
	}))
	return slice, nil
}

// funcAddr extracts the code address of fn from its interface representation.
func funcAddr(fn interface{}) uintptr {
	type emptyInterface struct {
		typ   uintptr
		value *uintptr
	}
	e := (*emptyInterface)(unsafe.Pointer(&fn))
	return *e.value
}
// file jit_amd64.s
TEXT ·callJIT(SB), 0, $0-8
  MOVQ code+0(FP), AX
  JMP AX
If you build and run that program, it’ll look like everything works fine:
$ go build -o jit . && ./jit
called from JIT
With a single-line addition to goFunc, we can make our program crash:
func goFunc() {
println("called from JIT")
+ runtime.GC()
}
$ go build -o jit . && ./jit
called from JIT
runtime: unexpected return pc for main.goFunc called from 0x7f9465f7c007
stack: frame={sp:0xc00008ced0, fp:0xc00008cef0} stack=[0xc00008c000,0xc00008d000)
000000c00008cdd0: 0000000000000000 00007f94681f7558
000000c00008cde0: 000000c000029270 000000000000000b
... (+ more)
The “return PC” points to JIT code that is unknown to the runtime, hence the error.
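What “unknown to the runtime” means can be approximated from plain Go with runtime.FuncForPC, which returns nil when it has no function metadata for an address. In the sketch below, the address of a byte slice stands in for a JIT buffer (an assumption made for illustration; no code is executed from it), and the helper names are invented.

```go
package main

import (
	"fmt"
	"runtime"
	"unsafe"
)

// knownToRuntime reports whether the runtime has function metadata for pc.
func knownToRuntime(pc uintptr) bool {
	return runtime.FuncForPC(pc) != nil
}

// callerPC returns a program counter that lies inside compiled Go code.
func callerPC() uintptr {
	pc, _, _, ok := runtime.Caller(0)
	if !ok {
		return 0
	}
	return pc
}

// dataAddr returns an address that points at data, not at compiled code.
// It stands in for the address of a JIT buffer.
func dataAddr() uintptr {
	buf := make([]byte, 16)
	return uintptr(unsafe.Pointer(&buf[0]))
}

func main() {
	fmt.Println("compiled code known:", knownToRuntime(callerPC()))
	fmt.Println("data address known:", knownToRuntime(dataAddr()))
}
```

The stack walker performs a similar lookup for every return address it meets, which is why a return address inside a mmap-ed buffer ends in a fatal error.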
If the Go runtime doesn’t want us to call functions from JIT code, we’ll call them from a place it can recognize.
The second version of callJIT will have a section that is responsible for the Go function call. Whenever we need to call a Go function, we’ll jump to that gocall section.
#include "funcdata.h"

TEXT ·callJIT(SB), 0, $8-8
  NO_LOCAL_POINTERS
  MOVQ code+0(FP), AX
  JMP AX
gocall:
  CALL CX
  JMP (SP)
Some notable changes:
- We need at least 8 extra frame bytes to write the original return address
- NO_LOCAL_POINTERS is needed due to the CALL and non-zero frame size
The normal execution path for the callJIT is unchanged.
gocall handles JIT->Go calls. We expect that the caller puts the callee function address into CX and the original return address into (SP).
Now we need the gocall label address. I used the disassembler to get that, but maybe there is a cleaner way to do it.
The modified main is shown below.
a := funcAddr(goFunc)
j := funcAddr(callJIT) + 36
code := []byte{
	// MOVQ funcAddr(goFunc), CX
	0x48, 0xc7, 0xc1, byte(a), byte(a >> 8), byte(a >> 16), byte(a >> 24),
	// MOVQ funcAddr(gocall), DI
	0x48, 0xc7, 0xc7, byte(j), byte(j >> 8), byte(j >> 16), byte(j >> 24),
	// LEAQ 6(PC), SI
	0x48, 0x8d, 0x35, (4 + 2), 0, 0, 0,
	// MOVQ SI, (SP)
	0x48, 0x89, 0x34, 0x24,
	// JMP DI
	0xff, 0xe7,
	// ADDQ $framesize, SP
	0x48, 0x83, 0xc4, (8 + 8),
	// RET
	0xc3,
}
The “call Go” code sequence:
- The callee address is stored in CX
- The gocall address is stored in DI
- The return address is evaluated to SI, then stored to (SP). 4+2 is the width of the MOVQ and JMP instructions that follow the LEAQ
- Then we jump to the gocall
Our function now has a frame, so we need to do a cleanup before returning: 8 bytes for our frame plus 8 bytes for the BP spilling.
here we store the return address for gocall
|
|        Go stores the previous BP value here
|        |
0(SP)    8(SP)     16(SP)     24(SP)
[empty]  [prevBP]  [retaddr]  [arg1:code]
|       /          |          |
|      /           |          callJIT argument (caller frame)
|     /            |
|    /             pushed by the CALL during the callJIT() call from main
|   /
callJIT frame, 16 bytes
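The hand-counted 4+2 displacement is easy to get wrong when the sequence changes. As a sketch, the same bytes can be assembled by a small helper that derives the LEAQ displacement from the lengths of the instructions that follow it. buildCallSequence and imm32 are hypothetical names, the addresses in main are placeholders, and, like the listing above, the code assumes that both addresses fit into a 32-bit immediate.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// imm32 encodes v as a little-endian 4-byte immediate.
func imm32(v uint32) []byte {
	b := make([]byte, 4)
	binary.LittleEndian.PutUint32(b, v)
	return b
}

// buildCallSequence emits the same "call Go" machine code as the listing
// above: load the callee and gocall addresses, store the return address
// at (SP), jump to gocall, then drop the frame and return.
func buildCallSequence(goFuncAddr, gocallAddr uint32, frameSize byte) []byte {
	storeRet := []byte{0x48, 0x89, 0x34, 0x24} // MOVQ SI, (SP)
	jmpDI := []byte{0xff, 0xe7}                // JMP DI

	var code []byte
	code = append(code, 0x48, 0xc7, 0xc1) // MOVQ $goFuncAddr, CX
	code = append(code, imm32(goFuncAddr)...)
	code = append(code, 0x48, 0xc7, 0xc7) // MOVQ $gocallAddr, DI
	code = append(code, imm32(gocallAddr)...)
	code = append(code, 0x48, 0x8d, 0x35) // LEAQ disp(PC), SI
	// The displacement skips the two instructions between LEAQ and the
	// point where execution should resume after the Go call.
	code = append(code, imm32(uint32(len(storeRet)+len(jmpDI)))...)
	code = append(code, storeRet...)
	code = append(code, jmpDI...)
	code = append(code, 0x48, 0x83, 0xc4, frameSize) // ADDQ $frameSize, SP
	code = append(code, 0xc3)                        // RET
	return code
}

func main() {
	// Placeholder addresses, used only to show the byte layout.
	code := buildCallSequence(0x00401000, 0x00402024, 8+8)
	fmt.Printf("% x\n", code)
}
```

In the real program the two addresses would come from funcAddr(goFunc) and from the gocall label address.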
With that trampoline, the Go runtime sees a known callJIT function in its call stack:
$ go build -o jit . && ./jit
called from JIT
Success!
This solution can be modified to call Go functions with arguments. All you need to do is add some extra frame space in callJIT to put the Go function arguments there.
Go internal ABI
Go Internal ABI is another hot topic.
Long story short, Go may get a new ABI pretty soon. It’s going to be versioned.
The proposal highlighted two things:
- The existing assembler code will continue to work correctly.
- Older calling conventions will still be supported for the new code.
The original Go calling convention is a part of ABI0.
The experimental calling convention is a part of ABIInternal.
If we start a Go compilation with the -S flag, we can see that ABIInternal is already there (but it’s very similar to ABI0 right now).
When ABIInternal is good enough, it’ll be released as ABI1 and the cycle will continue.
The good news is that our assembler code should continue to work correctly, at least in the near future. On this optimistic note, I would like to finish this article.