Series Index

Python and Go: Part I - gRPC
Python and Go: Part II - Extending Python With Go
Python and Go: Part III - Packaging Python Code
Python and Go: Part IV - Using Python in Memory

Introduction

In the previous post we saw how a Go service can call a Python service using gRPC. Using gRPC to connect a Go and a Python program can be a great choice, but there's a complexity price that goes with it. You need to manage one more service, deployment becomes more complex, and you need monitoring and alerting for each service. Compared to a monolithic application, that is a lot of extra operational complexity.

In this post, we're going to reduce the complexity of using gRPC by writing a shared library in Go that a Python program can consume directly. With this approach, there's no networking involved and, depending on the data types, no marshalling either. Of the several ways to call functions in a shared library from Python, we decided to use Python's ctypes module.

Note: ctypes uses libffi under the hood. If you want to read some really scary C code, head over to the libffi repo and start reading. :)

I'll also show my workflow, which is one of the big factors in my productivity. We'll first write "pure" Go code, then write code to export it to a shared library. Then we'll switch to the Python world and use Python's interactive prompt to play around with the code. Once we're happy, we'll use what we've learned in the interactive prompt to write a Python module.

Example: Checking the Digital Signature of Several Files in Parallel

Imagine you have a directory with data files, and you need to validate the integrity of these files. The directory contains a sha1sum.txt file with a sha1 digital signature for every file. Go, with its concurrency primitives and ability to use all the cores of your machine, is much better suited to this task than Python.

Listing 1: sha1sum.txt

6659cb84ab403dc85962fc77b9156924bbbaab2c  httpd-00.log
5693325790ee53629d6ed3264760c4463a3615ee  httpd-01.log
fce486edf5251951c7b92a3d5098ea6400bfd63f  httpd-02.log
b5b04eb809e9c737dbb5de76576019e9db1958fd  httpd-03.log
ff0e3f644371d0fbce954dace6f678f9f77c3e08  httpd-04.log
c154b2aa27122c07da77b85165036906fb6cbc3c  httpd-05.log
28fccd72fb6fe88e1665a15df397c1d207de94ef  httpd-06.log
86ed10cd87ac6f9fb62f6c29e82365c614089ae8  httpd-07.log
feaf526473cb2887781f4904bd26f021a91ee9eb  httpd-08.log
330d03af58919dd12b32804d9742b55c7ed16038  httpd-09.log

Listing 1 shows an example of a digital signature file. It provides hash codes for all the different log files contained in the directory. This file can be used to verify that a log file was downloaded correctly or has not been tampered with. We’ll write Go code to calculate the hash code of each log file and then match it against the hash code listed in the digital signature file.

To speed this process up, we’ll calculate the digital signature of each file in a separate goroutine, spreading the work across all of the CPUs on our machine.

Architecture Overview & Work Plan

On the Python side of the code, we’re going to write a function named check_signatures and on the Go side, we’re going to write a function (that does the actual work) named CheckSignatures. In between these two functions, we’ll use the ctypes module (on the Python side) and write a verify function (on the Go side) to provide marshaling support.

Figure 1

Figure 1 shows the flow of data from the Python function to the Go function and back.

Here are the steps we’re going to follow for the rest of the post:

  • Write the Go code (CheckSignatures)
  • Export it to a shared library (verify)
  • Use ctypes in the Python interactive prompt to call the Go code
  • Write and package the Python code (check_signatures). We'll do this part in the next blog post (this one is already long enough).

Go Code - The “CheckSignatures” Function

I'm not going to break down all of the Go source code here. If you're curious to see all of it, look at this source code file.

The important part of the code to see now is the definition of the CheckSignatures function.

Listing 2: CheckSignatures function definition

// CheckSignatures calculates sha1 signatures for files in rootDir and compares
// them with signatures found in "sha1sum.txt" in the same directory. It
// returns an error if one or more of the signatures don't match.
func CheckSignatures(rootDir string) error {
	// ... (body omitted; see the full source file)
}

Listing 2 shows the definition of the CheckSignatures function. This function spins up a goroutine per file to check whether the calculated sha1 signature of each file matches the one in "sha1sum.txt". If one or more files don't match, the function returns an error.
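If you want to see the shape of this check without reading the Go source, here is a minimal Python sketch of the same logic. It is not the author's implementation: it assumes sha1sum.txt uses the "<hex digest>  <file name>" layout from Listing 1, the name check_signatures_ref is hypothetical, and the error text only mimics the "mismatch" message printed in the Python session later in this post. A thread pool stands in for the goroutine-per-file fan-out.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Optional


def file_sha1(path: Path) -> str:
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def check_signatures_ref(root_dir: str) -> Optional[str]:
    """Hypothetical reference for CheckSignatures.

    Returns None if every file matches, otherwise an error string.
    Files listed in sha1sum.txt but missing on disk raise FileNotFoundError.
    """
    root = Path(root_dir)
    expected = {}
    for line in (root / "sha1sum.txt").read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, name = line.split(maxsplit=1)
        expected[name.strip()] = digest.lower()

    # One task per file, roughly like one goroutine per file in the Go version.
    with ThreadPoolExecutor() as pool:
        digests = pool.map(lambda name: file_sha1(root / name), expected)
        actual = dict(zip(expected, digests))

    bad = [name for name in sorted(expected) if actual[name] != expected[name]]
    if not bad:
        return None
    return "; ".join(f'"{root / name}" - mismatch' for name in bad)
```

A reference like this can be handy for cross-checking the Go results on small test directories.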

Exporting Go Code to a Shared Library

With the Go code written and tested, we can move on to exporting it to a shared library.

Here are the steps we’ll follow in order to compile the Go source code into a shared library so Python can call it:

  • Import the C package (aka cgo)
  • Use the //export directive on every function we need to expose
  • Declare package main with an empty main function
  • Build the source code with the special -buildmode=c-shared flag

Note: Apart from the Go toolchain, you'll also need a C compiler on your machine, since cgo relies on one. There's a good free C compiler for each of the major platforms: gcc for Linux, clang on macOS (via Xcode) and MinGW-w64 (a gcc port) for Windows. cgo doesn't work with the Visual Studio (MSVC) compiler.

Listing 3: export.go

01 package main
02 
03 import "C"
04 
05 //export verify
06 func verify(root *C.char) *C.char {
07 	rootDir := C.GoString(root)
08 	if err := CheckSignatures(rootDir); err != nil {
09 		return C.CString(err.Error())
10 	}
11 
12 	return nil
13 }
14 
15 func main() {}

Listing 3 shows the export.go file from the project. We import "C" on line 03, and on line 05 the verify function is marked to be exported in the shared library. It's important that the comment is written exactly as is, with no space between // and export. On line 06, you can see that the verify function accepts a C string pointer, using the C package's char type. For Go code to work with C strings, the C package provides a GoString function (used on line 07) and a CString function (used on line 09). Finally, an empty main function is declared at the end.

To build the shared library, you need to run the go build command with a special flag.

Listing 4: Building the Shared Library

$ go build -buildmode=c-shared -o _checksig.so

Listing 4 shows the command to generate the C shared library, which will be named _checksig.so. Alongside it, go build also writes a C header file, _checksig.h, that declares the exported functions.

Note: The reason for the leading _ is to avoid a name collision with the checksig.py Python module that we'll show later. If the shared library were named checksig.so, executing import checksig in Python would try to load the shared library as an extension module instead of the Python file, and the import would fail.
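You can see which file suffixes Python's import system treats as extension modules by inspecting importlib.machinery.EXTENSION_SUFFIXES. This short sketch just prints that list. On Linux it includes a plain ".so", which is why a checksig.so next to checksig.py gets picked up by import checksig. The exact list varies by platform and Python version.

```python
import importlib.machinery
import sys

# Suffixes the default path-based finder treats as C extension modules.
suffixes = importlib.machinery.EXTENSION_SUFFIXES
print(sys.platform, suffixes)

# Would a file named checksig.so be considered an extension module here?
is_candidate = any("checksig.so".endswith(s) for s in suffixes)
print("checksig.so is an extension candidate:", is_candidate)
```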

Preparing the Test Data

Before we can try calling verify from Python, we need some data. You’ll find a directory called logs in the code repository. This directory contains some log files and a sha1sum.txt file.

Note: The signature for httpd-08.log is intentionally wrong.

On my machine, this directory is located at /tmp/logs.
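If you don't have the repository handy, a short Python script can build a similar directory. This sketch is not the repo's script. The function name make_test_logs, the file contents, and the choice of which entry to corrupt are all made up for illustration. It writes httpd-NN.log files plus a sha1sum.txt in the same "<digest>  <name>" layout as Listing 1 (the layout GNU sha1sum prints), then deliberately breaks the entry for httpd-08.log.

```python
import hashlib
from pathlib import Path


def make_test_logs(root_dir: str, count: int = 10, bad_index: int = 8) -> Path:
    """Hypothetical helper: create log files and a sha1sum.txt with one wrong entry."""
    root = Path(root_dir)
    root.mkdir(parents=True, exist_ok=True)
    lines = []
    for i in range(count):
        name = f"httpd-{i:02d}.log"
        data = f"fake log line for {name}\n".encode("utf-8")  # made-up content
        (root / name).write_bytes(data)
        digest = hashlib.sha1(data).hexdigest()
        if i == bad_index:
            # Change the last hex digit so this entry no longer matches the file.
            digest = digest[:-1] + ("0" if digest[-1] != "0" else "1")
        lines.append(f"{digest}  {name}")
    (root / "sha1sum.txt").write_text("\n".join(lines) + "\n")
    return root
```

Calling make_test_logs("/tmp/logs") gives you a directory to experiment with. Running sha1sum -c sha1sum.txt inside it should flag httpd-08.log as FAILED, which is a quick sanity check independent of the Go code.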

A Python Session

I love Python's interactive shell; it lets me play around with code in small chunks. Once I have a working version, I write the Python code in a file.

Listing 5: A Python Session

$ python
Python 3.8.3 (default, May 17 2020, 18:15:42) 
[GCC 10.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

01 >>> import ctypes
02 >>> so = ctypes.cdll.LoadLibrary('./_checksig.so')
03 >>> verify = so.verify
04 >>> verify.argtypes = [ctypes.c_char_p]
05 >>> verify.restype = ctypes.c_void_p
06 >>> free = so.free
07 >>> free.argtypes = [ctypes.c_void_p]
08 >>> ptr = verify('/tmp/logs'.encode('utf-8'))
09 >>> out = ctypes.string_at(ptr)
10 >>> free(ptr)
11 >>> print(out.decode('utf-8'))
12 "/tmp/logs/httpd-08.log" - mismatch

Listing 5 shows an interactive Python session that walks through calling the verify function we exported from Go into a shared library. On line 01 we import the ctypes module, and on line 02 we load the shared library into memory. On lines 03-05 we load the verify function from the shared library and set its input and output types. The output type is c_void_p rather than c_char_p: with c_char_p, ctypes would convert the result into a Python bytes object and we'd lose the pointer we need to free. Lines 06-07 load the free function so we can free the memory allocated by Go (see more below).

Line 08 is the actual call to verify. We need to convert the directory name to Python bytes before passing it to the function. The return value, a pointer to a C string, is stored in ptr. On line 09, we copy the C string into a Python bytes object, and on line 10 we free the memory allocated by Go. Finally, on line 11, we decode out from bytes to str before printing it.

Note: Line 02 assumes the shared library, _checksig.so is in the current directory. If you started the Python session elsewhere, change the path to the _checksig.so in line 02.
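Two details in Listing 5 are easy to trip over: arguments declared as c_char_p must be bytes, not str, and the return type is c_void_p so the raw pointer survives until it's freed. The sketch below demonstrates both without needing the Go library, using the C library's strdup as a stand-in for verify, since strdup, like C.CString, returns a malloc'd copy of a string. It assumes a Unix-like system where ctypes can load libc.

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

strdup = libc.strdup
strdup.argtypes = [ctypes.c_char_p]
strdup.restype = ctypes.c_void_p  # keep the raw address so it can be freed

free = libc.free
free.argtypes = [ctypes.c_void_p]
free.restype = None

# 1. A str is rejected for a c_char_p argument; encode it to bytes first.
try:
    strdup("/tmp/logs")
    rejected_str = False
except ctypes.ArgumentError:
    rejected_str = True

# 2. With restype=c_void_p we get an int address we can read and then free.
ptr = strdup("/tmp/logs".encode("utf-8"))
copied = ctypes.string_at(ptr)  # copies the bytes up to the NUL terminator
free(ptr)

print(rejected_str, type(ptr).__name__, copied)
```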

With very little effort we’re able to call Go code from Python.

Intermezzo: Sharing Memory Between Go & Python

Both Python & Go have a garbage collector that will automatically free unused memory. However, having a garbage collector doesn't mean you can't leak memory.

Note: You should read Bill's Garbage Collection In Go blog posts. They will give you a good understanding of garbage collectors in general and of the Go garbage collector specifically.

You need to be extra careful when sharing memory between Go and Python (or C). Sometimes it's not clear when a memory allocation happens. In export.go on line 09, we have the following code:

Listing 6: Converting Go Error to C String

return C.CString(err.Error())

The cgo documentation for C.CString says:

Listing 7

// Go string to C string
// The C string is allocated in the C heap using malloc.
// It is the caller's responsibility to arrange for it to be
// freed, such as by calling C.free (be sure to include stdlib.h
// if C.free is needed).
func C.CString(string) *C.char

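Neither Go's nor Python's garbage collector manages this C heap memory, so it is the caller's responsibility to free it. To avoid a memory leak in our interactive prompt, we loaded the free function and used it to free the memory allocated by C.CString.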
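One case the interactive session doesn't exercise: when all signatures match, verify returns nil, which ctypes turns into None for a c_void_p return type, and there is nothing to read or free. A small helper can wrap the copy-then-free steps and handle that case. This is a sketch: take_c_string is a hypothetical name, and it again uses libc's strdup as a stand-in for memory allocated by C.CString, assuming a Unix-like system.

```python
import ctypes
import ctypes.util
from typing import Optional

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strdup.argtypes = [ctypes.c_char_p]
libc.strdup.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None


def take_c_string(ptr: Optional[int], free=libc.free) -> Optional[str]:
    """Copy a malloc'd C string into a Python str, then free the C memory.

    Returns None for a NULL pointer (e.g. verify succeeded and returned nil).
    """
    if ptr is None:  # ctypes maps a NULL c_void_p result to None
        return None
    try:
        return ctypes.string_at(ptr).decode("utf-8")
    finally:
        free(ptr)  # release the C heap memory exactly once, even if decoding fails


msg = take_c_string(libc.strdup(b'"/tmp/logs/httpd-08.log" - mismatch'))
print(msg)
print(take_c_string(None))
```

With the Go library loaded, you would pass the free function loaded from _checksig.so (as in Listing 5) instead of libc.free.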

Conclusion

With very little code, you can use Go from Python. Unlike the previous installment, there's no RPC step, meaning you don't need to marshal and unmarshal parameters on every function call, and there's no network involved either. Calling from Python to C this way is much faster than a gRPC call. On the other hand, you need to be more careful with memory management, and the packaging process is more complex.

Note: A simple benchmark on my machine clocks a gRPC function call at 128µs vs. a shared library call at 3.61µs, about 35 times faster.
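The numbers above come from the author's machine. If you want to measure per-call overhead on yours, timeit is a simple option. This sketch times a trivial ctypes call into libc's strlen as a stand-in, assuming a Unix-like system; to time verify itself, swap in the function from Listing 5, keeping in mind that verify also reads and hashes files, so that work will dominate its timing.

```python
import ctypes
import ctypes.util
import timeit

libc = ctypes.CDLL(ctypes.util.find_library("c"))
strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

arg = b"/tmp/logs"
number = 100_000
# Best of 5 runs, each making `number` calls; convert to microseconds per call.
best = min(timeit.repeat(lambda: strlen(arg), number=number, repeat=5))
per_call_us = best / number * 1e6
print(f"strlen via ctypes: {per_call_us:.3f} µs per call")
```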

I hope you found this style of writing code appealing: pure Go first, then exporting, then trying it out in an interactive session. I urge you to try this workflow yourself next time you write some code.

In the next installment, we’ll finish the last step of my workflow and package the Python code as a module.
