Introduction

I prefer to use relational (SQL) databases in general since they provide several features that are very useful when working with data. SQLite is a great choice since the database is a single file, which makes it easier to share data. Even though it’s a single file, SQLite can handle up to 281 terabytes of data. SQLite also comes with a command line client called sqlite3 which is great for quick prototyping.

Note: Other databases have transactions, query languages and schemas. However SQL databases tend to be much more mature and SQL is standardized.

Definitions

Here are some definitions to help provide clarity.

SQLite : is an embedded SQL database. SQLite is small, fast and the most used database out there. It’s also by far my favorite way to exchange large amounts of data between two programs.

Transactions : You insert data into an SQL database inside a transaction. This means that either all of the data gets in, or none of it. Transactions simplify retry logic in data pipelines by orders of magnitude.

Schema : Data in relational databases has a schema, which means it’s easier to check the validity of your data.

SQL : Structured Query Language is a language for selecting and changing data. You don’t need to invent yet another way to select interesting parts of data. SQL is an established format and there’s a lot of knowledge and tooling around it.

The Project

We’ll write an HTTP server in Go that will get notifications on trades and will store them in an SQLite database. Then we’ll write a Python program that will process the data.

In Go, we’ll be using github.com/mattn/go-sqlite3 which is a wrapper around the SQLite C library.

Note: Since go-sqlite uses cgo, the initial build time will be longer than usual. Using cgo means that the resulting executable depends on some shared libraries from the OS, making distribution slightly more complicated.

In Python, we’ll use the built-in sqlite3 module and Pandas read_sql function to load the data. The source code for this blog is available at https://github.com/ardanlabs/python-go/tree/master/sqlite

Go Code

The code I am going to present can be found here in the trades.go file.

Listing 1: Trade struct

37 // Trade is a buy/sell trade for symbol.
38 type Trade struct {
39     Time   time.Time
40     Symbol string
41     Price  float64
42     IsBuy  bool
43 }

Listing 1 shows the Trade data structure. It has a Time field for the trade time, a Symbol field for the stock symbol (e.g. AAPL), the Price, and a boolean flag that tells if it’s a buy or a sell trade.

Listing 2: Database Schema

24     schemaSQL = `
25 CREATE TABLE IF NOT EXISTS trades (
26     time TIMESTAMP,
27     symbol VARCHAR(32),
28     price FLOAT,
29     buy BOOLEAN
30 );
31 
32 CREATE INDEX IF NOT EXISTS trades_time ON trades(time);
33 CREATE INDEX IF NOT EXISTS trades_symbol ON trades(symbol);
34 `

Listing 2 declares the database schema that corresponds to the Trade struct. On line 25, we create a table called trades. On lines 26-29, we define the table columns that correspond to the Trade struct fields. On lines 32-33, we create indexes on the table to allow fast querying by time and symbol.

Listing 3: Insert Record SQL*

16     insertSQL = `
17 INSERT INTO trades (
18     time, symbol, price, buy
19 ) VALUES (
20     ?, ?, ?, ?
21 )
22 `

Listing 3 defines the SQL to insert a record to the database. On line 20, we use ? as place holders for the parameters to this query. Never use fmt.Sprintf to craft an SQL query - you’re risking an SQL injection.

Inserting records one-by-one is a slow process. We’re going to store records in a buffer and once it’s full, insert all the records in the buffer to the database. This has the advantage of being fast but carries the risk that we’ll lose data on a server crash.

Listing 4: DB

45 // DB is a database of stock trades.
46 type DB struct {
47     sql    *sql.DB
48     stmt   *sql.Stmt
49     buffer []Trade
50 }

Listing 4 describes the DB struct. On line 47, we hold the connection to the database. On line 48, we store a prepared (pre-compiled) statement for inserting and on line 49, we have the in-memory buffer of pending transactions.

Listing 5: NewDB

52 // NewDB constructs a Trades value for managing stock trades in a
53 // SQLite database. This API is not thread safe.
54 func NewDB(dbFile string) (*DB, error) {
55     sqlDB, err := sql.Open("sqlite3", dbFile)
56     if err != nil {
57         return nil, err
58     }
59 
60     if _, err = sqlDB.Exec(schemaSQL); err != nil {
61         return nil, err
62     }
63 
64     stmt, err := sqlDB.Prepare(insertSQL)
65     if err != nil {
66         return nil, err
67     }
68 
69     db := DB{
70         sql:    sqlDB,
71         stmt:   stmt,
72         buffer: make([]Trade, 0, 1024),
73     }
74     return &db, nil
75 }

Listing 5 shows the creation of a DB database for use. On line 55, we connect to the database using the “sqlite3” driver. On line 60, we execute the schema SQL to create the trades table if it doesn’t already exist. On line 64, we pre-compile the insert SQL statement. On line 72, we create the internal buffer with 0 length and a capacity of 1024.

Note: For simplicity, the DB API I am providing is not goroutine safe (unlike sql.DB). If multiple goroutines call into the API concurrently, you will experience data races. I will leave it up to you as an exercise to make it safe.

Listing 6: Add

77 // Add stores a trade into the buffer. Once the buffer is full, the
78 // trades are flushed to the database.
79 func (db *DB) Add(trade Trade) error {
80     if len(db.buffer) == cap(db.buffer) {
81         return errors.New("trades buffer is full")
82     }
83 
84     db.buffer = append(db.buffer, trade)
85     if len(db.buffer) == cap(db.buffer) {
86         if err := db.Flush(); err != nil {
87             return fmt.Errorf("unable to flush trades: %w", err)
88         }
89     }
90 
91     return nil
92 }

Listing 6 shows the Add method. On line 84, we append the trade to the in-memory buffer. On line 85, we check to see if the buffer is full and if it is we call Flush on line 86, that will insert the records from the buffer into the database.

Listing 7: Flush

94  // Flush inserts pending trades into the database.
95  func (db *DB) Flush() error {
96      tx, err := db.sql.Begin()
97      if err != nil {
98          return err
99      }
100 
101     for _, trade := range db.buffer {
102         _, err := tx.Stmt(db.stmt).Exec(trade.Time, trade.Symbol, trade.Price, trade.IsBuy)
103         if err != nil {
104             tx.Rollback()
105             return err
106         }
107     }
108 
109     db.buffer = db.buffer[:0]
110     return tx.Commit()
111 }

Listing 7 shows the Flush method. On line 96, we start a transaction. On line 101, we iterate over the internal buffer and on line 102, insert each trade. If there was an error inserting, we issue a rollback on line 104. On line 109, we reset the in-memory trades buffer. Finally, on line 110, we issue a commit.

Listing 8: Close

113 // Close flushes all trades to the database and prevents any future trading.
114 func (db *DB) Close() error {
115     defer func() {
116         db.stmt.Close()
117         db.sql.Close()
118     }()
119 
120     if err := db.Flush(); err != nil {
121         return err
122     }
123 
124     return nil
125 }

Listing 8 shows the Close methods. On line 120, we call Flush to insert any remaining trades into the database. On line 116 and 117, we close the statement and the database. Functions creating a DB should have a defer db.Close() to make sure the database connection is freed.

Listing 9: Imports

4 4 // Your main or test packages require this import so
5 // the sql package is properly initialized.
6 // _ "github.com/mattn/go-sqlite3"
7 
8 import (
9     "database/sql"
10     "errors"
11     "fmt"
12     "time"
13 )

Listing 9 shows the imports for the file. On line 05, we import database/sql that defines the API for working with SQL databases. database/sql does not contain any specific database driver.

As the comment above the import statement says - to use the trades package, you will need to import a package that implements an sqlite3 database driver (such as github.com/mattn/go-sqlite3). Since you import the package implementing the driver only for the side effect of registering the “sqlite3” protocol, we use _ in front of the import - telling the Go compiler it’s OK we don’t use this package in the code.

Listing 10: Example Usage

The code for these examples can be found in the trades_test.go file.

66 func ExampleDB() {
67     dbFile := "/tmp/db-test" + time.Now().Format(time.RFC3339)
68     db, err := trades.NewDB(dbFile)
69     if err != nil {
70         fmt.Println("ERROR: create -", err)
71         return
72     }
73     defer db.Close()
74 
75     const count = 10_000
76     for i := 0; i < count; i++ {
77         trade := trades.Trade{
78             Time:   time.Now(),
79             Symbol: "AAPL",
80             Price:  rand.Float64() * 200,
81             IsBuy:  i%2 == 0,
82         }
83         if err := db.Add(trade); err != nil {
84             fmt.Println("ERROR: insert - ", err)
85             return
86         }
87     }
88 
89     fmt.Printf("inserted %d records\n", count)
90     // Output:
91     // inserted 10000 records
92 }

Listing 10 shows an example use (as a form of a testable example). On line 67 we create a new database and on line 73 we make sure we close it with a defer statement. On line 76 we start a loop to insert trades and on line 83 we insert the trade to the database.

The Python Code

The code for the Python examples can be found in the analyze_trades.py file.

Listing 11: Imports

02 import sqlite3
03 from contextlib import closing
04 from datetime import datetime
05 
06 import pandas as pd

Listing 11 shows the libraries we’re using in the Python code. On line 02, we import the built-in sqlite3 module and on line 06, we import the pandas library.

Listing 12: Select SQL

08 select_sql = """
09 SELECT * FROM trades
10 WHERE time >= ? AND time <= ?
11 """

Listing 12 shows the SQL for selecting data. On line 10, we select all the columns from the trades table. On line 10, we add a WHERE clause for selecting in time range. As in the Go code, we use ? as placeholders for arguments and not construct the SQL manually.

Listing 13: Loading Trades

14 def load_trades(db_file, start_time, end_time):
15     """Load trades from db_file in given time range."""
16     conn = sqlite3.connect(db_file)
17     with closing(conn) as db:
18         df = pd.read_sql(select_sql, db, params=(start_time, end_time))
19 
20     # We can't use detect_types=sqlite3.PARSE_DECLTYPES here since Go is
21     # inserting time zone and Python's sqlite3 doesn't handle it.
22     # See https://bugs.python.org/issue29099
23     df["time"] = pd.to_datetime(df["time"])
24     return df

Listing 13 shows the code for loading trades at a given time range. On line 16, we connect to the database. On lines 17, we use a context manager, somewhat like Go’s defer to make sure the database is closed. On line 18, we use pandas read_sql function to load data from an SQL query to a DataFrame. Python has an API for connection to databases (like database/sql) and Pandas can use any compatible driver. On line 23, we convert the time column to pandas Timestamp. This is specific to SQLite that doesn’t have built-in support for TIMESTAMP types.

Listing 14: Average Price

27 def average_price(df):
28     """Return average price in df grouped by (stock, buy)"""
29     return df.groupby(["symbol", "buy"], as_index=False)["price"].mean()

Listing 14 shows how to calculate the average price per symbol and buy. On line 29, we use the DataFrame groupby to group by symbol and buy. We use as_index=False to get the symbol and buy as columns in the resulting data frame. Then we take the price column and calculate the mean per group.

Listing 15: Output

symbol,buy,price
AAPL,0,250.82925665004535
AAPL,1,248.28277375538832
GOOG,0,250.11537993385295
GOOG,1,252.4726772487683
MSFT,0,250.9214212695317
MSFT,1,248.60187022941685
NVDA,0,250.3844763417279
NVDA,1,249.3578146208962

Listing 15 shows the output of running the Python code on dummy data.

Conclusion

I highly recommend you consider using SQLite in your next project. SQLite is a mature and stable platform that can handle huge amounts of data. Many programming languages have drivers to SQLite, which makes it a good storage option.

I’ve simplified the code as much as I could to show the more interesting points. There are several places where you can try and improve it:

  • Add a retry logic to Flush
  • Do more error checking in Close
  • Make DB goroutine safe
  • Run more analysis on the Python side

I also encourage you to learn SQL, there’s a lot of data out there in SQL databases. You don’t have to be an expert, knowing the basics of selecting data will already give you a new powerful tool in your data analysis toolbox.

If you don’t want to work directly with SQL there are some alternatives, in the Go world there are packages such as sqlx, gorm and on the Python world there’s SQLAlchemy which Pandas can use.

Have fun with the code, let me know what are you doing with SQL in general and SQLite specifically.

Trusted by top technology companies

We've built our reputation as educators and bring that mentality to every project. When you partner with us, your team will learn best practices and grow along the way.

30,000+

Engineers Trained

1,000+

Companies Worldwide

12+

Years in Business