Introduction

One of the exercises I give to students is to download a single big file over HTTP concurrently, using several goroutines and HTTP Range requests. An extra part of the exercise is to validate the downloaded file against a known MD5 signature. This extra part turns out to be interesting, so let’s have a look.

Getting Download Information

Let’s make an HTTP HEAD request to get information about a file located in a public dataset stored on Google Cloud Storage (GCS).

https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2/LC08_L1GT_044034_20130330_20170310_01_T2_B2.TIF

Listing 1: HEAD request

$ curl -i -I https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2/LC08_L1GT_044034_20130330_20170310_01_T2_B2.TIF

HTTP/2 200 
x-guploader-uploadid: ADPycdtU_fX5RkUlYHRaU4ajofN7LOXIdjzNJUzKWyKKIOtIhxyhhyY-0JZ1avs5T1ohCZD7_0jPurQ2ByB3YlCm2D1D0A
expires: Thu, 02 Mar 2023 11:02:52 GMT
date: Thu, 02 Mar 2023 10:02:52 GMT
cache-control: public, max-age=3600
last-modified: Fri, 08 Sep 2017 09:10:25 GMT
etag: "eec1fa5ce8077d7030e194eb5989c937"
x-goog-generation: 1504861825662906
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 69962928
content-type: application/octet-stream
x-goog-hash: crc32c=ir4fqg==
x-goog-hash: md5=7sH6XOgHfXAw4ZTrWYnJNw==
x-goog-storage-class: STANDARD
accept-ranges: bytes
content-length: 69962928
server: UploadServer
alt-svc: h3=":443"; ma=2592000,h3-29=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"

Listing 1 shows a curl call where the -I flag denotes the use of the HEAD verb and the -i flag tells curl to include the response headers. By looking at the response (content-length header) we can see the size of the file is 69962928 bytes or about 66.7 MiB.

Look for the x-goog-hash header, which contains the hash information for the file. The md5 prefix tells us the value represents an MD5 signature. The data that follows is the signature encoded in base64. Note that there are two x-goog-hash HTTP headers; the other one carries a CRC32C checksum.

Actual File Sum

Let’s check the file MD5 signature as reported by the md5sum utility.

Listing 2: MD5

$ curl https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2/LC08_L1GT_044034_20130330_20170310_01_T2_B2.TIF | md5sum

eec1fa5ce8077d7030e194eb5989c937  -

Listing 2 shows how to calculate the MD5 signature by piping the file that curl downloads into the md5sum utility. We can see the signature is eec1fa5ce8077d7030e194eb5989c937.

Now that we have the data we need, let’s start writing Go code to get a signature for a file hosted on GCS. The code can be found here

Making a HEAD Request

Listing 3: HEAD Request in Go

12 func urlSig(ctx context.Context, url string) (string, error) {
13     req, err := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
14     if err != nil {
15         return "", err
16     }
17 
18     resp, err := http.DefaultClient.Do(req)
19     if err != nil {
20         return "", err
21     }
22     if resp.StatusCode != http.StatusOK {
23         return "", fmt.Errorf("%q: bad status - %s", url, resp.Status)
24     }

Listing 3 shows the first part of the urlSig function, which returns the MD5 signature for the file at the specified URL. On line 12, we define the urlSig function with two parameters: a context and the location of the file. It’s always a good idea to use a context for timeouts when dealing with the network. On line 13, we create a new HEAD request using the context and the URL parameters. The last nil parameter means we are not passing any data with the request. On line 18, we use the default HTTP client to make the call, and on line 22, we check that the response status code is 200.
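To experiment with the HEAD request without touching the network, one option is a local test server from net/http/httptest. The sketch below is an assumption-laden stand-in, not the post's code: headHeaders, newFakeServer and fetchFromFake are hypothetical names, and the fake server simply replays the two x-goog-hash values from Listing 1. The sketch also closes resp.Body, which the listing above does not show.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// headHeaders (hypothetical helper) sends a HEAD request and returns the
// response headers.
func headHeaders(ctx context.Context, url string) (http.Header, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodHead, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	// A HEAD response carries no payload, but net/http still expects the
	// caller to close the body.
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("%q: bad status - %s", url, resp.Status)
	}
	return resp.Header, nil
}

// newFakeServer starts a local server that answers every request with the
// two hash headers seen in the curl output.
func newFakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Add("x-goog-hash", "crc32c=ir4fqg==")
		w.Header().Add("x-goog-hash", "md5=7sH6XOgHfXAw4ZTrWYnJNw==")
	}))
}

// fetchFromFake wires the pieces together with a one-second timeout.
func fetchFromFake() ([]string, error) {
	srv := newFakeServer()
	defer srv.Close()
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	hdr, err := headHeaders(ctx, srv.URL)
	if err != nil {
		return nil, err
	}
	return hdr.Values("x-goog-hash"), nil
}

func main() {
	fmt.Println(fetchFromFake())
}
```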

Finding the Signature

Listing 4: Finding MD5 Signature HTTP Header

26     // Find MD5 hash in HTTP headers.
27     const (
28         header = "x-goog-hash"
29         prefix = "md5="
30     )
31     b64hash := ""
32     values := resp.Header.Values(header)
33     for _, v := range values {
34         if strings.HasPrefix(v, prefix) {
35             b64hash = v[len(prefix):]
36             break
37         }
38     }
39 
40     if b64hash == "" {
41         return "", fmt.Errorf("can't find md5 hash %s: %v", header, values)
42     }

Listing 4 shows how to read the MD5 signature from the HTTP headers. On lines 28 and 29, we define the header name and the prefix that marks the MD5 value. On line 32, we get all the values for the specified header.

Listing 5: x-goog-hash Headers

x-goog-hash: crc32c=ir4fqg==
x-goog-hash: md5=7sH6XOgHfXAw4ZTrWYnJNw==

Listing 5 is a reminder that two values are provided for the x-goog-hash key in the header.

Then on lines 33-38 in listing 4, we look for the MD5 signature value from the list. Once found, on line 35 we remove the md5= prefix from the header value to get the signature. Finally on lines 40-42 we return an error if we can’t find the signature.
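The lookup can also be written as a small standalone function that is easy to test. The sketch below uses a hypothetical md5FromValues helper and, as an extra assumption that the output in Listing 1 does not demonstrate, also accepts both hashes packed into one comma-separated header value. Splitting on commas is safe because the standard base64 alphabet contains no commas.

```go
package main

import (
	"fmt"
	"strings"
)

// md5FromValues (hypothetical helper) returns the base64 MD5 value found in
// a list of x-goog-hash header values. It handles one hash per value, as in
// Listing 1, and (as an assumption) several hashes joined by commas.
func md5FromValues(values []string) (string, bool) {
	const prefix = "md5="
	for _, v := range values {
		for _, part := range strings.Split(v, ",") {
			part = strings.TrimSpace(part)
			if strings.HasPrefix(part, prefix) && len(part) > len(prefix) {
				return part[len(prefix):], true
			}
		}
	}
	return "", false
}

func main() {
	// In real code the slice would come from resp.Header.Values("x-goog-hash").
	values := []string{"crc32c=ir4fqg==", "md5=7sH6XOgHfXAw4ZTrWYnJNw=="}
	fmt.Println(md5FromValues(values))
}
```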

Decoding and Returning the Hash

Listing 6: Decoding the Hash Value

44     hash, err := base64.StdEncoding.DecodeString(b64hash)
45     if err != nil {
46         return "", err
47     }
48 
49     // Convert hash to "eec1fa5ce8077d7030e194eb5989c937" format.
50     return fmt.Sprintf("%x", hash), nil
51 }

Listing 6 shows how to decode the base64-encoded hash value. On line 44, we use the standard encoding (base64.StdEncoding) from the encoding/base64 package to decode the signature into a []byte. Then on line 50, we convert the signature from a []byte to a hexadecimal string.
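The decoding step can be tried in isolation with the header value from Listing 1. The sketch below uses a hypothetical b64ToHex helper. It swaps in hex.EncodeToString for the %x formatting (both produce the same lowercase hex) and adds a length check, which is an extra not present in the post's code.

```go
package main

import (
	"crypto/md5"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// b64ToHex (hypothetical helper) converts a base64-encoded MD5 digest into
// the lowercase hex form printed by md5sum.
func b64ToHex(s string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return "", err
	}
	// An MD5 digest is always md5.Size (16) bytes; anything else is not
	// a valid digest.
	if len(raw) != md5.Size {
		return "", fmt.Errorf("bad MD5 length: got %d bytes, want %d", len(raw), md5.Size)
	}
	return hex.EncodeToString(raw), nil
}

func main() {
	fmt.Println(b64ToHex("7sH6XOgHfXAw4ZTrWYnJNw=="))
}
```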

Testing the Code

Listing 7: Testing

53 func main() {
54     url := "https://storage.googleapis.com/gcp-public-data-landsat/LC08/01/044/034/LC08_L1GT_044034_20130330_20170310_01_T2/LC08_L1GT_044034_20130330_20170310_01_T2_B2.TIF"
55     ctx, cancel := context.WithTimeout(context.Background(), time.Second)
56     defer cancel()
57 
58     fmt.Println(urlSig(ctx, url))
59 }

Listing 7 shows a call to the urlSig function. On line 55, we create a context with a one-second timeout, and on line 58, we call the function and print the return values.

Now, let’s run it:

Listing 8: Running the Code

$ go run dlhash.go 

eec1fa5ce8077d7030e194eb5989c937 <nil>

Listing 8 shows how to run the code. We can see the signature matches the one we got using the md5sum program at the beginning of the post.
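With the expected signature in hand, the remaining step of the exercise is to compare it with the hash of the downloaded bytes. One possible approach, sketched below with hypothetical copyAndVerify and verifyString helpers, is to hash the data while writing it to its destination using io.MultiWriter. The sketch assumes the data arrives as a single sequential stream.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// copyAndVerify (hypothetical helper) copies src to dst while hashing the
// same bytes, then compares the digest with wantHex.
func copyAndVerify(dst io.Writer, src io.Reader, wantHex string) error {
	h := md5.New()
	// MultiWriter duplicates every write to both the destination and the hash.
	if _, err := io.Copy(io.MultiWriter(dst, h), src); err != nil {
		return err
	}
	got := hex.EncodeToString(h.Sum(nil))
	if !strings.EqualFold(got, wantHex) {
		return fmt.Errorf("md5 mismatch: got %s, want %s", got, wantHex)
	}
	return nil
}

// verifyString runs copyAndVerify on an in-memory string for demonstration.
func verifyString(data, wantHex string) error {
	var buf bytes.Buffer
	return copyAndVerify(&buf, strings.NewReader(data), wantHex)
}

func main() {
	fmt.Println(verifyString("hello world", "5eb63bbbe01eeed093cb22bb8f5acdc3"))
	fmt.Println(verifyString("hello world!", "5eb63bbbe01eeed093cb22bb8f5acdc3"))
}
```

MD5 has to consume the bytes in order, so with a concurrent Range download, where chunks may finish out of order, a sequential pass like this would likely run over the assembled file rather than over the individual chunks as they arrive.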

Conclusion

Even small tasks such as verifying download integrity can lead to many interesting places. We covered the following topics to make this happen:

  • Making an HTTP head request
  • HTTP headers
  • Base64 encoding
  • MD5 signatures
  • Converting []byte to a string

I hope you learned something useful from this blog post. Feel free to reach me at miki@ardanlabs.com with comments and suggestions. And of course, you are more than welcome to join one of our world-class trainings.
