What Happens When You Go Get — A Closer Look At The Internals Of Go Modules


Whenever you need an external module while working on a project in Go, you go get it by running the command go get <pkg-name>. Unlike many other languages, Go does not have a central software registry or package manager (like NPM for JavaScript or Maven for Java) that modules are published to and downloaded from. Whether you are just getting started with Go and running your first go get command or you are a veteran, it is important to understand what happens behind the scenes when you run the go command.

In this post, we will discuss the steps that take place when you download an external module into your project. Let’s get started 🚀🚀

The GOPROXY

Package management in Go is [semi-]decentralized. There is no single server that hosts the source code for all the modules that exist. Any Git repository can host a Go module. There are various arguments for and against the decentralized nature of Go’s module system. Personally, I think most of the arguments against Go’s decentralized module system are weak, especially with the introduction of module mirrors & the checksum database, which we will discuss in the coming sections.

Although a centralized module system tends to be simpler and faster, the central server might not be available everywhere (for example, in countries with national firewalls), and it requires developers to trust a central entity.

[As I write this, I am reminded of the story of Azer Koçulu, who deleted an NPM package & “broke the internet”.]

A decentralized module system avoids those big problems, but you may have a lot of smaller trust and availability problems with individual servers. Go tries to strike a balance between the two with proxies.

Generally speaking, when you run a go get command, Go downloads the module to your computer if it is not already present in your local module cache. Where the go command downloads external modules from depends on the value of the GOPROXY environment variable. By default, GOPROXY is set to https://proxy.golang.org,direct. This value tells the go command to first try to download the module from https://proxy.golang.org and, if the proxy reports that it does not have the module (an HTTP 404 or 410 response), to fall back to downloading it directly from the module’s source repository.
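To make the shape of the GOPROXY value a bit more concrete, here is a minimal sketch of how such a list could be split into its entries. The type proxyEntry and the function parseGoproxy are names I made up for illustration; this is not the go command's own code. It assumes the documented separators: a comma means "fall back only on 404/410", a pipe means "fall back on any error".

```go
package main

import (
	"fmt"
	"strings"
)

// proxyEntry is a hypothetical type describing one element of GOPROXY.
type proxyEntry struct {
	URL                string // a proxy URL, or the keyword "direct" or "off"
	FallbackOnAnyError bool   // true when the entry is followed by '|'
}

// parseGoproxy splits a GOPROXY value into its entries, in order.
func parseGoproxy(value string) []proxyEntry {
	var entries []proxyEntry
	for value != "" {
		url, fallbackAny := value, false
		if i := strings.IndexAny(value, ",|"); i >= 0 {
			url = value[:i]
			fallbackAny = value[i] == '|'
			value = value[i+1:]
		} else {
			value = ""
		}
		url = strings.TrimSpace(url)
		if url == "" {
			continue // skip empty elements such as "a,,b"
		}
		entries = append(entries, proxyEntry{URL: url, FallbackOnAnyError: fallbackAny})
	}
	return entries
}

func main() {
	for _, e := range parseGoproxy("https://proxy.golang.org,direct") {
		fmt.Printf("%s (fall back on any error: %v)\n", e.URL, e.FallbackOnAnyError)
	}
}
```

The go command would try these entries from left to right, which is why the order in GOPROXY matters.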

Module Mirrors

In this section, we will answer two questions: what is https://proxy.golang.org, and why does the go command try to download the module I specified from there instead of from the actual URL passed to go get?

proxy.golang.org is a module mirror run by Google. A module mirror is a type of proxy that fetches modules from their origin servers (Git repositories) and caches them in its own storage to serve future requests. Module mirrors ensure that changes to a module’s source or downtime on the origin servers do not affect your builds. Downloading modules through a proxy is also faster and requires less storage than downloading them directly.

Aside from downloading modules, the go command also has to resolve the dependencies of these newly downloaded modules. With the direct download method, the go command has to download the entire source history of a dependency, whether or not it is going to be used in the build. With a proxy, the go command downloads a zip file, which is a partial snapshot of the repository at a specific commit. The snapshot contains everything in the module’s root directory (the directory containing its go.mod file) but excludes everything in nested modules (subdirectories containing their own go.mod files). That includes the source code of all the module’s packages (regardless of whether they’re actually needed for a build). It may also include files in directories that aren’t Go packages. Finally, the go command fetches the .mod & .info files of other dependencies (at specific versions) by making HTTP requests to endpoints on the module proxy server.
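As a small illustration of the .info file mentioned above: it is a short JSON document with the version and its commit time. Here is a sketch of decoding one. The sample body and the names versionInfo and parseInfo are hypothetical; only the Version and Time fields come from the proxy protocol.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// versionInfo mirrors the JSON metadata a proxy returns for a .info request.
type versionInfo struct {
	Version string
	Time    time.Time
}

// parseInfo decodes the body of a .info response.
func parseInfo(data []byte) (versionInfo, error) {
	var info versionInfo
	err := json.Unmarshal(data, &info)
	return info, err
}

func main() {
	// Hypothetical response body, made up for this example.
	sample := []byte(`{"Version":"v1.2.3","Time":"2021-01-02T15:04:05Z"}`)
	info, err := parseInfo(sample)
	if err != nil {
		fmt.Println("could not parse .info:", err)
		return
	}
	fmt.Println(info.Version, info.Time.Year())
}
```

Because these metadata files are tiny, the go command can learn about many candidate versions without downloading any source code.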

Although it is the most popular, https://proxy.golang.org is not the only module proxy available. Module proxies are not sacred; in fact, you can create your own. There are open-source projects that let you host your own Go proxy. There is a GOPROXY protocol that every module proxy must implement. Once your HTTP server implements everything in the GOPROXY protocol, you have yourself a module proxy. The spec includes a list of endpoints that all module proxies must serve.
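To give a feel for those endpoints, here is a rough sketch that builds the URLs for the four per-version endpoints (list, .info, .mod, .zip). The helper names escapeCase and proxyEndpoints are my own, and the version v1.0.0 is only a placeholder. The escaping shown (each uppercase letter becomes "!" plus its lowercase form) is a simplified take on the case-encoding the protocol describes; it does no validation.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeCase replaces every uppercase ASCII letter with '!' followed by
// its lowercase form, so paths survive case-insensitive file systems.
func escapeCase(s string) string {
	var b strings.Builder
	for _, r := range s {
		if r >= 'A' && r <= 'Z' {
			b.WriteByte('!')
			b.WriteRune(r + ('a' - 'A'))
		} else {
			b.WriteRune(r)
		}
	}
	return b.String()
}

// proxyEndpoints returns the list, .info, .mod and .zip URLs (in that
// order) for one module version on the proxy at base.
func proxyEndpoints(base, module, version string) []string {
	prefix := strings.TrimSuffix(base, "/") + "/" + escapeCase(module) + "/@v/"
	v := escapeCase(version)
	return []string{
		prefix + "list",
		prefix + v + ".info",
		prefix + v + ".mod",
		prefix + v + ".zip",
	}
}

func main() {
	// v1.0.0 is a placeholder version chosen for the example.
	for _, u := range proxyEndpoints("https://proxy.golang.org", "github.com/aws/aws-sdk-go-v2", "v1.0.0") {
		fmt.Println(u)
	}
}
```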

I would also like to mention that the GOPROXY protocol exists to ensure uniformity across all the proxies out there. The go command is not interested in which proxy you have set up; it just needs to be able to reach all the endpoints the spec requires. When you run go get github.com/aws/aws-sdk-go-v2, the go command makes an HTTP request to $GOPROXY/github.com/aws/aws-sdk-go-v2/@v/list. $GOPROXY in the URL above is whatever proxy you have configured; Go doesn’t care.
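Here is a small sketch of that idea: a client that asks any proxy for a module's version list, pointed at a tiny in-process fake proxy. Everything here is for illustration only. The module example.com/hello, its versions, and the functions listVersions and fakeProxy are made up, and the fake proxy implements just the list endpoint rather than the full protocol.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// listVersions asks the proxy at base for the known versions of module.
func listVersions(base, module string) ([]string, error) {
	resp, err := http.Get(strings.TrimSuffix(base, "/") + "/" + module + "/@v/list")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("proxy returned %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	// The list endpoint returns plain text, one version per line.
	return strings.Fields(string(body)), nil
}

// fakeProxy starts a local HTTP server that serves the list endpoint
// for one hypothetical module.
func fakeProxy() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/example.com/hello/@v/list", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "v1.0.0\nv1.1.0\n")
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := fakeProxy()
	defer srv.Close()

	versions, err := listVersions(srv.URL, "example.com/hello")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(versions)
}
```

The client never checks who is on the other end; swapping srv.URL for another proxy's base URL is the only change needed, which is the whole point of having a shared protocol.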

Let’s Talk Security

At this point, you are probably wondering about the security of the module proxy. In this section, we will discuss the steps the go command takes to ensure that the modules it downloads are secure & have not been tampered with.

Ever wondered what the go.sum file that sits right beside your go.mod file does? Tried to open it but found it incomprehensible? The first time your project uses an external module (dependency), cryptographic hashes of the .mod & .zip files of that dependency & its transitive dependencies are generated and added to your go.sum file. Afterwards, whenever that dependency is [re-]downloaded, the go command checks that the hashes of the downloaded .mod & .zip files match the corresponding entries in go.sum.

Although the go.sum file makes sure hashes match, which keeps builds reproducible, it does nothing to make sure that a dependency is secure and untampered with the first time it is added. The checksum database exists as a global source of truth for all publicly available module versions. Using the checksum database, a module is verified on first download and compared against the go.sum file on subsequent downloads. sum.golang.org is an auditable checksum database run by Google. The go command consults sum.golang.org by default when it downloads a module version that does not yet have an entry in the go.sum file.
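For the curious, here is a sketch of how a go.sum style hash for a go.mod file could be computed and compared. It follows my understanding of the "h1:" scheme (SHA-256 over a one-line summary of the file's own SHA-256, then base64), so treat the exact format as an assumption rather than a reference implementation. The names hashGoMod and matchesGoSum are hypothetical, and the .zip hash, which covers every file in the archive, is not shown.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// hashGoMod computes an "h1:" style hash for the contents of a go.mod
// file (assumed format: sha256 of "<hex sha256>  go.mod\n", base64).
func hashGoMod(data []byte) string {
	fileSum := sha256.Sum256(data)
	summary := fmt.Sprintf("%x  go.mod\n", fileSum[:])
	total := sha256.Sum256([]byte(summary))
	return "h1:" + base64.StdEncoding.EncodeToString(total[:])
}

// matchesGoSum reports whether freshly downloaded go.mod contents match
// the hash that was recorded earlier.
func matchesGoSum(recorded string, data []byte) bool {
	return hashGoMod(data) == recorded
}

func main() {
	// Hypothetical go.mod contents for the example.
	original := []byte("module example.com/hello\n\ngo 1.16\n")
	recorded := hashGoMod(original)

	fmt.Println(matchesGoSum(recorded, original))
	fmt.Println(matchesGoSum(recorded, append(original, "// tampered\n"...)))
}
```

Any change to the file, however small, produces a different hash, which is how the go command notices that a re-downloaded dependency is not the one you built against before.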

In Conclusion

We have discussed Go Modules, Module Mirrors & the checksum database! I hope you got a little more clarity about Go modules and all the stuff happening behind the scenes. If you are interested in learning more, I suggest you check out the following materials.

If you have any questions or feedback, please feel free to share them with me on Twitter: @oluwatvbi or via Email: tobade02@gmail.com

Big thanks to Chidi Williams & Jay Conrod (Jay worked on Go modules at Google🤯) for taking the time to review the original draft of this article & suggest improvements!

Hey guys, I am looking for a new developer role! Ideally one where I get to write code in Go & work on interesting/meaningful problems (remote too). Are you hiring, got any leads, or just want to wish me luck on my job search? Send me an email at tobade02@gmail.com

Software developer, learning about distributed systems.