Introduction

What is Want?

Want is a build system.

Build systems are programs that run other programs to compute outputs from inputs, often more efficiently than rerunning the whole computation for every change in the input.

Want is sometimes called a "hermetic" build system, which means that the build computation is cut off from external resources. Want knows about all the bits going into every computation, and there is no way to sneak different or extra bits into a computation without Want knowing about it. This allows Want to cache every computation in the build, and avoid repeated work.

Why would I use Want?

If you've ever tried to get up and running with a new software project, you will have dealt with the problem of downloading all the tools that the project uses.

Even for simple tasks like building the docs, or producing shippable executables, you may have to install several tools. This requires developers to know about and understand tools which they will never work with themselves, to merely benefit from the output of a computation that others have set up.

If you or someone else put in the time to get something to work, it should also just work for you going forward. You shouldn't have to replay previous steps like: picking the right version, learning command line arguments, order of operations, etc.

With Want, every build target you add will be computed in exactly the same way on your laptop tomorrow, on your co-worker's development machine, and in CI. Want forces build steps to be fully and exactly specified, so they can be reproduced by anyone at any time, in any environment.

Why is it called Want?

You write down what you want, and Want builds it.

More seriously: .want and .wants were good extensions for the files where you specify build targets, and naming the entire build system after them came later.

Using Want

This section will get you up and running with Want.

Most of the time you should run want build. This command is suitable to use as a pre-merge check. It will build all the targets in the current module. It does not and cannot perform publishing or other side effects.

Creating a Want Module

To start using Want with a new project, first we need to initialize a Module. Change the current working directory to the root of your project source code, and run the following command.

$ want init

You should see a file called WANT in the current directory. This file contains configuration for Want in the Jsonnet configuration language. Jsonnet is used for all the configuration files in Want.

Creating a Build Target

Targets are what Want calls the things that you, the user, want but don't yet have, and so need to build. There are 2 ways to specify Targets: as Expressions and as Statements.

They are each good for different things; in this example we will use an expression.

Create a file myexpr.want anywhere in your project, and fill it with the following:

local want = import "@want";

want.blob("Hello World!\n")

Now run

$ want cat myexpr.want
Hello World!

Inspecting the Build Output

Normally cat myexpr.want would print the Jsonnet file above, but instead it printed just the "Hello World!" part. This is because want cat reads from the build output, while the regular cat reads from the regular filesystem, which is the build input.

Want organizes build targets into an output filesystem. This filesystem only exists virtually, within Want, but it can be inspected using tools similar to those in UNIX (ls and cat).

want ls <path> will list the contents of a tree in the build output. Compare this to the output of regular ls.

This simple translation of inputs to outputs is what makes Want so easy to use and reason about. You will never find yourself in a situation where you don't know when/how/where a Target is being computed.

Armed with just want ls, want cat, and want build, you can get pretty far. To learn more about what you can build with Want, read the Build Configuration section.

Concepts

This section provides a conceptual overview of Want. If something doesn't make sense or you want to learn more about Want's model, this is a good section to read.

TLDR

Want = Git + Compute

Git models filesystem data as immutable Trees and Blobs. Directories are Trees, and files are Blobs. In Want, all Values are Trees or Blobs.

As you will learn in the later sections: you can specify them literally, you can select them from the project source, you can get them from the network, and you can compute them.

It doesn't matter where they come from, all data turns into Trees and Blobs in the end.

Want allows you to perform operations (which are Trees & Blobs) on inputs (which are Trees & Blobs) to produce outputs (which are also Trees & Blobs).

There are more details, but if you know how Git works, then just imagine using Git trees to define programs, and their inputs. The programs output trees as well.

Blobs, Trees, and Refs

Want stores all data using a format called the Git-Like Filesystem (GLFS). GLFS is not identical to, or even a superset of, the formats used by Git the version control software. However, it is very similar in terms of the shape of the data structure.

GLFS is an open source project, distinct from Want, which you can explore here

The first step in a build is to import the entire root module from the local filesystem into Want where it is represented as a GLFS filesystem.

Blobs

A Blob is an arbitrarily large sequence of bytes. It supports random access, so you can read from anywhere in the sequence efficiently. Blobs are immutable: changing the bytes of a Blob would produce a different Blob.

Terminal nodes in a POSIX Filesystem (regular files and symbolic links) are represented as Blobs in GLFS. A Blob does not contain any references to any other filesystem objects.

Refs

Ref stands for Reference. It is information which can be used to retrieve a filesystem object, but is typically much smaller than the object itself. A Ref is like an IOU for an immutable filesystem snapshot.

Refs contain a cryptographic hash of the content that they point to. They also contain the Type of the data that they point to. The Type is either tree or blob. Refs to Trees and Refs to Blobs contain an equivalent amount of information. Phrased differently, a Ref to a Tree will not take up more or less space than a Ref to a Blob.
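The idea of a Ref as a small, fixed-size handle can be sketched in Python. This is a conceptual model only, not the GLFS encoding: the Ref class, the post helper, the in-memory store, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    type: str    # "blob" or "tree"
    digest: str  # hex-encoded hash of the content


# Hypothetical in-memory content store, keyed by Ref.
store = {}


def post(type_: str, data: bytes) -> Ref:
    """Store data and return a Ref that can be used to retrieve it later."""
    ref = Ref(type_, hashlib.sha256(data).hexdigest())
    store[ref] = data
    return ref


small = post("blob", b"hi")
large = post("blob", b"x" * 1_000_000)
```

In this model both Refs carry a digest of the same length, no matter how large the content is, and posting the same bytes twice yields the same Ref.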

Throughout Want, Refs are used extensively. All filesystem data is passed around as Refs. Expressions in the build language evaluate to Refs in the output.

Trees

A Tree is the GLFS analog to a POSIX filesystem directory. A Tree is a set of entries, where each entry contains:

  • Name
  • Mode
  • Ref

Names cannot contain the character /, and must be unique across all the other entries in a tree. Mode contains permission bits; they are mostly needed for compatibility when converting back to a POSIX filesystem. The Ref refers to another object in the filesystem, as was discussed above.

Trees form a recursive datastructure because they can contain references to other Trees. Trees cannot contain a reference to themselves or transitively contain trees with a reference to themselves. This is enforced by the cryptographic hash used in each Ref.
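The rules for entries, and the reason a Tree cannot contain itself, can be illustrated with a small Python sketch. The tree_digest function and its line-based encoding are hypothetical and are not how GLFS serializes Trees; SHA-256 is likewise an assumption.

```python
import hashlib


def tree_digest(entries):
    """Hash a tree from (name, mode, child_digest) entries.

    Raises ValueError if a name contains "/" or appears more than once.
    """
    seen = set()
    h = hashlib.sha256()
    # Sort so the digest does not depend on the order entries were given in.
    for name, mode, child_digest in sorted(entries):
        if "/" in name:
            raise ValueError(f"name contains '/': {name!r}")
        if name in seen:
            raise ValueError(f"duplicate name: {name!r}")
        seen.add(name)
        h.update(f"{mode} {name} {child_digest}\n".encode())
    return h.hexdigest()


blob = hashlib.sha256(b"Hello World!\n").hexdigest()
inner = tree_digest([("hello.txt", "644", blob)])
outer = tree_digest([("docs", "755", inner), ("README", "644", blob)])
```

Because a parent's digest is computed from its children's digests, every child must be hashed before its parent. A Tree would need to know its own digest in advance to include itself, which is why cycles cannot be constructed.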

Tasks & Jobs

Tasks

A Task is a well defined unit of work. Well defined means that all the information that could be required to complete the Task is explicitly contained or precisely implied by the Task.

Tasks have 2 parts: an Operation and an Input. Operation refers to one of a small number of built in operators. Here are some example Operations

  • glfs.place
  • glfs.pick
  • import.fromURL
  • wasm.wasip1
  • qemu.amd64_microvm

This list is not exhaustive, but there are only around a dozen of them.

The Input is a small payload of bytes. In practice this is a Ref, which references a filesystem to perform the operation on.

These 2 components are included in a hash which becomes the TaskID.

The TaskID is used as the key for looking up the result of a Task. Want caches the output of Tasks when they are successfully completed, and checks this cache before computing a Task.

An immediate rerun of want build without any changes will result in all the same Tasks, which can all be skipped.
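The caching behaviour described above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not Want's implementation: the task_id encoding, the use of SHA-256, the in-memory cache, and the operation name example.upper are all hypothetical.

```python
import hashlib


def task_id(op: str, input_: bytes) -> str:
    """Derive an ID from both parts of a Task."""
    name = op.encode()
    h = hashlib.sha256()
    # Length-prefix the operation name so different (op, input) pairs
    # cannot produce the same byte stream.
    h.update(len(name).to_bytes(4, "big"))
    h.update(name)
    h.update(input_)
    return h.hexdigest()


cache = {}     # TaskID -> output
executed = []  # record of the Tasks that actually ran


def run(op, input_, compute):
    """Return the cached output for a Task, computing it only on a miss."""
    tid = task_id(op, input_)
    if tid not in cache:
        executed.append(op)
        cache[tid] = compute(input_)
    return cache[tid]


first = run("example.upper", b"hello", bytes.upper)
second = run("example.upper", b"hello", bytes.upper)
```

The second call finds the TaskID in the cache, so the computation runs only once. Changing either the operation or the input yields a different TaskID and therefore a cache miss.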

Jobs

Jobs are used to track running computations in Want.

Jobs are created to compute Tasks. Jobs contain active state like whether they have completed and the start and end time.

Jobs are organized into a forest of hierarchical trees. All the major operations in Want create a new tree in the forest, which is a "root" Job. That job may spawn additional child Jobs.

Child Jobs give computations access to additional compute, since the Child Jobs are computed in parallel. Child Jobs also present a way for Jobs to utilize the cache, which further benefits performance.

Jobs are unable to spawn root or sibling Jobs; they can only spawn Jobs which are their immediate children. This structuring prevents runaway computations. Every new Job has an audit trail, which ultimately goes back to the User asking for something to be done.
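The parent-child structure and the audit trail can be pictured with a small Python model. The Job class below, its fields, and the task names compile and link are hypothetical; the sketch only illustrates the shape of the hierarchy and runs nothing in parallel.

```python
import itertools

_ids = itertools.count(1)


class Job:
    def __init__(self, task, parent=None):
        self.id = next(_ids)
        self.task = task
        self.parent = parent
        self.children = []
        self.done = False

    def spawn(self, task):
        """Create an immediate child; a Job has no way to add a sibling or a root."""
        child = Job(task, parent=self)
        self.children.append(child)
        return child

    def audit_trail(self):
        """Return the tasks from this Job up to its root."""
        job, trail = self, []
        while job is not None:
            trail.append(job.task)
            job = job.parent
        return trail


# In this sketch only the system creates root Jobs, on behalf of the user.
root = Job("want build")
child = root.spawn("compile")
grandchild = child.spawn("link")
```

Following the parent links from any Job always ends at a root Job, which is the property the audit trail relies on.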

Build Configuration

Want configuration is mostly done using 2 types of files: Expression files, which end in .want and Statement files, which end in .wants.

The 3rd configuration file is the WANT file at the root of every module.

All Want configuration is done using Jsonnet. Want provides a small library of primitives, which you can read about here.

Module Files

Want operates on units called Modules. A Module is a Tree, with a Blob containing Jsonnet at the path WANT.

The WANT file provides module level configuration. It also serves as a sanity check that the directory is supposed to be built by Want. The structure and contents of this configuration file are discussed in this section.

No user configuration is required by default. The easiest way to create a Want module is to run want init.

Jsonnet Context

Want modules have access to a single import @want, which contains the standard library. It is imported in the default generated WANT file. These functions are more than enough to get any libraries needed to plan the build.

Allowed Keys

ignore: PathSet

This controls which paths will be ignored by Want. Anything in this set will not even be imported.

By default paths used by .git are ignored. If your project does not use Git, this configuration can be removed. If your project does use Git, changing this can significantly slow down importing the module.

namespace: Map[String, Expr]

namespace should resolve to a JSON object where the values are filesystem Expressions.

By default namespace contains a single entry for want.

{
    namespace: {
        "want": {blob: importstr "@want"},
    }
}

This allows the standard library to be used anywhere in the module. Anything in this namespace object will be accessible for import in any .want or .wants file in the module.

Dependencies added here should be for planning the build. You might have a single dependency here per programming language in your project. Most of the build dependencies for a project should be in the other configuration files.

Expression Files

Expression files are written as Jsonnet. They must evaluate to a single Want Expression.

Expression files evaluate to a filesystem object in the output tree. A file myexpr.want in the build input tree would be replaced at the same path in the build output tree, but the contents would be the evaluation of the expression.

It is impossible for expression files to produce conflicts with other expression files. This is because each expression file already coexists with other expression files in the source tree, and anything produced from it in the derived tree will be within the same path.

Jsonnet expressions can access a built-in standard library of functions by importing "@want".

Example 1: Download the Alpine minirootfs filesystem

In a file called myexpr.want

local want = import "@want";

want.importURL(
    url="http://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/alpine-minirootfs-3.21.2-x86_64",
    algo="SHA256",
    hash="4aa3bd4a7ef994402f1da0f728abc003737c33411ff31d5da2ab2c3399ccbc5f",
    transforms=["ungzip", "untar"])

This would be replaced in the output with the alpine minirootfs

$ want ls myexpr.want
755 tree gsPPzRn4RplJ-A1y bin
755 tree flC8iUMtcPPVF3re dev
755 tree ypQygCpJM8hd2GLF etc
755 tree flC8iUMtcPPVF3re home
755 tree GmEGuDFDQW3dR5wy lib
755 tree HVW60D2PJ_3_amD6 media
755 tree flC8iUMtcPPVF3re mnt
755 tree flC8iUMtcPPVF3re opt
555 tree flC8iUMtcPPVF3re proc
700 tree flC8iUMtcPPVF3re root
755 tree flC8iUMtcPPVF3re run
755 tree RtfdNc5JvrBwxphU sbin
755 tree flC8iUMtcPPVF3re srv
755 tree flC8iUMtcPPVF3re sys
777 tree flC8iUMtcPPVF3re tmp
755 tree 7ygjwYBFIH_ckNLl usr
755 tree 3eskP3ejV1fdXjV6 var
$ 

Try running want ls myexpr.want to build the output tree for this Expression and list its contents.

You can run want build to build the whole module.

Statement Files

Statement files are any files ending in .wants (the s stands for statement).

Statement files are also Jsonnet, but instead of expressing a JSON object, they must express a JSON list of statements.

At the time of writing there is one kind of statement: put.

Put Statement

Put statements write an expression to another location in the build output.

e.g.

local want = import "@want";

[
    want.put(want.unit("./my-statement-output.txt"), want.blob("foo"))
]

You can have as many statement files, and as many statements per file, as you want.

Want Core Library

The Want Core library is available in Module files (WANT) as @want.

e.g.

local want = import "@want";

The auto-generated module file includes the standard library in the module namespace as @want as well, although this is configurable.

Literals

These functions specify an output directly, without any further computation.

blob(contents: String): Expr

Evaluates to a Blob literal.

treeEntry(name: String, mode: String, e: Expr): TreeEntry

Specifies an entry in a Tree.

  • name is the name within the tree.
  • mode is parsed as an octal number.
  • e is the contents of the entry.

This is just UNIX permission bits attached to a Filesystem Expr. It is not a valid Filesystem expression on its own.

tree(ents: List[TreeEntry]): Expr

Evaluates to a Tree literal.

NOTE: All elements of a tree must be known. To assemble a tree during the build, see pass.

Path Sets

These functions specify sets of Paths, which are used in several places in the API.

unit(p: String): PathSet

A set with 1 path p in it.

prefix(p: String): PathSet

All paths that start with p.

suffix(s: String): PathSet

All paths that end with s.

not(x: PathSet): PathSet

All the paths that are not in x.

intersect(xs: List[PathSet]): PathSet

The paths in common between all the sets in xs. i.e. in xs[0] and xs[1] and xs[2], etc...

union(xs: List[PathSet]): PathSet

The paths which are in any of the sets in xs. i.e. in xs[0] or xs[1] or xs[2], etc...

subtract(l: PathSet, r: PathSet): PathSet

A convenience function which is equivalent to intersect([l, not(r)]).

  • l is the left side of a subtraction expression l - r
  • r is the right side of a subtraction expression l - r
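The semantics of these combinators can be modelled in Python by treating a PathSet as a predicate over path strings. This is an illustrative model, not how Want represents PathSets; the trailing underscore in not_ only avoids the Python keyword, and the example paths are made up.

```python
def unit(p):
    return lambda path: path == p


def prefix(p):
    return lambda path: path.startswith(p)


def suffix(s):
    return lambda path: path.endswith(s)


def not_(x):
    return lambda path: not x(path)


def intersect(xs):
    return lambda path: all(x(path) for x in xs)


def union(xs):
    return lambda path: any(x(path) for x in xs)


def subtract(l, r):
    # Defined exactly as described above: intersect([l, not(r)]).
    return intersect([l, not_(r)])


# All Go files under src/, except tests (hypothetical paths).
q = subtract(intersect([prefix("src/"), suffix(".go")]), suffix("_test.go"))
paths = ["src/main.go", "src/main_test.go", "docs/index.md", "src/README"]
selected = [p for p in paths if q(p)]
```

Note that prefix("") matches every path in this model, which is why it is a convenient way to say "everything".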

Selections

Selections are Exprs which refer to build inputs or outputs. There are two sources, GROUND and DERIVED, which are constants available in every Jsonnet context. They are not imported from the standard library.

Correct

local want = import "@want";

want.select(GROUND, want.prefix(""))

Incorrect

local want = import "@want";

want.select(want.GROUND, want.prefix(""))

Selections have the potential to produce cycles because it is possible to express a circular selection. Want will return an error quickly when a cycle is detected. There is no risk of launching an infinite circular process.

Selecting from GROUND never produces a cycle. GROUND is the state of the Module as-is, before any build steps have been computed.

Selecting from DERIVED can produce cycles, because it depends on the build output. An expression which makes a selection can only be computed after expressions which output to that selection.
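To make the notion of a circular selection concrete, here is one standard way to find a cycle in a dependency graph, using depth-first search. The source does not say how Want detects cycles, so this is not a description of its algorithm; the find_cycle function and the file names are hypothetical.

```python
def find_cycle(deps):
    """Return a list of targets forming a cycle, or None if there is none.

    deps maps each target to the list of targets it selects from.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    color = {t: WHITE for t in deps}
    stack = []

    def visit(t):
        color[t] = GREY
        stack.append(t)
        for d in deps.get(t, []):
            state = color.get(d, WHITE)
            if state == GREY:
                # d is already on the current path, so we have looped back.
                return stack[stack.index(d):] + [d]
            if state == WHITE:
                found = visit(d)
                if found:
                    return found
        stack.pop()
        color[t] = BLACK
        return None

    for t in deps:
        if color[t] == WHITE:
            found = visit(t)
            if found:
                return found
    return None


ok = {"a.want": ["b.want"], "b.want": []}
bad = {"a.want": ["b.want"], "b.want": ["a.want"]}
```

The search visits each target once, so a cycle is reported before any build work would need to start.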

select(src: Source, q: PathSet): Expr

Evaluates to a filesystem containing paths in q, with data from src.

selectFile(src: Source, p: String): Expr

This is a convenience function for selecting files. It is equivalent to selecting unit(p) and then calling pick to extract the file.

selectDir(src: Source, p: String): Expr

This is a convenience function for selecting directories. It is equivalent to selecting union([unit(p), prefix(p + "/")]) and then calling pick to extract the directory.

Compute

These functions allow computations to be specified.

input(name: String, expr: Expr): Input

Specifies an input to a computation. This is not a valid Filesystem expression on its own.

compute(op: String, inputs: List[Input]): Expr

Evaluates to a computed Filesystem. An operation identified by op will be performed on the inputs provided. The inputs will also be computed if they have not been already.

These are the core functions in Want that everything is based on.

pass(inputs: List[Input]): Expr

Short for "passthrough". This performs no additional computation other than assembling the inputs into a single directory, which is done for all computations.

Git-Like Filesystem

Want represents all data in a format called the Git-Like Filesystem or GLFS for short. Primitive operations on the GLFS Refs are essential.

place(x: Expr, p: String): Expr

Creates a chain of Trees according to the path p. They will lead to the value of x.

For example if p was a/b/c/d then the resulting filesystem would contain:

a/b/c/d => x

pick(x: Expr, p: String): Expr

Evaluates to the object found at path p within x.

For example, if x contained

a.txt       => 0000
b/foo.txt   => 1111
c/d/e.txt   => 2222
f.txt       => 3333

Then pick(x, "b/foo.txt") would evaluate to 1111

filter(x: Expr, query: PathSet): Expr

Returns x but only containing paths in query.
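The three operations above can be modelled in Python with nested dictionaries standing in for Trees and bytes standing in for Blobs. This is a conceptual sketch rather than the GLFS implementation. The way filter_ treats directories (a directory is kept if it matches or if anything beneath it is kept) is an assumption, and the trailing underscore only avoids shadowing the Python builtin.

```python
def place(x, p):
    """Wrap x in a chain of trees, one per component of p."""
    for name in reversed(p.split("/")):
        x = {name: x}
    return x


def pick(x, p):
    """Return the object found at path p inside x."""
    for name in p.split("/"):
        x = x[name]
    return x


def filter_(x, query, base=""):
    """Keep only the entries of x whose full path satisfies query."""
    out = {}
    for name, child in x.items():
        path = base + name
        if isinstance(child, dict):
            sub = filter_(child, query, path + "/")
            if sub or query(path):
                out[name] = sub
        elif query(path):
            out[name] = child
    return out


x = {"a.txt": b"0000", "b": {"foo.txt": b"1111"}, "f.txt": b"3333"}
picked = pick(x, "b/foo.txt")
placed = place(b"x", "a/b/c/d")
only_b = filter_(x, lambda path: path.startswith("b/"))
```

In this model pick undoes place: picking path p from place(x, p) gives back x.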

Imports

Computations in Want are cut off from the network and other external resources. The only way to retrieve information from external resources is through the import system.

importURL(url: String, algo: String, hash: String, transforms: List[String]): Expr

Downloads data from the url. The data will be hashed using the specified algorithm. If the actual hash does not match hash, then the import will fail. Transforms are applied after the hash check.

Hash Algorithms

  • SHA2-256, SHA256
  • SHA2-512, SHA512
  • SHA3-256
  • BLAKE2b-256
  • BLAKE3-256

Transforms

  • ungzip
  • unzstd
  • unxz
  • untar (must be last)
  • unzip (must be last)
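The order of operations described for importURL (check the hash of the raw bytes first, then apply transforms) can be sketched in Python. This is not Want's implementation: the check_and_transform function and its tables are hypothetical, only SHA256, SHA512, and ungzip are modelled, and the "download" is replaced by bytes built in memory so no network is involved.

```python
import gzip
import hashlib

# Hypothetical lookup tables covering a subset of the names listed above.
ALGOS = {"SHA256": hashlib.sha256, "SHA512": hashlib.sha512}
TRANSFORMS = {"ungzip": gzip.decompress}


def check_and_transform(data: bytes, algo: str, expected: str, transforms):
    """Verify the raw bytes first, then apply each transform in order."""
    actual = ALGOS[algo](data).hexdigest()
    if actual != expected:
        raise ValueError(f"hash mismatch: got {actual}, expected {expected}")
    for name in transforms:
        data = TRANSFORMS[name](data)
    return data


# Stand-in for downloaded bytes.
payload = gzip.compress(b"Hello World!\n")
digest = hashlib.sha256(payload).hexdigest()
result = check_and_transform(payload, "SHA256", digest, ["ungzip"])
```

The expected hash is that of the compressed payload, not of the decompressed contents, because the check happens before any transform runs.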

importGit(repoUrl: String, commitHash: String): Expr

Imports the Git Tree from the Commit identified by commitHash.

importOCI(url: String, algo: String, hash: String): Expr

Imports an Open Container Initiative (Docker) Image.

e.g.

want.importOCIImage(
    "docker.io/library/alpine",
    algo="sha256",
    hash="48d9183eb12a05c99bcc0bf44a003607b8e941e1d4f41f9ad12bdcc4b5672f86",
)

unpack(x: Expr, transforms: List[String]): Expr

Unpack isn't a real import, since it doesn't use the network, but it uses the transform functionality from the import system. It takes an existing filesystem expression, applies the transforms in order, and returns the output.

unpack supports the same transforms as importURL.

Statements

Statements can only be used in a statement file (ending in .wants)

put(dst: PathSet, x: Expr): Stmt

Creates a target occupying dst in the build output. The contents of dst will be taken by copying the paths from the evaluation of x.

putFile(dst: String, x: Expr): Stmt

Creates a target, which will be a single path dst in the build output. It will be populated by the evaluation of x, which must evaluate to a file. It is equivalent to put(unit(dst), place(x, dst)).

putDir(dst: String, x: Expr): Stmt

Creates a target occupying prefix(dst) in the build output. It will be populated by the evaluation of x. It is equivalent to put(union([unit(dst), prefix(dst + "/")]), place(x, dst)).