ruleguard: dynamic inspection rules for Go

If you found a typo or a misspelling, please file an issue or send a PR that fixes it.

Original (ru): https://habr.com/post/481696/.

This article introduces a new static analysis library (and CLI utility) go-ruleguard. It’s like a gogrep that is adapted for the use inside your CI pipeline.

You describe static analysis rules in terms of a special Go-like DSL. During the startup, ruleguard turns these definitions into a set of inspections to be executed.

As a bonus, we’ll also talk about go/analysis and it’s predecessors.

Static analysis extensibility

There is a lot of Go linters, but only a few of them can be extended. Usually, you need to write a Go code that uses a special linter API to add new inspections for it.

Two main options are Go plugins and monolith. The monolith implies that all inspections (including your own) are available during the compilation.

revive follows the monolith path as you’re expected to include the new checks into the linter core to integrate them. go-critic supports plugins model that makes it possible to build analyzer extensions independently from the go-critic code. Either way, in the end, you’re going to work with go/ast and go/types plus some linter plugin API to implement the inspection itself. Even the simplest checks can require a significant amount of boilerplate code.

go/analysis will simplify this picture since it provides a framework that can be used by the different static analysis tools, so we get rid of the custom linter API part at least partially. However, it doesn’t simplify the inspection implementation itself.

About go/loader and go/packages

When you’re implementing a Go static analyzer, you want to inspect an AST or SSA form of a target Go program. Before you can do that, source code needs to be properly “loaded”. Simply speaking, loading procedure includes parsing, type checking and dependencies importing.

Every linter has to do all these loading steps in order to perform any meaningful work. loader package was one of the first attempts to make this process less tiresome. With loader it was possible to “load” all you need with a couple of function calls. loader became deprecated before it could stop being experimental. packages was a new way to do source code loading: it has an improved API and works with Go modules.

Some time ago I created lintpack - a linters framework that was used inside go-critic. It could build a linter executable that includes different inspections providers. Now it’s deprecated as well because we have analysis framework from the Go team.

Nowadays you’re encouraged to use the above-mentioned analysis framework to create a static analysis tool without using Go parser of go/packages directly. The analysis package gives you a paradigm, a structure that you have to follow. You get a lot of good features and utilities in return. For instance, it greatly simplifies analyzer testing. If you wonder, lintpack also had linttest package that does the same, only the magic comment syntax is different.

ruleguard - created to be extended

go-ruleguard is a static analysis tool that includes zero inspections by default.

Rule definitions are loaded during the start from a special gorules file which describes bad code patterns in a declarative way. For every such pattern, there is an associated message to be printed if the pattern would match. That file is an extension point and is intended to be edited by the users.

You don’t need to re-compile the linter driver (main) program just to register new checks. This is why we can call these rules dynamic.

ruleguard driver program looks like this:

package main

import (
	"github.com/quasilyte/go-ruleguard/analyzer"
	"golang.org/x/tools/go/analysis/singlechecker"
)

func main() {
	singlechecker.Main(analyzer.Analyzer)
}

analyzer is implemented via ruleguard package. If you want to use ruleguard as a library, this is the right package to use.

ruleguard VS revive

Let’s take a simple yet real code example. Imagine that we want to bad runtime.GC() calls in our programs. Revive has a call-to-gc diagnostic for that.

call-to-gc implementation (70 lines of code):

package rule

import (
	"go/ast"

	"github.com/mgechev/revive/lint"
)

// CallToGCRule lints calls to the garbage collector.
type CallToGCRule struct{}

// Apply applies the rule to given file.
func (r *CallToGCRule) Apply(file *lint.File, _ lint.Arguments) []lint.Failure {
	var failures []lint.Failure
	onFailure := func(failure lint.Failure) {
		failures = append(failures, failure)
	}

	var gcTriggeringFunctions = map[string]map[string]bool{
		"runtime": map[string]bool{"GC": true},
	}

	w := lintCallToGC{onFailure, gcTriggeringFunctions}
	ast.Walk(w, file.AST)

	return failures
}

// Name returns the rule name.
func (r *CallToGCRule) Name() string {
	return "call-to-gc"
}

type lintCallToGC struct {
	onFailure             func(lint.Failure)
	gcTriggeringFunctions map[string]map[string]bool
}

func (w lintCallToGC) Visit(node ast.Node) ast.Visitor {
	ce, ok := node.(*ast.CallExpr)
	if !ok {
		return w // nothing to do, the node is not a call
	}

	fc, ok := ce.Fun.(*ast.SelectorExpr)
	if !ok {
		return nil // nothing to do, the call is not of the form pkg.func(...)
	}

	id, ok := fc.X.(*ast.Ident)

	if !ok {
		return nil // in case X is not an id (it should be!)
	}

	fn := fc.Sel.Name
	pkg := id.Name
	if !w.gcTriggeringFunctions[pkg][fn] {
		return nil // it isn't a call to a GC triggering function
	}

	w.onFailure(lint.Failure{
		Confidence: 1,
		Node:       node,
		Category:   "bad practice",
		Failure:    "explicit call to the garbage collector",
	})

	return w
}

This is how it’s done in ruleguard:

package gorules

import "github.com/quasilyte/go-ruleguard/dsl/fluent"

func callToGC(m fluent.Matcher) {
	m.Match(`runtime.GC()`).Report(`explicit call to the garbage collector`)
}

In my opinion, this approach is almost as terse as we can get without sacrificing the Go syntax which is useful for us to get tooling support when editing gorules.

We’ll get to the more exciting examples now, but you should already see the difference.

Quickstart

There is a rangeExprCopy checker in go-critic linter. It finds potentially unwanted array copying.

This code is iterated over a copy of the array:

var xs [2048]byte
for _, x := range xs { // Copies 2048 bytes
	// Loop body.
}

Since every iteration does element copy as well, we copy 2 times more bytes than we might expect.

A fix is quite simple, it requires only 1 character addition:

  var xs [2048]byte
- for _, x := range xs {  // Copies 2048 bytes
+ for _, x := range &xs { // No copy
  	// Loop body.
  }

Most likely, you don’t need that excessive array value copy. Fixed code performance is almost always superior, especially for a bigger array size. You can either wait until the Go compiler is better or you can find such code patterns and fix them today.

This inspection can be implemented in terms of ruleguard DSL:

package gorules

import "github.com/quasilyte/go-ruleguard/dsl/fluent"

func _(m fluent.Matcher) {
    m.Match(`for $_, $_ := range $x { $*_ }`,
            `for $_, $_ = range $x { $*_ }`).
            Where(m["x"].Addressable && m["x"].Type.Size >= 128).
            Report(`$x copy can be avoided with &$x`).
            At(m["x"]).
            Suggest(`&$x`)
}

This rule finds all for-range loops which uses both loop variables (only this case leads to unwanted copy). Where() requires that iterated expression $x is addressable and its size should be at least 128 bytes.

Report() defines an associated message that should be given to the user if the pattern is matched. Suggest() specifies a quickfix pattern that can be used by your editor through gopls (or other Go LSP) or via command-line API if you use -fix parameter (we’ll discuss that later in more detail). At() binds the warning and quickfix location to the specific part of the match. We need that location specification so we rewrite $x to &$x instead of replacing the entire for loop with that.

Both Report() and Suggest() accept a template-like string that can reference pattern submatches. Predefined variable $$ refers to the entire match, like $0 in the regular expressions.

To try it out, let’s create a rangecopy.go file:

package example

// sizeof(builtins[...]) = 240 on x86-64
var builtins = [...]string{
	"append", "cap", "close", "complex", "copy",
	"delete", "imag", "len", "make", "new", "panic",
	"print", "println", "real", "recover",
}

func builtinID(name string) int {
	for i, s := range builtins {
		if s == name {
			return i
		}
	}
	return -1
}

Now we run the ruleguard:

$ ruleguard -rules rules.go -fix rangecopy.go
rangecopy.go:12:20: builtins copy can be avoided with &builtins

If we look into rangecopy.go again, we’ll see the fixed result, because ruleguard was called with -fix argument.

By the way, simpler rules can be debugged without gorules file:

$ ruleguard -c 1 -e 'm.Match(`return -1`)' rangecopy.go
rangecopy.go:17:2: return -1
16		}
17		return -1
18	}

Thanks to the singlechecker package, we have -c option that controls how many “context lines” are printed along with the match result.

This option is a little bit weird: default value is -c=-1 which means “no context lines” while -c=0 gives you exactly one context line (the matched line itself).

Some more notable features of gorules:

  • Type templates to match expected types. For example, map[$t]$t describes all maps that have key type identical to the element type and *[$len]$elem matches all pointers to arrays.
  • There can be multiple rules inside one function. Functions themselves are called rule groups.
  • All rules inside a group are applied one after another, in the order they are defined. The first matched rule makes all remaining rules to be skipped for the matched node. This is needed in cases where patterns are defined with priorities in mind: when you want to rewrite $x=$x+$y to $x+=$y it would be desirable to have a higher priority pattern $x=$x+1 that is rewritten to $x++ instead of $x+=1.

See more information about the DSL in docs/gorules.md file.

Multi-rule function example

package gorules

import "github.com/quasilyte/go-ruleguard/dsl/fluent"

func exampleGroup(m fluent.Matcher) {
        // Find potentially incorrect usages of json.Decoder.
        // See http://golang.org/issue/36225
        m.Match(`json.NewDecoder($_).Decode($_)`).
                Report(`this json.Decoder usage is erroneous`)

        // Smart unconvert, removes redundant conversions.
        m.Match(`time.Duration($x) * time.Second`).
                Where(m["x"].Const).
                Suggest(`$x * time.Second`)

        // Suggest to replace fmt.Sprint($x) with a call to String() method
        // if $x has such method.
        m.Match(`fmt.Sprint($x)`).
                Where(m["x"].Type.Implements(`fmt.Stringer`)).
                Suggest(`$x.String()`)

        // Simplify some boolean expressions.
        m.Match(`!($x != $y)`).Suggest(`$x == $y`)
        m.Match(`!($x == $y)`).Suggest(`$x != $y`)
}

If a rule has no explicit Report() call, Suggest() message is used instead.

Submatch filters can have various property constraints:

  • Var.Pure requires an expression to be side-effect-free.
  • Var.Const expects an expression to be usable inside a const context
  • And more…

For package-qualified type names like fmt.Stringer you need to use Import() method. For convenience reasons, all stdlib packages are imported by default, this is why we don’t need to import anything in the examples above.

quickfix actions

quickfix actions are implemented by analysis framework, we get them for free.

In the analysis model, analyzer generates diagnostics and facts. Diagnostics are sent to the users, facts are used by other analyzers.

A diagnostic can have a list of suggested fixes. Every suggestion tells how to modify source code in a given location to resolve a problem found by a diagnostic.

More detailed description can be found inside suggested_fixes.md design document.

Closing words

Try ruleguard on your projects.

In case you found any bug in ruleguard or you have a feature request, please open an issue and let me know.

Here are some ideas on how you can use ruleguard:

  • Custom inspections implementations.
  • Automated code modernization or refactoring with -fix.
  • Collect and process code statistics with the help of -json flag.

ruleguard development plans:

Useful links and resources