Rego Treesitter implementation

In my daily routine, I spend most of my time in NeoVim, where I write policies for a corporate application. By default, NeoVim relies on a regex-based syntax system, which handles basic OPA policies. I needed more than that, so I started investigating how to get a better experience.

Treesitter libraries

I won't copy-paste the official documentation here; to get the general principle, you can see this page.

In short, you will need a repo. In this case, my repo is tree-sitter-rego. Initialize the repository as an npm package:

yarn init -y

This will create a package.json with defaults, which you can modify.

Then install the necessary dependencies:

yarn add -D tree-sitter-cli nan

Next, create a file for the grammar. Name it grammar.js, which is the tree-sitter naming convention. Fill it with the following to create a Rego parser:

module.exports = grammar({
  name: 'rego',

  rules: {
    source_file: $ => repeat($._definition),

    _definition: $ => choice(
      $.package_definition,
      $.import_package,
      $.operator_check,
      $.comment,
      $.rego_block,
      $.builtin_function,
      $._junk
    ),

    operator: $ => choice(
      '==',
      ':=',
      '=',
      '!=',
      '<',
      '>',
      '/',
      '-',
      '+',
    ),

    true: $ => 'true',
    false: $ => 'false',

    comma: $ => ',',
    
    comment: $ => /#.*/,

    function_name: $ => choice(
      'lower',
      'is_string',
      'object.get',
      'print',
      'concat',
      'contains',
      'time.now_ns',
      'io.jwt.encode_sign_raw',
      'io.jwt.encode_sign',
      'io.jwt.decode',
      'io.jwt.verify_es256',
      'strings.replace_n',
      'http.send',
    ),

    opening_parameter: $ => '(',

    closing_parameter: $ => ')',

    builtin_function: $ => seq(
      field('function_name', 
        $.function_name
      ),
      field('opening_parameter', $.opening_parameter),
      field('function_body', 
        repeat(
          choice(
            $.identifier,
            $.array_definition,
            $.true,
            $.false,
            $.number,
            $.object_field,
            $.string_definition,
            $.comma,
          ),
        ),
      ),
      field('closing_parameter', $.closing_parameter),
    ),

    string_definition: $ => seq(
      '"',
      /[a-zA-Z0-9<>@\-._:=\s\/\\]*/,
      '"',
    ),

    _array_opening: $ => '[',
    _array_closing: $ => ']',

    object_field: $ => prec(
      1, 
      seq(
        /[a-zA-Z\._]+\[/,
        choice(
          $.identifier,
          $.number,
          $.object_field,
          $.string_definition
        ),
        $._array_closing,
      ),
    ),

    array_definition: $ => seq(
      $._array_opening,
      repeat(
        choice(
          $.array_definition,
          $.string_definition,
          $.identifier,
          $.number,
          $.object_field,
          $.true,
          $.false,
          $.comma,
        ),
      ),
      $._array_closing,
    ),

    operator_check: $ => seq(
      choice(
        $.identifier,
        $.builtin_function,
        $.string_definition,
        $.object_field,
        $.array_definition,
        $.true,
        $.false,
      ),
      $.operator,
      choice(
        $.identifier,
        $.builtin_function,
        $.string_definition,
        $.object_field,
        $.array_definition,
        $.true,
        $.false,
      ),
    ),

    rego_rule: $ => prec(1, choice(
      $.identifier,
      $.operator_check,
      $.array_definition,
      $.test_case,
      $.true,
      $.false,
    ),),

    test_case: $ => seq(
      $.identifier,
      repeat(
        seq(
          $.reserved_keywords,
          $.identifier,
        ),
      ),
    ),

    rego_block: $ => seq(
      field('rego_rule_name', $.identifier),
      optional(
        seq(
          $.operator,
          $.identifier,
        ),
      ),
      '{',
        repeat($.rego_rule),
      '}',
    ),

    _junk: $ => /\n/,

    reserved_keywords: $ => choice(
      'as',
      'with',
    ),
    
    as_keyword: $ => seq(
      $.reserved_keywords,
      field('package_alias', $.identifier),
    ),

    import_package: $ => seq(
      'import',
      field('imported_package_name', 
        choice(
          $.identifier,
        ),
      ),
      optional($.as_keyword),
    ),

    package_definition: $ => seq(
      'package',
      field('package_name', $.identifier),
    ),

    identifier: $ => /[a-zA-Z\._]+/,

    number: $ => /\d+/
  }
});
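
The terminal rules in this grammar are ordinary regular expressions, so their reach can be checked in plain Node before running the generator. The sketch below copies the identifier, number, and string-body patterns from grammar.js and anchors them for whole-string matching. The helper matchesWhole and the sample inputs are illustrative assumptions, not part of the parser, and tree-sitter uses its own regex engine, so treat this as an approximation only.

```javascript
// Patterns copied from the terminal rules in grammar.js.
const patterns = {
  identifier: /[a-zA-Z\._]+/,
  number: /\d+/,
  stringBody: /[a-zA-Z0-9<>@\-._:=\s\/\\]*/,
};

// Hypothetical helper: true when the whole input matches the rule's pattern.
function matchesWhole(rule, text) {
  const anchored = new RegExp(`^(?:${patterns[rule].source})$`);
  return anchored.test(text);
}

// Illustrative inputs, not taken from a real policy.
const samples = [
  ['identifier', 'data.example.allow'],
  ['identifier', 'input.user1'],
  ['number', '42'],
  ['stringBody', 'Bearer abc.def'],
  ['stringBody', 'a, b'],
];

for (const [rule, text] of samples) {
  console.log(`${rule} ${JSON.stringify(text)} -> ${matchesWhole(rule, text)}`);
}
```

In this approximation, an identifier containing a digit (input.user1) and a string containing a comma are not matched as a whole, which hints at where the character classes may need widening for real policies.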

After this, you can run the tree-sitter generate command. For convenience, you can set up aliases in the scripts section of package.json, which makes it easy to regenerate the parser and verify the correctness of your grammar.

{
  "scripts": {
    "generate": "tree-sitter generate",
    "build": "tree-sitter generate && node-gyp configure && node-gyp build",
    "test": "tree-sitter test"
  }
}
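
Once the build script has finished, the parser can also be tried from Node. This is a minimal sketch under a few assumptions: the tree-sitter runtime package is installed separately (for example with yarn add -D tree-sitter, which is not part of the dependencies listed earlier), the generated Node binding lives under ./bindings/node, and the names parseRego and samplePolicy are made up for illustration.

```javascript
// Assumption: `tree-sitter` (the Node runtime binding) is installed and
// `yarn build` has produced the native module for this grammar.
function parseRego(source, languagePath = './bindings/node') {
  // Loaded lazily so that defining the function does not need the native module.
  const Parser = require('tree-sitter');
  const Rego = require(languagePath);

  const parser = new Parser();
  parser.setLanguage(Rego);

  const tree = parser.parse(source);
  // S-expression form of the syntax tree, handy for eyeballing node names.
  return tree.rootNode.toString();
}

// Illustrative policy that sticks to constructs the grammar defines.
const samplePolicy = [
  'package example',
  'import data.users as u',
  '',
  'allow {',
  '  input.method == "GET"',
  '}',
  '',
].join('\n');

// Usage, after a successful build:
//   console.log(parseRego(samplePolicy));
```

Printing the tree this way shows quickly whether a rule produces the expected nodes or falls back to ERROR nodes.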

The complete source for the parser is located here: https://github.com/FallenAngel97/tree-sitter-rego

Photo by Taneli Lahtinen on Unsplash