Beyond Syntax Highlighting in Neovim:
Unlocking the Power of Tree-sitter

atusy

Special Thanks

Questions

  • Do you know tree-sitter?
  • Do you use tree-sitter?
  • Do you use tree-sitter in Vim?

What is tree-sitter in general?

  • A parsing library (enough for today)
  • More specifically, a parser generator tool and an incremental parsing library
    • supports diverse programming languages
    • fast enough to react to text changes, e.g., on every keystroke
    • robust enough to give useful results even with syntax errors

What is tree-sitter in Neovim/Vim?

  • In Neovim
    • Builtin feature to support syntax-aware features, e.g., syntax highlighting
  • In Vim
    • Not builtin

Is tree-sitter for syntax highlighting?

  • No, tree-sitter is just a parsing library
  • Applicable to a variety of syntax-aware features
    • and Neovim has already integrated tree-sitter into many features

How many of these are builtin?

  1. Syntax highlighting
  2. Code folding
  3. Outline
  4. Pairing keywords (like matchpair)
  5. Toggling comments
  6. Popup menu (e.g., Open URL)
  7. Open help in browser
  8. Indent
  9. Range selection/textobject
  10. Sticky scroll

The answer is ... 7/10

  • There are more builtin and plugin features
  • The power of tree-sitter is already unlocked in Neovim!!
  • If you use Vim, sorry for the inconvenience..., but I have good news today

Today's goal

Recognize tree-sitter as a tool for building your own workflow by

  1. Exploring use cases beyond syntax highlighting
  2. Showing insightful tree-sitter integration patterns
  3. Introducing treesitter-ls, a language server

What I don't cover today

  • Details of configurations and plugin development
  • Deep dive into tree-sitter internals

Use cases of Neovim builtin features

Let's learn

  • Usage
  • Implementation pattern
  • Insight

from a variety of features

Syntax Highlighting

  • Usage
    • vim.treesitter.start() to start syntax highlighting given a parser

Syntax Highlighting

  • Implementation pattern
    • Parser determines code structure
      • e.g., "foo" is a string node
    • Query searches for what to highlight
      • e.g., (string) @string
    • Captures are regarded as highlight groups
      • e.g., :hi @string guifg=Black
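The three steps above can be mimicked outside Neovim as a toy model. This is not Neovim's implementation: Python's standard `ast` module stands in for a tree-sitter parser, a hypothetical `QUERY` dict plays the role of highlights.scm, and a hypothetical `HIGHLIGHT_GROUPS` dict plays the role of `:hi` definitions.

```python
import ast

# Hypothetical stand-ins: QUERY plays highlights.scm (node type -> capture),
# HIGHLIGHT_GROUPS plays `:hi @string guifg=Black`.
QUERY = {ast.Constant: "@string"}
HIGHLIGHT_GROUPS = {"@string": {"guifg": "Black"}}


def highlight(source: str) -> list:
    """Return sorted (line, start_col, end_col, capture) tuples for captured nodes."""
    tree = ast.parse(source)  # 1. the parser determines code structure
    marks = []
    for node in ast.walk(tree):
        capture = QUERY.get(type(node))  # 2. the query decides what to capture
        # ast.Constant also covers numbers, so keep only string literals.
        if capture and isinstance(node.value, str):
            marks.append((node.lineno, node.col_offset, node.end_col_offset, capture))
    return sorted(marks)


for line, start, end, capture in highlight('greeting = "foo"\nanswer = 42\n'):
    # 3. the capture name selects the highlight group
    print(line, start, end, capture, HIGHLIGHT_GROUPS[capture])
```

Only the string literal is captured; the number `42` is parsed but no query entry selects it.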

Definitions of terms

  • Query
    • A language to search for code structures like (string) @string
    • Stored in files per language and purpose like ~/.config/nvim/queries/python/highlights.scm
  • Capture
    • A name given to a node matched by a query, e.g., @string in (string) @string

Syntax Highlighting

  • Demo with examples/url.py and runtime/queries/python/highlights.scm
    1. Highlight nothing
    2. Highlight comment and string
    3. Highlight all string_content in string
    4. Highlight only URL-like string_content

Syntax Highlighting

  • Example 1
    • Query string nodes
; node    ->  capture (highlight group)
(string)      @string

Syntax Highlighting

  • Example 2
    • Query URL-like string_content
      • by testing a nested pattern and a regex
(string
  (string_content) @string.special.url
  (#match? @string.special.url "^https?://\\S+$"))
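For comparison, the same idea (a nested node pattern plus a regex predicate) can be sketched in plain Python. It is only an analogy: `ast` stands in for the tree-sitter parser, `url_strings` is a hypothetical helper, and the regex is the query's pattern with the query-string escaping (`\\S`) removed.

```python
import ast
import re

# The #match? regex, written as a Python raw string.
URL_PATTERN = re.compile(r"^https?://\S+$")


def url_strings(source: str) -> list:
    """Return (line, text) for string literals whose whole content looks like a URL."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Nested pattern: a string literal node (ast.Constant holding a str) ...
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            # ... whose content also satisfies the regex predicate.
            if URL_PATTERN.match(node.value):
                found.append((node.lineno, node.value))
    return sorted(found)


example = '''
homepage = "https://github.com/atusy/treesitter-ls"
label = "not a url"
mixed = "see https://example.com"
'''
print(url_strings(example))
```

The anchored `^` and `$` make the predicate reject `"see https://example.com"`, just as the query only highlights strings that are URLs in their entirety.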

Syntax Highlighting

  • Insight
    • Queries can capture complex patterns of nodes
      • e.g., nested patterns and regex predicates
    • Queries are customizable
    • Users can define what to highlight per filetype without parser modification

Code folding with foldexpr

  • Usage
    • Enables structure-aware code folding based on the syntax tree
:set foldmethod=expr
:set foldexpr=v:lua.vim.treesitter.foldexpr()

Code folding with foldexpr

  • Implementation pattern
    • Query searches for what to fold
      • e.g., (function_definition) @fold
    • Neovim determines how to fold
      • by calculating fold levels of the captures
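As a simplified model of "calculating fold levels of the captures", the fold level of a line can be taken as the number of captured nodes that span it. The sketch assumes Python's `ast` in place of tree-sitter and treats `FunctionDef` as the `@fold` capture; `fold_levels` is a hypothetical helper, and Neovim's actual foldexpr handles more details than this.

```python
import ast


def fold_levels(source: str) -> list:
    """Fold level of each line: how many captured function definitions contain it."""
    levels = [0] * len(source.splitlines())
    for node in ast.walk(ast.parse(source)):
        # Stand-in for `(function_definition) @fold`.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # A single-line definition has nothing to fold, so skip it.
            if node.end_lineno > node.lineno:
                for i in range(node.lineno - 1, node.end_lineno):
                    levels[i] += 1
    return levels


example = """def outer():
    def inner():
        return 1
    return inner
x = 1
"""
print(fold_levels(example))
```

Nesting falls out naturally: lines inside `inner` get level 2 because both captures span them.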

Code folding with foldexpr

  • Insight
    • Users can customize foldable nodes per filetype without parser modification

Language Injections

  • Usage
    • Apply tree-sitter features to embedded languages by recursive parsing
      • e.g., highlight code blocks

Language Injections

  • Implementation pattern
    • Query determines what to inject
      • embedded source code as @injection.content
      • its language as @injection.language, or a fixed one via #set! injection.language
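The recursive parsing behind injections can be sketched as: find regions, read their language, and hand each region to that language's parser. In this toy version a regex over Markdown stands in for the fenced_code_block query, and a hypothetical `PARSERS` registry maps language names to parsers (only Python's `ast.parse` here).

```python
import ast
import re

# Stand-in for the fenced_code_block query: group 2 plays @injection.language,
# group 3 plays @injection.content. Fences may use three backticks or tildes.
FENCE = re.compile(r"^(`{3}|~{3})(\w+)\n(.*?)^\1$", re.MULTILINE | re.DOTALL)

# Hypothetical registry of parsers by language name.
PARSERS = {"python": ast.parse}


def parse_injections(markdown: str) -> list:
    """Parse each injected region with its language's parser (recursive parsing)."""
    results = []
    for match in FENCE.finditer(markdown):
        language, content = match.group(2), match.group(3)
        parser = PARSERS.get(language)
        # Without a parser, the region stays plain text (tree is None).
        results.append((language, parser(content) if parser else None))
    return results


doc = "# Notes\n\n~~~python\nx = 1\n~~~\n\n~~~unknown\nstuff\n~~~\n"
for language, tree in parse_injections(doc):
    print(language, type(tree).__name__)
```

Once a region has its own tree, any tree-based feature (highlighting, folding, etc.) can run on it as if it were a separate file.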

Language Injections

  • Example 1
    • Parse markdown code blocks
(fenced_code_block
  (info_string
    (language) @injection.language)
  (code_fence_content) @injection.content)

Language Injections

  • Example 2
    • Parse URL-like strings as URIs
((string_content) @injection.content
  (#match? @injection.content "^[a-zA-Z][a-zA-Z0-9]*://\\S+$")
  (#set! injection.language "uri"))

Language Injections

  • Insight
    • Opens the door to applying language-specific features (highlighting, folding, etc.) to embedded content

Context-aware popup menu

  • Usage
    • Right-click popup menu shows context-specific actions

Context-aware popup menu

  • Example
    • "Open in web browser" for URL-related nodes

Context-aware popup menu

  • Implementation pattern
    • Query attaches url metadata to nodes
    • Lua code checks whether the node has url metadata
(
  (inline_link
      (link_destination) @_url
  ) @_label
  (#set! @_label url @_url)
)
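The split between "query sets metadata" and "code checks metadata" can be modeled in a few lines. This is a toy: a regex stands in for the inline_link query, each match carries a metadata dict as `#set! @_label url @_url` would, and `url_at_cursor` is a hypothetical language-agnostic lookup, not Neovim's popup-menu code.

```python
import re

# Stand-in for the inline_link query: the whole link plays @_label and
# group 2 plays @_url.
INLINE_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


def captured_nodes(line: str) -> list:
    """Return (start, end, metadata) per link, like query matches after #set!."""
    return [(m.start(), m.end(), {"url": m.group(2)}) for m in INLINE_LINK.finditer(line)]


def url_at_cursor(line: str, col: int):
    """Language-agnostic part: does the node at col carry url metadata?"""
    for start, end, metadata in captured_nodes(line):
        if start <= col < end and "url" in metadata:
            return metadata["url"]
    return None  # no "Open in web browser" entry at this position


text = "See [treesitter-ls](https://github.com/atusy/treesitter-ls) for details."
print(url_at_cursor(text, 6))  # inside the link
print(url_at_cursor(text, 0))  # on "See"
```

The lookup never inspects Markdown syntax; any language whose query sets `url` metadata would work with it unchanged.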

Context-aware popup menu

  • Insight
    • Setting metadata lets queries mark complex patterns of what to process

Quick summary 1

  • The query-based approach is a common pattern
    • Queries define what to process in a language-specific way
    • Neovim APIs define how to process it, often in a language-agnostic way
  • Customize queries to meet your needs

Tips to get started with queries

Use cases in plugins

Some of my favorites...

andymass/vim-matchup

  • Usage
    • Navigate and highlight matching keywords
      • keywords: if/else/end, ...
      • quotes: "", '', ``, ...
      • braces: (), [], {}, ...
  • Implementation pattern
    • Query captures open/mid/close nodes
    • Uses special query files, matchup.scm
      • This avoids conflicts with other queries such as highlights.scm and folds.scm
  • Example query for Lua loops

https://github.com/andymass/vim-matchup/blob/ca538c3b/after/queries/lua/matchup.scm?plain=1#L3-L5

(
  for_statement
  "do" @open.loop
  "end" @close.loop
) @scope.loop
  • Insight
    • Queries define what to match in a language-specific way, and Lua code handles how in a language-agnostic way
    • Plugin-specific query files are a good pattern to avoid conflicts among multiple plugins
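The language-agnostic half of this pattern needs no parser at all: given the captures of one `@scope.loop` match, jumping between keywords is just cycling through their positions. The positions and the `next_match` helper below are hypothetical, not vim-matchup's code; positions are 0-based (row, col) pairs.

```python
# Lua snippet the hypothetical captures refer to (0-based row, col):
#   0: for i = 1, 10 do
#   1:   print(i)
#   2: end
# Shaped like one match of the query above: "do" -> @open.loop, "end" -> @close.loop.
loop_scope = {"open": (0, 14), "close": (2, 0)}


def next_match(scope: dict, cursor: tuple):
    """Return the next keyword position in the same scope, wrapping around."""
    order = [scope["open"], *scope.get("mid", []), scope["close"]]
    if cursor not in order:
        return None  # the cursor is not on a captured keyword
    return order[(order.index(cursor) + 1) % len(order)]


print(next_match(loop_scope, (0, 14)))  # jumps from "do" to "end"
print(next_match(loop_scope, (2, 0)))   # wraps from "end" back to "do"
```

Because the helper only sees open/mid/close positions, the same code serves every language that ships a matchup.scm.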

Sticky scroll

nvim-treesitter/nvim-treesitter-context

Sticky scroll

  • Usage
    • Show parent contexts at the top of the window
      • Markdown headings
      • Function/method/class definitions

Sticky scroll

  • Implementation pattern
    • Use queries to capture context nodes (@context)
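A simplified version of the sticky-scroll computation: collect the captured context nodes that started above the first visible line and have not ended yet. Here Python's `ast` replaces tree-sitter, a hypothetical `CONTEXT_TYPES` tuple replaces the `@context` query, and `top_line` is assumed to be the first line visible in the window.

```python
import ast

# Stand-in for a query capturing @context nodes.
CONTEXT_TYPES = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def sticky_context(source: str, top_line: int) -> list:
    """Header lines of context nodes that started above top_line (1-based) and are still open."""
    lines = source.splitlines()
    headers = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, CONTEXT_TYPES) and node.lineno < top_line <= node.end_lineno:
            headers.append((node.lineno, lines[node.lineno - 1]))
    # Outer contexts start earlier, so sorting by line puts them first.
    return [text for _, text in sorted(headers)]


example = '''class Greeter:
    def greet(self, name):
        message = "hi " + name
        return message
'''
print(sticky_context(example, top_line=4))
```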

Sticky scroll

  • Insight
    • Yet another example of the query-based approach

Show context at the ends of functions, methods, statements, ...

haringsrob/nvim_context_vt

Show context at the ends of functions, methods, statements, ...

  • Usage
    • Shows virtual text of the current context after functions, methods, statements, etc.

Show context at the ends of functions, methods, statements, ...

  • Implementation pattern
    • Hard-codes node types in Lua instead of using queries
      • Depends heavily on parsers
      • Common node types (e.g., function) allow a partially language-agnostic implementation
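The parser-based approach puts the node types straight into code. The sketch below mirrors that shape with Python's `ast` standing in for a tree-sitter parser; the `CONTEXT_LABELS` table and `virtual_texts` helper are hypothetical, not nvim_context_vt's code.

```python
import ast

# Parser-based: node types are hard-coded here rather than listed in a query.
CONTEXT_LABELS = {
    ast.FunctionDef: lambda node: "def " + node.name,
    ast.ClassDef: lambda node: "class " + node.name,
    ast.For: lambda node: "for",
    ast.While: lambda node: "while",
}


def virtual_texts(source: str) -> dict:
    """Map the last line (1-based) of each multi-line context node to its labels."""
    texts = {}
    for node in ast.walk(ast.parse(source)):
        label = CONTEXT_LABELS.get(type(node))
        if label and node.end_lineno > node.lineno:
            texts.setdefault(node.end_lineno, []).append(label(node))
    return texts


example = '''def total(items):
    result = 0
    for item in items:
        result += item
    return result
'''
for line, labels in sorted(virtual_texts(example).items()):
    print(line, "-> " + ", ".join(labels))
```

Supporting another language means extending the table with that parser's node types, which is where shared names like "function" pay off.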

Show context at the ends of functions, methods, statements, ...

  • Insight
    • The parser-based approach can be more language-agnostic because parsers tend to share common node types

Quickly select syntactic regions

atusy/treemonkey.nvim

Quickly select syntactic regions

  • Usage
    • Quickly select syntactic regions by label hints

Quickly select syntactic regions

  • Implementation pattern
    • Get ranges of ancestor nodes by traversing the syntax tree from the cursor position
    • Use two-step selection to disambiguate overlapping label hints
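Tree traversal from the cursor needs only child iteration and node positions, never node types. The sketch uses Python's `ast` as the tree; `ancestor_ranges` is a hypothetical helper, and removing duplicate ranges is an assumption of this sketch (treemonkey itself uses two-step labels, which are not modeled here).

```python
import ast


def node_range(node):
    # (start_line, start_col, end_line, end_col); lines are 1-based as in ast.
    return (node.lineno, node.col_offset, node.end_lineno, node.end_col_offset)


def ancestor_ranges(source: str, line: int, col: int) -> list:
    """Ranges of all nodes containing (line, col), innermost first, duplicates removed."""
    ranges = []
    stack = [ast.parse(source)]
    while stack:
        node = stack.pop()
        for child in ast.iter_child_nodes(node):
            if not hasattr(child, "end_col_offset"):
                stack.append(child)  # position-less node (e.g., arguments): just descend
                continue
            start = (child.lineno, child.col_offset)
            end = (child.end_lineno, child.end_col_offset)
            if start <= (line, col) < end:
                ranges.append(node_range(child))
                stack.append(child)
    # Nodes sharing one range would get identical labels, so keep one of each.
    return list(dict.fromkeys(ranges))[::-1]


print(ancestor_ranges("print(len(items))\n", 1, 11))
```

With the cursor on `items`, the candidates grow outward: `items`, then `len(items)`, then the whole statement.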

Quickly select syntactic regions

  • Insight
    • Tree-traversal-based approach
      • Neither query-based nor parser-based
      • Only the tree structure matters
      • Parser-agnostic and query-free

Extra highlight for special nodes

atusy/tsnode-marker.nvim

Extra highlight for special nodes

  • Usage
    • Highlight nodes that satisfy conditions
    • Supports highlighting to the end of the line

Extra highlight for special nodes

  • Examples
    • Highlight markdown fenced code blocks
    • Highlight nested function definitions

Extra highlight for special nodes

  • Implementation pattern
    • Finds nodes that satisfy either of
      • @tsnodemarker highlight capture
      • User-defined callback functions
    • Applies additional highlights via extmarks
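A callback-based condition can express things a single pattern cannot easily state, such as "a function nested anywhere inside another function". In this toy version Python's `ast` is the tree, `nested_function` is a hypothetical user callback, and returning whole-line ranges stands in for highlighting to the end of the line with extmarks.

```python
import ast


def nested_function(node, ancestors) -> bool:
    """User-defined callback: a function definition nested in another function."""
    return isinstance(node, ast.FunctionDef) and any(
        isinstance(a, ast.FunctionDef) for a in ancestors
    )


def marked_line_ranges(source: str, predicate) -> list:
    """Visit every node with its ancestors; return (first_line, last_line) where predicate holds."""
    marks = []

    def visit(node, ancestors):
        if predicate(node, ancestors):
            # Whole lines, mimicking highlighting to the end of the line.
            marks.append((node.lineno, node.end_lineno))
        for child in ast.iter_child_nodes(node):
            visit(child, ancestors + [node])

    visit(ast.parse(source), [])
    return marks


example = """def outer():
    def inner():
        return 1
    return inner
"""
print(marked_line_ranges(example, nested_function))
```

The traversal stays generic; all the language- and use-case-specific knowledge lives in the callback the user passes in.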

Extra highlight for special nodes

  • Insight
    • Callback-based approach allows flexible customization beyond query capabilities

And more...

  1. Auto-close keywords
  2. Outlines

And more...

  1. Textobjects
  2. Commenting

Quick summary 2

  • A wide variety of plugins use tree-sitter
  • Tree-sitter integrations take diverse approaches
    • query
    • parser
    • tree-traversal
    • callback

Quick summary of approaches

  1. Query-based
    • Highly declarative
    • Requires a query per language
  2. Parser-based
    • Requires Lua code to identify what to process
    • Partially language-agnostic logic with the help of common node types

Quick summary of approaches

  3. Tree-traversal-based
    • Applies when only the tree structure matters
    • Parser-agnostic and query-free
  4. Callback-based
    • Requires user-defined Lua functions
    • Allows maximum flexibility

Yet another approach to bring tree-sitter power to editors

🚧 Treesitter-ls 🚧

A WIP language server powered by tree-sitter

https://github.com/atusy/treesitter-ls

Language servers provide code intelligence tools

  • Go to Definition
  • Find References
  • Folding Range
  • Selection Range
  • Rename
  • Semantic Tokens

Typical language servers are language-specific

such as ...

  • pyright (Python)
  • tsserver (TypeScript/JavaScript)
  • rust-analyzer (Rust)

Are language servers language-specific?

Can treesitter-ls provide code intelligence tools?

Yes, for example:

  • Semantic Tokens (highlighting)
  • Go to Definition
  • Find References
  • Folding Range
  • Selection Range
  • Rename
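To illustrate how a server with only a parser can answer an LSP request, the sketch below turns syntax-tree ranges into a `textDocument/foldingRange` result (objects with 0-based `startLine` and `endLine`, as in the LSP specification). Python's `ast` stands in for tree-sitter, and the hard-coded `FOLDABLE` tuple is an assumption; a tree-sitter-based server could read foldable nodes from a folds.scm-style query instead.

```python
import ast
import json

# Hypothetical set of foldable node types.
FOLDABLE = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def folding_ranges(source: str) -> list:
    """Build an LSP foldingRange result (0-based lines) from the syntax tree."""
    result = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, FOLDABLE) and node.end_lineno > node.lineno:
            # ast lines are 1-based; LSP lines are 0-based.
            result.append({"startLine": node.lineno - 1, "endLine": node.end_lineno - 1})
    return sorted(result, key=lambda r: r["startLine"])


example = """class Greeter:
    def greet(self):
        return "hi"
"""
print(json.dumps(folding_ranges(example)))
```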

Will treesitter-ls replace language-specific servers?

No, because ...

  • Some require deeper semantic understanding
    • e.g., type checking, linting, code actions, ...
  • Some features treesitter-ls can offer are better provided by language-specific servers
    • e.g., scope-aware go to definition

Why treesitter-ls?

To unlock various possibilities:

  1. Unlock availability in any LSP-capable editor
    • Allows Bram's Vim users to enjoy tree-sitter features
    • Editors could even skip builtin syntax highlighting, usually an essential feature, and rely on semantic tokens instead

Why treesitter-ls?

  2. Fill gaps in language-specific servers
    • Provide consistent FoldingRange, SelectionRange, and SemanticTokens where language-specific servers lack them
    • Users need not remember which server supports which feature

Why treesitter-ls?

  3. Unlock support for niche or emerging languages
    • Just ship a tree-sitter parser + queries instead of a full server
    • Developer productivity boost for new languages

Why treesitter-ls?

  4. Unlock injected-language workflows
    • Attach language-specific servers to code blocks and other injected content, with treesitter-ls itself acting as an LSP client
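Forwarding requests for injected content implies translating positions between the host document and the injected region. The sketch below is a hypothetical model of that bookkeeping, not treesitter-ls's code: `Injection`, `to_virtual`, and `to_host` are illustrative names, positions use LSP-style 0-based `line`/`character`, and it assumes the injected lines are not indented relative to the host (so `character` is unchanged).

```python
from dataclasses import dataclass

# Host Markdown the example refers to (0-based lines):
#   0: # Title
#   1:
#   2: ~~~python
#   3: import os
#   4: os.getcwd()
#   5: ~~~


@dataclass
class Injection:
    """Hypothetical record of one injected region, e.g., a fenced code block."""
    language: str
    start_line: int  # first host line of the injected content
    end_line: int    # host line just after the injected content


def to_virtual(injections, line: int, character: int):
    """Map a host position to (language, position inside the injected document)."""
    for inj in injections:
        if inj.start_line <= line < inj.end_line:
            return inj.language, {"line": line - inj.start_line, "character": character}
    return None  # not inside injected content: handle it in the host document


def to_host(inj: Injection, line: int, character: int) -> dict:
    """Map a position from the language server's response back to the host."""
    return {"line": line + inj.start_line, "character": character}


block = Injection("python", start_line=3, end_line=5)
print(to_virtual([block], 4, 3))  # a request on `getcwd` in the host
print(to_host(block, 1, 3))       # the answer mapped back
```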

ENJOY!!

  • Syntax-aware features are powerful
    • e.g., highlighting, code folding, outline, sticky scroll, range selection, and more
  • Use tree-sitter to implement syntax-aware features
  • Star atusy/treesitter-ls, which brings tree-sitter power to any LSP-capable editor

How does tree-sitter enable such a variety of use cases?

  • It identifies the node type of each region
    • function definition, string literal, assignment expression, ...
  • It identifies the hierarchical structure of nodes
  • It allows querying nodes by type and structure
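All three abilities can be seen in miniature with Python's standard `ast` module used as a stand-in. It is not tree-sitter: it is not incremental and raises `SyntaxError` on invalid code. Still, it shows typed nodes, their hierarchy, and a query by type and structure; `dump` and `matches` are hypothetical names.

```python
import ast


def dump(node, depth: int = 0) -> list:
    """Indented node types: each region's type plus the node hierarchy."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(dump(child, depth + 1))
    return lines


tree = ast.parse('url = "https://example.com"')
print("\n".join(dump(tree)))

# Query by type and structure: string constants used as assignment values.
matches = [
    node.value.value
    for node in ast.walk(tree)
    if isinstance(node, ast.Assign)
    and isinstance(node.value, ast.Constant)
    and isinstance(node.value.value, str)
]
print(matches)
```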

Demo scenario
  • Open examples/url.py
  • Start with a blank query
  • Add `((string) @string)`
  • Add `(string (string_content) @string.special.url)`
  • Update the above to `((string (string_content) @string.special.url) (#match? @string.special.url "^https?://\\S+$"))`

### Language Injections

optional

[Source in Neovim](https://github.com/neovim/neovim/blob/a04c73ca071fdc2461365a8a10a314bd0d1d806d/runtime/lua/vim/_defaults.lua?plain=1#L487-L489)

[Source in Neovim](https://github.com/neovim/neovim/blob/a04c73ca071fdc2461365a8a10a314bd0d1d806d/runtime/queries/markdown_inline/highlights.scm?plain=1#L94-L96)

Show a demo of what happens when highlighting is done only by the builtin feature