How can I access the UTF-8 Lua library in Neovim?

In Neovim 0.5, I’m trying to use the following Lua function for determining the length of a string that contains Unicode characters (such as “áéö”):

utf8.len(string)

Running :lua utf8.len(string) results in an error message:

E5108: Error executing lua [string ":lua"]:1: attempt to index global 'utf8' (a nil value)

On the other hand, I can use utf8.len in the official Lua interpreter (by running lua in the terminal, outside Neovim). I suspect this is because the utf8 library is not included in Neovim. If that is the case, I’m not sure how to reproduce the behaviour with the tools available in Neovim.


My use case: I have written a function that turns the current buffer line into a reStructuredText or Markdown heading. For instance:

before:           after:

My heading        My heading
                  ==========

For this, I could use string.len, but it counts bytes rather than characters, so it doesn’t reliably give the length of a line that contains at least one non-ASCII character.

Neovim embeds LuaJIT, which implements PUC Lua 5.1 plus some extensions (and it falls back to Lua 5.1 on platforms where LuaJIT is not available).

And Lua 5.1 does not include the utf8 module; that was added later, in Lua 5.3. (To forestall questions: Lua versions are not “upgrades” like Python (minor) versions; they represent different languages for embedding.)

That being said, there’s a utf8 luarock that you can install with packer.nvim (wbthomason/packer.nvim on GitHub), a plugin manager for Neovim that supports Luarocks dependencies, and then use from within Neovim.
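If pulling in a rock is more than you need, counting code points by hand is also possible in plain Lua 5.1. The following is only a minimal sketch: the name utf8_len is made up for illustration, and it assumes the input is valid UTF-8 (unlike the real utf8.len, it does no validation). It counts every byte that is not a continuation byte (0x80 to 0xBF), since each code point contributes exactly one such byte.

```lua
-- Hypothetical helper, not part of Neovim or Lua: counts UTF-8 code points.
-- Assumption: `s` is valid UTF-8; malformed input is not detected.
local function utf8_len(s)
  local count = 0
  for i = 1, #s do
    local b = s:byte(i)
    -- Continuation bytes lie in 0x80..0xBF; every other byte starts a code point.
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count
end
```

Note that this counts code points, not screen cells, so combining characters and double-width characters would still throw off an underline.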


If you just want to get the width of the text, you can use vim.fn.strdisplaywidth instead of installing the entire utf8 package.

For example, string.len("äö") would return 4 but vim.fn.strdisplaywidth("äö") would return 2.
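For the heading use case in the question, this could look roughly like the sketch below. The function names and the width_of parameter are made up for illustration; only string.rep, vim.fn.strdisplaywidth, vim.api.nvim_win_get_cursor, vim.api.nvim_get_current_line and vim.api.nvim_buf_set_lines are existing functions. It assumes the underline should go directly below the cursor line.

```lua
-- Hypothetical helper: builds an underline as wide as `line`.
-- `width_of` is an assumed parameter; inside Neovim, pass vim.fn.strdisplaywidth.
local function heading_underline(line, char, width_of)
  return string.rep(char or "=", width_of(line))
end

-- Hypothetical wrapper, usable inside Neovim only:
-- inserts the underline below the current line.
local function underline_current_line(char)
  local row = vim.api.nvim_win_get_cursor(0)[1] -- 1-based row of the cursor
  local line = vim.api.nvim_get_current_line()
  local underline = heading_underline(line, char, vim.fn.strdisplaywidth)
  -- Indices are 0-based and end-exclusive; start == end inserts without replacing.
  vim.api.nvim_buf_set_lines(0, row, row, false, { underline })
end
```

From Neovim, once the functions are in scope, underline_current_line("=") would give a reStructuredText-style heading, and passing "-" instead would give a second-level one.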


Thank you for your replies! I can mark only one of them as a “solution”, so I’ve marked the first one.

That said, vim.fn.strdisplaywidth does what I need without having to rely on the UTF-8 library, so that’s what I’ll be using for now.