% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/vocab.R
\name{load_vocab}
\alias{load_vocab}
\title{Load a vocabulary file}
\usage{
load_vocab(vocab_file)
}
\arguments{
\item{vocab_file}{Path to the vocabulary file. The file is assumed to be a
plain-text file with one token per line. The zero-based line number (the
first line is 0) is the index of that token in the vocabulary.}
}
\value{
The vocabulary as a named integer vector. Names are the tokens in the
vocabulary, values are their integer indices. The casedness of the vocabulary
is inferred and attached as the \code{"is_cased"} attribute.

Note that from the perspective of a neural net, the numeric indices \emph{are}
the tokens, and the mapping from token to index is fixed. Changing the
indexing would break any pre-trained model. This is why the vocabulary is
stored as a named integer vector, and why it starts with index zero even
though R itself indexes from one.
}
\description{
Reads a vocabulary file into a named integer vector that maps each token to
its index.
}
\examples{
# Get path to sample vocabulary included with package.
vocab_path <- system.file("extdata", "tiny_vocab.txt", package = "wordpiece")
vocab <- load_vocab(vocab_file = vocab_path)
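
# A minimal base-R sketch of the token-to-index mapping described under
# Value. It illustrates the documented layout only and is not the package's
# implementation: `toy_tokens`, `toy_file`, `toy_lines`, and `toy_vocab` are
# hypothetical names, and casedness inference is not reproduced here.
toy_tokens <- c("[PAD]", "[UNK]", "the", "cat")
toy_file <- tempfile(fileext = ".txt")
writeLines(toy_tokens, toy_file)

# One token per line; the first line gets index 0.
toy_lines <- readLines(toy_file)
toy_vocab <- stats::setNames(seq_along(toy_lines) - 1L, toy_lines)

# Look up an index by token, and a token by index.
toy_vocab[["the"]]
names(toy_vocab)[toy_vocab == 1L]

# Clean up the temporary file.
unlink(toy_file)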
}
