LaTeX Lexer
This module contains all classes for lexing LaTeX code, as well as general-purpose base classes for incremental LaTeX decoders and encoders, which are useful if you are writing your own custom LaTeX codec.
-
class latexcodec.lexer.Token(name, text)
Token(name, text)
-
class latexcodec.lexer.LatexLexer(errors='strict')
Bases: latexcodec.lexer.RegexpLexer
A very simple lexer for TeX/LaTeX bytes.
-
class latexcodec.lexer.LatexIncrementalLexer(errors='strict')
Bases: latexcodec.lexer.LatexLexer
A very simple incremental lexer for TeX/LaTeX code. Roughly follows the state machine described in TeX by Topic, Chapter 2.
The generated tokens satisfy:
- no newline characters: paragraphs are separated by '\par'
- spaces following control tokens are compressed
-
get_tokens(bytes_, final=False)
Yield tokens while maintaining a state. Also skip whitespace after control words and (some) control symbols. Replaces newlines by spaces and \par commands depending on the context.
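The whitespace rules described above can be illustrated with a small one-shot sketch. It is not the lexer's actual implementation: the pattern `TOKEN_RE` and the function `toy_tokens` are hypothetical names, the sketch works on `str` rather than bytes, it keeps no state between calls, and it skips whitespace only after control words (not after control symbols).

```python
import re

# Hypothetical, heavily simplified token pattern (an assumption for
# illustration, not the pattern used by latexcodec).
TOKEN_RE = re.compile(
    r"(?P<control_word>\\[a-zA-Z]+)"
    r"|(?P<control_symbol>\\.?)"
    r"|(?P<space>\s+)"
    r"|(?P<chars>[^\\\s]+)",
    re.DOTALL,
)


def toy_tokens(text):
    """Yield (name, text) pairs with whitespace normalized."""
    skip_space = False  # True right after a control word
    for match in TOKEN_RE.finditer(text):
        name, value = match.lastgroup, match.group()
        if name == "space":
            if value.count("\n") >= 2:
                # A blank line ends the paragraph.
                yield ("control_word", "\\par")
                skip_space = True
            elif not skip_space:
                # Any other whitespace run, including a single newline,
                # collapses to one space.
                yield ("space", " ")
            # Whitespace directly after a control word is dropped.
        else:
            yield (name, value)
            skip_space = name == "control_word"


tokens = list(toy_tokens("\\alpha   x\n\ny\nz"))
```

In this sketch the spaces after `\alpha` are dropped, the blank line becomes a `\par` token, and the single newline between `y` and `z` becomes one space.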
-
class latexcodec.lexer.LatexIncrementalDecoder(errors='strict')
Bases: latexcodec.lexer.LatexIncrementalLexer
Simple incremental decoder. Transforms lexed LaTeX tokens into unicode.
To customize decoding, subclass and override get_unicode_tokens().
-
decode(bytes_, final=False)
Decode LaTeX bytes_ into a unicode string.
This implementation calls get_unicode_tokens() and joins the resulting unicode strings together.
-
decode_token(token)
Returns the decoded token text in inputenc encoding.
Note
Control words get an extra space appended to ensure separation from the next token, so that decoded token sequences can be str.join()ed together.
For example, the tokens b'\hello' and b'world' will correctly result in u'\hello world' (remember that LaTeX eats space following control words). If no space were added, this would wrongly result in u'\helloworld'.
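The effect of the extra space can be sketched in a few lines. This is only an illustration of the idea in the note, not the method's real code: `CONTROL_WORD` and `decode_token_text` are hypothetical names, and the sketch works on already-decoded strings instead of Token objects.

```python
import re

# A control word is a backslash followed by letters only (assumed
# simplification for this sketch).
CONTROL_WORD = re.compile(r"\\[a-zA-Z]+\Z")


def decode_token_text(text):
    """Return text, with a trailing space added after control words."""
    if CONTROL_WORD.match(text):
        return text + " "
    return text


token_texts = ["\\hello", "world"]

# With the extra space, the two tokens stay separate after joining.
joined = "".join(decode_token_text(t) for t in token_texts)

# Without it, they fuse into a single, different control word.
fused = "".join(token_texts)
```

Here `joined` is `\hello world`, while `fused` is `\helloworld`.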
-
get_unicode_tokens(bytes_, final=False)
Decode every token in inputenc encoding. Override to process the tokens in some other way (for example, for token translation).
-
-
class latexcodec.lexer.LatexIncrementalEncoder(errors='strict')
Bases: codecs.IncrementalEncoder
Simple incremental encoder for LaTeX. Transforms unicode into bytes.
To customize encoding, subclass and override get_latex_bytes().
-
encode(unicode_, final=False)
Encode the unicode_ string into LaTeX bytes.
This implementation calls get_latex_bytes() and joins the resulting bytes together.
-
flush_unicode_tokens()
Flush the buffer.
-
get_latex_bytes(unicode_, final=False)
Encode every character in inputenc encoding. Override to process the unicode in some other way (for example, for character translation).
-
get_unicode_tokens(unicode_, final=False)
Split unicode into tokens so that every token starts with a non-combining character.
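One way to picture this splitting rule is the one-shot sketch below, which uses `unicodedata.combining()` from the standard library to attach each combining mark to the character before it. The function name `split_on_base_characters` is hypothetical, and unlike the method documented here the sketch does not buffer anything between calls.

```python
import unicodedata


def split_on_base_characters(text):
    """Split text so that combining marks stay with their base character."""
    tokens = []
    for char in text:
        if tokens and unicodedata.combining(char):
            # Combining mark: append it to the current token.
            tokens[-1] += char
        else:
            # Base character starts a new token. A combining mark at the
            # very start of the input has no base to attach to, so it
            # also ends up here.
            tokens.append(char)
    return tokens


# "e" followed by U+0301 COMBINING ACUTE ACCENT, then a plain "a".
parts = split_on_base_characters("e\u0301a")
```

Keeping a base character and its marks in one token lets an encoder translate the pair as a unit, for example into a single accent command.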
-
getstate()
Get state.
-
reset()
Reset state.
-
setstate(state)
Set state. The state must correspond to the return value of a previous getstate() call.
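The getstate(), reset(), and setstate() contract is the general one for codecs.IncrementalEncoder. As a rough illustration, the sketch below uses the standard library's incremental UTF-16 encoder as a stand-in, since its state is easy to observe: it records whether the byte order mark has already been written. It does not show what state LatexIncrementalEncoder itself stores.

```python
import codecs

# Stand-in encoder (an assumption for illustration): the standard
# library's incremental UTF-16 encoder, not LatexIncrementalEncoder.
encoder = codecs.getincrementalencoder("utf-16")()

first = encoder.encode("a")          # first call writes a byte order mark
saved_state = encoder.getstate()     # snapshot taken after the mark

encoder.reset()                      # back to the initial state
after_reset = encoder.encode("b")    # the mark is written again

encoder.reset()
encoder.setstate(saved_state)        # restore the snapshot
after_restore = encoder.encode("b")  # no mark this time
```

The value passed to setstate() comes straight from an earlier getstate() call, which is the only kind of value the method is documented to accept.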
-