Getting Started
Overview
A lexer and codec to work with LaTeX code in Python.
- Download: http://pypi.python.org/pypi/latexcodec/#downloads
- Documentation: http://latexcodec.readthedocs.org/
- Development: http://github.com/mcmtroffaes/latexcodec/
Installation
Install the module with pip install latexcodec, or from source using python setup.py install.
Minimal Example
Simply import the latexcodec module to enable "latex" to be used as an encoding:
import latexcodec
text_latex = b"\\'el\\`eve"
assert text_latex.decode("latex") == u"élève"
text_unicode = u"ångström"
assert text_unicode.encode("latex") == b'\\aa ngstr\\"om'
There is also a ulatex encoding for text transforms.
The simplest way to use this codec is through the codecs module
(as for all text transform codecs in Python):
import codecs
import latexcodec
text_latex = u"\\'el\\`eve"
assert codecs.decode(text_latex, "ulatex") == u"élève"
text_unicode = u"ångström"
assert codecs.encode(text_unicode, "ulatex") == u'\\aa ngstr\\"om'
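To illustrate what "text transform codec" means here, the following minimal sketch uses rot_13, a str-to-str codec from the Python standard library, rather than latexcodec itself. It shows that such codecs are reached through codecs.encode and codecs.decode, and that str.encode refuses them. Whether ulatex is refused by str.encode in the same way is not stated in this document, so treat the parallel as an assumption.

```python
import codecs

# rot_13 ships with the standard library and maps text to text,
# just as the document describes for "ulatex".
plain = "latex"
scrambled = codecs.encode(plain, "rot_13")
assert scrambled == "yngrk"
assert codecs.decode(scrambled, "rot_13") == plain

# str.encode only accepts text-to-bytes codecs, so it rejects rot_13.
try:
    plain.encode("rot_13")
    str_encode_rejected = False
except LookupError:
    str_encode_rejected = True

assert str_encode_rejected
```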
By default, the LaTeX input is assumed to be ASCII, as per standard LaTeX.
However, you can also specify an extra codec as latex+<encoding> or ulatex+<encoding>,
where <encoding> describes another encoding.
In this case, characters will be translated to and from that encoding whenever possible.
The following code snippet demonstrates this behaviour:
import latexcodec
text_latex = b"\xfe"
assert text_latex.decode("latex+latin1") == u"þ"
assert text_latex.decode("latex+latin2") == u"ţ"
text_unicode = u"ţ"
assert text_unicode.encode("latex+latin1") == b'\\c t' # ţ is not latin1
assert text_unicode.encode("latex+latin2") == b'\xfe' # but it is latin2
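The snippet above depends on how the byte 0xfe is read under each extra encoding. As a rough illustration using only the standard library codecs (not latexcodec), the sketch below shows the two readings of that byte and why ţ needs a LaTeX fallback under latin1 but not under latin2. The helper name fits is hypothetical and exists only for this example.

```python
# The same byte maps to different characters in latin1 and latin2.
raw = b"\xfe"
as_latin1 = raw.decode("latin1")
as_latin2 = raw.decode("latin2")
assert as_latin1 == "\u00fe"  # þ (thorn)
assert as_latin2 == "\u0163"  # ţ (t with cedilla)


def fits(char, encoding):
    """Return True if char can be represented in the given encoding."""
    try:
        char.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ţ has no latin1 representation, which is why the document's example
# falls back to the LaTeX command \c t for latex+latin1.
assert not fits("\u0163", "latin1")
assert fits("\u0163", "latin2")
```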
Limitations
- Not all Unicode characters are registered. If you find any missing, please report them on the tracker: https://github.com/mcmtroffaes/latexcodec/issues
- Unicode combining characters are currently not handled.
- By design, the codec never removes curly brackets. This is because it is very hard to guess whether brackets are part of a command or not (this would require a full LaTeX parser). Moreover, BibTeX uses curly brackets as a guard against case conversion, in which case automatic removal of curly brackets may not be desired at all, even if they are not part of a command. Also see: http://stackoverflow.com/a/19754245/2863746
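Regarding the combining-character limitation listed above, one possible workaround (an assumption on our part, not something this documentation prescribes) is to normalize text to NFC with the standard library before encoding, so that base-plus-combining sequences become precomposed characters where Unicode defines them. The sketch below shows only the normalization step and does not call latexcodec.

```python
import unicodedata

# "élève" written with combining accents: e + U+0301, l, e + U+0300, v, e.
decomposed = "e\u0301le\u0300ve"

# NFC composes each base letter with its combining mark where a
# precomposed code point exists.
composed = unicodedata.normalize("NFC", decomposed)

assert composed == "\u00e9l\u00e8ve"  # é and è as single code points
assert len(decomposed) == 7
assert len(composed) == 5
```

Sequences with no precomposed form are left unchanged by NFC, so this only helps for characters that Unicode can compose.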