Pandoc

Pandoc is a utility to convert text markup formats (Markdown, reStructuredText, LaTeX, etc.) to a variety of other document formats (.docx, .odt, PDF, HTML, etc.). Pandoc also provides its own extension of Markdown syntax, documented in the Pandoc User's Guide.

Be sure to install:

  • pandoc
  • pandoc-data
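  • pandoc-citeproc (for citations; pandoc 2.11 and later have citation processing built in, so this package is only needed with older versions)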

Basic use for creating PDFs

The -o option writes output to the file named after it, choosing the output format from the file extension. For example, the following command creates a PDF file from a text file (via pdflatex):

pandoc cv_master.txt -o cv_master.pdf

There are other options for specifying input and output files or formats (for example, -f/--from and -t/--to set the input and output formats explicitly), but this is the simplest.

PDF formatting options

The output formatting of a PDF document can be set either on the command line or by using a header or template file during the conversion.

Passing variables

The -V (or --variable) option sets template variables that change formatting, fonts, or other settings during a pandoc conversion.

pandoc cover_master.txt -V geometry:margin=1.5in -o cover_master.pdf
# create a PDF with 1.5in margins

Other variables that can be passed with -V are listed in the template variables section of the Pandoc User's Guide.

Note that some of these variables, such as geometry, can also be set in a YAML metadata block at the top of the input file. See the Pandoc User's Guide for details.
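As a minimal sketch of the metadata-block approach, the snippet below writes a small input file whose YAML block sets geometry (and, as an extra illustration, fontsize) instead of passing -V on the command line. The file name cover_yaml_demo.txt and the letter text are hypothetical.

```shell
# Write a hypothetical input file. The YAML metadata block at the top sets the
# same margin that -V geometry:margin=1.5in would, plus a base font size.
cat > cover_yaml_demo.txt <<'EOF'
---
geometry: margin=1.5in
fontsize: 11pt
---

Dear committee,

This letter should be rendered with 1.5in margins.
EOF
```

It can then be converted with no -V options:

pandoc cover_yaml_demo.txt -o cover_yaml_demo.pdf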

Include in header

When -s (or --standalone) is used, pandoc adds the header and footer needed for a standalone document in the output format (PDF output is always standalone). To see the default template used, type pandoc -D FORMAT. There are a couple of ways to change what is in this header. The -H FILE (--include-in-header=FILE) option adds the contents of FILE to the default header. This is useful for adding LaTeX formatting commands to the intermediate .tex file before the PDF is created.

pandoc cv_master.txt -H '/path/to/header.tex' -o cv_master.pdf
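For example, a header file might just load a couple of LaTeX packages. The sketch below writes one; the file name header.tex and the double-spacing setting (from the standard setspace package) are only an illustration. Pandoc places the file's contents in the preamble of the intermediate .tex file, before \begin{document}.

```shell
# Hypothetical header file: raw LaTeX preamble commands for -H to include.
cat > header.tex <<'EOF'
\usepackage{setspace}
\doublespacing
\usepackage{textcomp} % provides \textdegree
EOF
```

It is then passed with -H:

pandoc cv_master.txt -H header.tex -o cv_master.pdf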

Custom document templates

A custom template for the output document can also be specified using --template=FILE, where FILE is the custom template. If this option is specified, pandoc first looks in the current directory, then in the user template directory ($HOME/.pandoc/templates), and then in the default template directory (/usr/share/pandoc...). Template names should end with the extension for the output format (.latex, .html, etc.). To create a new template, copy the default for the given output format using:

pandoc -D latex > newtemplate.latex

Starting from a fresh copy of the default is good practice for a new project, because the default templates are updated in new versions of pandoc. The template can then be modified and used to convert the document:

pandoc manuscript_1.markdown --template=newtemplate.latex --bibliography=SNOTELsoildata.bib -o manuscript_1.pdf

You can put useful settings in these templates, such as margin widths or extra packages. A couple of lines I use for PDFs of journal articles are:

\usepackage[margin=1in]{geometry}
\usepackage{textcomp} % provides \textdegree
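If you would rather write a template from scratch than edit the default, a very small one might look like the sketch below. It is hypothetical (minimal.latex is an invented name) and only covers simple documents: pandoc substitutes $title$ and $body$, and $if(...)$ ... $endif$ drops lines when a variable is unset. The \tightlist definition and the $highlighting-macros$ variable mirror what the default template provides; tables, figures, and citeproc bibliographies would need further definitions copied from the default template.

```shell
# Write a hypothetical minimal LaTeX template. The quoted heredoc ('EOF')
# stops the shell from expanding $...$, so pandoc's template syntax survives.
cat > minimal.latex <<'EOF'
\documentclass{article}
\usepackage[margin=1in]{geometry}
\usepackage{textcomp} % provides \textdegree
\usepackage{hyperref}
% pandoc emits \tightlist in compact lists
\providecommand{\tightlist}{%
  \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
$if(highlighting-macros)$
$highlighting-macros$
$endif$
$if(title)$
\title{$title$}
$endif$
\begin{document}
$if(title)$
\maketitle
$endif$
$body$
\end{document}
EOF
```

With notes.markdown standing in for any simple document:

pandoc notes.markdown --template=minimal.latex -o notes.pdf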

Other formatting

Option | Result
----------------------- | -----------------------------------------------------------
--toc | Automatically create a table of contents
--toc-depth=NUMBER | Specify how many header levels to include in the table of contents (use together with --toc)
--reference-odt=FILE | Use the styles in FILE as a template for .odt output. Works best if FILE was created with pandoc and then modified.
--reference-docx=FILE | Use the styles in FILE as a template for .docx output.
--latex-engine=ENGINE | Choose the pdflatex, lualatex, or xelatex engine; some PDF formatting needs a specific engine.

In pandoc 2.0 and later, --reference-odt and --reference-docx are replaced by --reference-doc=FILE, and --latex-engine is renamed --pdf-engine.

Citations and bibliographies

Pandoc can include citations from an associated bibliographic database (usually a BibTeX file). Citations in a pandoc markdown file look like this: [@citationid], where citationid is defined in the associated .bib file. To convert a pandoc file with citations, run:

pandoc paper1_draft.markdown -o paper1_draft.pdf --bibliography hiddencanyon.bib

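The bibliography is written at the end of the document, so if the file ends with a References heading, the bibliography appears under it. Citation and bibliography formatting can be specified with --csl=FILE, where FILE is a .csl file (found at http://citationstyles.org). For LaTeX output (and PDFs made from it), the --natbib or --biblatex command-line options make pandoc use natbib or biblatex citations instead. In pandoc 2.11 and later, citation processing must be requested explicitly by adding --citeproc to the command above.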
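A minimal sketch of the two files involved: a BibTeX database and a markdown file that cites it. The key smith2013, the entry itself, and the file names refs.bib and paper_demo.markdown are placeholders invented for illustration, not a real reference. The manuscript ends with a References heading so the generated bibliography lands under it.

```shell
# Hypothetical BibTeX database with one placeholder entry (not a real paper).
cat > refs.bib <<'EOF'
@article{smith2013,
  author  = {Smith, A.},
  title   = {Placeholder Title for a Citation Example},
  journal = {Example Journal},
  year    = {2013},
  volume  = {1},
  pages   = {1--10}
}
EOF

# Hypothetical manuscript: a bracketed citation with a page locator,
# an in-text citation, and a final References heading.
cat > paper_demo.markdown <<'EOF'
This claim needs a source [@smith2013, p. 4].
@smith2013 reports a similar result.

# References
EOF
```

Then, with pandoc 2.11 or later:

pandoc paper_demo.markdown --citeproc --bibliography refs.bib -o paper_demo.pdf

On older releases that relied on pandoc-citeproc, drop --citeproc, as in the command above.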

Working directly with LaTeX

Pandoc markdown is a nice way to draft LaTeX documents. It can be rendered to TeX (-o document.tex) or rendered as a PDF via pdflatex (-o document.pdf). Raw TeX and LaTeX can be included in a markdown document, and they will be passed through unchanged to the LaTeX output.
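A small, hypothetical example of mixing markdown and raw LaTeX (the file name raw_latex_demo.markdown is invented). The \newpage command and the center environment should appear unchanged in LaTeX and PDF output and be dropped from HTML output.

```shell
# Hypothetical draft mixing pandoc markdown with raw LaTeX.
cat > raw_latex_demo.markdown <<'EOF'
# Introduction

Ordinary *markdown* text is converted as usual.

\newpage

\begin{center}
Text inside this environment is passed through as raw LaTeX,
so markdown syntax here is not converted.
\end{center}
EOF
```

Comparing the outputs of these two commands shows the raw LaTeX kept in the .tex file and left out of the HTML:

pandoc raw_latex_demo.markdown -o raw_latex_demo.tex
pandoc raw_latex_demo.markdown -s -o raw_latex_demo.html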

Citations

This is a citation from Smith \cite{smith.2013}

The citation should be output in BibTeX format. FIXME: I haven't gotten this to work yet. TeX placed between \begin and \end tags will be interpreted as LaTeX instead of markdown, and will be ignored in non-LaTeX output formats.

Math mode

TeX math can be used by putting it between dollar signs: one dollar sign on each side for inline math, and two for display math. Most LaTeX math-mode symbols are also carried over to other output formats. For example, rendering HTML documents with Greek letters (such as $\theta$) will produce Unicode Greek characters in the HTML by default. If this doesn't work, there are also options for using MathML (--mathml) or MathJax (--mathjax).
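A small test file for trying the math options (the name math_demo.markdown is hypothetical, and the equation is just an illustrative sample mean):

```shell
# Hypothetical input with one inline and one display equation.
cat > math_demo.markdown <<'EOF'
The angle $\theta$ is measured from vertical.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
EOF
```

The first command below uses pandoc's default HTML math rendering (Unicode where possible); the second leaves the TeX in place for MathJax to render in the browser:

pandoc math_demo.markdown -s -o math_demo.html
pandoc math_demo.markdown -s --mathjax -o math_demo_mathjax.html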

Errors with Unicode special characters

Pandoc will pass Unicode characters in a document to pdflatex, which may not know how to display them. This is pretty common with Greek characters that are used outside of math mode. In this case it is probably best to use the XeTeX engine instead. This can be specified by passing --latex-engine=xelatex to pandoc (--pdf-engine=xelatex in pandoc 2.0 and later). Of course, XeTeX must be installed, which is most easily done by installing the full TeX Live distribution (large).
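A sketch of a file that triggers this problem, set up for XeTeX. The file name greek_demo.markdown is hypothetical, and the mainfont value assumes DejaVu Serif (a font with Greek glyphs) is installed; any installed font covering the characters would do. The mainfont variable only takes effect with xelatex or lualatex.

```shell
# Hypothetical input: a Greek letter outside math mode, plus metadata that
# picks a system font (assumed installed) for XeTeX to use.
cat > greek_demo.markdown <<'EOF'
---
mainfont: DejaVu Serif
---

The angle θ was measured at each site.
EOF
```

Convert it with the XeTeX engine (use --pdf-engine=xelatex with pandoc 2.0 or later):

pandoc greek_demo.markdown --latex-engine=xelatex -o greek_demo.pdf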