Pandoc

Creation date: 2014-10-31
Last substantive revision date: 2014-10-31
Last modification date: 2022-05-03
Generated on: 2025-03-16
Completion status: notes
Belief: possible

Pandoc is a convenient tool for converting documents between formats, for instance turning a markdown document into a PDF. Another example: I use Pandoc for this site to convert my markdown source documents into the HTML on the actual site. With some creativity, one can use Pandoc to accomplish some neat tasks.

Installing

See my page on installing Haskell for more information.

Going from markdown to HTML

To give a concrete example of going from markdown to HTML (besides the case of creating this site, as stated above), one might want to compose an email in markdown and send it in HTML; how can this be done efficiently?

If you are on Linux and use a combination of tools like Vim and xclip in addition to Pandoc¹, then this can easily be achieved. After composing the email in Vim, simply write the buffer directly into Pandoc, convert the markdown to HTML, and send that into the clipboard with xclip:

:w !pandoc -f markdown -t html | xclip -sel clip

Then just paste the results into Gmail using Control-V (or similar).

This procedure can then be bound to a command or a keyboard shortcut for repeated use.
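One way to make the procedure reusable outside a single editor session is a small script that the buffer can be piped into. The sketch below mirrors the one-liner above. The function name markdown_to_clipboard and its injectable run parameter are illustrative choices rather than anything Pandoc or xclip provides, and it assumes both programs are on PATH.

```python
import subprocess

# The same two commands as the Vim one-liner, written as argument lists.
PANDOC_CMD = ["pandoc", "-f", "markdown", "-t", "html"]
XCLIP_CMD = ["xclip", "-sel", "clip"]


def markdown_to_clipboard(markdown_text, run=subprocess.run):
    """Convert markdown to HTML with pandoc, then hand the HTML to xclip.

    `run` defaults to subprocess.run; it is a parameter only so the
    pipeline can be exercised without pandoc or an X display.
    """
    html = run(
        PANDOC_CMD, input=markdown_text, capture_output=True, text=True, check=True
    ).stdout
    # xclip reads the HTML from stdin and places it in the clipboard selection.
    run(XCLIP_CMD, input=html, text=True, check=True)
    return html
```

Saved to a file (say md2clip.py, a hypothetical name) that ends by calling markdown_to_clipboard(sys.stdin.read()) after an import sys, it could be used from Vim with :w !python3 md2clip.py.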

If you also write the markdown buffer to a file, the message is backed up locally as well as on the Gmail servers (which is good), though this is no substitute for a real backup method.

Going from HTML to markdown

How about going the opposite direction? A useful example here is if you want to copy HTML from a website, retain its formatting, but convert it to markdown.

This is again possible using a combination of xclip and Pandoc. After copying the HTML text from a website, one can run:

xclip -t text/html -o | pandoc -f html -t markdown

(Add the -selection clip flag to the xclip command if that doesn’t work.)
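The "try it, then add the selection flag" step can be automated. The following is a minimal sketch under some assumptions: xclip reads the primary selection by default, the copied HTML is UTF-8, and the helper names clipboard_html and html_to_markdown are made up for illustration.

```python
import subprocess


def clipboard_html(run=subprocess.run):
    """Return HTML from the first X selection that offers a text/html target."""
    # Try the primary selection first (xclip's default), then the clipboard.
    for selection in ("primary", "clipboard"):
        result = run(
            ["xclip", "-selection", selection, "-t", "text/html", "-o"],
            capture_output=True, text=True,
        )
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout
    raise RuntimeError("no text/html content found in primary or clipboard")


def html_to_markdown(html, run=subprocess.run):
    """Convert an HTML string to markdown with pandoc."""
    return run(
        ["pandoc", "-f", "html", "-t", "markdown"],
        input=html, capture_output=True, text=True, check=True,
    ).stdout
```

Calling html_to_markdown(clipboard_html()) then does the same job as the pipeline above, without having to remember which selection the browser copied into.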

The other option, which is slightly more tedious, is to use something like the online editor on WordPress: paste the HTML into the “Visual” side, then switch to the “Text” side to retrieve the converted HTML source. Then place that source into a local HTML file (e.g. temp.html) and run Pandoc on it using something like

pandoc -o temp.md temp.html

Pandoc filters

Using pandocfilters, one can trivially write scripts that modify the JSON representation of Pandoc documents. For this website, I’ve written a custom URL filter that takes links of the form [test](!STRING) and converts them into a DuckDuckGo bang expression. For instance, writing [Fishmans](!w) will search Wikipedia for the string “Fishmans”, and will take you to the page for the band Fishmans.
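A rough sketch of how such a bang filter might look is given below. It uses only the standard library instead of pandocfilters so that the JSON structure stays visible. It is not necessarily how the filter on this site works: the DuckDuckGo URL form, the function names, and the assumption that a Link node carries [attr, inlines, [url, title]] (the layout in recent Pandoc versions) are all mine.

```python
import json
import sys
from urllib.parse import quote_plus


def link_text(inlines):
    """Flatten simple inline nodes (Str, Space, Emph, Strong) to plain text."""
    parts = []
    for node in inlines:
        if node["t"] == "Str":
            parts.append(node["c"])
        elif node["t"] in ("Space", "SoftBreak"):
            parts.append(" ")
        elif node["t"] in ("Emph", "Strong"):
            parts.append(link_text(node["c"]))
    return "".join(parts)


def rewrite_bangs(node):
    """Recursively rewrite Link nodes whose URL starts with '!'."""
    if isinstance(node, list):
        return [rewrite_bangs(child) for child in node]
    if isinstance(node, dict):
        if node.get("t") == "Link":
            attr, inlines, (url, title) = node["c"]
            if url.startswith("!"):
                # "!w" + "Fishmans" becomes the search query "!w Fishmans".
                query = quote_plus(url + " " + link_text(inlines))
                url = "https://duckduckgo.com/?q=" + query
            return {"t": "Link", "c": [attr, rewrite_bangs(inlines), [url, title]]}
        return {key: rewrite_bangs(value) for key, value in node.items()}
    return node


def main():
    """Read a Pandoc JSON document on stdin, write the rewritten one to stdout."""
    json.dump(rewrite_bangs(json.load(sys.stdin)), sys.stdout)
```

A script that ends by calling main() can be made executable and passed to Pandoc with the --filter option. The pandocfilters package provides helpers such as toJSONFilter and stringify that would shorten the walking and text-flattening parts.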

I’ve also converted the Haskell filters on the Pandoc website to their Python equivalents.

Filtering out messy HTML

From this answer:

:%!pandoc -f html -t markdown-raw_html-native_divs-native_spans

Or more fully:

:new
:r !xclip -sel clip -o -t text/html
:%!pandoc -f html -t markdown-raw_html-native_divs-native_spans --wrap=none
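The output format in these commands is the base format markdown followed by -name suffixes, each of which turns off one Pandoc extension (a +name suffix would turn one on). As a small illustration of how that string is put together, here is a sketch with hypothetical helper names pandoc_format and html_cleanup_command:

```python
def pandoc_format(base, disable=(), enable=()):
    """Build a Pandoc format spec such as 'markdown-raw_html'."""
    disabled = "".join("-" + ext for ext in disable)
    enabled = "".join("+" + ext for ext in enable)
    return base + disabled + enabled


def html_cleanup_command(wrap="none"):
    """Argument list for the HTML-to-markdown cleanup shown above."""
    target = pandoc_format(
        "markdown", disable=["raw_html", "native_divs", "native_spans"]
    )
    return ["pandoc", "-f", "html", "-t", target, "--wrap=" + wrap]
```

Building the spec from a list makes it easier to experiment with which extensions to switch off when the converted markdown still contains unwanted markup.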

¹ Of course, these can all be replaced in part or whole with other tools, as long as they have the necessary features. The editor must either support markdown–HTML conversion and be able to use the clipboard, or else be able to interact with the shell, for instance.