Lessons Learned from Writing a PhD Dissertation in Markdown
When researching ways to write my dissertation, I ended up settling on Tom Pollard’s thesis template. I was excited at the prospect of avoiding gigantic, unstable word files, being able to break down this massive work into more manageable chunks, and utilizing all the great features of modern text editors to make markhttps://www.needpix.com/photo/286807/papers-stack-heap-documents-business-paperwork-information-stacked-researchdown writing easier (intellisense, code folding, and syntax highlighting, for starters). However, this project was not without its pain points. For anyone else who ends up going this route, I’d like to share some of the ups and downs of this journey.
Dealing with Large Tables
Just because a table looks nice in markdown doesn’t mean it’s going to look correct in its PDF form. Often, the text in column headers or cells overlaps, or the spacing between columns doesn’t make sense. Thankfully, Pandoc has its own custom table syntax that allows for line breaks. My recommended workflow to do this is as follows:
- Create a markdown file outside of the source directory (perhaps in
a directory called “bin” or “scratch” for clarity). Let’s call it
tables.md
for this tutorial - Paste your problem tables (in markdown form) into this file
- Run
pandoc -t markdown+multiline_tables -o cleaned_tables.md tables.md
- Paste the new multi-line tables where the original tables would be in your document
There are some special cases where this won’t be sufficient. Even though
there’s a great package for figure short
captions, there’s
no such thing for tables as of yet. In your bin/scratch directory, run
pandoc -H ../styles/preamble.tex -o tables.tex tables.md
. Now, take
the LaTeX version of your table, and insert square brackets between
\caption
and the curly brace. This can also help with sideways
tables, as you just need to wrap the latex table with
\begin{landscape}
and \end{landscape}
.
Other Tips
My fork of the original
template uses
several other Pandoc filters, including
xnos and
shortcaptions.
These cut down on the amount of LaTeX needed. It should be said,
however, that I wasn’t able to get the autocomplete feature of fignos to
work (+@fig
and *@fig
, respectively). This can be fixed by a
simple find and replace, with +@ being replaced with Fig. @. This also
works for equations and tables, just be mindful that you’re using the
proper amount of whitespace. Other tips:
- Install a stable version of TexLive (or your distro of choice) EARLY to avoid headaches down the road.
- Make sure tlmgr works properly to install all your packages. Check
which tlmgr
both with and without sudo permissions to make sure its pointing to your install, which is especially important on Debian-based systems. - Compile early and often. As mentioned previously, there are many strange quirks with PDF compilation.
- Make sure to find you which version of Pandoc you’re using
(
pandoc --version
). This is especially necessary if you have conda in your PATH, as it installs its own version. My Mac was using 1.1.3, instead of the current 2.7! - I would highly recommend VSCode as an editor. It has phenomenal markdown syntax highlighting and previews with Markdown All in One, intellisense completion of figure referencing, and very nice cite-while-you-write extensions including Citation Picker for Zotero and Pandoc Citer. It also has version control tools built in for easier merging, and a user-level dictionary where you can put technical words that would originally be flagged by the spell checker (also an extension).
Conclusions and Caveats
So, the main question one might ask is whether this is all worth it. I would still say yes, despite the major headaches and long nights this caused. First off, Word is in my personal opinion, absolutely terrible. Between the difficulty of things as simple as moving tables and images without messing up your paragraphs, the instability with large documents with tons of images, hogging of RAM, and the potential to completely corrupt your entire dissertation thanks to EndNote (literally happened to me the night before my undergrad thesis submission), I would absolutely not recommend it. I believe Scrivener is a pretty great alternative to Word/OpenOffice/Pages in terms of large, modular documents, but it’s not available for Linux, and I’m not sure if it provides the full functionality of this workflow (auto figure/table/equation numbering, Git compatibility, etc.).
All that being said, with the amount of time I spent messing with LaTeX,
I probably could have just as easily written my own Pandoc filters in
Python to emulate some
missing functionality with HTML conversion (\listoffigures
and short
captions in particular), then styled the whole document with CSS.
Prince, which can be used as a Pandoc
pdf-engine, includes CSS styling. I’m not sure if this would calculate
line breaks and image placement as well as Xelatex, but would be worth a
try to prevent fonts and horizontal lines breaking a 200 page document.
To anyone embarking on their dissertation writing journey, I wish you the best, and want to remind you that there is a light at the end of the tunnel!