December 10, 2014

Word count of LaTeX documents

A rough word count of a LaTeX document can be obtained by combining the detex and wc command-line utilities (source).

This method has the advantage that it follows \input and \include commands in the target document, so word counts on large, multi-file documents can be performed very quickly.
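To see how much of the count comes from included files, the full count can be compared with a count of the top-level file alone. The sketch below is not from the original source: the function name included_words is hypothetical, and it assumes your detex build supports the -n option (do not follow \input and \include), so check your local man page before relying on it.

```shell
# Hypothetical helper: compare the word count with and without included files.
# Assumes detex accepts -n, which tells it not to follow \input and \include.
included_words() {
    main=$1
    # Fail early if the file is unreadable or detex is not installed.
    [ -r "$main" ] || { echo "cannot read: $main" >&2; return 1; }
    command -v detex >/dev/null 2>&1 || { echo "detex not found" >&2; return 1; }

    total=$(detex "$main" | wc -w)        # follows \input and \include
    top_only=$(detex -n "$main" | wc -w)  # top-level file only

    # $(( )) also strips the padding some wc implementations add.
    echo "total: $((total)) top-level: $((top_only)) included: $((total - top_only))"
}
```

It would be called as included_words MacbethThesis.tex from the directory containing the thesis, so that relative \input paths resolve.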

General usage is to strip all TeX markup from the document with detex, then count the words in the resulting text, for example with the wc command-line utility:

$ detex MacbethThesis.tex | wc -w
> 31412

Note that this method seems more accurate than the alternative of copying and pasting the contents of the output PDF file into a text editor and counting the words there. That alternative splits hyphenated words in two and counts page numbers and similar non-body text. As an example, the following converts the PDF of the same document to text and counts the words in the result:

$ pdftotext MacbethThesis.pdf - | wc -w
> 66641
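The kind of inflation described above can be reproduced on a toy input. This is only an illustration with made-up text, not a measurement of the thesis: wc -w counts whitespace-separated tokens, so a word broken across a line end counts twice and a stray page number counts as a word.

```shell
# The sentence as written in the source: three words.
intact=$(printf 'a hyphenated word\n' | wc -w | tr -d '[:space:]')

# The same sentence as it might look in text extracted from a PDF (made-up
# example): the word is hyphenated across a line end and a page number follows.
extracted=$(printf 'a hyphen-\nated word\n12\n' | wc -w | tr -d '[:space:]')

echo "intact: $intact, extracted: $extracted"
# prints "intact: 3, extracted: 5"
```

These two effects alone may not account for the whole gap between the two counts above, since the PDF text can also contain material that detex drops.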