BibTeX support

It would be great if Hugo had some kind of BibTeX support.

I’ve kludged some BibTeX support together with a Python script that calls pandoc-citeproc to translate BibTeX to YAML. You could write Hugo templates to generate the bibliography directly from the YAML, but it was easier for me to have my script generate Markdown files from the YAML and then move them, together with PDFs where they’re available, into directories where Hugo can process them.

See https://github.com/jonathan-g/jonathangilligan.hugo and https://github.com/jonathan-g/jonathangilligan.hugo/blob/master/bibliography/process_bibliography.py

Here is what the result looks like: https://www.jonathangilligan.org/publications/
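
For anyone who wants the gist without reading the whole script, here is a minimal Python sketch of this kind of pipeline. It is not the code in process_bibliography.py, and several choices are assumptions: it asks pandoc-citeproc for CSL JSON (`--bib2json`) instead of YAML so that only the standard library is needed, it writes Hugo's JSON front matter rather than a YAML header, and it hypothetically assumes each PDF is named after its BibTeX key. The output directories mirror the ones described further down this thread.

```python
"""Sketch: BibTeX -> CSL JSON (via pandoc-citeproc) -> one Hugo page per entry."""
import json
import shutil
import subprocess
import sys
from pathlib import Path


def bib_to_csl(bib_path):
    """Run pandoc-citeproc and return its output as a list of CSL-JSON entries.

    Assumption: pandoc-citeproc is installed; --bib2json is used instead of
    --bib2yaml so no YAML library is needed.
    """
    result = subprocess.run(
        ["pandoc-citeproc", "--bib2json", str(bib_path)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def csl_date(entry):
    """Turn CSL date-parts such as [[2015, 3]] into an ISO date like '2015-03-01'."""
    parts = entry.get("issued", {}).get("date-parts", [[]])[0]
    if not parts:
        return None
    # Missing month/day default to 1 so Hugo always gets a full date.
    year, month, day = ([int(p) for p in parts] + [1, 1])[:3]
    return f"{year:04d}-{month:02d}-{day:02d}"


def write_entry(entry, out_dir):
    """Write one entry as <key>.md: JSON front matter, then the abstract as the body."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    front = {k: v for k, v in entry.items() if k != "abstract"}
    date = csl_date(entry)
    if date:
        front["date"] = date  # Hugo sorts and groups pages by this field
    # Hugo treats a JSON object at the very top of a content file as front matter.
    text = json.dumps(front, indent=2, ensure_ascii=False)
    text += "\n\n" + entry.get("abstract", "") + "\n"
    path = out_dir / f"{entry['id']}.md"
    path.write_text(text, encoding="utf-8")
    return path


def copy_pdf(key, pdf_dir, static_dir):
    """Copy <pdf_dir>/<key>.pdf into Hugo's static tree (hypothetical naming scheme)."""
    src = Path(pdf_dir) / f"{key}.pdf"
    if not src.exists():
        return False
    dest = Path(static_dir)
    dest.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest / src.name)
    return True


def main(bib_path):
    for entry in bib_to_csl(bib_path):
        write_entry(entry, "content/publications")
        copy_pdf(entry["id"], "bibliography/pdfs", "static/files/pubs/pdfs")


# Run only when a .bib file is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Saved under a hypothetical name such as build_pubs.py, it would be run from the site root as `python3 build_pubs.py bibliography/jg_pubs.bib`. Writing straight into content/publications/ skips the intermediate copy step; keeping a staging directory, as the real script does, makes it easier to inspect the generated files first.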

This is pretty cool. I wonder if it’s technically possible to do this without the intermediate Python scripting.

There's some great stuff going on in https://github.com/jonathan-g/jonathangilligan.hugo/blob/master/bibliography/process_bibliography.py, but could you please tell me which file is the input to process_bibliography.py and which is the output?

I can see that process_bibliography.py takes sys.argv[1] as an argument, but could you give a usage example, please?

Here is the overview of how I automate my online bibliography using Hugo:

I start with a BibTeX file with my publications (I use JabRef to maintain the BibTeX file, but that’s not necessary; any BibTeX file will do).

  1. I also have PDFs of many of the publications in bibliography/pdfs.

  2. I have a Python script, “bibliography/process_bibliography.py”, which I call with the name of the BibTeX file as its argument:

    python3 process_bibliography.py jg_pubs.bib
    

    This script:

    1. Calls pandoc-citeproc to translate the BibTeX file into a YAML file (jg_pubs.yml).
    2. Using the YAML file, generates a Markdown file in bibliography/content/ for every bibliography entry. Each file has a YAML header with the bibliographic data (publication type, authors, title, date, BibTeX key, journal, volume, pages, etc.), and the body of the file is the abstract of the publication.
    3. Copies the .md files from bibliography/content/ to content/publications/, where Hugo will find them.
    4. Copies the PDF files corresponding to bibliography entries from bibliography/pdfs/*.pdf to static/files/pubs/pdfs/, where Hugo will find them.
  3. The HTML formatting is done by Hugo partials in the hugo-finisterre theme. These use the Markdown files that the Python script puts in content/publications/*.md and link (where appropriate) to the full-text PDF files in static/files/pubs/pdfs/*.pdf.

    1. layouts/section/publications.html is the main bibliographic list (in reverse-chronological order)

      {{ range (where .Data.Pages "Params.publication" "!=" "").GroupByDate "2006" "desc" }}
      

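      selects the pages that have a non-empty publication parameter, groups them by year (in Go's date-layout syntax, "2006" stands for the year), and sorts the groups in descending (reverse-chronological) order.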

      1. For each year, I emit an HTML item with the year {{ .Key }}, and then list all the publications for that year:
        {{ range .Pages.ByDate.Reverse }}
        
        goes through the publications for that year in reverse-chronological order; then, for each publication:
      2. Print the publication summary, using the partial layouts/partials/pub-summary.html
        <li>{{ partial "pub-summary" . | chomp }}</li>
        
    2. layouts/publications/single.html formats a single publication with its abstract (from the Markdown file generated by the Python script). It calls the partial layouts/partials/publication.html to format the HTML, and it calls layouts/partials/publication-indexing-metadata.html to generate metadata for search-engine indexing and so that Zotero’s browser add-in recognizes the page as a publication and enables “Save to Zotero”.

    3. layouts/partials/pub-summary.html is a wrapper that calls a bunch of partials in the layouts/partials/pub_fmt directory to format the individual parts of the publication:

      • layouts/partials/pub_fmt/pub-icon.html figures out which icon to use (from the Font Awesome icon set)
      • layouts/partials/pub_fmt/doi.html inserts a badge with a link to the DOI if there’s a “doi” key for the publication
      • layouts/partials/pub_fmt/ssrn.html inserts a badge with a link to SSRN if there’s an “ssrn” key for the publication
      • layouts/partials/pub_fmt/pdf.html inserts a badge with a link to the full-text PDF if there’s a “pdf” key for the publication
      • layouts/partials/pub_fmt/amazon.html inserts a badge with a link to the Amazon product page if there’s an “amazon” key for the publication (i.e., if it’s a book or a book chapter)
      • layouts/partials/pub_fmt/inpress.html inserts a badge indicating that the publication is in press, if the publication’s date is in the future.
      • The main formatting is done by the partial layouts/partials/publication.html. You can edit these partials if you want to change the formatting of the publications. I structure these partials very similarly to the way that formatting functions in BibTeX .bst style files are structured: There is a master partial for each kind of publication, and then there are partials for each part of a publication that would need formatting (e.g., book title, article title, journal title, volume number, page numbers, author list, etc.), and there is a bit of hierarchy, where an author-list partial will call an author partial to format each author’s name.
        • Figure out what kind of publication it is and call the appropriate partial:

          • layouts/partials/pub_fmt/article.html (Journal article)
          • layouts/partials/pub_fmt/book.html (Book)
          • layouts/partials/pub_fmt/chapter.html (Book chapter)
          • layouts/partials/pub_fmt/inproceedings.html (Conference proceedings)
          • layouts/partials/pub_fmt/patent.html (Patent)
          • layouts/partials/pub_fmt/thesis.html (Ph.D. or Masters thesis)
          • layouts/partials/pub_fmt/report.html (Technical report (BibTeX TechReport items))
        • Each of those partials in turn calls partials in layouts/partials/pub_fmt/ that format the authors, article title, journal or book title, volume, page numbers, and so forth.

          These partials handle the formatting of specific parts of the publication, and I also use CSS classes to handle some aspects of text formatting. In the future, I plan to move away from using explicit formatting (e.g., “<b> ... </b>” or “<i> ... </i>” tags) and be more thorough about using <span>s with CSS classes to handle more of the textual formatting.

          There are three special partials that are worth noting:

          • layouts/partials/pub_fmt/author_list.html formats an author list using a parameter .cutoff, which defaults to 3: If there are more than .cutoff authors, it abbreviates the author list as First Author et al. It also has a parameter .author.initials, which indicates whether to abbreviate given names to initials.
          • layouts/partials/pub_fmt/author_list_full.html formats an author list just as author_list.html does, but does not abbreviate the list using et al., so it lists all the authors, no matter how many there are.
          • layouts/partials/pub_fmt/et_al.html does the work of checking the length of an author list and abbreviating using et al. if necessary. It also handles commas between author names if necessary, and adds a final “&” before the last author name if the list is not abbreviated using et al.
        • There are also partials beginning with meta_ that insert metadata for indexing the HTML page using the schema.org ontology, aimed at search engines and the like, as well as metadata that makes it easier to import entries into Zotero. This is a work in progress because of limitations in both Zotero and the Dublin Core ontology. Zotero is working on support for JSON-LD metadata, and I will comprehensively rework this stuff when that arrives.
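
To make the template logic above a bit more concrete, here is a rough Python model of two of the pieces: the year grouping in publications.html and the et al. handling in author_list.html and author_list_full.html. It is an illustration, not a translation of the actual partials. Pages are plain dicts with hypothetical `publication` and ISO `date` keys, authors are CSL-style `given`/`family` dicts, and the exact punctuation (“A & B” for two names, “A, B, & C” for three or more) is my assumption.

```python
"""Rough Python model of two pieces of the Hugo templates (illustrative only)."""
from itertools import groupby


def group_by_year(pages):
    """Keep pages with a non-empty "publication" value, newest first, grouped by year.

    Approximates the where/GroupByDate "2006" "desc" range in publications.html,
    plus .Pages.ByDate.Reverse within each year. Dates are ISO strings, so
    string order equals chronological order.
    """
    pubs = sorted((p for p in pages if p.get("publication")),
                  key=lambda p: p["date"], reverse=True)
    return [(year, list(group))
            for year, group in groupby(pubs, key=lambda p: p["date"][:4])]


def format_name(author, initials=False):
    """Format a CSL-style name dict, optionally abbreviating given names to initials."""
    given = author.get("given", "")
    if initials:
        given = " ".join(part[0] + "." for part in given.split())
    return f"{given} {author['family']}".strip()


def format_author_list(authors, cutoff=3, initials=False, abbreviate=True):
    """abbreviate=True models author_list.html; abbreviate=False models author_list_full.html."""
    names = [format_name(a, initials) for a in authors]
    if abbreviate and len(names) > cutoff:
        return f"{names[0]} et al."  # more than `cutoff` authors: first author only
    if len(names) <= 1:
        return "".join(names)
    if len(names) == 2:
        return f"{names[0]} & {names[1]}"
    # Three or more names: commas between them and "&" before the last one.
    return ", ".join(names[:-1]) + ", & " + names[-1]
```

Splitting the name formatting from the list logic follows the same idea as the partials: the list-level code decides how many names to show, and a separate name-level function decides how each one looks.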

This should work with a generic BibTeX file, but I use a bunch of custom (non-standard) entries in the BibTeX:

  • For journal articles, I use the “journaltitle” field instead of “journal” (this is a BibLaTeX-ism)
  • I use the “file” field to point to a PDF. The format is
    file = {full text:pdfs/filename.pdf:PDF}
    
    where “full text” is a description saying that this is the full text, “pdfs/filename.pdf” is the relative path to the file, and “PDF” is the file type. (This description:path:type layout is for compatibility with JabRef.)
  • Dates are specified in ISO format: YYYY-MM-DD
  • I have several keys I use to link to online sources:
    • doi gives a DOI so I can link to https://doi.org/<doi>
    • ssrn gives an SSRN abstract number so I can link to https://papers.ssrn.com/sol3/papers.cfm?abstract_id=<ssrn>
    • amazon gives an Amazon.com URL for books
  • You can decide whether you want to use non-standard keys like this and adapt the Hugo code correspondingly.
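
Here is a small Python sketch of how these custom fields could be interpreted during preprocessing. It mirrors what the badge partials described above do, but it is not taken from them. The JabRef file-field parsing handles only the simple single-file `description:path:type` case (no escaped colons, no `;`-separated lists), and the `/files/pubs/pdfs/` URL prefix assumes Hugo serves static/ at the site root.

```python
"""Sketch of interpreting the custom BibTeX fields (doi, ssrn, file, amazon, ISO dates)."""
from datetime import date
from pathlib import PurePosixPath


def parse_file_field(value):
    """Split a simple JabRef file field 'description:path:type' into its parts.

    Simplification: only the first ';'-separated file is used, and escaped
    colons (e.g. Windows drive letters) are not handled.
    """
    parts = value.split(";")[0].split(":")
    if len(parts) != 3:
        return None
    description, path, filetype = parts
    return {"description": description, "path": path, "type": filetype}


def badge_links(fields, pdf_base="/files/pubs/pdfs/"):
    """Return (label, url) pairs for whichever of doi, ssrn, file and amazon are present."""
    links = []
    if fields.get("doi"):
        links.append(("DOI", f"https://doi.org/{fields['doi']}"))
    if fields.get("ssrn"):
        links.append(("SSRN", "https://papers.ssrn.com/sol3/papers.cfm?abstract_id="
                      + str(fields["ssrn"])))
    pdf = parse_file_field(fields.get("file", ""))
    if pdf and pdf["type"].upper() == "PDF":
        # Assumption: static/files/pubs/pdfs/ is served at /files/pubs/pdfs/.
        links.append(("PDF", pdf_base + PurePosixPath(pdf["path"]).name))
    if fields.get("amazon"):
        links.append(("Amazon", fields["amazon"]))
    return links


def in_press(iso_date, today=None):
    """True if an ISO YYYY-MM-DD date lies in the future (the in-press badge rule)."""
    today = today or date.today()
    return date.fromisoformat(iso_date) > today
```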

It should be possible to skip the Python script by running pandoc-citeproc directly on the BibTeX file to translate it to YAML, and then processing the resulting (large) YAML file with a Hugo data template, instead of generating separate Markdown content files the way the Python script does. But unless someone wants to write a BibTeX parser in Go, a separate step that pre-processes the BibTeX file will always be necessary.
