Tidyverse Fun - Part 1

First part in a series of doing useful tasks with the tidyverse. This time, we generate Oxford Comma triples, and sequentially numbered BibTeX entries

tidyverse
rstats
Author

Shamindra Shrotriya

Published

July 15, 2019

Task 1: Generating Oxford Comma Triples

The central problem

Based on a fun conversation with my statistics cohort over dinner we got to discussing the famous Oxford Comma (or Serial Comma depending on your persuasion). I’ve never really adopted the use but my friends made a compelling argument on it’s apparent general lack of ambiguity when applied appropriately.

We will use the Oxford comma on the famously ambiguous phrase (here used without the Oxford Comma before leaves):

Eats, shoots and leaves

After adding in the Oxford Comma this would become:

Eats, shoots, and leaves

Goal: A fun experiment would be to generate all permutations of this phrase with and without the Oxford Comma using R and specifically the tidyverse packages.

Generating all word-triple permutations the tidy way

First, let’s load our required packages.

library(knitr)
library(magrittr)
library(tidyverse)
library(glue)

Let’s also define our unique global word values used to construct the required phrases:

WORD_VALS <- c("eats", "shoots", "leaves")

Generate all unique 3-word permutations without replacement from the three unique words. We’ll create a helper function to check that a vector of words is unique.

is_unq_perm <- function(word1, word2, word3){
    words_vec <- c(word1, word2, word3)
    return(length(words_vec) - length(unique(words_vec)) == 0)
}

We can now simply generate every possible triple with replacement using the tidyr::crossing function. We proceed to filter these \(3^3 = 27\) triples for unique triples using our is_unq_perm helper function applied row-by-row using purrr::pmap_lgl. The _lgl simply returns a TRUE/FALSE logical value as intended by the applied function.

# Generate the unique word-triples
all_perms <- tidyr::crossing(word1 = WORD_VALS,
                             word2 = WORD_VALS,
                             word3 = WORD_VALS) %>%
                mutate(.data = .,
                       is_unq_perm = purrr::pmap_lgl(.l = .,
                                                     is_unq_perm)) %>%
                filter(.data = ., is_unq_perm) %>%
                select(-is_unq_perm)

# Display output in a nice centered table
all_perms %>%
  kable(x = ., align = 'c',
        col.names = c("Word 1",
                      "Word 2",
                      "Word 3"))
Word 1 Word 2 Word 3
eats leaves shoots
eats shoots leaves
leaves eats shoots
leaves shoots eats
shoots eats leaves
shoots leaves eats

Great - that part is done! Now we just need to generate for each triple of words an oxford comma and non-oxford comma version. This is done easily using the amazing glue package as seen below:

exprs <- all_perms %>%
          mutate(non_oxford_comma =
                   glue_data(.x = .,
                             "{word1}, {word2} and {word3}"),
                 oxford_comma =
                   glue_data(.x = .,
                             "{word1}, {word2}, and {word3}")) %>%
          select(non_oxford_comma, oxford_comma)

We can display the side-by-side output of the Non-Oxford Comma vs. Oxford comma for the \(6\) generated triples as follows:

# Display output in a nice centered table
exprs %>%
  kable(x = .,
        align = 'c',
        col.names = c("Non-Oxford Comma",
                      "Oxford Comma"))
Non-Oxford Comma Oxford Comma
eats, leaves and shoots eats, leaves, and shoots
eats, shoots and leaves eats, shoots, and leaves
leaves, eats and shoots leaves, eats, and shoots
leaves, shoots and eats leaves, shoots, and eats
shoots, eats and leaves shoots, eats, and leaves
shoots, leaves and eats shoots, leaves, and eats

So there you have it. Have fun generating your own version of Oxford Comma triples to engage in civil discussions with your fellow grammar focused friends 😄.

Task 2: Generating Sequentially Numbered BibTeX Entries

The central problem

In this case I needed to generate several BibTeX entries of the form:

@misc{doe2019_lec1,
author        = {Doe, John},
title         = {Lecture Note 1 - STAT10A},
month         = {March},
year          = {2018},
url           = {https://statschool/~doe/stats10A/Lectures/Lecture01.pdf},
}

As it can be seen the lectures are numbered sequentially and change in the main BibTeX id, the title, and the url field.

Specifically I needed to construct 30 such sequential entries for lectures 1-30. Rather than do this manually, I realized that this would be fun scripting exercise with using the tidyverse packages glue, purrr, and stringr.

Goal: Create 30 such BibTeX entries and print to the console to directly-copy paste to my BibTeX file.

The tidy approach

First step is to write a function that takes a lecture number (integer) as an input and then outputs a single BibTeX entry for that lecture.

# Generate BibTeX entry for a single lecture number
get_lec_bibtex <- function(lec_num){
  # Get the 2 character padded lecture number i.e. 1 -> "01"
  lec_num_pad <- str_pad(string = lec_num, width = 2,
                         side = "left", pad = "0")

  # Construct the BibTeX entry
  out_bbtex_str <- glue(
    "@misc{doe2019_lec<lec_num>,
    author = {Doe, John},
    title  = {Lecture Note <lec_num> - STAT10A},
    month  = {March},
    year   = {2018},
    url    = {https://www.hpg/~doe/st10A/lecs/lec<lec_num_pad>.pdf}}",
    .open = "<",
    .close = ">")

  return(out_bbtex_str)
}

Note that by default glue allows you to substitute input text in between { and } markers. However BibTeX entries already have literal default {} tags that we need to include in our function output. Rather than escaping them the glue package conveniently allows us to change the default opening and closing markers 💯! We simply set these to be angle brackets < > using the .open and .close options above.

Let’s just test this out quickly:

lec_no <- 1
get_lec_bibtex(lec_num = lec_no)
@misc{doe2019_lec1,
author = {Doe, John},
title  = {Lecture Note 1 - STAT10A},
month  = {March},
year   = {2018},
url    = {https://www.hpg/~doe/st10A/lecs/lec01.pdf}}

Great - looks like it is working as required with the correct string padding in the lecture number in the pdf filename!

Apply to all lectures using purrr

Let’s finish this by creating all the entries using purrr:

lec_nums <- c(1, 30)
lec_nums %>%
  map_chr(.x = ., .f = ~get_lec_bibtex(lec_num = .x)) %>%
  cat(., sep = "\n\n")
@misc{doe2019_lec1,
author = {Doe, John},
title  = {Lecture Note 1 - STAT10A},
month  = {March},
year   = {2018},
url    = {https://www.hpg/~doe/st10A/lecs/lec01.pdf}}

@misc{doe2019_lec30,
author = {Doe, John},
title  = {Lecture Note 30 - STAT10A},
month  = {March},
year   = {2018},
url    = {https://www.hpg/~doe/st10A/lecs/lec30.pdf}}

Yay - this works as expected! We can now paste into BibTeX as required.

Note that we only created it for lectures 1 and 30 for easy scrolling. But for all lectures we can just replace c(1, 30) with 1:30 in the above code.

Conclusion

This post was for me to document and serve as a guide to automating a couple of fun text-based tasks that I came across in my work (and social life!). Using the tidy framework can be a fun way to solve these tasks (but certainly not the only way in R). Have fun playing around with the above and please post in the comments any questions/feedback you may have 👍.

Stay tuned for more blogposts solving more such tasks.

Acknowledgments

I’d like to thank Salil Shrotriya for creating the preview image for this post. The hex sticker png files were sourced from here.

Reuse

Citation

BibTeX citation:
@online{shrotriya2019,
  author = {Shamindra Shrotriya},
  title = {Tidyverse {Fun} - {Part} 1},
  date = {2019-07-15},
  url = {https://www.shamindras.com/posts/2019-07-15-shrotriya2019tidyfunpt1},
  langid = {en}
}
For attribution, please cite this work as:
Shamindra Shrotriya. 2019. “Tidyverse Fun - Part 1.” July 15, 2019. https://www.shamindras.com/posts/2019-07-15-shrotriya2019tidyfunpt1.