Tidyverse Fun - Part 1 – Shamindra Shrotriya

Task 1: Generating Oxford Comma Triples

The central problem

Over dinner, a fun conversation with my statistics cohort turned to the famous Oxford Comma (or Serial Comma, depending on your persuasion). I’ve never really adopted it, but my friends made a compelling argument that, when applied appropriately, it generally removes ambiguity.

We will apply the Oxford Comma to the famously ambiguous phrase below (shown here without an Oxford Comma before "and leaves"):

Eats, shoots and leaves

After adding the Oxford Comma, this becomes:

Eats, shoots, and leaves

Goal: As a fun experiment, generate all permutations of this phrase, with and without the Oxford Comma, using R and specifically the tidyverse packages.

Generating all word-triple permutations the `tidy` way

First, let’s load our required packages.

library(knitr)
library(magrittr)
library(tidyverse)
library(glue)

Let’s also define the global vector of unique words used to construct the phrases:

WORD_VALS <- c("eats", "shoots", "leaves")

Next, we generate all 3-word permutations, without replacement, of the three unique words. First, we create a helper function that checks whether the words in a triple are all distinct.

is_unq_perm <- function(word1, word2, word3){
    words_vec <- c(word1, word2, word3)
    return(length(words_vec) - length(unique(words_vec)) == 0)
}
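The same check can be written more compactly with base R's anyDuplicated(), which returns the index of the first repeated element, or 0 if there is none. This is a small sketch using a hypothetical name, is_unq_perm2, so that it does not clash with the helper above:

```r
# Hypothetical alternative to is_unq_perm(): TRUE when all three words differ.
# anyDuplicated() returns 0 when no element is repeated, so `== 0` tests uniqueness.
is_unq_perm2 <- function(word1, word2, word3) {
  anyDuplicated(c(word1, word2, word3)) == 0
}

is_unq_perm2("eats", "shoots", "leaves")  # TRUE: all words distinct
is_unq_perm2("eats", "eats", "leaves")    # FALSE: "eats" is repeated
```

Either version works with purrr::pmap_lgl, since both take word1, word2, and word3 as named arguments.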

We can now generate every possible triple, with replacement, using the tidyr::crossing function. We then filter these 3^3 = 27 triples down to the 3! = 6 triples whose words are all distinct, applying our is_unq_perm helper function row-by-row with purrr::pmap_lgl. The _lgl suffix means that pmap_lgl returns a logical vector, one TRUE/FALSE value per row, as produced by the applied function.

Note: tidyr::crossing generates the Cartesian product of its inputs (here, all 27 word triples), and it also de-duplicates and sorts the result, which is very handy.

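# Generate all 3^3 = 27 word-triples, then keep only the
# triples whose words are all distinct. The temporary column is
# named is_unq so it does not shadow the is_unq_perm() helper.
all_perms <- tidyr::crossing(word1 = WORD_VALS,
                             word2 = WORD_VALS,
                             word3 = WORD_VALS) %>%
  mutate(.data = .,
         is_unq = purrr::pmap_lgl(.l = ., .f = is_unq_perm)) %>%
  filter(.data = ., is_unq) %>%
  select(-is_unq)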

# Display output in a nice centered table
all_perms %>%
  kable(x = ., align = 'c',
        col.names = c("Word 1",
                      "Word 2",
                      "Word 3"))

Word 1	Word 2	Word 3
eats	leaves	shoots
eats	shoots	leaves
leaves	eats	shoots
leaves	shoots	eats
shoots	eats	leaves
shoots	leaves	eats
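The counts behind the table above (27 candidate triples filtered down to 6) can be checked in isolation. This sketch is a slight variant of the pipeline rather than the post's exact code: it passes an anonymous function to pmap_lgl() in place of is_unq_perm and uses base R row subsetting instead of filter(), so it runs without the earlier definitions. It assumes the tidyr and purrr packages are installed.

```r
# Assumes the tidyr and purrr packages are installed (both ship with the tidyverse)
WORD_VALS <- c("eats", "shoots", "leaves")

# Cartesian product of three word columns, with replacement: 3^3 = 27 rows
all_triples <- tidyr::crossing(word1 = WORD_VALS,
                               word2 = WORD_VALS,
                               word3 = WORD_VALS)

# pmap_lgl() walks the rows, passing each column to the function argument of
# the same name, and returns one TRUE/FALSE per row
keep <- purrr::pmap_lgl(all_triples, function(word1, word2, word3) {
  anyDuplicated(c(word1, word2, word3)) == 0
})

# Keep only the rows whose three words are all distinct
distinct_triples <- all_triples[keep, ]

nrow(all_triples)       # 27
nrow(distinct_triples)  # 6, i.e. 3!
```

Because pmap_lgl() matches columns to arguments by name, the function's argument names must match the column names word1, word2, and word3.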

Great, that part is done! Now we just need to generate, for each triple of words, an Oxford Comma and a non-Oxford Comma version. This is easy with the amazing glue package, as seen below:

exprs <- all_perms %>%
          mutate(non_oxford_comma =
                   glue_data(.x = .,
                             "{word1}, {word2} and {word3}"),
                 oxford_comma =
                   glue_data(.x = .,
                             "{word1}, {word2}, and {word3}")) %>%
          select(non_oxford_comma, oxford_comma)
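glue_data() is vectorised over the rows of its data argument: each {name} is looked up as a column, and one string is returned per row. A minimal sketch on two illustrative triples (the small data frame here is an assumption standing in for all_perms):

```r
library(glue)

# Two illustrative triples; a plain data frame stands in for all_perms here
triples <- data.frame(word1 = c("eats", "shoots"),
                      word2 = c("shoots", "eats"),
                      word3 = c("leaves", "leaves"),
                      stringsAsFactors = FALSE)

# {word1}, {word2}, {word3} are evaluated as columns of `triples`,
# producing one string per row
non_oxford <- glue_data(triples, "{word1}, {word2} and {word3}")
oxford     <- glue_data(triples, "{word1}, {word2}, and {word3}")

print(non_oxford)
print(oxford)
```

This row-wise vectorisation is why the mutate() call above needs no explicit loop.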

We can display the side-by-side output of the Non-Oxford Comma vs. Oxford comma for the 6 generated triples as follows:

# Display output in a nice centered table
exprs %>%
  kable(x = .,
        align = 'c',
        col.names = c("Non-Oxford Comma",
                      "Oxford Comma"))

Non-Oxford Comma	Oxford Comma
eats, leaves and shoots	eats, leaves, and shoots
eats, shoots and leaves	eats, shoots, and leaves
leaves, eats and shoots	leaves, eats, and shoots
leaves, shoots and eats	leaves, shoots, and eats
shoots, eats and leaves	shoots, eats, and leaves
shoots, leaves and eats	shoots, leaves, and eats

So there you have it. Have fun generating your own Oxford Comma triples to engage in civil discussions with your fellow grammar-focused friends 😄.

Task 2: Generating Sequentially Numbered BibTeX Entries

The central problem

In this case I needed to generate several BibTeX entries of the form:

@misc{doe2019_lec1,
author        = {Doe, John},
title         = {Lecture Note 1 - STAT10A},
month         = {March},
year          = {2018},
url           = {https://statschool/~doe/stats10A/Lectures/Lecture01.pdf},
}

As can be seen, the lectures are numbered sequentially, and the lecture number changes in three places: the main BibTeX id, the title, and the url field.

Specifically, I needed to construct 30 such sequential entries for lectures 1-30. Rather than do this manually, I realized that this would be a fun scripting exercise using the tidyverse packages glue, purrr, and stringr.

Goal: Create 30 such BibTeX entries and print them to the console, ready to copy and paste directly into my BibTeX file.

The `tidy` approach

The first step is to write a function that takes a lecture number (integer) as input and outputs a single BibTeX entry for that lecture.

# Generate BibTeX entry for a single lecture number
get_lec_bibtex <- function(lec_num){
  # Get the 2 character padded lecture number i.e. 1 -> "01"
  lec_num_pad <- str_pad(string = lec_num, width = 2,
                         side = "left", pad = "0")

  # Construct the BibTeX entry
  out_bbtex_str <- glue(
    "@misc{doe2019_lec<lec_num>,
    author = {Doe, John},
    title  = {Lecture Note <lec_num> - STAT10A},
    month  = {March},
    year   = {2018},
    url    = {https://www.hpg/~doe/st10A/lecs/lec<lec_num_pad>.pdf}}",
    .open = "<",
    .close = ">")

  return(out_bbtex_str)
}

Note that by default glue substitutes the expressions placed between { and } markers. However, BibTeX entries already contain literal {} braces that we need to include in our function output. Rather than escaping them (glue escapes a literal brace by doubling it, as {{ or }}), we can use glue's convenient option to change the default opening and closing markers 💯! We simply set these to angle brackets < > using the .open and .close arguments above.

Note: Luckily, our BibTeX output contains no literal angle brackets that would clash with these new markers.
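To make the two options concrete, here is a small sketch (the lecture number 7 is just an illustrative value) that builds the first line of an entry both ways: once with the default markers and doubled braces, and once with < > markers and plain braces.

```r
library(glue)

lec_num <- 7  # illustrative value

# Default { } markers: literal braces must be escaped by doubling them
escaped <- glue("@misc{{doe2019_lec{lec_num},")

# Custom < > markers: single braces are now ordinary text
custom <- glue("@misc{doe2019_lec<lec_num>,", .open = "<", .close = ">")

escaped
custom
```

Both produce @misc{doe2019_lec7,. For a template with many literal braces, as in a full BibTeX entry, switching the markers keeps the template much closer to the final text than doubling every brace.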

Let’s just test this out quickly:

lec_no <- 1
get_lec_bibtex(lec_num = lec_no)

@misc{doe2019_lec1,
author = {Doe, John},
title  = {Lecture Note 1 - STAT10A},
month  = {March},
year   = {2018},
url    = {https://www.hpg/~doe/st10A/lecs/lec01.pdf}}

Great, it is working as required, with the lecture number correctly zero-padded in the PDF filename!

Note: We used stringr's str_pad to convert 1 to "01".
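For comparison, here is a sketch of the same zero-padding done with str_pad and with base R's sprintf() (the base R version is an alternative, not what the post uses). Both treat width as a minimum, so a three-digit lecture number such as 100 would be left unchanged rather than truncated.

```r
library(stringr)

lec_nums <- c(1, 9, 10, 30)

# stringr: left-pad the character form of each number with "0" to width 2
padded_stringr <- str_pad(lec_nums, width = 2, side = "left", pad = "0")

# Base R alternative: sprintf() with a zero-padded, width-2 integer format
padded_base <- sprintf("%02d", as.integer(lec_nums))

padded_stringr  # "01" "09" "10" "30"
padded_base     # "01" "09" "10" "30"
```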

Apply to all lectures using `purrr`

Let’s finish this by creating all the entries using purrr:

lec_nums <- c(1, 30)
lec_nums %>%
  map_chr(.x = ., .f = ~get_lec_bibtex(lec_num = .x)) %>%
  cat(., sep = "\n\n")

@misc{doe2019_lec1,
author = {Doe, John},
title  = {Lecture Note 1 - STAT10A},
month  = {March},
year   = {2018},
url    = {https://www.hpg/~doe/st10A/lecs/lec01.pdf}}

@misc{doe2019_lec30,
author = {Doe, John},
title  = {Lecture Note 30 - STAT10A},
month  = {March},
year   = {2018},
url    = {https://www.hpg/~doe/st10A/lecs/lec30.pdf}}

Yay, this works as expected! We can now paste the entries into our BibTeX file.

Note that we only created entries for lectures 1 and 30 to keep the output short. To generate all the lectures, simply replace c(1, 30) with 1:30 in the code above.
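Because glue() and str_pad() both recycle over vector inputs, the map_chr() loop is not strictly required here. As a sketch, assuming the same template as get_lec_bibtex() above (repeated so the block runs on its own), passing 1:30 directly returns all 30 entries at once:

```r
library(glue)
library(stringr)

# Same template as get_lec_bibtex() above; str_pad() and glue() are
# vectorised, so a vector of lecture numbers yields one entry per lecture
get_lec_bibtex <- function(lec_num) {
  lec_num_pad <- str_pad(string = lec_num, width = 2,
                         side = "left", pad = "0")
  glue(
    "@misc{doe2019_lec<lec_num>,
    author = {Doe, John},
    title  = {Lecture Note <lec_num> - STAT10A},
    month  = {March},
    year   = {2018},
    url    = {https://www.hpg/~doe/st10A/lecs/lec<lec_num_pad>.pdf}}",
    .open = "<",
    .close = ">")
}

all_entries <- get_lec_bibtex(lec_num = 1:30)
length(all_entries)  # 30
cat(all_entries, sep = "\n\n")
```

The purrr version in the post remains a perfectly good choice, and it also works for functions that are not vectorised.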

Conclusion

This post documents, and serves as a guide to, automating a couple of fun text-based tasks that I came across in my work (and social life!). The tidy framework can be a fun way to solve these tasks (though certainly not the only way in R). Have fun playing around with the above, and please post any questions or feedback in the comments 👍.

Stay tuned for more blog posts solving similar tasks.

Acknowledgments

I’d like to thank Salil Shrotriya for creating the preview image for this post. The hex sticker png files were sourced from here

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{shrotriya2019,
  author = {Shrotriya, Shamindra},
  title = {Tidyverse {Fun} - {Part} 1},
  date = {2019-07-15},
  url = {https://www.shamindras.com/posts/2019-07-15-shrotriya2019tidyfunpt1/},
  langid = {en}
}

For attribution, please cite this work as:

Shrotriya, Shamindra. 2019. “Tidyverse Fun - Part 1.” July 15, 2019. https://www.shamindras.com/posts/2019-07-15-shrotriya2019tidyfunpt1/.
