Shamindra Shrotriya
https://www.shamindras.com/posts.html
Sharp constants for finite dimensional norms (Shamindra Shrotriya, Thu, 12 May 2022)
https://www.shamindras.com/posts/2022-05-12-shrotriya2022sharplpnormconsts/index.html

TL;DR

I walk through a cool and possibly less known result for sharp bounds on norms in finite dimensional vector spaces. This result enables working with $\ell_p$-norms directly, rather than approximating them with the more widely used $\ell_1$, $\ell_2$, and $\ell_\infty$-norms, for example. This exposition is based on (Chapter 2, Wendland 2018)^{1}.

The $\ell_p$-norm on $\mathbb{R}^n$

Throughout this presentation we will work in the finite dimensional Euclidean space $\mathbb{R}^n$, for some fixed $n \in \mathbb{N}$. Moreover we will use the $\ell_p$-norm on this space, which is defined as follows:

Definition 1 ($\ell_p$-norm on $\mathbb{R}^n$) For a fixed $p \in [1, \infty)$, the $\ell_p$-norm, as denoted by $\|\cdot\|_p$, is defined as follows: $$\|x\|_p := \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}, \quad \text{for each } x \in \mathbb{R}^n.$$

We note (without proof) that Definition 1 does indeed define a valid norm on $\mathbb{R}^n$; a previous post showed more of these details. A more detailed proof of this can be found in TODO.

Warm up: Relationships between the $\ell_1$, $\ell_2$, and $\ell_\infty$-norms

Our goal will be to prove bounds for any $p \in [1, \infty]$. However, we first get our bearings by shifting focus to the three most commonly used $\ell_p$-norms on $\mathbb{R}^n$: the $\ell_1$, $\ell_2$, and $\ell_\infty$-norms. The main proposition is as follows:

Proposition 1 (Relationships between the $\ell_1$, $\ell_2$, and $\ell_\infty$-norms on $\mathbb{R}^n$) For each $x \in \mathbb{R}^n$, we have the following chain of inequalities: $$\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2 \le n\,\|x\|_\infty.$$
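These inequalities are easy to sanity-check numerically. The following short Python sketch (my own illustration, not code from the post) verifies the chain on random vectors:

```python
import math
import random

def norm(x, p=None):
    """lp-norm of a vector; p=None denotes the l-infinity norm."""
    if p is None:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1 / p)

random.seed(0)
n = 10
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(n)]
    l_inf, l_2, l_1 = norm(x), norm(x, 2), norm(x, 1)
    # the chain of inequalities between the three common norms
    assert l_inf <= l_2 <= l_1 <= math.sqrt(n) * l_2 <= n * l_inf
print("all checks passed")
```

Of course, random sampling is no substitute for the proof, but it is a quick way to convince yourself which direction each inequality points.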

References

Wendland, Holger. 2018. Numerical Linear Algebra. Cambridge Texts in Applied Mathematics. Cambridge University Press, Cambridge.

Footnotes

Note: The presentation in this post is intentionally verbose. The goal is to give lots of intuition of the key result and its usefulness, and ensure that the proofs are rigorous. It is written with an empathetic mindset to newcomers, and to myself for future reference.↩︎

Characterizing norm triangle inequalities via convexity (Shamindra Shrotriya)
https://www.shamindras.com/posts/2021-12-31-shrotriya2021normtriconvexity/index.html

TL;DR

I walk through a cool and possibly less known result connecting convexity and the triangle inequalities for norms. Using this result, typical proofs of the triangle inequality for a proposed norm function are significantly simplified. This exposition is based on (Chapter 3, Robinson 2020)^{1}.

Background - Norms

Normed linear spaces are a natural setting for much applied mathematics and statistics. These are vector spaces, $V$, endowed with a norm function, $\|\cdot\|$. Intuitively, norms give us a “yardstick” to measure the “lengths” of individual vectors in the given vector space. A standard definition of a norm is as follows:

Definition 1 (Norms in vector spaces) For a given vector space $V$ over $\mathbb{R}$, a norm $\|\cdot\|$ is a function $\|\cdot\| : V \to \mathbb{R}$ satisfying the following three properties.

Positive definiteness: For all $v \in V$, if $\|v\| = 0$ then $v = 0_V$.

Absolute homogeneity: $\|\alpha v\| = |\alpha| \, \|v\|$, for all $\alpha \in \mathbb{R}$ and $v \in V$.

Triangle inequality: $\|u + v\| \le \|u\| + \|v\|$, for all $u, v \in V$.

Remarks

Remark (Derived properties from Definition 1). We note that a norm, per Definition 1, in fact implies the following properties:

In Definition 1, we can always replace positive definiteness with the stronger claim, namely that $\|v\| = 0 \iff v = 0_V$. In short, we want to show that the reverse implication to positive definiteness always holds, i.e., $v = 0_V \implies \|v\| = 0$. To prove this, observe that using absolute homogeneity in Definition 1, we have: $\|0_V\| = \|0 \cdot 0_V\| = |0| \, \|0_V\| = 0$. As required.

We also have that $\|v\| \ge 0$, for all $v \in V$. To see this, observe that for any $v \in V$: $0 = \|v - v\| \le \|v\| + \|{-v}\| = 2\|v\|$. In effect this means the co-domain can always be changed from $\mathbb{R}$ to $[0, \infty)$.

Since these can always be derived directly from Definition 1, as shown, we can keep Definition 1 in its minimal form as noted here.

These ideas work for seminorms as well, see here for more details.

Main theorem

Theorem 1 (Characterization of norm triangle inequality) Let $f : V \to [0, \infty)$ be a function satisfying the following two properties^{2}.

Positive definiteness: For all $v \in V$, if $f(v) = 0$ then $v = 0_V$.

Absolute homogeneity: $f(\alpha v) = |\alpha| f(v)$, for all $\alpha \in \mathbb{R}$ and $v \in V$.

We then have that: $$f(u + v) \le f(u) + f(v) \text{ for all } u, v \in V \iff B := \{v \in V : f(v) \le 1\} \text{ is a convex set}. \tag{1}$$

In simple terms, the importance of Theorem 1 (as captured by Equation 1) can be summarized as follows:

Let $f$ be a function satisfying positive definiteness and absolute homogeneity. Then $f$ satisfies the triangle inequality if and only if the unit ball induced by $f$, i.e., $B := \{v \in V : f(v) \le 1\}$, is a convex set.

Remarks

Remark. In Theorem 1, we note the following:

The function $f$ is a norm-like function, and only becomes a valid norm per Definition 1 once we establish the triangle inequality, i.e., $f(u + v) \le f(u) + f(v)$ for all $u, v \in V$.

To prove the triangle inequality for $f$, the condition of Theorem 1 to establish is the convexity of the unit ball $B$, which will imply the triangle inequality for $f$ - huzzah!

The nice thing is, proving the convexity of $B$ can be much easier than trying to prove the triangle inequality property of $f$ directly, as we will soon see.

Subtle point: note that here we had to assume that the co-domain of $f$ is non-negative, i.e., $f : V \to [0, \infty)$ (not $\mathbb{R}$). This is because a typical norm, which satisfies the triangle inequality, is always shown to be non-negative (see the remark below Definition 1 for more details). Here we impose non-negativity of $f$ as an additional assumption in order to establish the triangle inequality property for $f$. This is not an issue, since one would always first check the non-negativity of a candidate norm-like function $f$.

Applications: Minkowski inequalities

Before getting into the details of the proof, let’s just see what Theorem 1 can do! We’ll consider two related applications, taken from (Lemma 3.6 and Example 3.13, Robinson 2020), respectively.

Application 1: $\ell_p$-norm triangle inequality in $\mathbb{R}^n$

Example 1 (Minkowski inequality in finite dimensions) Let us consider $x \in \mathbb{R}^n$, and a fixed $p \in [1, \infty)$. We then define the norm-like function $f : \mathbb{R}^n \to [0, \infty)$: $$f(x) := \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}.$$

One can show that $f$ satisfies positive definiteness and absolute homogeneity. To show that $f$ is a norm function we need to prove the triangle inequality. We will use Theorem 1. Let us define $B := \{x \in \mathbb{R}^n : f(x) \le 1\}$. We now need to show that $B$ is convex. We will need to use the fact that the map $s \mapsto |s|^p$ is convex for each $p \in [1, \infty)$. Let $x, y \in B$; we then have for $t \in [0, 1]$: $$f(tx + (1-t)y)^p = \sum_{i=1}^{n} |t x_i + (1-t) y_i|^p \le \sum_{i=1}^{n} \left(t |x_i|^p + (1-t) |y_i|^p\right) = t f(x)^p + (1-t) f(y)^p \le 1.$$

It follows that $f(tx + (1-t)y) \le 1$, and so $tx + (1-t)y \in B$, as required.

In fact, since $f$ satisfies the three conditions for a norm per Definition 1, we can now denote it using the conventional $\ell_p$-norm form, i.e., $f(x) = \|x\|_p$.
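As a numerical companion to Example 1, the following Python sketch (my own illustration, not code from the post) samples points of the unit ball $B$ for $p = 3$ and checks that convex combinations stay inside it:

```python
import random

def f(x, p):
    # the norm-like function from Example 1
    return sum(abs(xi) ** p for xi in x) ** (1 / p)

random.seed(0)
p, n = 3, 5
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(n)]
    y = [random.uniform(-1, 1) for _ in range(n)]
    # rescale each point into the unit ball B = {x : f(x) <= 1}
    fx, fy = f(x, p), f(y, p)
    x = [xi / max(1.0, fx) for xi in x]
    y = [yi / max(1.0, fy) for yi in y]
    t = random.random()
    z = [t * xi + (1 - t) * yi for xi, yi in zip(x, y)]
    assert f(z, p) <= 1 + 1e-12  # the convex combination stays in B
print("B looks convex")
```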

Application 2: $L^p$-norm triangle inequality

We can also similarly prove the triangle inequality for norms involving integrals efficiently. This is seen in the next example.

Let us consider functions $g$ on $(0, 1)$ with $\int_0^1 |g(s)|^p \, ds < \infty$, for a fixed $p \in [1, \infty)$. We then define the norm-like function $f$: $$f(g) := \left(\int_0^1 |g(s)|^p \, ds\right)^{1/p}.$$

Let us define $B := \{g : f(g) \le 1\}$. We now need to show that $B$ is convex. Let $g, h \in B$; we then have for $t \in [0, 1]$: $$f(tg + (1-t)h)^p = \int_0^1 |t g(s) + (1-t) h(s)|^p \, ds \le t \int_0^1 |g(s)|^p \, ds + (1-t) \int_0^1 |h(s)|^p \, ds \le 1.$$

It follows that $f(tg + (1-t)h) \le 1$, and so $tg + (1-t)h \in B$, as required.

Again, we can now denote $f$ using the conventional $L^p$-norm form, i.e., $f(g) = \|g\|_{L^p}$.

Punchline: what did Theorem 1 buy us?

We just saw that applying Theorem 1 enabled us to write very short proofs of Minkowski’s inequality in both the $\mathbb{R}^n$ and $L^p$ settings.

To appreciate this approach, note that proving Minkowski’s inequality typically requires one to first prove Young’s inequality and then Hölder’s inequality. Moreover, these need to be done separately in the $\mathbb{R}^n$ and $L^p$ settings. Using Theorem 1 allowed us to achieve both of these goals using a near identical style of proof 🎉!

Proof of Theorem 1

Assuming $f$ satisfies the two properties in Theorem 1, we need to prove both implications in Equation 1.

Proof - easy direction

Assume that $f(u + v) \le f(u) + f(v)$, for each $u, v \in V$. We need to show that this implies that the unit ball $B$ is a convex set.

Proof

Proof. ($\Rightarrow$) We proceed directly.

Assume that $f(u + v) \le f(u) + f(v)$, for each $u, v \in V$. Let $x, y \in B$ be arbitrary. We need to show that this implies, for each $t \in [0, 1]$, that the expression $f(tx + (1-t)y) \le 1$ holds. This implies the convexity of $B$.

We observe that for $t \in \{0, 1\}$ our required expression $tx + (1-t)y$ is equal to either $y$ or $x$, which are both in $B$, by assumption. Now fix $t \in (0, 1)$. We then note: $$f(tx + (1-t)y) \le f(tx) + f((1-t)y) = t f(x) + (1-t) f(y) \le t + (1-t) = 1,$$

which implies the convexity of $B$, as required.

Proof - interesting direction

Assume $B$ is a convex set. We need to show that this implies that $f(u + v) \le f(u) + f(v)$, for each $u, v \in V$.

Proof

Proof. ($\Leftarrow$) We proceed directly.

Assume $B$ is a convex set. We need to show that this implies that $f(u + v) \le f(u) + f(v)$, for each $u, v \in V$.

Let $u, v \in V$. We will consider four cases.

Case 1: Let $u = v = 0_V$. Then $f(u + v) = f(0_V) = 0$, by absolute homogeneity of $f$ (taking $\alpha = 0$). Indeed we then have that $f(u + v) = 0 \le f(u) + f(v)$, as required.

Case 2: Let $u = 0_V, v \ne 0_V$. Then $u + v = v$, and it follows that $f(u + v) = f(v) \le f(u) + f(v)$, as required, since $f(u) \ge 0$.

Case 3: Let $u \ne 0_V, v = 0_V$. Same as Case 2, with the roles of $u, v$ reversed.

Case 4: Let $u \ne 0_V, v \ne 0_V$. It then follows that $f(u), f(v) \ne 0$, since $f(w) = 0$ implies $w = 0_V$, for each $w \in V$, and $f \ge 0$. Moreover we then have that $f(u) > 0$ and $f(v) > 0$. So we can safely divide by these quantities. Let us then define $\hat{u} := u / f(u)$ and $\hat{v} := v / f(v)$. We then have by absolute homogeneity of $f$ that $f(\hat{u}) = f(u)/f(u) = 1$. Similarly $f(\hat{v}) = 1$, so $\hat{u}, \hat{v} \in B$. Let us denote $t := \frac{f(u)}{f(u) + f(v)}$, and $1 - t = \frac{f(v)}{f(u) + f(v)}$. We then have: $$t\hat{u} + (1-t)\hat{v} = \frac{u + v}{f(u) + f(v)} \in B,$$

by the assumed convexity of $B$. We then have that $f\left(\frac{u + v}{f(u) + f(v)}\right) \le 1$. We then observe: $$f(u + v) = \left(f(u) + f(v)\right) f\left(\frac{u + v}{f(u) + f(v)}\right) \le f(u) + f(v),$$

As required.
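The normalization trick in Case 4 can also be checked concretely. The Python sketch below (my own illustration, with the Euclidean norm standing in for $f$) verifies that $t\hat{u} + (1-t)\hat{v} = (u+v)/(f(u)+f(v))$:

```python
def f(x):
    # the Euclidean norm, a concrete f satisfying the two hypotheses
    return sum(xi * xi for xi in x) ** 0.5

u, v = [3.0, 0.0], [0.0, 4.0]
fu, fv = f(u), f(v)            # both nonzero, so dividing is safe
u_hat = [ui / fu for ui in u]  # f(u_hat) == 1, so u_hat is in B
v_hat = [vi / fv for vi in v]  # f(v_hat) == 1, so v_hat is in B
t = fu / (fu + fv)
w = [t * a + (1 - t) * b for a, b in zip(u_hat, v_hat)]
expected = [(a + b) / (fu + fv) for a, b in zip(u, v)]
assert all(abs(wi - ei) < 1e-12 for wi, ei in zip(w, expected))
# membership of w in B then gives f(u + v) <= f(u) + f(v)
assert f([a + b for a, b in zip(u, v)]) <= fu + fv
```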

Recap

In this article we learned the following about Theorem 1:

It gives an alternative way to characterize the triangle inequality for norm-like functions.

Using this characterization we can prove the triangle inequality for such norm-like functions using the convexity of the unit ball $B$ induced by such functions.

This is usually easier since we have lots of tools from convex analysis to help us prove the convexity of $B$.

We saw this in action since Theorem 1 enabled us to write very short proofs of Minkowski’s inequality in both the $\mathbb{R}^n$ and $L^p$ settings.

In summary, if you have a norm-like function for which you are trying to establish the triangle inequality, try out Theorem 1 💯!

Robinson, James C. 2020. An Introduction to Functional Analysis. Cambridge University Press.

Footnotes

Note: The presentation in this post is intentionally verbose. The goal is to give lots of intuition of the key result and its usefulness, and ensure that the proofs are rigorous. It is written with an empathetic mindset to newcomers, and to myself for future reference.↩︎

We refer to such an $f$ satisfying these properties as a norm-like function.↩︎

Shamindra’s January 2020 Roundup (Shamindra Shrotriya)
https://www.shamindras.com/posts/2020-01-27-shrotriya2020january20roundup/index.html

Introduction

Welcome to the January 2020 roundup! Similar to last time I’m going to experiment with documenting anything interesting I come across (articles, lectures, books, papers etc.) and any activities I get up to. This is more for my personal benefit but may also help others.

This is the latest post from the ACLU Tech & Analytics blog. The post explains the key focus of the ACLU analytics team on having clean data pipelines, and on using testing and assertions to facilitate this process.

Using functions like assertthat::noNA(df$source) will return FALSE if there are in fact NA values in the df$source column. This seems like a very useful function to use in %>% operations in my pipelines!

Used in combination with assertthat::noNA(df$source), one can also return the actual observations that have NA values, which is super useful!

These operations are %>% friendly and can be used to verify join operations perform as expected, for example:

Does the join have the same number of rows as the original left-hand table or did the data structure of the right-hand table create new rows?

How much of the right-hand table of the join falls away in the left join?
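To make the join checks above concrete, here is a small hypothetical sketch (my own toy data, not code from the ACLU post) using dplyr and assertthat:

```r
library(dplyr)
library(assertthat)

# Hypothetical toy tables, purely to illustrate the join checks above
members <- tibble(id = c(1, 2, 3), name = c("ann", "bo", "cy"))
sources <- tibble(id = c(1, 2), source = c("web", "mail"))

joined <- members %>% left_join(sources, by = "id")

# Did the structure of the right-hand table create new rows?
assert_that(nrow(joined) == nrow(members))

# How much of the right-hand table falls away? Here id = 3 has no match,
# so joined$source contains an NA and noNA() returns FALSE
noNA(joined$source)
```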

These checks are notably useful for the ACLU to also check for missingness in their data pipelines. They note:

A helpful check to assess whether missingness grossly misrepresents our results is to quantify the severity of the problem. What level of missingness are we willing to live with?

This definitely seems useful to me, as I use ad-hoc approaches to these same issues e.g. na.omit without doing thorough assertions. Perhaps using this with tidylog will be useful in doing EDA. Let’s try and revisit this.

In the event of well publicized (read: mass hysteria) virus spreads, such as the recent Coronavirus outbreak, it is important to listen to level-headed healthcare professionals. In this case it is Prof. Elie Murray, an epidemiologist from Boston University.

Since I’m definitely a newbie to understanding this epidemic, I found this to be a very pragmatic guide to help prepare for and prevent any further spread of such diseases.

In terms of general healthy practices in the outbreak the key takeaways are to:

Wash your hands regularly

Focus on improving the immune system

Try to not catch other infections, and ensure you recover well from any existing infections

Don’t panic

In terms of good practices to do in the event that you are sick, the key takeaways are to:

Stay home and recover

Cover your mouth e.g. sneeze into the inner elbow

Call a medical professional if you or a relative was in Wuhan recently

Seek urgent medical care if you feel really sick

Please read the article for more detail on each of the above points, and also for considerations for high risk individuals, a sick family member/friend, and if you are a healthcare responder.

Overall, great practical advice! It is great to see statisticians such as Prof. Murray take the lead and address the community at large with their knowledge and expertise, when we live in an era of misinformation

I enjoyed the emphasis on solving this challenge (like many others) as a community.

Teaching

I’m a TA for STAT 36-350, the undergraduate statistical computing course at CMU. This is a welcome change of pace from my previous TA assignment, a course taught by Prof. Freeman, with whom I had the pleasure of teaching in Spring 2019.

This time we have around 150 students and, as Head-TA, I have the pleasure of managing a motivated team of 9 graduate and undergraduate TAs. Here’s to a wonderful teaching and research semester 💯.

Shamindra’s September 2019 Roundup (Shamindra Shrotriya)
https://www.shamindras.com/posts/2019-09-19-shrotriya2019september19roundup/index.html

Introduction

Welcome to the September 2019 roundup! Similar to last time, I’m going to experiment with documenting anything interesting I come across (articles, lectures, books, papers etc.) and any activities I get up to. This is more for my personal benefit but may also help others.

Article notes that chess grandmasters can burn as much as 6000 calories individually during an intense day of sedentary chess playing!

Initial focus is on Fabiano Caruana, an American grandmaster in chess, and current world No. 2. Caruana has to maintain a very strict diet and exercise routine, particularly during tournaments

Primary reasons for the calorie loss are the heavy mental stress of the tournament and constantly thinking about chess, leaving limited time to think about and consume food.

Interesting quote:

…India’s first grandmaster, Viswanathan Anand, does two hours of cardio each night to tire himself out so he doesn’t dream about chess

Magnus Carlsen, reigning No. 1, for example consulted a professional nutritionist who recommended that he stop drinking orange juice (to avoid sugar spikes) and replace it with a less sugary regular/chocolate milk blend

Carlsen has also optimized sitting. This is quite amazing and something to think about personally as someone who spends many hours daily in front of a screen

Carlsen has also undertaken load management (minimizing competitions participated) to increase the amount of recuperation time between tournaments

In short, there are a lot of parallels to the research life which I undertake, and a lot of useful tips to optimize energy and time spent doing what I enjoy for longer

This is a really insightful interview with Hadley Wickham, a recent COPSS award winner, on the future of the R programming language:

Key Takeaways

Wickham notes that R vs Python language wars are not constructive in moving data science and other fields forward.

I agree wholeheartedly on this and firmly believe in using the best tools for the statistical job at hand. What should matter are more critical aspects like code readability, usability, and reproducibility in light of the given task

Interestingly Wickham notes:

A pattern that I see is that the data science team in a company uses R and the data engineering team uses Python

Wickham also has focused on bridging divides within the R community itself, namely in developing the dtplyr package to convert dplyr code to the alternative data.table package syntax. This is a promising direction ahead where tidyverse and data.table users can collaborate much more easily

There is also a focus on encouraging diversity in R usage and actively developing communities. He asks:

Can we take the R-Ladies model and help other groups that are currently underserved?

Overall it is good that Wickham was recognized recently with the famous COPSS medal in statistics and that the community is embracing software development and design as a key aspect of our profession. It seems that the future is bright for statistics!

I enjoyed this thoughtful presentation on Design at Quora by Rebecca Cox (VP of design at Quora). I’ve summarized what I feel are the key points from this important presentation below.

Key Takeaways

Cox notes that it is a “great time to be a designer” because design has proven again and again to be a clear competitive advantage in tech

She notes her awareness of Quora’s apparent minimalist design interface i.e. dark, red, and text heavy

Cox asks - what is Design? Some say it is the visual style, for some the user interaction, and for others “it begins and ends with the logo”

For Cox, her definition is abstract, and summarized as:

The set of decisions about a product

Not just an interface, logo etc. Designing is about making product decisions

Benefits of this broad decision-driven definition for Quora are:

A clear relationship between the product and the interface i.e. why should a dropdown even exist?

Concentrates attention on where it matters most i.e. company goals

Enables a role within Quora that balances authority and responsibility i.e. Designers should do more than “apply a coat of paint to a feature at the end”

To Cox:

Great design is all the work you don’t ask people who use your products to do

There are a lot of deep, direct applications to my statistics research, i.e. ensuring all theoretical and empirical tools can be seamlessly conveyed to end users in science, industry, or academia.

I will be coming back to this periodically over time, to reflect on whether I have taken up this definition of design and applied it in my work and daily life.

Utilizing the tidyverse to sort through messy data linkage issues in a consistent framework is well thought out by the team, with useful packaged functions created for use by the wider ACLU team.

Here both statistics and law are used to tackle a major humanitarian issue i.e. child border separation. This is deeply inspiring and the kind of applied work that I would like to contribute to meaningfully in the future

I particularly appreciated the general data source skepticism showed by the ACLU team. As a statistician it is important to not only explore data but be very skeptical of the source quality i.e. competing legal bodies may not provide the ACLU unbiased data!

Teaching

I’m a TA for the amazing graduate statistical computing course at CMU. This is an intensive (but very rewarding) programming course designed by CMU statistics professors Alex Reinhart and Chris Genovese. I highly recommend checking out this course website as a general programming reference in daily work/research. I know that I certainly will be 😄.

Shamindra’s August 2019 Roundup (Shamindra Shrotriya)

Welcome to the August 2019 roundup! Similar to last time, I’m going to experiment with documenting anything interesting I come across (articles, lectures, books, papers etc.) and any activities I get up to. This is more for my personal benefit but may also help others.

The document is written by Oren Patashnik, the co-creator of BibTeX (😮) and co-author of the great book (Graham, Knuth, and Patashnik 1994)

Interesting Books

Fiction

Started reading this fantastic book (Granville, Granville, and Lewis 2019) called Prime Suspects, by Andrew Granville, Jennifer Granville, and Robert J. Lewis (illustrator). Here is a YouTube trailer for the book to get you excited!

Key Takeaways

This is essentially an introduction to analytic number theory disguised as a fast-moving graphic novel murder mystery

For any mathematics fans (who isn’t one, though?) there are lots of funny easter eggs to be found in the frame backgrounds

This is a very unique exposition on number theory, a subject in which I have negligible knowledge (like most subjects)

The pedagogy is gentle yet exciting, emphasizing not just the mathematics but the importance of communicating mathematical ideas to the wider public, i.e. a meta novel if you like

I hope to see more books in this mathematical graphic novel genre. The last one I read (and know of) is Logicomix (Doxiadis et al. 2009) which I highly recommend as well for anyone wishing to venture into the mysteries of infinity!

Interesting Interviews

Really enjoyed this interview with Prof. Noga Alon, a leading mathematician specializing in combinatorics, graph theory etc. Fantastic insights into the life of a (leading) theoretical researcher, and certainly someone for me to look up to and learn from. I enjoyed the fact that it was so brief but dense, as I normally don’t have much time for podcasts. I may check out a few more science based interviews from this Career Yoga podcast going forward.

Key Takeaways

This 20min interview is really part of a podcast on careers, with the focus here being the lessons/insights learned from Prof. Alon’s very successful career in theoretical mathematics research

Prof. Alon appears to be extremely organized in all aspects of life, including packing for trips. He was asked specifically not to prepare for this interview 😄

I’ve paraphrased my summaries below; any transcription errors are mine. Please listen to the original interview.

Question: What does it mean to be a mathematician?

To mostly think about mathematical problems, there are many mathematical problems that have rich history, many are interesting in their own sense. It means to ask the right questions, think about interesting questions and tell the difference between what is beautiful and what is not beautiful

Question: What does your day look like?

Many procedural things - teaching, meeting graduate students, reading mathematical papers.

Part of the time I’m just thinking! Sometimes with other people over chalkboard/whiteboard, looking at a piece of paper on the table i.e. trying to do some computations that are relevant, thinking of relationships to the problem. Finding problems that are similar enough.

Question: It is so difficult to grasp the idea of just thinking! Most careers are based on the idea of responding to things. Often when visiting you in Princeton, it seems you are in a room staring into the air, and in another room another mathematician is staring into the air. It seems like you are not doing anything!

Right, and indeed much of the time you are not doing anything. Most of the time you are failing, and you need to get used to it. Part of the satisfaction in this process is to think about something for a very long time without having an idea. Sometimes you solve something related.

Question: Do you sometimes try to initiate situations that will inspire you to solve problems?

You go to conferences, talk to people, read papers etc. Many times you just need to be in a different state of mind, e.g. if you forget someone’s name, just thinking about it does not always make you remember; you just need to try something else at times. This may explain why you see people staring into the air! Sometimes you can go and take walks.

Question: When did you know you wanted to be a mathematician?

As a child, before I was 10 years old I knew I was interested in mathematical puzzles. I was good at it and interested in mathematics but didn’t really know what it involved. I always liked that it is objective. I was able to explain a solution to an adult at a party on the puzzle of the Eurovision song contest. The power of convincing someone is really powerful.

Question: Was the long list of awards something you aimed for?

No, it’s nice to get such prizes, but I never did this with that intent. In every field it is important, but the glory is very limited. It is nice, but you don’t do things with this aim in mind.

Question: In one of your discoveries, did you ever feel “this was something I was dreaming of, and I accomplished that”?

I had a few things where I was very happy, because I had thought about them for a long time and found something new. I don’t think I had a specific time where I sat back and said this is the best discovery of my life. One should always think that the best discoveries are ahead of them.

Question: What do mathematicians do after leaving research?

Some scientists go into scientific management. Some go into industry, but this is rare. As long as what I do is what I find interesting and challenging, even if it is not as good as my best results, I would still keep doing this.

Question: Does being a grandfather and becoming older change your priorities and motivation?

Yes, in general you realize you want to spend time with family, children, and grandchildren. I don’t think it comes instead of science. I hope to keep doing good work and spend time with family.

Personal Blogging

Besides this post 😄 the main things I got up to on the personal blogging front were:

Wrote another fun blogpost on using the tidyverse to reproduce a plot on the survivorship of the Titanic. Always so cool to be able to reproduce such famous plots using modern tools.

Concluding Thoughts

Overall August 2019 was the end of summer and the start of a new year of graduate school - yay!

Please feel free to leave a comment if you found any useful articles, lectures, books, papers etc which I may find interesting.

Graham, Ronald L., Donald E. Knuth, and Oren Patashnik. 1994. Concrete Mathematics: A Foundation for Computer Science. 2nd ed. Addison-Wesley Publishing Company.

Granville, Andrew, Jennifer Granville, and Robert J. Lewis. 2019. Prime Suspects: The Anatomy of Integers and Permutations. Princeton University Press.

Tidyverse Fun - Part 2 (Shamindra Shrotriya)
https://www.shamindras.com/posts/2019-08-21-shrotriya2019tidyfunpt2/index.html

Task: Generating LaTeX newcommand macros

The central problem

In a custom macro file I needed to generate several sequential newcommand entries of the form:
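For the letter “a”, for example, such a pair of entries looks like the following (reconstructed here, following the $\bfa$/$\bfA$ naming used in this post):

```latex
\newcommand{\bfa}{\mathbf{a}}
\newcommand{\bfA}{\mathbf{A}}
```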

Using $\bfa$ then produces $\mathbf{a}$, and using $\bfA$ produces $\mathbf{A}$, i.e. the lowercase/uppercase mathbf commands respectively.

Specifically I needed to construct 52 such combined sequential entries, for both the lowercase and uppercase letter versions of these newcommand macros. Rather than do this manually, I realized that this would be another fun scripting exercise using the tidyverse packages glue, purrr, and stringr, similar to this previous post here.

Goal: Create all 52 such lowercase/uppercase newcommand entries and print them to the console, to directly copy-paste into my macros file.

The tidy approach

The first step is to write a function that takes the following inputs:

a single letter (case-sensitive) e.g. "a"

the macro shortcut command prefix you prefer, e.g. "bf" (for bold font, in case you were wondering!)

the specific command that we are creating a macro shortcut for i.e. "mathbf" in this case

The function then outputs a single newcommand entry for that letter, i.e. \newcommand{\bfa}{\mathbf{a}} in this case. Let’s do it!
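Here is a sketch of such a function using glue (the original code block was not preserved in this extract, so the argument names here are my own):

```r
library(glue)

# A sketch of such a helper; uses custom delimiters so that the literal
# LaTeX braces do not clash with glue's default {} interpolation syntax
make_newcommand <- function(letter, prefix = "bf", command = "mathbf") {
  glue::glue("\\newcommand{\\<<prefix>><<letter>>}{\\<<command>>{<<letter>>}}",
             .open = "<<", .close = ">>")
}

make_newcommand(letter = "a")
#> \newcommand{\bfa}{\mathbf{a}}
```

Passing a different prefix/command pair, e.g. prefix = "mc" and command = "mathcal", would generate the analogous mathcal macros.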

This generates the corresponding mathcal macros for the lowercase and uppercase letters, respectively.

So finally, we can generate all 52 letter macros at once by simply replacing c("a", "A") with c(letters, LETTERS), which uses the built-in lowercase/uppercase letters/LETTERS vectors in base R:
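Putting it together, a self-contained sketch (again my own reconstruction of the stripped snippet) that prints all 52 entries:

```r
library(glue)
library(purrr)

macro_line <- function(letter) {
  glue::glue("\\newcommand{\\bf<<letter>>}{\\mathbf{<<letter>>}}",
             .open = "<<", .close = ">>")
}

# letters / LETTERS are base R's built-in lowercase / uppercase vectors
all_macros <- purrr::map_chr(c(letters, LETTERS), macro_line)
cat(all_macros, sep = "\n")
```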

Upgrading Distill Blog Settings (Shamindra Shrotriya)
https://www.shamindras.com/posts/2019-07-31-shrotriya2019distillpt2/index.html

Step 0: Introduction

This is a meta blogpost, and the second part in a series describing how I set up this personal academic blog using the amazing distill package by the RStudio team.

The first part of this meta blogpost series can be found here, where I detailed the steps to set up this blog using Netlify and Google Domains. If you haven’t set up a distill themed blog, then you are encouraged to check it out before reading this post.

Fortunately distill comes with easy-to-configure settings, as we’ll see below. I’ve only implemented some of the options available. I should note that the RStudio distill team has already created an excellent distill blog creation tutorial, which I thoroughly used and highly recommend new users check out.

With that said, here are some key upgrades I made to this blog.

Step 1: Setup Disqus comments

I really wanted to set up a comments system for each blogpost. This way I can learn new tips from readers and find out how to improve posts going forward. I went with the recommended Disqus comments option from the distill blog. I simply created a Disqus account and selected Get Started. I then clicked the following button to Install Disqus on my site.

I was then presented with the following Disqus site configuration menu. I entered https://www.shamindras.com/ for my Website Name and manually set my Disqus shortname to be shamindras-distill, to be easier to remember and specific to this site, in case I make more websites later on. This Disqus shortname is important to note down (🖊) as we’ll see shortly.

After clicking Create Site in the previous menu I proceeded to select the free plan option by subscribing to the Basic, Free, Ads Supported comments option as seen below:

In terms of implementing Disqus on my site, I clicked on the following button to install Disqus on my site manually:

Before finishing the manual installation of Disqus, I ensured that I set the following configuration options. I particularly like setting an opinionated comments policy, and selected the Grist Comment Policy:

Finally, to ensure that the implementation was complete, I added the Disqus shortname set earlier, i.e. shamindras-distill, to the _site.yml file, ensuring hidden: true so that the comments are not expanded by default:
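For reference, the relevant _site.yml fragment looks roughly like the following (reconstructed from the description above; the exact nesting under the posts collection follows my reading of the distill docs, so treat this as a sketch):

```yaml
collections:
  posts:
    disqus:
      shortname: shamindras-distill
      hidden: true
```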

We now see the following comments option at the bottom of every post:

You can read more about setting up comments from the official distill blog here

Step 2: Setup Google Analytics tracking

I also wanted to set up basic visitor tracking for my site. Fortunately distill can be easily configured to work with Google Analytics. In order to set this up, I simply created an account for Google Analytics (using my personal gmail account). I then logged in and selected the option to track my website as follows:

Note that I specified the Website Name field to be shamindras-distill. This is indeed the same as the Disqus shortname from earlier but did not have to be. I just did it for consistency and easy reference. I was then given a Google Analytics token and concluded this setup by adding the token to the _site.yml file as follows:

google_analytics: "UA-145015693-1"

You can read more about setting up Google Analytics from the official distill blog here

Step 3: Add Netlify Status Badge

Since Netlify is the web hosting platform for my site (see setup details here), I just logged into my Netlify account, went to my Site Details, and obtained the following code from the Status Badges option.

I copy-pasted the above code at the top of my site’s README.md file. This lets me quickly know whether my website is up and running as expected, by simply checking out my github page.

Step 4: Add blog post sharing options

It is easy to configure distill to allow for easy sharing of posts via a variety of social media platforms. I allow for twitter, linkedin, pinterest, and facebook. I did this by simply adding the following line in the _site.yml file:
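The line I have in mind looks roughly like this (a sketch based on the distill sharing options named above):

```yaml
collections:
  posts:
    share: [twitter, linkedin, pinterest, facebook]
```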

Now the following sharing options appear at the bottom of every post:

I also added in the following lines to _site.yml to ensure that twitter cards are correctly generated when posts are shared on twitter:

```yaml
twitter:
  site: "@shamindraas"
  creator: "@shamindraas"
```

Step 5: Add Corrections/Change Tracking and RSS feed

I frequently make edits to blogposts and intend to keep doing so going forward. Fortunately distill makes it easy to track changes/corrections made to blogposts. I did this by simply adding the site repo url to _site.yml as follows:
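As a sketch, the addition is along these lines; the repository_url key name and its nesting under the posts collection are my assumption based on distill's conventions, and the username in the url is a placeholder:

```yaml
# Point distill at the site's GitHub repo so posts can link to their
# source/history for corrections
collections:
  posts:
    repository_url: https://github.com/<username>/ss_personal_distill_blog
```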

Now the following appears at the bottom of all blogposts:

So users can easily track changes or file any concerns as issues, though hopefully the Disqus comment feature makes this easier for everyone.

Finally it is easy to add an RSS feed for the blog by simply adding the following to _site.yml:

```yaml
base_url: https://www.shamindras.com/
navbar:
  left:
    - icon: fa fa-rss
      href: index.xml
```

The critical elements are the base_url field and the fa fa-rss icon, which links to the index.xml feed file. The index.xml file is automatically generated from index.Rmd when you render the distill blog using the usual command:

rmarkdown::render_site(here::here())

Next Steps

In terms of core distill blog settings, these are the main options that I'm happy with for now. For me the next steps are more about customizing my own blog workflow. This will involve setting up utilities to automatically:

Wrap Rmd files to 80 characters for consistency

Quickly delete unused files, e.g. .DS_Store files on mac

Clear knitr cache for all posts and thoroughly re-render the site

I expect to do this using a combination of R functions/Makefile workflow, but do stay tuned!
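For instance, the .DS_Store cleanup could be as simple as a one-liner like the following (a sketch of what such a utility might wrap, not the final workflow):

```shell
# Recursively delete macOS .DS_Store files from the blog repo
find . -name '.DS_Store' -type f -delete
```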

Concluding Thoughts

As can be seen, it is quite easy to customize distill for commonly required features. Really great work by the RStudio team in making such customizations so user-friendly 👍.

This is a new feature I’m going to experiment with, namely documenting anything interesting I come across (articles, lectures, books, papers etc.) and any activities I get up to. This is more for my personal benefit but may also help others. Let’s see how this experiment goes!

One nice feature I noticed was using HTML to center an image.
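As a sketch, centering an image inside an Rmd post can be done with plain HTML along these lines (the file path is a placeholder):

```html
<!-- Hypothetical example: center an image within a rendered post -->
<center>
  <img src="path/to/image.png" alt="example image" width="400">
</center>
```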

Interesting Books

Non-fiction

Continued reading this fantastic book (Steele 2004) on the art of mathematical inequalities by Prof. Michael Steele. This book is simply amazing in that Prof. Steele walks through deriving mathematical inequalities as if you were having a casual chat together at a whiteboard. I've been working through the problems in detail and they are a great challenge! I'll blog more about this when I finish the book in the August 2019 roundup.

Fiction

Read Celeste Ng's Everything I Never Told You. Very nicely written debut novel by Ng, though quite dark and brooding in tone. Would not recommend it as a pick-me-up, but certainly would for a nice Asian-American character study set in the 1970s.

Interesting Papers

Particularly enjoyed reading this paper on the mathematical description of the carbon cycle (Rothman 2014). Really useful for getting a good idea of how simple but powerful stochastic differential equations are used to model climate change processes at various global scales.

Personal Blogging

Besides this post 😄 the main things I got up to on the personal blogging front were:

Overall July 2019 was a somewhat productive summer month. Things are never quite as productive as I'd like them to be in graduate school, but you've always got to keep trying.

I liked documenting this personal July 2019 roundup. Normally I’m not the type to share much about my personal life, but I thought documenting what I get up to gives me more stake in the respective activities and forces me to commit to them enthusiastically. I also like sharing cool stuff that I come across with friends, so dual benefit.

I think I’ll keep these monthly roundups going for a while. Let’s see how this experiment goes when semester starts 😄.

Please post in the comments any questions/feedback you may have, or any interesting resources you came across in July 2019 👍.

The February 2019 issue of Significance Magazine featured a story on the Titanic disaster (Friendly, Symanzik, and Onder 2019) and a visualization of key survival statistics. As a fan of R and data visualization I enjoyed this article and would recommend it to anyone with similar interests. Although the subject is rather tragic, reading the article gave me a better appreciation of how information about the crash survivorship was conveyed to the general public through data visualization.

Reproducibility Challenge

Of particular note in the article was the following data visualization poster printed shortly after the tragedy:

I found this to be a very cool data visualization of the survivorship by class, gender, and adulthood. As a statistics graduate student, I care a lot about reproducibility of results not only as a basic check, but to really appreciate the results and more importantly any implicit assumptions behind the results. So this led to the following goal and effectively this blogpost:

Goal: Given the same Titanic survivors data could we recreate a similar looking chart using R and specifically the tidyverse set of tools?

Collecting and cleaning the data

Let's begin by loading the data cleaning and plotting packages required for the analysis.

```
The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout
```

In the article the authors cite several resources for collecting the data for this task. Per the article we note that the data is already pre-baked into R and located in datasets::Titanic when R loads, which is convenient 😎.

We can source the data and start cleaning it for our exploration, using the handy clean_names function for column name cleaning and converting various categorical variables (age, sex, survivorship, and passenger class) to factors for easy plotting later.

Looks nice. As you can see, the data cleaning was done in stages, building up three datasets t1, t2, t3. Staring at the original plot, it is clear that the bars are split by passenger class (1st Class, 2nd Class, etc.); this is the cleaned t1 data frame. However, there are also aggregate versions of these classes at the combined Passenger level and the Passenger and Crew level, which are the t2 and t3 tibbles respectively. Finally we concatenate them together into ttnc_cln and ensure our categorical variables are cast as factors.
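As a rough sketch of the first stage (t1), the cleaning looks something like the following; this is a reconstruction using current dplyr idioms, not the exact staged code:

```r
library(dplyr)
library(janitor)

# Tidy the built-in 4-way Titanic contingency table and cast the
# categorical variables (class, sex, age, survived) to factors
t1 <- datasets::Titanic %>%
  as.data.frame() %>%            # columns: Class, Sex, Age, Survived, Freq
  janitor::clean_names() %>%     # lower-case the column names
  dplyr::mutate(dplyr::across(c(class, sex, age, survived), as.factor))
```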

Next step - plotting!

Plotting the Data

The main chart object is a barplot by sex and adult status and faceted by passenger class i.e. first class, second class etc. Great, let’s do it!

```r
out_plot <- ttnc_cln %>%
  ggplot(data = .,
         aes(x = new_sex, y = n_sgnd, fill = survived)) +
  geom_bar(stat = "identity") +
  facet_wrap(~class, ncol = 1,
             strip.position = "right",
             scales = "free_y") +
  coord_flip() +
  scale_fill_manual(values = c("#3C4144", "#D2D3D1")) +
  theme_bw() +
  theme(panel.background = element_rect(fill = "#969898"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        strip.text.y = element_text(angle = 360),
        legend.position = "none") +
  scale_y_continuous(breaks = seq(-1500, 600, 150)) +
  labs(title = 'The LOSS of the "TITANIC"',
       subtitle = glue::glue("The Results Analyzed and Shown",
                             'in a special "Sphere" Diagram',
                             .sep = " "),
       caption = glue::glue("Note: The Black color indicates",
                            "Passengers and Crew NOT SAVED.",
                            "The White color indicates SAVED.",
                            .sep = " "))
out_plot
```

Conclusion

Overall, it looks like the plot could be reproduced to a decent level of accuracy.

To get the colors close to the original plot, I simply opened the article online and used the ColorZilla extension for Chrome to pick the colors manually. This is a really nice tool for reproducing colors viewed in a browser.

I don't quite like that the non-survivors are shown on a negative scale, but this was the quickest hack I could perform to get the bars flipped for non-survivors vs. survivors.

Summary: Overall this was a really fun challenge and I learned a lot about reproducing old-school data visualization using the glorious modern tidyverse ecosystem we have at our fingertips. Will do a similar reproducibility challenge again for sure ✌️. Have fun playing around with the above and please post in the comments any questions/feedback you may have 👍.

Acknowledgments

I’d like to thank Salil Shrotriya for creating the preview image for this post. The hex sticker png files were sourced from here.

]]>tidyverserstatsreproducibilityhttps://www.shamindras.com/posts/2019-07-21-shrotriya2019reprtitanic/index.htmlSun, 21 Jul 2019 04:00:00 GMTTidyverse Fun - Part 1Shamindra Shrotriya
https://www.shamindras.com/posts/2019-07-15-shrotriya2019tidyfunpt1/index.html

Task 1: Generating Oxford Comma Triples

The central problem

Based on a fun conversation over dinner with my statistics cohort, we got to discussing the famous Oxford Comma (or Serial Comma, depending on your persuasion). I've never really adopted its use, but my friends made a compelling argument about its general lack of ambiguity when applied appropriately.

We will use the Oxford comma on the famously ambiguous phrase (here used without the Oxford Comma before leaves):

Eats, shoots and leaves

After adding in the Oxford Comma this would become:

Eats, shoots, and leaves

Goal: A fun experiment would be to generate all permutations of this phrase with and without the Oxford Comma using R and specifically the tidyverse packages.

Generating all word-triple permutations the tidy way

Let’s also define our unique global word values used to construct the required phrases:

```r
WORD_VALS <- c("eats", "shoots", "leaves")
```

Generate all unique 3-word permutations without replacement from the three unique words. We’ll create a helper function to check that a vector of words is unique.

We can now simply generate every possible triple with replacement using the tidyr::crossing function. We then filter these down to the unique permutations using our is_unq_perm helper, applied row-by-row via purrr::pmap_lgl. The _lgl suffix indicates that the applied function returns a TRUE/FALSE logical value, as intended here.
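A sketch of this step is below; the helper and variable names (is_unq_perm, all_perms) follow the ones mentioned in the text, but the exact code is a reconstruction:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Helper: TRUE if all three words in a candidate triple are distinct
is_unq_perm <- function(word1, word2, word3) {
  words <- c(word1, word2, word3)
  length(unique(words)) == length(words)
}

# All 27 triples with replacement, filtered to the 6 true permutations
all_perms <- tidyr::crossing(word1 = WORD_VALS,
                             word2 = WORD_VALS,
                             word3 = WORD_VALS) %>%
  dplyr::filter(purrr::pmap_lgl(.l = list(word1, word2, word3),
                                .f = is_unq_perm))
```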

Great - that part is done! Now we just need to generate for each triple of words an oxford comma and non-oxford comma version. This is done easily using the amazing glue package as seen below:

```r
exprs <- all_perms %>%
  mutate(non_oxford_comma = glue_data(.x = .,
                                      "{word1}, {word2} and {word3}"),
         oxford_comma = glue_data(.x = .,
                                  "{word1}, {word2}, and {word3}")) %>%
  select(non_oxford_comma, oxford_comma)
```

We can display the side-by-side output of the Non-Oxford Comma vs. Oxford comma for the generated triples as follows:

```r
# Display output in a nice centered table
exprs %>%
  kable(x = .,
        align = 'c',
        col.names = c("Non-Oxford Comma", "Oxford Comma"))
```

| Non-Oxford Comma | Oxford Comma |
|:---:|:---:|
| eats, leaves and shoots | eats, leaves, and shoots |
| eats, shoots and leaves | eats, shoots, and leaves |
| leaves, eats and shoots | leaves, eats, and shoots |
| leaves, shoots and eats | leaves, shoots, and eats |
| shoots, eats and leaves | shoots, eats, and leaves |
| shoots, leaves and eats | shoots, leaves, and eats |

So there you have it. Have fun generating your own version of Oxford Comma triples to engage in civil discussions with your grammar-focused friends 😄.

As can be seen, the lectures are numbered sequentially, and the entries change only in the main BibTeX id, the title, and the url field.

Specifically, I needed to construct 30 such sequential entries for lectures 1-30. Rather than do this manually, I realized that this would be a fun scripting exercise using the tidyverse packages glue, purrr, and stringr.

Goal: Create 30 such BibTeX entries and print to the console to directly-copy paste to my BibTeX file.

The tidy approach

The first step is to write a function that takes a lecture number (integer) as input and outputs a single BibTeX entry for that lecture.

```r
# Generate BibTeX entry for a single lecture number
get_lec_bibtex <- function(lec_num) {
  # Get the 2 character padded lecture number i.e. 1 -> "01"
  lec_num_pad <- str_pad(string = lec_num, width = 2,
                         side = "left", pad = "0")
  # Construct the BibTeX entry
  out_bbtex_str <- glue(
    "@misc{doe2019_lec<lec_num>,
       author = {Doe, John},
       title = {Lecture Note <lec_num> - STAT10A},
       month = {March},
       year = {2018},
       url = {https://www.hpg/~doe/st10A/lecs/lec<lec_num_pad>.pdf}}",
    .open = "<", .close = ">")
  return(out_bbtex_str)
}
```

Note that by default glue substitutes input text between { and } markers. However, BibTeX entries already contain literal { } braces that we need to include in our function output. Rather than escaping them, the glue package conveniently allows us to change the default opening and closing markers 💯! We simply set these to angle brackets < > using the .open and .close options above.
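A sketch of the driver that prints the entries to the console (a reconstruction, assuming the tidyverse pipe is available):

```r
library(purrr)

# Generate and print the entries for the first and last lectures
c(1, 30) %>%
  purrr::map_chr(get_lec_bibtex) %>%
  purrr::walk(~ cat(.x, "\n\n"))
```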

Yay - this works as expected! We can now paste into BibTeX as required.

Note that we only created it for lectures 1 and 30 for easy scrolling. But for all lectures we can just replace c(1, 30) with 1:30 in the above code.

Conclusion

This post was for me to document and serve as a guide to automating a couple of fun text-based tasks that I came across in my work (and social life!). Using the tidy framework can be a fun way to solve these tasks (but certainly not the only way in R). Have fun playing around with the above and please post in the comments any questions/feedback you may have 👍.

Stay tuned for more blogposts solving more such tasks.

Acknowledgments

I’d like to thank Salil Shrotriya for creating the preview image for this post. The hex sticker png files were sourced from here.

]]>tidyverserstatshttps://www.shamindras.com/posts/2019-07-15-shrotriya2019tidyfunpt1/index.htmlMon, 15 Jul 2019 04:00:00 GMTSetting up a Distill Blog with NetlifyShamindra Shrotriya
https://www.shamindras.com/posts/2019-07-11-shrotriya2019distillpt1/index.html

Step 0: Introduction

This is a meta blogpost to describe how I set up this personal academic blog. It is based on the relatively new distill package by the RStudio team. The main tools I used to create this blog are:

The details of how I used these tools are all noted below in a step-by-step manner.

Importantly, I should note that the RStudio distill team has already created an excellent distill blog creation tutorial which I thoroughly used and highly recommend to new users. I wrote this meta blogpost in my own words so that I can personally remember the details going forward. I also added more details on deployment with Google Domains and Netlify that would hopefully be useful to new R users waiting to deploy a similar distill blog.

Step 1: Create new distill blog repo

I opted to manage my blog versioning using Git/Github. I started by going to my personal github account and creating a new repository. I called mine ss_personal_distill_blog, initialized it with a README.md, and included a .gitignore for R, since that will be the blogging language of choice here 😄. This is shown in the screenshot below.

Once created the repo will appear in github as seen in the following screenshot

Step 2: Clone the repo locally

With the github repo created, I switched to the iTerm2 terminal on my mac and cloned the repo locally using the following command:

Now I simply changed into the newly cloned blog directory by running the following terminal command:

cd ss_personal_distill_blog

I then ran the following terminal command:

tree

This resulted in the following directory structure so far:

.
├── .git
├── .gitignore
└── README.md

Great, now the repo was setup locally. At this stage there is just a simple README.md which will get added to a bit later, but the main focus is to start creating the distill blog within this directory locally.

Step 3: Create the distill blog files

In order to start creating the blog contents I opened up an instance of RStudio from within my new directory on my macbook via the following terminal command:

```shell
open -a /Applications/RStudio.app .
```

With RStudio opened, we can now run the following R commands in the console to install the packages required for the distill blog setup:

Now we can create our blog using the following 2 commands from the freshly installed distill package using the following command run in the console

```r
distill::create_blog(dir = here::here(),
                     title = "Shamindra's Shrotriya's blog",
                     gh_pages = TRUE)
```

Note that we set gh_pages = TRUE to ensure that we can host this blog on github pages down the line if needed. You can omit this if you don’t want the option to have github pages as your host in the future. I will be using Netlify to host my blog (see below), but it is good to have an additional host option in the future.

My local distill blog directory now looked like this (again after running the tree command in the terminal):

Pretty cool! Note that there is a newly created _posts directory for future blogposts. And there is a directory called docs to store all our processed blogposts later. If we had set gh_pages = FALSE the docs directory would be automatically replaced by a _site directory. More on this point later.

Step 4: Customize the welcome blogpost

So distill already had us underway with a default welcome blogpost contained in the welcome.Rmd file for us. There are a bunch of javascript related files automatically generated in the _posts/welcome/welcome_files directory but these don’t need to be altered by the user. I just needed to modify welcome.Rmd contents per my preference as with any regular Rmd file and click Knit in RStudio to refresh it. We can see this in the _posts directory:

One thing that slightly bothered me is that the default welcome blogpost has no date prefix in the directory. This would be nice to have in order to sort all future blogposts chronologically. I could’ve modified this default welcome blogpost Rmd and directory to include the date prefix manually. For simplicity I opted to delete the default welcome directory altogether and recreated it with the date prefix as I prefer as detailed below.

To delete the default welcome directory, I just ran the following code at my terminal:

```shell
rm -rf _posts/welcome
```

With the default welcome blogpost deleted, I created my own custom welcome blogpost as follows:

```r
distill::create_post(title = "Welcome to Shamindra’s Blog",
                     author = "Shamindra Shrotriya",
                     date_prefix = TRUE)
```

Now the welcome blogpost has this nice date prefix structure since we passed this option as TRUE. Let’s see what the _posts directory looks like now

I modified the contents and then knitted the Rmd file once done to refresh and save the contents.

Now I had locally created my first personalized content, a simple welcome post 😎.

Step 5: Customize your blog layout

Now I needed to customize the blog header banner and setup links and update contents as required.

We will start with the _site.yml contents which controls the page layout. I modified the _site.yml file which contains default metadata settings for the blog to have the following contents:

```yaml
name: "test_distill_blog"
title: "Shamindra's Blog"
description: |
  Shamindra Shrotriya's personal blog/ site. Some fun posts
  on math, statistics and the PhD student life.
output_dir: "_site"
navbar:
  right:
    - text: "Home"
      href: index.html
    - text: "About"
      href: about.html
    - icon: fa fa-rss
      href: index.xml
collections:
  posts:
    share: [twitter, linkedin]
base_url: https://www.shamindras.com/
output: distill::distill_article
```

I updated the About.Rmd file as required and knit it. This is a default Rmd that distill conveniently creates to give readers some background on the site's purpose and, of course, about the author.

No need to update the default Index.Rmd file that distill automatically creates. I simply opened it and knit it manually in RStudio to update the site contents.

I also updated the README.md to add some useful information (for any users who stumble onto the github page) and saved it. No need to knit anything here as it is a simple markdown file.

Now in RStudio I just knit the welcome.Rmd post and also ran the following command in the console:
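The command was presumably the usual full-site render used elsewhere on this blog:

```r
# Render the full distill site locally
rmarkdown::render_site(here::here())
```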

The locally created distill blog was now created and rendered and looked like this in the RStudio viewer pane:

Pretty cool - I now had a working local version of our blog in RStudio.

Step 6: Commit and push changes to github

Note that all our changes so far are in our local git repo. We need to get this blog online! A first step is to commit and push them to our github repo. I did this from my local directory at the terminal as follows:

```shell
git add -A   # Add all new changes
git commit -m "ENH: Created welcome post with date prefix, deleted default post"
git push origin master
```

And the changes are now reflected in the github master branch!

Step 7: Buy a Domain name (optional)

Although the blog contents are now in a public online place, i.e. github, I still needed to link them to a service that deploys websites from github. But first I needed to buy a domain name for my blog. I went to Google Domains and bought www.shamindras.com for about $15/yr.

There are free alternatives e.g. Github Pages, but I wanted to have ownership on my page and found the annual fee to be reasonable with Google Domains.

Step 8: Deploy your website with Netlify

Now that the domain name was bought, I just needed to deploy the newly created blog contents on the registered domain name. Enter Netlify! This is a free (and awesome) deployment service. I created a personal account following the instructions on the Netlify website.

I then logged in to Netlify and clicked on the green New site from Git button to get started. In the following menu I clicked the Github Continuous Deployment icon:

I then manually searched for my blog repo i.e. ss_personal_distill_blog. Initially this did not appear, so I clicked the green Configure the Netlify app on GitHub link at the bottom and gave Netlify permissions to access this site. This is so Netlify can automatically sync with the github repo and deploy changes going forward as I make them directly to my github blog repo.

I clicked on my site when it appeared, and then ensured that the Branch to Deploy option was set to master.

I then clicked Deploy Site and then saw the following deployment settings:

I clicked the Site Settings button. It looks like my site name on Netlify is goofy-babbage-7f05c8. Cute, though I'll personalize it by clicking the Change Site Name button. I changed it to ss-personal-distill-blog for easy reference.

I clicked the Build and Deploy button next and after clicking the Edit Settings button modified the Publish directory to be _site as shown below:

This is where all the blogposts in our github repo will be rendered to html by distill once we knit them. Netlify just picks them up from here every time they are refreshed and deploys the website accordingly.

Next I needed to manage the domain, i.e. tell Netlify to deploy my site on the custom domain I had just purchased from Google Domains.

After clicking verify we have the following domains now set, with www.shamindras.com being the primary domain.

Netlify also tells me that the truncated url shamindras.com will also get routed to the blog. So I don't even need to write the www. going forward. Thanks Netlify 🙇♀.

Step 9: Patiently wait for deployment

With everything set up and configured on github/Netlify, deployment should be near instantaneous. In my case it took about 20 minutes for my blog to appear at www.shamindras.com. Effectively Netlify and github were now talking to each other and the site was set up!

Step 10: Future additions and extras

Now that the blog/site is created there are a number of features I'd like to add. The most important is more blogposts and personal content. However it would also be nice to have the following features:

Documenting a general distill blogging workflow

Setting up Disqus to enable user comments on blogposts

How to setup Blog Gallery for featured posts

How to setup an email subscription service for this blog

How to setup Google Analytics service for basic user activity tracking

I will make sure to document the setup process as part of a series of future blogposts.

Concluding Thoughts

If you managed to read this far, then I sincerely thank you. I hope to make even better technical and personal blogposts going forward. Please feel free to leave a friendly comment below for any questions you may have or any feedback for future blogposts.

@online{shrotriya2019,
author = {Shamindra Shrotriya},
title = {Setting up a {Distill} {Blog} with {Netlify}},
date = {2019-07-11},
url = {https://www.shamindras.com/posts/2019-07-11-shrotriya2019distillpt1},
langid = {en}
}

I’d like it to be a fun place to document interesting things I like to read about in the statistics and machine learning space (statistical theory/methodology, research, rstats, python …) as well as anything else I am generally into e.g. books, sports etc.

Feel free to pull up a chair, leave a comment, and join me so that we can explore together.

My parents, for encouraging me to communicate my passion for statistics. I secretly think that this is their way of minimizing my passionate rants about the bootstrap in our regular Skype chats (the rants will still continue though…).

Acknowledgments

I’d like to thank Salil Shrotriya for taking my profile pic which is the preview image for this post.