This post is co-written by Julian.
At Buffer, we’re trying hard to be mindful of diversity within our team and to do as much as we can to improve the diversity of people in our industry. To understand the diversity of both our team and the people who apply to work at Buffer, we started collecting diversity data through an anonymous survey.
Once data started trickling in, we began graphing some of it and eventually decided to build a dashboard to share it with ourselves and the world. The result is our very own Diversity Dashboard.
Our content crafter Courtney has written a great post on the story behind the dashboard. In this post, we’d like to talk about the technical side: how it was built, which technologies we used and some of the things we learned along the way.
Getting the data
All the data was collected using a Wufoo form that was emailed to applicants after they sent in an application. We made sure that the survey was completely anonymous and in no way connected to the job application.
We use Helpscout to manage our hiring inbox, so it was easy to get the survey to applicants with a link in an auto-reply. All the Wufoo results are collected into a Google spreadsheet via Zapier. From there, we use the spreadsheet’s ‘Publish to the web’ feature to export the data to a web page.
Reading the data from the published page into R took some code, but after a bit of Googling and the help of a pretty cool blog post on the subject, we got it working.
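The R code for this step isn’t shown here, but the idea is easy to sketch. The Python example below pulls the rows out of an HTML table using only the standard library’s `html.parser`. It assumes the first table row holds the column names and every other row is one survey response; a real published Google Sheet may wrap its data in extra markup (such as row-number cells), so the parsing would need adjusting. `TableParser`, `parse_responses` and the sample columns are hypothetical.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows with no cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_responses(html):
    """Turn an HTML table into a list of dicts keyed by the first row's cells."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample page, not real survey data.
sample = """
<table>
  <tr><th>Area</th><th>Gender</th></tr>
  <tr><td>Development</td><td>Man</td></tr>
  <tr><td>Content</td><td>Woman</td></tr>
</table>
"""
print(parse_responses(sample))
```

Keeping the parsed rows as dicts keyed by column name means later steps can refer to fields like `Area` or `Gender` by name rather than by position.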
Our data and growth teams use R a lot for statistical analysis, data exploration and visualization, so we were excited to try the Shiny framework to expose the diversity data dynamically on the web. Shiny is a web framework for R that lets you easily embed graphs and data analysis in a web application. It also has built-in interactivity, which makes data exploration fun and easy.
Shiny provides a few abstractions on top of HTML and CSS that make common UI layouts and responsive websites easier to build. We used the fluidPage and sidebarLayout functions to quickly create a good-looking, responsive web page.
Shiny makes it easy to use a variety of themes to customize the look and feel of your apps. This is perfect for building dashboards.
This was the first time any of us had used Shiny, but we were able to combine our existing R graphing, analysis and web development skills to create a decent looking dashboard pretty rapidly.
Shiny has some peculiarities to get used to, like a custom DSL for generating HTML, but once we got the hang of it, it worked really well for creating good-looking charts and putting them on the web.
After the dashboard started taking shape, we looked at setting up a production server to run the Shiny web server. Our engineering team has been using Docker internally for a while, mostly to containerize our development environments.
We were excited to try it out on a production server. One of the cool things about Docker is that you can often find good preconfigured Dockerfiles to use as a base for your own containers, and we found a great open source Dockerfile to start from.
We use Amazon’s Elastic Beanstalk to host the Buffer web application, and since it supports running Docker containers, it felt like a natural place to host this app too. Deploying the Dockerized app turned out to be really easy: we just added a Dockerrun.aws.json file and uploaded the content to Elastic Beanstalk.
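For reference, a minimal single-container Dockerrun.aws.json in the version 1 format names the image to run and the port the container listens on. The Python sketch below builds one; the image name is a placeholder, and port 3838 is an assumption based on Shiny Server’s default port, not a detail of the dashboard’s actual setup.

```python
import json


def dockerrun_v1(image, container_port=3838):
    """Build a minimal single-container Dockerrun.aws.json (version 1) document.

    `image` is a placeholder Docker image name; 3838 is Shiny Server's default
    port and is assumed here rather than taken from the real deployment.
    """
    return {
        "AWSEBDockerrunVersion": "1",
        "Image": {"Name": image, "Update": "true"},
        "Ports": [{"ContainerPort": str(container_port)}],
    }


# Print the file contents; redirect the output to Dockerrun.aws.json before deploying.
print(json.dumps(dockerrun_v1("example/diversity-dashboard:latest"), indent=2))
```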
We also use Fig to run the Dockerized app on our development machines and to share code between the container and the host.
Visualizing the data
The first few graphs we made were pretty straightforward: bar graphs of the number of men and women, as well as a breakdown of applicants’ and teammates’ ethnicity. We used the ggplot2 plotting system for R to generate all of the graphs in the first version of the dashboard.
We realised it might be useful to show these as pie charts as well, to help visualize proportions. Here we could make use of Shiny’s interactivity by providing radio buttons to switch between graph types.
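The numbers behind both views are simple: a bar chart plots the count per category, and a pie chart plots each category’s share of the total. Here is a small Python sketch of that switch, with a plain `chart_type` string standing in for the Shiny radio button; the function name, the `Gender` field and the sample rows are made up for illustration.

```python
from collections import Counter


def chart_data(responses, field, chart_type):
    """Values to plot for one field: raw counts for a bar chart, shares for a pie chart."""
    counts = Counter(r[field] for r in responses)
    if chart_type == "bar":
        return dict(counts)
    if chart_type == "pie":
        total = sum(counts.values())
        return {category: n / total for category, n in counts.items()}
    raise ValueError(f"unknown chart type: {chart_type!r}")


# Made-up sample rows, not real survey data.
responses = [{"Gender": "Woman"}, {"Gender": "Man"}, {"Gender": "Man"}, {"Gender": "Self-described"}]
print(chart_data(responses, "Gender", "bar"))
print(chart_data(responses, "Gender", "pie"))
```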
Since we have a steady stream of data coming in from applicants, we thought it would be great to chart this data over time, which led to stacked area graphs showing the gender, ethnicity and age ranges of applicants over time.
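A stacked area graph needs a count for every category in every time period, including zeros, so that each band stays continuous. One way to prepare that table is sketched below in Python. It assumes each response carries a hypothetical `submitted` date and buckets responses by calendar month. The running totals in `stacked_layers` are what a stacked chart draws; in ggplot2, `geom_area` stacks its layers by default, so that step happens for you.

```python
from collections import Counter, defaultdict
from datetime import date


def counts_by_month(responses, field):
    """{month: {category: count}}, with every category present (as 0 if absent) in every month."""
    by_month = defaultdict(Counter)
    categories = set()
    for r in responses:
        month = r["submitted"].strftime("%Y-%m")  # bucket by calendar month
        by_month[month][r[field]] += 1
        categories.add(r[field])
    return {
        month: {c: by_month[month][c] for c in sorted(categories)}
        for month in sorted(by_month)
    }


def stacked_layers(table):
    """Upper edge of each category's band: the running total within each month."""
    layers = {}
    for month, counts in table.items():
        running, edges = 0, {}
        for category, n in counts.items():
            running += n
            edges[category] = running
        layers[month] = edges
    return layers


# Made-up sample rows; "submitted" is a hypothetical field name.
responses = [
    {"submitted": date(2015, 1, 5), "Gender": "Woman"},
    {"submitted": date(2015, 1, 20), "Gender": "Man"},
    {"submitted": date(2015, 2, 3), "Gender": "Man"},
]
table = counts_by_month(responses, "Gender")
print(table)
print(stacked_layers(table))
```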
After playing around with the graphs a bit, it felt a little overwhelming to have to pick out the specific ethnicities you might be interested in, so we used Shiny’s interactivity again to add filters. These let you choose which gender, ethnicity, age and area categories are included in the graphs.
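Filtering like this boils down to keeping only the responses whose value for each filtered field is among the ticked options. A minimal Python sketch, where `selected` is a hypothetical stand-in for the dashboard’s filter inputs and any field not mentioned in it is left unfiltered:

```python
def apply_filters(responses, selected):
    """Keep responses whose value for every filtered field is one of the selected options.

    `selected` maps a field name to the set of ticked categories; fields that
    are not in `selected` are not filtered at all.
    """
    return [
        r for r in responses
        if all(r.get(field) in allowed for field, allowed in selected.items())
    ]


# Made-up sample rows and selections.
responses = [
    {"Area": "Development", "Gender": "Woman", "Age": "25-34"},
    {"Area": "Content", "Gender": "Man", "Age": "35-44"},
    {"Area": "Development", "Gender": "Man", "Age": "25-34"},
]
selected = {"Area": {"Development"}, "Age": {"25-34", "35-44"}}
print(apply_filters(responses, selected))
```

Unticking every option for a field leaves an empty set, which filters out all responses for that field.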
This only feels like the beginning, though; we are still thinking about cool ways to visualize data over time!
Although the graphs give a great overview of the general breakdown of diversity, we thought it might be great to explicitly highlight the most and least diverse areas in terms of gender, ethnicity and age. But how does one go about measuring diversity anyway? Although this felt like it could be quite subjective, we brainstormed a couple of ideas and came up with a simple algorithm that we felt comfortable with.
For each area, we count the number of applicants or team members who fall into each gender, ethnicity or age range. We then take the standard deviation of those counts and use it as a measure of dispersion for the area. A high standard deviation means the counts are bunched into a few categories, while a low one means they are spread more evenly (for a similar number of people), so the higher the dispersion, the less diverse we consider the area to be. That way, we consider the area at Buffer with the highest standard deviation to be the least diverse and the area with the lowest standard deviation to be the most diverse.
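The dispersion measure can be sketched in a few lines of Python. This sketch assumes that categories with no responses in an area still count as a total of zero (the description above doesn’t say either way) and uses the sample standard deviation, which is what R’s `sd()` computes; `statistics.stdev` needs at least two categories. The field names and sample rows are invented for the example.

```python
from collections import Counter
from statistics import stdev


def dispersion_by_area(responses, field, categories):
    """Sample standard deviation of the per-category counts within each area.

    Zero-count categories are included (an assumption); needs >= 2 categories.
    """
    result = {}
    for area in sorted({r["Area"] for r in responses}):
        counts = Counter(r[field] for r in responses if r["Area"] == area)
        result[area] = stdev([counts[c] for c in categories])
    return result


def least_and_most_diverse(dispersion):
    """Highest spread = least diverse, lowest spread = most diverse."""
    return max(dispersion, key=dispersion.get), min(dispersion, key=dispersion.get)


GENDERS = ["Man", "Woman", "Self-described"]  # hypothetical category list
responses = (
    [{"Area": "Development", "Gender": "Man"}] * 4
    + [{"Area": "Content", "Gender": "Man"}] * 2
    + [{"Area": "Content", "Gender": "Woman"}] * 2
)
spread = dispersion_by_area(responses, "Gender", GENDERS)
print(spread)
print(least_and_most_diverse(spread))  # ('Development', 'Content')
```

Because the spread is measured on raw counts, an area with more respondents can show a larger standard deviation than a smaller area with the same proportions; dividing each count by the area’s total before taking the standard deviation would remove that effect.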
Using this calculation, we see the least gender and age diversity among people applying to the Development area. For ethnicity, we see the least diversity in Content.
Right now, we’re still growing our sample size, and it’s entirely possible that these trends will change over time. We hope they do, and that we see more diversity in all areas!
To put everything in context and give a complete picture, we added the data in its raw form as well. The survey allows for describing your ethnicity and gender yourself, and we have categorised these as ‘Self-described’ on the graphs. By looking at the raw data we can drill into these as well.
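Mapping free-text answers onto the chart categories can be as simple as checking each answer against the predefined options and labelling everything else ‘Self-described’. In this Python sketch the option list is hypothetical (the survey’s actual options aren’t listed here), matching ignores case and surrounding whitespace, and the original answer is kept alongside the category so the raw data stays available.

```python
# Hypothetical answer options; the survey's real list isn't given here.
GENDER_OPTIONS = ["Woman", "Man"]


def categorise(answer, options):
    """Return the matching predefined option (ignoring case/whitespace) or 'Self-described'."""
    lookup = {option.lower(): option for option in options}
    return lookup.get(answer.strip().lower(), "Self-described")


def with_category(responses, field, options):
    """Copy each response, adding a '<field> category' key and keeping the raw answer as-is."""
    return [{**r, f"{field} category": categorise(r[field], options)} for r in responses]


# Made-up sample answers.
responses = [{"Gender": " woman "}, {"Gender": "Genderqueer"}]
print(with_category(responses, "Gender", GENDER_OPTIONS))
```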
Since we’re lucky to have a lot of applicants and we’re constantly sending out the survey, we expect the dataset to grow over time. The Buffer team is also growing at a steady pace, so we’ve decided to send out the survey to our teammates every quarter and keep things up to date.
We were interested in using interactive charts from the very beginning of the project, and explored a couple of different options before deciding to stick with ggplot2 for the first version of the dashboard.
We thought that it would be important to include interactive plots in the dashboard so that one could ‘drill in’ to see the actual numbers represented in the graphs. This seems particularly useful for the plots displaying data about ethnicity, where a great amount of information could be displayed in a single stacked bar chart.
Initially, we experimented with the Plotly graphing library because of its compatibility with both Shiny and ggplot2, not to mention the just stinkin’ beautiful visualizations!
We found that while the interactive Plotly graphs were beautiful and had some great interactive features, we ran into a couple of issues with creating interactive pie charts and with embedding the Plotly graphs in the Shiny app. We decided to take a step back and reflect on our options for generating interactive charts.
We really like the ability to dig into the data and visualize it in different ways, which could help make the charts a bit less overwhelming and a bit more approachable to some. We hope you’ll like it!
One of the harder things to do was to choose the right colors in our graphs. We tried to stay away from stereotypical colors for gender and ethnicity, but at the same time wanted to make the graphs look good and easy to understand.
After trying out a few different themes, we settled on a high-contrast, colorblind-friendly theme, which felt like a great fit with our focus on inclusivity and accessibility.
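The specific theme isn’t named above. As one example of a colorblind-friendly palette, here is the widely used Okabe-Ito palette wired into a small Python helper that gives each category a fixed color. Sorting the categories first means the same set of categories always gets the same colors across charts; with more categories than colors, `cycle` starts reusing them from the beginning.

```python
from itertools import cycle

# Okabe-Ito colorblind-safe palette; its black (#000000) is left out for axes and text.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7"]


def assign_colors(categories, palette=OKABE_ITO):
    """Map each category to a color; sorting keeps the mapping stable for the same categories."""
    return dict(zip(sorted(set(categories)), cycle(palette)))


print(assign_colors(["Woman", "Man", "Self-described"]))
```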
By creating the dashboard, we’ve learned and validated a couple of cool things, some expected and others surprising. We suspected that we might not see as many female applicants in the Development area, which is something we would really like to work on. I was quite surprised to see a lot of ethnic diversity in the data.
The main takeaway: although the Buffer team is still small, it does seem like we need to work on getting a more diverse group of people to know about us and feel interested in joining, to help us become more diverse as a company.
We’ve decided to open source the code and make it all available on GitHub. We hope it will be useful to other companies that might want to do the same thing. By open sourcing the code, we also hope you’ll share any ideas or improvements with us. If you make any great changes, please send us a pull request, and hopefully we can use them in our dashboard as well.
Since we are quite new to Shiny and building dashboards in R, any feedback on how we could improve will be much appreciated!