February 25, 2015
Proceedings of the Natural Institute of Science | Volume 2 | HARD 5
How likely are you to encounter snark on Stack Overflow? Part 1: Comparison of statistical software
PNIS Editorial Staff1
1 - PNIS
Note: This conversation was edited for length. The entire exchange can be found here.
Stack Overflow (SO) is an online forum, launched in 2008, where over 3.6 million registered users pose and answer questions that are, for the most part, related to computer programming. As of this writing, over 8.9 million questions have been asked, and SO users are estimated to have answered 92% of them (Mamykina et al. 2011). Alexa currently ranks SO as the 58th most popular website in the world.
To some users, the conversation pictured above between the questioner “F” and the answerer “N” is a typical example of an SO entry. Typical not in that the question was answered and the answer accepted by the original poster, but in that the answerer used a level of snark bordering on insulting. Instances of rude or hostile answers, as well as accusations of overt censorship on SO, appeared quickly after the site went live (e.g., Bishop 2012). In fact, it took only 5 months before SO moderators were labeled as Nazis (FSK 2009). The atmosphere became so rancid that the site’s founders, Joel Spolsky and Jeff Atwood, proclaimed that the summer of 2012 would be the Summer of Love, with the goal of restoring SO as a “welcoming, friendly place” and hunting out the snark (Spolsky 2012).
The success of the Summer of Love could be called tepid at best. Not one month afterwards, bloggers continued to call out the hostility on SO (e.g., Hadaway 2012, Sergworks 2012). And judging by this steady stream of blog posts (Mendoweb 2013, Richter 2013, Schreiber 2013, Sammaye 2014) and Quora questions (one, two, three, four, five, six), the hostility seems to have remained.
So how bad is it really at SO? A mega survey was conducted by Spolsky and Atwood as a prelude to the Summer of Love (the Spring of Inquiry?). Seven thousand randomly selected comments were rated by 20 experienced raters. Overall, 75% of these comments were rated positively (Rogers 2012), with 8% judged as really friendly (that is, rated as “nice” by 75% or more of the raters) and 2.3% judged as really unfriendly (rated as “unfriendly” by 75% or more of the raters) (Spolsky 2012). So you have a 25% chance of getting a neutral or negative comment on your question, and you (or your mother) have a 2.3% chance of being called a ‘fucking lamer’.
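The 75%-of-raters rule described above can be sketched in a few lines of Python. This is only an illustration of the threshold logic, not the code used in the original survey: the function name, the "neutral" label, and the example ratings are all assumptions made for the sketch.

```python
from collections import Counter


def classify_comment(ratings, threshold=0.75):
    """Label one comment by rater consensus.

    ratings: one label per rater. "nice" and "unfriendly" come from the
    text; any other label (e.g. "neutral") is an assumption and simply
    counts toward neither extreme.
    """
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    n = len(ratings)
    if counts["nice"] / n >= threshold:
        return "really friendly"
    if counts["unfriendly"] / n >= threshold:
        return "really unfriendly"
    return "no consensus"


# Hypothetical ratings from 20 raters (made up for illustration).
example = ["unfriendly"] * 16 + ["neutral"] * 4
print(classify_comment(example))
```

Under this rule a comment only counts as "really unfriendly" when at least 15 of the 20 raters agree, which is one reason the 2.3% figure is a conservative estimate of rudeness.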
Our goal in this paper was similar to Spolsky and Atwood’s, except that we were interested in whether snark rates differ depending on which statistical software (e.g., Stata, SAS, R, SPSS) a question concerns. Do certain statistical programs breed a snarkier user?
We focused on 6 of the most popular statistical programs (Muenchen 2012) that employ a command-line interface: MATLAB, R, SAS, SPSS, Stata, and Wolfram/Mathematica. For each program, we searched for its tag on SO and randomly selected 100 questions. We then judged each question/answer set on two criteria: 1. Was there a snarky/insulting/rude comment or answer to the question? 2. Was an answer accepted by the original poster (indicated by a checkmark next to the answer, or by the questioner saying ‘thanks’ or something similar)?
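For readers who want to repeat the exercise, the sampling and tallying steps might look roughly like the Python sketch below. It assumes you already have a list of question IDs for a tag and that the two yes/no judgements are made by hand; the function names and the placeholder data are hypothetical, not part of our procedure.

```python
import random


def sample_questions(question_ids, k=100, seed=None):
    """Draw k distinct question IDs at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(question_ids), k)


def tally(judgements):
    """Summarise hand-made judgements.

    judgements: list of (has_snark, accepted) boolean pairs,
    one pair per sampled question.
    """
    n = len(judgements)
    if n == 0:
        raise ValueError("no judgements to tally")
    snark = sum(1 for has_snark, _ in judgements if has_snark)
    accepted = sum(1 for _, was_accepted in judgements if was_accepted)
    return {"n": n, "snark_rate": snark / n, "accepted_rate": accepted / n}


# Placeholder IDs standing in for every question under one tag.
all_ids = range(1, 5001)
sampled = sample_questions(all_ids, k=100, seed=1)

# Placeholder judgements; in practice these come from reading each thread.
judged = [(False, True)] * 3 + [(True, False)]
print(len(sampled), tally(judged))
```

Fixing the seed makes the draw repeatable, so a second rater can judge exactly the same 100 questions.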
Results and Discussion
We made a table of our results (Table 1), as well as a redundant figure (Figure 1), although we feel Table 1 is easier to read.
For starters, the SPSS folks are the nicest you will ever want to be around. They had the lowest snark rate at 3%, and if we had kept a score of “Most Positive Encouragements”, they would have somehow reached 111%. A typical entry was reminiscent of the Discover It Card commercials. The next Chicken Soup for the Soul should just be a random collection of Stack Overflow SPSS entries. What we’re trying to say is that they are nice people, probably Canadians.
On the Comcast end of the customer service scale, we have Stata, which ended up with a 14% snark rate, the highest of any of the 6 types of statistical software. Although we didn’t collect data on this, most of the snark seemed to originate from the same user, “N”, quoted at the beginning of this article. However, because N answered so many of the Stata questions, it was inadvisable to remove N’s answers from our analysis. We should also point out that N knows a lot about Stata and that N’s answers were often accepted by the original poster. We just wish N didn’t have to be so sarcastic.
There didn’t seem to be much of a relationship between rate of snark and rate of accepted answers. Although R users answered the greatest percentage of questions (79%), 10 of those came with snark. Thus, the winner in this analysis should be Wolfram/Mathematica, whose users answered 76/100 questions successfully, while resorting to snark in just 2 of those answers.
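With only 100 questions per program, the snark rates reported above carry a fair amount of uncertainty. We did not compute confidence intervals ourselves; the Python sketch below is one way a reader could do so, using the Wilson score interval (our choice for this illustration, with z = 1.96 for roughly 95% coverage) and only the counts stated in the text.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


# Snarky questions out of 100, as stated in the text for four programs.
snark_counts = {"SPSS": 3, "Stata": 14, "R": 10, "Wolfram/Mathematica": 2}

for program, count in snark_counts.items():
    low, high = wilson_interval(count, 100)
    print(f"{program}: {count}% snark, interval {low:.1%} to {high:.1%}")
```

Wide, overlapping intervals would suggest treating the ranking of programs as suggestive rather than settled.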
As stated above, Spolsky (2012) found a 2.3% “really unfriendly” rate in their massive data analysis, while our snark rates averaged about 7.8%. This difference could have arisen from any of several differences between the two studies (such as sample size, using all possible comments vs. using only comments pertaining to statistical software, ratings done by 20 different people vs. ratings done by us, and the definition of “unfriendly” vs. the definition of “snark”). But the larger point is that there is still snark on SO, and you are about twice as likely to run into snark on SO as you are to be a victim of a crime in the United States.
It may be that there will always be snark and hostility on SO, no matter how many calendar seasons are dedicated to its eradication. But SO will also likely remain an amazing place to get your questions successfully answered for free by professionals, sometimes in a matter of minutes. That might be worth the potential exposure to a little snark.