Monday, January 21, 2013

Why are gun rights proponents more politically active?


In January 2013, about a month after the horrific shootings of children in Newtown, Connecticut, the Pew Research Center released a survey of Americans' views on gun policy.  The survey first asked respondents whether they considered themselves gun rights proponents or gun control proponents.  It then asked about their political activity: Had they contributed money to an organization that took a position on gun policy?  Had they contacted a public official to express an opinion on gun policy?  Had they signed a petition on gun policy?  And so on.  Those who prioritized gun rights were 1.7 times more likely to have been politically active (i.e., to have participated in one or more of these activities) than those who prioritized gun control.  Why should gun rights advocates be almost twice as likely as gun control advocates to be politically active?

To understand this behavior, it is useful to consider how the human brain makes choices when faced with gains and losses. 

In 1990, Kahneman and colleagues performed an experiment in which they gave some participants a coffee mug as a gift.  They then asked these participants to name the minimum price at which they would be willing to sell the mug.  These participants asked for about $7.  The researchers then showed the same mug to another group of participants and asked how much they would be willing to pay to own it.  This group offered about $3.  Similarly, Knetsch (1989) found that people who are given a chocolate bar want $1.83 to sell it, but will pay only $0.90 to buy it.  The difference between the two prices is explained by loss aversion: sellers view giving up something that they already own as a psychological loss, and to compensate for that loss they ask for a lot of money.  Buyers, on the other hand, view acquiring the item as a psychological gain, and they are willing to pay much less for the pleasure that they expect from owning it.
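As a quick check on the "about twice" figure that the argument below relies on, the ratio of selling price to buying price can be computed directly from the numbers quoted above (the rounded $7 and $3 for the mugs, $1.83 and $0.90 for the chocolate bars). This is only arithmetic on those quoted figures; the helper name sell_buy_ratio is illustrative.

```python
# Minimum selling price vs. maximum buying price, as quoted in the text
# (the mug prices are the rounded figures of about $7 and about $3).
prices = {
    "coffee mug (Kahneman et al., 1990)": (7.00, 3.00),
    "chocolate bar (Knetsch, 1989)": (1.83, 0.90),
}


def sell_buy_ratio(sell: float, buy: float) -> float:
    """How many times more the owners ask than the non-owners offer."""
    return sell / buy


for item, (sell, buy) in prices.items():
    ratio = sell_buy_ratio(sell, buy)
    print(f"{item}: sellers ask {ratio:.2f} times what buyers offer")
```

Both ratios come out at roughly 2 (about 2.33 for the mug and 2.03 for the chocolate bar), which is the sense in which losses loom about twice as large as gains in these experiments.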

In general, the pleasure you would feel if someone gave you an item tends to be much smaller than the pain you would feel if you owned that item and then lost it.  This is called the endowment effect.

Carmon and Ariely (2000) explain this behavior by suggesting that when faced with the loss of something (e.g., when selling), people focus on their feelings about surrendering the item (and not on the money that they gain), whereas when faced with the gain of something (e.g., when buying), people focus on their feelings about what they forgo (typically money, and not the item that they gain).

Now let us return to the question of why gun rights proponents are more politically active than gun control proponents.  In the current political climate, the President and Congress are considering laws that would limit gun rights.  Gun rights proponents view these laws as a loss, whereas gun control advocates view the same laws as a gain.

Gun rights proponents (but not gun control proponents) are under the influence of the endowment effect, because enacting the proposed laws would take away something they already ‘own’.  For them, the proposed laws carry a negative psychological value.  If we generalize from the behavioral economics literature, we might speculate that this negative value is about twice as large as the positive psychological value that gun control proponents would gain.  This may be why gun rights proponents are almost twice as likely as gun control proponents to be politically active.

The deeper idea is that any change from the status quo will meet with much stronger resistance by those who view the change as a loss, as compared to the enthusiasm that it fosters in those who view the change as a gain.

References
Carmon, Z. and Ariely, D. (2000) Focusing on the forgone: How value can appear so different to buyers and sellers. Journal of Consumer Research 27:360-370.
Kahneman, D., Knetsch, J., and Thaler, R. (1990) Experimental tests of the endowment effect and the Coase theorem. Journal of Political Economy 98:1325-1348.
Knetsch, J. (1989) The endowment effect and evidence for nonreversible indifference curves. American Economic Review 79:1277-1284.

Thursday, January 10, 2013

How to find an outlier


How do we know when a data point is an outlier?  Take a look at the figure below.  It represents 15 data points that were gathered in some experiment.  Would you say that the left-most point is an outlier? 


Maybe the instrument that collected this data point malfunctioned, or maybe the subject who produced it did not follow the instructions.  If we have no information other than the data, how would we decide?

When we say a data point is an outlier, we are saying that it is unlikely to have been generated by the same process that generated the rest of our data.  For example, if we assume that our data were generated by a random process with a Gaussian distribution, then there is only about a 0.27% chance that a data point falls more than 3 standard deviations from the mean (about 0.13% in each tail).  So what we need is an estimate of the standard deviation of the underlying process that generated the data.  Here I will review two approaches, and then show how successful each is at labeling outliers.
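The Gaussian tail probability behind this reasoning can be checked with Python's standard library: for a standard normal variable Z, P(|Z| > k) = erfc(k/√2). The function name prob_beyond is just for illustration.

```python
import math


def prob_beyond(k: float) -> float:
    """P(|Z| > k) for a standard normal Z, counting both tails."""
    return math.erfc(k / math.sqrt(2))


two_tailed = prob_beyond(3.0)   # more than 3 SDs above or below the mean
one_tailed = two_tailed / 2     # by symmetry, half of that lies in each tail
print(f"P(|Z| > 3) = {two_tailed:.4%}")
print(f"P(Z > 3)   = {one_tailed:.4%}")
```

This gives about 0.27% for a deviation of more than 3 standard deviations in either direction, and about 0.13% for a single tail.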

Median Absolute Deviation (MAD)
Hampel (1974) suggested that we begin by finding the median of the data set.


Next, we make a new data set consisting of the absolute distance (a nonnegative number) between each data point and the median.  Finally, we find the median of this new data set and scale it by a constant b.  That is, we compute the following:

MAD = b median( abs(x - median(x)) )

If we set b=1.4826, then MAD is an estimate of the standard deviation of our data set, assuming that the true underlying data came from a Gaussian distribution.  For our data set above, here is the estimate of the standard deviation, centered on the median:



Based on the MAD estimate of the standard deviation, we would say that the left-most data point is indeed more than 3 estimated standard deviations (MADs) from our estimate of the mean (the median).
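Here is a minimal sketch of the MAD estimate in Python. It also shows where b = 1.4826 comes from: for Gaussian data the median of |x - mean| is about 0.6745 standard deviations (the 75th percentile of the standard normal), so b is its reciprocal. The function names and the five-point sample are hypothetical; the sample is not the data set from the figure.

```python
from statistics import NormalDist, median

# For Gaussian data, the median of |x - mu| is about 0.6745 sigma (the 75th
# percentile of the standard normal), so scaling by b = 1 / 0.6745 ~= 1.4826
# turns the raw median absolute deviation into an estimate of sigma.
B = 1 / NormalDist().inv_cdf(0.75)


def mad(xs, b=B):
    """b * median(|x - median(x)|): a robust estimate of the standard deviation."""
    m = median(xs)
    return b * median([abs(x - m) for x in xs])


def distance_in_mads(xs):
    """How many MADs each point lies from the median (fails if the MAD is 0)."""
    m, s = median(xs), mad(xs)
    return [abs(x - m) / s for x in xs]


sample = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data, not the post's data set
print(round(B, 4))  # 1.4826
print([round(d, 2) for d in distance_in_mads(sample)])
```

For this hypothetical sample, the value 100.0 lies about 65 MADs from the median while every other point lies within 1.4 MADs, so only it is more than 3 MADs out. Note that the MAD is 0, and the distances undefined, when more than half the points coincide with the median.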

So a typical approach is to label as an ‘outlier’ any data point that is farther than 3 MADs (estimated standard deviations) from the median of the data.  That is, compute the following for each data point:

abs(x - median(x)) / MAD

Label as ‘outlier’ the data points for which this measure is greater than 3.  But how good is this method?  To check it, I did the following experiment.  I generated data sets drawn from a normal distribution with a constant mean and standard deviation, and then computed the probability of a false positive: that is, how likely it was that a point would be labeled as an outlier by MAD when in fact it was less than 3 standard deviations from the mean.  Here is the resulting probability, plotted as a function of the data size:


The above plot shows that when the data set is small (say 10 data points), about 20% of the data points that the algorithm picks as outliers are in fact within 3 standard deviations of the mean.  As the data set grows larger, the probability of false positives declines and the algorithm does better.  But even for a data set of size 20, there is still more than a 15% chance that a data point flagged as an outlier is in fact not one.
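A minimal Monte Carlo sketch of this kind of experiment follows. It assumes data drawn from a standard normal distribution (mean 0, standard deviation 1) and reads "false positive" as a point lying within 3 true standard deviations of the mean that the MAD rule nevertheless flags. The post does not spell out its simulation details (number of trials, exactly how the probability was normalized), so the rates this prints are not guaranteed to match the plot. All function names and the trial count are hypothetical.

```python
import random
from statistics import NormalDist, median

# Gaussian consistency constant for MAD (~1.4826).
B = 1 / NormalDist().inv_cdf(0.75)


def mad(xs):
    """Scaled median absolute deviation: a robust estimate of the standard deviation."""
    m = median(xs)
    return B * median([abs(x - m) for x in xs])


def flag_outliers(xs, k=3.0):
    """Indices of points lying more than k MADs from the median."""
    m, s = median(xs), mad(xs)
    if s == 0:  # degenerate case: more than half the points equal the median
        return []
    return [i for i, x in enumerate(xs) if abs(x - m) / s > k]


def mad_false_positive_rate(n, trials=2000, k=3.0, seed=0):
    """Fraction of points truly within k standard deviations of the mean
    (data drawn from N(0, 1)) that the MAD rule nevertheless flags."""
    rng = random.Random(seed)
    within = wrongly_flagged = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        flagged = set(flag_outliers(xs, k))
        for i, x in enumerate(xs):
            if abs(x) <= k:  # the true mean is 0 and the true SD is 1
                within += 1
                if i in flagged:
                    wrongly_flagged += 1
    return wrongly_flagged / within


for n in (6, 10, 20):
    print(f"n = {n:2d}: false-positive rate {mad_false_positive_rate(n):.4f}")
```

Seeding the random generator makes each run reproducible, so changes to the flagging rule can be compared on identical simulated data.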


Median Deviation of the Medians (MDM)

Rousseeuw and Croux (1993) suggested a method that, as we will see, is better (they call the resulting estimate S_n).  For each data point x_i, we compute its distance to all other data points and take the median of these distances.  Doing this for every data point gives us n medians.  We then take the median of this new data set:

MDM = c median_i( median_j( abs(x_i - x_j) ) )

If we set c=1.1926, then MDM is a robust estimate of the standard deviation of the data set, assuming that the true underlying data came from a Gaussian distribution.  For our data set above, here is the estimate of the standard deviation:


To check how this method compares with MAD, I repeated the experiment: I generated data sets drawn from a normal distribution with a constant mean and standard deviation, and then computed the probability of a false positive, that is, how likely it was that a point would be labeled as an outlier by MDM when in fact it was less than 3 standard deviations from the mean.  Here is the resulting probability, plotted as a function of the data size:


The above plot shows that regardless of the size of the data set (here ranging from 6 to 20 data points), a data point that MDM labels as an outlier has about a 9% chance of being a false positive, i.e., not actually an outlier.  For small data sets, MDM's false-positive rate is two to three times lower than MAD's.
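A minimal sketch of the comparison, under the same assumptions as before: data drawn from N(0, 1), and a false positive read as a point within 3 true standard deviations of the mean that still gets flagged. The mdm function follows the description above (each point's distances to the other points, plain medians, c = 1.1926). Rousseeuw and Croux's finite-sample version of S_n uses low and high medians plus small-sample correction factors, which this sketch omits. Function names and trial counts are hypothetical, and the printed rates need not match the plots.

```python
import random
from statistics import NormalDist, median

B = 1 / NormalDist().inv_cdf(0.75)  # MAD constant b, about 1.4826
C = 1.1926                          # MDM constant c, as given in the text


def mad(xs):
    """b * median(|x - median(x)|)."""
    m = median(xs)
    return B * median([abs(x - m) for x in xs])


def mdm(xs):
    """c * median over i of (median over j != i of |x_i - x_j|).

    Plain medians, following the description in the text; the finite-sample
    S_n of Rousseeuw and Croux uses low/high medians and correction factors.
    Needs at least two points.
    """
    n = len(xs)
    inner = [median([abs(xs[i] - xs[j]) for j in range(n) if j != i]) for i in range(n)]
    return C * median(inner)


def false_positive_rate(scale, n, trials=2000, k=3.0, seed=0):
    """Fraction of N(0, 1) points within k true SDs of the mean that are
    still flagged as more than k estimated SDs (scale(xs)) from the median."""
    rng = random.Random(seed)
    within = wrongly_flagged = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m, s = median(xs), scale(xs)
        for x in xs:
            if abs(x) <= k:  # true mean 0, true SD 1
                within += 1
                if s > 0 and abs(x - m) / s > k:
                    wrongly_flagged += 1
    return wrongly_flagged / within


for n in (6, 10, 20):
    print(f"n = {n:2d}: MAD {false_positive_rate(mad, n):.4f}, "
          f"MDM {false_positive_rate(mdm, n):.4f}")
```

Passing the estimator in as the scale argument keeps the flagging rule identical for both methods, and the shared seed gives both the same simulated data, so any difference in the printed rates comes only from how the standard deviation is estimated.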

References
Hampel FR (1974) The influence curve and its role in robust estimation. Journal of the American Statistical Association 69:383-393.
Rousseeuw PJ, Croux C (1993) Alternatives to the median absolute deviation. Journal of the American Statistical Association 88:1273-1283.