## Statistics You Need to Understand

*September 20, 2013 at 1:26 pm*

Whenever I mention statistics, I generally see one of two reactions: eyes either glaze over or register panic. It’s even worse if I use the word *probability*. (So I won’t.) People have an irrational response to that most rational of decision-making tools. While statistics can get very complicated, most people can get by with just a few key statistics.

**What’s your best guess?**

I am not a gambler in my personal life, but as a researcher and consultant, I make predictions all the time based on the numbers. More specifically, I use the information I have to give a best guess. I look at how long it has taken to do a particular type of work and base my proposals on the *average* of past projects of a similar nature. The *average* of a set of numbers is the best guess as to what you’ll get, whether it is ratings or other measurements.

The *mean* and the *median* are the most common measures of the average. If you have a lot of numbers to average (i.e., 30 or more), they will tend to form a bell curve around the middle. In that case, the *mean* is your best guess: just add up the numbers and divide the sum by how many numbers you have. If you have fewer numbers, or they are skewed by a few outliers, the *median* is a better bet. To get the median, order all your numbers from highest to lowest (or vice versa) and pick the one in the middle (or split the difference between the two middle numbers if you have an even count). If your head is spinning thinking about the math, just put the numbers into a spreadsheet and let Excel do it for you.
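For readers who like to see the arithmetic, here is a minimal Python sketch of both averages. The project durations are invented for illustration, and `median_by_hand` is a hypothetical helper that follows the "sort, then pick the middle" recipe; Python’s built-in `statistics.median` (or Excel’s `MEDIAN` and `AVERAGE` functions) gives the same answers.

```python
from statistics import mean, median


def median_by_hand(values):
    """Median as described above: sort the numbers, then take the middle one
    (or the average of the two middle ones when the count is even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical project durations in weeks; the 40 is one runaway project.
durations = [3, 4, 4, 5, 5, 6, 40]

print(f"mean:   {mean(durations):.1f}")        # pulled upward by the outlier
print(f"median: {median_by_hand(durations)}")  # stays with the typical project
assert median_by_hand(durations) == median(durations)
```

With these made-up numbers the mean comes out near 9.6 weeks while the median is 5, which is why a single outlier makes the median the safer best guess.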

An even simpler version of best guess applies when your measurement sorts people into categories; then we use *counts* or *percentages*. We can count the number of people whose earnings fall below the poverty line and compare that count to other income groups. With huge numbers or large populations, the raw counts are hard to picture, so we use percentages to make them easier to understand. If one category holds more than 50% of the cases, then that category is the best guess, all other things being equal.
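A short Python sketch of counts and percentages, using invented survey answers (the group labels are hypothetical). `collections.Counter` does the counting, and the percentage is simply each count divided by the total.

```python
from collections import Counter

# Hypothetical survey: each respondent's income group (invented data).
responses = ["below", "middle", "middle", "high", "middle",
             "below", "middle", "middle", "middle", "high"]

counts = Counter(responses)
total = sum(counts.values())
percentages = {group: 100 * n / total for group, n in counts.items()}

# List groups from most to least common, with count and percentage.
for group, n in counts.most_common():
    print(f"{group:<7} {n:>3}  {percentages[group]:5.1f}%")

# The most common category is the best guess; over 50% makes it a majority.
best_guess, _ = counts.most_common(1)[0]
if percentages[best_guess] > 50:
    print(f"Best guess: {best_guess} (a clear majority)")
```

Here "middle" accounts for 6 of the 10 invented responses (60%), so it passes the more-than-half test.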

**What’s the spread?**

If you were betting on a particular result, you would want to know how consistent the numbers are. Are they tightly clustered around the mean, or are they spread out? The smaller the spread, the more accurately you can predict the result and the more confidence you can have in your prediction. If you are a candidate and your pollster claims to be accurate “plus or minus 3%,” you will be happier than with “plus or minus 8%,” especially if your best-guess support is 55%. At plus or minus 3%, your support lies somewhere between 52% and 58%, comfortably above half; at plus or minus 8%, it could be as low as 47%.

There are some complicated ways to measure spread, but unless you are doing technical work (in which case numbers don’t panic you and this article is not for you), the one to understand is the *range*: the difference between the largest and smallest scores. The smaller the range, the more likely it is that any future measurement will be close to your best guess. Comparing how far the largest and smallest scores sit from the median can also tell you whether your scores are skewed in one direction.
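A minimal Python sketch of the range, plus a rough skew check that compares how far the largest and smallest scores sit from the median. Both data sets are invented, and `describe_spread` is a hypothetical helper; treat the skew comparison as a rule of thumb rather than a formal test. In Excel, `MAX` minus `MIN` gives the same range.

```python
from statistics import median


def describe_spread(values):
    """Return the range, plus the distance from the median up to the
    largest score and down to the smallest score. A much longer stretch
    on one side hints that the scores are skewed in that direction."""
    lo, hi = min(values), max(values)
    mid = median(values)
    return {
        "range": hi - lo,
        "above_median": hi - mid,
        "below_median": mid - lo,
    }


tight = [48, 50, 51, 52, 54]         # invented: clustered, symmetric scores
skewed = [3, 4, 4, 5, 5, 6, 40]      # invented: one score far above the rest
for name, data in (("tight", tight), ("skewed", skewed)):
    print(name, describe_spread(data))
```

The "tight" set has a range of 6 with 3 points on each side of the median; the "skewed" set has a range of 37, almost all of it above the median.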

**Given what I know, can I predict…?**

When we gather information using numbers (e.g., rating scales), chances are that we want to describe, predict or explain something. The best guess and spread measures describe the results. These can certainly help us predict to some extent, but there are statistics that are designed to measure the degree to which two measures are related (e.g., education and income). In general, the higher the education, the higher the income. The relationship between education and income is positive, but not perfect. There are people who are not well educated, but have used their talents and ambition to become wealthy. And some well-educated people have modest incomes.

Relationships can also be negative, and they are just as useful for prediction. For instance, take IQ and the need for social service support. In general, the higher the IQ, the fewer social supports needed. The relationship is not perfect, because factors other than intelligence also contribute to service need.

The stronger the relationship between two measures, the more easily you can use one to predict the other. The statistic most often used is the *correlation* (usually labeled *r*). Again, Excel will do the calculations for you. A perfectly predictable positive relationship has an *r* of +1.00, and a perfectly predictable negative relationship has an *r* of -1.00. The closer *r* is to 0.00, the less useful one measure is for predicting the other.
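If you are curious what the spreadsheet is doing behind the scenes, here is a minimal Python sketch of the standard (Pearson) correlation. The education and income figures are invented for illustration only. Excel’s `CORREL` function, or `statistics.correlation` in Python 3.10 and later, computes the same statistic.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]          # deviations from each mean
    dy = [y - my for y in ys]
    together = sum(a * b for a, b in zip(dx, dy))  # do they move together?
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one measure never varies")
    return together / sqrt(ss_x * ss_y)


# Invented numbers: years of schooling and income in $1,000s.
education = [10, 12, 12, 14, 16, 16, 18, 20]
income = [30, 35, 42, 40, 55, 48, 60, 58]
print(f"r = {pearson_r(education, income):.2f}")

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # perfect positive
print(pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))   # perfect negative
```

The invented education and income figures rise together without lining up perfectly, so *r* lands between 0 and +1.00, which mirrors the "positive, but not perfect" relationship described above.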

My favourite statistic for prediction is *r*² because of how well it relates to percentages. Multiply *r* by itself and the result tells you the proportion (or percentage) of variability in one measure that is accounted for by variation in the other factor. Suppose the relationship between education and income had an *r* of .7. (It doesn’t, but just suppose.) Then *r*² would be .7 × .7, or .49. This would mean that nearly half the variability in incomes is accounted for by level of education. The larger the *r*², the fewer other factors there are that can throw off your ability to predict accurately.
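A tiny Python sketch that squares a few sample *r* values to show how quickly the percentage of explained variability shrinks; the function name `variance_explained` is just an illustrative choice. Excel’s `RSQ` function computes *r*² directly from two columns of data.

```python
def variance_explained(r):
    """Share of variability in one measure accounted for by the other (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return r * r


for r in (0.3, 0.5, 0.7, -0.7, 0.9):
    share = variance_explained(r)
    print(f"r = {r:+.1f}  ->  r squared = {share:.2f}  ({100 * share:.0f}% of the variability)")
```

Squaring is sobering: an *r* of .5 sounds respectable but accounts for only 25% of the variability, and a negative *r* of -.7 explains exactly as much (49%) as a positive .7.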

**Use common sense**

Statistics are tools. Some people have a big toolbox full of expensive and complicated tools; most people manage with a few simple ones. Whichever group you fall into, you still need common sense to use the tools you have to best advantage. In short, think about what’s going on that produced the results.

Next time we will look at how you can combine information in numbers and their graphic representation to spot misinterpretations and attempts to mislead you.

Entry filed under: Management, Research. Tags: measurement, Research, Statistics, survey.
