Monday, 7 October 2013

The Total Area Always Equals 100%

Normal distributions are a family of distributions that have the 

same general shape. They are symmetric with scores more concentrated in the middle than in 

the tails. Normal distributions are sometimes described as bell shaped. 

Here are some examples. Notice that they differ in how spread out they are. But even though 

the shapes are different, the area under each curve is the same. The total area accounted for 

under any curve is 100%. This never changes and is critical to our understanding of how to 

apply the Normal distribution. Because we know that the area under the curve is always 1 

or 100%, we can understand a lot about individual scores and groups of scores to which the 

Normal distribution is applied. We'll need a Normal curve table for this. More on that one, 

shortly.
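
To make the "total area is always 1" idea concrete, here is a minimal sketch that adds up the area under a few Normal curves numerically. The three spreads and the helper name `area_under_curve` are illustrative choices, not anything from the examples pictured above.

```python
from statistics import NormalDist

def area_under_curve(dist, width=10.0, steps=20000):
    """Approximate the total area under dist's curve with the trapezoid rule,
    integrating from `width` standard deviations below the mean to `width` above."""
    lo = dist.mean - width * dist.stdev
    hi = dist.mean + width * dist.stdev
    dx = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * dx)
    return total * dx

# Three hypothetical curves: same center, different spreads.
for sigma in (0.5, 1.0, 3.0):
    area = area_under_curve(NormalDist(mu=0.0, sigma=sigma))
    print(f"sigma = {sigma}: area = {area:.4f}")
```

Whether the curve is tall and narrow or short and wide, the printed area should come out at essentially 1.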

The Normal curve has other characteristics that are always true. Once again, the fact that we 

can always count on these characteristics provides a good model for understanding numeric 

trends in data. The following are other important characteristics of the Normal curve:

All Normal curves are symmetric around the mean of the distribution. In other words, the left 

half of the Normal curve is a mirror image of the right half.

All Normal curves are unimodal. Because Normal curves are symmetric, the most frequently 

observed score in a Normal distribution (the mode) is the same as the mean.

Since the Normal curves are unimodal and symmetric, the mean, median, and mode of all 

Normal distributions are equal.

All Normal curves are asymptotic to the horizontal axis of the distribution. The height of a 

Normal curve descends rapidly as one moves along the horizontal axis from the center 

of the distribution toward the extreme ends of the distribution, but the curve never actually 

touches the axis. This is because scores on the Normal curve are continuous and held to 

describe an infinity of observations.

All Normal curves have the same proportions of scores under the curve relative to particular 

locations on the horizontal axis when scores are expressed as areas, percentiles, probabilities, 

etc.
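
The characteristics listed above can be checked on a single curve. This is only a sketch: the mean of 100 and standard deviation of 15 are assumed values picked for illustration, and Python's built-in `statistics.NormalDist` stands in for a Normal curve table.

```python
import math
from statistics import NormalDist

# A hypothetical Normal curve; the mean of 100 and SD of 15 are illustrative choices.
d = NormalDist(mu=100, sigma=15)

# Mean, median, and mode coincide.
print(d.mean, d.median, d.mode)

# Symmetry: equal heights at equal distances on either side of the mean.
for k in (0.5, 1, 2):
    left = d.pdf(d.mean - k * d.stdev)
    right = d.pdf(d.mean + k * d.stdev)
    print(k, math.isclose(left, right))

# Asymptotic tails: the height keeps shrinking but stays above zero.
heights = [d.pdf(d.mean + k * d.stdev) for k in (2, 4, 6, 8)]
print(heights)
```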

Hey, It's As Natural As Big Feet!

As we discussed in a previous chapter, research in any field must deal with variability. We 

know that too much variability probably means that we have more error in our methods and 

data, whereas less variability is one indication that our methods and data contain less error. 

So, less variability is good, but there will always be some. Why? Everyone doesn't respond 

the same way to the same medication; different people have different memory abilities; some 

people are taller and some people are shorter. Turns out that variability is natural, as is the 

Normal distribution. In other words, organisms inherit physical and derivative psychological 

traits...well..."Normally."

We take this Normality as a common pattern or "process" of nature and our "observations" 

of it. And even though we may not be able to identify all of the factors (we never will, by the 

way) that make up the thing we like to call "intelligence," when we measure this thing in 

large numbers and with proper research methods, we get that nice Normal curve. Go figure!

Consider the phenomenon of Bigfoot or Sasquatch. I know...I know. A fairytale. Right? Like 

there could really be populations of half-human/half-ape creatures that exist in various remote 

locations and are only detectable through their forensic remains. Before we dismiss it too 

quickly, let's try the hard thing. Let's try to argue FOR the existence of Sasquatch based on 

Normality. How could we do this?










As you may know, footprints are the standard stock in 

trade of Sasquatch research, and their sometimes inhuman length assures almost immediate 

measurement, even by first-time witnesses. The process here consists of foot lengths and 

the observations are the measurements of footprints. Foot lengths are going to be affected by 

a lot of factors: Gender of the creature. Family genetics. Nutrition. Surface from which the 

foot lengths were measured--snow, mud, grass, etc. Length of time between the creation of 

the footprint and its measurement. Amount of alcohol consumed by everyone involved. It's 

complicated!

Nonetheless, as can be seen here, a sample of 410 independently collected footprints 

(ostensibly left by a Bigfoot) forms a fairly Normal curve (with frequency plotted on 

the y axis and foot length plotted on the x axis). The Normal distribution overall argues 

compellingly for the existence of Sasquatch as a genuine species, in that production of 

fictitious data over 40 years by hundreds of people independently of each other would likely 

have generated a distribution with many peaks. A further factor that supports the authenticity 

of the data is the fact that foot length, foot width, heel width, and gait are interrelated in a 

logical and cohesive fashion, a congruence not plausible by pure chance.

Hmmmm....very interesting. Are you a true believer, yet? If you want to learn a little more 

about forensic research on the big fella, you can read this research paper.


Why don't we frame this in less cryptozoological 

terms? Let's look at the SAT. The process here consists of the students taking the test, and 

the observations are the students’ scores. Now, my score, for example, is going to be due to a 

whole set of different factors: my IQ, what I had for breakfast, how much I studied the night 

before, how good my teachers are, which butterflies were flapping their wings in Beijing this 

morning, and so on. In short, my score is the result of a whole set of hard-to-predict factors. 

The same with my fellow students. And yet, even though all these factors are hard to predict, 

if you take the scores of a large number of students from a single population, the scores will 

be approximately Normally distributed, as you see here. Once again, when we see such a Normal curve in 

our data, we're inclined to think that we're on the right track.
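
The idea that many hard-to-predict factors add up to a Normal-looking set of scores can be tried out in a small simulation. This is a rough sketch under strong assumptions that are not in the text: each score is the simple sum of 30 small, independent influences, and each influence is drawn uniformly between -1 and 1. Real test scores are certainly messier.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

def simulated_score(n_factors=30):
    """One score: the sum of many small independent influences (assumed model)."""
    return sum(random.uniform(-1, 1) for _ in range(n_factors))

scores = [simulated_score() for _ in range(20000)]
m, s = mean(scores), stdev(scores)

# Share of simulated scores within 1 and 2 standard deviations of the mean.
within_1 = sum(abs(x - m) <= s for x in scores) / len(scores)
within_2 = sum(abs(x - m) <= 2 * s for x in scores) / len(scores)
print(f"within 1 SD: {within_1:.3f}, within 2 SD: {within_2:.3f}")
```

If the simulated scores are roughly Normal, the two shares should land near 0.68 and 0.95, even though no single factor is Normal on its own.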

The 68-95-99.7 Rule

The standard normal curve is a special example of the normal distribution: the one with a 

mean of 0 and a standard deviation of 1. The height of any Normal distribution can be 

specified mathematically in terms of two population parameters: the mean (μ) and the 

standard deviation (σ). Instead of calculating our curve parameters in painstaking 

mathematical longhand, we will simply use sample statistics (s and x-bar) to estimate the 

properties or distribution shape of our actual population. In other words, we can take a 

shortcut.
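
Here is a short sketch of what "height specified by the mean and standard deviation" looks like in practice, using the standard Normal density formula. The ten scores in `sample` are invented for illustration only, and x-bar and s are used as stand-ins for μ and σ, as described above.

```python
import math
from statistics import mean, stdev

def normal_height(x, mu, sigma):
    """Height of the Normal curve at x, given mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical sample of scores, invented for illustration only.
sample = [88, 92, 97, 100, 101, 103, 105, 108, 112, 119]

x_bar = mean(sample)  # sample mean, our estimate of mu
s = stdev(sample)     # sample standard deviation (n - 1 denominator), our estimate of sigma

print(f"x-bar = {x_bar}, s = {s:.2f}")
print(f"height at the mean:     {normal_height(x_bar, x_bar, s):.4f}")
print(f"height one SD above it: {normal_height(x_bar + s, x_bar, s):.4f}")
```

The curve is tallest at the mean and lower one standard deviation away, which is the bell shape expressed in numbers.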

Every time you look at a group of scores (sample of data), you want to be thinking 

about those scores as comprising a shape. Even though you will see data listed in groups 

and columns, underneath every data set is a shape. Whenever we perform statistical analyses, 

we're hoping that this shape comes as close as possible to bell-shaped or Normal. As we 

move along with our discussion, this idea of "shape" will become more concrete.


The distances along the horizontal axis of our curve, when divided into standard deviations, 

will always include the same proportion of the total area: Between -1 and +1 standard 

deviation units lies about 68% of the area. Between -2 and +2 standard deviation units lies 

about 95% of the area. Between -3 and +3 standard deviation units lies about 99.7% of the 

area. This is true of any Normal curve, whether it is a little narrower or a little wider. This 

graphic depicts the approximate 68-95-99.7 breakdown for a bell-shaped, standard normal 

curve.
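
The proportions above can be reproduced without a printed table. The sketch below uses Python's `statistics.NormalDist`; the three spreads are arbitrary illustrative values, chosen to show that the shares do not depend on how wide the curve is.

```python
from statistics import NormalDist

def share_within(dist, k):
    """Proportion of the area within k standard deviations of the mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)

# Narrow, standard, and wide curves (illustrative spreads).
for sigma in (0.5, 1.0, 2.0):
    d = NormalDist(mu=0.0, sigma=sigma)
    shares = [round(share_within(d, k), 4) for k in (1, 2, 3)]
    print(f"sigma = {sigma}: {shares}")
```

Each line should show the same three shares, roughly 0.6827, 0.9545, and 0.9973.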

This conception of the normal curve starts to become powerful when we "map" it onto 

normally distributed variables. One example of a variable that forms a normal curve is I.Q. 

In this case, we can tell what percentage of people are in any area of the curve. A normal 

distribution of 1000 cases will have about 683 (about 68%) people within +/-1 standard 

deviation, about 954 (about 95%) people within +/-2 standard deviations, and about 997 

(about 99.7%) people within +/-3 standard deviations. Only about 3 people will be more 

than 3 standard deviations from the mean, if the sample size is 1000. In other words, in a 

perfectly normal distribution based on such data, we would expect only about three people 

to have I.Q. scores above the score associated with a z score of +3 or below the score 

associated with a z score of -3.
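
The head counts for 1000 cases follow directly from the curve's proportions, as this sketch shows. The I.Q. scale at the end (mean 100, standard deviation 15) is an assumption added for illustration; the text above does not give those values.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

def expected_inside(n, k):
    """Expected number of cases, out of n, within k standard deviations of the mean."""
    return n * (z.cdf(k) - z.cdf(-k))

n = 1000  # number of cases, as in the text
for k in (1, 2, 3):
    inside = expected_inside(n, k)
    print(f"+/-{k} SD: about {inside:.1f} inside, about {n - inside:.1f} outside")

# Assumed I.Q. scale (mean 100, SD 15); these values are not given in the text.
iq_mean, iq_sd = 100, 15
low, high = iq_mean - 3 * iq_sd, iq_mean + 3 * iq_sd
print(f"z = -3 -> I.Q. {low},  z = +3 -> I.Q. {high}")
```

Under that assumed scale, the roughly three people outside +/-3 standard deviations would be those scoring below 55 or above 145.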


Source: http://www.mesacc.edu/~derdy61351/230_web_book/module3/normal/
