normaldistribution

The Total Area Always Equals 100%

Normal distributions are a family of distributions that have the same general shape. They are symmetric, with scores more concentrated in the middle than in the tails. Normal distributions are sometimes described as bell shaped.

Here are some examples. Notice that they differ in how spread out they are. But even though the shapes are different, the area under each curve is the same: the total area under any Normal curve is 100%. This never changes, and it is critical to our understanding of how to apply the Normal distribution. Because we know that the area under the curve is always 1, or 100%, we can understand a lot about individual scores and groups of scores to which the Normal distribution is applied. We'll need a Normal curve table for this. More on that shortly.
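If you'd like to check the "total area is always 1" claim for yourself, here is a minimal Python sketch, assuming Python 3.8+ for statistics.NormalDist. It adds up thin slices under three Normal curves that differ only in spread; the sigma values 0.5, 1, and 2 are arbitrary choices, not taken from the text, and the helper area_under_curve is a hypothetical name used only for illustration.

```python
from statistics import NormalDist

def area_under_curve(dist, lo, hi, steps=20_000):
    """Approximate the area under dist's density curve from lo to hi (midpoint rule)."""
    width = (hi - lo) / steps
    return width * sum(dist.pdf(lo + (i + 0.5) * width) for i in range(steps))

areas = {}
for sigma in (0.5, 1.0, 2.0):  # three curves: same mean, different spreads
    dist = NormalDist(mu=0, sigma=sigma)
    # Integrate out to +/-10 SDs; the area beyond that is negligible.
    areas[sigma] = area_under_curve(dist, -10 * sigma, 10 * sigma)
    print(f"sigma = {sigma}: total area = {areas[sigma]:.4f}")
```

Each line should report a total area of 1.0000, no matter how narrow or wide the curve is.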

The Normal curve has other characteristics that are always true. Once again, the fact that we can always count on these characteristics provides a good model for understanding numeric trends in data. The following are other important characteristics of the Normal curve:

All Normal curves are symmetric around the mean of the distribution. In other words, the left half of the Normal curve is a mirror image of the right half.

All Normal curves are unimodal. Because a Normal curve has a single peak and is symmetric, the most frequently observed score in a Normal distribution, the mode, sits at the center, which is the mean.

Since Normal curves are unimodal and symmetric, the mean, median, and mode of every Normal distribution are equal.

All Normal curves are asymptotic to the horizontal axis of the distribution. The height of the curve, which reflects how often scores occur, drops rapidly as one moves along the horizontal axis from the center of the distribution toward the extreme ends, but the curve never actually touches the axis. This is because the Normal curve is a continuous model that is held to describe an infinite number of observations.

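All Normal curves have the same proportions of scores under the curve relative to particular locations on the horizontal axis, provided those locations are expressed in standard deviation units from the mean (z scores). This holds whether the proportions are expressed as areas, percentiles, probabilities, etc.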
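The characteristics listed above can be checked numerically with a short sketch, again assuming Python 3.8+ and statistics.NormalDist. The two curves (mean 50, SD 5 and mean 100, SD 15) are made-up examples, not data from the text.

```python
from statistics import NormalDist

# Two illustrative Normal curves with different centers and spreads (arbitrary values).
narrow = NormalDist(mu=50, sigma=5)
wide = NormalDist(mu=100, sigma=15)

for name, dist in (("narrow", narrow), ("wide", wide)):
    mu, sigma = dist.mean, dist.stdev
    # Symmetry: same height at equal distances on either side of the mean.
    left, right = dist.pdf(mu - 1.5 * sigma), dist.pdf(mu + 1.5 * sigma)
    # Unimodal and symmetric: mean, median, and mode coincide.
    centers = (dist.mean, dist.median, dist.mode)
    # Asymptotic: six SDs out, the curve is very low but still above the axis.
    tail_height = dist.pdf(mu + 6 * sigma)
    print(f"{name}: left={left:.6f} right={right:.6f} "
          f"mean/median/mode={centers} tail height={tail_height:.2e}")

# Same proportions at the same z location: the area below "mean + 1 SD" matches.
below_narrow = narrow.cdf(narrow.mean + narrow.stdev)
below_wide = wide.cdf(wide.mean + wide.stdev)
print(f"area below mean + 1 SD: narrow={below_narrow:.4f}, wide={below_wide:.4f}")
```

Both curves should show equal left and right heights, identical mean, median, and mode, a tiny but positive tail height, and the same area (about 0.8413) below one standard deviation above the mean.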

Hey, It's As Natural As Big Feet!

As we discussed in a previous chapter, research in any field must deal with variability. We know that too much variability probably means that we have more error in our methods and data, whereas less variability is one indication that our methods and data contain less error. So, less variability is good, but there will always be some. Why? Not everyone responds the same way to the same medication; different people have different memory abilities; some people are taller and some people are shorter. It turns out that variability is natural, as is the Normal distribution. In other words, organisms inherit physical and derivative psychological traits...well..."Normally."

We take this Normality as a common pattern or "process" of nature and of our "observations" of it. And even though we may not be able to identify all of the factors (we never will, by the way) that make up the thing we like to call "intelligence," when we measure this thing in large numbers and with proper research methods, we get that nice Normal curve. Go figure!

Consider the phenomenon of Bigfoot, or Sasquatch. I know...I know. A fairy tale. Right? Like there could really be populations of half-human/half-ape creatures that exist in various remote locations and are only detectable through their forensic remains. Before we dismiss it too quickly, let's try the hard thing. Let's try to argue FOR the existence of Sasquatch based on Normality. How could we do this?

As you may know, footprints are the standard stock-in-trade of Sasquatch research, and their sometimes inhuman length almost guarantees immediate measurement, even by first-time witnesses. The process here consists of foot lengths, and the observations are the measurements of footprints. Foot lengths are going to be affected by a lot of factors: Gender of the creature. Family genetics. Nutrition. Surface on which the footprint was measured: snow, mud, grass, etc. Length of time between the creation of the footprint and its measurement. Amount of alcohol consumed by everyone involved. It's complicated!

Nonetheless, as can be seen here, a sample of 410 independently collected footprints (ostensibly left by a Bigfoot) forms a fairly Normal curve (with frequency plotted on the y axis and foot length plotted on the x axis). The Normal distribution overall argues compellingly for the existence of Sasquatch as a genuine species, in that fictitious data produced over 40 years by hundreds of people working independently of each other would likely have generated a distribution with many peaks. A further factor that supports the authenticity of the data is that foot length, foot width, heel width, and gait are interrelated in a logical and cohesive fashion, a congruence not plausible by pure chance.

Hmmmm....very interesting. Are you a true believer yet? If you want to learn a little more about forensic research on the big fella, you can read this research paper.

Why don't we frame this in less cryptozoological terms? Let's look at the SAT. The process here consists of the students taking the test, and the observations are the students’ scores. Now, my score, for example, is going to be due to a whole set of different factors: my IQ, what I had for breakfast, how much I studied the night before, how good my teachers are, which butterflies were flapping their wings in Beijing this morning, and so on. In short, my score is the result of a whole set of hard-to-predict factors. The same goes for my fellow students. And yet, even though all these factors are hard to predict, if you take the scores of a large number of students from a single population, the scores will be Normally distributed, as you see here. Once again, when we see such a Normal curve in our data, we're inclined to think that we're on the right track.
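The idea that many small, hard-to-predict factors add up to a Normal curve can be illustrated with a toy simulation; this is essentially the central limit theorem at work. In this minimal Python sketch, each hypothetical student's score is the sum of 50 independent random nudges. The number of factors, their uniform(-1, 1) size, and the 10,000 students are all assumptions for illustration, not a model of the real SAT.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

N_STUDENTS = 10_000  # hypothetical number of test takers
N_FACTORS = 50       # hypothetical number of small, independent influences per score

def simulated_score():
    # Each factor nudges the score up or down by a random amount (uniform on -1..1).
    return sum(random.uniform(-1, 1) for _ in range(N_FACTORS))

scores = [simulated_score() for _ in range(N_STUDENTS)]
m, s = mean(scores), stdev(scores)

# If the scores are roughly Normal, about 68% fall within 1 SD and about 95% within 2 SD.
within_1 = sum(abs(x - m) <= s for x in scores) / N_STUDENTS
within_2 = sum(abs(x - m) <= 2 * s for x in scores) / N_STUDENTS
print(f"within 1 SD: {within_1:.3f}, within 2 SD: {within_2:.3f}")
```

No single factor is Normal here (each one is flat), yet their sum lands close to the familiar bell shape.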

The 68-95-99.7 Rule

The standard normal curve is a special case of the normal distribution: the one with a mean of 0 and a standard deviation of 1. The height of any Normal distribution can be specified mathematically in terms of two population parameters: the mean (μ) and the standard deviation (σ). Instead of calculating these parameters for the whole population in painstaking mathematical longhand, we will simply use sample statistics (x-bar and s) to estimate the properties, or distribution shape, of our actual population. In other words, we can take a shortcut.
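As a minimal sketch of that shortcut (Python 3.8+, statistics module), the ten scores below are made up for illustration. NormalDist.from_samples fits a curve using the sample mean and sample standard deviation, and pdf gives the fitted curve's height at any point.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample of scores (made-up numbers for illustration only).
sample = [98, 104, 91, 110, 87, 102, 95, 120, 100, 93]

x_bar = mean(sample)  # sample mean: our estimate of mu
s = stdev(sample)     # sample standard deviation (n - 1 denominator): our estimate of sigma

# from_samples builds the fitted curve from those same two statistics.
fitted = NormalDist.from_samples(sample)
print(f"x-bar = {x_bar:.2f}, s = {s:.2f}")
print(f"fitted mu = {fitted.mean:.2f}, fitted sigma = {fitted.stdev:.2f}")

# Height of the curve: f(x) = exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))
print(f"height at the mean: {fitted.pdf(x_bar):.4f}")
print(f"height one SD above the mean: {fitted.pdf(x_bar + s):.4f}")
```

The two numbers x_bar and s are all the fitted curve needs; everything else about its shape follows from them.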

Every time you look at a group of scores (sample of data), you want to be thinking about those scores as comprising a shape. Even though you will see data listed in groups and columns, underneath every data set is a shape. Whenever we perform statistical analyses, we're hoping that this shape comes as close as possible to bell-shaped or Normal. As we move along with our discussion, this idea of "shape" will become more concrete.

The distances along the horizontal axis of our curve, when divided into standard deviations, will always include the same proportion of the total area: between -1 and +1 standard deviation units lies about 68% of the area; between -2 and +2 standard deviation units lies about 95% of the area; and between -3 and +3 standard deviation units lies about 99.7% of the area. This is true of any Normal curve, whether it is narrower or wider, because each curve's distances are measured in its own standard deviation units. This graphic depicts the approximate 68-95-99.7 breakdown for the bell-shaped standard normal curve.
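These proportions come straight from the area under the standard normal curve. A minimal Python 3.8+ sketch using statistics.NormalDist computes them; the second curve (mean 100, SD 15) is an arbitrary example showing that a wider curve gives the same proportion.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

proportions = {}
for k in (1, 2, 3):
    # Area between -k and +k standard deviations = cdf(+k) - cdf(-k)
    proportions[k] = z.cdf(k) - z.cdf(-k)
    print(f"within +/-{k} SD: {proportions[k]:.4f}")

# The same proportion holds for any Normal curve, e.g. a wider one (mean 100, SD 15).
wide = NormalDist(mu=100, sigma=15)
wide_within_1 = wide.cdf(115) - wide.cdf(85)
print(f"wide curve within +/-1 SD: {wide_within_1:.4f}")
```

This should print about 0.6827, 0.9545, and 0.9973 for the standard curve, and 0.6827 again for the wider one.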

This conception of the normal curve starts to become powerful when we "map" it onto normally distributed variables. One example of a variable that forms a normal curve is I.Q. In this case, we can tell what percentage of people are in any area of the curve. A normal distribution of 1000 cases will have about 683 (about 68%) people between +/-1 standard deviation, about 954 (about 95%) people between +/-2 standard deviations, and about 997 (about 99.7%) people between +/-3 standard deviations. Only about 3 people will be more than 3 standard deviations from the mean if the sample size is 1000. In other words, in a perfectly normal distribution based on such data, we would expect only about three people in total to have I.Q. scores either above the score associated with a z score of +3 or below the score associated with a z score of -3.
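The expected counts for 1000 cases can be computed directly. This minimal Python 3.8+ sketch assumes I.Q. scores are scaled to a mean of 100 and a standard deviation of 15, a common convention that the text does not state.

```python
from statistics import NormalDist

# Assumption: I.Q. scaled to mean 100 and SD 15 (a common convention, not given in the text).
iq = NormalDist(mu=100, sigma=15)
N = 1000  # number of cases, as in the example above

expected = {}
for k in (1, 2, 3):
    lo, hi = 100 - k * 15, 100 + k * 15
    # Share of the curve between the two cutoffs, scaled up to N people.
    expected[k] = (iq.cdf(hi) - iq.cdf(lo)) * N
    print(f"between {lo} and {hi} (+/-{k} SD): about {expected[k]:.1f} people")

outside_3 = N - expected[3]
print(f"below 55 or above 145 (beyond +/-3 SD): about {outside_3:.1f} people in total")
```

The counts come out near 682.7, 954.5, and 997.3, leaving roughly 2.7 people beyond three standard deviations, which the text rounds to "about three."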

Source: http://www.mesacc.edu/~derdy61351/230_web_book/module3/normal/


Monday, 7 October 2013
