| Measures of Central Tendency
Of the measures of central tendency, the mean is the most widely
used. It takes every score into account and is the most efficient measure
of central tendency for normal distributions. Because of its
mathematical characteristics, statisticians can develop
inferences about means. On the other
hand, the mean is not appropriate for highly skewed distributions and is
less efficient than other measures of central tendency when extreme
scores are possible.
The median is useful in these situations because its meaning is clear and
it gives more efficient information than the mean in highly skewed
distributions.
The mode can be informative but does not offer significant interpretable
data on its own.
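A minimal sketch of how the three measures respond to an extreme score, using Python's standard statistics module. The scores are hypothetical and chosen only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical scores: mostly clustered, with one extreme value (40).
scores = [2, 3, 3, 4, 5, 6, 40]

center_mean = mean(scores)      # uses every score, so 40 pulls it upward
center_median = median(scores)  # middle score of the sorted list
center_mode = mode(scores)      # most frequent score

print(center_mean, center_median, center_mode)
```

In this sample the single extreme score pulls the mean to 9, while the median stays at 4 and the mode at 3.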
Measures of Variability
Sometimes researchers encounter sets of data
that have the same mean but behave very differently. Variance
measures how widely the scores in a set of data are spread around their
mean, so it allows researchers to describe sets of data that have the same
mean but are different. The range
of the data also plays a significant role in this interpretation.
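As an illustration, the sketch below compares two hypothetical groups of scores that share a mean but differ in spread. It uses the population variance (pvariance); whether the population or sample formula is intended is an assumption.

```python
from statistics import mean, pvariance

# Two hypothetical groups with the same mean but different spread.
group_a = [48, 49, 50, 51, 52]
group_b = [10, 30, 50, 70, 90]

for name, scores in (("A", group_a), ("B", group_b)):
    data_range = max(scores) - min(scores)   # highest minus lowest score
    print(name, mean(scores), pvariance(scores), data_range)
```

Both groups have a mean of 50, but the variance is 2 for group A and 800 for group B, and the ranges are 4 and 80.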
Information Model of Variance
The information
model of variance is based on the
count of the one-zero changes, i.e., on the count of the bits of
information contained by a binary variable.
As I understand it, the information model of variance uses an
algorithm to describe the operations which are used to construct a table
of differences among data elements.
The computation of all possible differences between elements of a
variable results in a matrix of differences between subjects of that
variable. Analysis of this matrix looks at the positive output.
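One way to picture the table of differences is sketched below for a hypothetical binary variable. It relies on a known identity: for a binary variable, the number of positive entries in the difference table (the one-zero pairs) divided by n squared equals the population variance. Whether this is exactly the procedure the information model prescribes is an assumption.

```python
from statistics import pvariance

def difference_matrix(values):
    """Return the n-by-n table of differences values[i] - values[j]."""
    return [[a - b for b in values] for a in values]

binary = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical binary variable
table = difference_matrix(binary)

# Count the positive entries, i.e. the pairs where a one meets a zero.
positive = sum(1 for row in table for d in row if d > 0)

n = len(binary)
print(positive, positive / n**2, pvariance(binary))
```

With four ones and four zeros there are 16 positive differences, and 16 / 64 matches the population variance of 0.25.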
The true meaning of variance rests in squaring the deviation scores to
obtain their corresponding areas. This can appear implausible, although
it is geometrically correct (as expressed through the words of Professor
Keating in Dead Poets Society). Comparing this geometric meaning with
the application of the mechanical model of variance to qualitative
descriptions brought to light the power of quantitative analysis as it
applies to educational data.
Standardization
By conceptualizing the variables of a data set as
three concentric circles, the core can be reached by mathematically
stripping away the layers. "The outermost layer is characterized by
the arithmetic mean, associated with the obtained scores. Once known,
the mean can be removed from the obtained scores by subtraction. By
removing the outermost layer, we obtain the deviation scores. Since the
arithmetic average was peeled from the obtained data, the mean of the
deviation scores is always zero. The statistics most closely associated
with the second layer, the deviation scores, is variance. Once computed,
it can be removed by dividing deviation scores by the standard
deviation, thus removing the second layer." The remaining core is
composed of the standard scores or z-scores.
The z score transformation is especially useful when seeking to compare
the relative standings of items from distributions with different means
and/or different standard deviations.
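The peeling of layers described above can be sketched in a few lines. The obtained scores are hypothetical, and the population standard deviation (pstdev) is assumed.

```python
from statistics import mean, pstdev

obtained = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical obtained scores

m = mean(obtained)
deviations = [x - m for x in obtained]   # first layer (the mean) removed
sd = pstdev(obtained)
z_scores = [d / sd for d in deviations]  # second layer (the spread) removed

print(m, sd, z_scores)
```

The deviation scores sum to zero, and the resulting z-scores have a mean of zero and a standard deviation of one, which is what makes scores from different distributions comparable.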
Normal Distribution
One reason the normal
distribution is important is that many psychological and
educational variables (e.g., measures of reading ability, introversion,
job satisfaction, and memory) seem to be distributed approximately
normally. A second reason the normal distribution is so important is
that it is easy for researchers and statisticians to work with. Many
kinds of statistical tests can be derived for normal distributions.
Fortunately, these tests work very well even if the distribution is only
approximately normal. Some tests work well even with very
wide deviations from normality. If the mean
and standard deviation of a normal distribution are known, it is easy to convert
back and forth from raw scores to percentiles.
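A short sketch of the conversion between raw scores and percentiles, using NormalDist from Python's standard library. The mean of 100 and standard deviation of 15 are hypothetical values for an imagined test scale.

```python
from statistics import NormalDist

# Hypothetical test scale: mean 100, standard deviation 15.
scale = NormalDist(mu=100, sigma=15)

percentile = scale.cdf(115) * 100   # raw score -> percentile
raw = scale.inv_cdf(0.50)           # percentile (as a proportion) -> raw score

print(round(percentile, 1), raw)
```

A raw score one standard deviation above the mean falls at roughly the 84th percentile, and the 50th percentile maps back to the mean.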
Normalization
The text explains that "the term moment is used in mechanics as
a measure of the force of rotation. The strength of this force depends
upon the distance from the point of rotation." Normalization of the
distribution is explained through four mathematical equations: the first,
second, third, and fourth moments about the mean.
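The four moments can be sketched as averages of powers of the deviation scores. The helper name central_moment and the data are hypothetical. Reading the standardized third and fourth moments as skewness and kurtosis follows common convention and is an assumption about what the text intends.

```python
def central_moment(values, k):
    """k-th moment about the mean: the average of (x - mean) ** k."""
    m = sum(values) / len(values)
    return sum((x - m) ** k for x in values) / len(values)

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores

m1 = central_moment(scores, 1)   # always zero
m2 = central_moment(scores, 2)   # the (population) variance
m3 = central_moment(scores, 3)
m4 = central_moment(scores, 4)

skewness = m3 / m2 ** 1.5        # third moment in standard-score units
kurtosis = m4 / m2 ** 2          # fourth moment in standard-score units

print(m1, m2, skewness, kurtosis)
```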
Covariance
Covariance
introduces us to the idea of correlations among data. It is an extension
of the concept of variation to the case of two variables. This is a
useful application of statistical analysis to research in education
because it is often the objective of the research to determine if what
is being studied is related to other variables within the area being
researched.
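As a sketch, covariance can be computed as the mean product of paired deviation scores. The variable names and data are hypothetical, and the population form (dividing by n) is assumed.

```python
def covariance(x, y):
    """Population covariance: mean product of paired deviation scores."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

hours = [1, 2, 3, 4, 5]          # hypothetical hours of study
grades = [52, 60, 61, 70, 77]    # hypothetical course grades

print(covariance(hours, grades))   # positive: the variables rise together
print(covariance(hours, hours))    # covariance of a variable with itself
```

The second call shows why covariance extends variance to two variables: the covariance of a variable with itself is its variance.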
Correlation Analysis
Correlation helped bring quantitative methods to the field of social
sciences. Historically, the social sciences used causality as the
primary explanatory principle of events, and the methodology used was
solely qualitative. Through analysis of correlation data,
quantitative association between elements opened the social sciences to
a quantitative world. The mathematical formula used is the Pearson
Product-Moment Coefficient of Correlation.
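A minimal sketch of the Pearson coefficient, written as the sum of cross-products of deviation scores divided by the square root of the product of the two sums of squares. The function name and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation of two paired lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                   # sum of squares, x
    syy = sum((b - my) ** 2 for b in y)                   # sum of squares, y
    return sxy / sqrt(sxx * syy)

hours = [1, 2, 3, 4, 5]          # hypothetical hours of study
grades = [52, 60, 61, 70, 77]    # hypothetical course grades

print(round(pearson_r(hours, grades), 2))
```

For these data the coefficient is about 0.98. Dividing by the two sums of squares is what bounds the result between negative one and positive one.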
Correlation Interpretations
While the coefficient of covariance has no upper or lower limit,
the coefficient of correlation can vary from negative one to positive
one. A one, whether positive or negative, represents a perfect
relationship, whereas a zero represents an absence of relationship.
Interpretation of data should always relate to the circumstances
surrounding the data; however, coefficients between .00 and .30
are normally considered negligible, those between .30 and .70 moderate, and
those between .70 and 1.00 high. The graphical representation
of these numbers (often in a scatter plot) gives an easy-to-interpret
picture of the data's relationship.
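The rule of thumb above can be written as a small helper. The function name is hypothetical. Two assumptions are made here: the cutoffs apply to the absolute value of the coefficient, and a coefficient exactly at a boundary falls in the higher category.

```python
def describe_correlation(r):
    """Label the strength of a correlation coefficient by rule of thumb."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    size = abs(r)   # the sign gives direction, not strength
    if size < 0.30:
        return "negligible"
    if size < 0.70:
        return "moderate"
    return "high"

for r in (0.12, -0.45, 0.83):
    print(r, describe_correlation(r))
```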
The Coefficient of Determination
The Coefficient of Alienation
Sampling Theory
When applied, statistics help scientists draw conclusions from observations and
experiments. However, observed relationships do not guarantee
that the events are truly related in a causal sense. By
manipulating only one factor in an experiment (while all others are
held constant), the researcher can infer some sense of causality by noting
which change in the factor produced the altered or desired result. These
methods, however, yield anything but confident results. Statistical
inference adds the desired confidence. The sampling distribution
of means relies on defining the population from which the
samples will be drawn. After the population has been defined and
its mean computed, the distribution of the sample means approximates the
normal distribution. The sampling distribution of means serves as a standard by which
researchers can judge a sample mean. The researcher must first identify
the relative position of a sample mean in the theoretical sampling distribution of means.
To form a z ratio, divide the difference between a random sample mean
and the mean of the population by the standard error of the mean.
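The z ratio can be sketched as follows. All numbers are hypothetical, and the standard error of the mean is taken to be the population standard deviation divided by the square root of the sample size.

```python
from math import sqrt
from statistics import NormalDist

def z_ratio(sample_mean, pop_mean, pop_sd, n):
    """Position of a sample mean within the sampling distribution of means."""
    standard_error = pop_sd / sqrt(n)   # standard error of the mean
    return (sample_mean - pop_mean) / standard_error

# Hypothetical values: a sample of 36 drawn from a population with
# mean 100 and standard deviation 15.
z = z_ratio(sample_mean=104, pop_mean=100, pop_sd=15, n=36)

# Two-tailed probability of a z ratio at least this extreme.
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 2), round(p, 3))
```

Here the standard error is 2.5, so the sample mean lies 1.6 standard errors above the population mean, with a two-tailed probability of about 0.11.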
One-sample z Test
Null Hypothesis
Significance Level
The z Square Ratio
Estimation of Statistical Significance
Leonhard Euler established the foundation for the theory of higher
transcendental functions by introducing the beta and gamma
transcendental functions. The t test soon replaced the z test. Its
advantage is that the t square ratio directly provides information
about the effect size. Also, knowledge of the t square ratio
can be easily transferred to learning the properties of the F ratio.
The normal distribution and the t distribution appear nearly identical. The
t-test is an analogue of the z-test in which the degrees of freedom replace
n and the t distribution replaces the normal distribution. The t ratio
therefore has the same form as the z ratio: the difference between the
sample mean and the population mean, divided by the standard error of the
mean as estimated from the sample.
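A sketch of a one-sample t ratio and its square, using hypothetical data. The conversion of t squared into a proportion of variance, t² / (t² + df), is one common way to express effect size. That this is the effect-size measure the text has in mind is an assumption.

```python
from math import sqrt
from statistics import mean, stdev

def t_ratio(sample, pop_mean):
    """One-sample t ratio and its degrees of freedom."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)   # estimated from the sample
    return (mean(sample) - pop_mean) / standard_error, n - 1

sample = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
t, df = t_ratio(sample, pop_mean=3)

t_square = t ** 2
effect_size = t_square / (t_square + df)   # proportion of variance explained

print(round(t, 3), df, round(effect_size, 3))
```

For these data t squared is 7 with 7 degrees of freedom, giving an effect size of 0.5.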
Single Classification Analysis of Variance
Partitioning
of variance into its components is a central concept of statistical data
analysis. The single classification analysis tests the differences
among two or more independent samples. This method is flexible because
the variance components can be computed for the obtained and deviation
scores, or for the standard scores. Aside from the obtained scores,
deviation scores, and standard scores frameworks, there are two
additional frameworks where data can be partitioned into extended
variance components. An example of single classification analysis would
be a scientific experiment that involves two groups of
subjects, randomly selected and divided into a control and an
experimental group. It would be understood that the subjects have no
relationship to each other and different subjects are used for the
different conditions of the experiment.
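The partitioning can be sketched for the obtained-scores framework as follows. The function name and the two groups are hypothetical. The sum of squares is split into a between-groups part and a within-groups part, and the F ratio compares their mean squares.

```python
def one_way_anova(groups):
    """Return (SS between, SS within, F) for independent groups."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups)
    # Variation of the scores around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f

control = [4, 5, 6, 5]          # hypothetical control group
experimental = [7, 8, 9, 8]     # hypothetical experimental group

print(one_way_anova([control, experimental]))
```

The two components (18 and 4) add up to the total sum of squares of 22, and the F ratio is 27 on 1 and 6 degrees of freedom.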
Double Classification Analysis of Variance
Double
classification analysis differs from single classification in that it
tests the differences among two or more related (or dependent) samples.
An example might include an experiment that involves two different types
of experimental treatments.
In the context of the worksheet method, double classification ANOVA means
that the classification takes place in both rows and columns.
Each row represents one subject, and the same subject participates in
all the experimental conditions. Thus, the column variable is called the
repeated measures factor with two or more levels (columns), and the
double classification ANOVA corresponds to a one-way repeated
measures ANOVA.
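The row-and-column classification can be sketched as below. The function name and data are hypothetical. Each row holds one subject's scores across the conditions, and the total sum of squares is partitioned into subjects (rows), conditions (columns), and a residual that serves as the error term.

```python
def repeated_measures_anova(table):
    """table[i][j] is the score of subject i under condition j."""
    n, k = len(table), len(table[0])
    grand = sum(map(sum, table)) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_conditions = n * sum((m - grand) ** 2 for m in col_means)
    ss_residual = ss_total - ss_subjects - ss_conditions

    f = (ss_conditions / (k - 1)) / (ss_residual / ((n - 1) * (k - 1)))
    return ss_subjects, ss_conditions, ss_residual, f

# Hypothetical data: 3 subjects (rows) measured under 3 conditions (columns).
table = [[2, 4, 6],
         [3, 5, 7],
         [4, 9, 5]]

print(repeated_measures_anova(table))
```

Removing the subject component from the error term is what distinguishes this design from the single classification analysis.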
The Phi Correlation and the Chi Square Ratio
When both variables to be correlated are binary, it is appropriate to use
the Phi coefficient. The visual representation consists of only four points
on a scatter plot. However, relating the Phi coefficient to the Chi
Square Test offers sound interpretable data. A two-way contingency table analysis
can be conducted to evaluate any relation between the variables. The chi
square can be expressed using the square of the phi coefficient: chi square
equals N times phi squared, where N is the total number of observations.
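The relation between the two statistics can be checked on a hypothetical 2 x 2 contingency table. The function names and cell counts are illustrative. The chi square is computed from observed and expected frequencies and then compared with N times phi squared.

```python
from math import sqrt

def phi_coefficient(a, b, c, d):
    """Phi for the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

def chi_square(a, b, c, d):
    """Chi square from observed and expected cell frequencies."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    rows = [a + b, c + d]
    cols = [a + c, b + d]
    total = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            total += (observed[i][j] - expected) ** 2 / expected
    return total

a, b, c, d = 30, 10, 10, 30   # hypothetical cell counts
n = a + b + c + d
phi = phi_coefficient(a, b, c, d)

print(phi, chi_square(a, b, c, d), n * phi ** 2)
```

For this table phi is 0.5, and both routes give a chi square of 20.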
Introduction to Multiple Regression Analysis
Multiple regression
is used to measure the relationship between one interval dependent variable and several independent variables,
for example, how X1, X2, ..., Xn relate to Y.
The independent variables can predict the dependent variable, but the dependent variable cannot be used to predict the independent variables.
This seems to create a one-way analysis. Therefore, the independent variables selected should have strong correlations with the dependent variable but only weak correlations with other independent variables.
Also, each independent variable is assumed to have the same relationship with the dependent variable at each value of the other independent variables.
Multiple regression is used when the researcher wants to determine what variables contribute to the explanation of the dependent variable and to what degree.
In educational research, multiple regression allows the researcher to ask (and hopefully answer) the general question "what is the best predictor of ...".
For example, a researcher might want to look at the best predictors for achievement
in a course or grade level.
The fundamental equation of multiple regression analysis contains two distinct operations. The first operation is the postmultiplication of the transpose of the cross-correlations by the inverse of the inter-correlations, resulting in the matrix of beta weights B.
The second operation is the premultiplication of the cross-correlations by the beta weights, resulting in the coefficient of multiple determination.
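The two operations can be sketched for the case of two predictors. The notation is an assumption made for illustration: R_xx stands for the inter-correlations among the predictors and R_xy for the cross-correlations between each predictor and the dependent variable, so that B' = R_xy' R_xx^-1 and R squared = B' R_xy. The correlation values are hypothetical.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def beta_weights(r_xx, r_xy):
    """First operation: transposed cross-correlations times inverse of r_xx."""
    inv = inverse_2x2(r_xx)
    return [sum(r_xy[i] * inv[i][j] for i in range(2)) for j in range(2)]

def multiple_r_squared(betas, r_xy):
    """Second operation: beta weights times the cross-correlations."""
    return sum(b * r for b, r in zip(betas, r_xy))

# Hypothetical correlations: the predictors correlate .50 with each other
# and .60 and .50 with the dependent variable.
r_xx = [[1.0, 0.5], [0.5, 1.0]]
r_xy = [0.6, 0.5]

betas = beta_weights(r_xx, r_xy)
r_squared = multiple_r_squared(betas, r_xy)

print([round(b, 3) for b in betas], round(r_squared, 3))
```

With these values the beta weights are about 0.467 and 0.267, and the coefficient of multiple determination is about 0.413. The explicit 2 x 2 inverse keeps the sketch free of external libraries. A larger set of predictors would need a general matrix inverse.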

Relative Importance of Predictors
Variance
components are used for experimental designs containing random effects
in order to estimate the variance that can be attributed to those
effects. For example, if a researcher were interested in the effect of
technology integration at different schools, she could select a
sample of schools to estimate the amount of variance that is attributed
to differences between schools. Orthogonal designs are ones in which the independent or predictor variables are uncorrelated in the sample.
They allow for an additive partitioning of the sums of squares accounted for by a model into unique portions associated with each predictor
variable. Alternatively, the method of principal components analysis makes the correlations
between predictor variables equal to zero. For this reason, and for reasons of precision in statistical estimation, orthogonal designs are generally preferred when they can be used.
ANOVA Using Multiple Regression Analysis
ANOVA is used to uncover the main and interaction effects of categorical independent variables
(factors) on an interval dependent variable. The key statistic in ANOVA is the F-test of difference of group means, testing if the means of the groups formed by values of the independent variable (or combinations of values for multiple independent variables) are different enough not to have occurred by chance. If the group means do not differ significantly then it is inferred that the independent
variables did not have an effect on the dependent variable. If the F test shows that overall the independent
variables are related to the dependent variable, then multiple comparison tests of significance
can be used to explore which value groups of the independent variables have the most to do with the relationship. In
the case presented in the readings, the F test showed no significant
differentiation of nonsense syllables remembered (17%), thus indicating
that the experimental drug did not improve memory. |