REFLECTIONS of
LEARNING

Measures of Central Tendency

Of the measures of central tendency, the mean is the most widely used. It takes every score into account and is the most efficient measure of central tendency for normal distributions. Because of its mathematical characteristics, it is possible for statisticians to develop inferences about means. On the other hand, the mean is not appropriate for highly skewed distributions and is less efficient than other measures of central tendency when extreme data scores are possible. The median is useful in these situations because its meaning is clear and it gives more efficient information than the mean in highly skewed distributions. The mode can be informative but does not offer significant interpretable data on its own.
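
To make the contrast concrete, here is a minimal sketch using Python's standard statistics module. The scores are made up for illustration and do not come from the course text; they simply include one extreme value so the pull on the mean is visible.

```python
from statistics import mean, median, mode

# Hypothetical scores (not from the text): mostly clustered, one extreme value.
scores = [2, 3, 3, 4, 5, 6, 40]

score_mean = mean(scores)      # pulled upward by the extreme score
score_median = median(scores)  # the middle score, unaffected by the extreme
score_mode = mode(scores)      # the most frequent score

print("mean:", score_mean)
print("median:", score_median)
print("mode:", score_mode)
```

With these numbers the mean lands well above most of the scores, while the median stays near the cluster, which is the situation described above for skewed distributions.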

Measures of Variability

Sometimes researchers encounter sets of data that have the same mean but behave very differently. Variance measures how widely the scores in a set are spread around their mean, so it allows researchers to describe and compare sets of data that have the same mean but are different. The range of the data also plays a significant role in this interpretation.
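
A small sketch of this idea, assuming two invented data sets and the population form of variance (dividing by n); the text may use a different divisor.

```python
from statistics import mean, pvariance

# Two hypothetical data sets with the same mean but very different spread.
set_a = [48, 49, 50, 51, 52]
set_b = [10, 30, 50, 70, 90]

mean_a, mean_b = mean(set_a), mean(set_b)
var_a, var_b = pvariance(set_a), pvariance(set_b)    # population variance
range_a = max(set_a) - min(set_a)
range_b = max(set_b) - min(set_b)

print("means:", mean_a, mean_b)
print("variances:", var_a, var_b)
print("ranges:", range_a, range_b)
```

Both sets share a mean of 50, yet the variance and range show how differently the scores are distributed.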

Information Model of Variance

The information model of variance is based on the count of the one-zero changes, i.e., on the count of the bits of information contained by a binary variable. As I understand it, the information model of variance uses an algorithm to describe the operations which are used to construct a table of differences among data elements. The computation of all possible differences between elements of a variable results in a matrix of differences between subjects of that variable. Analysis of this matrix looks at the positive output.
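
As a rough illustration only (the text's own algorithm is not reproduced here), the matrix of all pairwise differences can be built directly. The sketch also checks one well-known identity, offered as an assumption about how such a matrix connects to variance rather than as the text's method: the sum of the squared differences divided by 2n² equals the population variance. The data values are invented.

```python
from statistics import pvariance

# Hypothetical scores for four subjects.
x = [2, 4, 4, 6]
n = len(x)

# Matrix of all possible differences between elements: diff[i][j] = x[i] - x[j].
diff = [[xi - xj for xj in x] for xi in x]

# Sum of squared differences over the whole matrix, scaled by 2 * n^2.
sum_sq = sum(d * d for row in diff for d in row)
variance_from_diffs = sum_sq / (2 * n * n)

for row in diff:
    print(row)
print("variance from differences:", variance_from_diffs)
print("population variance:", pvariance(x))
```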

Two things brought to light the power of quantitative analysis as it applies to educational data. The first was the comparison of the true meaning of variance, which rests in squaring the deviation scores to obtain their corresponding areas, an idea that appears implausible although it is geometrically correct (as expressed through the words of Professor Keating in Dead Poets Society). The second was the application of the mechanical model of variance to qualitative descriptions.

Standardization

By conceptualizing the variables of a data set as three concentric circles, the core can be reached by mathematically stripping away the layers. "The outermost layer is characterized by the arithmetic mean, associated with the obtained scores. Once known, the mean can be removed from the obtained scores by subtraction. By removing the outermost layer, we obtain the deviation scores. Since the arithmetic average was peeled from the obtained data, the mean of the deviation scores is always zero. The statistics most closely associated with the second layer, the deviation scores, is variance. Once computed, it can be removed by dividing deviation scores by the standard deviation, thus removing the second layer." The remaining core is composed of the standard scores or z-scores. The z-score transformation is especially useful when seeking to compare the relative standings of items from distributions with different means and/or different standard deviations.
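
The peeling of the two layers can be sketched in a few lines. This assumes made-up obtained scores and the population standard deviation (dividing by n); if the text divides by n - 1, the z-scores would differ slightly.

```python
from statistics import mean, pstdev

# Hypothetical obtained scores.
obtained = [60, 70, 80, 90, 100]

# Layer 1: remove the mean by subtraction to get deviation scores.
m = mean(obtained)
deviation = [x - m for x in obtained]

# Layer 2: remove the variance by dividing by the standard deviation.
sd = pstdev(obtained)          # population standard deviation (assumption)
z_scores = [d / sd for d in deviation]

print("deviation scores:", deviation)
print("mean of deviation scores:", mean(deviation))
print("z-scores:", [round(z, 3) for z in z_scores])
```

After both layers are removed, the z-scores have a mean of zero and a standard deviation of one, whatever the original scale was.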

Normal Distribution

One reason the normal distribution is important is that many psychological and educational variables seem to be distributed approximately normally (e.g., measures of reading ability, introversion, job satisfaction, and memory). A second reason the normal distribution is so important is that it is easy for researchers and statisticians to work with. Many kinds of statistical tests can be derived for normal distributions. Fortunately, these tests work very well even if the distribution is only approximately normal. Some tests work well even with very wide deviations from normal. If the mean and standard deviation of a normal distribution are known, it is easy to convert back and forth from raw scores to percentiles.
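
The conversion between raw scores and percentiles can be sketched with NormalDist from Python's standard library. The mean of 100 and standard deviation of 15 are assumed values chosen for illustration, not figures from the text.

```python
from statistics import NormalDist

# Hypothetical test scale: mean 100, standard deviation 15.
dist = NormalDist(mu=100, sigma=15)

# Raw score -> percentile (a score one standard deviation above the mean).
percentile_of_115 = dist.cdf(115) * 100

# Percentile -> raw score (the 50th percentile is the mean).
score_at_50th = dist.inv_cdf(0.50)

print("percentile of 115:", round(percentile_of_115, 1))
print("score at 50th percentile:", score_at_50th)
```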

Normalization

The text explains that "the term moment is used in mechanics as a measure of the force of rotation. The strength of this force depends upon the distance from the point of rotation." Normalization of the distribution is explained through the mathematical equations named: first moment of the mean, second moment of the mean, third moment of the mean, and fourth moment of the mean.
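
The text's equations are not reproduced here, so the following is only a sketch under an assumption: that the four moments are the usual moments about the mean, each an average of the deviation scores raised to the first, second, third, or fourth power. The helper name central_moment and the data are invented for illustration.

```python
def central_moment(data, k):
    """Average of the deviations from the mean raised to the k-th power."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

# Hypothetical scores with a long right tail.
data = [1, 2, 2, 3, 7]

m1 = central_moment(data, 1)   # always zero
m2 = central_moment(data, 2)   # the variance
m3 = central_moment(data, 3)   # sign indicates direction of skew
m4 = central_moment(data, 4)   # related to kurtosis

skewness = m3 / m2 ** 1.5      # standardized third moment
kurtosis = m4 / m2 ** 2        # standardized fourth moment

print("moments:", m1, m2, m3, m4)
print("skewness:", round(skewness, 3), "kurtosis:", round(kurtosis, 3))
```

Under this reading, the third and fourth moments describe how far a distribution departs from the symmetric, bell-shaped form.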

Covariance

Covariance introduces us to the idea of correlations among data. It is an extension of the concept of variation to the case of two variables. This is a useful application of statistical analysis to research in education because it is often the objective of the research to determine if what is being studied is related to other variables within the area being researched. 
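
A minimal sketch of covariance as the average product of paired deviation scores. The variables (hours studied and test scores) are hypothetical, and the divisor n is an assumption; the text may divide by n - 1.

```python
def covariance(x, y):
    """Average product of the paired deviation scores (divides by n)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Hypothetical paired observations for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

cov_hours_scores = covariance(hours, scores)
cov_hours_hours = covariance(hours, hours)   # covariance with itself is the variance

print("covariance:", cov_hours_scores)
print("variance of hours:", cov_hours_hours)
```

The second call shows why covariance is described as an extension of variation to two variables: applied to one variable twice, it reduces to the variance.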

Correlation Analysis

Correlation helped bring quantitative methods to the field of social sciences. Historically, the social sciences used causality as the primary explanatory principle of events, and the methodology used was solely qualitative. Through analysis of correlation data, quantitative association between elements opened the social sciences to a quantitative world. The mathematical formula used is the Pearson Product-Moment Coefficient of Correlation.
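
The Pearson coefficient can be sketched as the covariance of two variables divided by the product of their standard deviations. The function name pearson_r and the data are illustrative choices, not taken from the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation from sums of deviation products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]

r = pearson_r(hours, scores)
r_perfect = pearson_r(hours, hours)
r_inverse = pearson_r(hours, list(reversed(hours)))

print("r:", round(r, 3))
print("perfect positive:", r_perfect, "perfect negative:", r_inverse)
```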

Correlation Interpretations

While the coefficient of covariance has no upper and lower limits, the coefficient of correlation can vary from positive one to negative one. A one, whether positive or negative, represents a perfect relationship, whereas a zero represents an absence of relationship. Interpretation of data should always relate to the circumstances surrounding the data; however, coefficients between .00 and .30 are normally considered negligible, those between .30 and .70 moderate, and those between .70 and 1.00 high. The graphical representation of these numbers (often in a scatter plot) gives an easy-to-interpret picture of the data's relationship.
The Coefficient of Determination
The Coefficient of Alienation

Sampling Theory

When applied, statistics help scientists draw conclusions from observations and experiments. However, the observed relationships or experiments do not guarantee that the events are truly related in a causal sense. By manipulating only one factor in an experiment (while all others are held constant), the researcher can infer some sense of causality by noting which change of the factor caused the altered or desired result. On their own, however, these methods yield results that are far from certain. Statistical inference adds the desired confidence.

The sampling distribution of means relies on defining the representative population from which the experiments will be conducted. After the population has been defined and the mean figured, the distribution of the sampled means approximates the normal distribution.

The sampling distribution of means serves as a standard by which researchers can judge a sample mean. The researcher must first identify the relative position of a sample mean in the theoretical sampling distribution of means. To form a z ratio, divide the difference between a random sample mean and the mean of the population by the standard error of the mean.
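
The z ratio can be sketched as follows, assuming the population standard deviation is known so that the standard error of the mean is that value divided by the square root of the sample size. All numbers and the function name z_ratio are hypothetical.

```python
from math import sqrt

def z_ratio(sample_mean, population_mean, population_sd, n):
    """Position of a sample mean in the sampling distribution of means."""
    standard_error = population_sd / sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Hypothetical case: population mean 100, SD 15, sample of 36 with mean 105.
z = z_ratio(sample_mean=105, population_mean=100, population_sd=15, n=36)

print("z ratio:", z)
```

Here the standard error is 2.5, so a sample mean five points above the population mean sits two standard errors out.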

One-sample z Test
Null Hypothesis
Significance Level

The z Square Ratio

Estimation of Statistical Significance

Leonhard Euler established the foundation for the theory of higher transcendental functions by introducing the beta and gamma transcendental functions. The t test soon replaced the z test. The advantage is that the t square ratio directly provides information about the effect size. Also, knowledge of the t square ratio can be easily transferred to learning the properties of the F ratio. The normal distribution and the t distribution seem to be nearly identical in shape. The t-test is an analogue of the z-test where the degrees of freedom replace the n and the t-distribution replaces the normal distribution. This is represented by the t ratio: the difference between the sample mean and the population mean, divided by the estimated standard error of the mean.
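
A minimal sketch of a one-sample t ratio, assuming the usual form in which the standard error is estimated from the sample standard deviation computed with n - 1 degrees of freedom. The sample values, the population mean of 5, and the function name t_ratio are invented for illustration; the text's own notation is not reproduced.

```python
from math import sqrt
from statistics import mean, stdev

def t_ratio(sample, population_mean):
    """One-sample t ratio and its degrees of freedom."""
    n = len(sample)
    estimated_se = stdev(sample) / sqrt(n)   # stdev divides by n - 1
    t = (mean(sample) - population_mean) / estimated_se
    return t, n - 1

# Hypothetical sample compared with a hypothetical population mean of 5.
sample = [4, 6, 8, 10, 12]
t, df = t_ratio(sample, population_mean=5)

print("t:", round(t, 4), "df:", df)
print("t squared:", round(t * t, 4))
```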

Single Classification Analysis of Variance

Partitioning of variance into its components is a central concept of statistical data analysis. The single classification analysis tests the differences among two or more independent samples. This method is flexible because the variance components can be computed for the obtained and deviation scores, or for the standard scores. Aside from the obtained scores, deviation scores, and standard scores frameworks, there are two additional frameworks where data can be partitioned into extended variance components. An example of single classification analysis would be a scientific experiment that involves two groups of subjects, randomly selected and divided into a control and an experimental group. It would be understood that the subjects have no relationship to each other and that different subjects are used for the different conditions of the experiment.
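
The partitioning can be sketched for the control and experimental example, assuming the standard split of the total sum of squares into between-group and within-group parts. The scores and the function name one_way_anova are made up; this works on obtained scores only and is not the text's worksheet.

```python
def one_way_anova(groups):
    """Partition the sum of squares and form the F ratio for independent groups."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f

# Hypothetical scores for two independent groups.
control = [3, 4, 5]
experimental = [6, 7, 8]

ss_between, ss_within, f_ratio = one_way_anova([control, experimental])
print("SS between:", ss_between, "SS within:", ss_within, "F:", f_ratio)
```

The two components add up to the total sum of squares around the grand mean, which is the sense in which the variance is partitioned.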

Double Classification Analysis of Variance

Double classification analysis differs from single classification in that it tests the differences among two or more related (or dependent) samples. An example might include an experiment that involves two different types of experimental treatments.

In the context of the worksheet method, double classification ANOVA means that the classification takes place in both rows and columns. Each row represents one subject, and the same subject participates in all the experimental conditions. Thus, the column variable is called the repeated measures factor, with two or more levels (columns), and the double classification ANOVA corresponds to a one-way repeated measures ANOVA.
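
The row-and-column classification can be sketched as below, assuming the usual one-way repeated measures partition into subjects (rows), conditions (columns), and residual. The table of scores and the function name are hypothetical, and this is not a reproduction of the worksheet method itself.

```python
def repeated_measures_anova(table):
    """Rows are subjects, columns are conditions; returns SS parts and F."""
    n = len(table)        # number of subjects (rows)
    k = len(table[0])     # number of conditions (columns)
    grand_mean = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_subjects = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_conditions = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_residual = ss_total - ss_subjects - ss_conditions

    f = (ss_conditions / (k - 1)) / (ss_residual / ((n - 1) * (k - 1)))
    return ss_subjects, ss_conditions, ss_residual, f

# Hypothetical scores: three subjects, each measured under three conditions.
table = [
    [1, 2, 3],
    [2, 4, 6],
    [3, 6, 9],
]

ss_subjects, ss_conditions, ss_residual, f_ratio = repeated_measures_anova(table)
print("SS subjects:", ss_subjects)
print("SS conditions:", ss_conditions)
print("SS residual:", ss_residual)
print("F:", f_ratio)
```

Because each subject appears in every column, the variation between subjects is removed before the conditions are tested against the residual.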

The Phi Correlation and the Chi Square Ratio

When both variables to be correlated are binary, it is appropriate to use the Phi coefficient. The visual representation consists of only four points on a scatter plot. However, relating the Phi coefficient to the Chi Square Test offers sound interpretable data. A two-way contingency table analysis can be conducted to evaluate any relation between variables. The chi square can be expressed using the square of the phi coefficient as chi square = N × phi², where N is the total number of observations.
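
The link between the two statistics can be checked numerically. This sketch assumes a made-up 2x2 table of counts and the chi square computed without a continuity correction; the row and column labels are purely illustrative.

```python
from math import sqrt

# Hypothetical 2x2 contingency table of counts:
#               passed   failed
# used tutor      30       10
# no tutor        10       30
a, b, c, d = 30, 10, 10, 30
n = a + b + c + d

# Phi coefficient from the cell counts and the marginal totals.
phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Chi square from observed and expected counts (no continuity correction).
observed = [a, b, c, d]
expected = [
    (a + b) * (a + c) / n,
    (a + b) * (b + d) / n,
    (c + d) * (a + c) / n,
    (c + d) * (b + d) / n,
]
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("phi:", phi)
print("chi square:", chi_square)
print("N times phi squared:", n * phi ** 2)
```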

   

Introduction to Multiple Regression Analysis

Multiple regression is used to measure the relationship between one interval dependent variable and several independent variables, for example, how X1, X2, . . . Xn relate to Y. The independent variables can predict the dependent variable, but the dependent variable cannot be used to predict the independent variables, which makes this a one-way analysis. The independent variables selected should have strong correlations with the dependent variable but only weak correlations with the other independent variables. Also, each independent variable is assumed to have the same relationship with the dependent variable at each value of the other independent variables.

Multiple regression is used when the researcher wants to determine what variables contribute to the explanation of the dependent variable and to what degree. In educational research, multiple regression allows the researcher to ask (and hopefully answer) the general question "what is the best predictor of ...". For example, a researcher might want to look at the best predictors for achievement in a course or grade level.


The fundamental equation of the multiple regression analysis is:

The fundamental equation of regression analysis contains two distinct operations. The first operation is the postmultiplication of the transpose of cross-correlations by the inverse of inter-correlations, resulting in the matrix of beta weights B:



The second operation is the premultiplication of the cross-correlations, by the beta weights, resulting in the coefficient of multiple determination
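
The two operations can be sketched for the simplest case of two predictors, where the inverse of the 2x2 inter-correlation matrix can be written out by hand. The correlation values and every name in the code (r_1y, r_2y, r_12, beta_weights) are assumptions for illustration; the text's own symbols are not reproduced here.

```python
def beta_weights(r_1y, r_2y, r_12):
    """Cross-correlations times the inverse of the 2x2 inter-correlation matrix."""
    determinant = 1 - r_12 ** 2
    # Inverse of [[1, r_12], [r_12, 1]] is [[1, -r_12], [-r_12, 1]] / determinant.
    b1 = (r_1y - r_12 * r_2y) / determinant
    b2 = (r_2y - r_12 * r_1y) / determinant
    return b1, b2

# Hypothetical correlations: each predictor with Y, and the predictors with each other.
r_1y, r_2y, r_12 = 0.6, 0.5, 0.2

# First operation: the beta weights.
b1, b2 = beta_weights(r_1y, r_2y, r_12)

# Second operation: beta weights times the cross-correlations.
r_squared = b1 * r_1y + b2 * r_2y

print("beta weights:", round(b1, 4), round(b2, 4))
print("coefficient of multiple determination:", round(r_squared, 4))
```

With more than two predictors the same two steps apply, but the inverse would normally be computed with a matrix library rather than by hand.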



Relative Importance of Predictors

Variance components are used for experimental designs containing random effects in order to estimate the variance that can be attributed to those effects. For example, if a researcher were interested in the effect of technology integration at different schools, she could select a sample of schools to estimate the amount of variance that is attributed to differences between schools.

Orthogonal designs are ones in which the independent or predictor variables are uncorrelated in the sample. They allow for an additive partitioning of the sums of squares accounted for by a model into unique portions associated with each predictor variable. For this reason, and for reasons of precision in statistical estimation, they are generally preferred when they can be used. Alternatively, the method of principal components analysis makes the correlations between predictor variables equal to zero.

ANOVA Using Multiple Regression Analysis

ANOVA  is used to uncover the main and interaction effects of categorical independent variables (factors) on an interval dependent variable. The key statistic in ANOVA is the F-test of difference of group means, testing if the means of the groups formed by values of the independent variable (or combinations of values for multiple independent variables) are different enough not to have occurred by chance. If the group means do not differ significantly then it is inferred that the independent variables did not have an effect on the dependent variable. If the F test shows that overall the independent variables are related to the dependent variable, then multiple comparison tests of significance can be used to explore which value groups of the independent variables have the most to do with the relationship.
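
The connection between ANOVA and regression can be sketched for the simplest case of two groups, where a single dummy-coded predictor (0 or 1) stands in for group membership. This is an illustration under that assumption, with invented scores and an invented function name, and is not the analysis from the readings.

```python
def f_from_dummy_regression(group_0, group_1):
    """F ratio for two groups, obtained from R squared of a dummy-coded regression."""
    y = list(group_0) + list(group_1)
    x = [0] * len(group_0) + [1] * len(group_1)   # dummy code for group membership
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n

    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r_squared = sxy ** 2 / (sxx * syy)

    df_model = 1          # one dummy variable for two groups
    df_error = n - 2
    f = (r_squared / df_model) / ((1 - r_squared) / df_error)
    return r_squared, f

# Hypothetical scores for a control and an experimental group.
control = [3, 4, 5]
experimental = [6, 7, 8]

r_squared, f_ratio = f_from_dummy_regression(control, experimental)
print("R squared:", round(r_squared, 4))
print("F:", round(f_ratio, 4))
```

The R squared here equals the between-groups share of the total sum of squares, so the F ratio matches what a single classification analysis of the same scores would give.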

In the case presented in the readings, the F test showed no significant differentiation of nonsense syllables remembered (17%), thus indicating that the experimental drug did not improve memory.

 

 

© 2001 Mia Kim Williams