Is completely specified by two numbers, mean μ and standard deviation σ.
Back
median (M)
Front
midpoint of a distribution; typical value; in a skewed distribution, the mean is usually farther out in the long tail than the median
Back
distribution
Front
describes what values the variable takes and how often it takes them
Back
Correlation
Front
Measures the direction and strength of the linear relationship between two quantitative variables.
Back
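A minimal sketch of how r can be computed, assuming the usual sample formula (the sum of products of standardized values, divided by n - 1). The function name `correlation` and the data are illustrative assumptions, not part of the deck.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # r is the sum of the products of the z-scores, divided by n - 1.
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: points lying exactly on a rising line give r close to 1.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

r always falls between -1 and 1; its sign gives the direction and its size gives the strength of the linear relationship.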
IQR
Front
measures the range of the middle 50% of the data; IQR=Q3-Q1; resistant
Back
Negative association
Front
Above-average values of one variable tend to accompany below-average values of the other, and vice versa.
Back
describing a distribution of quantitative data
Front
SOCS (Shape-Outlier-Center-Spread)
Back
Density curve
Front
A curve that (a) is always on or above the horizontal axis, and (b) has an area of exactly 1 underneath it.
Back
outlier
Front
individual value that falls outside the overall pattern; it is an outlier if it is more than 1.5 x IQR above the third quartile or below the first quartile
Back
five-number summary; summary of spread and center
Front
Minimum, Q1, M, Q3, Maximum
Back
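The five-number summary, the IQR, and the 1.5 x IQR outlier rule can be sketched together. This assumes the common textbook convention that Q1 and Q3 are the medians of the lower and upper halves of the ordered data (excluding the overall median when n is odd); software packages may use other quartile rules. All function names and the data are illustrative.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    half = n // 2
    lower = data[:half]
    # When n is odd, the overall median is left out of both halves.
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def outliers(values):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

# Hypothetical data with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(five_number_summary(data))  # (1, 3, 5.5, 8, 30)
print(outliers(data))             # [30]
```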
Explanatory variable
Front
A variable that may help explain or influence changes in a response variable.
Back
bimodal
Front
two clear peaks
Back
First Quartile (Q1)
Front
the value one quarter of the way up the ordered list of observations; resistant
Back
skewed to the right
Front
if the right side of the graph with larger values is longer than the left
Back
histogram
Front
plot the counts (frequencies) or percents (relative frequencies) of values in equal-width classes; show distribution of a quantitative variable
Back
dotplot
Front
individual values on a number line; show distribution of a quantitative variable
Back
relative frequency table
Front
the distribution of a categorical variable lists the categories and gives the percent of individuals that fall in each category
Back
Describing a scatterplot
Front
Can be described by the direction, form, and strength of the relationship.
Back
Pth percentile
Front
The value with P percent of the observations less than it.
Back
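A small sketch of this definition: the percentile of a value is the percent of observations that are strictly less than it. The function name `percentile_of` and the scores are illustrative assumptions.

```python
def percentile_of(value, data):
    """Percent of observations in data that are less than value."""
    if not data:
        raise ValueError("data must not be empty")
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

# Hypothetical test scores: 2 of the 5 scores are below 80.
scores = [60, 70, 80, 90, 100]
print(percentile_of(80, scores))  # 40.0
```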
Third Quartile (Q3)
Front
the value three-quarters of the way up the ordered list of observations; resistant
Back
pie charts, bar graphs
Front
display the distribution of a categorical variable
Back
Least-squares regression line
Front
Line that makes the sum of the squared vertical distances of the data points from the line as small as possible.
Back
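As a sketch, the slope and intercept of this line can be computed from the standard closed-form formulas: slope = sum of (x - x̄)(y - ȳ) divided by sum of (x - x̄)², and intercept = ȳ - slope · x̄. The function name and data below are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # The least-squares line always passes through the point (x̄, ȳ).
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data; here the slope is about 1.6 and the intercept about 0.5.
intercept, slope = least_squares_line([1, 2, 3, 4], [2, 4, 5, 7])
print(intercept, slope)
```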
frequency table
Front
distribution of a categorical variable lists the categories and gives the count of individuals that fall in each category
Back
stemplot
Front
separate each observation into a stem and a one-digit leaf; show distribution of a quantitative variable
Back
two-way table
Front
organizes data about two categorical variables; often used to summarize the large amounts of information by grouping outcomes into categories
Back
statistical question
Front
a question that can be answered by collecting data and where there will be variability in that data
Back
x-bar
Front
the mean of a set of observations/sample (add their values and divide by the number of observations), use for reasonably symmetric distributions
Back
Influential Point
Front
An observation that, if removed, would markedly change the result of the calculation.
Back
range
Front
subtract the smallest value from largest value
Back
Standardized values (z-scores)
Front
Tells how many standard deviations a data point is from mean
Back
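A minimal sketch of standardizing a value, z = (value - mean) / standard deviation. The function name and the numbers are illustrative assumptions.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

# Hypothetical example: a score of 85 when the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))  # 1.5
```

A positive z-score means the value is above the mean; a negative z-score means it is below.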
multimodal
Front
multiple peaks
Back
Standard Normal distribution
Front
Has mean 0 and standard deviation 1
Back
mean
Front
arithmetic average; a measure of center that is NOT RESISTANT to outliers or skewness
Back
standard deviation (s sub-x)
Front
measures the typical distance of the observations from their mean; measures spread about the mean; always greater than or equal to 0; not resistant; use for reasonably symmetric distributions
Back
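A sketch of the sample standard deviation, assuming the usual formula that divides the sum of squared deviations by n - 1 before taking the square root. The function name and data are illustrative.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation s_x (divides by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(values) / n
    # Sum of squared deviations from the mean, averaged over n - 1.
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    return sqrt(variance)

# Hypothetical data with mean 5; the squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data))
```

Because every deviation is squared, a single extreme observation can inflate the result sharply, which is why the standard deviation is not resistant.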
Cumulative relative frequency graph
Front
Used to examine location within a distribution. Completed graph shows the accumulating percent of observations
Back
Median of a density curve
Front
Equal areas point
Back
unimodal
Front
single peak
Back
Mean of a density curve
Front
Balance point
Back
boxplot
Front
based on 5 number summary, useful for comparing distributions, shows spread of central half of distribution
Back
marginal distributions
Front
the row totals and column totals
Back
symmetric
Front
the right and left sides of the graph are approximately mirror images of each other
Back
conditional distributions
Front
describes the values of that variable among individuals who have a specific value of another variable. Can be displayed with a SIDE-BY-SIDE BAR GRAPH or a SEGMENTED BAR GRAPH
Back
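Marginal and conditional distributions can be sketched from a small two-way table of counts. The table, its labels, and the function names are hypothetical and used only for illustration.

```python
# Hypothetical two-way table of counts: rows are grade levels,
# columns are answers to a yes/no survey question.
table = {
    "grade 9": {"yes": 30, "no": 20},
    "grade 10": {"yes": 10, "no": 40},
}

def marginal_row_totals(table):
    """Row totals: the marginal distribution of the row variable (counts)."""
    return {row: sum(cols.values()) for row, cols in table.items()}

def marginal_column_totals(table):
    """Column totals: the marginal distribution of the column variable (counts)."""
    totals = {}
    for cols in table.values():
        for col, count in cols.items():
            totals[col] = totals.get(col, 0) + count
    return totals

def conditional_distribution(table, row):
    """Proportions of the column variable among individuals in one row."""
    total = sum(table[row].values())
    return {col: count / total for col, count in table[row].items()}

print(marginal_row_totals(table))                  # {'grade 9': 50, 'grade 10': 50}
print(marginal_column_totals(table))               # {'yes': 40, 'no': 60}
print(conditional_distribution(table, "grade 9"))  # {'yes': 0.6, 'no': 0.4}
```

Comparing the conditional distributions across rows (here 60% yes versus 20% yes) is how an association between the two variables shows up.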
mean
Front
the average
Back
association
Front
specific values of one variable tend to occur in common with specific values of the other
Back
mode; modes
Front
most frequent; major peaks
Back
Outlier
Front
An observation that lies outside the overall pattern of the other observations.
Back
numerical summary
Front
should report at least its center and spread, or variability
Back
μ (mu)
Front
a population mean
Back
The 68-95-99.7 Rule
Front
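In a Normal distribution, about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3. Also known as the "Empirical Rule."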
Back
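The percentages in the rule's name can be checked numerically. This sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the helper name `area_within` is an illustrative assumption.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the Normal curve within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Prints roughly 0.6827, 0.9545, and 0.9973.
    print(k, round(area_within(k), 4))
```

The result does not depend on the particular mean and standard deviation, which is why the rule applies to every Normal distribution.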
Extrapolation
Front
The use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line.
Back
Section 2
(48 cards)
Lurking variable
Front
A variable that has an important effect on the relationship among the variables in a study but is not one of the explanatory variables studied.
Back
Inference about cause and effect
Front
Using experimental results to draw conclusions about causality
Back
Control group
Front
In an experiment, the group that is administered a placebo (an inactive treatment) or no treatment; its results are compared to those of the treatment group
Back
Slope
Front
The amount by which y is predicted to change when x increases by one unit.
Back
Factor
Front
An explanatory variable in an experiment
Back
Population
Front
The entire aggregation of individuals from which samples can be drawn
Back
Experimental units
Front
the smallest collection of individuals to which treatments are applied
Back
Bias
Front
Occurs when a study design favors some outcomes over others
Back
Regression line
Front
A line that describes how a response variable y changes as an explanatory variable x changes; also known as "line of best fit"
Back
Confidentiality
Front
Any information gathered about a participant must not be revealed without the participant's consent.
Back
Experiment
Front
Deliberately imposes some treatment on individuals to measure their responses. Causality can be inferred if carried out well.
Back
y intercept
Front
The predicted value of y when x = 0.
Back
Voluntary Response Samples
Front
A sample that consists of people who choose themselves by responding. Such samples often overrepresent people with strong opinions, which causes BIAS.
Back
Replication
Front
Enough units in each group so that any difference in the effects of the treatments can be distinguished from chance differences between the groups. Reduces sample variability
Back
Convenience Sample
Front
A sample which consists of members of a population that are easily accessed. Generally leads to bias.
Back
Sample
Front
A relatively small proportion of people who are chosen in a survey so as to be representative of the whole.
Back
Single Blind
Front
a study in which the participants are unaware of whether they are in the control group or the experimental group
Back
Residual plot
Front
A scatterplot of the residuals against the explanatory variable; helps us assess how well a regression line fits the data.
Back
Response variable
Front
A variable that measures an outcome of a study.
Back
Placebo effect
Front
Experimental results are caused by expectations alone; double blindness is intended to mitigate this effect.
Back
Response Bias
Front
Bias that occurs when the behavior of the respondent or of the interviewer causes inaccurate results
Back
Explanatory Variable
Front
a variable that we think explains or causes changes in the response variable
Back
Positive association
Front
Above-average values of one variable tend to accompany above-average values of the other, and below-average values also tend to occur together.
Back
Undercoverage
Front
Occurs when some groups in the population are left out of the process of choosing the sample
Back
Completely Randomized Design
Front
All experimental units have an equal chance of receiving each of the treatments
Back
Sampling Frame
Front
A list of individuals from whom the sample is drawn
Back
Stratified random sample
Front
A method of sampling that involves dividing your population into homogeneous subgroups and taking a simple random sample in each subgroup. Internally homogeneous and externally heterogeneous.
Back
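A minimal sketch of stratified random sampling: take a separate simple random sample from each stratum. The function name, the `seed` parameter, and the example strata are illustrative assumptions.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum from each stratum.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)  # seed is only for reproducible illustration
    sample = {}
    for name, members in strata.items():
        # rng.sample draws without replacement.
        sample[name] = rng.sample(list(members), n_per_stratum)
    return sample

# Hypothetical population split into two strata by class year.
strata = {
    "juniors": ["J1", "J2", "J3", "J4", "J5"],
    "seniors": ["S1", "S2", "S3", "S4"],
}
print(stratified_sample(strata, 2, seed=1))
```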
Random sampling
Front
A sampling method in which each member of the population has an equal chance of inclusion, so that the sample fairly represents the population
Back
Sampling error
Front
An error that occurs when a sample somehow does not represent the target population due to bad sampling methods and/or undercoverage
Back
Anonymity
Front
Even the researcher cannot link participants to their data
Back
Control
Front
In an experiment, the standard that is used for comparison. Reduces the effects of lurking variables!
Back
Randomized block design
Front
Form blocks consisting of individuals that are similar in some way that is important to the response. Random assignment of treatments is then carried out separately within each block.
Back
Census
Front
A study that attempts to collect data from every individual in the population.
Back
Double Blind
Front
This term describes an experiment in which neither the subjects nor the experimenter knows whether a subject is a member of the experimental group or the control group.
Back
Elements of Experimental Design
Front
CONTROL, RANDOM ASSIGNMENT, AND REPLICATION
Back
Nonresponse
Front
When the subjects refuse to cooperate or cannot be reached. This leads to nonresponse bias, a type of nonsampling error.
Back
Cluster Sample
Front
A sample in which a simple random sample of heterogeneous subgroups (clusters) of a population is selected, and all individuals in the chosen clusters are included. Internally heterogeneous and externally homogeneous.
Back
Predicted value
Front
The value predicted by the regression model; read as "y hat"
Back
Statistically significant
Front
Referring to a correlation, or a difference between two groups, that is larger than would be expected by chance alone.
Back
Scatterplot
Front
Plot that shows the relationship between two quantitative variables measured on the same individuals.
Back
Inference about the population
Front
Using sample data to draw conclusions about the population
Back
Block
Front
A group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.
Back
Observational study
Front
A study that merely observes conditions of individuals in a population and records information; the population is disturbed as little as possible.
Back
Matched Pair
Front
The most extreme form of blocking. Subjects are matched in pairs as closely as possible and each subject in a pair is randomly assigned to receive one of the treatments.
Back
Residual
Front
The difference between an observed value of the response variable and the value predicted by the regression line.
Back
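A small sketch of the calculation, residual = observed y - predicted y, for a line already written as ŷ = intercept + slope · x. The function name, the line, and the data are illustrative assumptions.

```python
def residuals(xs, ys, intercept, slope):
    """Observed y minus predicted y for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical line y-hat = 1 + 2x, which predicts 3, 5, and 7 for x = 1, 2, 3.
print(residuals([1, 2, 3], [3, 5, 8], intercept=1, slope=2))  # [0, 0, 1]
```

A positive residual means the point lies above the line; a negative residual means it lies below.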
Confounding
Front
The effect of some variable on the response variable cannot be separated from the effect of the explanatory variable.
Back
Simple Random Sample (SRS)
Front
A sample of size n selected from the population in such a way that each possible sample of size n has an equal chance of being selected.
Back
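A minimal sketch of drawing an SRS with Python's standard `random` module. `random.sample` draws without replacement, so every possible group of n individuals is equally likely. The function name, the `seed` parameter, and the population are illustrative assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw a simple random sample of size n, without replacement."""
    rng = random.Random(seed)  # seed is only for reproducible illustration
    return rng.sample(list(population), n)

# Hypothetical population of 20 numbered individuals.
population = range(1, 21)
print(simple_random_sample(population, 5, seed=42))
```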
Random Assignment
Front
Assigning participants to experimental and control conditions by chance, thus minimizing the effects of preexisting differences among those assigned to the different groups.
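One way to sketch random assignment is to shuffle the participants and deal them out to the groups in turn, which gives groups of equal (or nearly equal) size. The function name, the `seed` parameter, and the participant labels are illustrative assumptions.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)  # seed is only for reproducible illustration
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical participants assigned to two conditions.
participants = ["P1", "P2", "P3", "P4", "P5", "P6"]
print(randomly_assign(participants, ["treatment", "control"], seed=7))
```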