On GitHub: mclevey/research-methods
Dr. John McLevey
University of Waterloo
john.mclevey@uwaterloo.ca
Fall 2016, Knowledge Integration, University of Waterloo
methods.f2016.slack.com
Lecture Slides (Updated Continuously)
Methods
Design
Research Proposal (Oct. 13, 10%)
Presentation of Research Proposal (Oct. 13, 5%)
Empirical Research Papers (Ongoing, 30%)
Presentation of 1 Rule of Social Research (Dec. 7, 40%)
10 Comprehension Quizzes (Ongoing, 5%)
Engagement / Participation (Ongoing, 10%)
You will write a 2,500 word research proposal (don't waste words!) that presents the initial idea for your final empirical research paper. You may use any of the research methods introduced in this class.
Each group will give a formal 10-15 minute presentation of their research proposal to the class.
The main deliverable in this course is an empirical research paper. It should build directly on the research proposal submitted earlier in the course, but it is OK to deviate slightly as the project evolves. I expect the final papers to be about 7,000 words, and no more than 7,500 words under any circumstances.
I will drop your lowest grade.
integ120.f2016.slack.com
Laptops may be used in the classroom on the honors system. If I see Facebook, email, an IM client other than #slack, a newspaper story, a blog, or any other content not related to the class, I will remove 1 point from your participation grade on the spot. No exceptions.
You will need a laptop for all classes marked as "computing" in the syllabus. You will need a laptop, a tablet, or a phone for all classes where a quiz is scheduled.
R & NVivo
Please complete the "Get to know people and help John make this a good class" survey, which is active on LEARN until the coming Tuesday. This is to help us get to know one another, and to help me make this a better course.
I will also complete the survey and you can see my responses. I will discuss the data in class but I will not disclose names. I would like to make parts of your survey available to the rest of the class so that they can get to know you better, but I will only do so with your permission.
Babbie and Benaquisto pages 4-21, 31-33, 41-57
Babbie and Benaquisto Chapters 1 and 2
Selections from Babbie and Benaquisto: "Human Inquiry and Science" & "Paradigms, Theory, and Research"
Looking for Reality
The Foundations of Social Science
Some Dialectics of Social Research
Paradigms & Two Logical Systems
Deductive Theory Construction
Inductive Theory Construction
Linking Theory & Research
(Photo Credit Luke Peterson) flickr
Errors in Inquiry and some solutions
Logic & Observation
(Photo Credit James Cridland) flickr
A model or framework for observing and understanding, which shapes both what we see and how we see it. Paradigms are not "right" or "wrong," but theories can be.
Babbie and Benaquisto Ch. 2
(Photo Credit Yoppy) flickr
Deductive and Inductive
Babbie and Benaquisto Fig. 2-2 (p. 45)
Babbie and Benaquisto Fig. 2-4 (p. 50)
Reproduced from Walter Wallace (2012)
Inductive & Deductive Approaches
1. Specify the topic.
2. Specify who / what the theory will apply to.
3. Specify the major concepts and variables.
4. Find out what is known (propositions) about the relationships between those variables.
5. Reason logically from the propositions to your specific topic.
(Photo Credit Sunshinecity) flickr
...
(Photo Credit Sunshinecity) flickr
Firebaugh Chapter 1
Samantha Afonso!
Are you here?
There should be the possibility of surprise.
If you already know the answer, why do the research?
Ideally not this kind of surprise.
What are some examples of advocacy research? What makes advocacy research different from social research?
This is not surprising. Who cares?
What can you do?
(Photo Credit Evan Blaser) Flickr
"You don't need to eat the whole ox to know that it is tough."
very large populations do not require larger sample sizes than smaller populations do
we can make confident generalizations about a large population from a sample containing only a tiny fraction of that population
How cases are selected is more important than how many cases are selected.
Larger samples tend to be better than smaller samples, but that is because they are more likely to be representative. Size is much less important than representativeness.
A representative sample with just 100 people can be remarkably accurate. A non-representative (i.e. biased) sample of a million people can be remarkably inaccurate.
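A quick simulation makes the point concrete. This is only a sketch with made-up numbers: a small random sample recovers the population proportion well, while a much larger "convenience" sample drawn from one unrepresentative corner of the population misses badly.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical population of 100,000 people; 30% hold some attribute,
# and they are all concentrated at the "front" of the list
# (think: one neighbourhood, one street corner).
population = [1] * 30_000 + [0] * 70_000

# Representative sample: only 100 people, chosen at random.
random_sample = random.sample(population, 100)

# Biased sample: 20,000 people, but all from the front of the list.
convenience_sample = population[:20_000]

print(mean(random_sample))       # close to the true 0.30
print(mean(convenience_sample))  # exactly 1.0 -- wildly inaccurate
```

Two hundred times more cases, and a far worse estimate: how cases are selected matters more than how many.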
Select a sample that permits powerful contrasts for the effects of interest.
If you use stratified random sampling (more on this later), stratify on the explanatory variable. You can't explain a variable with a constant! This is especially important for small samples.
The same principles discussed above apply. Given smaller sample sizes, qualitative researchers must decide which comparisons are strategic and sample accordingly.
(Note: more on sampling in qualitative research when we get to the class on sampling.)
"In its most extreme version, empirical nihilism in the social sciences denies the possibility of discovering even regularities in human behavior. That position is obviously silly. Consider the life insurance industry..."
"The best response to empirical nihilism is to ignore it and the research."
Is Thinking Statistically in the bookstore yet?
Babbie and Benaquisto Chapter 4
Typically, all three are present.
e.g. What factors make attending university more or less likely for some group of people?
Isolate a few factors that provide a partial explanation across many cases. You want the greatest amount of explanation with the smallest number of variables.
When one variable changes, so does the other.
... There are few perfect correlations.
Spurious correlations: http://www.tylervigen.com/spurious-correlations
Not always clear in cross-sectional studies, but logic helps.
Possibility of explanation by a third variable?
E.g. individuals, groups, organizations, households, artifacts, etc.
You can't use an analysis of an ecological unit (e.g. ridings) to draw conclusions about individuals within those units (e.g. voters). Doing so is known as the ecological fallacy.
Sometimes you are limited by the data you have, in which case logic is your friend.
Babbie and Benaquisto (pp. )
Adapted from Joseph Leoni
Panel studies are the gold standard, but they do have unique problems. The most important is panel attrition. What happens if people drop out of the study? What if there is an underlying pattern in who drops out?
Babbie and Benaquisto Fig. 4-4 (p. 112)
Babbie and Benaquisto Ch. 5
"The Two Recruits: A day in the life of an economist and a sociologist at Statistics Canada"
Babbie and Benaquisto Chapter 5
How would you measure:
"Conceptions summarize collections of seemingly related observations and experiences."
"Concepts are constructs derived by mutual agreement from mental images (conceptions)" In this sense, concepts aren't "real." But they can still be measured.
"Conceptualization produces a specific agreed-upon meaning for a concept for the purpose of research. This process of specifying exact meaning involves describing the indicators we'll be using to measure our concept and the different aspects of the concept, called dimensions."
Conceptualize "love." Yes, I'm serious.
Rubin, Zick. 1970. "Measurement of Romantic Love." Journal of Personality and Social Psychology, 16:265-273.
Observations that we consider reflections of a concept we wish to study. In other words, indicators signal the presence or absence of the concept we are interested in.
Aspects of a concept. Religiosity, for example, might be specified in terms of a ritual dimension, a belief dimension, etc. Compassion might have dimensions of compassion for humans and compassion for animals, etc.
We might develop a list of 100 indicators for compassion and its various dimensions. We could then study all of them, or some subset of them. If all of the indicators represent, to some degree, the same concept, then they will behave the way the concept would behave if it were real and could be observed.
A nominal definition is one that is simply assigned to a term without any claim that the definition represents a "real" entity.
An operational definition specifies precisely how a concept will be measured.
Operationalization is the development of specific research procedures that will result in empirical observations representing those concepts in the real world.
To what extent are we willing to combine attributes into fairly gross categories? For example: income, age.
For research on attitudes and orientations, do you need to collect data on the full spectrum, or just part of it?
To what degree is the operationalization of variables precise?
"If you are not sure how much detail to pursue in a measure, get too much rather than too little. You can always combine precise measures into more general categories, but you cannot create more specific measures from general categories."
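The advice above is easy to demonstrate: exact ages can always be collapsed into grosser categories later, but categories cannot be turned back into exact ages. A quick sketch (the bracket boundaries are arbitrary, invented for illustration):

```python
# Precise measures: exact ages collected in a survey.
ages = [19, 23, 31, 34, 47, 52, 68]

def bracket(age):
    """Collapse an exact age into a grosser category."""
    if age < 25:
        return "18-24"
    elif age < 45:
        return "25-44"
    elif age < 65:
        return "45-64"
    return "65+"

print([bracket(a) for a in ages])
# From the brackets alone, there is no way back to 19, 23, 31, ...
```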
The attributes composing every variable must be:
Nominal: attributes have only the characteristic of being jointly exhaustive and mutually exclusive.
Ordinal: attributes can be rank-ordered on some dimension (e.g. high, medium, and low socioeconomic status).
Interval: attributes are rank-ordered and have equal distances between adjacent attributes.
Ratio: attributes have all the qualities of nominal, ordinal, and interval measures, and are based on a "true zero" point (e.g. age, income).
The level of measurement you use is determined by your research goals, and the inherent limitations of some variables. Generally speaking, try to measure at the highest level you can.
Precise measures are better than imprecise measures. Accurate measures are better than inaccurate measures.
Would a measurement method collect the same data each time in repeated observations of the same phenomenon?
Does a measure accurately reflect the concept it is intended to measure?
Take the same measurement more than once.
Split indicators into two groups and see if they classify people differently. Recall: interchangeability of indicators.
Use indicators that have proven to be reliable in previous research.
Clarity, specificity, training, and practice can help ensure reliability.
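The split-half check can be sketched in a few lines: score each respondent on two halves of an indicator battery and correlate the two sets of scores. The data below are made up for illustration; a high correlation suggests the two halves classify people similarly.

```python
from statistics import mean, stdev

def pearson_r(a, b):
    """Pearson correlation between two equal-length lists of scores."""
    a_bar, b_bar = mean(a), mean(b)
    cov = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b)) / (len(a) - 1)
    return cov / (stdev(a) * stdev(b))

# Hypothetical scores for 5 respondents on two halves of a 10-item scale.
half_a = [3, 8, 5, 9, 4]
half_b = [4, 9, 5, 10, 4]

r = pearson_r(half_a, half_b)
print(round(r, 3))  # close to 1: the halves rank people the same way
```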
Does it make sense without a lot of explanation?
The degree to which a measure relates to some external criterion.
The degree to which a measure relates to other variables as expected within a system of theoretical relationships.
The degree to which a measure covers the full range of meanings included within a concept.
In most cases, a good researcher should look to both colleagues and research participants as sources of agreement on the most useful meanings and measurements of important concepts. Sometimes one source will be more useful, but neither should be dismissed.
We want our measures to be both reliable and valid, but there is a tension between those two goals. Generally speaking, quantitative, nomothetic, structured techniques tend to be more reliable. Qualitative and idiographic approaches tend to be more valid.
Typically, we need multiple indicators to measure a variable adequately and validly. There are specific techniques for combining multiple indicators into single measures.
Note: We will discuss indexes and scales in more detail later in the course.
A composite measure that summarizes and rank-orders several specific observations to represent some more general dimension.
A composite measure composed of several items that have a logical or empirical structure among them.
There are many dimensions to the concept of gender equality. List at least five different dimensions and suggest how you might measure each. It's OK to use different research techniques for measuring the different dimensions.
Babbie and Benaquisto Ch. 6, "The Logic of Sampling"
There are two general types of samples: non-probability samples and probability samples.
Reliance on available subjects (convenience sampling): e.g. stopping people at a street corner. This is extremely risky because you have no control over the representativeness of the sample. Do not do this!
Purposive (judgmental) sampling: you select people based on your own knowledge of the population. Having a representative sample is not the goal.
Snowball sampling: members of a population are difficult to locate / identify. You find a few people, and then ask them to pass along information to people they know.
Quota sampling: you have a matrix describing key characteristics of the population. You sample people who share the characteristics of each cell in the matrix, trying to assign equal proportions of people who belong to different groups to your sample (e.g., if you know that 10% of all classics majors are female and international, then you select 10 female international students for a sample of 100 classics majors).
It can be hard to get up-to-date information about the characteristics of the population, and there tend to be high rates of sampling bias.
An informant is a member of a group who is willing to share what they know about the group. Informants are not the same as respondents, who are typically answering questions about themselves. Informants are often used in field research.
A probability sample is more likely to be representative of a population than a non-probability sample.
Probability theory enables us to estimate the parameters of a population from a representative sample.
"That quality of a sample of having the same distribution of characteristics as the population from which it was selected. By implication, descriptions and explanations derived from an analysis of the sample may be assumed to represent similar ones in the population. Representativeness is enhanced by probability sampling and provides for generalizability and the use of inferential statistics."
The summary description of a given variable in a population (e.g. mean income, mean age).
The summary description of a variable in a sample, used to estimate a population parameter.
The goal is to define a population, produce a sampling frame, and then sample elements (i.e. people) from the frame in a way that contains essentially the same variation that exists in the population. Random selection enhances the likelihood of achieving this.
Figure from the 2008 American edition of Fundamentals of Social Research.
The possibilities of sampling bias are endless, and not always obvious.
Examples?
Surveying university students about their alcohol consumption. What are some possible sources of sampling bias?
Among other things, random selection ensures that the procedure is not biased by the researcher.
Every element has an equal chance of being sampled independent of any other event in the selection process. EPSEM: Equal Probability of Selection Method.
Typically, random selection is done using computer programs that randomly select elements.
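The course labs use R, but the idea is the same in any language. A minimal sketch in Python with a hypothetical frame of 500 student IDs: `random.sample` gives every element an equal probability of selection without replacement, i.e. an EPSEM draw.

```python
import random

random.seed(1)  # fixed seed so the example is reproducible

# A hypothetical sampling frame of 500 student IDs.
frame = list(range(1, 501))

# An EPSEM draw: every ID has the same chance of selection,
# independent of every other draw.
sample = random.sample(frame, 25)

print(sorted(sample))
```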
Random selection also provides access to probability theory, which we can use to estimate population parameters, and to arrive at a judgment of how likely the estimates are to accurately reflect the actual parameters in the population.
A single sample selected from a population will give an estimate of the population parameter. Other samples would give the same or slightly different estimates. Probability theory tells us about the distribution of estimates that would be produced by a large number of such samples.
Let's look at a simple example.
Figure from the 2008 American edition of Fundamentals of Social Research.
Figure from the 2008 American edition of Fundamentals of Social Research.
There are 10 possible samples of 1, and 45 possible samples of 2 (10 choose 2). For the samples of 2, take every possible pair, compute the mean, and then plot it. We can already see that the estimates are starting to converge around the true mean.
Let's try some slightly larger sample sizes. Remember, we compute and plot the mean for every possible sample.
Sample Size 3: 10 choose 3 = 120 possible unique samples
Sample Size 4: 10 choose 4 = 210 possible unique samples
Sample Size 5: 10 choose 5 = 252 possible unique samples
Sample Size 6: 10 choose 6 = 210 possible unique samples
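Those counts are binomial coefficients, and for a population this small we can brute-force the entire sampling distribution. A sketch with a hypothetical 10-element population of ages: note that the average of all possible sample means is exactly the population mean, whatever the sample size.

```python
from itertools import combinations
from statistics import mean

# A hypothetical population of 10 ages; the true mean is 25.0.
population = [18, 19, 21, 22, 24, 25, 27, 30, 31, 33]

for k in (3, 4, 5, 6):
    # Every possible unique sample of size k, and its mean.
    sample_means = [mean(s) for s in combinations(population, k)]
    print(k, len(sample_means), round(mean(sample_means), 4))
```

Each line prints the sample size, the number of possible samples (120, 210, 252, 210), and the mean of all the sample means, which is always the population mean.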
If you take random samples over and over and over and over again, they will converge on the true value. The larger the random sample, the more accurate it is likely to be.
Your team has been contracted by the University of Waterloo to consult on a brand redesign. You need to survey the population of undergraduate students, graduate students, and professors to determine how they feel about the new university logo.
The variable we are interested in is attitudes towards the new logo. Respondents may either approve or disapprove.
As of 2014, there were 30,600 undergraduate students, 5,300 graduate students, and 1,139 full-time professors in 6 faculties.
Let's randomly sample 600.
More on how we could do this properly below!
There could be between 0 and 100% approval for the new logo. Let's assume that 50% approve and 50% disapprove. (Obviously, the research team doesn't actually know this.)
Imagine taking 3 different samples of substantial size. None is a perfect reflection of the UW community, but each comes close.
We have 3 different sample statistics. If we kept sampling, we would continue to get different estimates of the percentage of people in the UW community that approve of the new logo. Again, they would converge on the true value. As we continue to sample and plot, we find that some estimates overlap. We begin to see a normal curve.
Obviously, in real research we only collect one sample. Knowing what it would be like to select thousands of samples allows us to make assumptions about the one sample we do select and study.
If many independent random samples are selected from a population, the sample statistics provided by those samples will be distributed around the population parameter in a known way. We can see that most of the estimates fall close to 50%.
We can also use a formula to estimate how closely the sample statistics are clustered around the true value. The formula to estimate sampling error is:
$$ s = \sqrt{\frac{P \times Q}{n}} $$Where $s$ is the standard error. $P$ and $Q$ are the population parameters for the binomial ($P$ is approval and $Q$ is disapproval), and $n$ is the number of cases in each sample.
The standard error can tell us how the sample estimates are clustered around the population parameter. Because the standard error, in this case, is the standard deviation of the sampling distribution, we can determine confidence levels and confidence intervals.
"Whereas probability theory specifies that 68 percent of that fictitious large number of samples would produce estimates falling within one standard error of the parameter, we can turn the logic around and infer that any single random sample has a 68 percent chance of falling within that range."
"We express the accuracy of our sample statistics in terms of a level of confidence that the statistics fall within a specified interval from the parameter. For example, we may say we are 95% confident that our sample statistics are within plus or minus 5 percentage points of the population parameter. As the confidence interval is expanded for a given statistic, our confidence increases. For example, we may say that we are 99.9% confident that our statistic falls within three standard errors of the true value."
In real research, we don't actually know what the population parameter is. So we use our best guess (i.e. the sample estimate) for the formula.
Probability sampling is messier in reality than in theory.
Break into groups! Each group takes one type of design.
Babbie and Benaquisto Ch. 8, "Survey Research"
I will also include some lecture material from Groves et al. (2009) Survey Methodology. It is not necessary to do extra reading.
Thank you, Perd Hapley. :-)
From Parks and Recreation
What are the signs? What makes a survey bad?
There are two "inferential steps" in survey methodology
1. between the questions you ask and the thing you actually want to measure
2. between the sample of people you talk to and the larger population you care about
Errors are not mistakes; they are deviations / departures from desired outcomes or true values.
Errors of observation are when there are deviations between what you ask and what you actually want to measure.
Errors of non-observation are when the statistics you compute for your sample deviate from the population.
You move from the abstract to the concrete when you design surveys. "Without a good design, good survey statistics rarely result." You need forethought, planning, and careful execution.
Survey Lifecycle from a Design Perspective
Adapted from Groves et al. (2009) Survey Methodology.
Survey Design as a Process
Adapted from Groves et al. (2009) Survey Methodology.
Let's Focus on the Left Side: Measurement
Adapted from Groves et al. (2009) Survey Methodology.
We can represent all of this with nice compact notation. In most cases, capital letters stand for properties of population elements and are used when we are talking about measurement and when sampling the population is not an issue. If we are drawing inferences about a population by using a sample, capital letters are for population elements and lower case are for sample quantities. Subscripts indicate membership in subsets of the population (e.g. $_i$ for the $i$th person).
Recall Class on Conceptualization and Measurement
$\mu_i$ = value of a construct for the $i$th person in the population, $i$ = 1, 2, 3, 4 ... N
$Y_i$ = value of a measurement for the $i$th sample person
$y_i$ = value of the response to the application of the measurement
$y_{ip}$ = value of the response after editing and processing steps
We are trying to measure $\mu_i$ using $Y_i$, which will be imperfect due to measurement error. When we apply the measurement $Y_i$ (e.g. by asking a survey question), we actually obtain $y_i$. This is due to problems with administration. Finally, we try to mitigate these errors by making final edits, resulting in $y_{ip}$.
The measurement equals the true value plus some error term ($\epsilon_i$).
$Y_i = \mu_i + \epsilon_i$
The answers you provide on a survey are inherently variable. Given so many "trials," you might not provide the same answers. In theory, there could be an infinite number of trials! We can use another subscript $_t$ to denote the trial of the measurement. We will still use $_i$ to represent each element of the population (e.g. the person completing the survey).
$Y_{it} = \mu_{i} + \epsilon_{it}$
Q: Have you ever, even once, used any form of cocaine?
Survey respondents tend to under-report behaviors that they perceive as undesirable. Even if the answer is yes, they may answer no.
What if the discrepancy between responses and the true value is systematic?
If response deviations are systematic, then we have response bias, which will cause us to under-estimate or over-estimate population parameters. If they are not systematic, we have response variance, which leads to instability in the value of estimates over trials.
Let's Focus on the Right Side: Representation
Adapted from Groves et al. (2009) Survey Methodology.
There are people in the population that are not in our sampling frame, and there are people in our sampling frame that are not in our population.
:(
Coverage of a target population by a frame.
Adapted from Groves et al. (2009) Survey Methodology.
If there are some members of the sampling frame that are given no, or reduced, chance of inclusion, then we have sampling bias. They are systematically excluded. Sampling variance is not systematic and is due to random chance.
:(
$\bar{Y}$ = mean of the entire target population
$\bar{Y}_C$ = mean of the population on the sampling frame
$\bar{Y}_U$ = mean of the population not on the sampling frame
N = total number of members in the target population
C = total number of eligible members on the sampling frame
U = total number of eligible members not on the sampling frame
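With this notation, the coverage bias of a frame-based mean follows from the identity $\bar{Y} = (C\bar{Y}_C + U\bar{Y}_U)/N$:

$$ \bar{Y}_C - \bar{Y} = \frac{U}{N}\left(\bar{Y}_C - \bar{Y}_U\right) $$

The bias shrinks as the uncovered fraction $U/N$ shrinks, or as the covered and uncovered groups become more alike.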
If the values of our statistics computed on the respondent data differ from the values we would get if we computed statistics on the entire sample data, then we have non-response bias.
$\bar{y}_s$ = mean of the entire sample as selected
$\bar{y}_r$ = mean of the respondents within the $s$th sample
$\bar{y}_n$ = mean of the nonrespondents within the $s$th sample
$n_s$ = total number of sample members in the $s$th sample
$r_s$ = total number of respondents in the $s$th sample
$m_s$ = total number of nonrespondents in the $s$th sample
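With this notation, the nonresponse bias follows from the identity $\bar{y}_s = (r_s\bar{y}_r + m_s\bar{y}_n)/n_s$:

$$ \bar{y}_r - \bar{y}_s = \frac{m_s}{n_s}\left(\bar{y}_r - \bar{y}_n\right) $$

The respondent mean is biased by the nonresponse rate times the difference between respondents and nonrespondents.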
$s$th sample? Yup. Conceptually this is similar to the idea of trials. The sample we draw is one of many that we might possibly have drawn. It's one single realization.
We make postsurvey adjustments to mitigate the damage of the types of errors we just discussed. Sometimes we introduce new errors.
If a source of error is systematic, we call it bias. If it is not systematic, we call it variance. Most errors probably contain both biases and variances.
Here it is one last time.
Adapted from Groves et al. (2009) Survey Methodology.
A response rate is the number of people participating in a survey divided by the number of people selected in the sample, in the form of a percentage.
Low response rates are a danger sign, suggesting that the nonrespondents are likely to differ from the respondents in ways other than their willingness to participate in your survey.
50% is often considered acceptable
60% is often considered good
70% is often considered very good
These are only rough guides. They have no statistical basis, and a demonstrated lack of response bias is better than a high response rate.
Bram, Thinking Statistically, Ch. 1
A zombie example...
The zombies from iZombie, not The Walking Dead...
Liv Moore, iZombie
Your friend takes the drug. She starts to look and behave like a zombie. What is the probability that she is a zombie?
Remember, there are 100 people in our village. Twenty are actually zombies. :(
Our drug will identify zombies correctly 90% of the time. Unfortunately, it will also give false positives (i.e. incorrectly identify non-zombies as probable zombies) 30% of the time.
$ .9 \times 20 = 18 $ zombies
$ .3 \times 80 = 24 $ non-zombies
Now let's focus on the $ 18 + 24 = 42$ that are possible zombies.
Of the 42 that appear to be zombies, 18 actually are and 24 are not.
$$18/42 = 3/7 = 43\%$$
So, equally important, what is the probability that someone who has taken the drug but does not appear to be a zombie is actually a zombie?
20 zombies, 80 non-zombies
90% success for zombies ("sensitivity")
30% false positives (note: the specificity, the true negative rate, is therefore 70%)
Since 58 people did not appear to be zombies when given the drug, $2/58 = 1/29 = 3.4\%$ of the villagers continue to live as secret zombies.
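The whole zombie calculation fits in a few lines. A sketch, using the numbers from the slides:

```python
zombies, non_zombies = 20, 80
sensitivity = 0.90          # P(test positive | zombie)
false_positive_rate = 0.30  # P(test positive | not a zombie)

true_positives = sensitivity * zombies               # 18 zombies flagged
false_positives = false_positive_rate * non_zombies  # 24 non-zombies flagged

positives = true_positives + false_positives         # 42 flagged in total
negatives = (zombies + non_zombies) - positives      # 58 not flagged

# P(zombie | positive test): Bayes' theorem as simple counting.
print(true_positives / positives)  # 18/42 = 3/7, about 0.429

# P(zombie | negative test): the missed zombies among the 58.
missed = zombies - true_positives  # 2 zombies slip through
print(missed / negatives)          # 2/58, about 0.034
```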
$P(X)$
$P(X \mid Y)$
$X$ = Hypothesis $Y$ = Evidence
Bayes' insight was that the conditional probability depends on 4 different things:
$$ P(X \mid Y) = \frac{P(Y \mid X) \times P(X)}{P(Y)} $$
Now, what are the hypotheses and evidence from the zombie example?
Another example: What is the probability that Spike is a vampire?
Bayes is about updating beliefs when confronted with new evidence. This requires a prior belief to update from. If you get the prior probability wrong, your conclusions will be wrong even if the update was correct.
Another name for the prior is base rate.
The base rate fallacy is when you do not take account of the base rate -- i.e. the prior probability that something was true before new evidence was introduced.
"What is the probability that a person is zombifying given that the test came out positive, and given that the test is pretty good but it isn't perfect, and given that non-zombies greatly outnumber zombies in our current population?"
Now... what went wrong in the Sally Clark case?
In class activities.
See R scripts distributed for class lab sessions.
See R scripts distributed for class lab sessions.
With correlation, we measure the association between two quantitative variables. What if we want to predict one variable from another? Any particular outcome can be predicted by a combination of a model and some error.
$$ outcome_i = (model) + error_i $$
If there is a linear relationship between our response and explanatory variables, we can summarize the relationship between them with a straight line.
Why do criminal sentences vary? Could it be related to the number of prior convictions a person has?
We use the equation of a straight line. Let's start with just one explanatory variable, which makes this a simple regression:
$$ y = a + bx $$
If $b$ is positive, the value of y increases as x increases. If $b$ is negative, it decreases as x increases. If $b$ = 0, the value of y does not change with x.
We will fit a regression model to our data and use it to predict values for the response variable.
$$ Y_i = (b_0 + b_1 X_i) + \epsilon_i $$
$b_0$ and $b_1$ are regression coefficients.
Continuing with our example, if we want to predict the length of the sentence based on the number of prior convictions:
$$ Y_i = (b_0 + b_1 X_i) + \epsilon_i $$
or
$$ Y_i = (b_0 + b_1 Priors_i) + \epsilon_i $$
the length of an individual's sentence is a function of (1) a baseline amount given to all defendants, (2) an additional amount for each prior conviction, and (3) a residual value that is unique to each individual case.
There are many lines we could fit to describe the data. To find the line of best fit, we typically use a method called least squares. The method of least squares will go through, or get close to, as many of the points as possible.
We will have both positive and negative residuals, because there will be data points that fall both above and below our line of best fit. We square the differences before adding them up to prevent the positive residuals (points above the line) from canceling out the negative residuals (below the line).
If the squared differences are very big, the line does a poor job of representing the data. If the squared differences are small, it does a good job of representing the data.
The line of best fit is the one with the lowest Sum of Squared Differences ($SS$ for short, or $\sum residual^2$). The method of least squares selects the line with the smallest $SS$.
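A from-scratch sketch of least squares with one predictor. The sentencing numbers are hypothetical, invented for illustration; in the labs you would use R's `lm`, this just shows the arithmetic behind it.

```python
from statistics import mean

# Hypothetical data: prior convictions (x) and sentence length in years (y).
x = [0, 1, 2, 3, 4]
y = [2, 3, 5, 4, 6]

x_bar, y_bar = mean(x), mean(y)

# Slope: sum of cross-deviations over sum of squared x-deviations.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)

# Residuals and the quantity least squares minimizes.
predictions = [b0 + b1 * xi for xi in x]
ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))

print(round(b0, 3), round(b1, 3), round(ss_residual, 3))  # 2.2 0.9 1.9
```

No other straight line through these points produces a smaller sum of squared residuals than 1.9.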
Goodness of Fit
We have the best line possible now. But what if it does a really bad job of actually fitting the data? To assess the goodness of fit:
These values will be reported in your R output.
Regression: $Y$ = 3 + 0.5($X$). The linear least squares regression is only a good summary of the relationship between $x$ and $y$ for the first dataset. In the second dataset, the relationship is non-linear. In the third dataset there is an outlier. In the fourth dataset, the least squares line chases the influential observation.
These assumptions can easily be wrong. We have to check whether the assumptions are reasonable.
It is dangerous to summarize a relationship between $x$ and $y$ beyond the range of the data.
A lurking variable is one that has an important effect on the relationship between $x$ and $y$ but is omitted from the analysis. This can lead to you missing a relationship that is present, or inducing a relationship that is not present.
The possible presence of lurking variables makes causal analysis more difficult in observational work. Multiple regression does better than simple regression. Experimental designs are best at mitigating lurking variables.
Why do criminal sentences vary? Was the sentence deserved? A moody judge? A long criminal record? A vicious crime? The defendant's race? The race of the victim? There could be many theories. What is the relative importance of each variable?
By extending regression analysis to 2 or more explanatory variables, we (1) reduce the size of the residuals and therefore account for more variation in the response variable, and (2) can hold these additional causes of the response variable constant statistically, resulting in a more accurate estimation of the effect of $x$ on $y$ because it is less likely that we will omit lurking variables.
We can extend the number of explanatory variables in multiple regression to $k$ variables, $x_1, x_2, ..., x_k$ for the regression equation:
$$ y = a + b_1 x_1 + b_2 x_2 + ... + b_k x_k + residual $$
Now we have a new coefficient for each explanatory variable we add! :) The outcome is predicted from the combination of all the variables multiplied by their respective coefficients and, of course, the residual term.
$b_1$ is the average change in $y$ for a one-unit increase in $x_1$ holding the other explanatory variables constant.
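A sketch of the same idea with two hypothetical predictors, solving the normal equations $(X^{\top}X)b = X^{\top}y$ directly (R's `lm` does this with better numerics, via a QR decomposition). The data are generated so that $y = 2 + 3x_1 - x_2$ exactly, which makes the answer easy to check.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]           # partial pivoting
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

# Hypothetical data generated so that y = 2 + 3*x1 - 1*x2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y  = [3, 7, 7, 11, 11, 15]

# Design matrix with an intercept column, then the normal equations.
X = [[1, a, b] for a, b in zip(x1, x2)]
XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(3)]
       for r in range(3)]
Xty = [sum(X[i][r] * y[i] for i in range(len(X))) for r in range(3)]

b0, b1, b2 = solve(XtX, Xty)
print(round(b0, 6), round(b1, 6), round(b2, 6))  # recovers 2, 3, -1
```

Each coefficient is the effect of its own variable with the other held constant: $b_1 = 3$ here even though $x_1$ and $x_2$ are correlated.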
Janice Aurini, University of Waterloo
Melanie Heath, McMaster University
Stephanie Howells, University of Guelph
All of our previous conversations about research design are relevant. What's different about today is the focus specifically on qualitative designs and methods.
Is your question researchable?