Thursday, January 30, 2014

Program evaluation notes

Program evaluation

1. Non-experiments
Means using statistical (R²/regression) analysis rather than random assignment to estimate the response.  Uses statistical controls (program and control variables x1 + x2 + x3) to analyze the outcome measure (dependent variable y)
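A minimal sketch of this idea in Python, using made-up data and hypothetical variable names (x1 as program participation, x2 and x3 as statistical controls, y as the outcome); the regression coefficients and R² stand in for the statistical controls described above.

# Sketch (synthetic data): a non-experimental evaluation estimates the program
# effect by regressing the outcome y on the program variable plus controls,
# then inspecting R^2 and the program coefficient.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "x1": rng.binomial(1, 0.5, n),    # program participation (0/1), assumed
    "x2": rng.normal(size=n),         # statistical control, e.g. income
    "x3": rng.normal(size=n),         # statistical control, e.g. age
})
df["y"] = 2.0 * df["x1"] + 0.5 * df["x2"] - 0.3 * df["x3"] + rng.normal(size=n)

model = smf.ols("y ~ x1 + x2 + x3", data=df).fit()
print(model.rsquared)        # R^2: share of variance explained
print(model.params["x1"])    # estimated program effect, net of controls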

2. Experimental designs - Randomized field experiments
Must Have
1.    Comparison - random assignment of cases to treatment vs. control group
2.    Manipulation - the treatment should produce a change in the unit of analysis
3.    Control - internal validity (achieved through random assignment)
4.    Generalizability - external validity

    Must have at least 30 cases in the sample
    Random assignment of cases (random selection is not required)
    Avoid selection threats

3 types of tests
1.    Pretest-posttest
2.    Pretest comparison group
3.    Solomon four-group design

Positive
Can use simple statistics such as t-tests or F-tests
Ethical issues can be addressed by holding the treatment in reserve for the control group
Threats to external validity are reduced with a large sample size (N)
Never randomize to suit problems
Controls for many factors, avoiding selection threats
Simple to understand

Negative
Shows that the program caused a difference between groups but does not explain why
Not always feasible
Not always generalizable
Many internal validity threats: instrumentation, multiple-treatment interference, self-selection, attrition
Must have large N

With randomized field experiments, you can get the closest to causal inference, based on random assignment of cases to program and control groups.  A lottery is a randomized field experiment, yet it is not the most efficient way to redirect scarce resources.  Each group must be large (greater than 30) and be composed of similar proportions of sex, race, and other characteristics.  Cases are randomly assigned so the program runs on one group and not on the other. Only random assignment is needed, not random selection. If the numbers are fewer than 30 you cannot do random assignment and must do a quasi-experiment. The unit of analysis is important.
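A minimal sketch of random assignment followed by a simple t-test comparison of program and control groups; the outcomes and effect size below are invented purely for illustration.

# Sketch (synthetic data): randomly assign cases to program vs. control,
# then compare outcomes with a simple independent-samples t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
cases = np.arange(100)                      # well above 30 per group
assignment = rng.permutation(cases)         # random assignment, not selection
program, control = assignment[:50], assignment[50:]

# Hypothetical outcomes: the program group gets a modest boost.
outcome_program = rng.normal(loc=10.5, scale=2.0, size=program.size)
outcome_control = rng.normal(loc=10.0, scale=2.0, size=control.size)

t_stat, p_value = stats.ttest_ind(outcome_program, outcome_control)
print(t_stat, p_value)   # small p-value -> reject the null of no program effect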



3.    Quasi-Experimental Design
The absence of random assignment makes quasi-experiments different from experimental designs, and they tend to be retrospective.  Internal validity is more questionable.  The evaluator selects a unit of analysis or variable that could be affected by and related to the treatment, and selects comparable places for comparison.
 3 types
1.    Cross-sectional (XS) - comparisons across comparable units
2.    Time series (TS) - before-and-after measurements of the treatment
3.    Both cross-sectional and time series - before-and-after time series with a comparison group

Types of studies:
1.    No comparison
a.    Descriptive case study - not good for program evaluation
2.    Posttest Only comparison group
a.    Threats to valid causal inference: there is no way to know whether the program caused the difference or another variable did.
3.    Pretest-Posttest Comparison Group
a.    You should have baseline data for this design.
b.    You control for self-selection bias
c.    Random assignment would help separate groups
4.    Pretest-Posttest
a.    Two data points - not a strong design; history and maturation problems
b.    Have baseline data
c.    Yet you cannot claim the program was effective - other variables may affect the group
5.    Interrupted Time Series
a.    Strength: controls for maturation; yet there is no comparison group, selection threats remain, and it is a purely reflexive design (see the segmented-regression sketch after this list)
6.    Interrupted Time Series with Comparison Group - many different levels of intervention
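One common way to analyze an interrupted time series is segmented regression; here is a sketch with made-up monthly data, where the variable names (post, time_since) and the intervention point are assumptions for illustration.

# Sketch (synthetic data): segmented regression for an interrupted time series.
# 'post' flags observations after the intervention; 'time_since' lets the
# trend change slope after the interruption.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
t = np.arange(48)                       # e.g., 48 monthly observations
intervention = 24
post = (t >= intervention).astype(int)
time_since = np.where(post == 1, t - intervention, 0)

y = 50 + 0.2 * t - 4.0 * post - 0.3 * time_since + rng.normal(0, 1.5, t.size)
df = pd.DataFrame({"y": y, "t": t, "post": post, "time_since": time_since})

model = smf.ols("y ~ t + post + time_since", data=df).fit()
print(model.params[["post", "time_since"]])   # level shift and slope change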


4 types of Validity

A.  External Validity - results must be generalizable
1.    Time - results from one period (e.g., a particular range of economic growth) may not generalize to other times
2.    Place - results are not general to the whole US but specific to a place
Addressed by having a large N with random selection

B.  Measurement Validity
•    Reliability – absence of random error
•    Validity - absence of non-random (systematic) measurement error

Best way to reduce both: use multiple indicators, so that random measurement error averages out across indicators (see the sketch after the list below)
3 types of measurement validity
1.    Face validity - does the measurement instrument really measure what it is supposed to?
2.    Concept (construct) validity - are the measured indicators related to one another?
3.    Predictive validity - does the score (e.g., on the GRE) predict how well you will do in school? A valid measurement will yield the correct outcome.
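One common reliability statistic for a multiple-indicator measure is Cronbach's alpha; the small sketch below uses three made-up indicators of a single construct to show how random measurement error in individual items averages out across indicators.

# Sketch (synthetic data): Cronbach's alpha as one reliability check for a
# multiple-indicator scale.
import numpy as np

rng = np.random.default_rng(3)
true_score = rng.normal(size=300)
# Three hypothetical indicators of the same construct, each with random error.
items = np.column_stack([true_score + rng.normal(scale=0.7, size=300)
                         for _ in range(3)])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(alpha)   # closer to 1 means a more internally consistent (reliable) scale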

C. Statistical Conclusion Validity - refers to the accuracy with which systematic effects are separated from random (stochastic) effects
    Sources of randomness
1.    sampling error
2.    random measurement error
3.    inherent in human behavior
4.    small sample size
Solved by having a larger sample (N)
If studying a sample from a population, use statistical tests against the null hypothesis - this requires a large N
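A quick simulation of the first source of randomness above, sampling error, with a made-up population: the spread of sample means shrinks as N grows, which is why a larger N strengthens statistical conclusion validity.

# Sketch: sampling error shrinks as the sample size N grows.
import numpy as np

rng = np.random.default_rng(4)
population = rng.normal(loc=100, scale=15, size=100_000)

for n in (10, 30, 100, 1000):
    means = [rng.choice(population, size=n).mean() for _ in range(2000)]
    print(n, round(np.std(means), 2))   # spread of sample means falls roughly as 1/sqrt(n)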

Type 1 error is finding a program effective when it is not
•    Rejecting the null hypothesis when it is true; guarded against with lower levels of significance
•    Academic research focuses on avoiding it
•    F= low
•    reject null

Type 2 error is finding no effect when the program is effective
•    Accept the null hypothesis
•    Beta (β); program evaluation focuses on it
•    F= high
•    Accepting the null kills the program
•    Policy analysis must focus on type 2 errors

How to determine the power of a test
1.    Increasing the level of significance gives a more powerful test
2.    Use the .05 level of significance as the base
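A sketch of a power calculation for a two-sample t-test using statsmodels: power (1 - beta, the chance of avoiding a Type II error) rises with sample size and with the significance level. The effect size of 0.4 is an assumed value, not one from the notes.

# Sketch: power of a two-sample t-test as a function of sample size and alpha.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n in (20, 50, 100):
    for alpha in (0.05, 0.10):
        power = analysis.power(effect_size=0.4, nobs1=n, alpha=alpha)
        print(n, alpha, round(power, 2))

# Or solve for the N needed per group to reach 80% power at the .05 base level.
print(analysis.solve_power(effect_size=0.4, power=0.8, alpha=0.05))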

D. Threats to Internal Validity
The problem with internal validity is that it is impossible to prove causal claims; no study is perfectly accurate.  Some studies have more internal validity threats than others.  To improve internal validity, design matters.

Types
1.    History (TS) - an event other than the change in the treatment (x) might cause the outcome (y) to change (a single event)
2.    Maturation (TS) - y may be changing partly because of an underlying trend and not because of the treatment (x)
3.    Testing (TS) - taking a test itself, with no change in treatment, may cause the outcome (y) to change; an external and internal threat (subjects are aware of being studied)
4.    Instrumentation (TS, CS) - a change in the calibration of the measurement procedure or instrument may partly or entirely cause the outcome (y) to change, rather than the treatment
5.    Regression artifacts (TS, CS) - when cases are chosen for extreme high or low scores, there is a tendency for extreme scores to return toward normal; the highest scores are more likely to go down (see the simulation after this list)
6.    Selection (CS) - when the groups being compared differ on factors besides the treatment (x), these differences (z) may account partly or entirely for the observed difference in outcome (y); example: public vs. private schools
7.    Attrition (TS) - when two or more groups are being compared, the observed between-treatment difference in outcome (y) may be partly or entirely attributable to a differential loss of respondents rather than to the treatment (x)
8.    Multiple treatment interference (TS, CS) - when one treatment (x1) is so confounded with another (x2) that it is impossible to separate the impact of one from the other
9.    Contamination (CS) - when one group finds out about the treatment, so there is no difference in outcomes (y)
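A small simulation of the regression-artifact threat (item 5 above), with made-up test scores: cases selected for extreme first-round scores move back toward the average on remeasurement even with no treatment at all.

# Sketch (synthetic data): regression to the mean.
import numpy as np

rng = np.random.default_rng(5)
true_score = rng.normal(loc=50, scale=10, size=10_000)
time1 = true_score + rng.normal(scale=8, size=true_score.size)
time2 = true_score + rng.normal(scale=8, size=true_score.size)

worst = time1 < np.percentile(time1, 10)   # pick the lowest 10% at time 1
print(time1[worst].mean())                  # very low by construction
print(time2[worst].mean())                  # noticeably higher, with no treatment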

