Thursday, January 30, 2014

Statistics Review




This is a non-experimental design: the main program variable (x) is either continuous or categorical. Matching or keeping groups comparable is done mostly or entirely through multiple statistical controls, designated as a vector of control variables collectively labeled z. The outcome variable is y.

1. Estimate the equation
Y = a + b1X1 + b2X2 + b3X3 + ... + e

Y = DEPENDENT VARIABLE
a = CONSTANT (INTERCEPT)
b = PARAMETER ESTIMATES (COEFFICIENTS)
X = INDEPENDENT VARIABLES
e = ERROR TERM

CONTROLS? (The control variables z enter the equation as additional X terms.)

2. Interpret the coefficients:
            each one-unit change in X1 changes Y by b1, holding everything else constant

3. Discuss significance (a code sketch of steps 1–3 follows the list below):

  • The t-test: if the t statistic on an X is greater than 2 or less than –2, the coefficient is significantly different from 0

  • R square is the % of variation in Y explained by the variation in the X's

  • Adjusted R square is the same percentage, adjusted for the number of independent variables in the model

  • The F-test: if the F probability is less than .05, then the R square is statistically significant (the X's jointly explain variation in Y)

4. Evaluate the model for:
  • omitted and irrelevant variables
  • multicollinearity
  • heteroscedasticity
  • auto- or serial correlation
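
A minimal sketch of steps 1–3, assuming Python with numpy and statsmodels; the data and variable names are invented for illustration, not from the notes:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)  # fake data with known b's

X = sm.add_constant(np.column_stack([x1, x2]))      # adds the constant a
model = sm.OLS(y, X).fit()

print(model.params)    # a, b1, b2: a one-unit change in X1 changes Y by b1
print(model.tvalues)   # roughly |t| > 2 means the coefficient differs from 0
print(model.rsquared)  # % of variation in Y explained by the X's
print(model.f_pvalue)  # F probability < .05 means the R square is significant
```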

Assumptions needed to believe the parameter estimates (b's) for X

Question: what do the letters in E(u) = 0 mean?
  1. E(u) = 0. In general, this assumption means that any independent variables you didn't think to include in the analysis are just noise: they have no systematic impact on the dependent variable
  2. No random or non-random measurement error in X
    1. Random measurement error: comes from data collected from human responses, errors in recording or transcribing data, etc.
    2. Non-random error: concerns the extent to which a measure reflects the concept it is intended to measure. An example of non-random error: if I am trying to measure the impact of 9/11 and I include an unrelated variable like pilot hair color, that is non-random error. At the same time, if I purposely leave out an important variable, such as the daily number of airline passengers in the past 12 months, that is also non-random error.
  3. No random or non-random measurement error in Y
  4. No correlation between X and unmeasured/unobserved variables
    1. This means no omitted variables (selection bias)
    2. No simultaneity: it has to be clear that the independent variable impacts the dependent variable and not the other way around
  5. Must have a linear functional form
    1. If you have significant independent variables and a low R square, you may have a poor linear functional form, which means that using a linear regression equation is not appropriate (see the sketch after this list)
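
A minimal sketch of the functional-form symptom above, assuming numpy and statsmodels; the data-generating process is made up to produce a curve:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, 500)
y = x + x**2 + rng.normal(0, 1, 500)  # the true form has a curve in it

linear = sm.OLS(y, sm.add_constant(x)).fit()
print(linear.tvalues[1], linear.rsquared)  # x is significant, R square is low

curved = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()
print(curved.rsquared)                     # jumps once the form is right
```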

Assumptions needed to believe the significance tests for X
  1. No autocorrelation: observations must be independent of one another. If observations are not independent, you cannot trust the accuracy of the significance tests (see the sketch below). This can be a problem in:
    1. A time-series design: observations from time 1 are not independent from time 2
    2. Cross-sectional data: for example, if you are observing a whole group of people at the same time and are trying to keep data on each person in the group individually. Each person in the group is impacting the other people in the group, so the observations are not independent
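
A minimal sketch of one common autocorrelation check, the Durbin-Watson statistic, assuming statsmodels and using made-up time-series data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
t = np.arange(100)
e = np.zeros(100)
for i in range(1, 100):          # errors that depend on the previous period
    e[i] = 0.8 * e[i - 1] + rng.normal()
y = 2.0 + 0.3 * t + e

fit = sm.OLS(y, sm.add_constant(t.astype(float))).fit()
print(durbin_watson(fit.resid))  # near 2 is fine; well below 2 here
```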

  2. No heteroscedasticity: if the variance of the error term is not constant across all observations, there is heteroscedasticity. For example, if you were trying to measure the impact of gun legislation on gun-related deaths in the US by collecting cross-sectional data from all 50 states, there would probably be heteroscedasticity: with such varying populations, the variance in the error term would differ across states (see the sketch below).
    1. Look for it in:
      i. cross-sectional (CS) or time-series (TS) data, mostly cross-sectional
      ii. aggregate data (like states), where each unit has a different N
      iii. test scores or policy opinions
      iv. models where the dependent variable is spending and an independent or control variable is income
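
A minimal sketch of a Breusch-Pagan test for heteroscedasticity, assuming statsmodels; the state-level data are invented so that the error spread grows with population:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
pop = rng.uniform(1, 40, 50)                     # 50 "states", varying N
spending = 5.0 + 1.2 * pop + rng.normal(0, pop)  # error spread grows with pop

X = sm.add_constant(pop)
fit = sm.OLS(spending, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(lm_pvalue)  # below .05 suggests the error variance is not constant
```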

  3. No severe collinearity (same as multicollinearity): there can be no strong relationship among the independent variables. If there is, R square will be high while the individual independent variables are insignificant (see the sketch below).
    1. Example: if I use hair color and ethnicity as independent variables in a regression, neither will be significant because they are highly related.
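
A minimal sketch of a variance inflation factor (VIF) check for multicollinearity, assuming statsmodels; the two nearly identical predictors are invented:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(0, 0.1, 200)           # nearly a copy of x1
X = sm.add_constant(np.column_stack([x1, x2]))

for i in (1, 2):                            # skip the constant column
    print(variance_inflation_factor(X, i))  # VIF above ~10 signals trouble
```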

  4. Omitted variables: things that should have been controlled for but were left out; these should be listed at the end
  5. Random measurement error
    1. reliability in output and validity in design
    2. human error: people write in something incorrectly, etc.
    3. a large sample means less random error
    4. can be controlled just by adding more observations
  6. Non-random measurement error
    1. a systematic mistake or human bias
    2. irrelevant variables put into the regression
    3. a matter of measurement validity
    4. e.g., measuring hair when you want weight
    5. controlled by design

Measurement reliability: random measurement error is absent
Measurement validity: non-random measurement error is absent

x = independent variable
v = unobserved variables
Omitted variable bias comes from the relationship between x and these unobserved variables

Regression: what to know. If we have x, can we predict y? Is there a correlation between the two variables?

  • R ranges between –1, 0, and +1:
      0 is no correlation
     –1 is a perfect negative correlation
     +1 is a perfect positive correlation

  • R square, the coefficient of determination, tells the % of variation in the dependent variable (y) explained by the independent variable (x).

Total variation is the sum Σ(y − ȳ)²
If I know x, I can improve my predictions of y over using the mean score to predict

linear regression: y = a (intercept) + b (slope) · x

R squared is the improvement in prediction of the score over using the mean (see the sketch below)
the slope b is also called the regression coefficient
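
A minimal sketch, using only numpy and made-up data, of r, the slope and intercept, and R squared as the improvement over predicting with the mean:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=100)
y = 3.0 + 2.0 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]             # between -1 and +1
b = r * y.std() / x.std()               # slope (the regression coefficient)
a = y.mean() - b * x.mean()             # intercept
pred = a + b * x

ss_total = ((y - y.mean()) ** 2).sum()  # error from predicting with the mean
ss_resid = ((y - pred) ** 2).sum()      # error from using the regression line
print(r ** 2, 1 - ss_resid / ss_total)  # both equal R squared
```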



