Stats
336 Final Review Notes
Regression
Goal: identify the function that describes the relationship between a continuous dependent variable and one or more independent variable(s)
SLR = Simple Linear Regression: involves 1 independent variable
MLR = Multiple Linear Regression: involves more than 1 independent variable
SLR
We are trying to estimate: Y = B0 + B1X
Where Y = dependent variable
B0 = True y-intercept
B1 = True Slope (the amount by which the line rises/falls per additional unit of X)
X = Independent Variable
Estimated Function: ŷ = b0 + b1X
HOW TO SOLVE:
- Identify the Dependent Variable (Y) and Independent Variable (X)
- Create a Scatterplot and look for a general linear relationship
- SOLVE using regression on a calculator
- Calculate the coefficients of the best-fit line, which are b0 (a) and b1 (b)
- a or b0 = estimated y-intercept: in math terms, the value of y when x = 0; in context, e.g., if you do not study at all, on average you will receive 33.41%
- b or b1 = estimated slope: in math terms, the rate of change in y as x increases by 1; in context, e.g., for every additional hour studied, we would expect the final grade to increase on average by 0.8225%
- Evaluate the Quality of the model (a Python sketch follows this list) by using:
- Coefficient of Correlation (r) – describes how strong the linear relationship is; r always satisfies −1 ≤ r ≤ 1, and the closer |r| is to 1, the stronger the linear relationship
- Coefficient of Determination (r2) – describes the goodness of fit; r2 always satisfies 0 ≤ r2 ≤ 1, and the closer r2 is to 1, the better the fit
- R2 = ___% of the variability in the (dependent variable) is explained by knowing the (independent variable)
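A minimal Python sketch of the workflow above, using scipy's linregress; the hours-studied / final-grade numbers here are made up for illustration (they will not reproduce the 33.41 and 0.8225 values from lecture).

import numpy as np
from scipy import stats

hours = np.array([2, 5, 1, 8, 4, 10, 3, 7])         # X: independent variable
grade = np.array([40, 55, 35, 72, 50, 81, 44, 66])  # Y: dependent variable

res = stats.linregress(hours, grade)   # fits y-hat = b0 + b1*x by least squares
b0, b1 = res.intercept, res.slope      # estimated y-intercept and slope
r = res.rvalue                         # coefficient of correlation

print(f"y-hat = {b0:.2f} + {b1:.4f}x")
print(f"r = {r:.3f}, r^2 = {r**2:.3f}")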
ESS – Error Sum of Squares
- Always non-negative thus ESS ≥ 0
- IF ESS = 0, then Yi − Ŷi = 0 for every observation, a perfect fit, which is RARE
- Thus, our goal is to find the b0 and b1 that minimize ESS, using the method of least squares (TSS = RSS + ESS)
- In the Regression Report, RSS is the regression SS, ESS is the residual SS, and TSS is the total SS (a sketch of this decomposition follows)
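A sketch of the sums-of-squares decomposition on the same made-up data; it verifies TSS = RSS + ESS and that RSS/TSS equals r2.

import numpy as np
from scipy import stats

hours = np.array([2, 5, 1, 8, 4, 10, 3, 7])
grade = np.array([40, 55, 35, 72, 50, 81, 44, 66])
res = stats.linregress(hours, grade)

y_hat = res.intercept + res.slope * hours
y_bar = grade.mean()

ess = np.sum((grade - y_hat) ** 2)  # Error (residual) SS
rss = np.sum((y_hat - y_bar) ** 2)  # Regression SS
tss = np.sum((grade - y_bar) ** 2)  # Total SS

print(f"ESS = {ess:.2f}, RSS = {rss:.2f}, TSS = {tss:.2f}")
print(f"RSS + ESS = {rss + ess:.2f} (matches TSS)")
print(f"RSS/TSS = {rss / tss:.3f} (this is r^2)")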
Hypothesis Testing
- Hypothesis
- H0: B1 = 0 (means that there is no linear relationship between x and y)
- Ha: B1 ≠ 0 (means that a linear relationship exists between x and y)
- α = 0.05 (level of significance)
- Decision Rule (3 Methods)
- T-TEST
- We REJECT H0 if ttest < −tα/2 or ttest > tα/2
- ttest = (b1 − B1)/Se(b1) = b1/Se(b1), since B1 = 0 under H0
- Use the t-table with df = n − 2, where n = sample size
- *NOTE* two-tailed
- USING P-VALUES
- REJECT H0 if p-value < α
- We estimate the p-value by using the t-table (the upper tail area probability)
- *NOTE* two-tailed, make sure to multiply each upper-tail probability by 2
- 95% CONFIDENCE INTERVAL
- CI = b1 ± tα/2 · Se(b1)
- REJECT H0 if Confidence Interval DOES NOT contain 0
- Draw a Conclusion (a sketch of all three decision rules follows)
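A sketch of the three equivalent decision rules for H0: B1 = 0 on the same made-up data; linregress reports the two-tailed p-value and Se(b1) directly.

import numpy as np
from scipy import stats

hours = np.array([2, 5, 1, 8, 4, 10, 3, 7])
grade = np.array([40, 55, 35, 72, 50, 81, 44, 66])
res = stats.linregress(hours, grade)

n, alpha = len(hours), 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)  # two-tailed critical value

# 1) t-test: t = (b1 - 0) / Se(b1)
t_test = res.slope / res.stderr
print(f"t = {t_test:.3f}; reject H0 if |t| > {t_crit:.3f}")

# 2) p-value (already two-tailed): reject H0 if p < alpha
print(f"p-value = {res.pvalue:.4f}; reject H0 if < {alpha}")

# 3) 95% CI for B1: reject H0 if the interval does not contain 0
lo = res.slope - t_crit * res.stderr
hi = res.slope + t_crit * res.stderr
print(f"95% CI for B1: ({lo:.3f}, {hi:.3f})")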
MLR (Multiple Linear Regression)
- Linearity – best fit line
- Normality – for every value of X, the errors are normally distributed around the regression line
- Homoscedasticity – the residual errors have constant variance across all values of X (the residual plot shows a random pattern)
- Independence of errors – any error is independent from other errors
- b1 represents the change in y with an increase in one unit of x, when all other variables are held constant
- Evaluating Regression Models:
- Adjusted R2
- Hypothesis Testing -> Overall Significance (only used before performing backward MLR); see the sketch below
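A minimal MLR sketch with statsmodels (made-up data), showing the adjusted R2 and the overall-significance F-test p-value used in the evaluation steps above.

import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "X1": [2, 5, 1, 8, 4, 10, 3, 7],
    "X2": [1, 3, 2, 6, 3, 8, 2, 5],
    "Y":  [40, 55, 35, 72, 50, 81, 44, 66],
})
X = sm.add_constant(df[["X1", "X2"]])  # adds the b0 column
model = sm.OLS(df["Y"], X).fit()

print("adjusted R^2 :", round(model.rsquared_adj, 3))
print("overall F p  :", round(model.f_pvalue, 4))
print(model.params)  # b0, b1, b2; each bk is the change in Y per unit of Xk,
                     # holding the other X's constant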
Forward Regression
Determining the “best model” by finding SLR of X1; then finding X1 + X2 and X1 + X3, finding the best model, then finding X1 + X2 + X3 then evaluating which model is best by adjusted r2 and amount of violations.
Multicollinearity
- We want y (dependent) and the x (independent) variables to be highly correlated; multicollinearity, however, means that the independent variables (Xs) are correlated with each other
- CAUSES: regression coefficients with the “wrong sign”, t-values that are too small, and p-values that are too big
CHECK WITH: the overall significance F-Test (first) and the Correlation Matrix (last); see the sketch below
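A quick correlation-matrix check with pandas; the data are made up, with X2 deliberately built as roughly a multiple of X1 so that one pair gets flagged.

import pandas as pd

df = pd.DataFrame({
    "X1": [1, 2, 3, 4, 5, 6],
    "X2": [2, 4, 7, 8, 10, 13],  # roughly 2*X1 -> highly correlated with X1
    "X3": [5, 3, 6, 2, 7, 4],
})
corr = df.corr()
print(corr.round(2))

# Flag any pair of independent variables beyond +/-0.6
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if abs(corr.loc[a, b]) > 0.6:
            print(f"possible multicollinearity: {a} vs {b} (r = {corr.loc[a, b]:.2f})")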
Backward Multiple Regression
- Overall F-Test to determine if at least 1 independent variable is a significant predictor
- Hypothesis: H0: B1 = B2 = B3 = … = 0 (meaning all variables are unimportant for prediction)
Ha: at least one Bk ≠ 0 (meaning at least one variable is important for prediction)
- Decision Rule: P-value approach: REJECT H0 if p-value < α
- Make a conclusion in business context – “Based on this test, at a 5% level of significance, we can say that AT LEAST one of the independent variables is significant in predicting the dependent variable.”
- Continue to re-run the regression and remove independent variables that have violations, e.g. if their p-values > α (a sketch of this loop follows this list)
- Check MULTICOLLINEARITY with Correlation Matrix
- Any pairwise correlation between independent variables beyond ±0.6 may present a problem
- Check the signs of the correlation matrix coefficients against the coefficients of your regression model for “incorrect” signs
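A sketch of the backward-elimination loop described above, using statsmodels; the DataFrame df (predictor columns plus the target column) is assumed to exist, and each pass drops the worst violator (largest p-value > α) before re-running.

import statsmodels.api as sm

def backward_eliminate(df, target, alpha=0.05):
    """Re-run OLS, dropping the least significant predictor until all p-values < alpha."""
    predictors = [c for c in df.columns if c != target]
    while predictors:
        X = sm.add_constant(df[predictors])
        model = sm.OLS(df[target], X).fit()
        pvals = model.pvalues.drop("const")
        worst = pvals.idxmax()       # predictor with the largest p-value
        if pvals[worst] < alpha:     # everything left is significant: stop
            return model, predictors
        predictors.remove(worst)     # drop the violator and re-run
    return None, []                  # no significant predictors remain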
Dummy Variables
- All dummy coefficients are interpreted relative to the reference (omitted) category (see the pandas sketch below)
- E.g., if the coefficient is −17, that category sells 17 fewer units than the reference category
- SEE LECTURE NOTES EXAMPLE
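A small pandas sketch of dummy coding (made-up store data); store “A” is the omitted reference category, so each dummy coefficient is read relative to it.

import pandas as pd

df = pd.DataFrame({"store": ["A", "B", "C", "A", "B", "C"],
                   "units": [100, 83, 95, 104, 85, 97]})

# drop_first=True omits store A, making it the reference category
dummies = pd.get_dummies(df["store"], prefix="store", drop_first=True, dtype=float)
print(dummies)
# In a fitted model, a coefficient of -17 on store_B would mean store B
# sells, on average, 17 fewer units than reference store A.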
Linear Programming (LP)
- A decision to be made with respect to the allocation of resources
- Constraints
- An objective function such as maximizing profits or minimizing costs
- Problems with 2 decision variables can be solved graphically
STEPS:
- Define the decision variables
- State the Objective Function
- State the Constraints
- State the Non-Negativity Constraints
- FOR 2 DECISION VARIABLES ONLY
- Graph the relationships
- Find the feasible area
- Locate the optimal point by using the objective function (see the solver sketch after these steps)
- Identify the constraints involved with the optimal point and solve
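A sketch of a 2-variable LP solved with scipy's linprog; the profit coefficients and resource limits are made up, and since linprog minimizes, the maximization objective is negated.

from scipy.optimize import linprog

# Maximize profit 3*x1 + 5*x2  ->  minimize -3*x1 - 5*x2
c = [-3, -5]
A_ub = [[1, 2],                   # resource 1: x1 + 2*x2 <= 14
        [3, 1]]                   # resource 2: 3*x1 + x2 <= 18
b_ub = [14, 18]
bounds = [(0, None), (0, None)]   # non-negativity constraints

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("optimal x1, x2 :", res.x)  # a corner point of the feasible area
print("maximum profit :", -res.fun)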
Special LP Conditions
- Alternate Optimal Solutions – more than one “best” solution
- Redundant Constraints – plays NO part in determining the feasible area
- Unbounded Solutions – the objective function can be made infinitely large (or small); there is no limit on the decision variables
- Infeasibility – No way to satisfy all constraints; no feasible area
100% Rule
The sensitivity report remains valid when 2 or more objective function coefficients change simultaneously, or 2 or more RHS constraint values change simultaneously, provided the sum of the percentages of each allowable change used is at most 100% (see the worked check below).
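A worked check of the rule with made-up sensitivity-report numbers: c1 rises by 2 against an allowable increase of 5, and c2 falls by 3 against an allowable decrease of 6.

# fraction of each allowable change actually used
used = 2 / 5 + 3 / 6  # 40% + 50% = 90%
print(f"total allowable change used: {used:.0%}")
print("sensitivity report still valid" if used <= 1.0 else "report NOT valid")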
...
...