Stata On-line Lecture Series (2): Basic Statistics and Regression Analysis

2020-11-28

2. Correlation coefficient

Correlation coefficient: measures the $\color{red}{\text{linear relationship}}$ between two random variables.

$$ corr(X,Y)=\frac{cov(X,Y)}{\sqrt{var(X)var(Y)}} $$ $$ \text{note that } -1 \leq corr(X,Y) \leq 1 $$

Gauging the correlation with a scatter plot

. use R_data4_1, clear 

. twoway (scatter write math, msymbol(sh) mcolor(red%50))

Q) How can a Stata graph be transferred into HWP or Word?

Q) Which command saves a graph as a *.png file?
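One possible way to save the scatter plot above as a PNG file is `graph export`. The sketch below assumes that R_data4_1.dta (the course dataset used above) is in the working directory; the file name `scatter_write_math.png` and the pixel width are placeholders chosen for this illustration.

```stata
* Assumes the course dataset R_data4_1.dta is in the working directory
use R_data4_1, clear

* Redraw the scatter plot so that it is the current graph
twoway (scatter write math, msymbol(sh) mcolor(red%50))

* Export the current graph; the file name and width are illustrative placeholders
graph export "scatter_write_math.png", replace width(1600)
```

The exported PNG can then be inserted into a Word or HWP document as an ordinary image.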

The pwcorr command

. pwcorr read write 

             |     read    write
-------------+------------------
        read |   1.0000 
       write |   0.5968   1.0000 

. pwcorr read write math , listwise sig star(0.05)

             |     read    write     math
-------------+---------------------------
        read |   1.0000 
             |
             |
       write |   0.5941*  1.0000 
             |   0.0000
             |
        math |   0.6600*  0.6141*  1.0000 
             |   0.0000   0.0000
             |

Q) What is the difference between the corr and pwcorr commands?

Q) How can return list be used?
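As a minimal sketch of how stored results might be used, `correlate` leaves its results in `r()`, and `return list` shows what is available. The example assumes R_data4_1.dta is in the working directory; the matrix name `C` is chosen only for this illustration.

```stata
* Assumes the course dataset R_data4_1.dta is in the working directory
use R_data4_1, clear

* Run the command quietly, then inspect what it stored in r()
quietly correlate read write
return list

* Reuse the stored correlation in later calculations
display "rho   = " r(rho)
display "rho^2 = " r(rho)^2

* The full correlation matrix is stored as r(C)
matrix C = r(C)
matrix list C
```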

3. Simple linear regression

We specify a model that estimates the linear relationship between the dependent variable y and the explanatory variable x.

$$ y_i=\alpha+\beta x_i+e_i $$

Estimation

Given sample data on x and y, estimates are obtained by applying the ordinary least squares (OLS) estimator. Under certain assumptions (the Gauss-Markov assumptions), the OLS estimator is BLUE (Best Linear Unbiased Estimator).

. use R_data8_1, clear 

. reg food_exp income 

      Source |       SS           df       MS      Number of obs   =        40
-------------+----------------------------------   F(1, 38)        =     23.79
       Model |   190626.98         1   190626.98   Prob > F        =    0.0000
    Residual |  304505.173        38  8013.29403   R-squared       =    0.3850
-------------+----------------------------------   Adj R-squared   =    0.3688
       Total |  495132.153        39  12695.6962   Root MSE        =    89.517

------------------------------------------------------------------------------
    food_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   10.20964   2.093263     4.88   0.000     5.972052    14.44723
       _cons |   83.41601   43.41016     1.92   0.062    -4.463272    171.2953
------------------------------------------------------------------------------

. ereturn list 

scalars:
                  e(N) =  40
               e(df_m) =  1
               e(df_r) =  38
                  e(F) =  23.788841278548
                 e(r2) =  .3850022234808774
               e(rmse) =  89.5170041283277
                e(mss) =  190626.97975307
                e(rss) =  304505.1730682194
               e(r2_a) =  .3688180714672162
                 e(ll) =  -235.5088193402595
               e(ll_0) =  -245.2315518722239
               e(rank) =  2

macros:
            e(cmdline) : "regress food_exp income"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "food_exp"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 2
                  e(V) :  2 x 2

functions:
             e(sample)

Q) How can ereturn list be used?
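One way to use these stored results is to rebuild quantities from the regression table by hand. The sketch below assumes R_data8_1.dta is in the working directory; the scalar name `tcrit` is a hypothetical name introduced for this example.

```stata
* Assumes the course dataset R_data8_1.dta is in the working directory
use R_data8_1, clear
quietly regress food_exp income

* R-squared rebuilt from the stored model and residual sums of squares
display "R2 = " e(mss)/(e(mss) + e(rss))

* 95% confidence interval for the income coefficient,
* using the residual degrees of freedom stored in e(df_r)
scalar tcrit = invttail(e(df_r), 0.025)
display "lower = " _b[income] - tcrit*_se[income]
display "upper = " _b[income] + tcrit*_se[income]

* e(sample) flags the observations used in the estimation
count if e(sample)
```

The displayed values should agree with the R-squared and the confidence interval reported in the regression table.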

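To estimate a model without a constant term, use the noconstant option. Note that with noconstant, Stata reports an R-squared based on the uncentered total sum of squares, so it is not comparable to the R-squared of the model that includes a constant.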

. reg food_exp income, noconstant cformat(%9.3f)

      Source |       SS           df       MS      Number of obs   =        40
-------------+----------------------------------   F(1, 39)        =    394.28
       Model |  3377595.38         1  3377595.38   Prob > F        =    0.0000
    Residual |  334093.953        39  8566.51161   R-squared       =    0.9100
-------------+----------------------------------   Adj R-squared   =    0.9077
       Total |  3711689.33        40  92792.2333   Root MSE        =    92.555

------------------------------------------------------------------------------
    food_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |     14.012      0.706    19.86   0.000       12.585      15.440
------------------------------------------------------------------------------

R-squared statistic

It expresses how much of the variation in the y variable is explained by the x variable, and is interpreted as the goodness of fit of the model.

$$ R^2=\frac{SSR}{SST}=1-\frac{SSE}{SST} $$ $$ \text{ where } SSE= ?? , SSR=?? , SST=?? $$
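As a reference for the notation, one common textbook convention that is consistent with $R^2=SSR/SST=1-SSE/SST$ is sketched here. The mapping to the rows of the Stata table is an interpretation of the regression output, not something stated in the original notes.

$$ SST=\sum_{i=1}^{n}(y_i-\bar y)^2, \quad SSR=\sum_{i=1}^{n}(\hat y_i-\bar y)^2, \quad SSE=\sum_{i=1}^{n}(y_i-\hat y_i)^2 $$ $$ SST=SSR+SSE \text{ (when the model includes a constant term)} $$

Under this convention, SSR corresponds to the Model row of the Stata ANOVA table, SSE to the Residual row, and SST to the Total row. For the regression of food_exp on income, $190626.98/495132.153 \approx 0.3850$, which matches the reported R-squared.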

. reg food_exp income 

      Source |       SS           df       MS      Number of obs   =        40
-------------+----------------------------------   F(1, 38)        =     23.79
       Model |   190626.98         1   190626.98   Prob > F        =    0.0000
    Residual |  304505.173        38  8013.29403   R-squared       =    0.3850
-------------+----------------------------------   Adj R-squared   =    0.3688
       Total |  495132.153        39  12695.6962   Root MSE        =    89.517

------------------------------------------------------------------------------
    food_exp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      income |   10.20964   2.093263     4.88   0.000     5.972052    14.44723
       _cons |   83.41601   43.41016     1.92   0.062    -4.463272    171.2953
------------------------------------------------------------------------------

. di e(r2)
.38500222

. corr food_exp income 
(obs=40)

             | food_exp   income
-------------+------------------
    food_exp |   1.0000
      income |   0.6205   1.0000


. di r(rho)^2 
.38500222

Q) In the simple linear regression model, what is the relationship between $ R^2 $ and $ corr(x,y) $?

Prediction

The estimation results can be used to obtain the fitted values of the y variable.

$$ \hat y_i=\hat \alpha+\hat \beta x_i $$

The predicted line can be drawn on a graph with lfit, a twoway plot type.

. reg food_exp income 

. twoway (scatter food income,mcolor(red%50) msymbol(dh)) (lfit food income)
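The fitted values can also be stored as variables with `predict`. The sketch below assumes R_data8_1.dta is in the working directory; the variable names `food_hat` and `e_hat` and the income value 20 are hypothetical choices made for this illustration.

```stata
* Assumes the course dataset R_data8_1.dta is in the working directory
use R_data8_1, clear
quietly regress food_exp income

* Fitted values: alpha_hat + beta_hat * income
predict food_hat, xb

* Residuals: food_exp minus the fitted value
predict e_hat, residuals

* Inspect the first few observations
list food_exp income food_hat e_hat in 1/5

* Prediction at an arbitrary income level (20 is only an example value)
display "predicted food_exp at income = 20: " _b[_cons] + _b[income]*20
```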

4. Multiple linear regression

We specify a linear regression model with two or more explanatory (x) variables. In the following equation there are $ k+1 $ regressors, including the constant term.

$$ y_i=\beta_0+\beta_1 x_{1i}+\beta_2 x_{2i}+\cdots + \beta_k x_{ki}+e_i $$

. use R_data8_3, clear 
(Housing price data for Boston-area communities)

. reg price nox crime 

      Source |       SS           df       MS      Number of obs   =       506
-------------+----------------------------------   F(2, 503)       =     76.98
       Model |  1.0036e+10         2  5.0181e+09   Prob > F        =    0.0000
    Residual |  3.2789e+10       503  65187676.7   R-squared       =    0.2343
-------------+----------------------------------   Adj R-squared   =    0.2313
       Total |  4.2826e+10       505    84803032   Root MSE        =    8073.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         nox |   -2538.31    341.962    -7.42   0.000    -3210.159    -1866.46
       crime |  -271.6976   46.11359    -5.89   0.000    -362.2966   -181.0986
       _cons |   37579.82   1868.701    20.11   0.000      33908.4    41251.24
------------------------------------------------------------------------------

. reg price nox crime, beta  

      Source |       SS           df       MS      Number of obs   =       506
-------------+----------------------------------   F(2, 503)       =     76.98
       Model |  1.0036e+10         2  5.0181e+09   Prob > F        =    0.0000
    Residual |  3.2789e+10       503  65187676.7   R-squared       =    0.2343
-------------+----------------------------------   Adj R-squared   =    0.2313
       Total |  4.2826e+10       505    84803032   Root MSE        =    8073.9

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
         nox |   -2538.31    341.962    -7.42   0.000                -.3192976
       crime |  -271.6976   46.11359    -5.89   0.000                -.2534462
       _cons |   37579.82   1868.701    20.11   0.000                        .
------------------------------------------------------------------------------

Q) How are **Beta coefficients** interpreted?

Q) Which command creates standardized variables?
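One possible approach is the `std()` function of `egen`, which creates a variable with mean 0 and standard deviation 1. The sketch below assumes R_data8_3.dta is in the working directory and that price, nox, and crime are non-missing for the same observations; the `z_` variable names are hypothetical.

```stata
* Assumes the course dataset R_data8_3.dta is in the working directory
use R_data8_3, clear

* Standardize each variable: (x - mean) / sd
egen z_price = std(price)
egen z_nox   = std(nox)
egen z_crime = std(crime)

* Regressing standardized y on standardized x's:
* the slope coefficients correspond to the Beta column of "reg ..., beta"
regress z_price z_nox z_crime
```

Under the stated assumptions, the slopes from this regression should reproduce the Beta coefficients reported by `reg price nox crime, beta`, which suggests reading each Beta as the change in y, measured in standard deviations, for a one standard deviation change in x.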

After the reg command, the estimation results are stored under specific names, and these stored results can be reused in later commands.

. qui reg price nox crime 

. di _b[nox]
-2538.3095

. di _se[nox]
341.96202

. di "t-value = " _b[nox]/_se[nox]
t-value = -7.4227821

. ereturn list 

scalars:
                  e(N) =  506
               e(df_m) =  2
               e(df_r) =  503
                  e(F) =  76.97873479111324
                 e(r2) =  .2343492184967946
               e(rmse) =  8073.888574956597
                e(mss) =  10036129755.88084
                e(rss) =  32789401390.56978
               e(r2_a) =  .2313048813934021
                 e(ll) =  -5268.652026932938
               e(ll_0) =  -5336.210392272958
               e(rank) =  3

macros:
            e(cmdline) : "regress price nox crime"
              e(title) : "Linear regression"
          e(marginsok) : "XB default"
                e(vce) : "ols"
             e(depvar) : "price"
                e(cmd) : "regress"
         e(properties) : "b V"
            e(predict) : "regres_p"
              e(model) : "ols"
          e(estat_cmd) : "regress_estat"

matrices:
                  e(b) :  1 x 3
                  e(V) :  3 x 3

functions:
             e(sample)   

. mat list e(b) 

e(b)[1,3]
           nox       crime       _cons
y1  -2538.3095  -271.69761   37579.821
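The stored matrices can be copied into ordinary Stata matrices and indexed element by element. This is a minimal sketch, assuming R_data8_3.dta is in the working directory; the matrix names `b` and `V` are chosen for this illustration.

```stata
* Assumes the course dataset R_data8_3.dta is in the working directory
use R_data8_3, clear
quietly regress price nox crime

* Copy the coefficient vector and its variance-covariance matrix
matrix b = e(b)
matrix V = e(V)

* Coefficient and standard error of nox (first column of e(b))
display "b[nox]  = " b[1,1]
display "se[nox] = " sqrt(V[1,1])

* The same coefficient, looked up by column name instead of position
display "b[nox] by name = " b[1, colnumb(b, "nox")]
```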

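Taking the log of the dependent variable

When the log is taken of the y variable, the estimated coefficient is interpreted as a percentage change: a one-unit increase in x is associated with an approximate $ \hat \beta \times 100\% $ increase/decrease in y.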

. reg lprice nox crime 

      Source |       SS           df       MS      Number of obs   =       506
-------------+----------------------------------   F(2, 503)       =    152.91
       Model |  31.9812631         2  15.9906315   Prob > F        =    0.0000
    Residual |  52.6010079       503  .104574568   R-squared       =    0.3781
-------------+----------------------------------   Adj R-squared   =    0.3756
       Total |  84.5822709       505  .167489645   Root MSE        =    .32338

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         nox |  -.1230908   .0136965    -8.99   0.000    -.1500001   -.0961815
       crime |  -.0181402    .001847    -9.82   0.000    -.0217689   -.0145115
       _cons |    10.6897   .0748463   142.82   0.000     10.54265    10.83675
------------------------------------------------------------------------------
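The $ \hat \beta \times 100\% $ reading is an approximation that works well for small coefficients; the exact percentage change implied by a log-level model is $ 100 \times (e^{\hat \beta}-1) $. The sketch below compares the two for nox, assuming R_data8_3.dta is in the working directory and already contains lprice, as in the session above.

```stata
* Assumes the course dataset R_data8_3.dta (with lprice) is in the working directory
use R_data8_3, clear
quietly regress lprice nox crime

* Approximate percentage change in price for a one-unit increase in nox
display "approximate % change: " 100*_b[nox]

* Exact percentage change implied by the log-level model
display "exact % change:       " 100*(exp(_b[nox]) - 1)
```

With the coefficient reported above (-.1230908), the approximation gives about -12.3 percent, while the exact figure is about -11.6 percent.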

. reg lprice lnox crime 

      Source |       SS           df       MS      Number of obs   =       506
-------------+----------------------------------   F(2, 503)       =    153.55
       Model |  32.0636239         2   16.031812   Prob > F        =    0.0000
    Residual |   52.518647       503  .104410829   R-squared       =    0.3791
-------------+----------------------------------   Adj R-squared   =    0.3766
       Total |  84.5822709       505  .167489645   Root MSE        =    .32313

------------------------------------------------------------------------------
      lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        lnox |  -.7145337   .0790602    -9.04   0.000    -.8698626   -.5592049
       crime |   -.017933   .0018537    -9.67   0.000    -.0215749   -.0142911
       _cons |   11.21559   .1319038    85.03   0.000     10.95644    11.47474
------------------------------------------------------------------------------

$$ \text {elasticity of } x = \frac {\frac {\triangle y}{y}}{\frac {\triangle x}{x}} = \frac {\triangle log(y)}{\triangle log(x)} $$

Q) In the commands above, how is the elasticity of nox obtained?

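Multicollinearity

One of the assumptions for OLS estimation is that there is no perfect linear relationship among the explanatory variables (including the constant term). If a perfect linear relationship exists among the x variables, the coefficients of some x variables cannot be identified.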

$$ \text {Model 1: } HRS=\beta_0+\beta_1 AGE + \beta_2 NEIN +\beta_3 ASSET + e $$ $$ \text {Model 2: } HRS=\beta_0+\beta_1 AGE +\beta_2 ASSET + v $$ $$ \text { where NEIN: non-labor income and ASSET: the amount of assets } $$

In Model 1, $ SE(\hat \beta_2) $ depends on the linear relationship among the $ AGE $, $ NEIN $, and $ ASSET $ variables. The stronger this linear relationship, the larger $ SE(\hat \beta_2) $ becomes, and as a result $ \hat \beta_2 $ tends to become statistically insignificant.

$$ \text {1/(variance inflation factor) } = 1/VIF = 1-R^2_j $$ $$ \text { where } R^2_j = ?? $$

The estat vif command can be used to check for multicollinearity. As a rule of thumb, a VIF greater than 5 or 10 raises a suspicion of multicollinearity.

. use R_data10_3, clear

. reg HRS AGE NEIN ASSET

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(3, 35)        =     25.83
       Model |   107317.64         3  35772.5467   Prob > F        =    0.0000
    Residual |  48465.5908        35  1384.73117   R-squared       =    0.6889
-------------+----------------------------------   Adj R-squared   =    0.6622
       Total |  155783.231        38   4099.5587   Root MSE        =    37.212

------------------------------------------------------------------------------
         HRS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |  -8.007181    1.88844    -4.24   0.000    -11.84092   -4.173443
        NEIN |   .3338277    .337171     0.99   0.329    -.3506658    1.018321
       ASSET |   .0044232    .015516     0.29   0.777     -.027076    .0359223
       _cons |   2314.054   63.22636    36.60   0.000     2185.698    2442.411
------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
        NEIN |     60.84    0.016436
       ASSET |     56.07    0.017836
         AGE |      1.74    0.573178
-------------+----------------------
    Mean VIF |     39.55

. reg HRS AGE ASSET

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(2, 36)        =     38.28
       Model |  105960.234         2  52980.1169   Prob > F        =    0.0000
    Residual |  49822.9969        36  1383.97214   R-squared       =    0.6802
-------------+----------------------------------   Adj R-squared   =    0.6624
       Total |  155783.231        38   4099.5587   Root MSE        =    37.202

------------------------------------------------------------------------------
         HRS |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |  -6.952868   1.559142    -4.46   0.000    -10.11495   -3.790782
       ASSET |   .0196214   .0022597     8.68   0.000     .0150384    .0242044
       _cons |   2288.056   57.49982    39.79   0.000     2171.441    2404.671
------------------------------------------------------------------------------

. estat vif

    Variable |       VIF       1/VIF  
-------------+----------------------
         AGE |      1.19    0.840402
       ASSET |      1.19    0.840402
-------------+----------------------
    Mean VIF |      1.19

. reg NEIN AGE ASSET

      Source |       SS           df       MS      Number of obs   =        39
-------------+----------------------------------   F(2, 36)        =   1077.14
       Model |  728896.478         2  364448.239   Prob > F        =    0.0000
    Residual |  12180.4963        36  338.347121   R-squared       =    0.9836
-------------+----------------------------------   Adj R-squared   =    0.9827
       Total |  741076.974        38  19502.0256   Root MSE        =    18.394

------------------------------------------------------------------------------
        NEIN |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         AGE |   3.158253    .770909     4.10   0.000     1.594777    4.721729
       ASSET |   .0455272   .0011173    40.75   0.000     .0432612    .0477933
       _cons |  -77.88006   28.43047    -2.74   0.010    -135.5397   -20.22039
------------------------------------------------------------------------------

. di "1/VIF =", 1-e(r2)
1/VIF = .01643621
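The same auxiliary regression can be repeated for every regressor. The loop below is a sketch that regresses each x variable on the remaining ones and displays $ 1/(1-R^2_j) $ and $ 1-R^2_j $; it assumes R_data10_3.dta is in the working directory, and the local macro names `xvars` and `others` are hypothetical.

```stata
* Assumes the course dataset R_data10_3.dta is in the working directory
use R_data10_3, clear

local xvars AGE NEIN ASSET
foreach v of local xvars {
    * All regressors except the current one
    local others : list xvars - v

    * Auxiliary regression of x_j on the other x variables
    quietly regress `v' `others'

    * VIF_j = 1/(1 - R2_j), where R2_j comes from the auxiliary regression
    display "`v': VIF = " %6.2f 1/(1 - e(r2)) "   1/VIF = " %8.6f 1 - e(r2)
}
```

The displayed values should agree with the estat vif table reported after `reg HRS AGE NEIN ASSET`.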

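5. Regression with dummy variables and interaction terms

Creating dummy variables

A dummy variable is an explanatory variable that represents a qualitative characteristic in a regression model. It is also called an indicator variable. Dummy variables can be created with the xi or tab command.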

. use R_data9_1, clear
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. tab race, gen(race_dum)

   1=white, |
   2=black, |
    3=other |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      1,657       72.93       72.93
          2 |        589       25.92       98.86
          3 |         26        1.14      100.00
------------+-----------------------------------
      Total |      2,272      100.00

. xi i.race , prefix(dum) 
i.race            dumrace_1-3         (naturally coded; dumrace_1 omitted)

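Q) In the xi command, which option creates dummy variables for all categories?

Linear regression model with dummy variables

A dummy variable shifts the constant term (intercept). Of the three categories of the race variable, only two dummy variables are included in the model; including all three together with the constant would create a perfect linear relationship among the regressors, so the omitted category serves as the base group.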

$$ y_i=\beta_0+\beta_1 Black_i+\beta_2 Other_i+\gamma x_i+e_i $$ $$ E(y_{Black})=\beta_0+\beta_1+\gamma x_i $$ $$ E(y_{Other})=\beta_0+\beta_2+\gamma x_i $$ $$ E(y_{White})=\beta_0+\gamma x_i $$

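The i. operator declares that a variable is categorical. By default it drops the lowest category (here category 1) as the base and creates dummy variables for the remaining categories.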

. reg ln_wage i.race ttl_exp 

      Source |       SS           df       MS      Number of obs   =     2,272
-------------+----------------------------------   F(3, 2268)      =    132.44
       Model |  120.253715         3  40.0845716   Prob > F        =    0.0000
    Residual |  686.454941     2,268  .302669727   R-squared       =    0.1491
-------------+----------------------------------   Adj R-squared   =    0.1479
       Total |  806.708656     2,271  .355221777   Root MSE        =    .55015

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
          2  |  -.1827163   .0263983    -6.92   0.000    -.2344836   -.1309489
          3  |   .0405064   .1087378     0.37   0.710    -.1727296    .2537425
             |
     ttl_exp |   .0471353   .0025044    18.82   0.000     .0422242    .0520463
       _cons |   1.336392   .0340166    39.29   0.000     1.269686    1.403099
------------------------------------------------------------------------------

. reg ln_wage b1.race ttl_exp 

      Source |       SS           df       MS      Number of obs   =     2,272
-------------+----------------------------------   F(3, 2268)      =    132.44
       Model |  120.253715         3  40.0845716   Prob > F        =    0.0000
    Residual |  686.454941     2,268  .302669727   R-squared       =    0.1491
-------------+----------------------------------   Adj R-squared   =    0.1479
       Total |  806.708656     2,271  .355221777   Root MSE        =    .55015

------------------------------------------------------------------------------
     ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        race |
          2  |  -.1827163   .0263983    -6.92   0.000    -.2344836   -.1309489
          3  |   .0405064   .1087378     0.37   0.710    -.1727296    .2537425
             |
     ttl_exp |   .0471353   .0025044    18.82   0.000     .0422242    .0520463
       _cons |   1.336392   .0340166    39.29   0.000     1.269686    1.403099
------------------------------------------------------------------------------

Q) For a variable such as gender that is already a 0/1 dummy, is it still necessary to use the i. operator?

Q) Which command tests whether the race variable is statistically significant?
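One possible approach is a joint test that both race dummies are zero. The sketch below shows two equivalent ways of requesting it and assumes R_data9_1.dta is in the working directory.

```stata
* Assumes the course dataset R_data9_1.dta is in the working directory
use R_data9_1, clear
quietly regress ln_wage i.race ttl_exp

* Joint test of all coefficients generated by i.race
testparm i.race

* The same joint hypothesis, naming the two dummies explicitly
test 2.race 3.race
```

Because race enters through two dummies, the joint F test is the relevant test for the variable as a whole; the individual t statistics only compare each category with the base category.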

The margins and marginsplot commands can be used to draw a prediction graph.

. reg ln_wage i.race ttl_exp 

. margins race, at(ttl_exp=(0(1)25)) noatlegend

. marginsplot , noci recast(line)

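Linear regression model with interaction terms (moderation effects)

Q) Explain the moderation effect using a path diagram.

The most common case is a model that includes the interaction between a continuous variable and a categorical variable. It assumes that $ \frac {\triangle y}{\triangle x} $ differs across categories.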

$$ y_i=\beta_0 + \beta_1 D_{2i}+ \beta_2 D_{3i}+\gamma x_i+\beta_3 D_{2i}x_i+\beta_4 D_{3i}x_i+e_i $$ $$ E(y_{Black})=\beta_0+\beta_1+(\gamma+\beta_3) x_i $$ $$ E(y_{Other})=\beta_0+\beta_2+(\gamma+\beta_4) x_i $$ $$ E(y_{White})=\beta_0+\gamma x_i $$

. reg ln_wage i.race ttl_exp i.race#c.ttl_exp 

      Source |       SS           df       MS      Number of obs   =     2,272
-------------+----------------------------------   F(5, 2266)      =     79.43
       Model |  120.297066         5  24.0594131   Prob > F        =    0.0000
    Residual |   686.41159     2,266  .302917736   R-squared       =    0.1491
-------------+----------------------------------   Adj R-squared   =    0.1472
       Total |  806.708656     2,271  .355221777   Root MSE        =    .55038

--------------------------------------------------------------------------------
       ln_wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
          race |
            2  |  -.1550527   .0778385    -1.99   0.046    -.3076949   -.0024106
            3  |   .0425614   .2724185     0.16   0.876    -.4916545    .5767773
               |
       ttl_exp |   .0476876   .0029271    16.29   0.000     .0419474    .0534277
               |
race#c.ttl_exp |
            2  |  -.0021888   .0057937    -0.38   0.706    -.0135503    .0091728
            3  |   -.000169   .0198287    -0.01   0.993    -.0390533    .0387153
               |
         _cons |   1.329508    .038911    34.17   0.000     1.253203    1.405812
--------------------------------------------------------------------------------

. reg ln_wage i.race##c.ttl_exp

. reg ln_wage b1.race##c.ttl_exp
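The `##` operator expands to the main effects plus the interaction, so `i.race##c.ttl_exp` specifies the same model as `i.race ttl_exp i.race#c.ttl_exp`. As a sketch of what might follow the estimation, the interaction terms can be tested jointly and the category-specific slopes can be obtained with `margins`; the example assumes R_data9_1.dta is in the working directory.

```stata
* Assumes the course dataset R_data9_1.dta is in the working directory
use R_data9_1, clear
quietly regress ln_wage i.race##c.ttl_exp

* Joint test that the slope on ttl_exp is the same for all race categories
testparm i.race#c.ttl_exp

* Slope of ln_wage with respect to ttl_exp, separately for each race category
margins race, dydx(ttl_exp)
```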

Q) Taking into account that the dependent variable is log-transformed, compute $ \frac {\triangle wage}{\triangle ttlexp} \mid_{White} $.

As before, the margins and marginsplot commands can be used to draw a $ \color{red}{\text {prediction graph with a different slope for each category}} $.

. use R_data9_3,clear

. gen ln_wage=log(wage)

. reg ln_wage i.union ttl_exp i.union#c.ttl_exp

. margins union, at(ttl_exp=(0(1)25)) atmeans noatlegend 

. marginsplot, noci recast(line)

Q) How can the nonunion prediction line in the graph above be drawn as a dashed line?

6. Polynomial Regression

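A polynomial regression model still belongs to the class of linear regression models because it is linear in the parameters, but the relationship between the dependent variable y and the explanatory variable x is nonlinear rather than linear. This is also called a curvilinear relationship.

Q) What is the difference between a linear regression model and a nonlinear regression model?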

$$ \text {Quadratic model: } y_i=\beta_0 + \beta_1 x_i +\beta_2 x^2_i +e_i $$ $$ \text {Cubic model: } y_i=\beta_0 + \beta_1 x_i +\beta_2 x^2_i +\beta_3 x^3_i+e_i $$

$$ \text {marginal effect: } \frac {\partial y}{\partial x} = ?? $$

. use R_data15_1, clear
(NLSW, 1988 extract)

. gen lwage=ln(wage)

. reg lwage tenure c.tenure#c.tenure 

      Source |       SS           df       MS      Number of obs   =     2,231
-------------+----------------------------------   F(2, 2228)      =    121.85
       Model |   72.193832         2   36.096916   Prob > F        =    0.0000
    Residual |  660.023236     2,228  .296240232   R-squared       =    0.0986
-------------+----------------------------------   Adj R-squared   =    0.0978
       Total |  732.217068     2,230  .328348461   Root MSE        =    .54428

-----------------------------------------------------------------------------------
            lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
------------------+----------------------------------------------------------------
           tenure |   .0610446   .0068122     8.96   0.000     .0476856    .0744035
                  |
c.tenure#c.tenure |  -.0016852   .0003661    -4.60   0.000    -.0024031   -.0009673
                  |
            _cons |   1.619121   .0223875    72.32   0.000     1.575218    1.663023
-----------------------------------------------------------------------------------

. reg lwage tenure c.tenure#c.tenure c.tenure#c.tenure#c.tenure  

      Source |       SS           df       MS      Number of obs   =     2,231
-------------+----------------------------------   F(3, 2227)      =     84.50
       Model |  74.8319576         3  24.9439859   Prob > F        =    0.0000
    Residual |   657.38511     2,227  .295188644   R-squared       =    0.1022
-------------+----------------------------------   Adj R-squared   =    0.1010
       Total |  732.217068     2,230  .328348461   Root MSE        =    .54331

--------------------------------------------------------------------------------------------
                     lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
                    tenure |   .0999755   .0146911     6.81   0.000     .0711658    .1287851
                           |
         c.tenure#c.tenure |  -.0069834   .0018096    -3.86   0.000    -.0105321   -.0034348
                           |
c.tenure#c.tenure#c.tenure |   .0001791   .0000599     2.99   0.003     .0000616    .0002966
                           |
                     _cons |   1.569498    .027838    56.38   0.000     1.514907    1.624089
--------------------------------------------------------------------------------------------

Q) Which command tests the hypothesis of a curvilinear relationship?

Q) In the quadratic model, at what value of tenure is the wage maximized?
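For the quadratic model, differentiating with respect to x gives $ \beta_1 + 2\beta_2 x $, which equals zero at $ x^* = -\beta_1/(2\beta_2) $; this is a maximum when $ \beta_2 < 0 $. The sketch below computes this turning point from the stored coefficients and assumes R_data15_1.dta is in the working directory.

```stata
* Assumes the course dataset R_data15_1.dta is in the working directory
use R_data15_1, clear
gen lwage = ln(wage)
quietly regress lwage tenure c.tenure#c.tenure

* Turning point of the quadratic: -b1 / (2*b2)
display "turning point = " -_b[tenure]/(2*_b[c.tenure#c.tenure])

* Same quantity with a delta-method standard error
nlcom -_b[tenure]/(2*_b[c.tenure#c.tenure])

* Marginal effect b1 + 2*b2*tenure evaluated at selected tenure values
margins, dydx(tenure) at(tenure=(0(5)25))
```

With the coefficients reported above (.0610446 and -.0016852), the turning point is roughly 18.1 years of tenure.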

Prediction graph

A prediction graph can be drawn while holding the other x variables fixed at their means.

. reg lwage tenure c.tenure#c.tenure   

. margins, at(tenure=(0(1)25)) atmeans noatlegend 

. marginsplot , legend(off) noci recast(line) ///
> addplot((function y=_b[_cons]+_b[tenure]*x+_b[c.tenure#c.tenure]*x^2, recast(area) range(18.05 18.1)))