ECON41820 - Assignment


Econometrics Project (2024)

Submission Information

  • Due Date: 3 PM, Friday, December 6th
  • Submission Requirements:
    • A report (maximum 1500 words) with proper tables.
    • Include the coding file as an appendix.
    • Submit a hard copy to either the instructor (D203) or the School Office (G201). If you work in a pair, one submission suffices.

Project Overview

Using the dataset earnings.xlsx, analyze the causal relationship between schooling and earnings. The key tasks include:

  1. Exploring potential endogeneity of regressors.
  2. Proposing instruments and assessing their validity.
  3. Obtaining reliable and unbiased estimates of the returns to education.

The analysis should draw on your understanding of endogeneity and include a detailed exploration of the data.

Analysis Tasks

1. Summary Statistics

  • Provide summary statistics of the key variables used in the analysis.
  • Include commentary on the observed statistics.
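
As an illustration of the kind of table this task asks for, here is a minimal Python sketch that computes the count, mean, standard deviation, minimum, and maximum of each variable. It runs on synthetic stand-in data, not on earnings.xlsx. The helper name summarize and the column name lwage are assumptions made for this example; with the real data you would first load the spreadsheet (for instance with pandas.read_excel) and check the actual column names.

```python
import numpy as np

def summarize(columns):
    """Return {name: {n, mean, sd, min, max}} for a dict of 1-D arrays."""
    out = {}
    for name, values in columns.items():
        x = np.asarray(values, dtype=float)
        x = x[~np.isnan(x)]  # drop missing values before summarizing
        out[name] = {
            "n": int(x.size),
            "mean": float(x.mean()),
            "sd": float(x.std(ddof=1)),  # sample standard deviation
            "min": float(x.min()),
            "max": float(x.max()),
        }
    return out

# Synthetic stand-in data (NOT the assignment dataset).
rng = np.random.default_rng(0)
demo = {
    "educ": rng.integers(8, 19, size=200).astype(float),  # years of schooling, 8 to 18
    "lwage": rng.normal(6.2, 0.4, size=200),              # hypothetical log-wage column
}

stats = summarize(demo)
for name, s in stats.items():
    print(f"{name:<6} n={s['n']:>4} mean={s['mean']:.3f} sd={s['sd']:.3f} "
          f"min={s['min']:.3f} max={s['max']:.3f}")
```

A table like this is only a starting point; the commentary should note anything that matters for the later regressions, such as missing values or limited variation in a regressor.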

2. Baseline OLS Regression

  • Estimate the baseline regression specification:
    \log(\text{wage}) = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{exper} + \beta_3 \text{expersq} + \beta_4 \text{black} + \beta_5 \text{south} + \beta_6 \text{smsa} + \varepsilon
  • Consider additional variables or transformations/interactions that may affect earnings.
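
The baseline specification can be sketched in Python as follows. This is a minimal illustration using NumPy and synthetic data whose coefficients are made up for the example; it is not a result from earnings.xlsx. The function name ols and the variable lwage are assumptions, and the standard errors are the classical (homoskedastic) ones. In practice a regression package would report the same quantities.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients and classical standard errors.

    X must already contain a column of ones for the intercept.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # error variance estimate
    se = np.sqrt(np.diag(sigma2 * xtx_inv))   # classical standard errors
    return beta, se

# Synthetic stand-in data; the "true" coefficients below are invented.
rng = np.random.default_rng(42)
n = 2000
educ = rng.integers(8, 19, size=n).astype(float)
exper = rng.integers(0, 21, size=n).astype(float)
black = rng.binomial(1, 0.2, size=n).astype(float)
south = rng.binomial(1, 0.4, size=n).astype(float)
smsa = rng.binomial(1, 0.7, size=n).astype(float)
lwage = (4.7 + 0.07 * educ + 0.08 * exper - 0.002 * exper**2
         - 0.20 * black - 0.15 * south + 0.14 * smsa
         + rng.normal(scale=0.3, size=n))

names = ["const", "educ", "exper", "expersq", "black", "south", "smsa"]
X = np.column_stack([np.ones(n), educ, exper, exper**2, black, south, smsa])
beta, se = ols(lwage, X)

for name, b, s in zip(names, beta, se):
    print(f"{name:<8} coef={b: .4f}  se={s:.4f}")
```

Because the dependent variable is in logs, the coefficient on educ is read as the approximate proportional change in wages for one more year of schooling.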

3. Endogeneity Exploration

  • Identify Potential Endogeneity: Assess whether variables such as educ, exper, and expersq are endogenous.
  • Instrument Selection:
    • Identify possible instruments.
    • Justify the choice of instruments.
  • Endogeneity Testing:
    • Test for endogeneity of educ, exper, and expersq.
    • Test the validity of the chosen instruments.
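
The tests in this section can be sketched as below. The example computes three common diagnostics on synthetic data: a first-stage F statistic for the excluded instruments, a regression-based (control function) Hausman test for endogeneity of educ, and a Sargan overidentification statistic. The instruments z1 and z2, the unobserved ability variable, and all coefficients are hypothetical and exist only to make the example run. This is one possible approach under homoskedasticity, not a prescribed method.

```python
import numpy as np

def fit(y, X):
    """Least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

# Synthetic data: educ is endogenous because it depends on unobserved ability.
rng = np.random.default_rng(1)
n = 5000
ability = rng.normal(size=n)                        # unobserved in practice
z1 = rng.binomial(1, 0.5, size=n).astype(float)     # hypothetical instrument 1
z2 = rng.normal(size=n)                             # hypothetical instrument 2
exper = rng.uniform(0, 20, size=n)
educ = 12 + 1.0 * z1 + 0.5 * z2 + 1.0 * ability + rng.normal(size=n)
lwage = (4.5 + 0.08 * educ + 0.03 * exper + 0.2 * ability
         + rng.normal(scale=0.3, size=n))

W = np.column_stack([np.ones(n), exper])   # included exogenous regressors
Z = np.column_stack([W, z1, z2])           # full instrument set
q = 2                                      # number of excluded instruments

# 1) First stage: F statistic for joint significance of z1 and z2.
_, v_hat = fit(educ, Z)                    # unrestricted first-stage residuals
_, v_res = fit(educ, W)                    # restricted (instruments dropped)
ssr_u, ssr_r = v_hat @ v_hat, v_res @ v_res
first_stage_F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - Z.shape[1]))

# 2) Control function test: add first-stage residuals to the wage equation.
#    A significant coefficient on v_hat is evidence that educ is endogenous.
X_aug = np.column_stack([W, educ, v_hat])
b_aug, e_aug = fit(lwage, X_aug)
sigma2 = e_aug @ e_aug / (n - X_aug.shape[1])
se_aug = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_aug.T @ X_aug)))
hausman_t = b_aug[-1] / se_aug[-1]

# 3) Sargan statistic: regress 2SLS residuals on all instruments; n * R^2 is
#    approximately chi-squared with (q - 1) degrees of freedom if valid.
educ_hat = educ - v_hat
b_iv, _ = fit(lwage, np.column_stack([W, educ_hat]))
u_iv = lwage - np.column_stack([W, educ]) @ b_iv    # residuals use actual educ
_, e_s = fit(u_iv, Z)
centered = u_iv - u_iv.mean()
sargan = n * (1 - (e_s @ e_s) / (centered @ centered))

print(f"first-stage F = {first_stage_F:.1f}")
print(f"Hausman t on first-stage residual = {hausman_t:.2f}")
print(f"Sargan statistic = {sargan:.3f} (compare with chi-squared, {q - 1} df)")
```

The Sargan statistic is only available when there are more instruments than endogenous regressors, and it cannot establish validity on its own; the economic justification of each instrument still has to be argued in the report.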

4. Unbiased Estimates of Returns to Education

  • Provide unbiased estimates of the returns to education using the selected instruments.
  • Compare these estimates to those obtained from OLS regression.
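
A minimal sketch of the comparison between OLS and two-stage least squares (2SLS) is given below. It again uses synthetic data with hypothetical instruments z1 and z2, so the numbers say nothing about the real dataset, and the direction of the gap between the two estimates is a feature of this made-up example only. The function name tsls is an assumption. Note that the 2SLS standard errors are built from residuals computed with the actual regressors, not the first-stage fitted values.

```python
import numpy as np

def tsls(y, X, Z):
    """2SLS coefficients and classical standard errors.

    X: regressors (exogenous and endogenous), including the constant.
    Z: instruments, including the exogenous regressors and the constant.
    """
    n, k = X.shape
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # project X on Z
    beta = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
    resid = y - X @ beta                                # actual X, not X_hat
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X_hat.T @ X_hat)))
    return beta, se

# Synthetic data: the assumed "true" return to education is 0.08.
rng = np.random.default_rng(7)
n = 5000
ability = rng.normal(size=n)                        # unobserved in practice
z1 = rng.binomial(1, 0.5, size=n).astype(float)     # hypothetical instrument 1
z2 = rng.normal(size=n)                             # hypothetical instrument 2
exper = rng.uniform(0, 20, size=n)
educ = 12 + 1.0 * z1 + 0.5 * z2 + 1.0 * ability + rng.normal(size=n)
lwage = (4.5 + 0.08 * educ + 0.03 * exper + 0.2 * ability
         + rng.normal(scale=0.3, size=n))

X = np.column_stack([np.ones(n), exper, educ])
Z = np.column_stack([np.ones(n), exper, z1, z2])

b_ols = np.linalg.lstsq(X, lwage, rcond=None)[0]
b_iv, se_iv = tsls(lwage, X, Z)

print(f"OLS  return to educ: {b_ols[2]:.4f}")
print(f"2SLS return to educ: {b_iv[2]:.4f} (se {se_iv[2]:.4f})")
```

In this example OLS overstates the return because ability raises both schooling and wages, while 2SLS stays close to the assumed value. With real data the sign and size of the difference depend on the source of endogeneity and on the instruments chosen.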

Dataset Information

The dataset, earnings.xlsx, is derived from the Young Men Cohort of the National Longitudinal Survey (NLSYM):

  • Original survey conducted in 1966 on 5525 men aged 14-24.
  • Follow-up survey conducted in 1976 with 3010 respondents.
  • Key Notes:
    • This is not a random sample of the US population.
    • Ignore the sample weights for your analysis.
    • Regional dummy variables (south66, south) refer to residence in 1966 and 1976, respectively.
    • smsa indicates residence in a Standard Metropolitan Statistical Area (SMSA), a classification based on population density.
    • Knowledge of the World of Work: A score variable derived from responses about job activities, education requirements, and relative earnings. It correlates with education and wages and may serve as a proxy for ability.

Additional Resources

An article by David Card (1995) related to this dataset has been uploaded. You may consult it for ideas on selecting instruments, but this is not a replication exercise.

Grading Criteria

  • Quality of Write-Up: Clarity and professionalism of the report.
  • Results Presentation: Appropriateness and accuracy of tables and explanations.
  • Statistical Analysis: Validity and robustness of the econometric approach.


Collaboration and Submission Rules

  1. Work individually or in pairs (no groups larger than 2).
  2. Sign the submission record when handing in the project.
  3. Ensure results and coding are distinct from other students' work.

Notes on Variables

  • Regional dummies refer to 1966 residence unless specified (e.g., south refers to 1976 residence).
  • The Knowledge of the World of Work variable reflects the respondent’s understanding of job-related activities, education requirements, and relative earnings in their occupation.


