Foundational Business Analytics - Coursework 2024-2025
Key Information
- Release Date: 14th October 2024 (dataset available on Moodle)
- Deadline: 5th December 2024, 3:00 pm
- Submission: Via Moodle coursework submission link on the FBA module web page
1. Problem Definition
A financial institution has been over-issuing loans to unqualified applicants to gain market share. This strategy has increased the number of loan defaults, causing significant financial losses. Traditionally, the company reacts only after a loan defaults, but this passive approach is unsustainable.
The institution plans to launch a proactive loan risk management programme to predict loans likely to default and intervene early to avoid losses. The historical dataset contains demographic, credit history, and repayment details, including whether loans defaulted. Your task as a consultant is to analyze this data and build a model to predict loan defaults. Additionally, you must provide business recommendations based on your analysis.
2. Important Message from the CEO
The CEO’s key directive:
- Predict loans likely to default to enable early intervention and minimize losses.
- Focus on avoiding financial losses from defaults, even if this risks minor customer dissatisfaction during investigations.
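One way to read this directive is that a missed default (false negative) costs the institution more than an unnecessary investigation (false positive). The sketch below illustrates how such an asymmetry could shift the decision threshold applied to predicted default probabilities. The cost values, labels, and probabilities are made-up assumptions for illustration only; the brief does not specify any of them.

```python
import numpy as np

def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Cost of flagging every loan whose predicted default probability >= threshold."""
    y_pred = y_prob >= threshold
    false_negatives = np.sum((y_true == 1) & ~y_pred)  # defaults that were not flagged
    false_positives = np.sum((y_true == 0) & y_pred)   # good loans flagged for investigation
    return cost_fn * false_negatives + cost_fp * false_positives

# Made-up labels and predicted probabilities (assumption, not from the dataset)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.45, 0.90])

# Hypothetical costs: a missed default is assumed to be ten times as costly
COST_FN = 10.0
COST_FP = 1.0

thresholds = np.linspace(0.05, 0.95, 19)
costs = [total_cost(y_true, y_prob, t, COST_FN, COST_FP) for t in thresholds]
best_threshold = thresholds[int(np.argmin(costs))]

print(f"Lowest-cost threshold: {best_threshold:.2f}")
print(f"Cost at default 0.5 threshold: {total_cost(y_true, y_prob, 0.5, COST_FN, COST_FP)}")
```

With costs skewed this way, the lowest-cost threshold tends to fall below 0.5, trading some false alarms for fewer missed defaults.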
3. The Available Dataset
You are provided with a unique dataset in CSV format containing 300,000 samples of loan data, accessible via Moodle. The dataset schema includes:
Type | Name | Description |
---|---|---|
ID | ID | Unique identifier for the loan listing. |
Input | Loan_Amount | Loan amount in dollars. |
Input | Term | Loan term (in years). |
Input | Interest_Rate | Loan interest rate. |
Input | Installment | Fixed regular payment for loan repayment. |
Input | Grade | Loan grade indicating risk (A to G; higher grades are better). |
Input | Sub_Grade | Detailed subcategory within the loan grade. |
Input | Employment_Duration | Borrower’s employment duration (in years). |
Input | Realestate_Ownership | Borrower’s real-estate ownership (categorical: rent, mortgage, etc.). |
Input | Annual_Income | Borrower’s annual income. |
Input | Purpose | Purpose of the loan (e.g., housing, vehicle, education, etc.). |
Input | DTI | Debt-to-income ratio. |
Input | FICO_Range_Low | Lower bound of the borrower’s FICO score range. |
Input | FICO_Range_High | Upper bound of the borrower’s FICO score range. |
Output | Y | Whether the loan defaulted (Y=0: no default; Y=1: default). |
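A quick schema check after loading can catch a wrong or truncated download early. The following is a minimal sketch that assumes the CSV column names match the table above exactly; the two sample rows are invented stand-ins for the real file, and `load_loans` is a hypothetical helper name.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = [
    "ID", "Loan_Amount", "Term", "Interest_Rate", "Installment", "Grade",
    "Sub_Grade", "Employment_Duration", "Realestate_Ownership",
    "Annual_Income", "Purpose", "DTI", "FICO_Range_Low", "FICO_Range_High", "Y",
]

def load_loans(source):
    """Read the CSV and check it against the schema listed in the table."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    if not set(df["Y"].dropna().unique()) <= {0, 1}:
        raise ValueError("Y should contain only 0 and 1")
    return df

# Invented two-row sample standing in for the real dataset (assumption)
sample_csv = io.StringIO(
    "ID,Loan_Amount,Term,Interest_Rate,Installment,Grade,Sub_Grade,"
    "Employment_Duration,Realestate_Ownership,Annual_Income,Purpose,DTI,"
    "FICO_Range_Low,FICO_Range_High,Y\n"
    "1,10000,3,11.5,329.8,B,B2,5,rent,52000,vehicle,18.2,680,684,0\n"
    "2,24000,5,17.9,608.1,D,D4,2,mortgage,76000,housing,27.5,665,669,1\n"
)

loans = load_loans(sample_csv)
print(loans.dtypes)                              # column types as inferred by pandas
print(loans["Y"].value_counts(normalize=True))   # class balance of the target
```

For the real data, pass the path of the file downloaded from Moodle to `load_loans` in place of `sample_csv`.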
4. Formal Task Specification
- Objective: Build a classification model to predict loan defaults.
- Process:
  - Statistical analysis of input features.
  - Model selection and training using Python 3 or Orange3.
  - Evaluate implications and provide business recommendations.
- Submission Requirements:
  - A maximum 8-page report (excluding the front page).
  - A zip file containing your model implementation with instructions for use.
5. Report Sections
Section A: Summarization (10 marks)
- Perform statistical analysis of the dataset, examining relationships between features and the target variable (Y).
- Use visuals like tables, bar charts, or scatter graphs to communicate insights clearly.
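For students working in Python, the kind of summary expected here could start from something like the sketch below. It uses synthetic data with a few of the column names from the schema; the values and the relationship between `Interest_Rate` and `Y` are fabricated for illustration and say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption: values are invented)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "Grade": rng.choice(list("ABCDEFG"), size=n),
    "Interest_Rate": rng.uniform(5, 25, size=n),
    "DTI": rng.uniform(0, 40, size=n),
})
# Synthetic target: default probability rises with the interest rate
df["Y"] = (rng.uniform(size=n) < df["Interest_Rate"] / 50).astype(int)

# Descriptive statistics for numeric inputs
numeric_summary = df[["Interest_Rate", "DTI"]].describe()

# Default rate and loan count per category of a categorical input
default_rate_by_grade = (
    df.groupby("Grade")["Y"]
    .agg(["mean", "count"])
    .rename(columns={"mean": "default_rate", "count": "n_loans"})
)

# Linear correlation of each numeric input with the binary target
correlation_with_target = df[["Interest_Rate", "DTI", "Y"]].corr()["Y"].drop("Y")

print(numeric_summary)
print(default_rate_by_grade)
print(correlation_with_target)
```

A table such as `default_rate_by_grade` converts directly into a bar chart, for example with `default_rate_by_grade["default_rate"].plot.bar()` when matplotlib is installed.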
Section B: Preparation and Exploration (15 marks)
- Describe your data cleaning and transformation processes, including handling missing values and outliers.
- Apply a decision tree to explore feature importance and sub-populations.
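The steps in this section could be sketched in Python as follows. This is one possible approach under stated assumptions, not a prescribed method: median imputation and clipping at the 1st and 99th percentiles are illustrative choices, the data is synthetic, and the tree depth and leaf size are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for the real data (assumption: values are invented)
rng = np.random.default_rng(1)
n = 2000
df = pd.DataFrame({
    "Annual_Income": rng.lognormal(mean=11, sigma=0.6, size=n),
    "DTI": rng.uniform(0, 40, size=n),
    "FICO_Range_Low": rng.integers(600, 800, size=n).astype(float),
})
df["Y"] = (rng.uniform(size=n) < df["DTI"] / 80).astype(int)

# Inject some missing values and one extreme outlier to clean up
df.loc[rng.choice(n, size=50, replace=False), "Annual_Income"] = np.nan
df.loc[0, "DTI"] = 9999.0

features = ["Annual_Income", "DTI", "FICO_Range_Low"]
clean = df.copy()
for col in features:
    # Fill missing values with the column median
    clean[col] = clean[col].fillna(clean[col].median())
    # Clip extreme values to the 1st and 99th percentiles
    low, high = clean[col].quantile([0.01, 0.99])
    clean[col] = clean[col].clip(lower=low, upper=high)

# A shallow tree keeps the splits readable for exploration
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=50, random_state=0)
tree.fit(clean[features], clean["Y"])

importances = pd.Series(tree.feature_importances_, index=features)
print(importances.sort_values(ascending=False))
print(export_text(tree, feature_names=features))
```

Each root-to-leaf path printed by `export_text` describes a sub-population, and the class mix in its leaf indicates how default-prone that group is.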
Section C: Model Evaluation (25 marks)
- Select and compare three classification models, chosen from Logistic Regression, Decision Trees, Random Forests, Naive Bayes, and k-Nearest Neighbours (KNN).
- Detail the models, parameters, and evaluation strategy (e.g., confusion matrices, performance metrics).
- Provide a thorough comparison against a benchmark predictor.
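A comparison of this kind might look like the sketch below, which evaluates three candidate models and a majority-class benchmark with stratified cross-validation. The data comes from scikit-learn's `make_classification`, and the choice of models, hyperparameters, fold count, and metrics are all illustrative assumptions rather than requirements of the brief.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the loan dataset (assumption)
X, y = make_classification(
    n_samples=1500, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

models = {
    "Benchmark (majority class)": DummyClassifier(strategy="most_frequent"),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Stratified folds keep the default rate similar in every split
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scoring = ["accuracy", "recall", "roc_auc"]

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
    results[name] = {m: scores[f"test_{m}"].mean() for m in scoring}

for name, metrics in results.items():
    summary = ", ".join(f"{m}={v:.3f}" for m, v in metrics.items())
    print(f"{name}: {summary}")
```

The benchmark shows why accuracy alone can mislead on imbalanced data: always predicting "no default" scores well on accuracy while catching no defaults at all, so recall on the default class is worth reporting alongside it.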
Section D: Final Assessment (5 marks)
- Justify your selected "winning" classifier, considering its business implications.
Section E: Model Implementation (5 marks)
- Train the final model on the entire dataset.
- Provide clear instructions for using the model on new data.
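For a Python submission, one way to meet both points is to fit a pipeline on all available rows, save it to disk, and provide a small function that loads it and scores new loans. The sketch below assumes a logistic regression pipeline and three of the schema's numeric features purely for illustration; the data is synthetic, and `predict_defaults` and the file name are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["Loan_Amount", "Interest_Rate", "DTI"]

# Synthetic stand-in for the full training dataset (assumption)
rng = np.random.default_rng(2)
n = 500
train = pd.DataFrame({
    "Loan_Amount": rng.uniform(1000, 40000, size=n),
    "Interest_Rate": rng.uniform(5, 25, size=n),
    "DTI": rng.uniform(0, 40, size=n),
})
train["Y"] = (rng.uniform(size=n) < train["Interest_Rate"] / 50).astype(int)

# Preprocessing and model travel together in one pipeline object
final_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
final_model.fit(train[FEATURES], train["Y"])

# Save the fitted pipeline (a temporary directory is used here for the demo)
model_path = os.path.join(tempfile.mkdtemp(), "loan_default_model.joblib")
joblib.dump(final_model, model_path)

def predict_defaults(new_data, path=model_path):
    """Load the saved pipeline and append predictions to a copy of new_data."""
    model = joblib.load(path)
    out = new_data.copy()
    out["default_probability"] = model.predict_proba(new_data[FEATURES])[:, 1]
    out["predicted_Y"] = model.predict(new_data[FEATURES])
    return out

# Two invented new loans to score
new_loans = pd.DataFrame({
    "Loan_Amount": [12000, 30000],
    "Interest_Rate": [8.5, 22.0],
    "DTI": [12.0, 35.0],
})
predictions = predict_defaults(new_loans)
print(predictions)
```

Saving the whole pipeline, rather than the classifier alone, means new data passes through the same scaling that was fitted during training.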
Section F: Business Recommendations (10 marks)
- Summarize business recommendations based on your findings.
6. Further Available Marks
- Report Presentation: Clarity, professionalism, and argument quality (5 marks).
- Model Implementation: Code quality and usability for making predictions on new data (20 marks).
- Model Effectiveness: Performance on a hidden test dataset (5 marks).
7. Submission Guidelines
- Submit a zip file containing:
  - The final report (maximum 8 pages).
  - Model code/workflow files.
- Late submissions incur a 5% penalty per day.
- Only the first 8 pages of the report will be assessed.
8. Final Notes on Plagiarism
- Each student’s dataset is unique to ensure individual work.
- Submissions will be checked for originality; plagiarized work receives zero marks.
9. Additional Tips
- Focus on robust model evaluation and understanding over achieving perfect predictions.
- Presentation matters; format your report professionally.
- Using Python offers opportunities for advanced analysis but is not required to score high marks.
- Decision trees and visuals should be concise and informative.