Expert BDA601 Assignment Help


ASSESSMENT 2 BRIEF
Subject Code and Title	BDA601—Big Data and Analytics
Assessment	Visualisation and Model Development
Individual/Group	Individual
Length	Source Code and Report 1,000 words (+/-10%)
Learning Outcomes	The Subject Learning Outcomes demonstrated by the successful completion of the task below include: c) Apply data science principles to the cleaning, manipulation and visualisation of data; d) Design analytical models based on a given problem; and e) Effectively report and communicate findings to an appropriate audience.
Submission	Due by 11.55 pm AEST on the Sunday at the end of Module 8.
Weighting	30%
Total Marks	100 marks

Task Summary

Customer churn, also known as customer attrition, refers to the movement of customers from one service provider to another. It is well known that attracting new customers costs significantly more than retaining existing customers. Additionally, long-term customers are found to be less costly to serve and less sensitive to competitors’ marketing activities. Thus, predicting customer churn is valuable to telecommunication industries, utility service providers, paid television channels, insurance companies and other business organisations providing subscription-based services. Customer-churn prediction allows for targeted retention planning.

In this Assessment, you will build a machine learning (ML) model to predict customer churn using the principles of ML and big data tools.

As part of this Assessment, you will write a 1,000-word report that will include the following:

a) A predictive model from a given dataset that follows data mining principles and techniques;

b) Explanations as to how to handle missing values in a dataset; and

c) An interpretation of the outcomes of the customer churn analysis.

Please refer to the Task Instructions (below) for details on how to complete this task.

Task Instructions

1. Dataset Construction

The Kaggle Telco churn dataset is a sample dataset from IBM containing 21 attributes for 7,043 telecommunication customers. In this Assessment, you are required to work with a modified version of this dataset (the dataset can be found at the URL provided below). Modify the dataset by removing the following attributes: MonthlyCharges, OnlineSecurity, StreamingTV, InternetService and Partner.

As the dataset is in .csv format, any spreadsheet application, such as Microsoft Excel or Open Office Calc, can be used to modify it. You will use your resulting dataset, which should comprise 7,043 observations and 16 attributes, to complete the subsequent tasks. The ‘Churn’ attribute (i.e., the last attribute in the dataset) is the target of your churn analysis.
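The column removal can also be scripted instead of being done by hand in a spreadsheet. The following is a minimal sketch using Python's standard csv module. The function name drop_attributes and the two sample rows are illustrative assumptions, and only a subset of the dataset's columns is shown.

```python
import csv
import io

# Attributes the brief asks to remove.
DROP = ["MonthlyCharges", "OnlineSecurity", "StreamingTV", "InternetService", "Partner"]

def drop_attributes(src, dst, to_drop):
    """Copy CSV rows from src to dst, omitting the named columns.

    Hypothetical helper: returns the kept column names and the row count.
    """
    reader = csv.DictReader(src)
    missing = [c for c in to_drop if c not in reader.fieldnames]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    kept = [c for c in reader.fieldnames if c not in to_drop]
    # extrasaction="ignore" silently skips the dropped keys in each row.
    writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
    writer.writeheader()
    rows = 0
    for row in reader:
        writer.writerow(row)
        rows += 1
    return kept, rows

# Toy stand-in for the real file: invented rows, reduced set of columns.
sample = io.StringIO(
    "customerID,Partner,tenure,InternetService,OnlineSecurity,"
    "StreamingTV,MonthlyCharges,TotalCharges,Churn\n"
    "A-001,Yes,1,DSL,No,No,29.85,29.85,No\n"
    "A-002,No,34,DSL,Yes,No,56.95,1889.5,Yes\n"
)
out = io.StringIO()
kept_columns, row_count = drop_attributes(sample, out, DROP)
print(kept_columns)
print(row_count)
```

On the real file, the same function would be called with open file handles. The result can then be checked against the brief: 16 attributes (21 minus the 5 removed) and 7,043 observations.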

2. Model Development

From the dataset constructed in the previous step, present appropriate data visualisation and descriptive statistics, then develop a ‘decision-tree’ model to predict customer churn. The model can be developed in Jupyter Notebook using Python and Spark’s Machine Learning Library (PySpark MLlib). You can use any other platform if you find it more efficient. The notebook should include the following sections:

a) Problem Statement

In this section, briefly state the context and the problem you will solve in the notebook.

b) Exploratory Data Analysis

In this section, perform both a visual and statistical exploratory analysis to gain insights about the dataset.
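As an illustration of the statistical side of this step, the sketch below computes measures of central tendency and dispersion for one numeric attribute, grouped by the churn label. It uses only Python's standard statistics module. The tenure values and the helper names summarise and summarise_by_label are invented for illustration and are not taken from the dataset.

```python
import statistics

# Toy (tenure in months, churn label) pairs; invented for illustration only.
records = [
    (1, "Yes"), (2, "Yes"), (5, "Yes"), (8, "Yes"),
    (12, "No"), (24, "No"), (36, "No"), (48, "No"), (60, "No"), (72, "No"),
]

def summarise(values):
    """Return central tendency and dispersion measures for a numeric list."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }

def summarise_by_label(pairs):
    """Group values by label, then summarise each group."""
    groups = {}
    for value, label in pairs:
        groups.setdefault(label, []).append(value)
    return {label: summarise(vals) for label, vals in groups.items()}

# Share of customers labelled as churned (class balance of the target).
churn_rate = sum(label == "Yes" for _, label in records) / len(records)
by_churn = summarise_by_label(records)

print(f"churn rate: {churn_rate:.0%}")
for label, stats in by_churn.items():
    print(label, stats)
```

Comparing the per-group summaries is one simple way to see whether an attribute separates churners from non-churners before any model is built.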

c) Data Cleaning and Feature Selection

In this section, perform data pre-processing and feature selection for the model, which you will build in the next section.
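Two common pre-processing checks are sketched here under stated assumptions: converting a numeric column that was read as text (treating blank fields as missing), and measuring the correlation between two numeric attributes to spot redundancy. The values and the helper names parse_number and pearson are hypothetical.

```python
import math

def parse_number(text):
    """Convert a CSV field to float, treating blank strings as missing (None)."""
    text = text.strip()
    return float(text) if text else None

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy columns as they might arrive from a CSV reader (all strings).
tenure_raw = ["1", "2", "3", "4", "5"]
total_raw = [" ", "40.0", "60.0", "80.0", "100.0"]  # first value is blank

pairs = [(parse_number(t), parse_number(c)) for t, c in zip(tenure_raw, total_raw)]
# Keep only rows where both values are present before correlating.
complete = [(t, c) for t, c in pairs if t is not None and c is not None]
n_missing = len(pairs) - len(complete)

r = pearson([t for t, _ in complete], [c for _, c in complete])
print(f"rows with a missing value: {n_missing}")
print(f"correlation: {r:.2f}")
```

A coefficient close to 1 or -1 suggests that two attributes carry largely the same information, which is one possible argument for keeping only one of them as a feature.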

d) Model Building

In this section, use the pre-processed data and the selected features to build a ‘decision-tree’ model to predict customer churn.
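To make concrete what a decision tree does when it chooses a split, the sketch below scores categorical attributes by weighted Gini impurity and picks the one with the lowest score. This is a plain-Python illustration, not the Spark MLlib implementation. The rows, the multiway split and the helper names are simplifying assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, attribute, target="Churn"):
    """Impurity after splitting rows on every distinct value of attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

def best_split(rows, attributes, target="Churn"):
    """Return the attribute whose split leaves the least impurity."""
    return min(attributes, key=lambda a: weighted_gini(rows, a, target))

# Invented toy rows; the attribute names mirror columns kept in the dataset.
rows = [
    {"Contract": "Month-to-month", "PaperlessBilling": "Yes", "Churn": "Yes"},
    {"Contract": "Month-to-month", "PaperlessBilling": "No", "Churn": "Yes"},
    {"Contract": "Month-to-month", "PaperlessBilling": "Yes", "Churn": "Yes"},
    {"Contract": "Two year", "PaperlessBilling": "Yes", "Churn": "No"},
    {"Contract": "Two year", "PaperlessBilling": "No", "Churn": "No"},
    {"Contract": "Two year", "PaperlessBilling": "No", "Churn": "No"},
]

root_impurity = gini([row["Churn"] for row in rows])
chosen = best_split(rows, ["Contract", "PaperlessBilling"])
print(f"impurity before split: {root_impurity:.2f}")
print(f"attribute chosen at the root: {chosen}")
```

In this toy data, the chosen attribute separates the two classes completely. A full tree repeats the same selection inside each resulting group until a stopping rule, such as a maximum depth, is reached.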

In the notebook, the code should be well documented, the graphs and charts should be neatly labelled, the narrative text should clearly state the objectives and a logical justification for each of the steps should be provided.

3. Handling Missing Values

The given dataset has very few missing values; however, in a real-world scenario, data scientists often need to work with datasets that have many missing values. If an attribute is important for building an effective model and has a significant number of missing values, data scientists need to devise strategies to handle them.

From the ‘decision-tree’ model built in the previous step, identify the most important attribute. Assuming that a significant number of values were missing from this attribute’s column, implement a method to replace the missing values and describe that method in your report.
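One simple replacement method is to fill missing entries with a typical observed value: the median for a numeric attribute or the most frequent category for a categorical one. The sketch below shows this under the assumption that missing entries are represented as None. The function name impute and the sample values are illustrative, and other methods (for example, model-based imputation) may suit the data better.

```python
import statistics

def impute(values, strategy="median"):
    """Replace None entries with the median or mode of the observed values.

    Returns the filled list and the fill value that was used.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "mode":
        fill = statistics.mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values], fill

# Numeric attribute with gaps (invented values).
tenure = [1, None, 3, 5, None]
tenure_filled, tenure_fill = impute(tenure, "median")

# Categorical attribute with a gap (invented values).
contract = ["Month-to-month", "Two year", "Month-to-month", None]
contract_filled, contract_fill = impute(contract, "mode")

print(tenure_filled, tenure_fill)
print(contract_filled, contract_fill)
```

If the data are split into training and test sets, computing the fill value from the training set only and reusing it on the test set avoids leaking test information into the model.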

4. Interpretation of Churn Analysis

Modelling churn is difficult because there is inherent uncertainty when measuring churn. Thus, it is important not only to understand any limitations associated with a churn analysis but also to be able to interpret the outcomes of a churn analysis.

In your report, interpret and describe the key findings that you were able to discover as part of your churn analysis. Describe the following facts with supporting details:

• The effectiveness of your churn analysis: In what percentage of cases did your analysis correctly identify churn? Can this be considered a satisfactory outcome? Explain why or why not;

• Who is churning: Describe the attributes of the customers who are churning and explain what is driving the churn; and

• Improving the accuracy of your churn analysis: Describe the effects that your previous steps (model development and the handling of missing values) had on the outcome of your churn analysis, and explain how the accuracy of your churn analysis could be improved.
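When judging effectiveness, overall accuracy can be compared with the share of actual churners that the model finds (recall) and with a trivial baseline that always predicts the majority class. The sketch below computes these from actual and predicted labels. The labels are invented, and the helper names confusion_counts and evaluate are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="Yes"):
    """Count true/false positives and negatives for the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def evaluate(y_true, y_pred, positive="Yes"):
    """Return accuracy, precision, recall and a majority-class baseline."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Accuracy obtained by always predicting the more frequent class.
        "majority_baseline": max(tp + fn, fp + tn) / total,
    }

# Invented labels: 3 churners and 7 non-churners.
y_true = ["Yes", "Yes", "Yes", "No", "No", "No", "No", "No", "No", "No"]
y_pred = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "No", "No", "No"]

report = evaluate(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

Because churners are usually the minority class, an accuracy figure only slightly above the majority baseline may not be a satisfactory outcome even if it looks high in isolation.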

Submission Instructions

• Zip the following files and submit the .zip files via the Assessment link in the main navigation menu in BDA601—Big Data and Analytics:

o Modified dataset (.csv file) constructed in Task 1;

o Notebook (.ipynb file) from Task 2; and

o Report (.pdf file) from Task 3.

The Learning Facilitator will provide feedback via the Grade Centre in the LMS portal. Feedback can be viewed in My Grades.

Academic Integrity Declaration

I declare that except where referenced, the work I am submitting for this assessment task is my own work. I have read and am aware of the Academic Integrity Policy and Procedure of Torrens University, Australia, viewable online at

I am also aware that I need to keep a copy of all submitted material and any drafts and I agree to do so.

Assessment Rubric

Grade levels applied to each assessment attribute
Fail	Yet to Achieve Minimum Standard	0–49%
Pass	Functional	50–64%
Credit	Proficient	65–74%
Distinction	Advanced	75–84%
High Distinction	Exceptional	85–100%

Knowledge and understanding of exploratory data analysis (15%)

Fail (0–49%): Demonstrates partial or unsatisfactory knowledge and understanding of the exploratory data analysis. Demonstrates unsatisfactory skills in:

• Exploring the data using both the measure of central tendency and the measure of dispersions; and/or

• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.

Pass (50–64%): Demonstrates functional knowledge and understanding of the exploratory data analysis. Demonstrates satisfactory skills in:

• Exploring the data using both the measure of central tendency and the measure of dispersions; and

• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.

Credit (65–74%): Demonstrates solid knowledge and understanding of the exploratory data analysis. Demonstrates solid skills in:

• Exploring the data using both the measure of central tendency and the measure of dispersions; and

• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.

• Only selective statistics were produced from the above-mentioned visuals.

Distinction (75–84%): Demonstrates advanced knowledge and understanding of the exploratory data analysis. Demonstrates advanced skills in:

• Exploring the data using both the measure of central tendency and the measure of dispersions; and

• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.

• Appropriate statistics were produced from the above-mentioned visuals.

High Distinction (85–100%): Demonstrates exceptional knowledge and understanding of the exploratory data analysis. Demonstrates exemplary skills in:

• Exploring the data using both the measure of central tendency and the measure of dispersions; and

• Exploring the data using various visual representations, such as a histogram, scatter plot, box plot, heatmap, pair plot or probability distribution plot.

• Appropriate statistics were produced from the above-mentioned visuals.

• Gained unique insights about the dataset through the statistical observations.

Analytical design for data pre-processing and feature selection (15%)

Fail (0–49%): Demonstrates partial or unsatisfactory knowledge and understanding of data pre-processing and feature selection. Completed less than 50% of the following tasks and the tasks completed were unsatisfactory in terms of quality, accuracy and completeness:

• Handling data anomalies;

• Conducting the redundancy and correlation analysis; and/or

• Selecting the feature for model building.

Pass (50–64%): Demonstrates satisfactory knowledge and understanding of data pre-processing and feature selection. Completed most of the following tasks with accuracy and completeness to a satisfactory quality:

• Handling data anomalies;

• Conducting the redundancy and correlation analysis; and/or

• Selecting the feature for model building.

Credit (65–74%): Demonstrates solid knowledge and understanding of data pre-processing and feature selection. Completed most of the following tasks with accuracy and completeness to a good quality:

• Handling data anomalies;

• Conducting the redundancy and correlation analysis;

• Selecting the feature for model building; and

• Correctly interpreted 2 of the above tasks.

Distinction (75–84%): Demonstrates advanced knowledge and understanding of data pre-processing and feature selection. Completed all of the following tasks with accuracy and completeness to a high quality:

• Handling data anomalies;

• Conducting the redundancy and correlation analysis;

• Selecting the feature for model building; and

• Correctly interpreted all 3 of the above tasks.

High Distinction (85–100%): Demonstrates exceptional knowledge and understanding of data pre-processing and feature selection. Completed all of the following tasks with accuracy and completeness to an exceptionally high quality:

• Handling data anomalies;

• Conducting the redundancy and correlation analysis;

• Selecting the feature for model building;

• Correctly interpreted all 3 of the above tasks; and

• Relevant analytical insights were presented as part of the interpretation.

Predictive model building (20%)

Fail (0–49%): Demonstrates partial or unsatisfactory knowledge and understanding of predictive model building. Completed less than 50% of the following tasks and the tasks completed were unsatisfactory in terms of quality, accuracy and completeness:

• Appropriately used the data for training, validation and testing;

• Built a ‘decision-tree’ model using Spark’s MLlib library;

• Graphically represented the decision-tree model; and/or

• Correctly interpreted the decision-tree model.

Pass (50–64%): Demonstrates satisfactory knowledge and understanding of predictive model building. Completed most of the following tasks with accuracy and completeness to a satisfactory quality:

• Appropriately used the data for training, validation and testing;

• Built a ‘decision-tree’ model using Spark’s MLlib library;

• Graphically represented the decision-tree model.

Credit (65–74%): Demonstrates solid knowledge and understanding of predictive model building. Completed most of the following tasks with accuracy and completeness to a good quality:

• Appropriately used the data for training, validation and testing;

• Built a ‘decision-tree’ model using Spark’s MLlib library;

• Graphically represented the decision-tree model; and/or

• Produced an ambiguous interpretation of the decision-tree model.

Distinction (75–84%): Demonstrates advanced knowledge and understanding of predictive model building. Completed all of the following tasks with accuracy and completeness to a high quality:

• Appropriately used the data for training, validation and testing;

• Built a ‘decision-tree’ model using Spark’s MLlib library;

• Graphically represented the decision-tree model; and

• Correctly interpreted the decision-tree model.

High Distinction (85–100%): Demonstrates exceptional knowledge and understanding of predictive model building. Completed all of the following tasks with accuracy and completeness to an exceptionally high quality:

• Appropriately used the data for training, validation and testing;

• Built a ‘decision-tree’ model using Spark’s MLlib library;

• Graphically represented the decision-tree model;

• Correctly interpreted the decision-tree model; and

• Discovered unique observations through the interpretation of the model.

Clarity and presentation of the notebook (10%)

Fail (0–49%):

• Lacks overall organisation.

• Code is documented unsatisfactorily.

• Charts and graphs are of unsatisfactory quality.

• Narrative texts are difficult to follow.

Pass (50–64%):

• Not well organised for the most part.

• Code is documented satisfactorily.

• Charts and graphs are of satisfactory quality.

• Narrative texts are not cohesive but can still be followed.

Credit (65–74%):

• Organised for the most part.

• Code is very well documented.

• Charts and graphs are neat and of good quality.

• Narrative texts are mostly cohesive.

Distinction (75–84%):

• Well organised.

• Code is very well documented.

• Charts and graphs are neat and of high quality.

• Narrative texts are highly cohesive and easy to follow.

High Distinction (85–100%):

• Exceptionally organised.

• Code is exceptionally well documented.

• Charts and graphs are neat and of exceptional quality.

• Narrative texts are highly cohesive and easy to follow.

Knowledge and understanding of missing value handling strategy (10%)

Fail (0–49%): Demonstrates partial or unsatisfactory knowledge and understanding of a missing value handling strategy.

• Does not correctly identify the most important attribute from the decision tree.

• The formulated strategies are unsatisfactory in terms of accuracy and completeness.

• The overall organisation and presentation of the report is unsatisfactory.

Pass (50–64%): Demonstrates satisfactory knowledge and understanding of a missing value handling strategy.

• Correctly identifies the most important attribute from the decision tree.

• The formulated strategies are satisfactorily accurate and complete.

• The overall organisation and presentation of the report is satisfactory.

Credit (65–74%): Demonstrates solid knowledge and understanding of a missing value handling strategy.

• Correctly identifies the most important attribute from the decision tree.

• The formulated strategies are mostly accurate and complete.

• The overall organisation and presentation of the report is good.

Distinction (75–84%): Demonstrates advanced knowledge and understanding of a missing value handling strategy.

• Correctly identifies the most important attribute from the decision tree.

• The formulated strategies are accurate and mostly complete.

• The overall organisation and presentation of the report is exceptionally good.

High Distinction (85–100%): Demonstrates exceptional knowledge and understanding of a missing value handling strategy.

• Correctly identifies the most important attribute from the decision tree.

• The formulated strategies are accurate and complete.

• The overall organisation and presentation of the report is exemplary.

Interpretation of data analysis (30%)

Fail (0–49%): The outcomes and discussions were not focused and missed all the following points:

• A very limited number of outcomes were produced and the related discussions were poor;

• The analysis produced hardly any insights; and

• Any possible performance improvements were entirely missed.

Pass (50–64%): The outcomes and discussions were limited in focus and missed at least two of the following points:

• The outcomes were measured, but the related discussions were only basic;

• The analysis produced some basic insights; and/or

• Possible performance improvements were evaluated at a basic level.

Credit (65–74%): The outcomes and discussions were focused and missed at least one of the following points:

• The outcomes were measured and the related discussions were solid;

• The analysis produced some solid insights; and/or

• Possible performance improvements were evaluated.

Distinction (75–84%): The outcomes and discussions were mostly focused and included all of the following points:

• The outcomes were measured and the related discussions were advanced;

• The analysis produced advanced insights; and

• Possible performance improvements were mostly evaluated.

High Distinction (85–100%): The outcomes and discussions were well focused and included all of the following points:

• The outcomes were measured and the related discussions were exceptional;

• The analysis produced thought-provoking insights; and

• Possible performance improvements were fully and correctly evaluated.

The following Subject Learning Outcomes are addressed in this assessment
SLO c)	Apply data science principles to the cleaning, manipulation and visualisation of data.
SLO d)	Design analytical models based on a given problem.
SLO e)	Effectively report and communicate findings to an appropriate audience.