You are required to search for and choose a suitable public dataset. You will clean and format the data, create visualisation charts, and report on key information contained in the data. You will also complete and submit a report describing the process that you followed, the analysis you completed, and your findings.
Please refer to the instructions section below for details on how to complete this task.
Data exploration, also known as “exploratory data analysis”, is the first step in data analysis. In this step, a set of simple tools is used to gain a basic understanding of the data types, formats, and structure of a dataset. The results of data exploration are extremely useful for grasping the structure of the data, the distribution of the values, the presence of outliers, and the interrelationships within the dataset. Simple descriptive statistics are useful for exploring numerical data, especially for establishing their averages, frequencies, and variability.
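The short Python sketch below shows these three ideas for a small list of invented values. It computes an average, a frequency count and some measures of variability. Python is not required for this task. The variable name `units_sold` and its values are hypothetical. In Excel you would get similar figures with functions such as AVERAGE, COUNTIF and STDEV.S.

```python
from collections import Counter
import statistics

# Hypothetical numerical column, e.g. units sold per order (invented values).
units_sold = [3, 5, 5, 2, 8, 5, 3, 10]

def describe(values):
    """Return simple descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),          # average
        "median": statistics.median(values),      # middle value
        "stdev": statistics.stdev(values),        # sample standard deviation (similar to Excel STDEV.S)
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),       # a simple measure of variability
    }

summary = describe(units_sold)
frequencies = Counter(units_sold)  # how often each value occurs

print(summary)
print(frequencies.most_common(2))
```

A large standard deviation or range compared with the mean indicates that the values are widely spread. A frequency count quickly shows the most common values.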
In Modules 1 and 2, you were introduced to data, types of data and data formats. You learnt why datasets need to be explored before they are used for any analysis. You also learnt methods and approaches for exploring and handling data. Assessment task 1 will allow you to demonstrate your understanding of data analysis and data exploration by applying these skills to a real, public dataset.
To successfully complete assessment task 1, you will need to review the resources and content from Module 1.1 (Week 1) to Module 2.1 (Week 3). This will help you recall the concepts and skills covered under the topics of data exploration, data charts and data handling.
To complete this assessment task, you must follow the steps below:
Step 1: Find and select a public dataset
You will use a public dataset for this assessment task. Each team will choose a different dataset. The learning facilitator will assist you in finding suitable databases and websites.
- Download your chosen dataset in Excel format.
- Save the dataset as an Excel file on your computer for further analysis.
- Save the link to the dataset for your reference. You will need to provide this link in Section 1 of your report (see Step 4).
Step 2: Explore and review the dataset
Open your chosen dataset with Excel and manually investigate the dataset as follows:
- Describe the data and its context.
- List available data types and key attributes.
- Compute descriptive statistics on numerical data columns.
- Check for consistency, errors, and missing values, and confirm the validity of the data.
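The checks in the last bullet can be sketched as follows. This is an illustrative Python example only. It assumes a hypothetical table with `order_id`, `region` and `profit` columns and an assumed list of valid regions. The function `find_issues` flags missing values, unexpected category labels and duplicate IDs. In Excel you might carry out similar checks with filters, COUNTBLANK, or conditional formatting that highlights duplicate values.

```python
# Hypothetical records, as they might look after exporting rows from a spreadsheet.
rows = [
    {"order_id": 1, "region": "North", "profit": 120.0},
    {"order_id": 2, "region": "north", "profit": 85.5},   # inconsistent capitalisation
    {"order_id": 3, "region": "South", "profit": None},   # missing value
    {"order_id": 3, "region": "East", "profit": 40.0},    # duplicate ID
    {"order_id": 5, "region": "", "profit": -15.0},       # missing region (a negative profit is a valid loss)
]

VALID_REGIONS = {"North", "South", "East", "West"}  # assumed list of allowed categories

def find_issues(records):
    """Return (row_number, description) tuples for each problem found."""
    issues = []
    seen_ids = set()
    for i, row in enumerate(records, start=1):
        # Missing values: None or an empty cell.
        for field, value in row.items():
            if value is None or value == "":
                issues.append((i, f"missing {field}"))
        # Consistency: category labels must match the agreed list exactly.
        region = row.get("region")
        if region and region not in VALID_REGIONS:
            issues.append((i, f"unexpected region {region!r}"))
        # Validity: IDs should be unique.
        if row["order_id"] in seen_ids:
            issues.append((i, f"duplicate order_id {row['order_id']}"))
        seen_ids.add(row["order_id"])
    return issues

for row_number, problem in find_issues(rows):
    print(row_number, problem)
```

Each flagged row is a candidate issue to document, together with the reason it is a problem and a possible fix. For example, you might standardise the capitalisation, remove the duplicate, or fill or exclude the missing value.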
If you identify any issues, you will need to document them in Section 1 of your report (see Step 4). You must also state why you think they are issues and how they can be fixed.
Step 3: Create and interpret visualisation charts
From your review in Step 2, formulate 4-5 investigative questions that you can answer using your chosen dataset. For example, if you are exploring a sales dataset, your questions may be:
– Which product made the most profit in a particular month? or
– What is the best distribution fit for the profit variable?
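The Python sketch below shows one way to answer the first example question. It totals profit by product for one month and picks the largest total. The records, product names and month labels are invented, and `top_product` is a hypothetical helper. In Excel, the same summary is usually built with a PivotTable, using month as a filter, product as rows and the sum of profit as values. You would then chart the result.

```python
from collections import defaultdict

# Hypothetical sales records: (month, product, profit). All values are invented.
sales = [
    ("2023-01", "Chairs", 250.0),
    ("2023-01", "Desks", 400.0),
    ("2023-01", "Chairs", 300.0),
    ("2023-02", "Desks", 150.0),
    ("2023-02", "Lamps", 90.0),
]

def top_product(records, month):
    """Sum profit per product for one month; return (best_product, totals)."""
    totals = defaultdict(float)
    for rec_month, product, profit in records:
        if rec_month == month:
            totals[product] += profit
    if not totals:
        return None, {}  # no sales recorded for that month
    best = max(totals, key=totals.get)
    return best, dict(totals)

best, totals = top_product(sales, "2023-01")
print(best, totals)
```

Grouping before comparing matters here. A single large order does not make a product the most profitable for the month. Only its total across all orders in that month does.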
Using Excel, you will create visualisation charts that will help you answer each of the questions you prepared. Include a descriptive title, labelled axes, and a legend on each visualisation chart. Interpret the charts to answer each of your questions, and document all your findings and interpretations in Section 2 of your report (see Step 4).
Step 4: Document your findings and write a report (1000 words)
You must include the following sections with the correct headings and relevant content in your report:
Section 1: Selected Dataset
- Provide a link to data.
- Explain why you selected this data.
- Explain what the issues are in the selected data and how they can be fixed.
- Document all findings from Step 2.
Section 2: Analysis Plan
- Document your investigative questions from Step 3 and the reasons you think these questions are important.
- Copy your relevant Excel visualisation charts for each of your questions into this section. Beneath each chart, ensure that you:
– state the objective of each chart, and
– briefly explain what information and/or knowledge you obtained from the chart.
- Explain your interpretations using appropriate terminology and clear language.
- Ensure that you document all work done in Steps 2 and 3.
Section 3: Findings and Limitations
- Summarise your findings and list their limitations.
- Provide recommendations for further analysis if required.
Section 4: Reference List
- Ensure secondary research is referenced using APA referencing guidelines.
Use Microsoft Word for the layout of your report. Ensure your report is 1000 words (+/-10%). The dataset and the charts you create must be submitted as a separate Excel document.
You will need to structure the content and visualisation charts so that the information is clear and logical.