GROUP and SPARKPLUS
- This is a group assignment and groups will be formed randomly. Each group should consist of a maximum of three students, with only the group leader responsible for submitting the required files through the Moodle assignment submission link. It’s important to note that penalties will be applied if multiple students submit highly similar files, as this can be considered plagiarism.
- SPARKPLUS is a tool for Self & Peer Assessment and Feedback for group assignments. It’s a mandatory task for each student and the SPARKPLUS score will contribute towards determining the final assignment mark. This score is calculated by the SPARKPLUS website as soon as all group members complete the review task. To learn how to use SPARKPLUS, please refer to the Student’s SPARKPLUS Guideline in the assignment folder. It’s important to note that failure to submit the SPARKPLUS review will result in a penalty mark deduction.
The objective of this assignment is to give students hands-on experience in applying mathematical and statistical principles, including sampling methods, descriptive analysis, hypothesis testing, and predictive analytics, to analyse a real-world dataset. The dataset has been specifically chosen to help stakeholders make informed decisions. Students will determine whether and how to perform sampling, identify the research questions that can be answered using the data, and formulate hypotheses that can be tested to draw conclusions. In addition, students will perform predictive data analysis to forecast future trends based on the data.
To assist you in your analysis, a dataset containing information on temperature, humidity, rainfall, wind speed, wind direction, and wind force for weather station sites in Swan Hill Rural City has been provided. You can access this dataset in the assignment folder or download it directly from the following link: https://data.gov.au/data/dataset/swan-hill-rural-city-council-smart-cities-weather-stations. Your team has been assigned the task of analysing this dataset and presenting your findings through a written report and a presentation. As part of this assignment, you will need to apply mathematical and statistical principles, such as sampling methods, descriptive analysis, hypothesis testing, and predictive analysis, to explore the dataset.
The objective of your analysis is to generate insights that can aid researchers in making informed decisions based on the provided weather data. As data analysts, your team will need to determine whether and how to apply sampling techniques to the dataset. Additionally, you will need to identify specific research questions that can be answered by analysing the data, and formulate hypotheses about variations in the weather data, including factors such as temperature, humidity, and rainfall. Furthermore, you may explore predictive analyses to forecast weather patterns for upcoming months and years. By undertaking these analytical tasks, your team will contribute valuable insights to support decision-making and enhance understanding of the weather conditions in Swan Hill Rural City. Note that you are free to use any software tool you are comfortable with, including Python (Jupyter Notebook) and Excel, where appropriate. You will also need to conduct additional research and include at least five resources from a combination of journals, conference papers, websites, or other reliable sources to support your analysis. All sources must have been published or updated within the last five years (2018-2023) so that they are relevant and up to date.
Task 1 Data Gathering/Sampling Method [15 marks]
To begin, carefully examine the provided dataset and identify two consecutive months for which data are available on every date. Remove any data you will not use from the dataset. Choose at least three variables, one of which must be temperature. Once you have made your selections, determine the sampling method that best suits the chosen data, then gather the data using that approach. In the subsequent sections of your report, your group will need to construct one or more tables that clearly present the collected data and accurately organise the selected data points. You should also describe the process used to obtain the data, including the steps taken and any relevant sources used.
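One way to approach Task 1 in Python is sketched below, using only the standard library. It assumes the downloaded CSV has a timestamp column in ISO format plus numeric weather columns; the column names (`datetime`, `temperature`, `humidity`, `rainfall`) are hypothetical and must be checked against the actual file. The sketch keeps only the chosen variables, finds the first two consecutive calendar months in which every day has at least one reading, and then draws a stratified random sample with a fixed number of readings per day, so each day contributes equally. Stratifying by day is only one defensible choice; systematic or simple random sampling may suit your data better.

```python
import calendar
import csv
import random
from collections import defaultdict
from datetime import datetime

# Hypothetical column names: adjust these to match the downloaded CSV.
TIME_COL = "datetime"
KEEP_COLS = ["temperature", "humidity", "rainfall"]


def load_records(path):
    """Read the CSV, parse timestamps and keep only the chosen variables."""
    records = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            try:
                rec = {"time": datetime.fromisoformat(row[TIME_COL])}
                rec.update({c: float(row[c]) for c in KEEP_COLS})
            except (KeyError, ValueError):
                continue  # drop rows with missing or malformed values
            records.append(rec)
    return records


def complete_months(records):
    """Return sorted (year, month) pairs in which every calendar day has data."""
    days = defaultdict(set)
    for r in records:
        days[(r["time"].year, r["time"].month)].add(r["time"].day)
    return sorted(
        ym for ym, d in days.items()
        if len(d) == calendar.monthrange(*ym)[1]  # [1] = number of days
    )


def first_consecutive_pair(months):
    """Find the first two complete months that directly follow one another."""
    for (y1, m1), (y2, m2) in zip(months, months[1:]):
        following = (y1, m1 + 1) if m1 < 12 else (y1 + 1, 1)
        if (y2, m2) == following:
            return (y1, m1), (y2, m2)
    return None


def stratified_daily_sample(records, per_day, seed=0):
    """Randomly draw up to `per_day` readings from each day (strata = days)."""
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    by_day = defaultdict(list)
    for r in records:
        by_day[r["time"].date()].append(r)
    sample = []
    for day in sorted(by_day):
        group = by_day[day]
        sample.extend(rng.sample(group, min(per_day, len(group))))
    return sample
```

For example, `pair = first_consecutive_pair(complete_months(records))` with `records = load_records("weather.csv")` (a placeholder file name) gives the first qualifying pair of `(year, month)` tuples, or `None`. Filter the records to that pair, for instance with `[r for r in records if (r["time"].year, r["time"].month) in pair]`, before calling `stratified_daily_sample`, and report the seed so the sample can be reproduced.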
Task 2 Descriptive Analysis [15 marks]
Once you have gathered the relevant data, it is essential to employ descriptive analysis techniques to summarise and examine the dataset’s characteristics. This involves calculating key measures of central tendency, such as means and medians, as well as measures of dispersion, such as standard deviations. Doing so gives you a deeper understanding of the data and helps you identify opportunities for further analysis. In addition to numerical summaries, create graphical representations of the data: histograms, box plots, and line graphs, for example, can highlight patterns, trends, and outliers within the dataset and give a comprehensive picture of the weather characteristics specific to the council area.
For this second task, students will utilise descriptive analysis techniques to extract meaningful insights from the data collected in Task 1. These insights will play a vital role in informing subsequent statistical analysis and decision-making processes, allowing for a well-rounded understanding of the council’s weather patterns and characteristics.
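A minimal sketch of the numerical summaries for Task 2 is shown below, using only Python's standard `statistics` module. It works on one variable at a time, for example the sampled temperatures for one month. Note that `statistics.quantiles` uses the "exclusive" method by default, so quartiles may differ slightly from Excel's `QUARTILE.INC` or other tools; state which method you use. The outlier helper applies the common 1.5 × IQR rule, which is also the default whisker length for box plots in matplotlib.

```python
import statistics


def describe(values):
    """Central tendency and dispersion for one variable (sample statistics)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 'exclusive' method by default
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),
        "q1": q1,
        "q3": q3,
        "max": max(values),
    }


def iqr_outliers(values):
    """Return values lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]
```

Calling `describe` once per month and per variable gives the rows of a summary table directly. For the graphs, `matplotlib.pyplot.hist`, `boxplot` and `plot` are one option among many.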
Task 3 Hypothesis Analysis [15 marks]
The aim of this task is to determine whether there is a significant difference in the average value of a chosen variable between the two consecutive months. Select the one variable that interests you most, such as temperature. If you choose temperature, the task is to determine whether there is a significant difference in the average temperature between the two months and, if so, which month has the higher average temperature.
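For temperature, the two-sided hypotheses and one common test statistic, Welch's two-sample t-test (which does not assume equal variances), can be written as follows. Here $\bar{x}_1, s_1, n_1$ are the sample mean, standard deviation and size for the first month and $\bar{x}_2, s_2, n_2$ those for the second month. This is one reasonable choice rather than a required test; a paired or non-parametric test may fit better depending on how the data were sampled.

```latex
H_0:\ \mu_1 = \mu_2 \qquad H_1:\ \mu_1 \neq \mu_2

t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
           {\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
```

Reject $H_0$ at significance level $\alpha$ (for example 0.05) if $|t|$ exceeds the critical value $t_{\alpha/2,\nu}$; the sign of $t$ then shows which month has the higher mean. To test directly whether the first month is warmer, use the one-sided alternative $H_1:\ \mu_1 > \mu_2$ and compare $t$ with $t_{\alpha,\nu}$.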
To achieve this objective, clearly state your null and alternative hypotheses and perform hypothesis testing using appropriate statistical methods: choose a suitable test, select the level of significance, compute the test statistic, and interpret the results. Upon completing the test, give a clear answer to the question based on the results obtained. Whether or not you find a significant difference, explain your answer and provide a detailed analysis of the findings and their implications. Your submission should demonstrate a clear understanding of the hypothesis testing process, including the steps involved, the statistical methods used, and the interpretation of the results.
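The test steps above can be sketched in Python as Welch's two-sample t-test. To stay within the standard library, this sketch approximates the p-value with the standard normal distribution instead of Student's t, which is only reasonable when both samples are fairly large (roughly 30 or more readings each). For smaller samples, or for an exact p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs Welch's test directly. The function name and returned keys are illustrative, not a prescribed format.

```python
import math
import statistics


def welch_t_test(a, b, alpha=0.05):
    """Welch's two-sample t statistic with a large-sample normal p-value.

    The p-value uses the standard normal distribution as an approximation
    to Student's t, so it is only appropriate for large samples.
    """
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    se = math.sqrt(v1 / n1 + v2 / n2)
    t = (m1 - m2) / se
    # Welch-Satterthwaite degrees of freedom (report alongside t).
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))  # two-sided p-value
    return {
        "t": t,
        "df": df,
        "p_value": p,
        "reject_h0": p < alpha,
        "higher_mean": "first" if m1 > m2 else "second",
    }
```

Here `a` and `b` would be the sampled temperatures for the first and second month respectively; report `t`, `df`, the p-value and the chosen `alpha` in the write-up.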
Task 4 Linear Regression and Statistical Prediction [15 marks]
First, use linear regression to determine whether temperature is correlated with other variables, using data from the first month only. Develop a mathematical model for temperature, using either temperature data alone or temperature in combination with other variables, as you prefer. Next, use the data from the second month to assess the accuracy of your best-fit model. You are free to use any software to identify the best-fit pattern or relationship in the data that can be used to make accurate temperature projections. Provide a detailed step-by-step analysis, including the results and the tests used to arrive at your final model. Your submission should demonstrate a clear understanding of the data and the forecasting process, including any assumptions made, the model's limitations, and the rationale behind the chosen method.
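The fit-then-validate workflow described above can be sketched with a single predictor using ordinary least squares. The choice of predictor is an assumption: `x` could be humidity, or a day index if you model temperature on its own as a time trend. The model is fitted on first-month data and scored on second-month data with RMSE, MAE and out-of-sample R². For several predictors, `numpy.linalg.lstsq` or statsmodels' `OLS` are common alternatives; this stdlib version only illustrates the calculation.

```python
import math
import statistics


def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x; also returns Pearson r."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # correlation between x and y
    return intercept, slope, r


def evaluate(intercept, slope, x_test, y_test):
    """Score the month-1 model on month-2 data: RMSE, MAE and out-of-sample R^2."""
    preds = [intercept + slope * xi for xi in x_test]
    errors = [yi - pi for yi, pi in zip(y_test, preds)]
    sse = sum(e * e for e in errors)
    my = statistics.mean(y_test)
    ss_tot = sum((yi - my) ** 2 for yi in y_test)
    return {
        "rmse": math.sqrt(sse / len(errors)),
        "mae": sum(abs(e) for e in errors) / len(errors),
        "r2": 1 - sse / ss_tot,  # can be negative if the model fits month 2 badly
    }
```

Comparing the metrics for several candidate predictors, each fitted on month 1 and scored on month 2, is one straightforward way to justify the choice of best-fit model.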
Task 5 Presentation [10 marks]
To fulfil the assignment requirements, your team must create a concise video presentation of no more than seven minutes in which all team members present the key points of the data analysis process and results. Upload the recorded video to YouTube as an unlisted video so that only viewers with the link can access it, and include the link at the end of your written report.
The group leader needs to submit three files via the Assignment 2 submission link:
- Final report named YourGroupName.docx, with the YouTube video URL provided at the end of the report
- Presentation PowerPoint file named YourGroupName.pptx
- A Python Jupyter Notebook file or an Excel Workbook file, named YourGroupName.ipynb or YourGroupName.xlsx
Each individual student must submit the SPARKPLUS for Self & Peer Assessment and Feedback.
During the laboratory and tutorial session in Week 11, your tutor will ask questions related to your assignment report. You are expected to demonstrate your dataset processing, software, and calculation skills. It’s essential that all group members attend and understand the entire assignment, as the viva questions will be marked individually. If there are significant knowledge disparities within the group, individual marks will be awarded based on each student’s ability to answer the questions.