Calculate the mean, standard deviation, skew, 5-number summary, and interquartile range (IQR) for each of the variables.

Create a box-plot for the Annual Sales variable. Does it look symmetric? Would you prefer the IQR instead of the standard deviation to describe this variable's dispersion? Why?

Create a histogram for the Sales/SqFt variable. Is the distribution symmetric? If not, what is the skew? Are there any outliers? If so, which one(s)? What is the SqFt area of the outlier(s)? Is the outlier(s) smaller or larger than the average restaurant in the database? What can you conclude from this observation?

What measure of central tendency is more appropriate to describe Sales/SqFt? Why?

Answer: (Below is the response to the above questions)

STATISTICAL DATA ANALYSIS 1

This statistical report uses the Pastas R Us, Inc. database. The descriptive analysis was carried out in a spreadsheet, which is provided as a separate attachment for reference.

Descriptive Statistics

Calculation of the mean, standard deviation, skew, 5-number summary, and interquartile range (IQR) for each of the variables

The mean, standard deviation, skew, and interquartile range were calculated in SPSS. The table below is a screenshot of the Excel output showing the calculated values for all the variables.

Create a box-plot for the Annual Sales variable. Does it look symmetric? Would you prefer the IQR instead of the standard deviation to describe this variable's dispersion? Why?

Box-plot for the Annual Sales

The image above shows the box-plot for the Annual Sales variable. The data are right-skewed, so the distribution is not symmetrical. Since the data have no outliers, the interquartile range would not be applicable.
Therefore, the standard deviation would provide the essential information about the dispersion of the variable. In my opinion, the standard deviation is the most applicable measure because the box plot shows no outliers. The standard deviation ensures that every value in the dataset is considered and contributes to the result.

Create a histogram for the Sales/SqFt variable. Is the distribution symmetric? If not, what is the skew? Are there any outliers? If so, which one(s)? What is the SqFt area of the outlier(s)? Is the outlier(s) smaller or larger than the average restaurant in the database? What can you conclude from this observation?

What measure of central tendency is more appropriate to describe Sales/SqFt? Why?

Using the mean to make predictions will not be stable, since there are no significant outliers. Therefore, in this scenario the median carries more weight. The distribution of the Sales/SqFt variable in the histogram is not symmetrical.

Median: 396.02
Mean: 430.31

Because the mean is higher than the median, the Sales/SqFt variable is right-skewed.

The SqFt area of the outlier(s) is 1092 (364 * 3).

From the database, we observe that the average is lower than the outlier(s).

Since the median is rarely affected by outliers, it is considered the most effective measure of central tendency here. Although the mean is also a measure of central tendency, it is normally affected by outliers (Mishra et al., 2019).

Reference

Mishra, P., Pandey, C. M., Singh, U., Gupta, A., Sahu, C., & Keshri, A. (2019). Descriptive statistics and normality tests for statistical data. Annals of Cardiac Anaesthesia, 22(1), 67. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350423/

Professor's comments: (Please see the professor's comments below and fix the problem)

Nice job on this assignment. Your box plot and histogram are perfect, as are your values for annual sales and the descriptive statistics.
Please read the following feedback so you can revise this assignment and resubmit it for full points.

I did take off for the following items:

(1) Even though the box plot does not show any outliers, keep in mind that the data is skewed (asymmetrical). Skewed data occurs with or without outliers. Therefore, the preferred measure of dispersion would be the IQR, because the IQR tells us exactly where the data falls within the four quartiles of the distribution. We only use the standard deviation with symmetrical distributions (normal distributions). Let me know if you have questions about that.

(2) The histogram does have an outlier. It’s the far right column, labeled as 948.56 sq ft. Make sure that if you revise this assignment for full points, you include the answers to the questions about the outlier (from the instructions) in your resubmission.

Nov 11, 2021
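
Both feedback points can be checked numerically. The sketch below is only an illustration: the numbers in `sample` are made up and are NOT the Pastas R Us data, `summarize` is a hypothetical helper name, and the 1.5 * IQR fence is one common convention for flagging outliers (the assignment does not say which rule was used). It computes the 5-number summary, the IQR, the mean and standard deviation, and lists any values outside the fences.

```python
import statistics


def summarize(values):
    """Return the 5-number summary, IQR, mean, stdev and 1.5*IQR outliers."""
    # Quartiles using the "inclusive" method (data treated as the full population range).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed outlier rule: anything beyond 1.5 * IQR from the quartiles.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": iqr,
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical Sales/SqFt-style values, invented for this example only.
sample = [300, 340, 360, 380, 400, 420, 450, 480, 950]
result = summarize(sample)

for name, value in result.items():
    print(f"{name}: {value}")

# A mean above the median suggests right skew.
print("right-skewed:", result["mean"] > result["median"])
```

In this made-up sample the single large value pulls the mean above the median and inflates the standard deviation, while the median and IQR barely move. That is the reason the median and IQR are usually preferred for skewed data.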