Slides and text of this presentation:
Slide 1
Slide content: BBA182 Applied Statistics
Week 3 (2) Using numerical measures to describe data
Dr Susanne Hansen Saral
Email: susanne.saral@okan.edu.tr
https://piazza.com/class/ixrj5mmox1u2t8?cid=4#
www.khanacademy.org
Slide 2
Slide content: Using numerical measures to describe data
"Is the data in the sample centered or located around a specific value?"
This is the first question that business people, economists, corporate executives, etc. ask when presented with sample data.
Slide 3
Slide content: Using numerical measures to describe data
The histogram gives an idea whether the data is centered around a specific value.
The histogram provides a visual picture of how the data is distributed (symmetric, skewed, etc.)
Slide 4
Slide content: Is the data centered around a specific value?
Slide 5
Slide content: Numerical measures to describe data
Slide 6
Slide content: Measures of the center of the data set
Slide 7
Slide content: Mean
The mean is the most common measure of the center of a data set
Population mean, $\mu$
For a population of N values: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{x_1 + x_2 + \cdots + x_N}{N}$
Slide 8
Slide content: Mean
Sample mean, $\bar{x}$
For a sample of n values: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{x_1 + x_2 + \cdots + x_n}{n}$
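The mean is simply the sum of the values divided by how many values there are. A minimal sketch in Python, using the eight-value data set from the median exercise later in this deck; the function name `mean` is our own choice, not something the slides define:

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Data set taken from the median exercise in this deck.
data = [17, 5, 3, 11, 12, 8, 25, 3]
data_mean = mean(data)
print(data_mean)  # 84 / 8
```

The same calculation serves for a population (dividing by N) and for a sample (dividing by n); only the notation differs.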
Slide 9
Slide content: The Mean
Symmetric and unimodal distribution
When a distribution is symmetric and has one mode, the mean represents the middle value of the data set.
Slide 10
Slide content: Mean
The most common measure for the center of a data set
Affected by extreme values (outliers)
Slide 11
Slide content: Mean
The most common measure for the center of a data set
Affected by extreme values (outliers)
Slide 12
Slide content: Skewed distribution
An outlier will distort the picture of the data.
It will inflate or deflate the mean, depending on the value of the outlier.
This creates a skewed distribution.
In this case we may want to use a different measure of the center of the data.
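To see how strongly a single outlier can pull the mean, here is a small sketch. The numbers are made up purely for illustration and do not come from the slides:

```python
# Hypothetical values, chosen only to illustrate the effect of one outlier.
typical = [4, 5, 6, 5, 5]
with_outlier = typical + [50]  # add one extreme value

mean_typical = sum(typical) / len(typical)
mean_with_outlier = sum(with_outlier) / len(with_outlier)

print(mean_typical)       # mean of the five typical values
print(mean_with_outlier)  # mean after adding the outlier
```

One extreme value moves the mean from 5.0 to 12.5, a number larger than every typical observation.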
Slide 13
Slide content: Median
In an ordered list of data, the median is the “middle” number (50% above, 50% below)
Not affected by outliers
Slide 14
Slide content: Finding the Median
The location of the median:
If the number of values is odd, the median is the middle number of the ordered data
Data set: -17 6 25 -5 13 9 33
Ordered: -17 -5 6 9 13 25 33
Median: 9 (the 4th of the 7 ordered values)
Slide 15
Slide content: Finding the Median
The location of the median:
If the number of values is even, the median is the average of the two middle numbers (their sum divided by 2)
Slide 16
Slide content: Finding the median
Determine the median of the following data set:
17 5 3 11 12 8 25 3
Slide 17
Slide content: Finding the median
Determine the median of the following data set:
17 5 3 11 12 8 25 3
Ranked data: 3 3 5 8 11 12 17 25
Median: (8 + 11) / 2 = 19 / 2 = 9.5
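The odd and even rules can be combined in one short function. This is a sketch of the textbook procedure (sort, then pick the middle value or average the two middle values); the function name `median` is our own:

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single middle value exists.
        return ordered[middle]
    # Even number of values: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_example = [-17, 6, 25, -5, 13, 9, 33]    # seven values
even_example = [17, 5, 3, 11, 12, 8, 25, 3]  # eight values

print(median(odd_example))
print(median(even_example))
```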
Slide 18
Slide content: Mode
Value that occurs most often in the data set
Not affected by outliers
Used for either numerical or categorical data
There may be no mode
There may be one mode (unimodal), two modes (bimodal) or several modes (multimodal)
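A sketch of how the mode could be found by counting. It returns a list so that it can report zero, one or several modes. Treating a data set in which no value repeats as having "no mode" is a convention we assume here; the bimodal and categorical inputs are hypothetical:

```python
from collections import Counter


def modes(values):
    """Return all values with the highest count; an empty list if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        # Assumed convention: if every value occurs once, there is no mode.
        return []
    return [value for value, count in counts.items() if count == highest]


print(modes([3, 3, 5, 8, 11, 12, 17, 25]))       # unimodal (data from the median exercise)
print(modes([1, 1, 2, 4, 4, 7]))                 # bimodal (hypothetical data)
print(modes([1, 2, 3]))                          # no mode
print(modes(["red", "blue", "red", "green"]))    # categorical (hypothetical data)
```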
Slide 19
Slide content: Measures of the center: summary
Five houses on a hill by the beach
Slide 20
Slide content: Measures of the center: summary
What is the mean house price?
What is the median house price?
What is the modal house price?
Slide 21
Slide content: Mean: $3,000,000 / 5 = $600,000
Median: middle value of ranked data = $300,000
Mode: most frequent house price = $100,000
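The five individual house prices did not survive in this transcript. The list below is therefore hypothetical: it is one set of prices chosen only because it reproduces the three results stated on the slide (total $3,000,000, median $300,000, mode $100,000). With that assumption, Python's standard `statistics` module gives the same answers:

```python
import statistics

# HYPOTHETICAL prices: the original values are not in this transcript.
# They are chosen only to be consistent with the results on the slide.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)
median_price = statistics.median(house_prices)
modal_price = statistics.mode(house_prices)

print(mean_price, median_price, modal_price)
```

With prices like these, one very expensive house pulls the mean far above what most of the houses cost, which is why the median is often the more representative figure.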
Slide 22
Slide content: When is which measure of the center the “best”?
The mean is generally used, unless outliers exist. If there are outliers, the mean does not represent the center well.
The median is then used instead, because it is not affected by outliers.
Example: Median home prices may be reported for a region – less sensitive to outliers
Slide 23
Slide content: Shape of a Distribution
The shape of a distribution describes how the data is distributed
The presence or absence of outliers in a data set influences the shape of a distribution
Symmetric or skewed
Slide 24
Slide content: Histogram of annual salaries (in $) for a sample of U.S. marketing managers:
Describe the shape of this histogram (of the distribution)
Without doing calculations, do you expect the mean salary to be higher or lower than the median salary?
Slide 25
Slide content: Class exercise
Eleven economists were asked to predict the percentage growth in the Consumer Price Index over the next year.
Their forecasts were as follows:
3.6 3.1 3.9 3.7 3.5 1.0 3.7 3.4 3.0 3.7 3.4
Compute the mean, median and the mode
Are there any outliers in the data set that may influence the value of the mean?
If there are outliers, how do they affect the shape of the data distribution?
Slide 26
Slide content: Solution to class exercise
Mean: 36.0 / 11 ≈ 3.27, or 3.3 to one decimal place
Median: 3.5
Mode: 3.7
Outlier: 1.0
How does the outlier affect the shape of the distribution?
It decreases the average of the data set and distorts the picture of the histogram.
The shape is skewed to the left.
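The solution can be checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are our own:

```python
import statistics

# Forecasts of the eleven economists from the class exercise.
forecasts = [3.6, 3.1, 3.9, 3.7, 3.5, 1.0, 3.7, 3.4, 3.0, 3.7, 3.4]

mean_forecast = statistics.mean(forecasts)
median_forecast = statistics.median(forecasts)
mode_forecast = statistics.mode(forecasts)

print(round(mean_forecast, 2), median_forecast, mode_forecast)

# A mean below the median is what we expect when a low outlier
# drags the mean down (left-skewed data).
print(mean_forecast < median_forecast)
```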
Slide 27
Slide content: Measures of variability
The three measures of the center do not provide a complete and sufficient description of the data.
Besides knowing where the data is centered (mean, median or mode), we need information on how far the data is spread from that central value, most often from the mean.
Measures of variability provide this information.
Slide 28
Slide content: Measures of Variability
Slide 29
Slide content: Quartiles
Quartiles are descriptive measures that separate a large data set into four quarters.
The first quartile (Q1) separates approximately the smallest 25% of the data from the remaining largest 75% of the data.
The second quartile (Q2) is the median, which separates the data set into two equal halves.
The third quartile (Q3) separates approximately the smallest 75% of the data from the remaining largest 25% of the data.
Slide 30
Slide content: Quartiles
Slide 31
Slide content: How to calculate quartiles manually
Slide 32
Slide content: Quartiles
Slide 33
Slide content: Quartiles
Slide 34
Slide content: Quartiles and Enron case
In the Enron data we had 60 data points.
There are 30 values to the right and 30 values to the left of the median (Q2):
Q1 = -$1.68 (between the 15th and 16th data points): 75% of the data is larger than -$1.68
Q2 = -$0.19, the median (between the 30th and 31st data points): 50% of the data is smaller than -$0.19 and 50% of the data is larger than -$0.19
Q3 = $2.14 (between the 45th and 46th data points): 25% of the data is larger than $2.14
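The positions quoted above (between the 15th and 16th, 30th and 31st, 45th and 46th points) are consistent with locating each quartile at a fraction of (n + 1) in the ordered data. The slide that explains the manual method has no text in this transcript, so that rule is an assumption on our part. A sketch:

```python
def quartile_positions(n):
    """1-based positions of Q1, Q2, Q3 in ordered data, ASSUMING the (n + 1) rule."""
    return {
        "Q1": 0.25 * (n + 1),
        "Q2": 0.50 * (n + 1),
        "Q3": 0.75 * (n + 1),
    }


positions = quartile_positions(60)  # the Enron data has 60 points
print(positions)
```

A position of 15.25 lies between the 15th and 16th data points, 30.5 between the 30th and 31st, and 45.75 between the 45th and 46th, matching the slide.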
Slide 35
Slide content: Range
Simplest measure of variation
Difference between the largest and the smallest observations: Range = largest observation - smallest observation
Slide 36
Slide content: Range: Enron case example
Range = maximum value - minimum value
Enron data range = $21.06 - (-$17.75) = $38.81
Slide 37
Slide content: Disadvantages of the Range
Ignores the way in which data is distributed
Slide 38
Slide content: Disadvantages of the Range
Sensitive to outliers
Slide 39
Slide content: Range: shortcomings as a measure of variability
Because the range does not provide much information about the spread of the data, it is not a very good measure of variability.
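The sensitivity to outliers is easy to demonstrate with the CPI forecasts from the class exercise: removing the single outlier (1.0) shrinks the range dramatically. A sketch, with our own helper name `data_range`:

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


# Forecasts from the class exercise; 1.0 is the outlier.
forecasts = [3.6, 3.1, 3.9, 3.7, 3.5, 1.0, 3.7, 3.4, 3.0, 3.7, 3.4]
without_outlier = [x for x in forecasts if x != 1.0]

full_range = data_range(forecasts)
trimmed_range = data_range(without_outlier)

# Rounded to one decimal place to hide floating-point noise.
print(round(full_range, 1))
print(round(trimmed_range, 1))
```

A single observation accounts for most of the range here, which says little about how the other ten forecasts are spread.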
Slide 40
Slide content: Interquartile Range
We can eliminate some outlier problems by using the interquartile range, which concentrates on the middle 50% of the data in the data set
Eliminate the high- and low-valued observations and calculate the range of the middle 50% of the data, the values between Q1 and Q3
The interquartile range, IQR = Q3 - Q1
Slide 41
Slide content: Interquartile Range
The interquartile range (IQR) measures the spread of the data in the middle 50% of the data set
Defined as the difference between the observation at the third quartile and the observation at the first quartile
IQR = Q3 - Q1
Slide 42
Slide content: Interquartile Range
Raw data: 6 8 10 12 14 9 11 7 13 11 (n = 10)
Ranked data: 6 7 8 9 10 11 11 12 13 14
First quartile (Q1): 7.75
Third quartile (Q3): 12.25
IQR = Q3 - Q1 = 12.25 - 7.75 = 4.5
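The quartile values on this slide can be reproduced by placing each quartile at position p(n + 1) in the ranked data and interpolating linearly between the two neighbouring observations. The transcript does not state which rule the lecturer used, so treat the (n + 1) rule below as an assumption that happens to match the numbers shown; other software may use a different convention and give slightly different quartiles.

```python
def quartile(values, p):
    """Value at 1-based position p * (n + 1) in the ranked data (ASSUMED rule),
    with linear interpolation between neighbouring observations."""
    ordered = sorted(values)
    n = len(ordered)
    position = p * (n + 1)
    lower = int(position)          # whole part of the position
    fraction = position - lower    # how far to move towards the next value
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


raw_data = [6, 8, 10, 12, 14, 9, 11, 7, 13, 11]

q1 = quartile(raw_data, 0.25)  # position 2.75: between the 2nd and 3rd values
q3 = quartile(raw_data, 0.75)  # position 8.25: between the 8th and 9th values
iqr = q3 - q1

print(q1, q3, iqr)
```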
Slide 43
Slide content: Enron data: Interquartile range
Interquartile range:
IQR = $2.14 - (-$1.68) = $3.82
The middle 50% of the Enron data has a spread of $3.82, compared to the range of $38.81!
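Both figures can be checked from the quartiles and extremes quoted in the slides. The sketch below uses `Decimal` so that the dollar amounts are subtracted exactly; the variable names are our own:

```python
from decimal import Decimal

# Values quoted in the slides for the Enron data.
q1 = Decimal("-1.68")
q3 = Decimal("2.14")
minimum = Decimal("-17.75")
maximum = Decimal("21.06")

iqr = q3 - q1                   # spread of the middle 50% of the data
full_range = maximum - minimum  # spread of all the data

print(iqr, full_range)
```

The full range is roughly ten times the interquartile range, so most of it comes from a few extreme observations in the tails.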