Presentation: Statistics. Data Description. Data Summarization. Numerical Measures of the Data

On this page you can view the presentation "Statistics. Data Description. Data Summarization. Numerical Measures of the Data" online. It contains 69 slides created in PowerPoint (ppt/pptx format); file size 2.27 MB. Author: unknown. The materials were taken from open sources and uploaded by their authors; all rights belong to their creators.

Slides and text of this presentation:

Slide 1

Chapter Three: Data Description. Data Summarization. Numerical Measures of the Data

Slide 2

Chapter Three: Numerical Measures of the Data
Outline
Introduction
3-1 Measures of Central Tendency
3-2 Measures of Variation
3-3 Measures of Position
3-4 Exploratory Data Analysis

Slide 3

Objectives
Summarize data using measures of central tendency, such as the mean, median, mode, and midrange.
Describe data using measures of variation, such as the range, variance, and standard deviation.
Identify the position of a data value in a data set using measures of position, such as percentiles and quartiles.
Use the techniques of exploratory data analysis, including stem-and-leaf plots, box plots, and five-number summaries, to discover various aspects of data.

Slide 4

3-1 Measures of Central Tendency
We will compute two means: one for a sample and one for a finite population of values.
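The two formulas referred to here did not survive extraction. Below are the standard definitions, written in the notation used on the later slides (\( \bar{x} \) and n for a sample, μ and N for a population). The exact form on the original slide cannot be recovered.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}
\qquad\qquad
\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x_i}{N}
```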

Slide 5

Example (Sample Mean): The ages of a random sample of seven students at a certain school are 11, 10, 12, 13, 7, 9, 15. Find the mean (average) age of this sample.
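The worked solution for this example was on an image slide. This minimal Python sketch shows the arithmetic: add the seven ages and divide by the sample size.

```python
# Ages of the seven sampled students (from the example above)
ages = [11, 10, 12, 13, 7, 9, 15]

def sample_mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

mean_age = sample_mean(ages)
print(mean_age)  # 77 / 7 = 11.0
```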

Slide 6

Slide 7

The Sample Mean for an Ungrouped Frequency Distribution

Slide 8

The Sample Mean for an Ungrouped Frequency Distribution: Example
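The frequency table for this example was not preserved. As an illustration, the sketch below borrows the 26 values from the variance example in section 3-2. It builds an ungrouped frequency table with `collections.Counter` and applies \( \bar{x} = \sum f X / n \), where n is the sum of the frequencies. This is a minimal sketch, not the slide's own worked example.

```python
from collections import Counter

# 26 observations borrowed from the variance example in section 3-2
data = [2, 3, 4, 5, 2, 2, 2, 3, 2, 4, 3, 2, 5, 2, 3, 3, 4, 2, 5, 4, 4, 3, 3, 2, 5, 2]

# Ungrouped frequency table: value X -> frequency f
freq = Counter(data)  # {2: 10, 3: 7, 4: 5, 5: 4}

def mean_from_frequencies(table):
    """x̄ = Σ(f·X) / n, where n = Σf."""
    n = sum(table.values())
    return sum(f * x for x, f in table.items()) / n

print(mean_from_frequencies(freq))  # 81 / 26 ≈ 3.115
```

Computing from the table gives the same result as averaging the raw list. The table form just avoids repeating each value f times.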

Slide 9

The Sample Mean for a Grouped Frequency Distribution
The mean for a grouped frequency distribution is given by \( \bar{x} = \frac{\sum f \cdot X_m}{n} \), where f is the frequency of each class, \( X_m \) is the corresponding class midpoint, and \( n = \sum f \).
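A minimal Python sketch of the grouped-mean formula \( \bar{x} = \sum f X_m / n \). The classes and frequencies are hypothetical, made up for illustration; they are not from the slides. Each class midpoint is taken as (lower boundary + upper boundary) / 2.

```python
# Hypothetical grouped frequency distribution (not from the slides):
# (lower class boundary, upper class boundary, frequency f)
classes = [(5, 15, 2), (15, 25, 5), (25, 35, 3)]

def grouped_mean(table):
    """Approximate mean x̄ = Σ(f·X_m) / n using class midpoints X_m."""
    n = sum(f for _, _, f in table)
    total = sum(f * (lo + hi) / 2 for lo, hi, f in table)
    return total / n

print(grouped_mean(classes))  # (2·10 + 5·20 + 3·30) / 10 = 21.0
```

The result is only an approximation of the raw-data mean, because every value in a class is treated as if it sat at the class midpoint.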

Slide 10

Important remark: In some situations the mean may not be representative of the data. For example, the annual salaries of five vice presidents at AVX, LLC are $90,000, $92,000, $94,000, $98,000, and $350,000. The mean is ($90,000 + $92,000 + $94,000 + $98,000 + $350,000)/5 = $724,000/5 = $144,800. Notice how the single extreme value ($350,000) pulls the mean upward. Four of the five vice presidents earned less than the mean, which raises the question of whether the arithmetic mean of $144,800 is typical of the salaries of the five vice presidents.
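A short check with Python's standard `statistics` module. It confirms the mean of $144,800, counts how many salaries fall below it, and compares it with the median, which the one extreme value does not pull upward.

```python
import statistics

# Annual salaries of the five vice presidents (from the example above)
salaries = [90_000, 92_000, 94_000, 98_000, 350_000]

mean = statistics.mean(salaries)       # 724000 / 5 = 144800
median = statistics.median(salaries)   # 94000, the middle value
below_mean = sum(s < mean for s in salaries)
print(mean, median, below_mean)        # 144800 94000 4
```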

Slide 11

Properties of the Mean
As stated, the mean is a widely used measure of central tendency. It has several important properties:
1. Every set of interval-level and ratio-level data has a mean.
2. All the data values are included in the calculation.
3. A set of data has only one mean; that is, the mean is unique.
4. The mean is a useful measure for comparing two or more populations.
5. The sum of the deviations of each value from the mean is always zero, that is, \( \sum (x - \bar{x}) = 0 \).
6. The mean is highly affected by extreme data values.
Note (illustrating the fifth property): consider the set of values 3, 8, and 4. The mean is 5, and the sum of the deviations is (3 − 5) + (8 − 5) + (4 − 5) = −2 + 3 − 1 = 0.

Slide 12

Median: the median is the midpoint of the ordered data; it splits the ordered data into two halves.

Slide 13

When there is an even number of values in the data set, the median is obtained by taking the average of the two middle numbers.
Example: Six customers purchased the following numbers of magazines: 1, 7, 3, 2, 3, 4. Find the median.
Arrange the data in order and compute the middle point. Data array: 1, 2, 3, 3, 4, 7. The median = (3 + 3)/2 = 3.
Example: Find the median grade of the following sample: 62, 68, 71, 74, 77, 82, 84, 88, 90, 94.
There are 5 values on the left (62, 68, 71, 74, 77) and 5 on the right (82, 84, 88, 90, 94), so the median = (77 + 82)/2 = 79.5.
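A minimal Python sketch of the rule above. It sorts the data, takes the middle value when n is odd, and averages the two middle values when n is even. Python's `statistics.median` follows the same rule; the function is spelled out here to show the steps.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

magazines = [1, 7, 3, 2, 3, 4]
grades = [62, 68, 71, 74, 77, 82, 84, 88, 90, 94]
print(median(magazines))  # (3 + 3) / 2 = 3.0
print(median(grades))     # (77 + 82) / 2 = 79.5
```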

Slide 14

Example: Find the median grade of the following sample of students' grades:
A B A D F D F A B C C C F D A F D A A B B F D A B F C
Data array (from lowest to highest): F F F F F F D D D D D C C C C B B B B B A A A A A A A
There are 27 grades, so the median is the 14th value in the array. The median grade is C.
Half of the students received a grade of C or lower (at most C), and half received a grade of C or higher (at least C).
The median can be determined for ordinal-level data.
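A minimal Python sketch of a median for ordinal data. The grades are ranked by an explicit order (F lowest, A highest, as in the data array above) instead of alphabetically, and the middle category is taken. The name `GRADE_ORDER` is an illustrative assumption. The even-count handling is a design choice: a single median category exists only when the two middle categories agree.

```python
# Grades ranked from lowest to highest (matches the slide's data array)
GRADE_ORDER = ["F", "D", "C", "B", "A"]

def ordinal_median(values, order):
    """Median of ordinal data: sort by rank and take the middle category.

    For an even count the two middle categories may differ; this sketch
    only returns a value when they agree.
    """
    ranked = sorted(values, key=order.index)
    n = len(ranked)
    if n % 2 == 1:
        return ranked[n // 2]
    lower, upper = ranked[n // 2 - 1], ranked[n // 2]
    if lower != upper:
        raise ValueError(f"middle categories differ: {lower} and {upper}")
    return lower

grades = "A B A D F D F A B C C C F D A F D A A B B F D A B F C".split()
print(len(grades), ordinal_median(grades, GRADE_ORDER))  # 27 C
```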

Slide 15

Properties of the Median
The major properties of the median are:
The median is unique; that is, like the mean, there is only one median for a set of data.
It is not influenced by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
It can be computed for ratio-level, interval-level, and ordinal-level data.
About fifty percent of the observations lie above the median and about fifty percent lie below it.

Slide 16

Mode: the value that occurs most frequently (denoted by M).
Example: The following data represent the duration (in days) of U.S. space shuttle voyages for the years 1992–94. Find the mode.
Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11.
Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14.
Mode = 8 days.
Example: Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The times, in minutes, are given below. Find the mode.
Data set: 2, 3, 5, 7, 8, 10.
There is no mode, since each data value occurs with the same frequency of one.

Slide 17

Example: Eleven different automobiles were tested at a speed of 15 mph for stopping distances. The distances, in feet, are given below. Find the mode.
Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26.
There are two modes (bimodal): 18 and 24.
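A minimal Python sketch covering all three mode examples. It counts frequencies with `collections.Counter`, returns every value tied for the highest count, and returns an empty list when no value repeats. That last case follows the slides' "no mode" convention. The standard library's `statistics.multimode` would instead return every value in that case, which is why the check is written out here.

```python
from collections import Counter

def modes(values):
    """Return every value with the highest frequency; [] if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:  # every value occurs once: no mode
        return []
    return sorted(v for v, c in counts.items() if c == top)

shuttle = [8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11]
bacteria = [2, 3, 5, 7, 8, 10]
stopping = [15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26]
print(modes(shuttle), modes(bacteria), modes(stopping))  # [8] [] [18, 24]
```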

Slide 18

The Mode for a Grouped Frequency Distribution: it can be approximated by the midpoint of the modal class (the class with the highest frequency).
Example

Slide 19

Properties of the Mode
The mode can be found for all levels of data (nominal, ordinal, interval, and ratio).
The mode is not affected by extremely high or low values.
A set of data can have more than one mode. If it has two modes, it is said to be bimodal.
A disadvantage is that a set of data may not have a mode because no value appears more than once.

Slide 20

The weighted mean is used when the values in a data set are not all equally represented. The weighted mean of a variable X is found by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights: \( \bar{x} = \frac{\sum w X}{\sum w} \).

Slide 21

Example: During a one-hour period on a hot Saturday afternoon, a boy served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.10. Compute the weighted mean of the price of the drinks.
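A minimal Python sketch of the weighted mean for the drinks example. The prices are the values X and the number of drinks sold at each price are the weights w.

```python
def weighted_mean(values, weights):
    """x̄_w = Σ(w·X) / Σw."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

prices = [0.50, 0.75, 0.90, 1.10]  # price per drink, in dollars
counts = [5, 15, 15, 15]           # number of drinks sold at each price (the weights)
print(round(weighted_mean(prices, counts), 3))  # 43.75 / 50 = 0.875
```

The unweighted mean of the four prices would be about $0.81. It understates the typical price because it ignores that only five drinks were sold at $0.50.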

Slide 22

Best Measure of Central Tendency

Slide 23

Relationship between the mean, median, and mode and the shape of the distribution:
Symmetric: the mean = the median = the mode (for a unimodal distribution).
Skewed left: the mean will usually be smaller than the median.
Skewed right: the mean will usually be larger than the median.

Slide 24

3-2 Measures of Dispersion (Variation): the spread or variability in the data.
Learning objectives:
The range of a variable
The variance of a variable
The standard deviation of a variable
Using the Empirical Rule
Comparing two sets of data
The measures of central tendency (mean, median, mode) measure differences between the "average" or "typical" values of two sets of data. The measures of dispersion in this section measure differences in how far "spread out" the data values are.

Slide 25

Variability provides a quantitative measure of the degree to which scores in a distribution are spread out or clustered together. It tells how meaningful the measures of central tendency are, and it helps to identify which scores are outliers (extreme scores).
Why do we study dispersion? A direct comparison of two sets of data based only on measures of central tendency such as the mean and the median can be misleading, since an average tells us nothing about the spread of the data. See Example 3-15, page 128, of your textbook.
Comparison of two outdoor paints: six gallons of each brand were tested, and the data show how long (in months) each brand lasts before fading.
Brand A: 10 60 50 30 40 20
Brand B: 35 45 30 35 40 25
Calculate the mean for each brand:
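A minimal Python sketch of the paint comparison. Both brands have the same mean of 35 months, yet Brand A's values are spread over a much wider range. This is exactly the information an average alone hides.

```python
brand_a = [10, 60, 50, 30, 40, 20]  # months before fading
brand_b = [35, 45, 30, 35, 40, 25]

for name, data in [("A", brand_a), ("B", brand_b)]:
    mean = sum(data) / len(data)
    spread = max(data) - min(data)  # range: highest - lowest
    print(name, mean, spread)
# A 35.0 50
# B 35.0 20
```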

Slide 26

Measures of dispersion are: the range, the interquartile range, the variance and standard deviation, and the coefficient of variation.
The range (R) of a variable is the difference between the largest data value and the smallest data value:
R = highest value − lowest value.
Properties of the range:
Only two values are used in the calculation.
It is influenced by extreme values.
It is easy to compute and understand.

Slide 27

Example: Compute the range of 6, 1, 2, 6, 11, 7, 3, 3.
The largest value is 11 and the smallest value is 1. Subtracting the two gives 11 − 1 = 10, so the range is 10.
A relative measure of the range is called the coefficient of range.
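A minimal Python sketch of the range and the coefficient of range. The slide does not show its formula for the coefficient of range. The sketch assumes the common textbook definition (H − L)/(H + L), where H and L are the highest and lowest values.

```python
def value_range(values):
    """R = highest value - lowest value."""
    return max(values) - min(values)

def coefficient_of_range(values):
    """Relative range (H - L) / (H + L); assumes the common textbook definition."""
    hi, lo = max(values), min(values)
    return (hi - lo) / (hi + lo)

data = [6, 1, 2, 6, 11, 7, 3, 3]
print(value_range(data))                     # 11 - 1 = 10
print(round(coefficient_of_range(data), 3))  # 10 / 12 ≈ 0.833
```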

Slide 28

The Variance of a Variable
The variance is based on the deviations from the mean: \( (x_i - \mu) \) for populations and \( (x_i - \bar{x}) \) for samples.
Because positive and negative deviations cancel out (they always sum to zero), we square the deviations: \( (x_i - \mu)^2 \) for populations and \( (x_i - \bar{x})^2 \) for samples.

Slide 29

The population variance of a variable is the sum of the squared deviations of the data values from the mean divided by the number of values in the population:
\( \sigma^2 = \frac{\sum (x_i - \mu)^2}{N} \), where μ is the population mean and N is the population size.
The population variance is represented by \( \sigma^2 \). The population standard deviation \( \sigma = \sqrt{\sigma^2} \) is the square root of the arithmetic mean of the squared deviations from the arithmetic mean of the distribution.

Slide 30

Slide 31

The sample variance of a variable is the sum of the squared deviations of the data values from the mean divided by one less than the number of values in the sample:
\( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \)
The sample variance is represented by \( s^2 \).
Sample standard deviation: \( s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \), or equivalently \( s = \sqrt{\frac{\sum x_i^2 - (\sum x_i)^2 / n}{n - 1}} \).
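A minimal Python sketch contrasting the two divisors, using Brand A from the paint example. The sum of squared deviations is divided by n − 1 for a sample and by N for a population. The standard library offers the same calculations as `statistics.variance` and `statistics.pvariance`; the function below is written out to show the definitions.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)
    return ss / (n - 1 if sample else n)

brand_a = [10, 60, 50, 30, 40, 20]
s2 = variance(brand_a)                    # 1750 / 5 = 350.0
sigma2 = variance(brand_a, sample=False)  # 1750 / 6 ≈ 291.67
print(s2, math.sqrt(s2))                  # standard deviation s ≈ 18.71
```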

Slide 32

Slide 33

Sample Variance for Grouped and Ungrouped Data
For grouped data, use the class midpoints \( X_m \) as the observed values in the different classes:
\( s^2 = \frac{\sum f X_m^2 - (\sum f X_m)^2 / n}{n - 1} \)
For ungrouped data, use the same formula with the class midpoints \( X_m \) replaced by the actual observed values X.
Example: Find the variance and SD for the following data set: 2, 3, 4, 5, 2, 2, 2, 3, 2, 4, 3, 2, 5, 2, 3, 3, 4, 2, 5, 4, 4, 3, 3, 2, 5, 2

Slide 34

Step 1: Put the data in an ungrouped frequency table.
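The remaining steps of this example were on image slides. A minimal Python sketch of the whole calculation: build the frequency table, form the f·X and f·X² column sums, then apply the computational formula \( s^2 = (\sum f X^2 - (\sum f X)^2/n)/(n - 1) \).

```python
import math
from collections import Counter

data = [2, 3, 4, 5, 2, 2, 2, 3, 2, 4, 3, 2, 5, 2, 3, 3, 4, 2, 5, 4, 4, 3, 3, 2, 5, 2]

# Step 1: ungrouped frequency table X -> f
table = Counter(data)  # {2: 10, 3: 7, 4: 5, 5: 4}

# Step 2: column sums of f·X and f·X², then the computational formula
n = sum(table.values())                             # 26
sum_fx = sum(f * x for x, f in table.items())       # 81
sum_fx2 = sum(f * x * x for x, f in table.items())  # 283
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)          # ≈ 1.226
s = math.sqrt(s2)                                   # ≈ 1.107
print(round(s2, 3), round(s, 3))
```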

Slide 35

Example: Find the variance and SD for the frequency distribution of the data representing the number of miles that 20 runners ran during one week.
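The runners' frequency table is not preserved in this copy, so the sketch below uses hypothetical classes (not the textbook's data). It shows the grouped-data procedure: replace each class by its midpoint \( X_m \), then apply the same computational formula weighted by the class frequencies.

```python
import math

# Hypothetical classes (lower boundary, upper boundary, frequency); not from the slides
classes = [(5, 15, 2), (15, 25, 5), (25, 35, 3)]

n = sum(f for _, _, f in classes)
midpoints = [((lo + hi) / 2, f) for lo, hi, f in classes]  # (X_m, f)
sum_fx = sum(f * xm for xm, f in midpoints)                # Σ f·X_m
sum_fx2 = sum(f * xm ** 2 for xm, f in midpoints)          # Σ f·X_m²
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
print(s2, math.sqrt(s2))  # sample variance and SD of the grouped data
```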

Slide 36

Slide 37

Interpretation and Uses of the Standard Deviation
The standard deviation is used to measure the spread of the data. A small standard deviation indicates that the data are clustered close to the mean, so the mean is representative of the data. A large standard deviation indicates that the data are spread out from the mean, so the mean is less representative of the data.

Slide 38

Coefficient of Variation: the relative measure of the standard deviation is the coefficient of variation, defined as the standard deviation divided by the mean and expressed as a percentage:
\( \mathrm{CVar} = \frac{s}{\bar{x}} \cdot 100\% \) for samples, or \( \mathrm{CVar} = \frac{\sigma}{\mu} \cdot 100\% \) for populations.
Important note: the coefficient of variation should only be computed for data measured on a ratio scale. See the following example.

Slide 39

Example: To see why the coefficient of variation should not be applied to interval-level data, compare the same set of temperatures in Celsius and Fahrenheit:
Celsius: [0, 10, 20, 30, 40]
Fahrenheit: [32, 50, 68, 86, 104]
The CV of the first set is 15.81/20 = 0.79. For the second set (the same temperatures) it is 28.46/68 = 0.42. So the coefficient of variation has no meaning for data on an interval scale.
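A minimal Python sketch reproducing the two coefficients of variation with `statistics.stdev` (sample standard deviation) and `statistics.mean`. The Fahrenheit list is derived from the Celsius one, so both describe the same temperatures, yet the ratios differ because the zero point of each scale is arbitrary.

```python
import statistics

def coefficient_of_variation(values):
    """CVar = s / x̄ (sample standard deviation over the mean)."""
    return statistics.stdev(values) / statistics.mean(values)

celsius = [0, 10, 20, 30, 40]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # the same temperatures: 32, 50, 68, 86, 104
print(round(coefficient_of_variation(celsius), 2))     # 0.79
print(round(coefficient_of_variation(fahrenheit), 2))  # 0.42
```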

Slide 40

Slide 41

Example: Data about the annual salary (in thousands) and age of CEOs in a number of firms have been collected. The means and standard deviations are as follows:
Which distribution has more dispersion? Is direct comparison appropriate?
Salary and age are measured in different units, and the means show that there is also a significant difference in magnitude, so direct comparison is not appropriate.
Comparing the CVs, we can see clearly that the dispersion, or variability relative to the mean, is greater for CEO annual salary than for age.

Slide 42

Measures of Position
Measures of position are used to locate the relative position of a data value in the data set.
1. Standard Scores
To compare values measured in different units, a z-score is obtained for each value and the z-scores are then compared. The z-score (standard score) for each value is obtained by:
For a sample: \( z = \frac{X - \bar{x}}{s} \)
For a population: \( z = \frac{X - \mu}{\sigma} \)
The z-score represents the number of standard deviations that a data value falls above or below the mean.

Slide 43

Standard scores (z-scores) specify the exact location of a score within a distribution relative to the mean.
The sign (− or +) tells whether the score is below or above the mean.
The numerical value tells the distance from the mean in terms of standard deviations.
E.g., a z-score of −1.3 tells us that the raw score fell 1.3 standard deviations below the mean.
A raw score is the original, untransformed score. To make raw scores more meaningful, they can be converted to z-scores.

Slide 44

Characteristics of Standard Scores
The shape of the distribution of standard scores is the same as the shape of the distribution of raw scores (the only thing that changes is the units on the x-axis).
The mean of a set of standard scores is 0.
The standard deviation of a set of standard scores is 1.
A standard score greater than +3 or less than −3 is an extreme score, or an outlier.

Slide 45

Example: A student scored 65 on a statistics exam that had a mean of 50 and a standard deviation of 10. Compute the z-score.
z = (65 − 50)/10 = 1.5. That is, the score of 65 is 1.5 standard deviations above the mean (above, since the z-score is positive).
Assume that this student scored 70 on a math exam that had a mean of 80 and a standard deviation of 5. Compute the z-score.
z = (70 − 80)/5 = −2. That is, the score of 70 is 2 standard deviations below the mean (below, since the z-score is negative).

Slide 46

Example: A student scored 65 on a calculus test that had a mean of 50 and an SD of 10. She scored 30 on a statistics test with a mean of 25 and a variance of 25. Compare her relative positions on the two tests.
Calculus: z = (65 − 50)/10 = 1.5. Statistics: the SD is √25 = 5, so z = (30 − 25)/5 = 1.
Since the z-score for calculus is larger, her relative position in the calculus class is higher than her relative position in the statistics class.
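A minimal Python sketch of the z-score calculations in the two examples above. In the second example the statistics test reports a variance, so its square root has to be taken before standardizing.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(65, 50, 10))  # statistics exam: 1.5
print(z_score(70, 80, 5))   # math exam: -2.0

# The statistics test gives a variance of 25, so its SD is 25 ** 0.5 = 5
calculus = z_score(65, 50, 10)                # 1.5
statistics_test = z_score(30, 25, 25 ** 0.5)  # 1.0
print(calculus > statistics_test)             # True: relatively better in calculus
```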

Slide 47

Quartiles divide the data set into 4 groups. Quartiles are denoted by Q1, Q2, and Q3. The median is the same as Q2.
Finding the Quartiles
Procedure: Let \( Q_k \) be the k-th quartile (k = 1, 2, 3) and n the sample size.
Step 1: Arrange the data in order.
Step 2: Compute c = (n + 1)k/4.
Step 3: If c is not a whole number, \( Q_k \) is the value halfway between the values in the positions just below and just above c.
Step 4: If c is a whole number, \( Q_k \) is the value in position c.

Slide 48

Example: For the following data set, find Q1 and Q3: 2, 3, 5, 6, 8, 10, 12.
n = 7, so for Q1 we have c = (7 + 1)(1)/4 = 2. Hence Q1 is the 2nd value; thus Q1 = 3.
For Q3 we have c = (7 + 1)(3)/4 = 6. Hence Q3 is the 6th value; thus Q3 = 10.

Slide 49

Example: Find Q1 and Q3 for the following data set: 2, 3, 5, 6, 8, 10, 12, 15, 18. Note: the data set is already ordered.
n = 9, so for Q1 we have c = (9 + 1)(1)/4 = 2.5. Hence Q1 is halfway between the 2nd value and the 3rd value: Q1 = (3 + 5)/2 = 4.
For Q3 we have c = (9 + 1)(3)/4 = 7.5. Hence Q3 is halfway between the 7th value and the 8th value: Q3 = (12 + 15)/2 = 13.5.
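A minimal Python sketch of the (n + 1)k/4 position rule from the quartile procedure. Positions are 1-based, as on the slides. Clamping c to the range 1..n is an added safeguard for very small data sets, not part of the slide's procedure. Software packages often use other interpolation rules, so their quartiles can differ slightly from these.

```python
import math

def quartile(values, k):
    """Q_k by the (n + 1)k / 4 position rule (k = 1, 2, 3)."""
    xs = sorted(values)
    n = len(xs)
    c = (n + 1) * k / 4
    if c.is_integer():
        return xs[int(c) - 1]  # positions are 1-based
    # c not whole: halfway between the values just below and just above position c
    lo = min(max(math.floor(c), 1), n)
    hi = min(max(math.ceil(c), 1), n)
    return (xs[lo - 1] + xs[hi - 1]) / 2

seven = [2, 3, 5, 6, 8, 10, 12]
nine = [2, 3, 5, 6, 8, 10, 12, 15, 18]
print(quartile(seven, 1), quartile(seven, 3))  # 3 10
print(quartile(nine, 1), quartile(nine, 3))    # 4.0 13.5
```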

Slide 50

Example: For the following data set, find Q1 and Q3: 2, 3, 5, 6, 8, 10, 12.
The median of the data is 6.
The median of the lower group (the values less than the median: 2, 3, 5) is 3, so Q1 is the 2nd value, which means Q1 = 3.
The median of the upper group (the values greater than the median: 8, 10, 12) is 10, so Q3 is the 6th value, which means Q3 = 10.

Slide 51

Slide 52

Slide 53

The Interquartile Range (IQR)
IQR = Q3 − Q1.
The interquartile range, also called the midspread, middle fifty, or inner 50% data range, is a measure of statistical dispersion (variation) equal to the difference between the third and first quartiles.

Slide 54

An outlier is an extremely high or an extremely low data value when compared with the rest of the data values.

Slide 55

Example: Given the data set 5, 6, 12, 13, 15, 18, 22, 50, can the value 50 be considered an outlier?
Q1 = 9, Q3 = 20, IQR = 11. (Verify.)
(1.5)(IQR) = (1.5)(11) = 16.5.
9 − 16.5 = −7.5 and 20 + 16.5 = 36.5.
The value 50 is outside the range −7.5 to 36.5; hence 50 is an outlier.
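A minimal Python sketch of the 1.5 × IQR fence test. It takes Q1 = 9 and Q3 = 20 from the example and flags any value outside the fences. The `factor` parameter is an illustrative addition; 1.5 is the value the slides use.

```python
def iqr_fences(q1, q3, factor=1.5):
    """Lower and upper fences Q1 - 1.5·IQR and Q3 + 1.5·IQR."""
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

data = [5, 6, 12, 13, 15, 18, 22, 50]
lower, upper = iqr_fences(9, 20)  # Q1 and Q3 from the example
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)     # -7.5 36.5 [50]
```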

Slide 56

Measures of dispersion tell us about the variation of the data set; skewness tells us about the direction of that variation.
Definition: Skewness is a measure of symmetry or, more precisely, of the lack of symmetry.
Coefficient of skewness: a unitless number that measures the degree and direction of asymmetry of a distribution.
There are several ways of measuring skewness, for example Pearson's coefficient of skewness.
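The slide's formula for Pearson's coefficient did not survive. The sketch assumes the widely taught median form PC = 3(x̄ − median)/s and applies it to the vice-presidents' salaries from the "Important remark" example. A positive value indicates a right (positive) skew, consistent with the mean being pulled above the median.

```python
import statistics

def pearson_skewness(values):
    """Pearson's median skewness coefficient 3(x̄ - median) / s (assumed form)."""
    mean = statistics.mean(values)
    med = statistics.median(values)
    s = statistics.stdev(values)
    return 3 * (mean - med) / s

salaries = [90_000, 92_000, 94_000, 98_000, 350_000]
pc = pearson_skewness(salaries)
print(round(pc, 2))  # positive: skewed right
```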

Slide 57

For any bell-shaped distribution:
Approximately 68% of the data values will fall within one standard deviation of the mean.
Approximately 95% will fall within two standard deviations of the mean.
Approximately 99.7% will fall within three standard deviations of the mean.
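The 68/95/99.7 percentages are rounded values for an exactly normal distribution. This minimal Python sketch computes them from the normal curve using \( P(|Z| < k) = \mathrm{erf}(k/\sqrt{2}) \) with `math.erf`. Real bell-shaped data will only approximately follow these figures.

```python
import math

def within_k_sd(k):
    """P(|Z| < k) for a normal distribution: erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 1))  # 68.3, 95.4, 99.7 (percent)
```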

Slide 58

The Empirical (Normal) Rule

Slide 59

What Is a Box Plot?
To construct a box plot, first obtain the five-number summary {Min, Q1, M, Q3, Max}, where M is the median.

Slide 60

The box plot is useful in analyzing small data sets that do not lend themselves easily to histograms. Because a box plot is small, it is easy to display and compare several box plots in a small space. A box plot is a good alternative or complement to a histogram and is usually better for showing several simultaneous comparisons.

Slide 61

How to use it:
Collect and arrange the data. Collect the data and arrange it into an ordered set from the lowest value to the highest.
Calculate the median: M = median = Q2.
Calculate the first quartile (Q1).
Calculate the third quartile (Q3).
Calculate the interquartile range (IQR). This range is the difference between the third and first quartile values: IQR = Q3 − Q1.
Obtain the maximum. This is the largest data value that is less than or equal to the third quartile plus 1.5 × IQR, i.e. Q3 + 1.5 × (Q3 − Q1).

Slide 62

Obtain the minimum. This is the smallest data value that is greater than or equal to the first quartile minus 1.5 × IQR, i.e. Q1 − 1.5 × (Q3 − Q1).
Draw and label the axes of the graph. The scale of the horizontal axis must be large enough to encompass the greatest value of the data sets.
Draw the box plots. Construct the box, insert the median points, and attach the maximum and minimum.
Identify outliers (values outside the lower and upper fences) with asterisks.
The box plot can answer the following questions:
Does the location differ between subgroups?
Does the variation differ between subgroups?
Are there any outliers?
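A minimal Python sketch of the numbers needed to draw a box plot. It computes the five-number summary, the ends of the lines (the most extreme values still inside the 1.5 × IQR fences, as the steps above describe), and the outliers to be marked. Quartiles use the (n + 1)k/4 position rule from section 3-3. The function name `box_plot_summary` and its dictionary layout are illustrative choices. The test data are the outlier example from section 3-3.

```python
import math

def quartile(xs, k):
    """(n + 1)k / 4 position rule; xs must already be sorted."""
    n = len(xs)
    c = (n + 1) * k / 4
    if c.is_integer():
        return xs[int(c) - 1]
    lo = min(max(math.floor(c), 1), n)
    hi = min(max(math.ceil(c), 1), n)
    return (xs[lo - 1] + xs[hi - 1]) / 2

def box_plot_summary(values):
    xs = sorted(values)
    q1, med, q3 = quartile(xs, 1), quartile(xs, 2), quartile(xs, 3)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "five_number": (xs[0], q1, med, q3, xs[-1]),
        "whiskers": (inside[0], inside[-1]),  # ends of the lines
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

print(box_plot_summary([5, 6, 12, 13, 15, 18, 22, 50]))
```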

Slide 63

Slide 64

Slide 65

Now find the interquartile range (IQR). The interquartile range is the difference between the upper quartile and the lower quartile. In this case, IQR = 87 − 52 = 35. The IQR is a very useful measurement: it is less influenced by extreme values because it limits the range to the middle 50% of the values. With the IQR of 35, we can begin to draw the box plot.

Slide 66

Example 2: Consider two data sets:
A1 = {0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72, 0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09}
A2 = {-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43, 7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50}
Notice that both data sets are approximately balanced around zero; evidently the mean in both cases is "near" zero. However, there is substantially more variation in A2, which ranges from about −6 to about 7, whereas A1 ranges from about −2.5 to 2.5. The box plots are shown below. Notice the difference in scales: since the box plot displays the full range of variation, the y-range must be expanded for A2.
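A minimal Python sketch that puts numbers on this visual comparison. It prints the mean, sample standard deviation, and smallest and largest values of each data set. Both centers are small relative to the spread, while A2's standard deviation and range are several times larger than A1's.

```python
import statistics

A1 = [0.22, -0.87, -2.39, -1.79, 0.37, -1.54, 1.28, -0.31, -0.74, 1.72,
      0.38, -0.17, -0.62, -1.10, 0.30, 0.15, 2.30, 0.19, -0.50, -0.09]
A2 = [-5.13, -2.19, -2.43, -3.83, 0.50, -3.25, 4.32, 1.63, 5.18, -0.43,
      7.11, 4.87, -3.10, -5.81, 3.76, 6.31, 2.58, 0.07, 5.76, 3.50]

for name, data in (("A1", A1), ("A2", A2)):
    print(name,
          round(statistics.mean(data), 2),   # center
          round(statistics.stdev(data), 2),  # spread
          (min(data), max(data)))            # smallest and largest values
```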

Slide 67

Slide 68

Slide 69

If the median is near the center of the box, the distribution is approximately symmetric.
If the median falls to the left of the center of the box, the distribution is positively skewed.
If the median falls to the right of the center of the box, the distribution is negatively skewed.
Similarly, for the lines (whiskers):
If the lines are about the same length, the distribution is approximately symmetric.
If the right line is longer than the left line, the distribution is positively skewed.
If the left line is longer than the right line, the distribution is negatively skewed.
