The value of the standard deviation. Standard deviation

17.10.2019

Values obtained by experiment inevitably contain errors arising from a variety of causes. Among them, systematic and random errors should be distinguished. Systematic errors are due to causes that act in a definite way, and they can always be eliminated or accounted for with sufficient accuracy. Random errors are caused by a very large number of individual causes that cannot be accounted for precisely and that act differently in each individual measurement. These errors cannot be completely eliminated; they can be taken into account only on average, which requires knowing the laws that random errors obey.

We denote the measured quantity by A and the random error of a measurement by x. Since the error x can take any value, it is a continuous random variable, which is fully characterized by its distribution law.

The simplest law, and the one that reflects reality most accurately in the vast majority of cases, is the so-called normal distribution of errors:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^{2}/(2\sigma^{2})}$$

This distribution law can be derived from various theoretical premises, in particular from the requirement that the most probable value of an unknown quantity, for which a series of equally accurate values has been obtained by direct measurement, is the arithmetic mean of those values. The quantity σ² is called the variance (dispersion) of this normal law.

Average

Determining the variance from experimental data. If n values a_i of a quantity A are obtained by direct measurement with the same degree of accuracy, and if the errors in A follow the normal distribution law, then the most probable value of A is the arithmetic mean:

$$\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i$$

where

ā is the arithmetic mean,

a_i is the value measured at the i-th step.

The deviation of each observed value a_i of the quantity A from the arithmetic mean is a_i − ā.

To determine the variance of the normal distribution of errors in this case, the following formula is used:

σ² is the variance,
ā is the arithmetic mean,
n is the number of measurements of the parameter.
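As a rough illustration of the two quantities just defined, the sketch below computes the arithmetic mean and a variance estimate for a short list of readings. The readings are invented, and dividing by n − 1 (the usual unbiased estimate) is an assumption, since the denominator is not stated in the text above.

```python
def mean_and_variance(values):
    """Return the arithmetic mean and a variance estimate of the readings."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two measurements are required")
    mean = sum(values) / n
    # Assumption: n - 1 in the denominator (unbiased sample estimate).
    variance = sum((a - mean) ** 2 for a in values) / (n - 1)
    return mean, variance


# Hypothetical equally accurate measurements of one quantity A.
readings = [10.1, 9.9, 10.0, 10.2, 9.8]
mean, variance = mean_and_variance(readings)
print(f"mean = {mean:.3f}, variance = {variance:.4f}")
```

If the readings are treated as the whole population rather than a sample, the denominator would be n instead.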

Standard deviation

The standard deviation shows the absolute deviation of the measured values from the arithmetic mean. In accordance with the formula for the accuracy measure of a linear combination, the root-mean-square error of the arithmetic mean is determined by the formula:

where

ā is the arithmetic mean,
n is the number of measurements of the parameter,
a_i is the value measured at the i-th step.

Coefficient of variation

The coefficient of variation characterizes the relative degree of deviation of the measured values from the arithmetic mean:

$$V = \frac{\sigma}{\bar{a}} \cdot 100\%$$

where

V is the coefficient of variation,
σ is the standard deviation,
ā is the arithmetic mean.

The greater the coefficient of variation, the greater the relative scatter and the lower the uniformity of the studied values. If the coefficient of variation is less than 10%, the variability of the series is considered insignificant; from 10% to 20%, average; more than 20% and up to 33%, significant. A coefficient of variation above 33% indicates that the data are heterogeneous and that the largest and smallest values need to be excluded.
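The gradations above can be expressed as a small lookup. This is only a sketch: the function names are hypothetical, the input numbers are invented, and treating exactly 33% as still "significant" is an assumption, because the text does not say which side that boundary belongs to.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation in percent: V = std_dev / mean * 100."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100.0


def describe_variability(v_percent):
    """Map V (in percent) to the verbal gradations quoted in the text."""
    if v_percent < 10:
        return "insignificant"
    if v_percent <= 20:
        return "average"
    if v_percent <= 33:  # assumption: exactly 33% still counts as "significant"
        return "significant"
    return "heterogeneous"


v = coefficient_of_variation(std_dev=12.0, mean=80.0)  # hypothetical values
print(round(v, 1), describe_variability(v))
```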

Average linear deviation

One of the indicators of the range and intensity of variation is the mean linear deviation (the average modulus of deviation) from the arithmetic mean. The average linear deviation is calculated by the formula:

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} \left| a_i - \bar{a} \right|$$

where

d̄ is the average linear deviation,
ā is the arithmetic mean,
n is the number of measurements of the parameter,
a_i is the value measured at the i-th step.

To check whether the studied values follow the normal distribution law, the ratio of the asymmetry index to its error and the ratio of the kurtosis index to its error are used.

Asymmetry index

The asymmetry index (A) and its error (m_a) are calculated using the following formulas:

where

A is the asymmetry index,
σ is the standard deviation,
ā is the arithmetic mean,
n is the number of measurements of the parameter,
a_i is the value measured at the i-th step.

Kurtosis index

The kurtosis index (E) and its error (m_e) are calculated using the following formulas:

According to a sample survey, depositors were grouped by the size of their deposit in the city's Sberbank:

Determine:

1) range of variation;

2) average deposit amount;

3) average linear deviation;

4) dispersion;

5) standard deviation;

6) coefficient of variation of contributions.

Solution:

This distribution series contains open intervals. In such series, the value of the interval of the first group is conventionally assumed to be equal to the value of the interval of the next, and the value of the interval of the last group is equal to the value of the interval of the previous one.

The interval value of the second group is 200, therefore, the value of the first group is also 200. The interval value of the penultimate group is 200, which means that the last interval will also have a value equal to 200.

1) The range of variation is the difference between the largest and smallest values of the attribute:

$$R = x_{\max} - x_{\min} = 1200 - 200 = 1000$$

The range of variation in the deposit size is 1000 rubles.

2) The average deposit size is determined by the weighted arithmetic mean formula.

First we determine a discrete value of the attribute for each interval. To do this, we find the midpoints of the intervals using the simple arithmetic mean formula.

The midpoint of the first interval is:

$$x_1 = \frac{200 + 400}{2} = 300$$

The midpoint of the second is 500, and so on.

We enter the results of the calculations in a table:

Deposit amount, rub.   Number of depositors, f   Interval midpoint, x   xf
200-400                32                        300                    9600
400-600                56                        500                    28000
600-800                120                       700                    84000
800-1000               104                       900                    93600
1000-1200              88                        1100                   96800
Total                  400                       -                      312000

The average deposit in the city's Sberbank is 780 rubles:

$$\bar{x} = \frac{\sum x f}{\sum f} = \frac{312000}{400} = 780$$
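The same weighted mean can be reproduced in a few lines. The sketch uses the interval bounds and frequencies from the table; the variable names are illustrative only.

```python
# Interval bounds after closing the open intervals, with their frequencies.
intervals = [(200, 400), (400, 600), (600, 800), (800, 1000), (1000, 1200)]
frequencies = [32, 56, 120, 104, 88]

# The midpoint of each interval stands in for every deposit in that group.
midpoints = [(low + high) / 2 for low, high in intervals]

weighted_sum = sum(x * f for x, f in zip(midpoints, frequencies))
weighted_mean = weighted_sum / sum(frequencies)

print(midpoints)      # [300.0, 500.0, 700.0, 900.0, 1100.0]
print(weighted_mean)  # 780.0
```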

3) The average linear deviation is the arithmetic mean of the absolute deviations of the individual values of the attribute from the overall mean:

$$\bar{d} = \frac{\sum \left| x - \bar{x} \right| f}{\sum f}$$

The procedure for calculating the average linear deviation in an interval distribution series is as follows:

1. The weighted arithmetic mean x̄ is calculated, as shown in item 2).

2. The absolute deviations of the variants from the mean are determined: |x − x̄|.

3. The deviations obtained are multiplied by the frequencies: |x − x̄|·f.

4. The weighted deviations are summed without regard to sign: Σ|x − x̄|·f.

5. The sum of the weighted deviations is divided by the sum of the frequencies: Σ|x − x̄|·f / Σf.

It is convenient to use a table of calculated data:

Deposit amount, rub.   Number of depositors, f   Interval midpoint, x   x − x̄   |x − x̄|   |x − x̄|·f
200-400                32                        300                    -480    480       15360
400-600                56                        500                    -280    280       15680
600-800                120                       700                    -80     80        9600
800-1000               104                       900                    120     120       12480
1000-1200              88                        1100                   320     320       28160
Total                  400                       -                      -       -         81280

$$\bar{d} = \frac{81280}{400} = 203.2$$

The average linear deviation of the deposit size of Sberbank clients is 203.2 rubles.
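The five steps of this procedure can be sketched directly in code, using the interval midpoints and frequencies from the table. Variable names are illustrative.

```python
midpoints = [300, 500, 700, 900, 1100]
frequencies = [32, 56, 120, 104, 88]

total = sum(frequencies)
# Step 1: weighted arithmetic mean.
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / total

# Steps 2-3: absolute deviations from the mean, weighted by frequency.
weighted_abs_dev = [abs(x - mean) * f for x, f in zip(midpoints, frequencies)]

# Steps 4-5: sum the weighted deviations and divide by the sum of frequencies.
mean_linear_deviation = sum(weighted_abs_dev) / total

print(weighted_abs_dev)                 # [15360.0, 15680.0, 9600.0, 12480.0, 28160.0]
print(round(mean_linear_deviation, 1))  # 203.2
```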

4) The variance (dispersion) is the arithmetic mean of the squared deviations of each value of the attribute from the arithmetic mean.

In an interval distribution series the variance is calculated by the formula:

$$\sigma^2 = \frac{\sum (x - \bar{x})^2 f}{\sum f}$$

The procedure for calculating the variance in this case is as follows:

1. Determine the weighted arithmetic mean x̄, as shown in item 2).

2. Find the deviations of the variants from the mean: x − x̄.

3. Square the deviation of each variant from the mean: (x − x̄)².

4. Multiply the squared deviations by the weights (frequencies): (x − x̄)²·f.

5. Sum the products obtained: Σ(x − x̄)²·f.

6. Divide the resulting sum by the sum of the weights (frequencies): Σ(x − x̄)²·f / Σf.

We enter the calculations in a table:

Deposit amount, rub.   Number of depositors, f   Interval midpoint, x   x − x̄   (x − x̄)²   (x − x̄)²·f
200-400                32                        300                    -480    230400    7372800
400-600                56                        500                    -280    78400     4390400
600-800                120                       700                    -80     6400      768000
800-1000               104                       900                    120     14400     1497600
1000-1200              88                        1100                   320     102400    9011200
Total                  400                       -                      -       -         23040000
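The text breaks off after this table, so items 4) to 6) of the task are not carried through to a final number. As a sketch only, the remaining quantities can be computed from the same midpoints and frequencies; the results below follow from the table totals and are not quoted from the source.

```python
from math import sqrt

midpoints = [300, 500, 700, 900, 1100]
frequencies = [32, 56, 120, 104, 88]

total = sum(frequencies)
mean = sum(x * f for x, f in zip(midpoints, frequencies)) / total

# Item 4: weighted variance (sum of squared deviations times frequency, over total).
variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies)) / total

# Item 5: standard deviation is the square root of the variance.
std_dev = sqrt(variance)

# Item 6: coefficient of variation in percent.
cv_percent = std_dev / mean * 100

print(variance)              # 57600.0
print(std_dev)               # 240.0
print(round(cv_percent, 1))  # 30.8
```

By the gradations given earlier, a coefficient of about 30.8% falls in the "significant" band but below the 33% threshold for heterogeneity.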

In this article I will explain how to find the standard deviation. This material is extremely important for a full understanding of mathematics, so a math tutor should devote a separate lesson, or even several, to it. In this article you will find a link to a detailed and clear video tutorial that explains what the standard deviation is and how to find it.

The standard deviation makes it possible to estimate the spread of values obtained when measuring a certain parameter. It is denoted by the symbol σ (the Greek letter "sigma").

The formula for the calculation is quite simple. To find the standard deviation, take the square root of the variance. So now you have to ask, "What is variance?"

What is variance

The definition of variance (also called dispersion) is as follows: variance is the arithmetic mean of the squared deviations of the values from the mean.

To find the variance, perform the following calculations sequentially:

  • Determine the mean (the simple arithmetic mean of the series of values).
  • Then subtract the mean from each value and square the resulting difference (this gives the squared difference).
  • Finally, calculate the arithmetic mean of the squared differences (why squares are used is explained below).

Let's look at an example. Suppose you and your friends decide to measure the height of your dogs (in millimeters). The measurements (at the withers) are: 600 mm, 470 mm, 170 mm, 430 mm and 300 mm.

Let's calculate the mean, variance and standard deviation.

First we find the mean. As you already know, to do this you add all the measured values and divide by the number of measurements:

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394 mm.

So, the mean (arithmetic mean) is 394 mm.

Now we need to determine the deviation of each dog's height from the mean:

206, 76, −224, 36 and −94 mm.

Finally, to calculate the variance, each of these differences is squared, and then we find the arithmetic mean of the results:

Variance = (206² + 76² + (−224)² + 36² + (−94)²) / 5 = 108520 / 5 = 21704 mm².

Thus, the variance is 21704 mm².

How to find the standard deviation

So how do we now calculate the standard deviation, knowing the variance? As we recall, we take its square root. That is, the standard deviation is:

σ = √21704 ≈ 147 mm (rounded to the nearest whole millimeter).
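The whole calculation for the five dogs, treated as the entire population, can be checked step by step. This is a plain restatement of the arithmetic above; the variable names are illustrative.

```python
from math import sqrt

heights_mm = [600, 470, 170, 430, 300]
n = len(heights_mm)

mean = sum(heights_mm) / n                      # 394.0
deviations = [h - mean for h in heights_mm]     # 206, 76, -224, 36, -94
variance = sum(d ** 2 for d in deviations) / n  # population variance, mm^2
std_dev = sqrt(variance)

print(mean, variance, round(std_dev))           # 394.0 21704.0 147
```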

Using this method, we found that some dogs (e.g. Rottweilers) are very large dogs. But there are also very small dogs (for example, dachshunds, but you should not tell them this).

The most interesting thing is that the standard deviation carries useful information. We can now show which of the height measurements lie within the interval obtained by laying off one standard deviation on either side of the mean (here 394 ± 147 mm, that is, from 247 mm to 541 mm).

That is, using the standard deviation, we get a "standard" method for finding out which values are normal (close to the statistical average) and which are unusually large or, conversely, small.

What is Standard Deviation

But ... things are a little different if we analyze sample data. In our example we considered the whole population. That is, our 5 dogs were the only dogs in the world that interested us.

But if the data are a sample (values chosen from a larger population), then the calculation needs to be done differently.

If there are N values, then:

  • for a population, the sum of squared differences is divided by N when calculating the variance;
  • for a sample, it is divided by N − 1.

All other calculations are carried out in the same way, including the determination of the mean.

For example, if our five dogs are just a sample from the population of dogs (all dogs on the planet), we must divide by 4 instead of 5, namely:

Sample variance = 108520 / 4 = 27130 mm².

In this case the standard deviation for the sample is √27130 ≈ 165 mm (rounded to the nearest whole number).

We can say that we have made a "correction" for the case where our values are just a small sample.
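Python's standard `statistics` module offers both variants, which makes the difference between dividing by N and by N − 1 easy to see on the same five heights. This is a sketch for comparison, not part of the original lesson.

```python
import statistics

heights_mm = [600, 470, 170, 430, 300]

# Population formulas divide by N.
population_variance = statistics.pvariance(heights_mm)
population_std = statistics.pstdev(heights_mm)

# Sample formulas divide by N - 1.
sample_variance = statistics.variance(heights_mm)
sample_std = statistics.stdev(heights_mm)

print(population_variance, round(population_std))  # 21704 mm^2, about 147 mm
print(sample_variance, round(sample_std))          # 27130 mm^2, about 165 mm
```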

Note. Why the squares of the differences?

But why do we take the squares of the differences when calculating the variance? Suppose that in measuring some parameter you obtained the following set of differences from the mean: 4; 4; −4; −4. If we simply add the differences together, the negative values cancel the positive ones:

(4 + 4 − 4 − 4) / 4 = 0.

So this approach is useless. Then perhaps it is worth trying the absolute values of the deviations (that is, their moduli)?

(|4| + |4| + |−4| + |−4|) / 4 = 4.

At first glance this works well (the resulting value, by the way, is called the mean absolute deviation), but not in all cases. Let's try another example. Let the measurement give the following set of differences: 7; 1; −6; −2. Then the mean absolute deviation is:

(|7| + |1| + |−6| + |−2|) / 4 = 4.

Wow! We again got the result 4, although the differences have a much larger spread.

Now let's see what happens if we square the differences (and then take the square root of their mean).

For the first example we get:

√((4² + 4² + (−4)² + (−4)²) / 4) = √(64 / 4) = 4.

For the second example we get:

√((7² + 1² + (−6)² + (−2)²) / 4) = √(90 / 4) ≈ 4.74.

Now that is quite a different matter! The greater the spread of the differences, the greater the root-mean-square deviation, which is exactly what we wanted.
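The comparison between the two measures can be reproduced for both sets of differences. The helper names below are illustrative; the two lists are the ones used in the note.

```python
from math import sqrt


def mean_absolute_deviation(diffs):
    """Average of the absolute values of the differences."""
    return sum(abs(d) for d in diffs) / len(diffs)


def root_mean_square(diffs):
    """Square root of the mean of the squared differences."""
    return sqrt(sum(d ** 2 for d in diffs) / len(diffs))


first = [4, 4, -4, -4]
second = [7, 1, -6, -2]

for diffs in (first, second):
    print(mean_absolute_deviation(diffs), round(root_mean_square(diffs), 2))
# 4.0 4.0
# 4.0 4.74
```

The mean absolute deviation is the same for both sets, while the root-mean-square value is larger for the more widely scattered one.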

In fact, this method uses the same idea as calculating the distance between points, only applied in a different way.

And from a mathematical point of view, squares and square roots are more useful than absolute values of the deviations, which is why the standard deviation is applicable to other mathematical problems.

Sergey Valerievich told you how to find the standard deviation

Lesson number 4

Topic: “Descriptive statistics. Indicators of the diversity of the trait in the aggregate "

The main criteria for the diversity of a trait in a statistical population are the limit, amplitude, standard deviation, oscillation coefficient and coefficient of variation. The previous lesson explained that average values give only a generalizing characteristic of the studied trait in the population and do not take into account the values of its individual variants: the minimum and maximum values, those above and below the average, and so on.

Example. The means of two different numerical sequences, −100; −20; 100; 20 and 0.1; −0.2; 0.1, are exactly the same and equal to 0. However, the scatter of the data around the mean is very different in these two sequences.

The definition of the listed criteria for the diversity of a trait is primarily carried out taking into account its value for individual elements of the statistical population.

Measures of the variation of a trait are absolute and relative. The absolute measures of variation include the range of variation, limit, standard deviation and variance. The coefficient of variation and the oscillation coefficient are relative measures of variation.

Limit (lim) is a criterion determined by the extreme values of the variants in the variation series. In other words, it is bounded by the minimum and maximum values of the attribute.

Amplitude (Am), or range of variation, is the difference between the extreme variants. It is calculated by subtracting the minimum value of the attribute from its maximum value, which makes it possible to estimate the degree of dispersion of the variants:

$$Am = x_{\max} - x_{\min}$$

The disadvantage of the limit and amplitude as criteria of variability is that they depend entirely on the extreme values of the trait in the variation series. Fluctuations in the values of the attribute within the series are not taken into account.

The most complete characterization of the diversity of a trait in a statistical population is given by the root-mean-square deviation (sigma, σ), which is a general measure of the deviation of the variants from their mean value. The root-mean-square deviation is also often called the standard deviation.

The standard deviation is based on comparing each variant with the arithmetic mean of the population. Since the population always contains variants both smaller and larger than the mean, the sum of the deviations with the sign "+" is cancelled by the sum of the deviations with the sign "−", i.e. the sum of all deviations is zero. To avoid the influence of the signs of the differences, the deviations of the variants from the arithmetic mean are squared, i.e. (x − M)². The sum of squared deviations is not equal to zero. To obtain a coefficient capable of measuring variability, the mean of the squared deviations is taken; this value is called the variance (dispersion):

$$\sigma^2 = \frac{\sum (x - M)^2}{n}$$

By definition, the variance is the mean square of the deviations of the individual values of a trait from its mean value. The variance is the square of the standard deviation, σ².

The variance is a dimensional (named) quantity. So, if the variants of a number series are expressed in meters, the variance is in square meters; if the variants are expressed in kilograms, the variance is in the square of this unit (kg²), and so on.

The standard deviation is the square root of the variance:

$$\sigma = \sqrt{\frac{\sum (x - M)^2}{n}}$$

If the number of observations is small (in this lesson, 30 or fewer), then when calculating the variance and standard deviation, n − 1 must be used in the denominator of the fraction instead of n.

The calculation of the standard deviation can be divided into six stages, which must be carried out in a certain sequence:

Applications of the standard deviation:

a) to judge the variability of variation series and to compare how typical (representative) the arithmetic means are. This is necessary in differential diagnosis when determining the stability of signs;

b) to reconstruct a variation series, i.e. to restore its frequency distribution on the basis of the three-sigma rule. The interval M±3σ contains 99.7% of all variants of the series, the interval M±2σ contains 95.5%, and the interval M±1σ contains 68.3% of the variants (Fig. 1);

c) to identify outlying ("pop-up") variants;

d) to determine the parameters of the norm and pathology using sigma estimates;

e) to calculate the coefficient of variation;

f) to calculate the standard error of the arithmetic mean.
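The percentages in the three-sigma rule mentioned in item b) can be checked numerically. For a normal distribution, the share of values within k standard deviations of the mean equals erf(k/√2); the sketch below evaluates it with the standard library. The function name is illustrative.

```python
from math import erf, sqrt


def share_within(k):
    """Share of a normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, f"{share_within(k) * 100:.2f}%")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```

These are the values that the lesson quotes in rounded form as 68.3%, 95.5% and 99.7%.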

To characterize any general population that has a normal distribution, it is enough to know two parameters: the arithmetic mean and the standard deviation.

Figure 1. Three Sigma Rule

Example.

In pediatrics, the standard deviation is used to assess the physical development of children by comparing the data of a particular child with the corresponding standard indicators. The arithmetic mean indicators of the physical development of healthy children are taken as the standard. Comparison of indicators with standards is carried out according to special tables, in which the standards are given along with their corresponding sigma scales. It is believed that if the indicator of the physical development of the child is within the standard (arithmetic mean) ±σ, then the physical development of the child (according to this indicator) corresponds to the norm. If the indicator is within the standard ±2σ, then there is a slight deviation from the norm. If the indicator goes beyond these limits, then the physical development of the child differs sharply from the norm (pathology is possible).
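The sigma-based assessment described in this example amounts to a simple three-way classification. The sketch below is illustrative only: the function name, the standard of 110 cm and the sigma of 4 cm are invented and do not come from any real reference table.

```python
def assess(value, standard, sigma):
    """Classify an indicator by its distance from the standard in sigmas."""
    deviation = abs(value - standard)
    if deviation <= sigma:
        return "norm"
    if deviation <= 2 * sigma:
        return "slight deviation from the norm"
    return "sharp deviation from the norm (pathology is possible)"


# Hypothetical standard: mean height 110 cm with sigma = 4 cm.
for height_cm in (112, 117, 120):
    print(height_cm, assess(height_cm, standard=110, sigma=4))
```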

In addition to measures of variation expressed in absolute values, statistical research uses measures of variation expressed in relative values. The oscillation coefficient is the ratio of the range of variation to the mean value of the trait. The coefficient of variation is the ratio of the standard deviation to the mean value of the trait. These values are usually expressed as percentages.

Formulas for calculating the relative indicators of variation (K_o denotes the oscillation coefficient, V the coefficient of variation):

$$K_o = \frac{Am}{M} \cdot 100\%, \qquad V = \frac{\sigma}{M} \cdot 100\%$$

From these formulas it can be seen that the closer the coefficient V is to zero, the smaller the variation of the trait values. The larger V, the more variable the trait.

In statistical practice, the coefficient of variation is most often used. It is used not only for a comparative assessment of variation, but also to characterize the homogeneity of the population. The set is considered homogeneous if the coefficient of variation does not exceed 33% (for distributions close to normal). Arithmetically, the ratio of σ and the arithmetic mean eliminates the influence of the absolute value of these characteristics, and the percentage ratio makes the coefficient of variation a dimensionless (unnamed) value.

The obtained value of the coefficient of variation is estimated in accordance with the approximate gradations of the degree of diversity of the trait:

Weak - up to 10%

Average - 10 - 20%

Strong - more than 20%

The use of the coefficient of variation is advisable in cases where it is necessary to compare features that are different in size and dimension.

The difference between the coefficient of variation and other scatter criteria is clearly demonstrated by example.

Table 1

Composition of employees of an industrial enterprise

Based on the statistical characteristics given in the example, it can be concluded that the age composition and educational level of the enterprise's employees are relatively homogeneous, with low professional stability of the surveyed contingent. It is easy to see that an attempt to judge these social trends by the standard deviation would lead to an erroneous conclusion, and an attempt to compare the accounting features "work experience" and "age" with the accounting feature "education" would generally be incorrect due to the heterogeneity of these features.

Median and Percentiles

For ordinal (rank) distributions, where the criterion for the middle of the series is the median, the standard deviation and variance cannot serve as characteristics of the dispersion of the variant.

The same is true for open variation series. The reason is that the deviations from which the variance and σ are calculated are measured from the arithmetic mean, which is not calculated for open variation series or for distributions of qualitative features. Therefore, for a compact description of distributions another scatter parameter is used, the quantile (synonym: "percentile"), which is suitable for describing qualitative and quantitative features with any form of distribution. This parameter can also be used to convert quantitative features into qualitative ones; in that case scores are assigned according to the order of the quantile to which a particular variant corresponds.

In the practice of biomedical research, the following quantiles are most often used:

Q0.5, the median;

Q0.25 and Q0.75, the quartiles (quarters), where Q0.25 is the lower quartile and Q0.75 is the upper quartile.

Quantiles divide the range of possible values of a variation series into intervals. The median (quantile Q0.5) is the variant that lies in the middle of the variation series and divides it into two equal parts (0.5 and 0.5). The quartiles divide the series into four parts: the lower quartile (Q0.25) is the variant below which 25% of the variants of the series lie, the median separates the lower 50%, and the upper quartile (Q0.75) is the variant below which 75% of the variants lie.

When the distribution of a variable is asymmetric about the arithmetic mean, the median and quartiles are used to characterize it. In this case the average is reported in the form Me (Q0.25; Q0.75). For example, the trait under study, "the age at which the child began to walk independently", has an asymmetric distribution in the study group. The lower quartile (Q0.25) corresponds to starting to walk at 9.5 months, the median to 11 months, and the upper quartile (Q0.75) to 12 months. Accordingly, the central tendency of this trait is reported as 11 (9.5; 12) months.
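A median with quartiles can be obtained with Python's standard `statistics` module. The ages below are invented for illustration and are not the data behind the example in the text; note also that quartile values depend on the interpolation method, and `statistics.quantiles` uses the "exclusive" method by default.

```python
import statistics

# Hypothetical ages (in months) at which children began to walk.
ages = [9, 9.5, 9.5, 10, 11, 11, 11.5, 12, 12, 14, 16]

median = statistics.median(ages)
# n=4 returns the three cut points: lower quartile, median, upper quartile.
lower_q, middle_q, upper_q = statistics.quantiles(ages, n=4)

print(f"{median} ({lower_q}; {upper_q}) months")
```

The mean of the same list is pulled upward by the two late walkers, which is why the median with quartiles is the preferred summary for a skewed trait.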

Assessment of the statistical significance of the study results

The statistical significance (reliability) of data is understood as the degree to which they correspond to the reality they represent; that is, statistically significant data are those that do not distort objective reality and reflect it correctly.

To assess the statistical significance of the results of a study means to determine with what probability it is possible to transfer the results obtained on a sample population to the entire population. An assessment of statistical significance is necessary to understand how part of the phenomenon can be used to judge the phenomenon as a whole and its patterns.

The assessment of the statistical significance of the results of the study consists of:

1. errors of representativeness (errors of average and relative values) - m;

2. confidence limits of average or relative values;

3. reliability of the difference between average or relative values ​​according to the criterion t.

The standard error of the arithmetic mean, or representativeness error, characterizes the fluctuation of the mean. Note that the larger the sample size, the smaller the spread of the mean values. The standard error of the mean is calculated by the formula:

$$m = \frac{\sigma}{\sqrt{n}}$$

In modern scientific literature the arithmetic mean is reported together with the representativeness error:

M ± m

or together with the standard deviation:

M ± σ

As an example, consider data for 1,500 urban polyclinics in a country (the general population). The average number of patients served by a polyclinic is 18,150. A random selection of 10% of the polyclinics (150) gives an average of 20,051 patients. The sampling error, which obviously arises because not all 1,500 polyclinics were included in the sample, is equal to the difference between these means, the general mean (M_gen) and the sample mean (M_sample); here it is 20,051 − 18,150 = 1,901 patients. If we draw another sample of the same size from the population, it will give a different error. When samples of the same, sufficiently large size are drawn repeatedly from the general population, the sample means are normally distributed around the general mean. The standard error of the mean, m, is the unavoidable spread of the sample means around the general mean.

When the results of a study are expressed as relative values (for example, percentages), the standard error of the share is calculated:

$$m = \sqrt{\frac{P\,(100 - P)}{n}}$$

where P is the indicator in % and n is the number of observations.

The result is reported as (P ± m)%. For example, the recovery rate among patients was (95.2 ± 2.5)%.
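Both standard errors are easy to compute. The sketch below assumes the usual formulas m = σ/√n for a mean and m = √(P(100 − P)/n) for a share in percent. All inputs are hypothetical; in particular n = 73 is chosen only because it gives an error close to the 2.5 in the recovery example, and the text does not state the actual number of patients.

```python
from math import sqrt


def standard_error_of_mean(sigma, n):
    """Standard error of the arithmetic mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)


def standard_error_of_share(p_percent, n):
    """Standard error of a share expressed in percent."""
    return sqrt(p_percent * (100 - p_percent) / n)


print(round(standard_error_of_mean(sigma=12.0, n=144), 2))      # 1.0
print(round(standard_error_of_share(p_percent=95.2, n=73), 2))  # 2.5
```

Quadrupling n halves either error, which is the sense in which larger samples reduce the spread of the estimate.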

If the number of elements in the population is small (in this lesson, 30 or fewer), then when calculating the standard errors of the mean and of the share, n − 1 must be used in the denominator of the fraction instead of n.

For a normal distribution (and the distribution of sample means is normal), it is known what proportion of the population falls within any interval around the mean. In particular:

68.3% of the sample means lie within M_gen ± m,
95.5% within M_gen ± 2m,
99.7% within M_gen ± 3m.

In practice the problem is that the characteristics of the general population are unknown to us, and the sample is drawn precisely in order to estimate them. This means that if we take samples of the same size n from the general population, then in 68.3% of cases the interval M_sample ± m will contain the value M_gen (it will lie in the interval M_sample ± 2m in 95.5% of cases and in the interval M_sample ± 3m in 99.7% of cases).

Since only one sample is actually drawn, this statement is formulated in terms of probability: with a probability of 68.3% the mean value of the attribute in the general population lies in the interval M_sample ± m, with a probability of 95.5% in the interval M_sample ± 2m, and so on.

In practice, an interval is constructed around the sample value that will, with a given (sufficiently high) probability, called the confidence probability, "cover" the true value of this parameter in the general population. This interval is called the confidence interval.

The confidence probability P is the degree of confidence that the confidence interval really does contain the true (unknown) value of the parameter in the population.

For example, if the confidence probability P is 90%, this means that 90 samples out of 100 will give a correct estimate of the parameter in the general population. Accordingly, the probability of error, i.e. of an incorrect estimate of the general mean from the sample, is, in percent, 100% − P = 10%. For this example it means that 10 samples out of 100 will give an incorrect estimate.

Obviously, the degree of confidence (confidence probability) depends on the size of the interval: the wider the interval, the higher the confidence that the unknown value for the general population will fall into it. In practice, at least twice the sampling error is taken to construct a confidence interval, which provides at least 95.5% confidence.

Determining the confidence limits of mean and relative values allows us to find their two extreme values, the minimum possible and the maximum possible, within which the indicator under study can occur in the entire general population. Accordingly, confidence limits (or the confidence interval) are the boundaries of mean or relative values beyond which the indicator is unlikely to fall because of random fluctuations.

The confidence interval can be written as M ± t·m, where t is the confidence criterion.

The confidence limits of the arithmetic mean in the general population are determined by the formula:

M_gen = M_sample ± t·m_M

and for a relative value:

P_gen = P_sample ± t·m_P

where M_gen and P_gen are the values of the mean and of the relative value for the general population; M_sample and P_sample are the values of the mean and of the relative value obtained from the sample; m_M and m_P are the errors of the mean and of the relative value; t is the confidence criterion (an accuracy criterion that is set when planning the study and can be equal to 2 or 3); t·m is the half-width of the confidence interval, or Δ, the marginal error of the indicator obtained in the sample study.

It should be noted that the value of the criterion t is related to the probability of an error-free forecast (p), expressed in %. It is chosen by the researcher, guided by the need to obtain a result with the required degree of accuracy. So, for a probability of an error-free forecast of 95.5% the value of the criterion t is 2, and for 99.7% it is 3.

The given estimates of the confidence interval are acceptable only for statistical populations with more than 30 observations. With a smaller population size (small samples), special tables are used to determine the criterion t. In these tables, the desired value is at the intersection of the line corresponding to the size of the population (n-1), and a column corresponding to the level of probability of an error-free forecast (95.5%; 99.7%) chosen by the researcher. In medical research, when establishing confidence limits for any indicator, the probability of an error-free forecast is 95.5% or more. This means that the value of the indicator obtained on the sample population must be found in the general population in at least 95.5% of cases.
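For samples of more than 30 observations, the confidence limits reduce to the estimate plus or minus t times its standard error. The sketch below uses invented numbers and a hypothetical helper name; for small samples the fixed t = 2 or t = 3 would have to be replaced by a value from the special tables mentioned above.

```python
def confidence_limits(estimate, m, t=2):
    """Return (lower, upper) confidence limits: estimate -/+ t * m."""
    margin = t * m
    return estimate - margin, estimate + margin


# Hypothetical sample mean 120.0 with standard error 1.5.
print(confidence_limits(120.0, 1.5))       # (117.0, 123.0)
print(confidence_limits(120.0, 1.5, t=3))  # (115.5, 124.5)
```

The wider interval for t = 3 reflects the higher confidence probability (99.7% instead of 95.5%).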

Questions on the topic of the lesson:

1. The relevance of indicators of the diversity of a trait in the statistical population.

2. General characteristics of the absolute indicators of variation.

3. Standard deviation, calculation, application.

4. Relative indicators of variation.

5. Median, quartile score.

6. Evaluation of the statistical significance of the results of the study.

7. Standard error of the arithmetic mean, calculation formula, example of use.

8. Calculation of the share and its standard error.

9. The concept of confidence probability, an example of use.

10. The concept of confidence interval, its application.

    Test tasks on the topic with sample answers:

1. ABSOLUTE INDICATORS OF VARIATION ARE

1) coefficient of variation

2) oscillation coefficient

4) median

2. RELATIVE INDICATORS OF VARIATION ARE

1) dispersion

4) coefficient of variation

3. A CRITERION DETERMINED BY THE EXTREME VALUES OF A VARIANT IN A VARIATIONAL SERIES

2) amplitude

3) dispersion

4) coefficient of variation

4. THE DIFFERENCE BETWEEN THE EXTREME VARIANTS IS

2) amplitude

3) standard deviation

4) coefficient of variation

5. THE MEAN SQUARE OF DEVIATIONS OF INDIVIDUAL VALUES OF A FEATURE FROM ITS AVERAGE VALUE IS

1) oscillation coefficient

2) median

3) dispersion

6. RATIO OF THE RANGE OF VARIATION TO THE AVERAGE VALUE OF A FEATURE IS

1) coefficient of variation

2) standard deviation

4) oscillation coefficient

7. RATIO OF THE MEAN SQUARE DEVIATION TO THE AVERAGE VALUE OF A FEATURE IS

1) dispersion

2) coefficient of variation

3) oscillation coefficient

4) amplitude

8. A VARIANT THAT IS IN THE MIDDLE OF A VARIATION SERIES AND DIVIDES IT INTO TWO EQUAL PARTS IS

1) median

3) amplitude

9. IN MEDICAL RESEARCH, WHEN ESTABLISHING CONFIDENCE LIMITS OF ANY INDICATOR, THE PROBABILITY OF AN ERROR-FREE PREDICTION IS ACCEPTED

10. IF 90 SAMPLES OUT OF 100 GIVE A CORRECT ESTIMATE OF A PARAMETER IN THE GENERAL POPULATION, THEN THE CONFIDENCE PROBABILITY P EQUALS

11. IF 10 SAMPLES OUT OF 100 GIVE AN INCORRECT ESTIMATE, THE PROBABILITY OF ERROR IS

12. THE LIMITS OF AVERAGE OR RELATIVE VALUES BEYOND WHICH A VALUE HAS ONLY A SMALL PROBABILITY OF FALLING ARE CALLED

1) confidence interval

2) amplitude

4) coefficient of variation

13. A SMALL SAMPLE IS A POPULATION IN WHICH

1) n is less than or equal to 100

2) n is less than or equal to 30

3) n is less than or equal to 40

4) n is close to 0

14. FOR A 95% PROBABILITY OF AN ERROR-FREE FORECAST, THE VALUE OF THE CRITERION t IS

15. FOR A 99% PROBABILITY OF AN ERROR-FREE FORECAST, THE VALUE OF THE CRITERION t IS

16. FOR DISTRIBUTIONS CLOSE TO NORMAL, THE POPULATION IS CONSIDERED HOMOGENEOUS IF THE COEFFICIENT OF VARIATION DOES NOT EXCEED

17. THE VARIANT SEPARATING THE VARIANTS WHOSE NUMERICAL VALUES DO NOT EXCEED 25% OF THE MAXIMUM POSSIBLE IN THIS SERIES IS

2) lower quartile

3) upper quartile

4) quartile

18. DATA THAT DO NOT DISTORT AND THAT CORRECTLY REFLECT OBJECTIVE REALITY ARE CALLED

1) impossible

2) equally possible

3) reliable

4) random

19. ACCORDING TO THE THREE-SIGMA RULE, WITH A NORMAL DISTRIBUTION OF A FEATURE, WITHIN
WILL BE LOCATED

1) 68.3% of the variants

The standard deviation is used in the statistical testing of hypotheses and when measuring a linear relationship between random variables.

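Root-mean-square deviation:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

Standard deviation (an estimate of the root-mean-square deviation of the random variable $x$ relative to its mathematical expectation, based on an unbiased estimate of its variance):

$$s = \sqrt{\frac{n}{n-1}\,\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

where $\sigma^2$ is the variance; $x_i$ is the i-th sample element; $n$ is the sample size; $\bar{x}$ is the arithmetic mean of the sample:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$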

It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, an estimate based on an unbiased variance estimate is consistent.
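The two estimates differ only in the divisor: n for the first, and n - 1 for the one based on the unbiased variance estimate. A minimal sketch using Python's standard `statistics` module follows; the data values are illustrative, and the variable names `sigma` and `s` are chosen here only for readability.

```python
import math
import statistics

data = [0, 6, 8, 14]  # illustrative values
n = len(data)

sigma = statistics.pstdev(data)  # divides the sum of squares by n
s = statistics.stdev(data)       # divides the sum of squares by n - 1

# The two estimates differ only by the factor sqrt(n / (n - 1)).
ratio = math.sqrt(n / (n - 1))
print(sigma, s, math.isclose(s, sigma * ratio))
```

As n grows, the factor sqrt(n / (n - 1)) approaches 1, so the two estimates converge for large samples.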

Three-sigma rule

The three-sigma rule ($3\sigma$): almost all values of a normally distributed random variable lie in the interval $(\bar{x} - 3\sigma,\ \bar{x} + 3\sigma)$. More strictly, with no less than 99.7% certainty, the value of a normally distributed random variable lies in the specified interval (provided that the value $\bar{x}$ is true, and not obtained as a result of sample processing).

If the true value $\bar{x}$ is unknown, then the sample estimate $s$ should be used instead of $\sigma$. Thus, the rule of three sigma becomes the rule of three $s$.
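The 99.7% figure can be checked numerically from the normal distribution function. The sketch below uses `statistics.NormalDist` from Python's standard library; the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def coverage(k):
    """Probability that a normal value lies within k sigma of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The values for one and two sigma correspond to the 68.3% and 95.5% probabilities that appear elsewhere in this text.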

Interpretation of the value of the standard deviation

A large value of the standard deviation shows a large spread of the values in the set around the average value of the set; a small value, correspondingly, indicates that the values in the set are grouped closely around the average value.

For example, we have three number sets: (0, 0, 14, 14), (0, 6, 8, 14) and (6, 6, 8, 8). All three sets have a mean value of 7 and standard deviations of 7, 5, and 1, respectively. The last set has a small standard deviation because the values in the set are clustered around the mean; the first set has the largest standard deviation because the values within the set diverge strongly from the mean.
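These figures can be reproduced with a short sketch. It assumes the deviations are computed with the divisor n (the population form), which is what yields 7, 5, and 1 for these sets; the variable names are illustrative.

```python
import statistics

sets = [(0, 0, 14, 14), (0, 6, 8, 14), (6, 6, 8, 8)]

# Population standard deviation (divisor n) for each set.
results = [(statistics.fmean(v), statistics.pstdev(v)) for v in sets]

for values, (mean, sd) in zip(sets, results):
    print(values, mean, sd)
```

With the divisor n - 1 (`statistics.stdev`) the three values would be larger, so the choice of formula matters for samples this small.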

In a general sense, the standard deviation can be considered a measure of uncertainty. For example, in physics, the standard deviation is used to determine the error of a series of successive measurements of some quantity. This value is very important for judging the plausibility of the phenomenon under study in comparison with the value predicted by theory: if the mean value of the measurements differs greatly from the values predicted by the theory (large standard deviation), then the obtained values or the method of obtaining them should be rechecked.

Practical use

In practice, the standard deviation allows you to determine how much the values in the set can differ from the average value.

Climate

Suppose there are two cities with the same average daily maximum temperature, but one is located on the coast and the other is inland. Coastal cities are known to have less variation in daily maximum temperature than inland cities. Therefore, the standard deviation of the maximum daily temperatures in the coastal city will be smaller than in the inland city, even though the average value is the same for both. In practice, this means that the maximum air temperature on any particular day of the year is more likely to differ strongly from the average in the city located inside the continent.

Sport

Let's assume that there are several football teams that are ranked according to some set of parameters, for example, the number of goals scored and conceded, scoring chances, etc. It is most likely that the best team in this group will have the best values in more parameters. The smaller the team's standard deviation for each of the presented parameters, the more predictable the team's result is; such teams are balanced. On the other hand, the result of a team with a large standard deviation is hard to predict, which in turn is explained by an imbalance, for example, a strong defense but a weak attack.

Using the standard deviation of the team parameters makes it possible, to some extent, to predict the result of a match between two teams by evaluating the strengths and weaknesses of the teams, and hence their chosen tactics.
