2. The concept of distribution series. Discrete and interval distribution series

Distribution series are groupings of a special type in which, for each characteristic value, group of values, or class of values, the number of units in the group or its share of the total is known. That is, a distribution series is an ordered set of attribute values, arranged in ascending or descending order, with their corresponding weights. Distribution series can be constructed on either quantitative or attributive (qualitative) characteristics.

Distribution series constructed on a quantitative basis are called variation series. They may be discrete or interval. A variation series can be built on a continuously varying characteristic (one that can take any value within some interval) or on a discretely varying characteristic (one that takes only strictly defined values, usually integers).

A discrete variation series is a ranked set of variants with their corresponding frequencies or relative frequencies. The variants of a discrete series are discretely changing values of a characteristic, usually the result of counting.

Discrete variation series are usually constructed when the values of the characteristic being studied can differ from each other by no less than some finite amount. In a discrete series, point values of the characteristic are specified. Example: the distribution of men's suits sold by stores in a month, by size.

An interval variation series is an ordered set of intervals of variation of a random variable, with the corresponding frequencies or relative frequencies of values falling into each interval. Interval series are intended for analyzing the distribution of a continuously changing characteristic, whose value is most often recorded by measuring or weighing. The variants of such a series are groups (intervals) of values.

Example: the distribution of purchases in a grocery store by amount.

In a discrete variation series the frequency relates directly to an individual variant of the series, whereas in an interval series it relates to a group of variants.

It is convenient to analyze distribution series through their graphical representation, which makes it possible to judge the shape of the distribution and its patterns. A discrete series is depicted on a graph as a broken line, the distribution polygon. To construct it, the ranked (ordered) values of the varying characteristic are plotted along the abscissa axis of a rectangular coordinate system on a uniform scale, and the frequencies are plotted along the ordinate axis.

Interval series are depicted as distribution histograms (that is, bar charts).

When constructing a histogram, the intervals are plotted on the abscissa axis, and the frequencies are depicted by rectangles built on the corresponding intervals. With equal intervals, the heights of the columns should be proportional to the frequencies.

Any histogram can be converted into a distribution polygon by connecting the midpoints of the upper sides of its rectangles with straight segments.

2. Index method for analyzing the influence of average output and average headcount on changes in production volume

The index method is used to analyze the dynamics of general indicators and to compare them, as well as to study the factors that influence changes in their levels. Using indices, it is possible to identify the influence of average output and average headcount on changes in production volume. This problem is solved by constructing a system of analytical indices.

The production volume index is related to the average headcount index and the average output index in the same way as production volume (Q) is related to output (w) and headcount (r).

Production volume equals the product of average output and average headcount:

Q = w × r,

where Q is the volume of production,

w is the average output,

r is the average number of employees.

As can be seen, this is a relationship between phenomena in statics: the product of two factors gives the total volume of the resulting phenomenon. The relationship is also clearly functional, so its dynamics are studied using indices. For the example given, the system is:

J_w × J_r = J_wr.

The production volume index J_wr, as the index of the resulting phenomenon, can thus be decomposed into two factor indices, the average output index (J_w) and the average headcount index (J_r):

Σw_1 r_1 / Σw_0 r_0 = (Σw_1 r_1 / Σw_0 r_1) × (Σw_0 r_1 / Σw_0 r_0)

(production volume index = average output index × average headcount index),

where J_w = Σw_1 r_1 / Σw_0 r_1 is the labor productivity (average output) index, weighted by the reporting-period headcount (Paasche-type weights);

J_r = Σw_0 r_1 / Σw_0 r_0 is the index of the number of employees, weighted by the base-period output (Laspeyres-type weights).

Index systems are used to determine the influence of individual factors on the level of the resulting indicator; they also allow the value of an unknown index to be determined from two known ones.

Based on the above system of indices, one can also find the absolute increase in production volume and decompose it by factor.

1. Total increase in production volume:

Δwr = Σw_1 r_1 − Σw_0 r_0.

2. Increase due to the change in average output:

Δwr(w) = Σw_1 r_1 − Σw_0 r_1.

3. Increase due to the change in average headcount:

Δwr(r) = Σw_0 r_1 − Σw_0 r_0.

The two factor effects add up to the total increase:

Δwr = Δwr(w) + Δwr(r).

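Example. The following data are known: average output was w_0 = 2000 in the base period and w_1 = 2100 in the reporting period, and the average headcount was r_0 = 90 and r_1 = 100 respectively.

We can determine how production volume changed in relative and absolute terms and how the individual factors influenced this change.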

The volume of production was:

in the base period

w_0 × r_0 = 2000 × 90 = 180,000,

and in the reporting period

w_1 × r_1 = 2100 × 100 = 210,000.

Consequently, the volume of production increased by 30,000, or by 16.7%:

Δwr = Σw_1 r_1 − Σw_0 r_0 = 210,000 − 180,000 = 30,000,

or (210,000 : 180,000) × 100% = 116.7% of the base level.

This change in production volume was due to:

1) an increase in the average headcount by 10 people, or by 11.1%:

r_1 / r_0 = 100 / 90 = 1.111, or 111.1%.

In absolute terms, this factor increased the volume of production by 20,000:

w_0 r_1 − w_0 r_0 = w_0 (r_1 − r_0) = 2000 × (100 − 90) = 20,000.

2) an increase in average output by 5%, or by 10,000 in absolute terms:

w_1 r_1 / (w_0 r_1) = (2100 × 100) / (2000 × 100) = 1.05, or 105%.

In absolute terms, the increase is:

w_1 r_1 − w_0 r_1 = (w_1 − w_0) r_1 = (2100 − 2000) × 100 = 10,000.

Hence, the combined influence of the factors was:

1. In absolute terms

10,000 + 20,000 = 30,000

2. In relative terms

1.111 × 1.05 = 1.167 (116.7%)

Therefore, production volume grew by 16.7%. Both results agree with those obtained earlier.
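The worked example can be checked with a short calculation. The following is a minimal Python sketch, not part of the original text: the function name `decompose_output_change` and the dictionary keys are hypothetical labels, and the figures are the ones used in the example (w_0 = 2000, r_0 = 90, w_1 = 2100, r_1 = 100).

```python
def decompose_output_change(w0, r0, w1, r1):
    """Split the change in production volume Q = w * r into factor effects.

    w0, w1: average output in the base and reporting periods.
    r0, r1: average headcount in the base and reporting periods.
    """
    q0 = w0 * r0      # base-period volume
    q1 = w1 * r1      # reporting-period volume
    q_cond = w0 * r1  # conditional volume: base output, reporting headcount

    return {
        "J_wr": q1 / q0,          # production volume index
        "J_w": q1 / q_cond,       # average output index (reporting-period weights)
        "J_r": q_cond / q0,       # average headcount index (base-period weights)
        "delta_total": q1 - q0,   # total absolute increase
        "delta_w": q1 - q_cond,   # increase due to average output
        "delta_r": q_cond - q0,   # increase due to average headcount
    }


result = decompose_output_change(w0=2000, r0=90, w1=2100, r1=100)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The conditional volume w_0 × r_1 is the link between the two factor indices: it is why their product equals the overall index and why the two absolute effects add up to the total increase.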

The word “index” literally means pointer or indicator. In statistics, an index is a relative indicator that characterizes the change in a phenomenon over time, in space, or in comparison with a plan. Since an index is a relative value, the names of indices correspond to the names of relative values.

When changes in comparable products are analyzed over time, the question arises of how the components of the index (price, physical volume, structure of production or sales of individual types of products) change under different conditions (in different areas). For this purpose, indices of constant composition, variable composition, and structural shifts are constructed.

An index of constant (fixed) composition is an index that characterizes the dynamics of an average value for one and the same fixed structure of the population.

The principle of constructing an index of constant composition is to eliminate the effect of changes in the structure of the weights on the indexed value by calculating the weighted average level of the indexed indicator with the same weights.

The index of constant composition is identical in form to the aggregate index. The aggregate form is the most common.

The index of constant composition is calculated with weights fixed at the level of one period and shows the change in the indexed value only. Indices of constant composition thus compare indicators calculated on the basis of an unchanged structure of phenomena.
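To illustrate how the three indices relate, here is a minimal Python sketch. It uses the standard definitions, which the text names but does not write out: the variable-composition index compares the actual average levels, the constant-composition index holds the weights at the reporting-period structure, and the structural-shift index holds the indexed value at its base level. The function name and the data are hypothetical.

```python
def composition_indices(x0, f0, x1, f1):
    """Return (variable, constant, structural) composition indices.

    x0, x1: levels of the indexed indicator per group (base, reporting).
    f0, f1: weights (group sizes) per group (base, reporting).
    """
    mean0 = sum(x * f for x, f in zip(x0, f0)) / sum(f0)      # base average
    mean1 = sum(x * f for x, f in zip(x1, f1)) / sum(f1)      # reporting average
    # Conditional average: base levels weighted by the reporting structure.
    mean_cond = sum(x * f for x, f in zip(x0, f1)) / sum(f1)

    variable = mean1 / mean0        # change in the average as a whole
    constant = mean1 / mean_cond    # change in the levels only
    structural = mean_cond / mean0  # change in the structure only
    return variable, constant, structural


# Hypothetical data: two groups, levels rise by 10%, structure shifts
# toward the group with the higher level.
variable, constant, structural = composition_indices(
    x0=[10, 20], f0=[50, 50],
    x1=[11, 22], f1=[30, 70],
)
print(f"variable composition: {variable:.4f}")
print(f"constant composition: {constant:.4f}")
print(f"structural shifts:    {structural:.4f}")
```

With these figures the constant-composition index isolates the 10% rise in levels, while the rest of the growth in the average comes from the shift in structure; the variable-composition index is the product of the other two.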

If the random variable under study is continuous, ranking and grouping the observed values often does not reveal the characteristic features of its variation. This is because individual values of a continuous random variable can differ from each other by arbitrarily small amounts, so identical values rarely occur in the observed data, and the frequencies of the variants differ little from each other.

It is also impractical to construct a discrete series for a discrete random variable with a large number of possible values. In such cases, an interval variation series should be constructed.

To construct such a series, the entire range of variation of the observed values is divided into a number of partial intervals, and the number of values falling into each partial interval is counted.

An interval variation series is an ordered set of intervals of variation of a random variable, with the corresponding frequencies or relative frequencies of values falling into each of them.

To build an interval series you need to:

  1. determine the number of partial intervals;
  2. determine the width of the intervals;
  3. set the upper and lower limits of each interval;
  4. group the observation results.

1. The number and width of the grouping intervals must be chosen in each specific case on the basis of the goals of the study, the sample size, and the degree of variation of the characteristic in the sample.

The number of intervals k can be estimated approximately from the sample size n alone in one of the following ways:

  • by Sturges' formula: k = 1 + 3.32 lg n (where lg is the base-10 logarithm);
  • using Table 1.

Table 1

2. Intervals of equal width are generally preferred. To determine the interval width h, calculate:

  • the range of variation R of the sample values: R = x_max − x_min,

where x_max and x_min are the maximum and minimum variants in the sample;

  • the width of each interval: h = R / k.

3. The lower limit of the first interval, x_h1, is chosen so that the minimum variant x_min falls approximately in the middle of this interval: x_h1 = x_min − 0.5h.

Subsequent interval limits are obtained by adding the length h of the partial interval to the end of the previous interval:

x_hi = x_h(i−1) + h.

The construction of the interval scale continues as long as the limit x_hi satisfies the relation:

x_hi < x_max + 0.5h.

4. In accordance with the interval scale, the values of the characteristic are grouped: for each partial interval, the frequencies of the variants falling into the i-th interval are summed to give n_i. An interval includes the values of the random variable that are greater than or equal to its lower limit and less than its upper limit.
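The four steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a definitive procedure: k from Sturges' formula is rounded to the nearest integer, x_hi is read as the lower limit of the i-th interval (which gives k + 1 intervals, with x_min and x_max at the midpoints of the first and last), and the function name and the sample are hypothetical.

```python
import math


def build_interval_series(sample):
    """Group a sample into equal-width intervals [lower, upper)."""
    n = len(sample)
    x_min, x_max = min(sample), max(sample)
    if n < 2 or x_min == x_max:
        raise ValueError("need at least two distinct observations")

    # Step 1: number of intervals by Sturges' formula (rounding is an assumption).
    k = max(1, round(1 + 3.32 * math.log10(n)))

    # Step 2: range of variation and interval width.
    h = (x_max - x_min) / k

    # Step 3: the first lower limit puts x_min in the middle of the first
    # interval; lower limits x_min - 0.5h, x_min + 0.5h, ... stay below
    # x_max + 0.5h, which yields k + 1 intervals.
    start = x_min - 0.5 * h
    n_intervals = k + 1
    bounds = [start + i * h for i in range(n_intervals + 1)]

    # Step 4: count the values in each half-open interval [lower, upper).
    freqs = [0] * n_intervals
    for x in sample:
        i = min(int((x - start) // h), n_intervals - 1)
        freqs[i] += 1
    return bounds, freqs


# Hypothetical sample of 20 observations.
sample = [12, 15, 11, 19, 14, 13, 17, 16, 15, 14,
          18, 13, 15, 16, 14, 15, 12, 17, 15, 14]
bounds, freqs = build_interval_series(sample)
for lower, upper, n_i in zip(bounds, bounds[1:], freqs):
    print(f"[{lower:.2f}; {upper:.2f}): {n_i}")
```

Values that land exactly on a computed boundary are subject to floating-point rounding, so in practice h is often rounded to a convenient value before the limits are built.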

Polygon and histogram

For clarity, various graphs of a statistical distribution are constructed.

From the data of a discrete variation series, a polygon of frequencies or of relative frequencies is constructed.

A frequency polygon is a broken line whose segments connect the points (x_1; n_1), (x_2; n_2), ..., (x_k; n_k). To construct it, the variants x_i are plotted on the abscissa axis and the corresponding frequencies n_i on the ordinate axis. The points (x_i; n_i) are connected by straight segments to give the frequency polygon (Fig. 1).

A polygon of relative frequencies is a broken line whose segments connect the points (x_1; W_1), (x_2; W_2), ..., (x_k; W_k). To construct it, the variants x_i are plotted on the abscissa axis and the corresponding relative frequencies W_i on the ordinate axis. The points (x_i; W_i) are connected by straight segments to give the polygon of relative frequencies.

For a continuous characteristic, it is advisable to construct a histogram.

A frequency histogram is a stepped figure consisting of rectangles whose bases are the partial intervals of length h and whose heights are equal to the ratio n_i / h (the frequency density).

To construct a frequency histogram, the partial intervals are laid out on the abscissa axis, and segments parallel to the abscissa axis are drawn above them at a height of n_i / h.
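A short Python sketch shows why the heights are taken as n_i / h: with this choice the total area of the histogram equals the sample size, and with relative frequencies W_i / h it equals one. The function name and the frequencies below are hypothetical.

```python
def histogram_heights(freqs, h):
    """Heights of histogram rectangles as frequency densities n_i / h."""
    return [n_i / h for n_i in freqs]


freqs = [3, 7, 12, 6, 2]  # hypothetical interval frequencies n_i
h = 2.0                   # hypothetical common interval width
n = sum(freqs)

heights = histogram_heights(freqs, h)
area = sum(height * h for height in heights)

# Relative-frequency version: W_i = n_i / n, heights W_i / h.
rel_heights = [(n_i / n) / h for n_i in freqs]
rel_area = sum(height * h for height in rel_heights)

print("frequency densities:", heights)
print("total area:", area)
print("total area (relative frequencies):", rel_area)
```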

Laboratory work No. 1

in mathematical statistics

Topic: Primary processing of experimental data

3. Score in points

5. Test questions

6. Methodology for performing laboratory work

Goal of the work

Acquiring skills in primary processing of empirical data using methods of mathematical statistics.

Based on the totality of experimental data, complete the following tasks:

Task 1. Construct an interval variation distribution series.

Task 2. Construct a histogram of frequencies of an interval variation series.

Task 3. Create an empirical distribution function and plot a graph.

Task 4. Find the numerical characteristics of the sample:

a) mode and median;

b) conditional initial moments;

c) sample average;

d) sample variance, corrected (unbiased) variance estimate for the population, corrected standard deviation;

e) coefficient of variation;

f) asymmetry;

g) kurtosis;

Task 5. Determine the boundaries of the true values ​​of the numerical characteristics of the random variable being studied with a given reliability.

Task 6. Content-based interpretation of the results of primary processing according to the conditions of the task.
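
The numerical characteristics listed in items a) and c) to g) can be sketched in Python with the standard `statistics` module. This is a rough illustration on ungrouped hypothetical data, not the laboratory's prescribed method: for an interval series the same quantities are normally computed from interval midpoints and frequencies, and the choice of the corrected standard deviation in the coefficient of variation is an assumption.

```python
import statistics


def sample_characteristics(data):
    """Basic numerical characteristics of an ungrouped sample."""
    n = len(data)
    mean = statistics.fmean(data)
    var_sample = statistics.pvariance(data)    # sample variance, divides by n
    var_corrected = statistics.variance(data)  # corrected variance, divides by n - 1
    std_corrected = statistics.stdev(data)     # corrected standard deviation

    sigma = var_sample ** 0.5
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment

    return {
        "mode": statistics.multimode(data),
        "median": statistics.median(data),
        "mean": mean,
        "variance": var_sample,
        "corrected_variance": var_corrected,
        "corrected_std": std_corrected,
        # Assumption: coefficient of variation based on the corrected std.
        "coefficient_of_variation_pct": 100 * std_corrected / mean,
        "asymmetry": m3 / sigma ** 3,
        "kurtosis": m4 / sigma ** 4 - 3,  # excess kurtosis
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
stats = sample_characteristics(data)
for name, value in stats.items():
    print(f"{name}: {value}")
```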

Score in points

Tasks 1-5: 6 points

Task 6: 2 points

Defense of the laboratory work (oral interview on the test questions and the work itself): 2 points

The work must be submitted in writing on A4 sheets and must include:

1) Title page (Annex 1).

2) Initial data.

3) The work laid out according to the specified sample.

4) Calculation results (done manually and/or using MS Excel) in the specified order.

5) Conclusions: a meaningful interpretation of the results of primary processing according to the conditions of the problem.

6) Oral interview on the work and the test questions.



5. Test questions


Methodology for performing laboratory work

Task 1. Construct an interval variational distribution series

To present statistical data as a variation series with equally spaced variants, it is necessary to:

1. Find the smallest and largest values, x_min and x_max, in the original data table.

2. Determine the range of variation: R = x_max − x_min.

3. Determine the interval length h. If the sample contains up to 1000 observations, use the formula h = R / (1 + 3.32 lg n), where n is the sample size (the number of observations in the sample) and lg n is the base-10 logarithm.

The calculated ratio is rounded to a convenient integer value.

4. To determine the beginning of the first interval for an even number of intervals, it is recommended to take the value ; and for an odd number of intervals .

5. Write down the grouping intervals in ascending order of their boundaries. For the lower limit of the first interval, take a convenient number no greater than the smallest sample value; the upper limit of the last interval should be no less than the largest sample value. It is recommended that the intervals cover all the initial values of the random variable and that there be from 5 to 20 of them.

6. Distribute the initial data over the grouping intervals, i.e. use the source table to count the number of values of the random variable falling within each interval. If some values coincide with interval boundaries, they are assigned consistently either only to the preceding or only to the following interval.

Note 1. The intervals do not have to be equal in length. Where the values are denser, it is more convenient to take shorter intervals, and where they are sparser, longer ones.

Note 2. If zero or very small frequencies are obtained for some intervals, the data must be regrouped with enlarged intervals (a larger step).

The simplest way to summarize statistical material is to construct series. The result of summarizing a statistical study may be a distribution series. In statistics, a distribution series is an ordered distribution of population units into groups according to some one characteristic, qualitative or quantitative. A series constructed on a qualitative characteristic is called attributive, and one constructed on a quantitative characteristic is called variational.

A variation series is characterized by two elements: the variant (X) and the frequency (f). A variant is an individual value of the characteristic for a single unit or a group of the population. The number showing how many times a given value occurs is called the frequency. If the frequency is expressed as a relative number (a share or a percentage), it is called the relative frequency. A variation series can be interval, when boundaries “from” and “to” are defined, or discrete, when the characteristic being studied takes definite individual values.

Let us look at the construction of variation series using examples.

Example. There are data on the tariff categories of 60 workers in one of a plant's workshops.

Distribute the workers by tariff category and build a variation series.

To do this, we write down all the values of the characteristic in ascending order and count the number of workers in each group.

Table 1.4

Distribution of workers by tariff category

Worker's category (X) | Number of workers, persons (f) | Number of workers, % of total (relative frequency)

We obtained a discrete variation series in which the characteristic being studied (the worker's category) is represented by a definite number. For clarity, variation series are depicted graphically. Based on this distribution series, a distribution polygon was constructed.

Fig. 1.1. Polygon of the distribution of workers by tariff category
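
Counting the units for each value of the characteristic is straightforward to sketch in Python. The function name and the list of categories below are hypothetical (the data table for the 60 workers is not reproduced here), so the output only shows the shape of such a series.

```python
from collections import Counter


def discrete_series(values):
    """Return rows (variant, frequency, percent of total), sorted by variant."""
    counts = Counter(values)
    n = len(values)
    return [(x, counts[x], 100 * counts[x] / n) for x in sorted(counts)]


# Hypothetical tariff categories of 12 workers.
categories = [2, 3, 3, 4, 4, 4, 5, 5, 6, 3, 4, 5]
series = discrete_series(categories)

print("category  workers  % of total")
for x, f, pct in series:
    print(f"{x:>8}  {f:>7}  {pct:>10.1f}")
```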

We will consider the construction of an interval series with equal intervals using the following example.

Example. Data are known on the value of the fixed capital of 50 firms, in million rubles. It is required to show the distribution of the firms by value of fixed capital.

To do so, we first decide how many groups to distinguish. Suppose we decide to identify 5 groups of firms. Then we determine the size of the interval using the formula

h = (x_max − x_min) / k, where k is the number of groups.

In our example, h = 7 million rubles.

By successively adding the interval to the minimum value of the attribute, we obtain the groups of firms by value of fixed capital.

A unit whose value falls on the boundary between two groups is assigned to the group in which that value is the upper limit (i.e., the value 17 goes to the first group, 24 to the second, and so on).

Let us count the number of firms in each group.

Table 1.5

Distribution of firms by value of fixed capital (million rubles)

Value of fixed capital, million rubles (X) | Number of firms (frequency) (f) | Accumulated (cumulative) frequencies

This grouping yields an interval variation series, from which it follows, for example, that 36 firms have fixed capital worth from 10 to 24 million rubles, and so on.

Interval distribution series can be represented graphically in the form of a histogram.
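
The grouping rule from this example, in which a boundary value goes to the group where it is the upper limit, can be sketched in Python with `bisect_left`. The boundaries 10, 17, 24 follow from the text; the remaining boundaries (31, 38, 45) assume five equal intervals of 7, and the capital values are hypothetical, since the original data table is not reproduced.

```python
from bisect import bisect_left
from itertools import accumulate


def group_upper_inclusive(values, bounds):
    """Count values per interval; a boundary value joins the lower group."""
    uppers = bounds[1:]  # upper limit of each interval
    freqs = [0] * len(uppers)
    for x in values:
        if not bounds[0] <= x <= bounds[-1]:
            raise ValueError(f"value {x} lies outside the interval scale")
        # bisect_left returns the first interval whose upper limit is >= x,
        # so x == 17 lands in the first group and x == 24 in the second.
        freqs[bisect_left(uppers, x)] += 1
    return freqs, list(accumulate(freqs))


bounds = [10, 17, 24, 31, 38, 45]                      # assumed interval scale
capital = [10, 12, 17, 18, 24, 25, 31, 40, 45, 22]     # hypothetical values
freqs, cumulative = group_upper_inclusive(capital, bounds)

for lower, upper, f, c in zip(bounds, bounds[1:], freqs, cumulative):
    print(f"{lower}-{upper}: frequency {f}, cumulative {c}")
```

The last cumulative frequency always equals the total number of units, which is a quick check that no value was missed.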

The results of data processing are presented in statistical tables. A statistical table has a subject and a predicate.

The subject is the population, or part of the population, that is being characterized.

The predicate consists of the indicators that characterize the subject.

Tables are classified as simple, group, and combination tables, with simple or complex development of the predicate.

In a simple table, the subject contains a list of individual units.

If the subject contains a grouping of units, the table is called a group table. Examples are a grouping of enterprises by number of workers or of the population by gender.

The subject of a combination table contains a grouping by two or more characteristics. For example, the population is divided by gender and then into groups by education, age, etc.

Combination tables contain information that makes it possible to identify and characterize the relationship between a number of indicators and the pattern of their change in both space and time. To keep the table clear, its subject is limited to two or three characteristics, with a limited number of groups for each.

The predicate of a table can be developed in different ways. With simple development, all its indicators are presented independently of one another.

With complex development, the indicators are combined with one another.

When constructing any table, one must proceed from the purposes of the study and the content of the material being processed.

In addition to tables, statistics also uses graphs and charts. In a chart, statistical data are depicted using geometric shapes. Charts are divided into line and bar charts, but there are also pictorial charts (drawings and symbols), pie charts (the circle represents the entire population, and the areas of the sectors show the shares of its components), and radial charts (built on polar coordinates). A cartogram is a combination of a contour map or site plan with a chart.

When working with statistical observation data that characterize a phenomenon, it is first of all necessary to order them, i.e. to give them a systematic character.

The English statistician W. J. Reichmann said figuratively of unordered data that encountering a mass of ungeneralized data is like being thrown into a thicket without a compass. How, then, are statistical data systematized in the form of distribution series?

Statistical distribution series are ordered statistical aggregates (Table 17). The simplest type is a ranked series, i.e. a series of values of the varying characteristic arranged in ascending or descending order. Such a series does not allow one to judge the patterns inherent in the data: around which value most of the observations are grouped, what deviations from this value occur, or what the overall picture of the distribution is. For this purpose the data are grouped, showing how often individual observations occur in their total number (Scheme 1).

Table 17. General form of a statistical distribution series

Scheme 1. Scheme of a statistical distribution series

Distribution series of population units according to characteristics that have no quantitative expression are called attributive series (for example, the distribution of enterprises by their field of production).

Distribution series of population units according to characteristics that have a quantitative expression are called variation series. In such series, the values of the characteristic (the variants) are arranged in ascending or descending order.

In a variation distribution series, two elements are distinguished: the variant and the frequency. A variant is an individual value of the grouping characteristic; the frequency is a number showing how many times each variant occurs.

In mathematical statistics, one more element of the variation series is calculated: the relative frequency. It is defined as the ratio of the frequency of a given interval to the total sum of frequencies; it is expressed as a fraction of one, in percent (%), or per mille (‰).

Thus, a variation distribution series is a series in which the variants are arranged in ascending or descending order and their frequencies or relative frequencies are indicated. Variation series are either discrete (discontinuous) or interval (continuous).

Discrete variation series are distribution series in which the variant, as the value of a quantitative characteristic, can take only certain definite values. The variants differ from each other by one or more units.

Thus, the number of parts produced per shift by a particular worker can be expressed only by one definite number (6, 10, 12, etc.). An example of a discrete variation series is the distribution of workers by number of parts produced (Table 18).

Table 18. Discrete distribution series

Interval (continuous) variation series are distribution series in which the values of the variants are given as intervals, i.e. the values of the characteristic can differ from each other by an arbitrarily small amount. When constructing a variation series for a continuous characteristic, it is impossible to list each value of the variant, so the population is distributed over intervals. The intervals can be equal or unequal. For each of them, frequencies or relative frequencies are indicated (Table 19).

For interval distribution series with unequal intervals, mathematical characteristics such as the distribution density and the relative distribution density on a given interval are calculated. The first is the ratio of the frequency to the width of the interval; the second is the ratio of the relative frequency to the width of the same interval. For the example above, the distribution density in the first interval is 3 : 5 = 0.6, and the relative density in this interval is 7.5% : 5 = 1.5%.

Table 19. Interval distribution series
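
The two density characteristics for unequal intervals can be sketched in Python as follows. Only the first interval (frequency 3, width 5, relative frequency 7.5%) comes from the text; the other frequencies and widths, and the function name, are hypothetical and chosen so that the total is 40 units.

```python
def interval_densities(freqs, widths):
    """Return (density, relative density in % per unit width) per interval."""
    n = sum(freqs)
    rows = []
    for f, h in zip(freqs, widths):
        rel_freq_pct = 100 * f / n  # relative frequency, %
        rows.append((f / h, rel_freq_pct / h))
    return rows


freqs = [3, 12, 15, 10]   # first value from the text, the rest hypothetical
widths = [5, 5, 10, 20]   # unequal interval widths (hypothetical except the first)
densities = interval_densities(freqs, widths)

for (density, rel_density), h in zip(densities, widths):
    print(f"width {h}: density {density:.2f}, relative density {rel_density:.2f}%")
```

Dividing by the width makes intervals of different length comparable: the last interval holds more units than the first, yet its density is lower.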