The book logically completes the series of reference volumes “Applied Statistics: Fundamentals of Modeling and Primary Data Processing” (1983) and “Applied Statistics: Study of Dependencies” (1985). It considers the problems of object classification and dimensionality reduction, and pays much attention to exploratory statistical analysis.
For specialists who use data analysis methods.

Effect of essential multidimensionality.
The essence of this principle is that conclusions drawn from the analysis and classification of a set of statistically surveyed objects (each described by a number of properties) must be based simultaneously on the totality of these interrelated properties, with mandatory consideration of the structure and nature of their connections. In [5], the effect of essential multidimensionality is explained with the following example. An attempt to distinguish between two types of family consumer behavior by applying Student's homogeneity test [12, paragraph 11.2.8] sequentially, first to one feature (unit food costs) and then to another (unit expenditures on manufactured goods and services), gave no result. A multivariate analogue of this test, based on the so-called Mahalanobis distance and taking into account simultaneously the values of both characteristics and the nature of the statistical relationship between them, gives the correct result (i.e., it detects a statistically significant difference between the two analyzed sets of families). A formulation of the essence of this principle can already be found in the previously mentioned work of V.I. Lenin.

Objecting to classifying peasant farms by each of the analyzed characteristics in isolation, with a focus on their average values, he writes: “The characteristics for distinguishing these types should be taken in accordance with local conditions and forms of agriculture; if in extensive grain farming one can limit oneself to grouping by sown area (or by draft animals), then under other conditions it is necessary to take into account the sowing of industrial crops, the technical processing of agricultural products, the sowing of root crops or forage grasses, dairy farming, vegetable gardening, etc.”
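To make the effect of essential multidimensionality concrete, here is a minimal Python/NumPy sketch on made-up data (not the family budget data of [5]); the helper names `pooled_t` and `hotelling_t2` are hypothetical. Two features are strongly correlated within each group, and the groups differ along the direction that runs against that correlation. The two-sample Student t statistics computed separately for each feature stay small, while the two-sample Hotelling T² statistic, which is the squared Mahalanobis distance between the group means scaled by n1·n2/(n1 + n2), is large.

```python
import numpy as np

def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance (equal variances assumed)."""
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1)) / (n1 + n2 - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

def hotelling_t2(A, B):
    """Two-sample Hotelling T^2 and its F transform.

    T^2 = n1*n2/(n1+n2) * d' S^{-1} d, where d is the difference of mean
    vectors and S the pooled covariance matrix, so d' S^{-1} d is the
    squared Mahalanobis distance between the group means.
    """
    n1, p = A.shape
    n2 = B.shape[0]
    d = A.mean(axis=0) - B.mean(axis=0)
    S = ((n1 - 1) * np.cov(A, rowvar=False)
         + (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
    d2 = float(d @ np.linalg.solve(S, d))  # squared Mahalanobis distance
    t2 = n1 * n2 / (n1 + n2) * d2
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2  # ~ F(p, n1+n2-p-1) under H0
    return t2, f, (p, n1 + n2 - p - 1)

# Hypothetical data: both features follow a common trend t, plus a small
# component e that pulls them in opposite directions.
t = np.arange(-3, 4, dtype=float)
e = np.array([0.1, -0.1, 0.1, -0.2, 0.1, -0.1, 0.1])
A = np.column_stack([t + e, t - e])
B = A + np.array([0.5, -0.5])  # shift against the correlation direction

print("t, feature 1:", pooled_t(A[:, 0], B[:, 0]))
print("t, feature 2:", pooled_t(A[:, 1], B[:, 1]))
print("Hotelling T^2, F, df:", hotelling_t2(A, B))
```

On this data both univariate |t| values are about 0.43, far from significance, whereas T² = 52.5 and F ≈ 24.06 on (2, 11) degrees of freedom, well above the 5% critical value of about 3.98. The univariate tests miss the difference because each looks only at one coordinate, where the large common trend hides the shift.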



Taught in: modules 1-2, year 3
Prerequisites: Methods of preliminary statistical analysis, or knowledge of statistics at a basic level
Workload: 5 credits

76 classroom hours:

  • 28 hours of lectures;
  • 48 hours of practical training.

Forms of control:

  • exam;
  • 2 homework assignments.



About the course

Methods for analyzing the type of dependence and the degree of relationship between variables are widely used in various fields of applied statistical research.
The course covers methods of correlation analysis used to assess the presence and strength of statistical relationships between features of different types and to determine the structure of these relationships. The regression analysis section covers estimating and testing the significance of parameters of linear and nonlinear regression models, regression models with variable structure, typological regression, binary choice models, and systems of simultaneous equations. Dependency modeling is illustrated with examples based on real data.

The knowledge and skills acquired in the course will allow you to solve a wide range of problems in building an information basis for decision-making in various fields of knowledge and practice.

The study of dependencies is the main occupation of experimenters in any field of knowledge. An object under study, especially one as complex as a biological one, cannot be studied in its entirety. It is necessary to highlight certain cause-and-effect relationships in it, which are formalized in the form of dependencies. The dependence of effects on causes or the dependence between several effects due to a common cause are studied.

A special case is the dependence of an attribute of an object on time; Chapter 7 was devoted to the study of such dependencies. In this (eighth) chapter, by contrast, we mainly consider static dependencies, in whose description time does not participate; even so, the subject of this chapter is extremely extensive. Because of the limited scope of the course, only the “skeleton” of the topic can be presented. We hope that readers will become familiar with the specific issues of studying dependencies through their own research, using the extensive literature on the various aspects of this complex task as well as available software.

For example, a thorough reference book, though difficult for a first acquaintance, is devoted directly to the topic under consideration. A simpler source could be a textbook. The issues of dependency research are discussed quite simply and briefly, from an applied perspective, in the brochure. Modern methods of processing experimental data are presented in the monograph. However, along with complex statistical methods of data analysis and processing, methods of visual “exploratory analysis” are useful in many cases; they will not be considered here, although they certainly should not be forgotten either.

8.2. General structure of an experiment to study dependencies

In the general formulation of the problem of studying dependencies, it is assumed (Fig. 8.1) that the object under study is affected by many factors (in the previous chapter, the term stimulus was used in almost the same sense), and the result of this influence is a response, which in the general case is also multi-component. The parameters characterizing the components of the impact and the response can, generally speaking, be quantitative, ordinal, or classificatory, and the types of scales used strongly influence the methodology of the experiment and of data processing.

Some of the factors (more precisely, factor parameters, though in what follows we will not insist on strict wording) can be set or measured; the values of others usually remain unknown, and they introduce uncertainty into the object’s response to changes in the controlled factors. Added to this is the uncertainty in measuring (or classifying) the response components. The behavior of the object itself also need not be completely deterministic. All this makes wide use of the methods of mathematical statistics necessary.

Thus, the mathematical apparatus for studying dependencies is aimed at solving the following problem: how to identify and describe the stochastic (probabilistic) connections between the analyzed events, based on partial results of their statistical observation.

To shorten the formulas when studying dependencies, the independent (“predictor”) variables x_1, ..., x_k can be treated as components of a vector x, and the dependent variables y_1, ..., y_m as components of a vector y. Quite often one can restrict attention to the dependence of a single variable y on the k components of the vector x (or consider y_1, ..., y_m separately, as if dividing a single experiment into m partial experiments).
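As an illustration of dividing one experiment with an m-component response into m partial experiments, the Python/NumPy sketch below fits ordinary least squares to all m responses at once and then to each response separately. For the coefficient estimates the two approaches coincide, because every response is regressed on the same design matrix. The simulated data and the coefficient matrix `B_true` are hypothetical. Joint treatment can still matter, for example when testing hypotheses that involve several responses at once, since the errors of different responses may be correlated.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 50, 3, 2                 # observations, predictors x_1..x_k, responses y_1..y_m
X = rng.normal(size=(n, k))
B_true = np.array([[1.0, -2.0],    # hypothetical coefficients: row = predictor,
                   [0.5,  0.0],    # column = response
                   [0.0,  3.0]])
Y = X @ B_true + 0.1 * rng.normal(size=(n, m))

X1 = np.column_stack([np.ones(n), X])  # design matrix with an intercept column

# Joint fit: all m responses in one least-squares call.
B_joint, *_ = np.linalg.lstsq(X1, Y, rcond=None)

# m "partial experiments": regress each response y_j on x separately.
B_sep = np.column_stack(
    [np.linalg.lstsq(X1, Y[:, j], rcond=None)[0] for j in range(m)]
)

print(np.round(B_joint, 3))
print("identical estimates:", np.allclose(B_joint, B_sep))
```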

The reader is offered a book that continues the implementation of the authors’ plan: to create a multi-volume reference manual on modern mathematical methods of statistical data processing that covers, at the same time, the necessary mathematical apparatus, the corresponding computer software, and recommendations for overcoming the computational difficulties associated with using the described methods and algorithms. The book is addressed to specialists in various fields of human activity who use methods of mathematical statistics and data analysis in their work.

To understand the material in the book, the reader needs only mathematical training at the level of an economics or engineering university curriculum, or familiarity with the basic concepts of probability theory and mathematical statistics described in the first volume of the reference work. In turn, mastering the material of this book can serve as a reliable and convenient basis for deeper study of the subject through special monographs and journal articles.

The theme of the book is undoubtedly central to the entire reference work, both in the depth and diversity of the mathematical apparatus developed to date and in how widely the described methods and models are used in practical work of various profiles.

The main goal the authors set for themselves was to equip researchers who use statistical methods in their work with the tools needed to solve the key problem of any research: how, based on partial results of statistical observation of the analyzed events or indicators, to identify and describe the relationships that exist between them. It is this problem, the statistical study of dependencies, that turns out to be central in such typical practical tasks as standardization, forecasting, planning, diagnostics, assessment of characteristics of the analyzed system that are difficult to observe and measure directly, assessment of the operating efficiency or quality of an object, and regulation of process or system parameters.

The authors strove for an objectively balanced presentation of the material, both in the structure of the book and in its content. However, the breadth and diversity of the subject do not allow them to claim comprehensive coverage of the topic. For example, the statistical analysis of dynamic dependencies is represented relatively narrowly in this volume; the apparatus of logical decision rules, which is very useful in certain types of problems, is not described; and the book does not include material on planning regression experiments, which is relevant in practice (especially in problems of technological process control).

The book consists of an introduction and four sections.

The introduction plays a special role in understanding the methods described later and the logic of the book as a whole. It presents the content and logical connections of all parts of the book in a form accessible to the inexperienced reader: the main problem statements and the “addresses” (in the book) of their solutions are given, and the presentation is illustrated with simple examples. We therefore recommend that less prepared readers take the time to read the introduction.

Section I is devoted to methods and techniques that answer the questions: is there any connection at all between the variables under study, how can its strength be measured, and what is the structure of the relationships among the indicators of the set under study? Here, structure means the pattern of all pairwise binary relationships among the characteristics under consideration (of the type “there is a connection” or “there is no connection”), but not the form of the dependence of one characteristic on another. The methods described in this section make up correlation analysis.
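The notion of structure as a set of pairwise “connection / no connection” judgments can be sketched as follows: compute the sample correlation matrix and mark each pair whose correlation differs significantly from zero. This Python/NumPy sketch uses the Fisher z-transform with a normal approximation as the significance test; that test, the `alpha` level, the helper name `correlation_structure`, and the simulated data are assumptions for illustration, not the book’s procedure. A zero Pearson correlation implies no connection only for (approximately) jointly normal variables, so non-normal or non-quantitative features may require other measures.

```python
import math
import numpy as np

def correlation_structure(X, alpha=0.05):
    """Return (R, adj): the sample correlation matrix of the columns of X and a
    symmetric boolean matrix marking pairs with a significant correlation."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    adj = np.zeros((p, p), dtype=bool)
    for i in range(p):
        for j in range(i + 1, p):
            r = float(np.clip(R[i, j], -0.999999, 0.999999))  # keep atanh finite
            z = math.atanh(r) * math.sqrt(n - 3)   # Fisher z, approx. N(0, 1) if rho = 0
            p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
            adj[i, j] = adj[j, i] = p_value < alpha
    return R, adj

# Hypothetical data: x2 depends on x1, x3 is generated independently of both.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
R, adj = correlation_structure(np.column_stack([x1, x2, x3]))

print(np.round(R, 2))
print(adj.astype(int))  # 1 = "there is a connection", 0 = "no connection detected"
```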

Section II describes methods and models for studying the form of the dependence of the quantitative “output” (or “resulting”) indicator of interest on a set of quantitative explanatory variables (regression analysis). A separate chapter (Chapter 12) considers the case in which time plays the role of an explanatory variable.

Section III solves the same problems as Section II, but for the situation in which the explanatory variables are non-quantitative, or a mix of non-quantitative and quantitative characteristics (analysis of variance and analysis of covariance).
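When the explanatory variable is non-quantitative (a factor with several levels), the simplest tool of analysis of variance is the one-way F test comparing group means. Below is a minimal Python/NumPy sketch with a hypothetical helper `one_way_anova` and toy data; analysis of covariance, which adds quantitative covariates to the factor, is not shown.

```python
import numpy as np

def one_way_anova(y, groups):
    """One-way ANOVA F statistic for H0: the mean of y is the same
    at every level of the categorical factor `groups`."""
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    grand = y.mean()
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(y[groups == g]) * (y[groups == g].mean() - grand) ** 2
                     for g in levels)
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum(((y[groups == g] - y[groups == g].mean()) ** 2).sum()
                    for g in levels)
    df_b, df_w = len(levels) - 1, len(y) - len(levels)
    F = (ss_between / df_b) / (ss_within / df_w)
    return F, (df_b, df_w)

# Toy data: a response observed at three levels of a non-quantitative factor.
y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
g = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
F, df = one_way_anova(y, g)
print(F, df)
```

Here the group means are 2, 5 and 8 around a grand mean of 5, so the between-group sum of squares is 54 and the within-group sum of squares is 6, giving F = (54/2)/(6/6) = 27 on (2, 6) degrees of freedom.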

Finally, Section IV includes a chapter on methods for the statistical analysis of so-called systems of simultaneous econometric equations (i.e., sets of simultaneously holding relationships in which the same variable can appear in different relationships both as a resulting indicator and as a predictor), and a chapter that reviews the most interesting domestic and foreign software for the statistical study of dependencies.
