Material from Wikipedia - the free encyclopedia

The Mann-Whitney U test is a statistical test used to assess differences between two independent samples in the level of a quantitatively measured characteristic. It can detect differences in parameter values between small samples.

Other names: Mann-Whitney-Wilcoxon test (MWW), Wilcoxon rank-sum test, or Wilcoxon-Mann-Whitney test. Less common: the inversion-count test.

History

This method of identifying differences between samples was proposed in 1945 by Frank Wilcoxon. In 1947 it was substantially revised and extended by H. B. Mann and D. R. Whitney, after whom it is usually named today.

Description of criterion

A simple nonparametric test. It is more powerful than the Rosenbaum Q test.

This method determines whether the zone of overlapping values between two series (the ranked series of parameter values in the first sample and the same in the second sample) is small enough. The lower the value of the statistic, the more likely it is that the differences between the parameter values in the samples are significant.

Limitations on the Applicability of the Criterion

  1. Each sample must contain at least 3 values of the characteristic. One sample may contain only two values, but then the other must contain at least five.
  2. The sample data should contain no tied values (all numbers are different), or only very few ties.

Using the criterion

To apply the Mann-Whitney U test, you need to perform the following operations.

  1. Form a single ranked series from both samples, arranging the elements in increasing order of the characteristic and assigning the lower rank to the smaller value. The total number of ranks is N = n_1 + n_2, where n_1 is the number of elements in the first sample and n_2 is the number of elements in the second sample.
  2. Split the single ranked series into two, consisting of the elements of the first and second samples respectively. Compute the sum of ranks of the elements of the first sample and, separately, of the second sample. Determine the larger of the two rank sums (T_x), corresponding to the sample with n_x elements.
  3. Compute the value of the Mann-Whitney U statistic using the formula: U = n_1 \cdot n_2 + \frac{n_x \cdot (n_x+1)}{2} - T_x.
  4. Using the table for the chosen level of statistical significance, find the critical value for the given n_1 and n_2. If the obtained value of U is less than or equal to the tabulated value, a significant difference between the levels of the characteristic in the two samples is recognized (the alternative hypothesis is accepted). If the obtained value of U is greater than the tabulated value, the null hypothesis is accepted. The smaller the value of U, the higher the reliability of the differences.
  5. If the null hypothesis is true, the statistic has expectation M(U) = \frac{n_1 \cdot n_2}{2} and variance D(U) = \frac{n_1 \cdot n_2 \cdot (n_1+n_2+1)}{12}, and for sufficiently large samples (n_1 > 19, n_2 > 19) it is approximately normally distributed.
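The large-sample approximation in step 5 can be sketched in Python as follows. This is a minimal illustration, not part of the source: the function name `u_normal_approximation` is hypothetical, the two-sided p-value is an added convenience, and no tie or continuity correction is applied.

```python
import math

def u_normal_approximation(u, n1, n2):
    """Return (mean, variance, z, two-sided p) for U under the null hypothesis.

    Assumes no ties and applies no continuity correction (a simplification).
    """
    mean = n1 * n2 / 2                      # M(U) = n1*n2/2
    var = n1 * n2 * (n1 + n2 + 1) / 12      # D(U) = n1*n2*(n1+n2+1)/12
    z = (u - mean) / math.sqrt(var)         # standardized statistic
    # Two-sided tail probability of the standard normal distribution.
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return mean, var, z, p_two_sided

# With n1 = n2 = 20 the expectation of U is 200, so U = 200 gives z = 0.
mean, var, z, p = u_normal_approximation(200, 20, 20)
print(mean, z, p)
```

For small samples the tabulated critical values remain the reference; the approximation is only meant for sample sizes above the threshold stated in step 5.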

See also

  • The Kruskal-Wallis test is a generalization of the Mann-Whitney U test to more than two samples.

Literature

  • Mann H. B., Whitney D. R. On a test of whether one of two random variables is stochastically larger than the other. // Annals of Mathematical Statistics. - 1947. - Vol. 18. - P. 50-60.
  • Wilcoxon F. Individual Comparisons by Ranking Methods. // Biometrics Bulletin. - 1945. - Vol. 1. - P. 80-83.
  • Gubler E. V., Genkin A. A. Application of nonparametric statistical criteria in biomedical research. - L., 1973.
  • Sidorenko E. V. Methods of mathematical processing in psychology. - St. Petersburg, 2002.

The U-test is a rank test, so it is invariant under any monotonic transformation of the measurement scale.

Other names: Mann-Whitney-Wilcoxon test (MWW), Wilcoxon rank-sum test or Wilcoxon-Mann-Whitney test (WMW).

Sample problems

Example 1. The first sample is patients who were treated with drug A. The second sample is patients who were treated with drug B. The values in the samples are some characteristic of the effectiveness of treatment (metabolite level in the blood, temperature three days after the start of treatment, recovery time, number of bed-days, etc.). It is required to find out whether there is a significant difference in the effectiveness of drugs A and B, or whether the differences are purely random and explained by the "natural" dispersion of the selected characteristic.

Example 2. The first sample is fields treated with agrotechnical method A. The second sample is fields treated with agrotechnical method B. The values in the samples are yields. It is necessary to find out whether one of the methods is more effective than the other, or whether the differences in yield are due to random factors.

Example 3. The first sample is the days when the supermarket ran a type A promotion (red discount price tags). The second sample is the days of a type B promotion (every fifth pack is free). The values in the samples are an indicator of the effectiveness of the promotion (sales volume or revenue in rubles). You need to find out which type of promotion is more effective.

Description of criterion

Two samples are given.

Additional assumptions:

It is sometimes mistakenly believed that the U test tests the null hypothesis that the medians of the two samples are equal. There are distributions for which the null hypothesis of the U test holds, but whose medians differ.

The U test can be used to test the null hypothesis against the shift alternative, under which the distribution of one sample is that of the other shifted by some nonzero constant. Against this alternative the U test is consistent. It is advisable to use it when two series of measurements of two values of some physical quantity are made with the same instrument. In this case the distribution function describes the measurement errors for both values. However, in many applications (particularly econometric ones) there is little reason to assume that the distribution of the second sample is only shifted and does not change in any other way.

The U test is a nonparametric analogue of the Student's t test. If the samples are normal, then it is preferable to use the more powerful Student's test to test the shift hypothesis.

History

This method of identifying differences between samples was proposed in 1945 by Frank Wilcoxon. In 1947, it was substantially revised and expanded by Mann and Whitney, by whose names it is commonly called today.

Literature

  1. Mann H. B., Whitney D. R. On a test of whether one of two random variables is stochastically larger than the other. // Annals of Mathematical Statistics. - 1947. - Vol. 18. - P. 50-60.
  2. Wilcoxon F. Individual Comparisons by Ranking Methods. // Biometrics Bulletin. - 1945. - Vol. 1. - P. 80-83.
  3. Orlov A. I. Econometrics. - M.: Exam, 2003. - 576 p. (§4.5 What hypotheses can be tested using the two-sample Wilcoxon test?)
  4. Kobzar A. I. Applied Mathematical Statistics. - M.: Fizmatlit, 2006. - 816 p.

U = n_1 n_2 + n_x (n_x + 1)/2 - T_x, where T_x is the larger of the two rank sums and n_x is the size of the sample with the larger rank sum.

Purpose of the criterion

The criterion is intended to assess differences between two samples in the level of any quantitatively measured attribute. It can detect differences between small samples when n_1, n_2 ≥ 3 or n_1 = 2, n_2 ≥ 5. Each sample should have no more than 60 observations.
This method determines whether the zone of overlapping values between two series is small enough. Let us assume that the first series (sample, group) is the one in which the values, according to preliminary estimates, are higher, and the second series is the one where they are supposedly lower.
The smaller the zone of overlapping values, the more likely it is that the differences are significant. Sometimes these differences are called differences in the location of the two samples.
The empirical value of the U criterion reflects how large the zone of overlap between the series is. Therefore, the smaller U_emp, the more likely it is that the differences are significant.

Hypotheses
H 0: The level of the trait in group 2 is not lower than the level of the trait in group 1.
H 1: The level of the trait in group 2 is lower than the level of the trait in group 1.

Algorithm for calculating the Mann-Whitney criterion

  1. Combine all data into a single series, marking which sample each value belongs to.
  2. Rank the values, assigning the lower rank to the smaller value. The total number of ranks is (n_1 + n_2).
  3. Calculate the sum of ranks separately for each sample.
  4. Determine the larger of the two rank sums.
  5. Determine the U value using the formula:
    U = n_1 n_2 + n_x (n_x + 1)/2 - T_x,
    where n_1 is the size of sample No. 1; n_2 is the size of sample No. 2; T_x is the larger of the two rank sums; n_x is the size of the sample with the larger rank sum.
  6. Determine the critical value U_cr using the table. If U_emp > U_cr (0.05), H_0 is accepted. If U_emp ≤ U_cr (0.05), H_0 is rejected. The smaller the value of U, the higher the reliability of the differences.
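Steps 1 to 5 above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than a reference implementation: the names `midranks` and `mann_whitney_u` and the toy data are hypothetical, tied values receive the average of their positions, and n_x is taken as the size of the sample with the larger rank sum.

```python
def midranks(values):
    """Rank values in ascending order; tied values share the mean of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def mann_whitney_u(sample1, sample2):
    """U = n1*n2 + n_x*(n_x + 1)/2 - T_x, with T_x the larger rank sum."""
    n1, n2 = len(sample1), len(sample2)
    # Steps 1-2: combine both samples and rank the combined series.
    ranks = midranks(list(sample1) + list(sample2))
    # Step 3: rank sum of each sample.
    t1 = sum(ranks[:n1])
    t2 = sum(ranks[n1:])
    # Step 4: the larger rank sum and the size of the sample it belongs to
    # (if the sums are equal, the first sample is used).
    if t1 >= t2:
        t_x, n_x = t1, n1
    else:
        t_x, n_x = t2, n2
    # Step 5: the U statistic.
    return n1 * n2 + n_x * (n_x + 1) / 2 - t_x

# Toy data (hypothetical): completely separated samples give U = 0.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

Step 6, the comparison with the tabulated critical value, is left out because the critical value table itself is not reproduced here.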

Example. The level of verbal and non-verbal intelligence of prospective participants in a psychological experiment was measured using D. Wechsler's technique. The participants were two groups of young men aged 18 to 24: students of the Faculty of Physics and of the Faculty of Psychology. Indicators of verbal intelligence are presented in the table. Is it possible to say that one of the groups is superior to the other in terms of verbal intelligence?

Physics (X)   Psychology (Y)
135           130
130           129
131           121
128           129
127           119
137           124
126           125
137           129
131           129
137           130
137           131
127           123
133
125

A comparison of the results shows that the values of sample X are slightly higher than those of sample Y, so we take sample X as the first sample.
Thus, we need to determine whether the observed difference between the scores can be considered significant.
Solution.
Let us rank the values in the table. For ranking, the two samples are combined into one. Ranks are assigned in ascending order of the measured value, i.e. the lowest rank corresponds to the lowest score. If the scores of several students coincide, each of them receives the arithmetic mean of the positions those scores occupy in the ascending order.
Since the combined series contains tied values, their ranks are reassigned. The reassignment preserves the order relations (greater than, less than, equal to) between the ranks. No rank may be lower than 1 or higher than the number of values (here n = 26). The reassigned ranks are shown in the table.
Position in ordered series   Value   New rank
1 119 1
2 121 2
3 123 3
4 124 4
5 125 5.5
6 125 5.5
7 126 7
8 127 8.5
9 127 8.5
10 128 10
11 129 12.5
12 129 12.5
13 129 12.5
14 129 12.5
15 130 16
16 130 16
17 130 16
18 131 19
19 131 19
20 131 19
21 133 21
22 135 22
23 137 24.5
24 137 24.5
25 137 24.5
26 137 24.5

Using the proposed ranking principle, we obtain a table of ranks.
X     Rank X   Y     Rank Y
125   5.5      119   1
126   7        121   2
127   8.5      123   3
127   8.5      124   4
128   10       125   5.5
130   16       129   12.5
131   19       129   12.5
131   19       129   12.5
133   21       129   12.5
135   22       130   16
137   24.5     130   16
137   24.5     131   19
137   24.5
137   24.5
Sum   234.5    Sum   116.5

This data is enough to compute the empirical value of the criterion: U_emp = 14·12 + 14·(14 + 1)/2 - 234.5 = 38.5.

Hypothesis H_0 about the insignificance of differences between the samples is accepted if U_cr < U_emp. Otherwise H_0 is rejected and the difference is considered significant.
Here U_cr is the critical value, which is found using the Mann-Whitney table.
We find the critical value U_cr.
From the table we find U_cr (0.05) = 45.
Since U_cr > U_emp, we accept the alternative hypothesis H_1; the differences in the levels of the samples can be considered significant.
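As a cross-check of this example, the same statistic can be obtained by counting inversions, that is, pairs in which a Y value exceeds an X value, with ties counted as one half. The sketch below is an illustration added here, not the source's method: the function name `count_inversions` is hypothetical, and the data are assumed to be the two columns of the table of scores (first column X, second column Y).

```python
def count_inversions(x, y):
    """Count pairs (x_i, y_j) with y_j > x_i; a tied pair counts as one half."""
    total = 0.0
    for xi in x:
        for yj in y:
            if yj > xi:
                total += 1.0
            elif yj == xi:
                total += 0.5
    return total

# Verbal intelligence scores from the example table.
x = [135, 130, 131, 128, 127, 137, 126, 137, 131, 137, 137, 127, 133, 125]
y = [130, 129, 121, 129, 119, 124, 125, 129, 129, 130, 131, 123]

u_emp = count_inversions(x, y)
u_cr = 45  # critical value at the 0.05 level quoted in the text

print(u_emp)           # 38.5
print(u_emp <= u_cr)   # True: H_0 is rejected
```

With average ranks for ties, this count equals n_1 n_2 + n_x (n_x + 1)/2 - T_x computed from the rank sum of X, which is why both routes give 38.5 here.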

The Mann-Whitney U test is most often used to process the results of empirical research in coursework, diploma and master's theses in psychology.

The Mann-Whitney U test is a nonparametric statistical test. This means that its requirements for groups and measured psychological indicators are minimal:

  1. The compared samples should not be very large: no more than 60 people each. If the groups are larger, it is better to use Student's t-test.
  2. The minimum group size is 3 subjects per group.
  3. The groups being compared need not be the same size, but their sizes should not differ greatly.
  4. The psychological indicators can be psychological test scores, school grades, expert assessments of professional success, and so on.

How is the Mann-Whitney U test calculated?

Without going into mathematical subtleties, let’s consider the logic of calculating the Mann-Whitney U test.

For example, as a result of testing, integral indicators of the meaningfulness of life of married and unmarried women were obtained. One of the objectives of the thesis is to identify differences in the meaningfulness of life among married and unmarried women. The samples are small (30 people each), so the Mann-Whitney U test can be used.

The procedure for calculating the Mann-Whitney U test in its most general and approximate form is as follows:

  1. The meaningfulness-of-life scores of the women in each group are ranked (arranged in ascending order).
  2. The two ordered series are combined and ranked again.
  3. If, in the combined ranked series, the scores of married and unmarried women alternate or intermingle, then most likely there are no differences.
  4. Alternatively, the scores of married and unmarried women may overlap only slightly in the combined ranked series. For example, the scores of unmarried women fall in the low range of meaningfulness of life and the scores of married women in the high range. In this case there most likely are differences between the compared groups: the meaningfulness of life is higher among married women than among unmarried women.

When calculating the Mann-Whitney U-criterion using statistical programs, the value of the criterion itself and the level of statistical significance of the differences in the severity of the psychological indicator are given. These indicators must be entered into the table and those psychological indicators must be highlighted whose significance level of differences in groups is lower than 0.05.

An example of calculating the Mann-Whitney U test manually

As a result of a psychodiagnostic examination of groups of men and women (20 people each), indicators of internal resistance when contacting a dating service were identified (in points):

  • Men: 45 67 45 67 88 67 56 67 78 56 45 67 89 56 4 56 74 57 89 67
  • Women: 70 66 66 66 63 63 61 60 54 47 13 45 56 45 34 45 34 5 62 34

The criterion is intended to assess differences between two samples in the level of any quantitatively measured characteristic whose distribution differs from normal. Moreover, it can detect differences between small samples (when n_1, n_2 ≥ 3 or n_1 = 2, n_2 ≥ 5). This method determines how weakly the values of the two samples overlap. The fewer overlapping values, the more likely it is that the differences are significant.

The smaller U_emp, the more likely it is that the differences are significant.

Null hypothesis: the level of the trait in sample 2 is not lower than the level of the trait in sample 1.

Before computing the U criterion, the values must be ranked.

DEFINITION: Ranking is the arrangement of values within a variation series from smallest to largest.

Ranking rules:

1. The smaller value is assigned the lower rank; as a rule, the smallest value receives rank 1. The largest value is assigned a rank equal to the number of ranked values (if n = 10, the largest value receives rank 10).

2. If several values are equal, they are assigned a rank that is the average of the ranks they would receive if they were not equal.

3. The total sum of ranks must coincide with the calculated one, which is determined by the formula N(N+1)/2, where N is the total number of ranked values. A discrepancy between the actual and calculated rank sums indicates an error made when assigning ranks or summing them. Before continuing, you must find the error and fix it.

Example.

Let's rank the next row.

Using the formula, we check the correctness of the ranking.

N(N+1)/2 = 7·8/2 = 28. The sum of the ranks is 1+2.5+2.5+4+5+6+7 = 28.

The total sum of ranks coincides with the calculated one. Therefore, we ranked correctly.
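The ranking rules and the rank-sum check can be sketched in Python as follows. The values in `values` are hypothetical, chosen only so that they produce the ranks 1, 2.5, 2.5, 4, 5, 6, 7 from the example; the function name `midranks` is likewise an assumption of this sketch.

```python
def midranks(values):
    """Assign ranks in ascending order; equal values get the average of their positions."""
    positions = {}
    for pos, v in enumerate(sorted(values), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

values = [3, 5, 5, 7, 8, 9, 10]  # hypothetical data with one pair of tied values
ranks = midranks(values)
n = len(values)

print(ranks)       # [1.0, 2.5, 2.5, 4.0, 5.0, 6.0, 7.0]
print(sum(ranks))  # 28.0

# Rule 3: the sum of the assigned ranks must equal N(N+1)/2.
assert sum(ranks) == n * (n + 1) / 2
```

Averaging the positions of tied values keeps the total rank sum unchanged, which is why the check N(N+1)/2 still holds when ties are present.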

Mann-Whitney criterion calculation scheme:

The lower the value U, the higher the reliability of the differences and the greater the confidence in rejecting the null hypothesis.


Example 3.

In diseases of the retina, the permeability of its vessels increases. The researchers measured retinal vascular permeability in healthy people and in patients with retinal damage. The results obtained are shown in the table.

Test whether these data support the hypothesis of differences in retinal vascular permeability.

Null hypothesis: the permeability of retinal vessels in patients with retinal diseases is not greater than in healthy people (there is no statistical difference between the two samples).

Alternative hypothesis: the permeability of retinal vessels in patients with retinal diseases is greater than in healthy people (there is a statistical difference between the two samples).

Healthy                       Patients
No.   Permeability   Rank     No.   Permeability   Rank
1     0.5            1        1     1.2            6.5
2     0.7            2.5      2     1.4            9
3     0.7            2.5      3     1.6            12
4     1.0            4.5      4     1.7            15
5     1.0            4.5      5     1.7            15
6     1.2            6.5      6     1.8            17
7     1.4            9        7     2.2            18.5
8     1.4            9        8     2.3            20
9     1.6            12       9     2.4            21
10    1.6            12       10    6.4            22
11    1.7            15       11    …              23
12    2.2            18.5     12    23.6           24