Additional Nonparametric Tests for Two or More Samples (Mann–Whitney, Kruskal–Wallis) (CFA Level 1): Mann–Whitney (Wilcoxon Rank-Sum) Test and Basic Idea. Key definitions, formulas, and exam tips.
Nonparametric tests offer analysts a robust way to compare sample distributions without strictly assuming normality or other rigid parametric conditions. When you’re dealing with heavily skewed data, small sample sizes, or outliers that might wreak havoc on the usual t-tests (see Section 8.5 for parametric tests of means and variances), you might find nonparametric procedures more reliable. In fact, these techniques focus on ranks rather than raw values, which gives them added resilience against outliers and skewed distributions.
Now, I remember once a colleague asked: “Hey, can we trust these average returns when we see such an uneven spread of daily gains and losses?” We both realized that a typical parametric test (like a two-sample t-test) might not paint an accurate picture if the returns were shaped like a lopsided distribution. So I said, “Well, let’s try the Mann–Whitney test and see if there’s a real difference in median returns.” That moment drove home how helpful—and sometimes essential—nonparametric methods can be.
Below, we’re going to explore two such methods: (1) the Mann–Whitney test (also called the Wilcoxon rank-sum test) for two samples, and (2) the Kruskal–Wallis test for three or more samples. Both belong to the broader family of rank-based procedures and are standard in statistical analysis. In finance, they are especially useful for testing median differences, comparing return distributions, or analyzing anything else that doesn’t behave nicely under normal assumptions.
Before diving into these specific tests, it might help to visualize how parametric and nonparametric approaches fit into your decision-making:
```mermaid
flowchart LR
    A["Data Sampling <br/>(Various Distributions)"]
    B["Parametric <br/>(Assumes Normal Distribution)"]
    C["Nonparametric <br/>(No assumption <br/>of Normality)"]
    D["Mann-Whitney <br/>(2 samples)"]
    E["Kruskal-Wallis <br/>(k > 2 samples)"]
    A --> B
    A --> C
    C --> D
    C --> E
```
If you’ve got two independent samples (say, daily returns from two different portfolios) and suspect that the distribution may be non-normal or riddled with outliers, the Mann–Whitney test is a viable alternative to the two-sample t-test. Rather than comparing means, it tests whether values from one population tend to be larger than values from the other. When the two distributions have a similar shape, this amounts to testing whether the populations share the same median.
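Let’s define our hypotheses in the classic way:
- H₀: The two populations have the same median (no location shift).
- Hₐ: The two populations have different medians.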
In practice, some textbooks phrase this as “there is no difference in stochastic dominance” or “there is no location shift.” But basically, it’s a check on median equivalence when the distributions aren’t straying too far in shape.
We combine all data points from both groups, rank them from smallest to largest, then compare how these ranks are distributed between the two groups. Here’s one formula for the test statistic U:
$$ U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - \sum_{i=1}^{n_1} R(X_i), $$
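where \(n_1\) and \(n_2\) are the two sample sizes, and \(R(X_i)\) is the rank of the \(i\)-th observation of the first sample within the combined ranking of all \(n_1 + n_2\) observations.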
You’ll typically use a table (or software) to get the p-value associated with U, or you might use a normal approximation for larger sample sizes. If the p-value is below your chosen significance level (commonly 5%), you reject H₀ and conclude there might be a difference in medians.
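To make the ranking steps concrete, here is a minimal sketch that computes U from the formula above and applies the large-sample normal approximation, \(z = (U - n_1 n_2/2) / \sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}\). The two arrays are made-up toy numbers with no ties, and the variable names are illustrative assumptions, not part of any standard.

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm, rankdata

# Hypothetical toy samples (no tied values), chosen only for illustration.
x = np.array([0.02, -0.01, 0.03, 0.015, 0.10])
y = np.array([0.00, 0.01, 0.04, -0.02, 0.06, 0.05])
n1, n2 = len(x), len(y)

# Rank the pooled data from smallest (rank 1) to largest (rank n1 + n2).
ranks = rankdata(np.concatenate([x, y]))
r1 = ranks[:n1].sum()            # sum of ranks belonging to the first sample

# U as defined in the formula above, plus its complement.
u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u_complement = n1 * n2 - u       # equals r1 - n1 * (n1 + 1) / 2

# Normal approximation (no tie correction, no continuity correction).
mean_u = n1 * n2 / 2
sd_u = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mean_u) / sd_u
p_approx = 2 * norm.sf(abs(z))   # two-sided p-value

# SciPy for comparison; depending on the version it reports U or its complement.
stat, p_scipy = mannwhitneyu(x, y, alternative="two-sided")
print("U:", u, "complement:", u_complement, "SciPy statistic:", stat)
print("approximate p-value:", p_approx, "SciPy p-value:", p_scipy)
```

The two U values always sum to \(n_1 n_2\), so a two-sided test gives the same answer whichever one you use. With samples this small, the normal approximation is rough, and an exact table or software p-value is preferable.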
For instance, if you’re looking at realized returns for small-cap stocks from two different sectors, both sets might display highly skewed and leptokurtic (fat-tailed) returns. Mann–Whitney can help determine if the median returns differ, and because it uses ranks, it is far less sensitive to outliers than a test of means.
Let’s say you have daily returns from two small-cap stock portfolios, each with 40 days of returns. You suspect the presence of one or two days with extreme losses or gains that could shift the mean drastically. You rank all 80 returns from smallest to largest (1 to 80), sum the ranks separately for each portfolio, and compute the Mann–Whitney statistic. A significantly low or high U might indicate that one portfolio tends to dominate the other in median returns.
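The 40-day scenario can be sketched with simulated data. The returns below are random draws with a fixed seed (not real portfolio data), and `rank_sums_and_u` is a hypothetical helper written for this illustration. The sketch also shows why ranks resist outliers: turning the single best day into an extreme gain leaves every rank, and therefore U, unchanged, while the difference in means moves.

```python
import numpy as np
from scipy.stats import rankdata

rng = np.random.default_rng(0)            # fixed seed; simulated, not real, returns
port_a = rng.normal(0.0005, 0.01, 40)     # hypothetical portfolio A daily returns
port_b = rng.normal(0.0000, 0.01, 40)     # hypothetical portfolio B daily returns

def rank_sums_and_u(a, b):
    """Pool both samples, rank them, and return (rank sum of a, rank sum of b, U)."""
    ranks = rankdata(np.concatenate([a, b]))
    r_a, r_b = ranks[:len(a)].sum(), ranks[len(a):].sum()
    u = len(a) * len(b) + len(a) * (len(a) + 1) / 2 - r_a
    return r_a, r_b, u

r_a, r_b, u_before = rank_sums_and_u(port_a, port_b)

# Make the single best day across both portfolios an extreme gain.
pooled = np.concatenate([port_a, port_b])
shocked = pooled.copy()
shocked[np.argmax(pooled)] += 0.50
shocked_a, shocked_b = shocked[:40], shocked[40:]
_, _, u_after = rank_sums_and_u(shocked_a, shocked_b)

mean_diff_before = port_a.mean() - port_b.mean()
mean_diff_after = shocked_a.mean() - shocked_b.mean()

print("rank sums:", r_a, r_b, "total:", r_a + r_b)   # total is 80 * 81 / 2 = 3240
print("U before and after the shock:", u_before, u_after)
print("mean difference before and after:", mean_diff_before, mean_diff_after)
```

The largest observation already holds rank 80, so making it larger changes nothing in the ranking. An outlier in the middle of the pack would shift a few ranks, but only by a bounded amount.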
What if you have, not two, but three or more samples? The Kruskal–Wallis test extends the Mann–Whitney approach to k groups. It’s sometimes called the “nonparametric ANOVA” because it’s the rank-based counterpart to the parametric one-way ANOVA test.
We’re testing a more general scenario: the possibility that one or more groups might have a different location compared to the others.
Following the same rank-based logic, the data from all k groups are pooled, ranked, then the ranks are summed within each group. The Kruskal–Wallis statistic (H) is calculated as:
$$ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1), $$
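where \(N\) is the total number of observations across all groups, \(k\) is the number of groups, \(n_i\) is the number of observations in group \(i\), and \(R_i\) is the sum of the ranks assigned to group \(i\).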
For larger samples, H approximately follows a chi-square (\(\chi^2\)) distribution with \(k-1\) degrees of freedom. If H is large enough (or equivalently, the p-value is small enough), you reject H₀ and conclude that at least one group stands out in terms of its median.
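As a rough check on the H formula above, the following sketch computes H by hand and compares it with `scipy.stats.kruskal`. The three samples are invented numbers with no tied values (SciPy applies a tie correction that the basic formula omits, so the two agree only when there are no ties).

```python
import numpy as np
from scipy.stats import chi2, kruskal, rankdata

# Hypothetical return samples for three groups (no tied values).
groups = [
    np.array([0.012, -0.004, 0.021, 0.007]),
    np.array([0.003, 0.015, -0.010, 0.009, 0.018]),
    np.array([-0.002, 0.005, 0.030, 0.011]),
]
sizes = [len(g) for g in groups]
N, k = sum(sizes), len(groups)

# Pool and rank all observations, then split the ranks back into groups.
ranks = rankdata(np.concatenate(groups))
rank_sums = [r.sum() for r in np.split(ranks, np.cumsum(sizes)[:-1])]

# Kruskal-Wallis statistic from the formula above.
H = 12 / (N * (N + 1)) * sum(R**2 / n for R, n in zip(rank_sums, sizes)) - 3 * (N + 1)

# Chi-square approximation with k - 1 degrees of freedom.
p_value = chi2.sf(H, df=k - 1)

H_scipy, p_scipy = kruskal(*groups)
print("manual H:", H, "SciPy H:", H_scipy)
print("manual p-value:", p_value, "SciPy p-value:", p_scipy)
```

With groups this small, the chi-square approximation is only indicative; the point of the sketch is that the hand calculation and the library call line up.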
If you do get a significant result—meaning you reject H₀ for the Kruskal–Wallis test—you’ll probably need to follow up with further pairwise comparisons (e.g., Mann–Whitney tests with an appropriate multiple comparison adjustment) to identify exactly which groups differ from which. Finance professionals frequently do this if they see that, say, emerging market bonds significantly differ from developed market bonds or high-yield corporate bonds in terms of median returns, but aren’t sure which pair (or pairs) is driving the difference.
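One simple way to run that follow-up is pairwise Mann–Whitney tests with a Bonferroni adjustment, which multiplies each raw p-value by the number of comparisons (capped at 1). The sketch below assumes three bond categories; the labels and numbers are made up for illustration, and Bonferroni is just one of several possible adjustments.

```python
from itertools import combinations

import numpy as np
from scipy.stats import mannwhitneyu

# Hypothetical monthly returns; labels and values are illustrative only.
samples = {
    "emerging": np.array([0.031, -0.012, 0.045, 0.022, 0.038, 0.027]),
    "developed": np.array([0.004, 0.009, -0.003, 0.012, 0.006, 0.001]),
    "high_yield": np.array([0.015, 0.019, -0.008, 0.024, 0.011, 0.017]),
}

pairs = list(combinations(samples, 2))   # every unordered pair of groups
m = len(pairs)                           # number of comparisons

results = {}
for a, b in pairs:
    _, p_raw = mannwhitneyu(samples[a], samples[b], alternative="two-sided")
    p_adj = min(1.0, p_raw * m)          # Bonferroni-adjusted p-value
    results[(a, b)] = (p_raw, p_adj)

for (a, b), (p_raw, p_adj) in results.items():
    print(f"{a} vs {b}: raw p = {p_raw:.4f}, adjusted p = {p_adj:.4f}")
```

Compare each adjusted p-value with your significance level. Bonferroni is conservative, so with many groups a pair has to differ quite clearly before it is flagged.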
Imagine you want to compare returns from four asset classes—large-cap stocks, small-cap stocks, corporate bonds, and government bonds—over a certain period. You suspect some (or all!) of these might have skewed distributions. You pool all returns, rank them, and see how the ranks are distributed among the four categories. If the Kruskal–Wallis statistic suggests a significant difference, you might zero in further on which asset class is pushing that difference.
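Both Mann–Whitney and Kruskal–Wallis depend on the following assumptions:
- The samples are independent of one another, and the observations within each sample are independent.
- The data are at least ordinal, so that the observations can be ranked.
- The population distributions have a similar shape, if the result is to be interpreted as a difference in medians.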
In finance, these assumptions can be tricky. Markets and assets can exhibit all kinds of non-stationary behavior—one day, one asset might show volatility spikes. So always approach these methods with a healthy portion of caution, just like you do with everything else involved in real-world capital markets research.
Here’s a quick demonstration of how you might run a Mann–Whitney test in Python. Suppose you have two NumPy arrays, returns_a and returns_b:
```python
import numpy as np
from scipy.stats import mannwhitneyu

# Daily returns for two portfolios (small illustrative samples)
returns_a = np.array([0.02, -0.01, 0.03, 0.01, 0.10])
returns_b = np.array([0.00, 0.01, 0.04, -0.02, 0.06])

# Two-sided test: H0 is that neither portfolio tends to have larger returns
stat, p_value = mannwhitneyu(returns_a, returns_b, alternative='two-sided')
print("Mann–Whitney U statistic:", stat)
print("p-value:", p_value)
```
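If your p-value falls below your chosen significance level, you’d reject H₀ and conclude that the median returns from portfolio A and portfolio B differ. With only five observations per portfolio, though, the test has very little power, so treat this snippet as a syntax demonstration rather than a meaningful analysis.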
Let’s say you’re analyzing a hedge fund’s performance across several strategies (long/short equity, global macro, managed futures, and credit arbitrage). You collect monthly returns for each strategy over one year, generating four sets of 12 observations each. A single outlier in the global macro strategy might distort a parametric ANOVA, whereas the Kruskal–Wallis test, which uses only ranks, is much less sensitive to it. If the test indicates a difference, you could run pairwise Mann–Whitney tests (with a multiple-comparison adjustment) to see which strategy’s median stands out.
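Here is a hedged sketch of that hedge fund scenario using simulated monthly returns (random draws with a fixed seed, not real fund data). The helper `with_macro_outlier` is hypothetical: it plants one top-ranked month in the global macro series, first just above every other observation and then far above. Because the ranking is the same in both cases, the Kruskal–Wallis H is identical, while the ANOVA F statistic moves with the size of the outlier.

```python
import numpy as np
from scipy.stats import f_oneway, kruskal

rng = np.random.default_rng(42)   # fixed seed; simulated returns only
strategies = ["long_short_equity", "global_macro", "managed_futures", "credit_arbitrage"]
returns = {name: rng.normal(0.008, 0.02, 12) for name in strategies}

# Largest simulated monthly return across all four strategies.
base_max = max(r.max() for r in returns.values())

def with_macro_outlier(size):
    """Copy the data and set one global macro month to base_max + size."""
    data = {name: r.copy() for name, r in returns.items()}
    data["global_macro"][0] = base_max + size
    return list(data.values())

mild = with_macro_outlier(0.001)     # barely the best month
extreme = with_macro_outlier(0.50)   # an extreme outlier

H_mild, p_kw_mild = kruskal(*mild)
H_extreme, p_kw_extreme = kruskal(*extreme)
F_mild, p_anova_mild = f_oneway(*mild)
F_extreme, p_anova_extreme = f_oneway(*extreme)

print("Kruskal-Wallis H (mild, extreme):", H_mild, H_extreme)
print("ANOVA F (mild, extreme):", F_mild, F_extreme)
```

This is a deliberately clean case, since the outlier sits at the top of the ranking in both versions. In real data an outlier can shift a few ranks, but its influence on H stays bounded in a way that its influence on F does not.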
For the CFA Level I exam, you might see:
On the exam day, keep these points in mind:
Anyway, we’ve all faced moments where the regular parametric tests just don’t fit. The Mann–Whitney and Kruskal–Wallis tests are two powerful tools in your toolkit for tackling data that’s skewed, outlier-prone, or simply not following the usual normal route. Always weigh their assumptions, particularly the distribution shape and independence of samples. And if the tests find something interesting, don’t forget your post-hoc analyses. In the real world, these steps might spare you from drawing misleading conclusions about whether a certain asset class or investment strategy truly outperforms another.