
GRE Data Analysis: Statistics, Probability, Distributions

Last updated: May 2, 2026

Data analysis questions (statistics, probability, and distributions) are among the highest-leverage areas to study for the GRE. This guide covers the core rule, the elements you need to recognize, the named traps that catch most students, and a memory aid you can use on test day.

The rule

Data analysis questions on the GRE reward you for matching the situation to the correct tool: a measure of center (mean, median, mode), a measure of spread (range, standard deviation, IQR), a counting/probability rule (and vs. or, with vs. without replacement), or a distribution shape (uniform, normal, skewed). The right answer follows from the definition, not from intuition. Most students lose points by averaging when they should be using the median, by adding probabilities that should be multiplied, or by treating a skewed distribution as if it were normal.

Elements breakdown

Center & Spread Toolkit

Pick the summary statistic whose definition matches what the question is actually asking.

  • Mean: sum divided by count; sensitive to outliers
  • Median: middle value when sorted; robust to outliers
  • Mode: most frequent value; useful for categorical data
  • Range: max minus min; only uses two points
  • Standard deviation: typical distance from the mean
  • Interquartile range: $Q_3 - Q_1$; spread of middle 50%
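
The toolkit above can be checked on a small list. The sketch below uses a made-up data set and Python's standard `statistics` module; the `quartiles` helper is a hypothetical function that follows the median-of-halves convention (other quartile conventions exist and can give slightly different values), and the standard deviation shown is the population version.

```python
from statistics import mean, median, mode, pstdev

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]


def quartiles(values):
    """Return (Q1, Q3) using the median-of-halves convention.

    Q1 is the median of the lower half and Q3 the median of the upper half;
    the overall median is left out of both halves when the count is odd.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


q1, q3 = quartiles(data)
summary = {
    "mean": mean(data),               # sum divided by count
    "median": median(data),           # middle value when sorted
    "mode": mode(data),               # most frequent value
    "range": max(data) - min(data),   # uses only two points
    "iqr": q3 - q1,                   # spread of the middle 50%
    "std_dev": pstdev(data),          # population standard deviation
}
print(summary)
```

For this list the mean is 6, the median is 5, the mode is 4, the range is 9, and the IQR is 5.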

Probability Operations

Translate the English connector into the correct arithmetic operation.

  • AND independent events: multiply probabilities
  • OR mutually exclusive: add probabilities
  • OR overlapping: $P(A) + P(B) - P(A \cap B)$
  • NOT: subtract from 1 (complement)
  • Conditional: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
  • With replacement: probabilities stay constant
  • Without replacement: denominator shrinks each draw
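
As a rough illustration of the operations listed above, the sketch below works with exact fractions on a hypothetical scenario (cards drawn from a standard 52-card deck). The scenario and variable names are assumptions made for the example, not GRE material.

```python
from fractions import Fraction

# Hypothetical scenario: cards drawn from a standard 52-card deck.
p_heart = Fraction(13, 52)
p_spade = Fraction(13, 52)
p_king = Fraction(4, 52)
p_king_of_hearts = Fraction(1, 52)

# OR, mutually exclusive (a card cannot be both a heart and a spade): add.
p_heart_or_spade = p_heart + p_spade

# OR, overlapping: subtract the overlap so it is not counted twice.
p_heart_or_king = p_heart + p_king - p_king_of_hearts

# NOT: subtract from 1.
p_not_heart = 1 - p_heart

# Conditional: P(king | heart) = P(king and heart) / P(heart).
p_king_given_heart = p_king_of_hearts / p_heart

# AND with replacement: the draws are independent, so the fraction repeats.
p_two_hearts_with = p_heart * p_heart

# AND without replacement: numerator and denominator both drop by 1.
p_two_hearts_without = Fraction(13, 52) * Fraction(12, 51)

print(p_heart_or_spade, p_heart_or_king, p_not_heart)
print(p_king_given_heart, p_two_hearts_with, p_two_hearts_without)
```

Working in `Fraction` rather than floats keeps every result in the same form as a GRE answer choice.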

Counting Principles

Decide whether order matters before you compute.

  • Multiplication principle: multiply choices per slot
  • Permutations (order matters): $\frac{n!}{(n-r)!}$
  • Combinations (order doesn't): $\binom{n}{r} = \frac{n!}{r!(n-r)!}$
  • Identical items: divide by repeats' factorials
  • Complement counting: total minus unwanted
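
The counting rules above map directly onto functions in Python's `math` module (`perm` and `comb` require Python 3.8 or later). The scenarios in this sketch are hypothetical and chosen only to give small, checkable numbers.

```python
from math import comb, factorial, perm

# Multiplication principle: 3 shirts, 4 pants, 2 hats (hypothetical wardrobe).
outfits = 3 * 4 * 2

# Permutations vs. combinations: pick 2 of 5 people.
n, r = 5, 2
ordered_picks = perm(n, r)    # order matters: n! / (n - r)!
unordered_picks = comb(n, r)  # order does not matter: n! / (r! (n - r)!)

# Identical items: arrangements of the letters in "LEVEL" (two L's, two E's).
level_arrangements = factorial(5) // (factorial(2) * factorial(2))

# Complement counting: 3-digit codes (digits 0-9) containing at least one 7
# equals all codes minus the codes with no 7 at all.
codes_with_a_seven = 10**3 - 9**3

print(outfits, ordered_picks, unordered_picks)
print(level_arrangements, codes_with_a_seven)
```

The results are 24 outfits, 20 ordered picks, 10 unordered picks, 30 arrangements, and 271 codes.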

Normal Distribution Benchmarks

Use the 68–95–99.7 rule to convert standard deviations into percentages.

  • Within $\pm 1\sigma$ of mean: about 68%
  • Within $\pm 2\sigma$: about 95%
  • Within $\pm 3\sigma$: about 99.7%
  • Each tail beyond $\pm 1\sigma$: about 16%
  • Below the mean: exactly 50%
  • Convert raw score to z-score: $z = \frac{x - \mu}{\sigma}$
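
The benchmarks above are rounded values of the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library shows the more precise figures; the helper names `within` and `z_score` are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def z_score(x, mu, sigma):
    """Convert a raw score to the number of standard deviations from the mean."""
    return (x - mu) / sigma


for k in (1, 2, 3):
    print(f"within {k} sd: {within(k):.4f}")

# Each tail beyond 1 sd holds half of what lies outside the 1-sd band.
upper_tail_1sd = 1 - standard_normal.cdf(1)
print(f"upper tail beyond 1 sd: {upper_tail_1sd:.4f}")
```

The computed proportions are roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.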

Distribution Shape Diagnostics

Read the shape of a distribution to predict how mean and median relate.

  • Symmetric: mean $\approx$ median
  • Right-skewed (long right tail): mean $>$ median
  • Left-skewed (long left tail): mean $<$ median
  • Uniform: all outcomes equally likely
  • Bimodal: two peaks; mean/median may mislead
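
The mean-versus-median relationships above can be seen on tiny made-up lists. In this sketch the left-skewed list is just the mirror image of the right-skewed one; all three lists are assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical lists: one long right tail, its mirror image, and a symmetric list.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [21 - x for x in right_skewed]
symmetric = [1, 2, 3, 4, 5]

for name, values in [
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
    ("symmetric", symmetric),
]:
    print(f"{name}: mean = {mean(values)}, median = {median(values)}")
```

The long tail drags the mean toward it in each skewed list, while the median stays with the bulk of the data.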

Common patterns and traps

The Outlier-Pulls-the-Mean Trap

GRE problems plant a single extreme value (a CEO salary, a 99-point test score in a class of 70s) and then ask for the 'typical' value or probe how 'the average' behaves. Students compute the mean reflexively. The correct response is to notice the outlier and use the median, or to recognize that adding or removing the outlier shifts the mean dramatically while moving the median little, if at all.

Watch for an answer choice that equals the arithmetic mean of a list whose largest entry is 5–10 times the others, alongside a competing choice that equals the middle value.

The Add-When-You-Should-Multiply Trap

Sequential probability questions ('first event AND then event') require multiplication of probabilities, but the wrong answer is often the sum. The trap is reinforced when both probabilities are simple fractions like $\frac{1}{2}$ and $\frac{1}{3}$ — adding gives $\frac{5}{6}$, multiplying gives $\frac{1}{6}$, and only one matches the joint event.

Watch for a wrong choice equal to $P(A) + P(B)$ when the question describes both events happening in sequence.

The Forgotten Denominator (Without Replacement)

When items are drawn and not replaced, the denominator of each subsequent probability drops by 1, and the numerator may drop too if the favorable pool shrinks. Students often hold the denominator constant, producing an answer that's slightly too small or too large. The fix is to write each fraction explicitly before multiplying.

Watch for a wrong choice that uses $\frac{k}{n} \times \frac{k-1}{n}$ instead of the correct $\frac{k}{n} \times \frac{k-1}{n-1}$.
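
To see how far apart the trap and the correct product land, here is a small sketch with hypothetical numbers ($k = 4$ favorable items among $n = 10$). The function `both_favorable` is an illustrative helper, not a standard library call.

```python
from fractions import Fraction


def both_favorable(k, n, replace):
    """Probability that two successive draws are both favorable.

    k favorable items among n total; `replace` says whether the first
    item goes back before the second draw.
    """
    first = Fraction(k, n)
    second = Fraction(k, n) if replace else Fraction(k - 1, n - 1)
    return first * second


k, n = 4, 10  # hypothetical: 4 red marbles in a bag of 10

correct = both_favorable(k, n, replace=False)       # 4/10 * 3/9
with_replacement = both_favorable(k, n, replace=True)
trap = Fraction(k, n) * Fraction(k - 1, n)          # denominator wrongly held at n

print(correct, with_replacement, trap)
```

With these numbers the trap value $\frac{3}{25}$ sits just below the correct $\frac{2}{15}$, and the with-replacement value $\frac{4}{25}$ sits just above it, which is why writing each fraction out matters.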

The 68–95–99.7 Misapplication

The empirical rule applies only to (approximately) normal distributions. GRE questions sometimes describe a skewed or unspecified distribution and offer a tempting answer that uses the 68–95–99.7 percentages. If the problem doesn't say 'normal' or 'normally distributed,' those percentages don't apply.

Watch for a wrong choice of 16% or 2.5% in a problem about a distribution explicitly described as right-skewed.
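
One way to see why the benchmarks fail off the normal curve is to compare tails. The sketch below uses an exponential distribution with rate 1 as a stand-in for "right-skewed"; that choice is an assumption for illustration (its mean and standard deviation both equal 1, and $P(X > x) = e^{-x}$ for $x \ge 0$).

```python
import math
from statistics import NormalDist

# Exponential distribution with rate 1: mean = 1, standard deviation = 1.
mu = sigma = 1.0


def exponential_upper_tail(x):
    """P(X > x) for an exponential distribution with rate 1 (x >= 0)."""
    return math.exp(-x)


skewed_tail_1sd = exponential_upper_tail(mu + 1 * sigma)
skewed_tail_2sd = exponential_upper_tail(mu + 2 * sigma)

normal_tail_1sd = 1 - NormalDist().cdf(1)
normal_tail_2sd = 1 - NormalDist().cdf(2)

print(f"beyond +1 sd: skewed {skewed_tail_1sd:.4f} vs normal {normal_tail_1sd:.4f}")
print(f"beyond +2 sd: skewed {skewed_tail_2sd:.4f} vs normal {normal_tail_2sd:.4f}")
```

For this skewed example the upper tail beyond two standard deviations is close to 5%, about double what the empirical rule would suggest.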

The Order-Matters Mismatch

Counting problems hinge on whether arrangements are distinguishable. Choosing a 3-person committee from 8 people is a combination ($\binom{8}{3} = 56$); assigning 3 distinct offices is a permutation ($8 \times 7 \times 6 = 336$). Wrong answers in counting questions often equal the other operation's result.

Watch for two answer choices where one equals $\binom{n}{r}$ and another equals $\frac{n!}{(n-r)!}$; only one matches the scenario.
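
The committee and office counts from this trap can be confirmed with `math.comb` and `math.perm` (Python 3.8 or later). The sketch also shows that the two results differ by a factor of $r!$, the number of ways to order the chosen group.

```python
from math import comb, factorial, perm

n, r = 8, 3

committees = comb(n, r)  # 3-person committee: order does not matter
offices = perm(n, r)     # 3 distinct offices: order matters

print(committees, offices)

# Each unordered committee can be assigned to the 3 offices in 3! ways.
print(offices == committees * factorial(r))
```

If two answer choices differ by exactly a factor of $r!$, that is a strong hint that one is the combination and the other the permutation.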

How it works

Suppose a small startup has 9 employees earning \$40K each and one founder earning \$640K. The mean salary is \$100K, but the median is \$40K — and only the median honestly describes a 'typical' employee. The GRE loves this gap. The same definition-first discipline applies to probability: 'draws a red marble AND then a blue marble, without replacement' means you multiply, but the second fraction's denominator drops by 1. For distribution questions, anchor to the 68–95–99.7 rule: a score 2 standard deviations above the mean beats roughly $50\% + 34\% + 13.5\% = 97.5\%$ of the population. Don't estimate; recite the benchmark.
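
The startup figures in the paragraph above can be reproduced in a few lines. This is only a check of the arithmetic, using the salaries as stated; the variable names are illustrative.

```python
from statistics import mean, median

# Nine employees at $40K and one founder at $640K, as described above.
salaries = [40_000] * 9 + [640_000]

mean_salary = mean(salaries)
median_salary = median(salaries)
print(mean_salary, median_salary)

# Empirical-rule percentile for a score 2 standard deviations above the mean:
# everything below the mean, plus the two bands between the mean and +2 sd.
percent_below_2sd = 50 + 34 + 13.5
print(percent_below_2sd)
```

The mean comes out to 100,000 and the median to 40,000, and the percentile benchmark to 97.5.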

Worked examples

Worked Example 1

A small architecture firm has 11 employees. Ten of them earn annual salaries of \$55{,}000, \$58{,}000, \$60{,}000, \$60{,}000, \$62{,}000, \$64{,}000, \$65{,}000, \$68{,}000, \$70{,}000, and \$72{,}000. The eleventh, the founder Marta Reyes, earns \$310{,}000. Which of the following statements is true about the salaries at the firm?

Select the correct statement.

  • A The mean and the median are equal.
  • B The median exceeds the mean by approximately \$22{,}000.
  • C The mean exceeds the median by approximately \$22{,}000. ✓ Correct
  • D The mean exceeds the median by approximately \$56{,}000.
  • E The mean and median both equal \$64{,}000.

Why C is correct: Sorted, the 11 salaries place the median at the 6th value, \$64{,}000. The sum of all 11 salaries is \$944{,}000, so the mean is $\frac{944{,}000}{11} \approx \$85{,}818$. The mean exceeds the median by about \$21{,}818, matching choice C. Marta's outlier salary pulls the mean up sharply but barely moves the median, which would be \$63{,}000 without her.
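
A quick check of the arithmetic for this example, using the salaries exactly as given in the question; the variable names are illustrative.

```python
from statistics import mean, median

staff = [55_000, 58_000, 60_000, 60_000, 62_000,
         64_000, 65_000, 68_000, 70_000, 72_000]
founder = 310_000
salaries = staff + [founder]

total = sum(salaries)
mean_all = mean(salaries)
median_all = median(salaries)
gap = mean_all - median_all

print(f"total = {total}, mean = {mean_all:.0f}, median = {median_all}, gap = {gap:.0f}")

# Same statistics with the founder removed, to show what the outlier does.
print(f"without founder: mean = {mean(staff)}, median = {median(staff)}")
```

Removing the founder drops the mean from about 85,818 to 63,400, while the median moves only from 64,000 to 63,000.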

Why each wrong choice fails:

  • A: Mean equals median only in symmetric distributions; the founder's outlier salary creates strong right-skew. (The Outlier-Pulls-the-Mean Trap)
  • B: The direction is reversed — in a right-skewed distribution the mean exceeds the median, not the other way around. (The Outlier-Pulls-the-Mean Trap)
  • D: This treats Marta's full salary as if it shifts the mean by its entire amount; spreading the excess over 11 employees yields about \$22K, not \$56K.
  • E: \$64{,}000 is the median, but the mean is approximately \$85{,}818 because of the outlier, so the two are not equal. (The Outlier-Pulls-the-Mean Trap)

Worked Example 2

A jar contains 5 green marbles and 7 yellow marbles. Fei draws two marbles, one after the other, without replacement. What is the probability that both marbles are green?

What is the probability?

  • A $\frac{5}{36}$
  • B $\frac{25}{144}$
  • C $\frac{5}{33}$ ✓ Correct
  • D $\frac{10}{12}$
  • E $\frac{1}{6}$

Why C is correct: On the first draw, $P(\text{green}) = \frac{5}{12}$. After removing one green, only 4 greens remain in 11 marbles, so $P(\text{green on 2nd}) = \frac{4}{11}$. Multiply: $\frac{5}{12} \times \frac{4}{11} = \frac{20}{132} = \frac{5}{33}$.
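
As an independent check on the multiplication, the sketch below counts every ordered pair of draws by brute force instead of multiplying fractions. Indexing the marbles so that same-colored marbles stay distinct is an implementation choice for this example.

```python
from fractions import Fraction
from itertools import permutations

# 5 green and 7 yellow marbles; positions 0-11 keep identical colors distinct.
marbles = ["G"] * 5 + ["Y"] * 7

# Every ordered pair of distinct marbles is an equally likely two-draw outcome.
ordered_draws = list(permutations(range(len(marbles)), 2))
both_green = sum(
    1 for first, second in ordered_draws
    if marbles[first] == "G" and marbles[second] == "G"
)

probability = Fraction(both_green, len(ordered_draws))
print(both_green, len(ordered_draws), probability)
```

The count gives 20 favorable outcomes out of 132, which reduces to $\frac{5}{33}$ and agrees with the product of the two fractions.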

Why each wrong choice fails:

  • A: This is $\frac{5}{12} \times \frac{4}{12} = \frac{5}{36}$. The numerator correctly drops to 4 after the first draw, but the denominator is held at 12 when it should drop to 11. (The Forgotten Denominator (Without Replacement))
  • B: This is exactly $\left(\frac{5}{12}\right)^2$, treating the draws as independent. With no replacement, the second probability changes. (The Forgotten Denominator (Without Replacement))
  • D: This is $\frac{5}{12} + \frac{5}{12}$, adding when the events should be multiplied. AND of sequential events calls for multiplication. (The Add-When-You-Should-Multiply Trap)
  • E: $\frac{1}{6}$ is the right ballpark but isn't exact; the correct fraction is $\frac{5}{33} \approx 0.1515$, while $\frac{1}{6} \approx 0.1667$.

Worked Example 3

Scores on a standardized aptitude test are normally distributed with mean 500 and standard deviation 80. Let $p$ be the proportion of test-takers scoring above 660.

Quantity A: $p$
Quantity B: $0.05$

Compare Quantity A and Quantity B.

  • A Quantity A is greater.
  • B Quantity B is greater. ✓ Correct
  • C The two quantities are equal.
  • D The relationship cannot be determined from the information given.

Why B is correct: A score of 660 is exactly 2 standard deviations above the mean: $z = \frac{660 - 500}{80} = 2$. By the 68–95–99.7 rule, about 95% of scores fall within $\pm 2\sigma$ of the mean, so each tail beyond $\pm 2\sigma$ holds about 2.5%. Thus $p \approx 0.025$, which is less than 0.05.
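
The empirical rule's 2.5% is a rounded figure. A short sketch with `statistics.NormalDist` from the Python standard library gives the more precise tail for the distribution stated in the question and confirms that the comparison does not change.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=80)

z = (660 - scores.mean) / scores.stdev  # standard deviations above the mean
p = 1 - scores.cdf(660)                 # proportion scoring above 660

print(f"z = {z}, p = {p:.4f}")
print("Quantity B is greater" if p < 0.05 else "Quantity A is greater or equal")
```

The computed tail is about 0.0228, slightly under the rule-of-thumb 0.025 and still well below 0.05.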

Why each wrong choice fails:

  • A: This would require $p > 0.05$, but the upper tail beyond $2\sigma$ is only about 2.5%, well under 5%. (The 68–95–99.7 Misapplication)
  • C: $p \approx 0.025$, which is half of 0.05 — the two values aren't equal. Confusing the tail beyond $1\sigma$ (about 16%) with the tail beyond $2\sigma$ won't produce equality either. (The 68–95–99.7 Misapplication)
  • D: The distribution is explicitly normal with given mean and standard deviation, so the empirical rule pins $p$ down to about 2.5% — the relationship is fully determined.

Memory aid

Before computing, ask three questions: (1) Center or spread? (2) AND or OR — multiply or add? (3) Replacement or no replacement? If you can't answer all three from the problem text, re-read before you touch numbers.

Key distinction

Mean vs. median is the single highest-leverage distinction: the mean follows outliers, the median ignores them. If a question mentions skew, outliers, top earners, or 'typical,' the median is almost always the intended summary.

Summary

Match the tool to the definition — center vs. spread, AND vs. OR, with vs. without replacement, normal vs. skewed — and the arithmetic almost always falls out cleanly.


Frequently asked questions

What are common traps on data analysis: statistics, probability, distributions questions?

Two come up most often: averaging when an outlier makes the mean misleading and the median is what's asked, and adding probabilities of sequential events that should be multiplied.
