U18MAI4201 — Probability & Statistics


Unit 1: Correlation & Regression

Karl Pearson's Coefficient • Spearman's Rank Correlation • Regression Lines • Regression Coefficients

2-Mark Q&A 10 Questions
1. Define Correlation. HOT
Correlation is a statistical measure that expresses the strength and direction of the linear relationship between two variables X and Y.
  • If both variables change in the same direction → Positive correlation (r > 0)
  • If variables change in opposite directions → Negative correlation (r < 0)
  • No relationship → Zero correlation (r = 0)
  • Range: −1 ≤ r ≤ +1
2. Define Karl Pearson's Coefficient of Correlation. Give its formula. HOT
Karl Pearson's coefficient of correlation (r) measures the degree of linear association between two variables X and Y.
Computation Formula (preferred for calculation):
r = (n·ΣXY − ΣX·ΣY) / √[(n·ΣX² − (ΣX)²)(n·ΣY² − (ΣY)²)]
  • n = number of pairs of observations
  • Range: −1 ≤ r ≤ +1; r = ±1 means perfect correlation
  • r is dimensionless (no units)
3. Define Spearman's Rank Correlation Coefficient. Give its formula. HOT
Spearman's Rank Correlation (ρ) measures correlation when data is qualitative or when the exact values are replaced by ranks.
ρ = 1 − (6·Σd²) / (n(n²−1))
  • d = difference in ranks of corresponding values (d = R₁ − R₂)
  • n = number of pairs
  • Range: −1 ≤ ρ ≤ +1
  • Used when data is ordinal or distribution is non-normal
4. State the properties of Karl Pearson's Correlation Coefficient.
  • Range: −1 ≤ r ≤ +1
  • r = +1: Perfect positive linear correlation
  • r = −1: Perfect negative linear correlation
  • r = 0: No linear correlation
  • r is independent of change of origin and scale
  • r is symmetric: rXY = rYX
  • r is a pure number (dimensionless)
  • |r| > 0.7 → High correlation; 0.5 < |r| ≤ 0.7 → Moderate; |r| ≤ 0.5 → Low
5. What are Regression Lines? Why are there two regression lines? HOT
Regression lines are lines of best fit that describe the relationship between two variables.
  • Line 1 — Regression of Y on X: Used to estimate Y for a given X. Minimises sum of squared errors in the Y direction.
  • Line 2 — Regression of X on Y: Used to estimate X for a given Y. Minimises sum of squared errors in the X direction.
There are two lines because each minimises a different type of error. They coincide only when r = ±1 (perfect correlation).
6. Define Regression Coefficients. Give their formulas. HOT
  • byx (regression coefficient of Y on X): byx = r · (σy/σx) or byx = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²)
  • bxy (regression coefficient of X on Y): bxy = r · (σx/σy) or bxy = (n·ΣXY − ΣX·ΣY) / (n·ΣY² − (ΣY)²)
r² = byx × bxy  ⟹  r = ±√(byx · bxy)
  • Both regression coefficients must have the same sign as r
7. When do the two Regression Lines coincide?
The two regression lines coincide (become the same line) when r = +1 or r = −1 (perfect correlation). In this case there is a perfect linear relationship between X and Y — knowing one variable completely determines the other, so both estimation lines are identical.
8. What is the Coefficient of Determination? HOT
The coefficient of determination is r², the square of the correlation coefficient. It measures the proportion of the total variation in Y that is explained by the linear regression on X.
  • r² = 0.81 means 81% of variation in Y is explained by X
  • Range: 0 ≤ r² ≤ 1
  • Closer to 1 → better fit of regression line
9. Give the equations of the two Regression Lines. HOT
Regression of Y on X:
y − ȳ = byx(x − x̄)
Regression of X on Y:
x − x̄ = bxy(y − ȳ)
Both lines pass through the point (x̄, ȳ) — the means of X and Y.
10. What is the formula for Spearman's Rank Correlation when ranks are tied? HOT
When two or more observations have the same value (tied ranks), each tied observation gets the average of the positions they would have occupied. A correction factor is added:
ρ = 1 − 6[Σd² + Σm(m²−1)/12] / [n(n²−1)]
where m = number of observations tied at a particular rank. Add one correction factor for each group of tied ranks.

8-Mark Topics

1. Karl Pearson's Coefficient of Correlation — Theory & Numerical 8M
Definition & Formula

Karl Pearson's correlation coefficient r is the ratio of the covariance of X and Y to the product of their standard deviations.

Definition form:
r = Cov(X,Y) / (σx · σy) = Σ(X−X̄)(Y−Ȳ) / √[Σ(X−X̄)² · Σ(Y−Ȳ)²]
Computation form (use this in exam):
r = (n·ΣXY − ΣX·ΣY) / √[(n·ΣX² − (ΣX)²)(n·ΣY² − (ΣY)²)]
Properties
  • −1 ≤ r ≤ +1 (always)
  • r = +1: Perfect positive linear correlation
  • r = −1: Perfect negative linear correlation
  • r = 0: No linear relationship (variables may still be related non-linearly)
  • Independent of origin & scale: If U = (X−a)/h and V = (Y−b)/k, then r(X,Y) = r(U,V)
  • Symmetric: rXY = rYX
  • r is a pure number (no units)
Worked Numerical Example

Find the correlation coefficient between X and Y:

X | Y | X² | Y² | XY
1 | 5 | 1 | 25 | 5
2 | 4 | 4 | 16 | 8
3 | 3 | 9 | 9 | 9
4 | 2 | 16 | 4 | 8
5 | 1 | 25 | 1 | 5
ΣX=15 | ΣY=15 | ΣX²=55 | ΣY²=55 | ΣXY=35
n = 5, ΣX=15, ΣY=15, ΣX²=55, ΣY²=55, ΣXY=35
n·ΣXY − ΣX·ΣY = 5×35 − 15×15 = 175 − 225 = −50
n·ΣX² − (ΣX)² = 5×55 − 225 = 275 − 225 = 50
n·ΣY² − (ΣY)² = 5×55 − 225 = 50
√(50 × 50) = √2500 = 50
r = −50 / 50 = −1.0 (Perfect negative correlation)
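The computation form can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_r` is an illustrative choice, not part of the syllabus.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r via the computation form (sums of X, Y, X², Y², XY)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

# Data from the worked example above
X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]
r = pearson_r(X, Y)
print(r)  # -1.0
```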
Change of Origin & Scale Method (Shortcut)

When values are large, substitute U = X − A, V = Y − B (or U = (X−A)/h etc.) to simplify calculations. The value of r remains unchanged.

Tip: In exams, always make a table with columns X, Y, X², Y², XY and sum all columns. Then substitute directly into the formula.
2. Spearman's Rank Correlation — Theory & Numerical 8M
When to Use Spearman's Rank Correlation?
  • Data is qualitative (e.g., rankings of beauty, intelligence)
  • Data does not follow normal distribution
  • Data has extreme outliers (rank correlation is more robust)
  • Actual values are not available but ranks are given
Formula
Without tied ranks: ρ = 1 − (6·Σd²) / (n(n²−1))
With tied ranks: ρ = 1 − 6[Σd² + m₁(m₁²−1)/12 + m₂(m₂²−1)/12 + ...] / [n(n²−1)]
  • Tied ranks: assign each tied item the mean of their ranks
  • m = number of items tied at a rank
Worked Numerical Example

10 students are ranked in Mathematics (X) and Physics (Y). Find Spearman's ρ:

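Student | X (Math) | Y (Phys) | d = X−Y | d²
A | 1 | 3 | −2 | 4
B | 2 | 5 | −3 | 9
C | 3 | 4 | −1 | 1
D | 4 | 1 | 3 | 9
E | 5 | 2 | 3 | 9
F | 6 | 7 | −1 | 1
G | 7 | 9 | −2 | 4
H | 8 | 6 | 2 | 4
I | 9 | 8 | 1 | 1
J | 10 | 10 | 0 | 0
Σd² = 42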
n = 10, Σd² = 42
ρ = 1 − 6×42 / (10×(100−1))
ρ = 1 − 252 / (10×99)
ρ = 1 − 252/990
ρ = 1 − 0.2545 = 0.7455 (High positive correlation)
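The same calculation can be sketched in Python for untied ranks. The function name `spearman_rho` is hypothetical, and the sketch assumes the inputs are already ranks with no ties.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

# Ranks of students A..J from the worked example
math_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
phys_rank = [3, 5, 4, 1, 2, 7, 9, 6, 8, 10]
rho = spearman_rho(math_rank, phys_rank)
print(round(rho, 4))  # 0.7455
```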
Tied Ranks Example

If 3 values are tied at positions 4, 5, 6 → each gets rank = (4+5+6)/3 = 5. Correction = m(m²−1)/12 = 3(9−1)/12 = 2. Add this to Σd² in numerator.

3. Lines of Regression — Theory & Numerical 8M
Two Lines of Regression
Regression of Y on X (estimate Y given X):
y − ȳ = byx(x − x̄)  where byx = r·(σy/σx)
Regression of X on Y (estimate X given Y):
x − x̄ = bxy(y − ȳ)  where bxy = r·(σx/σy)
Key Relationships & Properties
  • Both lines pass through (x̄, ȳ)
  • r² = byx · bxy  →  r = ±√(byx · bxy)
  • byx and bxy have the same sign as r
  • If r = 0 → lines are perpendicular to each other
  • If r = ±1 → lines coincide
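  • AM of regression coefficients ≥ |r| in magnitude: |byx + bxy|/2 ≥ |r| (the coefficients share a sign, so this is the AM-GM inequality applied to |byx| and |bxy|)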
Angle Between the Two Regression Lines
tan θ = [(1−r²)/r] · [σx·σy/(σx²+σy²)]
  • θ = 0° when r = ±1 (lines coincide)
  • θ = 90° when r = 0 (lines perpendicular)
Worked Numerical Example

Given data: n=5, ΣX=25, ΣY=30, ΣX²=145, ΣY²=220, ΣXY=158. Find the two regression lines.

x̄ = ΣX/n = 25/5 = 5    ȳ = ΣY/n = 30/5 = 6
b_yx = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²)
b_yx = (5×158 − 25×30) / (5×145 − 625)
b_yx = (790 − 750) / (725 − 625) = 40/100 = 0.4
b_xy = (n·ΣXY − ΣX·ΣY) / (n·ΣY² − (ΣY)²)
b_xy = 40 / (5×220 − 900) = 40/(1100−900) = 40/200 = 0.2
r = +√(b_yx × b_xy) = √(0.4 × 0.2) = √0.08 ≈ 0.283 (positive root, since both coefficients are positive)
Regression of Y on X: y − 6 = 0.4(x − 5) → y = 0.4x + 4
Regression of X on Y: x − 5 = 0.2(y − 6) → x = 0.2y + 3.8
To estimate Y when X=7: y = 0.4(7) + 4 = 6.8
To estimate X when Y=8: x = 0.2(8) + 3.8 = 5.4
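The steps above can be sketched in Python, working directly from the given sums. The helper names (`regression_from_sums`, `predict_y`, `predict_x`) are illustrative assumptions.

```python
def regression_from_sums(n, sx, sy, sxx, syy, sxy):
    """Return (x_bar, y_bar, b_yx, b_xy) from the summary sums."""
    x_bar, y_bar = sx / n, sy / n
    cross = n * sxy - sx * sy              # shared numerator
    b_yx = cross / (n * sxx - sx ** 2)     # slope of Y on X
    b_xy = cross / (n * syy - sy ** 2)     # slope of X on Y
    return x_bar, y_bar, b_yx, b_xy

x_bar, y_bar, b_yx, b_xy = regression_from_sums(5, 25, 30, 145, 220, 158)

def predict_y(x):
    """Regression of Y on X: y - y_bar = b_yx (x - x_bar)."""
    return y_bar + b_yx * (x - x_bar)

def predict_x(y):
    """Regression of X on Y: x - x_bar = b_xy (y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)

print(round(b_yx, 3), round(b_xy, 3))               # 0.4 0.2
print(round(predict_y(7), 3), round(predict_x(8), 3))  # 6.8 5.4
```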

Unit 2: Probability & Random Variables

Axioms • Conditional Probability • Total Probability • Bayes' Theorem • PMF • PDF • CDF • Moments • MGF

2-Mark Q&A 10 Questions
1. State the Axioms of Probability (Kolmogorov's Axioms). HOT
For a sample space S and any event A:
  • Axiom 1 (Non-negativity): P(A) ≥ 0
  • Axiom 2 (Certainty): P(S) = 1 (probability of the entire sample space is 1)
  • Axiom 3 (Additivity): If A and B are mutually exclusive (A∩B = ∅), then P(A∪B) = P(A) + P(B)
From these, all other probability rules are derived.
2. Define Conditional Probability. HOT
The conditional probability of event A given that event B has already occurred is:
P(A|B) = P(A∩B) / P(B)  , provided P(B) > 0
  • Restricts the sample space to event B
  • Similarly, P(B|A) = P(A∩B) / P(A)
  • Multiplication rule: P(A∩B) = P(A|B)·P(B) = P(B|A)·P(A)
3. State the Theorem of Total Probability. HOT
If B₁, B₂, ..., Bₙ are mutually exclusive and exhaustive events (a partition of S) with P(Bᵢ) > 0, then for any event A:
P(A) = P(A|B₁)·P(B₁) + P(A|B₂)·P(B₂) + ... + P(A|Bₙ)·P(Bₙ) = Σᵢ P(A|Bᵢ)·P(Bᵢ)
This is used to compute P(A) by conditioning on the partition events.
4. State Bayes' Theorem. HOT
If B₁, B₂, ..., Bₙ form a partition of S, and A is any event with P(A) > 0, then:
P(Bₖ|A) = P(A|Bₖ)·P(Bₖ) / [Σᵢ P(A|Bᵢ)·P(Bᵢ)]
  • P(Bₖ) = Prior probability (before observing A)
  • P(Bₖ|A) = Posterior probability (after observing A)
  • Used to update probability based on new evidence
5. Define a Random Variable. HOT
A random variable X is a real-valued function defined on a sample space S that assigns a numerical value to each outcome of a random experiment.
  • Discrete RV: Takes countable values (e.g., number of heads in 3 tosses: 0, 1, 2, 3)
  • Continuous RV: Takes any value in a continuous interval (e.g., height, temperature)
6. Define Probability Mass Function (PMF). State its properties. HOT
The PMF of a discrete random variable X is a function p(x) such that:
  • p(x) = P(X = x) for each value x that X can take
  • Property 1: p(x) ≥ 0 for all x
  • Property 2: Σ p(x) = 1 (sum over all possible values)
  • P(a ≤ X ≤ b) = Σ p(x) for x in [a, b]
7. Define Probability Density Function (PDF). State its properties. HOT
The PDF of a continuous random variable X is a function f(x) such that:
  • Property 1: f(x) ≥ 0 for all x
  • Property 2: ∫₋∞^∞ f(x) dx = 1
  • P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
  • Note: P(X = a) = 0 for any specific value a (continuous distribution)
8. Define the Distribution Function (CDF) and state its properties.
The Cumulative Distribution Function F(x) = P(X ≤ x) for all x ∈ ℝ.
  • 0 ≤ F(x) ≤ 1
  • F(−∞) = 0  and  F(+∞) = 1
  • F(x) is non-decreasing: if a < b then F(a) ≤ F(b)
  • F(x) is right-continuous: F(x⁺) = F(x)
  • For continuous RV: f(x) = dF(x)/dx
  • P(a < X ≤ b) = F(b) − F(a); for a continuous RV this also equals P(a ≤ X ≤ b)
9. Define Moments of a Random Variable. HOT
  • r-th Raw Moment (about origin): μ'ᵣ = E(Xʳ)
  • r-th Central Moment (about mean): μᵣ = E[(X − μ)ʳ] where μ = E(X)
  • μ'₁ = E(X) = Mean
  • μ₂ = E[(X−μ)²] = Variance = E(X²) − [E(X)]²
  • μ₃ used for skewness; μ₄ for kurtosis
  • Relation: μ₂ = μ'₂ − (μ'₁)²
10. Define the Moment Generating Function (MGF). HOT
The MGF of a random variable X is:
M_X(t) = E(e^(tX)) = Σ e^(tx)·p(x) (discrete)  or  ∫ e^(tx)·f(x) dx (continuous)
  • The r-th moment: E(Xʳ) = dʳM_X(t)/dtʳ evaluated at t = 0
  • M_X(0) = 1 always
  • Uniqueness: if two RVs have the same MGF, they have the same distribution

8-Mark Topics

1. Bayes' Theorem — Theory, Proof & Numerical 8M
Statement

Let B₁, B₂, ..., Bₙ be mutually exclusive and exhaustive events with P(Bᵢ) > 0. If A is any event with P(A) > 0, then:

P(Bₖ|A) = P(A|Bₖ)·P(Bₖ) / Σᵢ[P(A|Bᵢ)·P(Bᵢ)]  for k = 1, 2, ..., n
Proof
By definition of conditional probability:
P(Bₖ|A) = P(Bₖ∩A) / P(A) ... (i)
P(Bₖ∩A) = P(A|Bₖ)·P(Bₖ) ... (ii) [multiplication rule]
By Total Probability Theorem:
P(A) = Σᵢ P(A|Bᵢ)·P(Bᵢ) ... (iii)
Substituting (ii) and (iii) into (i):
P(Bₖ|A) = P(A|Bₖ)·P(Bₖ) / Σᵢ[P(A|Bᵢ)·P(Bᵢ)] ∎
Numerical Example

Three machines I, II, III produce 50%, 30%, 20% of total output. The percentages of defectives are 3%, 4%, 5% respectively. An item is selected at random and found defective. Find the probability it came from machine II.

Let B₁=Machine I, B₂=Machine II, B₃=Machine III, A=Defective
P(B₁)=0.5, P(B₂)=0.3, P(B₃)=0.2
P(A|B₁)=0.03, P(A|B₂)=0.04, P(A|B₃)=0.05
P(A) = 0.5×0.03 + 0.3×0.04 + 0.2×0.05
P(A) = 0.015 + 0.012 + 0.010 = 0.037
P(B₂|A) = P(A|B₂)·P(B₂) / P(A)
P(B₂|A) = (0.04 × 0.3) / 0.037 = 0.012/0.037 ≈ 0.3243

There is a 32.43% probability the defective item came from Machine II.
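
The Bayes calculation above can be sketched as a small Python function. The name `posterior` is hypothetical; it returns every posterior probability together with the total probability P(A).

```python
def posterior(priors, likelihoods):
    """Bayes' theorem over a partition: returns (posteriors, P(A))."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A|Bi)·P(Bi)
    total = sum(joint)                                    # total probability P(A)
    return [j / total for j in joint], total

priors = [0.5, 0.3, 0.2]          # P(B1), P(B2), P(B3)
likelihoods = [0.03, 0.04, 0.05]  # P(A|B1), P(A|B2), P(A|B3)
post, p_defective = posterior(priors, likelihoods)
print(round(p_defective, 3))  # 0.037
print(round(post[1], 4))      # 0.3243
```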

2. Random Variables — PMF, PDF, CDF & Numerical 8M
Discrete RV — PMF Example

A RV X has PMF p(x) = kx for x = 1,2,3,4. Find k, P(X≤2), Mean, Variance.

Σp(x)=1: k(1+2+3+4)=1 → 10k=1 → k=0.1
PMF: p(1)=0.1, p(2)=0.2, p(3)=0.3, p(4)=0.4
P(X≤2) = p(1)+p(2) = 0.1+0.2 = 0.3
E(X) = 1×0.1 + 2×0.2 + 3×0.3 + 4×0.4 = 0.1+0.4+0.9+1.6 = 3.0
E(X²) = 1×0.1 + 4×0.2 + 9×0.3 + 16×0.4 = 0.1+0.8+2.7+6.4 = 10.0
Var(X) = E(X²)−[E(X)]² = 10 − 9 = 1.0
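As a check on the arithmetic, the discrete example can be reproduced with exact fractions. This is a sketch using Python's standard `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

values = [1, 2, 3, 4]
k = Fraction(1, sum(values))              # from k(1+2+3+4) = 1
pmf = {x: k * x for x in values}          # p(x) = kx

p_le_2 = pmf[1] + pmf[2]                  # P(X <= 2)
mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x * x * p for x, p in pmf.items())
variance = second_moment - mean ** 2

print(k, p_le_2, mean, variance)  # 1/10 3/10 3 1
```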
Continuous RV — PDF Example

A RV X has PDF f(x) = kx² for 0 ≤ x ≤ 1, 0 otherwise. Find k, P(0.2 ≤ X ≤ 0.5), Mean, Variance.

∫₀¹ kx² dx = 1 → k[x³/3]₀¹ = 1 → k/3 = 1 → k = 3
f(x) = 3x² for 0 ≤ x ≤ 1
P(0.2≤X≤0.5) = ∫ 3x² dx over [0.2, 0.5] = [x³] from 0.2 to 0.5 = 0.125 − 0.008 = 0.117
E(X) = ∫₀¹ x·3x² dx = 3∫₀¹ x³ dx = 3[x⁴/4]₀¹ = 3/4 = 0.75
E(X²) = ∫₀¹ x²·3x² dx = 3∫₀¹ x⁴ dx = 3/5 = 0.6
Var(X) = E(X²)−[E(X)]² = 0.6 − 0.5625 = 0.0375
CDF from PDF

For f(x) = 3x² on [0,1]: F(x) = ∫₀ˣ 3t² dt = x³ for 0 ≤ x ≤ 1

Check: F(0)=0, F(1)=1 ✓

P(X > 0.7) = 1 − F(0.7) = 1 − 0.343 = 0.657
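
The continuous example can be checked numerically. The sketch below uses a simple midpoint rule written for illustration (the `integrate` helper is an assumption, not a library function), so the results are approximations of the exact values above.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def pdf(x):
    return 3 * x ** 2                    # f(x) = 3x² on [0, 1]

total_area = integrate(pdf, 0, 1)                     # should be close to 1
prob = integrate(pdf, 0.2, 0.5)                       # close to 0.117
mean = integrate(lambda x: x * pdf(x), 0, 1)          # close to 0.75
variance = integrate(lambda x: x * x * pdf(x), 0, 1) - mean ** 2  # close to 0.0375
tail = 1 - 0.7 ** 3                                   # P(X > 0.7) via F(x) = x³

print(round(total_area, 4), round(prob, 4), round(mean, 4), round(variance, 4), round(tail, 4))
```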

3. Moments & Moment Generating Function (MGF) 8M
Raw Moments & Central Moments
Moment | Definition (discrete / continuous) | Meaning
μ'₁ (1st raw) | Σx·p(x) / ∫x·f(x)dx | Mean (μ)
μ'₂ (2nd raw) | Σx²·p(x) / ∫x²·f(x)dx | E(X²)
μ₂ (2nd central) | E[(X−μ)²] = μ'₂ − (μ'₁)² | Variance (σ²)
μ₃ (3rd central) | E[(X−μ)³] | Skewness
μ₄ (4th central) | E[(X−μ)⁴] | Kurtosis
Key relations:
μ₂ = μ'₂ − (μ'₁)²  |  μ₃ = μ'₃ − 3μ'₂μ'₁ + 2(μ'₁)³  |  μ₄ = μ'₄ − 4μ'₃μ'₁ + 6μ'₂(μ'₁)² − 3(μ'₁)⁴
Moment Generating Function (MGF)
M_X(t) = E(e^(tX))

Expanding e^(tX) as a series: e^(tX) = 1 + tX + t²X²/2! + t³X³/3! + ...

So: M_X(t) = 1 + tμ'₁ + t²μ'₂/2! + t³μ'₃/3! + ...

Therefore: μ'ᵣ = [dʳM_X(t)/dtʳ] at t = 0

MGF Numerical Example

X has PMF: p(0)=1/4, p(1)=1/2, p(2)=1/4. Find MGF and first two moments.

M_X(t) = Σ e^(tx)·p(x) = e^0·(1/4) + e^t·(1/2) + e^(2t)·(1/4)
M_X(t) = 1/4 + (1/2)e^t + (1/4)e^(2t)
M'_X(t) = (1/2)e^t + (1/2)e^(2t)
μ'₁ = M'_X(0) = 1/2 + 1/2 = 1 → Mean = 1
M''_X(t) = (1/2)e^t + e^(2t)
μ'₂ = M''_X(0) = 1/2 + 1 = 3/2 → Var = 3/2 − 1² = 1/2
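The derivatives at t = 0 can also be approximated numerically as a sanity check. This sketch uses central finite differences with an arbitrarily chosen step `h`, so the results are approximate, not exact.

```python
from math import exp

pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def mgf(t):
    """M_X(t) = sum of e^(tx)·p(x) over the support."""
    return sum(exp(t * x) * p for x, p in pmf.items())

h = 1e-4  # step size for the finite differences (an assumption)
first_moment = (mgf(h) - mgf(-h)) / (2 * h)               # approximates M'(0)
second_moment = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates M''(0)
variance = second_moment - first_moment ** 2

print(round(first_moment, 4), round(second_moment, 4), round(variance, 4))
```

The rounded values should agree with the hand calculation: mean 1, second raw moment 1.5, variance 0.5.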
Properties of MGF
  • M_X(0) = 1 (always)
  • If Y = aX + b, then M_Y(t) = e^(bt)·M_X(at)
  • If X and Y are independent: M_(X+Y)(t) = M_X(t)·M_Y(t)
  • MGF uniquely determines the distribution

Unit 3: Normal Distribution

Normal Distribution • PDF • Properties • Area Rule • Moments • Moment Generating Function

2-Mark Q&A 10 Questions
1. Define Normal Distribution. HOT
A continuous random variable X is said to follow a Normal Distribution with mean μ and variance σ² (written X ~ N(μ, σ²)) if its PDF is:
f(x) = (1 / (σ√(2π))) · exp[−(x−μ)² / (2σ²)],  −∞ < x < ∞
  • Parameters: μ (mean), σ² (variance), σ (standard deviation)
  • Bell-shaped, symmetric about x = μ
  • Also called the Gaussian Distribution
2. Define Standard Normal Distribution. HOT
The Standard Normal Distribution is a special case of the normal distribution with mean = 0 and variance = 1, denoted Z ~ N(0,1).
φ(z) = (1/√(2π)) · exp(−z²/2),  −∞ < z < ∞
Standardisation: Z = (X − μ) / σ
Probabilities are found using the Standard Normal Table (z-table).
3. State the properties of the Normal Distribution. HOT
  • Mean = Median = Mode = μ (perfectly symmetric)
  • Curve is bell-shaped and symmetric about x = μ
  • Total area under curve = 1
  • The curve is asymptotic to the x-axis (never touches)
  • Skewness β₁ = 0; Kurtosis β₂ = 3 (mesokurtic)
  • All odd central moments = 0
  • Linear combination of independent normal RVs is also normal
4. State the Area (68-95-99.7) Rule for Normal Distribution. HOT
For X ~ N(μ, σ²):
Interval | Probability | Percentage
μ − σ to μ + σ | P(μ−σ < X < μ+σ) | 68.27%
μ − 2σ to μ + 2σ | P(μ−2σ < X < μ+2σ) | 95.45%
μ − 3σ to μ + 3σ | P(μ−3σ < X < μ+3σ) | 99.73%
5. What is the MGF of Normal Distribution? HOT
The Moment Generating Function of X ~ N(μ, σ²) is:
M_X(t) = exp(μt + σ²t²/2)
For Standard Normal Z ~ N(0,1):
M_Z(t) = exp(t²/2)
This is used to derive all moments of the normal distribution.
6. Write the Central Moments of Normal Distribution. HOT
For X ~ N(μ, σ²):
  • μ₁ = 0 (all odd moments = 0 due to symmetry)
  • μ₂ = σ² (variance)
  • μ₃ = 0 (zero skewness)
  • μ₄ = 3σ⁴
  • In general: μ₂ₙ₊₁ = 0 and μ₂ₙ = 1·3·5···(2n−1)·σ²ⁿ
β₁ = μ₃²/μ₂³ = 0  (Skewness)   |   β₂ = μ₄/μ₂² = 3  (Kurtosis)
7. What are the Raw Moments of Normal Distribution?
The raw moments are obtained by differentiating the MGF M_X(t) = e^(μt + σ²t²/2):
  • μ'₁ = M'_X(0) = μ  (Mean)
  • μ'₂ = M''_X(0) = μ² + σ²  (Second raw moment)
  • Var(X) = μ'₂ − (μ'₁)² = (μ²+σ²) − μ² = σ² ✓
8. How do you find P(a < X < b) for a Normal Distribution? HOT
Step 1: Convert to standard normal: Z = (X−μ)/σ
P(a < X < b) = P((a−μ)/σ < Z < (b−μ)/σ) = Φ(z₂) − Φ(z₁)
where Φ(z) = P(Z ≤ z) is read from the Standard Normal table.
  • P(Z < 0) = 0.5 (by symmetry)
  • P(Z < −z) = 1 − P(Z < z) = 1 − Φ(z)
9. State the reproductive property of Normal Distribution.
If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then:
X₁ + X₂ ~ N(μ₁+μ₂, σ₁²+σ₂²)
More generally: a₁X₁ + a₂X₂ ~ N(a₁μ₁+a₂μ₂, a₁²σ₁²+a₂²σ₂²)
This is called the reproductive (additive) property of normal distribution.
10. What is the point of inflection of the Normal curve? HOT
The normal curve has two points of inflection (where the curve changes concavity) at:
x = μ − σ  and  x = μ + σ
At these points, the curve changes from concave down (around the mean) to concave up (in the tails). The horizontal distance from the mean to each inflection point equals one standard deviation (σ).

8-Mark Topics

1. Normal Distribution — PDF, Properties & MGF Derivation 8M
Probability Density Function
PDF of Normal Distribution X ~ N(μ, σ²):
f(x) = (1/(σ√(2π))) · e^(−(x−μ)²/(2σ²)),   −∞ < x < ∞
  • μ = mean (location parameter) — shifts curve left/right
  • σ = standard deviation (scale parameter) — controls spread
  • Larger σ → flatter, wider curve; Smaller σ → taller, narrower
Properties of Normal Distribution
  1. Symmetry: f(μ+x) = f(μ−x) — symmetric about x = μ
  2. Mean = Median = Mode = μ
  3. Maximum: Curve is maximum at x = μ, maximum value = 1/(σ√(2π))
  4. Asymptotic: Curve approaches x-axis but never touches it
  5. Inflection Points: At x = μ ± σ
  6. Area = 1: ∫₋∞^∞ f(x) dx = 1
  7. Moments: All odd central moments = 0; μ₂ = σ², μ₄ = 3σ⁴
  8. Kurtosis β₂ = 3 (mesokurtic — neither flat nor peaked)
MGF Derivation
M_X(t) = E(e^(tX)) = ∫₋∞^∞ e^(tx) · (1/σ√(2π)) · e^(−(x−μ)²/(2σ²)) dx
Combine exponents: e^(tx) · e^(−(x−μ)²/(2σ²)) = e^[tx − (x−μ)²/(2σ²)]
Complete the square in x: tx − (x−μ)²/(2σ²)
= −(1/2σ²)[x² − 2(μ + σ²t)x + μ²]
= −(1/2σ²)[(x−(μ+σ²t))² − (μ+σ²t)² + μ²]
= −(x−(μ+σ²t))²/(2σ²) + μt + σ²t²/2
So M_X(t) = e^(μt+σ²t²/2) · ∫₋∞^∞ (1/σ√(2π))·e^(−(x−(μ+σ²t))²/(2σ²)) dx
The integral = 1 (it's a normal PDF with mean μ+σ²t)
∴ M_X(t) = e^(μt + σ²t²/2)
Deriving Moments from MGF
M_X(t) = e^(μt + σ²t²/2)
M'_X(t) = (μ + σ²t) · e^(μt + σ²t²/2)
μ'₁ = M'_X(0) = μ · e^0 = μ ✓
M''_X(t) = σ²·e^(μt+σ²t²/2) + (μ+σ²t)²·e^(μt+σ²t²/2)
μ'₂ = M''_X(0) = σ² + μ²
Var(X) = μ'₂ − (μ'₁)² = σ² + μ² − μ² = σ² ✓
2. Normal Distribution — Finding Probabilities (Numerical) 8M
Standard Normal Table Usage

The z-table gives Φ(z) = P(Z ≤ z) for Z ~ N(0,1). Key symmetry rules:

  • P(Z ≤ 0) = 0.5
  • P(Z ≤ −z) = 1 − P(Z ≤ z) = 1 − Φ(z)
  • P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
  • P(Z ≥ z) = 1 − Φ(z)
Worked Example 1

X ~ N(50, 100) [i.e. μ=50, σ=10]. Find (i) P(X < 65), (ii) P(40 < X < 60), (iii) P(X > 72).

(i) P(X<65): Z = (65−50)/10 = 1.5
P(X<65) = P(Z<1.5) = Φ(1.5) = 0.9332
(ii) P(40<X<60): Z₁=(40−50)/10=−1, Z₂=(60−50)/10=1
P(40<X<60) = Φ(1) − Φ(−1) = 0.8413 − 0.1587 = 0.6826
(iii) P(X>72): Z = (72−50)/10 = 2.2
P(X>72) = 1 − Φ(2.2) = 1 − 0.9861 = 0.0139
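These table look-ups can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). Small differences from the hand results come from the four-decimal rounding of the z-table.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)          # X ~ N(50, 100)

p_less_65 = X.cdf(65)                    # (i)   P(X < 65)
p_40_to_60 = X.cdf(60) - X.cdf(40)       # (ii)  P(40 < X < 60)
p_more_72 = 1 - X.cdf(72)                # (iii) P(X > 72)

print(round(p_less_65, 4), round(p_40_to_60, 4), round(p_more_72, 4))
```

Part (ii) comes out as about 0.6827 here rather than 0.6826, because the table values Φ(1) = 0.8413 and Φ(−1) = 0.1587 are each rounded.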
Worked Example 2 — Finding the Value Given Probability

X ~ N(30, 25) [μ=30, σ=5]. Find x₀ such that P(X > x₀) = 0.05.

P(X > x₀) = 0.05 → P(X ≤ x₀) = 0.95
→ P(Z ≤ z₀) = 0.95 → z₀ = 1.645 (from z-table)
z₀ = (x₀ − μ)/σ → 1.645 = (x₀ − 30)/5
x₀ = 30 + 1.645 × 5 = 30 + 8.225 = 38.225
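The reverse look-up can be sketched with `NormalDist.inv_cdf`, which returns the value x₀ for a given cumulative probability. The result differs slightly from 38.225 because the table value z₀ = 1.645 is itself rounded.

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=5)           # X ~ N(30, 25)
x0 = X.inv_cdf(0.95)                     # P(X <= x0) = 0.95, so P(X > x0) = 0.05
z0 = NormalDist().inv_cdf(0.95)          # corresponding standard normal quantile

print(round(z0, 3), round(x0, 2))
```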
Worked Example 3 — Expected Number of Students in a Range

In an exam, scores are normally distributed with mean 70 and SD 15. If 500 students appeared, how many scored between 55 and 85?

Z₁ = (55−70)/15 = −1.0    Z₂ = (85−70)/15 = +1.0
P(55<X<85) = P(−1<Z<1) = Φ(1) − Φ(−1)
= 0.8413 − 0.1587 = 0.6826
Number of students = 500 × 0.6826 ≈ 341 students
Common z-values to memorize (P = Φ(z) = P(Z ≤ z)):
z = 1.28 → P = 0.90 | z = 1.645 → P = 0.95 | z = 1.96 → P = 0.975 | z = 2.33 → P = 0.99 | z = 2.576 → P = 0.995
3. Moments of Normal Distribution — All Central & Raw Moments 8M
Central Moments Using MGF

For standard normal Z ~ N(0,1), M_Z(t) = e^(t²/2). Expanding as a power series:

M_Z(t) = e^(t²/2) = 1 + t²/2 + (t²/2)²/2! + ... = Σ t^(2k)/(2^k · k!)
From the series: coefficient of t^r/r! gives μ'_r
μ'₁ = 0 (Mean of Z)
μ'₂ = 1 (Variance of Z = 1)
μ'₃ = 0 (all odd moments = 0)
μ'₄ = 3 (from coefficient of t⁴: 1/8 × 4! = 3)
Moment | For N(0,1) | For N(μ,σ²)
μ'₁ (mean) | 0 | μ
μ₂ (variance) | 1 | σ²
μ₃ | 0 | 0
μ₄ | 3 | 3σ⁴
β₁ = μ₃²/μ₂³ | 0 | 0 (symmetric)
β₂ = μ₄/μ₂² | 3 | 3 (mesokurtic)
General formula: For N(μ, σ²), the (2n)th central moment = 1·3·5···(2n−1)·σ²ⁿ = (2n)!/(2ⁿ·n!) · σ²ⁿ

Unit 4: Testing of Hypothesis

t-test (Single Mean, Difference, Paired) • F-test (Variance Ratio) • Chi-Square Test

2-Mark Q&A 10 Questions
1. Define Null and Alternative Hypothesis. HOT
  • Null Hypothesis (H₀): A statement of no difference or no effect. It is the hypothesis being tested. Assumed true until evidence suggests otherwise. Example: H₀: μ = 50
  • Alternative Hypothesis (H₁ or Hₐ): The claim accepted if H₀ is rejected. Example: H₁: μ ≠ 50 (two-tailed) or H₁: μ > 50 (one-tailed)
2. Define Level of Significance (α) and Critical Region. HOT
  • Level of Significance (α): The probability of rejecting H₀ when it is actually true (Type I error probability). Common values: α = 0.05 (5%) or α = 0.01 (1%)
  • Critical Region (Rejection Region): The set of values of the test statistic for which H₀ is rejected.
  • Critical Value: The boundary value separating acceptance and rejection regions.
3. Define Type I and Type II errors. HOT
Error | When? | Probability | Name
Type I (α) | Reject H₀ when H₀ is TRUE | α (LOS) | False Positive
Type II (β) | Accept H₀ when H₀ is FALSE | β | False Negative
  • Power of the test = 1 − β = P(reject H₀ | H₁ is true)
4. Define t-distribution. When is it used? HOT
Student's t-distribution is a probability distribution used for small samples (n < 30) when population standard deviation σ is unknown and the population is normally distributed.
t = (X̄ − μ) / (s/√n)  ~ t(n−1 df)
where s = sample standard deviation = √[Σ(xᵢ−x̄)²/(n−1)]
5. What is a Paired t-test? When is it used? HOT
Paired t-test is used when two related samples are compared — the same subjects measured twice (before/after), or matched pairs.
t = d̄ / (sd/√n)  ~ t(n−1 df)
  • d = difference for each pair (d = x₁ − x₂)
  • d̄ = mean of differences
  • sd = standard deviation of differences
6. Define F-distribution. What is the F-test used for? HOT
The F-distribution is the distribution of the ratio of two independent chi-square variates, each divided by its degrees of freedom.
F = s₁² / s₂²  (put larger variance in numerator)
  • Used for Variance Ratio Test — testing if two populations have equal variances
  • df = (n₁−1, n₂−1)
  • F ≥ 1 always (numerator has larger variance)
  • Also used in ANOVA
7. Define Chi-Square test. State its uses. HOT
χ² = Σ (O − E)² / E
where O = Observed frequency, E = Expected frequency.
  • Test for Independence: Tests if two attributes are independent (contingency table)
  • Goodness of Fit: Tests if observed data fits a theoretical distribution
  • df for independence = (r−1)(c−1)
  • df for goodness of fit = n−1, where n = number of classes (or n−k−1 if k parameters are estimated from the data)
8. Distinguish between one-tailed and two-tailed tests. HOT
Feature | One-tailed | Two-tailed
H₁ | μ > μ₀ or μ < μ₀ | μ ≠ μ₀
Critical region | One side only | Both sides
Critical value (α=0.05, large sample) | z = 1.645 (right tail) or z = −1.645 (left tail) | z = ±1.96
Use | Direction known | Direction unknown
9. State the formula for t-test for difference of two means (independent samples).
For testing H₀: μ₁ = μ₂ with two independent samples:
t = (X̄₁ − X̄₂) / [sp · √(1/n₁ + 1/n₂)]
where the pooled variance:
sp² = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁+n₂−2)
Degrees of freedom = n₁ + n₂ − 2
10. State the conditions for applying Chi-Square test. HOT
  • Observations must be independent
  • Total frequency N must be reasonably large (N ≥ 50)
  • No expected frequency should be less than 5. If E < 5, merge adjacent classes (pooling)
  • Data must be in frequencies (not percentages or ratios)
  • Sample must be drawn by random sampling

8-Mark Topics

1. t-Test: Single Mean & Difference of Means — Theory & Numericals 8M
t-Test for Single Mean

Tests whether sample mean x̄ differs significantly from hypothesised population mean μ₀.

H₀: μ = μ₀  |  Test Statistic: t = (X̄ − μ₀) / (s/√n)  ~ t(n−1)
s = √[Σ(xᵢ−x̄)²/(n−1)]  or  s = √[(Σxᵢ² − n·x̄²)/(n−1)]
Rejection rule: Reject H₀ if |tcalc| > ttable(α, n−1 df)
Numerical Example — Single Mean

A sample of 10 observations has mean 52 and SD 8. Test H₀: μ = 50 at 5% level of significance.

n=10, x̄=52, s=8, μ₀=50, α=0.05
t = (x̄ − μ₀)/(s/√n) = (52−50)/(8/√10) = 2/(8/3.162) = 2/2.530 = 0.790
df = n−1 = 9
t_table(0.05, 9df, two-tailed) = 2.262
|t_calc| = 0.790 < 2.262 → Do NOT reject H₀

Conclusion: There is no significant difference between sample mean and population mean at 5% level.
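
The single-mean test above can be sketched in Python. The critical value 2.262 is copied from the t-table as quoted in the example, not computed, and the function name is illustrative.

```python
from math import sqrt

def one_sample_t(x_bar, mu0, s, n):
    """t statistic for H0: mu = mu0, with sample SD s and sample size n."""
    return (x_bar - mu0) / (s / sqrt(n))

t_calc = one_sample_t(x_bar=52, mu0=50, s=8, n=10)
t_table = 2.262                      # two-tailed, alpha = 0.05, 9 df (from the table)
reject_h0 = abs(t_calc) > t_table

print(round(t_calc, 3), reject_h0)   # 0.791 False
```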

t-Test for Difference of Two Means (Independent Samples)
t = (X̄₁ − X̄₂) / [sp√(1/n₁+1/n₂)]  where  sp² = [(n₁−1)s₁²+(n₂−1)s₂²]/(n₁+n₂−2)

Numerical: n₁=8, x̄₁=14.5, s₁²=4 | n₂=10, x̄₂=12.8, s₂²=3.5. Test H₀: μ₁=μ₂ at 5%.

s_p² = [(8−1)×4 + (10−1)×3.5]/(8+10−2) = [28+31.5]/16 = 59.5/16 = 3.719
s_p = √3.719 = 1.929
√(1/8+1/10) = √(0.125+0.1) = √0.225 = 0.474
t = (14.5−12.8)/(1.929×0.474) = 1.7/0.914 = 1.86
df = 8+10−2 = 16   t_table(0.05, 16df) = 2.120
|t_calc|=1.86 < 2.120 → Do NOT reject H₀ (no significant difference)
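A minimal Python sketch of the pooled two-sample calculation follows. The function name `pooled_t` is an assumption, and the critical value is taken from the table as quoted above.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t with pooled variance; returns (t, df)."""
    sp_sq = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_calc, df = pooled_t(14.5, 4, 8, 12.8, 3.5, 10)
t_table = 2.120                      # two-tailed, alpha = 0.05, 16 df (from the table)
reject_h0 = abs(t_calc) > t_table

print(round(t_calc, 2), df, reject_h0)   # 1.86 16 False
```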
2. Paired t-Test — Theory & Numerical 8M
Paired t-Test Procedure
  1. Compute d = x₁ − x₂ for each pair
  2. Compute d̄ = Σd/n
  3. Compute sd = √[Σ(d−d̄)²/(n−1)] = √[(Σd² − n·d̄²)/(n−1)] (divisor n−1, matching df = n−1)
  4. t = d̄/(sd/√n) with df = n−1
sd² = (Σd² − n·d̄²)/(n−1)  →  this is the shortcut
Numerical Example — Paired t-Test

A drug is given to 8 patients. Blood pressure (BP) is recorded before and after. Test if the drug reduces BP at 5% LOS.

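Patient | Before (x₁) | After (x₂) | d = x₁−x₂ | d²
1 | 145 | 138 | 7 | 49
2 | 152 | 143 | 9 | 81
3 | 138 | 136 | 2 | 4
4 | 160 | 150 | 10 | 100
5 | 148 | 140 | 8 | 64
6 | 130 | 128 | 2 | 4
7 | 155 | 146 | 9 | 81
8 | 142 | 137 | 5 | 25
Sum | | | 52 | 408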
n=8, Σd=52, Σd²=408
d̄ = 52/8 = 6.5
s_d² = [Σd² − n·(d̄)²]/(n−1) = [408 − 8×42.25]/7 = [408−338]/7 = 70/7 = 10
s_d = √10 = 3.162
t = d̄/(s_d/√n) = 6.5/(3.162/√8) = 6.5/(3.162/2.828) = 6.5/1.118 = 5.814
df = n−1 = 7, t_table(0.05, 7df, one-tailed) = 1.895
|t_calc|=5.814 > 1.895 → REJECT H₀

Conclusion: The drug significantly reduces blood pressure at 5% level.
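
The paired calculation can be reproduced from the raw readings. This sketch uses `statistics.mean` and `statistics.stdev` (the latter uses the n−1 divisor); the one-tailed critical value is copied from the table as quoted above.

```python
from math import sqrt
from statistics import mean, stdev

before = [145, 152, 138, 160, 148, 130, 155, 142]
after = [138, 143, 136, 150, 140, 128, 146, 137]

d = [b - a for b, a in zip(before, after)]   # differences x1 - x2
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                               # sample SD with divisor n - 1
t_calc = d_bar / (s_d / sqrt(n))

t_table = 1.895                              # one-tailed, alpha = 0.05, 7 df (from the table)
print(d_bar, round(s_d, 3), round(t_calc, 3), t_calc > t_table)
```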

3. F-Test (Variance Ratio Test) — Theory & Numerical 8M
F-Test Procedure

Tests H₀: σ₁² = σ₂² (two population variances are equal).

F = s₁²/s₂²  (always put larger variance in numerator, so F ≥ 1)
  • df₁ = n₁ − 1 (numerator), df₂ = n₂ − 1 (denominator)
  • Reject H₀ if Fcalc > Ftable(df₁, df₂) at α level
Numerical Example — F-Test

Sample 1: n₁=10, s₁²=28.5 | Sample 2: n₂=14, s₂²=12.6. Test equality of variances at 5%.

H₀: σ₁² = σ₂²   H₁: σ₁² ≠ σ₂²
Since s₁² > s₂², put s₁² in numerator
F = s₁²/s₂² = 28.5/12.6 = 2.262
df₁ = 10−1 = 9, df₂ = 14−1 = 13
F_table(9, 13) at 5% = 2.71 [from F-table]
F_calc = 2.262 < 2.71 → Do NOT reject H₀ (variances are equal)
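The variance-ratio step can be sketched as follows. The helper `variance_ratio` is hypothetical; it places the larger sample variance in the numerator and returns the matching degrees of freedom. The table value 2.71 is copied from the example.

```python
def variance_ratio(s1_sq, n1, s2_sq, n2):
    """Return (F, (df_num, df_den)) with the larger variance in the numerator."""
    if s1_sq >= s2_sq:
        return s1_sq / s2_sq, (n1 - 1, n2 - 1)
    return s2_sq / s1_sq, (n2 - 1, n1 - 1)

f_calc, dfs = variance_ratio(28.5, 10, 12.6, 14)
f_table = 2.71                       # F(9, 13) at 5% (from the table)
reject_h0 = f_calc > f_table

print(round(f_calc, 3), dfs, reject_h0)   # 2.262 (9, 13) False
```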
4. Chi-Square Test — Independence & Goodness of Fit 8M
Chi-Square Test for Independence of Attributes
χ² = Σ (O−E)²/E  ~ χ² with (r−1)(c−1) df

Expected frequency: Eij = (Row i total × Column j total) / Grand Total N

Numerical: 200 patients classified by gender and recovery:

Gender / Recovery | Recovered | Not Recovered | Total
Male | 70 | 30 | 100
Female | 60 | 40 | 100
Total | 130 | 70 | 200
E(Male, Recovered) = 100×130/200 = 65
E(Male, Not) = 100×70/200 = 35
E(Female, Recovered) = 100×130/200 = 65
E(Female, Not) = 100×70/200 = 35
χ² = (70−65)²/65 + (30−35)²/35 + (60−65)²/65 + (40−35)²/35
= 25/65 + 25/35 + 25/65 + 25/35 = 0.385+0.714+0.385+0.714 = 2.198
df = (2−1)(2−1) = 1, χ²_table(0.05, 1df) = 3.841
χ²_calc = 2.198 < 3.841 → Do NOT reject H₀ (attributes are independent)
2×2 Shortcut Formula: χ² = N(ad−bc)² / [(a+b)(c+d)(a+c)(b+d)]
Here: N=200, a=70, b=30, c=60, d=40
χ² = 200×(70×40−30×60)² / (100×100×130×70) = 200×(2800−1800)² / 91000000 = 200×1000000/91000000 ≈ 2.198 ✓
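Both routes to χ² can be checked in a few lines of Python. The function name `chi_square_independence` is illustrative; it builds each expected frequency from the row and column totals, as in the hand calculation.

```python
def chi_square_independence(table):
    """Chi-square statistic and df for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence([[70, 30], [60, 40]])

# 2x2 shortcut: N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]
a, b, c, d = 70, 30, 60, 40
shortcut = 200 * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi2, 3), df, round(shortcut, 3))   # 2.198 1 2.198
```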
Chi-Square Goodness of Fit Test

Tests whether observed data follows a specified theoretical distribution.

Example: A die is thrown 120 times. Test if the die is fair (uniform distribution expected).

Face | 1 | 2 | 3 | 4 | 5 | 6
Observed (O) | 25 | 17 | 15 | 23 | 24 | 16
Expected (E) | 20 | 20 | 20 | 20 | 20 | 20
χ² = (25−20)²/20 + (17−20)²/20 + (15−20)²/20 + (23−20)²/20 + (24−20)²/20 + (16−20)²/20
= 25/20 + 9/20 + 25/20 + 9/20 + 16/20 + 16/20
= (25+9+25+9+16+16)/20 = 100/20 = 5.0
df = 6−1 = 5, χ²_table(0.05, 5df) = 11.07
χ²_calc = 5.0 < 11.07 → Do NOT reject H₀ (die is fair)
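The goodness-of-fit statistic for the die can be sketched the same way, assuming equal expected frequencies under H₀. The critical value is taken from the table as quoted above.

```python
observed = [25, 17, 15, 23, 24, 16]
expected = [sum(observed) / len(observed)] * len(observed)   # 20 per face if the die is fair

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
chi2_table = 11.07                   # alpha = 0.05, 5 df (from the table)

print(round(chi2, 2), df, chi2 > chi2_table)   # 5.0 5 False
```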

Unit 5: Design of Experiments

ANOVA (One-Way & Two-Way) • CRD • RBD • LSD — Full ANOVA Tables with Numericals

2-Mark Q&A 10 Questions
1. Define Analysis of Variance (ANOVA). HOT
ANOVA is a statistical technique used to test whether the means of three or more populations are equal by analysing the variation in data and partitioning it into components.
  • Total Variation = Variation due to treatments + Variation due to error (random)
  • Uses the F-statistic (ratio of treatment variance to error variance)
  • H₀: μ₁ = μ₂ = ... = μₖ (all treatment means are equal)
2. State the basic assumptions of ANOVA. HOT
  • Normality: Each population follows normal distribution
  • Homogeneity of variance: All populations have the same variance (σ²)
  • Independence: Observations are independent of each other
  • Additivity: Effects are additive (no interaction, in two-way)
3. What is Completely Randomized Design (CRD)? HOT
CRD is the simplest experimental design where treatments are assigned completely at random to all experimental units. It controls only one source of variation (treatment).
  • Used when experimental units are homogeneous
  • One-way ANOVA is applied
  • df: Treatment = k−1, Error = N−k, Total = N−1 (k = number of treatments, N = total observations)
4. What is Randomized Block Design (RBD)? HOT
RBD is a design where experimental units are grouped into homogeneous blocks (to control one extraneous variable), and treatments are randomly assigned within each block.
  • Controls for one extraneous variable (block effect)
  • Two-way ANOVA (without interaction) applied
  • df: Treatment = k−1, Block = b−1, Error = (k−1)(b−1), Total = kb−1
5. What is Latin Square Design (LSD)? HOT
LSD is a design that controls for two extraneous variables simultaneously (rows and columns), with treatments arranged so each appears exactly once in each row and column.
  • p×p square: p treatments, p rows, p columns
  • df: Treatment = p−1, Row = p−1, Column = p−1, Error = (p−1)(p−2), Total = p²−1
  • More efficient than RBD when two blocking factors exist
6. What is the Correction Factor (CF)? Write its formula. HOT
The Correction Factor (CF) is used to simplify ANOVA calculations:
CF = T² / N = (Grand Total)² / (Total number of observations)
  • TSS = ΣΣ x²ᵢⱼ − CF   (Total Sum of Squares)
  • SST = Σ (Tᵢ²/nᵢ) − CF   (Sum of Squares due to Treatments)
  • SSE = TSS − SST   (Error Sum of Squares)
7. Write the ANOVA table for CRD (One-Way). HOT
Source | SS | df | MS | F
Treatment | SST | k−1 | MST = SST/(k−1) | MST/MSE
Error | SSE | N−k | MSE = SSE/(N−k)
Total | TSS | N−1
Reject H₀ if Fcalc > Ftable(k−1, N−k) at α level.
8. Write the ANOVA table for RBD (Two-Way without Interaction). HOT
Source | SS | df | MS | F
Treatment | SST | k−1 | MST | MST/MSE
Block | SSB | b−1 | MSB | MSB/MSE
Error | SSE | (k−1)(b−1) | MSE
Total | TSS | kb−1
9. Compare CRD, RBD and LSD. HOT
Feature | CRD | RBD | LSD
Blocking | None | One direction | Two directions
Extraneous variables | 0 | 1 | 2
Error df | N−k | (k−1)(b−1) | (p−1)(p−2)
Efficiency | Lowest | Medium | Highest
ANOVA type | One-way | Two-way | Two-way+
10. Write the formulae for SSR and SSC in Latin Square Design.
For a p×p LSD with grand total T and N=p²:
CF = T²/N = T²/p²
SSR (Rows) = (1/p)·Σ Rᵢ² − CF   where Rᵢ = sum of row i
SSC (Columns) = (1/p)·Σ Cⱼ² − CF   where Cⱼ = sum of column j
SST (Treatments) = (1/p)·Σ Tₖ² − CF   where Tₖ = sum of treatment k
SSE = TSS − SSR − SSC − SST

8-Mark Topics

1. One-Way ANOVA (CRD) — Theory & Full Numerical 8M
Key Formulas for One-Way ANOVA
CF = T²/N   (T = Grand Total, N = total observations)
TSS = ΣΣ xᵢⱼ² − CF
SST = Σ(Tᵢ²/nᵢ) − CF   (Tᵢ = treatment total, nᵢ = treatment size)
SSE = TSS − SST
MST = SST/(k−1),  MSE = SSE/(N−k)
F = MST/MSE  ~ F(k−1, N−k)
Worked Numerical — CRD

Three fertilisers are applied to crops in 4 plots each. Yields (kg) are: A: 14,16,18,12 | B: 10,12,14,10 | C: 20,18,22,16. Test at 5% if yields differ.

Observation | A | B | C
Obs 1 | 14 | 10 | 20
Obs 2 | 16 | 12 | 18
Obs 3 | 18 | 14 | 22
Obs 4 | 12 | 10 | 16
Total (Tᵢ) | 60 | 46 | 76
k=3, N=12, Grand Total T=60+46+76=182
CF = T²/N = 182²/12 = 33124/12 = 2760.33
TSS = (14²+16²+18²+12²+10²+12²+14²+10²+20²+18²+22²+16²) − CF
= (196+256+324+144+100+144+196+100+400+324+484+256) − 2760.33
= 2924 − 2760.33 = 163.67
SST = (60²/4 + 46²/4 + 76²/4) − CF = (900+529+1444) − 2760.33 = 2873 − 2760.33 = 112.67
SSE = TSS − SST = 163.67 − 112.67 = 51.00
MST = 112.67/(3−1) = 112.67/2 = 56.33
MSE = 51.00/(12−3) = 51.00/9 = 5.67
F = 56.33/5.67 = 9.93
F_table(2, 9) at 5% = 4.26
F_calc=9.93 > 4.26 → REJECT H₀ — Significant difference in yields
Source | SS | df | MS | F
Treatment | 112.67 | 2 | 56.33 | 9.93*
Error | 51.00 | 9 | 5.67
Total | 163.67 | 11

* Significant at 5% level. Fertiliser type significantly affects yield.
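
The correction-factor steps above can be sketched in a few lines of Python. This is only an illustrative check of the hand working; the function name `one_way_anova` and the dictionary keys are assumptions made for this example, not part of the syllabus.

```python
# Illustrative sketch: one-way ANOVA (CRD) by the correction-factor method.
def one_way_anova(groups):
    """groups: one list of observations per treatment."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n_total                        # CF = T^2 / N
    tss = sum(x * x for g in groups for x in g) - cf       # total SS
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # treatment SS
    sse = tss - sst                                        # error SS
    mst = sst / (k - 1)
    mse = sse / (n_total - k)
    return {"CF": cf, "TSS": tss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F": mst / mse,
            "df": (k - 1, n_total - k)}

# Fertiliser yields A, B, C from the worked numerical
yields = [[14, 16, 18, 12], [10, 12, 14, 10], [20, 18, 22, 16]]
result = one_way_anova(yields)
for name in ("CF", "TSS", "SST", "SSE", "MST", "MSE", "F"):
    print(f"{name} = {result[name]:.2f}")
```

Carried at full precision the ratio comes out near 9.94; the 9.93 in the hand working arises from rounding MST and MSE before dividing. The decision against the table value 4.26 is the same either way.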

2. Two-Way ANOVA (RBD) — Theory & Full Numerical 8M
Key Formulas for RBD
k = number of treatments, b = number of blocks, N = kb
CF = T²/N
TSS = ΣΣ xᵢⱼ² − CF
SST (Treatment) = (1/b)·ΣTᵢ² − CF   (Tᵢ = treatment column total)
SSB (Block) = (1/k)·ΣBⱼ² − CF   (Bⱼ = block row total)
SSE = TSS − SST − SSB
df: Treatment=k−1, Block=b−1, Error=(k−1)(b−1), Total=N−1
Worked Numerical — RBD

Four varieties of rice (T1,T2,T3,T4) tested in 3 blocks. Yield data (quintals):

Block \ Treatment | T1 | T2 | T3 | T4 | Block Total (Bⱼ)
B1 | 25 | 23 | 20 | 27 | 95
B2 | 22 | 19 | 18 | 24 | 83
B3 | 28 | 26 | 23 | 29 | 106
Treat Total (Tᵢ) | 75 | 68 | 61 | 80 | T=284
k=4 treatments, b=3 blocks, N=12
CF = 284²/12 = 80656/12 = 6721.33
TSS = (25²+22²+28²+23²+19²+26²+20²+18²+23²+27²+24²+29²) − CF
= (625+484+784+529+361+676+400+324+529+729+576+841) − 6721.33
= 6858 − 6721.33 = 136.67
SST = (75²+68²+61²+80²)/3 − CF = (5625+4624+3721+6400)/3 − 6721.33
= 20370/3 − 6721.33 = 6790 − 6721.33 = 68.67
SSB = (95²+83²+106²)/4 − CF = (9025+6889+11236)/4 − 6721.33
= 27150/4 − 6721.33 = 6787.5 − 6721.33 = 66.17
SSE = 136.67 − 68.67 − 66.17 = 1.83
MST=68.67/3=22.89, MSB=66.17/2=33.09, MSE=1.83/6=0.305
F(treatment)=22.89/0.305=75.05, F(block)=33.09/0.305=108.5
F_table(3,6)=4.76, F_table(2,6)=5.14 at 5%
Both F values >> table → Significant treatment AND block effects
Source | SS | df | MS | F
Treatment | 68.67 | 3 | 22.89 | 75.05*
Block | 66.17 | 2 | 33.09 | 108.5*
Error | 1.83 | 6 | 0.305
Total | 136.67 | 11
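
As a cross-check on the RBD arithmetic, here is a minimal Python sketch of the same formulas. The function name `rbd_anova` and the row/column orientation (blocks as rows, treatments as columns) are assumptions chosen to match the data table in this numerical.

```python
# Illustrative sketch: two-way ANOVA without interaction (RBD).
def rbd_anova(table):
    """table[j][i] = observation in block j (row) under treatment i (column)."""
    b = len(table)         # number of blocks
    k = len(table[0])      # number of treatments
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / (b * k)
    tss = sum(x * x for row in table for x in row) - cf
    treat_totals = [sum(row[i] for row in table) for i in range(k)]
    block_totals = [sum(row) for row in table]
    sst = sum(t * t for t in treat_totals) / b - cf   # each treatment total has b values
    ssb = sum(t * t for t in block_totals) / k - cf   # each block total has k values
    sse = tss - sst - ssb
    mse = sse / ((k - 1) * (b - 1))
    return {"TSS": tss, "SST": sst, "SSB": ssb, "SSE": sse,
            "F_treatment": (sst / (k - 1)) / mse,
            "F_block": (ssb / (b - 1)) / mse}

rice = [[25, 23, 20, 27],
        [22, 19, 18, 24],
        [28, 26, 23, 29]]
rbd = rbd_anova(rice)
for name, value in rbd.items():
    print(f"{name} = {value:.2f}")
```

Without intermediate rounding the F ratios are roughly 74.9 and 108.3 rather than 75.05 and 108.5; both remain far above the table values, so the conclusion is unchanged.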
3. Latin Square Design (LSD) — Theory & Full Numerical 8M
Structure & Key Formulas

In a p×p LSD, each treatment appears exactly once in each row and each column.

CF = T²/p²   (T = Grand Total)
TSS = ΣΣ xᵢⱼ² − CF
SSR = (1/p)·ΣRᵢ² − CF   (Row sums)
SSC = (1/p)·ΣCⱼ² − CF   (Column sums)
SST = (1/p)·ΣTₖ² − CF   (Treatment sums)
SSE = TSS − SSR − SSC − SST
df: Row=p−1, Col=p−1, Treatment=p−1, Error=(p−1)(p−2), Total=p²−1
Worked Numerical — 3×3 LSD

A 3×3 Latin Square experiment on crop yield (A, B, C = three fertilisers):

Row \ Col | Col 1 | Col 2 | Col 3 | Row Total
Row 1 | A=17 | B=14 | C=12 | R₁=43
Row 2 | B=13 | C=11 | A=16 | R₂=40
Row 3 | C=10 | A=18 | B=15 | R₃=43
Col Total | C₁=40 | C₂=43 | C₃=43 | T=126

Treatment totals: A=17+16+18=51, B=14+13+15=42, C=12+11+10=33

p=3, N=9, T=126, CF=126²/9=15876/9=1764
TSS=(17²+14²+12²+13²+11²+16²+10²+18²+15²)−CF
=(289+196+144+169+121+256+100+324+225)−1764=1824−1764=60
SSR=(43²+40²+43²)/3−CF=(1849+1600+1849)/3−1764=5298/3−1764=1766−1764=2
SSC=(40²+43²+43²)/3−CF=(1600+1849+1849)/3−1764=5298/3−1764=2
SST=(51²+42²+33²)/3−CF=(2601+1764+1089)/3−1764=5454/3−1764=1818−1764=54
SSE=60−2−2−54=2
MST=54/2=27, MSE=2/2=1, F=27/1=27
F_table(2,2) at 5% = 19.00
F_calc=27 > 19 → REJECT H₀ — Fertiliser effect is significant
Source | SS | df | MS | F
Row | 2 | 2 | 1 | 1
Column | 2 | 2 | 1 | 1
Treatment | 54 | 2 | 27 | 27*
Error | 2 | 2 | 1
Total | 60 | 8
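
The Latin Square sums of squares can be checked the same way. The sketch below is illustrative only; `latin_square_anova` and its two-grid input (one grid of yields, one of treatment labels) are assumptions made for this example.

```python
# Illustrative sketch: ANOVA for a p x p Latin Square.
def latin_square_anova(values, layout):
    """values[i][j] = yield in row i, column j; layout[i][j] = treatment label."""
    p = len(values)
    cells = [(values[i][j], layout[i][j]) for i in range(p) for j in range(p)]
    grand_total = sum(v for v, _ in cells)
    cf = grand_total ** 2 / (p * p)
    tss = sum(v * v for v, _ in cells) - cf
    ssr = sum(sum(row) ** 2 for row in values) / p - cf
    ssc = sum(sum(values[i][j] for i in range(p)) ** 2 for j in range(p)) / p - cf
    treat_totals = {}
    for v, label in cells:
        treat_totals[label] = treat_totals.get(label, 0) + v
    sst = sum(t * t for t in treat_totals.values()) / p - cf
    sse = tss - ssr - ssc - sst
    mse = sse / ((p - 1) * (p - 2))
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SST": sst, "SSE": sse,
            "F_treatment": (sst / (p - 1)) / mse}

values = [[17, 14, 12], [13, 11, 16], [10, 18, 15]]
layout = [["A", "B", "C"], ["B", "C", "A"], ["C", "A", "B"]]
lsd = latin_square_anova(values, layout)
print(lsd)
```

Keeping the treatment labels in a separate grid mirrors how the design is written on paper and avoids parsing strings such as "A=17".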

Unit 6: Statistical Quality Control

Process Control • X̄ & R Charts • p-chart • np-chart • c-chart — Control Limit Formulas + Numericals

2-Mark Q&A 10 Questions
1. Define Statistical Quality Control (SQC). HOT
SQC is the application of statistical methods to monitor and control a manufacturing process to ensure the product meets quality standards.
  • Distinguishes between chance (common) causes and assignable (special) causes of variation
  • Main tool: Control Charts (Shewhart charts)
  • Goal: Keep process in a state of statistical control
2. What is a Control Chart? Name its main components. HOT
A control chart is a graph that plots a quality characteristic over time with three horizontal lines:
  • UCL — Upper Control Limit (3σ above centre)
  • CL — Centre Line (process mean)
  • LCL — Lower Control Limit (3σ below centre)
If all points lie within UCL and LCL → process is in control. Any point outside → out of control (assignable cause present).
3. Distinguish between Variable and Attribute Control Charts. HOT
Variable Charts | Attribute Charts
For measurable characteristics (length, weight) | For counted characteristics (defects, defectives)
X̄-chart, R-chart | p-chart, np-chart, c-chart
Used when continuous measurement possible | Used for go/no-go, pass/fail data
4. Write the control limits for X̄-chart and R-chart. HOT
Chart | UCL | CL | LCL
X̄-chart | X̄̄ + A₂·R̄ | X̄̄ (Grand Mean) | X̄̄ − A₂·R̄
R-chart | D₄·R̄ | R̄ (Mean Range) | D₃·R̄
  • X̄̄ = mean of sample means, R̄ = mean of sample ranges
  • A₂, D₃, D₄ are control chart constants depending on subgroup size n
  • Common constants for n=5: A₂=0.577, D₃=0, D₄=2.115
5. Write the control limits for p-chart (fraction defective). HOT
The p-chart monitors the proportion (fraction) of defective items in a sample.
p̄ = Total defectives / Total items inspected = Σdᵢ / Σnᵢ
UCL | CL | LCL
p̄ + 3√(p̄(1−p̄)/n) | p̄ | p̄ − 3√(p̄(1−p̄)/n) (min 0)
If LCL is negative, take LCL = 0.
6. Write the control limits for np-chart (number defective). HOT
The np-chart monitors the number of defective items (used when sample size n is constant).
np̄ = Total defectives / Number of samples = Σdᵢ/k
UCL | CL | LCL
np̄ + 3√(np̄(1−p̄)) | np̄ | np̄ − 3√(np̄(1−p̄)) (min 0)
where p̄ = np̄/n.
7. Write the control limits for c-chart (number of defects). HOT
The c-chart monitors the number of defects per unit (based on Poisson distribution).
c̄ = Total defects / Number of units = Σcᵢ/k
UCL | CL | LCL
c̄ + 3√c̄ | c̄ | c̄ − 3√c̄ (min 0)
  • Difference from p/np: c-chart counts defects (not defectives); one item can have multiple defects
8. Distinguish between p-chart and c-chart. HOT
p-chart | c-chart
Fraction/proportion of defective items | Number of defects per unit
Based on Binomial distribution | Based on Poisson distribution
Sample size can vary | Unit of inspection is constant
Example: % rejected bolts per batch | Example: scratches per car door
9. What are chance causes and assignable causes of variation? HOT
  • Chance (Common) Causes: Natural, unavoidable variation inherent in any process. Process is still "in control". Cannot be eliminated without redesigning the process. Example: minor machine vibration, raw material variation.
  • Assignable (Special) Causes: Specific, identifiable causes that push the process out of control. Can and should be identified and eliminated. Example: worn tool, untrained operator, faulty material batch.
10. Give the control chart constants A₂, D₃, D₄ for n=4 and n=5. HOT
n | A₂ | D₃ | D₄
2 | 1.880 | 0 | 3.267
3 | 1.023 | 0 | 2.574
4 | 0.729 | 0 | 2.282
5 | 0.577 | 0 | 2.115
6 | 0.483 | 0 | 2.004
Note: D₃=0 for n≤6 means LCL of R-chart = 0 (no lower limit needed).

8-Mark Topics

1. X̄-Chart and R-Chart — Theory & Full Numerical 8M
Control Limits Summary
X̄-chart: CL = X̄̄ | UCL = X̄̄ + A₂·R̄ | LCL = X̄̄ − A₂·R̄
R-chart: CL = R̄ | UCL = D₄·R̄ | LCL = D₃·R̄
Worked Numerical — X̄ and R Charts (n=5)

10 samples of size 5 are taken. Sample means (X̄) and ranges (R) are:

Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
X̄ | 42 | 45 | 41 | 43 | 46 | 44 | 40 | 43 | 45 | 41
R | 5 | 6 | 4 | 7 | 6 | 5 | 4 | 6 | 5 | 4
ΣX̄ = 42+45+41+43+46+44+40+43+45+41 = 430
X̄̄ = 430/10 = 43.0 (Grand Mean)
ΣR = 5+6+4+7+6+5+4+6+5+4 = 52
R̄ = 52/10 = 5.2 (Mean Range)
For n=5: A₂=0.577, D₃=0, D₄=2.115
X̄-chart: CL=43.0, UCL=43+0.577×5.2=43+3.0=46.0, LCL=43−3.0=40.0
R-chart: CL=5.2, UCL=2.115×5.2=11.0, LCL=0×5.2=0
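Check: Sample 5 (X̄=46) and Sample 7 (X̄=40) sit on the rounded limits; with the unrounded limits 46.0004 and 39.9996 both are just inside. All R values lie between 0 and 11. Process is IN CONTROL, but these two samples deserve a close watch.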
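
A short Python sketch of the X̄ and R limit calculation follows. It is a hedged illustration: the names `CONSTANTS`, `xbar_r_limits` and `out_of_control` are hypothetical, and only the constants for n=4 and n=5 tabulated in these notes are included.

```python
# Shewhart constants (A2, D3, D4) for subgroup sizes 4 and 5, as tabulated in these notes.
CONSTANTS = {4: (0.729, 0.0, 2.282), 5: (0.577, 0.0, 2.115)}

def xbar_r_limits(means, ranges, n):
    """Return ((LCL, CL, UCL) for the X-bar chart, (LCL, CL, UCL) for the R chart)."""
    a2, d3, d4 = CONSTANTS[n]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    xbar_limits = (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar)
    r_limits = (d3 * r_bar, r_bar, d4 * r_bar)
    return xbar_limits, r_limits

def out_of_control(values, lcl, ucl):
    """1-based sample numbers lying strictly outside [lcl, ucl]."""
    return [i for i, v in enumerate(values, start=1) if v < lcl or v > ucl]

means = [42, 45, 41, 43, 46, 44, 40, 43, 45, 41]
ranges = [5, 6, 4, 7, 6, 5, 4, 6, 5, 4]
(x_lcl, x_cl, x_ucl), (r_lcl, r_cl, r_ucl) = xbar_r_limits(means, ranges, n=5)
print(f"X-bar chart: LCL={x_lcl:.4f}, CL={x_cl:.1f}, UCL={x_ucl:.4f}")
print(f"R chart:     LCL={r_lcl:.1f}, CL={r_cl:.1f}, UCL={r_ucl:.3f}")
print("X-bar flags:", out_of_control(means, x_lcl, x_ucl))
print("R flags:", out_of_control(ranges, r_lcl, r_ucl))
```

Printing the limits to four decimals makes the boundary cases visible: the samples with X̄=46 and X̄=40 lie inside the limits by a very small margin.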
2. p-Chart (Fraction Defective) — Theory & Full Numerical 8M
When to Use p-chart
  • Quality characteristic is attribute (defective / non-defective)
  • Sample size n can be variable or constant
  • Based on Binomial distribution
p̄ = Σdᵢ / Σnᵢ   |   UCL = p̄+3√(p̄q̄/n)   |   LCL = p̄−3√(p̄q̄/n)   |   q̄=1−p̄
Worked Numerical — p-Chart (Constant n)

10 batches of 100 items each inspected. Number of defectives:

Batch | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Defectives (d) | 7 | 5 | 4 | 8 | 6 | 3 | 9 | 5 | 4 | 9
n=100, k=10, Σd=7+5+4+8+6+3+9+5+4+9=60
p̄ = 60/(10×100) = 60/1000 = 0.06
q̄ = 1 − 0.06 = 0.94
√(p̄q̄/n) = √(0.06×0.94/100) = √(0.0564/100) = √0.000564 = 0.02375
UCL = 0.06 + 3×0.02375 = 0.06 + 0.0713 = 0.1313
LCL = 0.06 − 0.0713 = −0.0113 → take LCL = 0
CL=0.06, UCL=0.1313, LCL=0

Batch proportions: 0.07, 0.05, 0.04, 0.08, 0.06, 0.03, 0.09, 0.05, 0.04, 0.09 — all within [0, 0.1313]. Process IN CONTROL.
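
The p-chart limits for a constant sample size can be sketched as below. The function name `p_chart_limits` is an assumption for illustration; the formula is the one stated in this section.

```python
from math import sqrt

def p_chart_limits(defectives, n):
    """Constant sample size n; returns (LCL, CL, UCL) for the fraction defective."""
    p_bar = sum(defectives) / (n * len(defectives))
    sigma = sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), p_bar, p_bar + 3 * sigma

defectives = [7, 5, 4, 8, 6, 3, 9, 5, 4, 9]
lcl, cl, ucl = p_chart_limits(defectives, n=100)
# Batches whose fraction defective falls outside the limits (1-based numbering)
flagged = [i for i, d in enumerate(defectives, start=1)
           if not lcl <= d / 100 <= ucl]
print(f"LCL={lcl:.4f}, CL={cl:.4f}, UCL={ucl:.4f}, flagged={flagged}")
```

The `max(0.0, ...)` call implements the rule that a negative lower limit is replaced by 0.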

3. c-Chart (Number of Defects) & np-Chart — Theory & Numerical 8M
c-Chart — Number of Defects per Unit
c̄ = Σcᵢ/k   |   UCL = c̄+3√c̄   |   CL = c̄   |   LCL = c̄−3√c̄ (min 0)

Numerical: Number of defects (scratches) found in 10 car panels:

Panel | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Defects (c) | 4 | 3 | 6 | 2 | 5 | 4 | 7 | 3 | 4 | 2
k=10, Σc=4+3+6+2+5+4+7+3+4+2=40
c̄ = 40/10 = 4.0
√c̄ = √4 = 2.0
UCL = 4 + 3×2 = 4+6 = 10
LCL = 4 − 3×2 = 4−6 = −2 → take LCL = 0
CL=4, UCL=10, LCL=0

All 10 panels have defects between 0 and 10. Process is IN CONTROL.
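
For the c-chart the same pattern applies with the Poisson-based width 3√c̄. A minimal sketch, with `c_chart_limits` as a hypothetical helper name:

```python
from math import sqrt

def c_chart_limits(defects):
    """Returns (LCL, CL, UCL) for the number of defects per unit."""
    c_bar = sum(defects) / len(defects)
    half_width = 3 * sqrt(c_bar)
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width

panels = [4, 3, 6, 2, 5, 4, 7, 3, 4, 2]
c_lcl, c_cl, c_ucl = c_chart_limits(panels)
c_flagged = [i for i, c in enumerate(panels, start=1) if not c_lcl <= c <= c_ucl]
print(c_lcl, c_cl, c_ucl, c_flagged)
```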

np-Chart — Number Defective (Constant n)
np̄ = Σdᵢ/k   |   UCL = np̄+3√(np̄·q̄)   |   LCL = np̄−3√(np̄·q̄)

Use np-chart instead of p-chart when sample size n is constant and you prefer plotting the actual count (not fraction). Both charts give the same conclusions — np-chart is simpler to compute.

For the p-chart example above (n=100, p̄=0.06, q̄=0.94):

np̄ = 100×0.06 = 6.0
√(np̄·q̄) = √(6×0.94) = √5.64 = 2.375
UCL = 6 + 3×2.375 = 6+7.125 = 13.125
LCL = 6 − 7.125 = −1.125 → take LCL = 0 | CL=6
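
The claim that the np-chart and p-chart lead to the same conclusions can be checked numerically: dividing the np-chart limits by n should reproduce the p-chart limits. The sketch below assumes a constant sample size; `np_chart_limits` is a hypothetical name.

```python
from math import sqrt

def np_chart_limits(defectives, n):
    """Constant sample size n; returns (LCL, CL, UCL) for the number defective."""
    np_bar = sum(defectives) / len(defectives)
    p_bar = np_bar / n
    half_width = 3 * sqrt(np_bar * (1 - p_bar))
    return max(0.0, np_bar - half_width), np_bar, np_bar + half_width

batch_defectives = [7, 5, 4, 8, 6, 3, 9, 5, 4, 9]
np_lcl, np_cl, np_ucl = np_chart_limits(batch_defectives, n=100)
print(f"np-chart: LCL={np_lcl:.3f}, CL={np_cl:.3f}, UCL={np_ucl:.3f}")
# Same limits expressed as fractions, for comparison with the p-chart
print(f"as fractions: CL={np_cl / 100:.4f}, UCL={np_ucl / 100:.4f}")
```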
Quick Summary — Which chart to use?
Measurement data → X̄ & R charts
Proportion defective, variable n → p-chart
Count defective, fixed n → np-chart
Count defects per unit → c-chart

QB Solutions — Unit 1: Correlation & Regression

Common 8-mark and 13-mark questions with full solutions

Q1. The following data gives the experience (in years) and salary (in thousands) of 6 employees. Find Karl Pearson's correlation coefficient. 13M TYPE

Experience (X): 5, 3, 7, 2, 8, 6 | Salary (Y): 40, 30, 55, 20, 60, 45

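X | Y | X² | Y² | XY
5 | 40 | 25 | 1600 | 200
3 | 30 | 9 | 900 | 90
7 | 55 | 49 | 3025 | 385
2 | 20 | 4 | 400 | 40
8 | 60 | 64 | 3600 | 480
6 | 45 | 36 | 2025 | 270
31 | 250 | 187 | 11550 | 1465   (column totals)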
n=6, ΣX=31, ΣY=250, ΣX²=187, ΣY²=11550, ΣXY=1465
n·ΣXY − ΣX·ΣY = 6×1465 − 31×250 = 8790 − 7750 = 1040
n·ΣX² − (ΣX)² = 6×187 − 961 = 1122 − 961 = 161
n·ΣY² − (ΣY)² = 6×11550 − 62500 = 69300 − 62500 = 6800
√(161 × 6800) = √1094800 ≈ 1046.33
r = 1040/1046.33 ≈ 0.994 (Very high positive correlation)

Strong positive correlation — as experience increases, salary increases significantly.
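
The computation formula for r translates directly into code. This is a minimal sketch for checking the table sums; `pearson_r` is a hypothetical helper name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r via the raw-sum computation formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

experience = [5, 3, 7, 2, 8, 6]
salary = [40, 30, 55, 20, 60, 45]
r = pearson_r(experience, salary)
print(f"r = {r:.3f}")
```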

Q2. Calculate Spearman's rank correlation for the following data on marks in two subjects. Marks in Maths: 78, 89, 56, 45, 90, 70. Marks in Science: 84, 92, 60, 48, 88, 75. 8M
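Math (X) | Science (Y) | Rank X (R₁) | Rank Y (R₂) | d = R₁−R₂ | d²
78 | 84 | 3 | 3 | 0 | 0
89 | 92 | 2 | 1 | 1 | 1
56 | 60 | 5 | 5 | 0 | 0
45 | 48 | 6 | 6 | 0 | 0
90 | 88 | 1 | 2 | −1 | 1
70 | 75 | 4 | 4 | 0 | 0
Σd² = 2
n=6, Σd²=2 (rank 1 = highest mark; 92 is the top Science mark, so it takes rank 1 and 88 takes rank 2)
ρ = 1 − 6×2 / (6×35) = 1 − 12/210 = 1 − 0.0571
ρ ≈ 0.943 (very high positive rank correlation)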
Q3. From the following data, find (i) the two regression equations (ii) estimate Y when X=20 (iii) estimate X when Y=25. n=5, X̄=10, Ȳ=14, σx=3, σy=4, r=0.8. 8M
b_yx = r·(σy/σx) = 0.8 × (4/3) = 0.8 × 1.333 = 1.067
b_xy = r·(σx/σy) = 0.8 × (3/4) = 0.8 × 0.75 = 0.6
Regression of Y on X: y − 14 = 1.067(x − 10)
→ y = 1.067x − 10.67 + 14 = 1.067x + 3.33
Regression of X on Y: x − 10 = 0.6(y − 14)
→ x = 0.6y − 8.4 + 10 = 0.6y + 1.6
When X=20: y = 1.067×20 + 3.33 = 21.34 + 3.33 = 24.67
When Y=25: x = 0.6×25 + 1.6 = 15 + 1.6 = 16.6
Verify r: r = √(b_yx × b_xy) = √(1.067 × 0.6) = √0.64 = 0.8 ✓
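
When only summary statistics are given, both regression lines follow from r, the two standard deviations and the two means. A minimal sketch, with `regression_lines` as an assumed helper name:

```python
def regression_lines(x_bar, y_bar, sd_x, sd_y, r):
    """Return (b_yx, b_xy, y_on_x, x_on_y) built from summary statistics."""
    b_yx = r * sd_y / sd_x
    b_xy = r * sd_x / sd_y

    def y_on_x(x):
        return y_bar + b_yx * (x - x_bar)

    def x_on_y(y):
        return x_bar + b_xy * (y - y_bar)

    return b_yx, b_xy, y_on_x, x_on_y

b_yx, b_xy, y_on_x, x_on_y = regression_lines(10, 14, 3, 4, 0.8)
print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}")
print(f"Y at X=20: {y_on_x(20):.2f}")
print(f"X at Y=25: {x_on_y(25):.2f}")
```

Each estimate uses its own line: Y is predicted from the line of Y on X, and X from the line of X on Y.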

QB Solutions — Unit 2: Probability & Random Variables

Bayes' Theorem, PMF/PDF problems, MGF — full solutions

Q1. In a bolt factory, machines A, B, C produce 25%, 35%, 40% of the total production. Of their outputs, 5%, 4%, 2% are defective bolts. A bolt is drawn at random and found defective. Find the probability it was produced by machine A. 8M
P(A)=0.25, P(B)=0.35, P(C)=0.40
P(D|A)=0.05, P(D|B)=0.04, P(D|C)=0.02
P(D) = 0.25×0.05 + 0.35×0.04 + 0.40×0.02
P(D) = 0.0125 + 0.0140 + 0.0080 = 0.0345
P(A|D) = P(D|A)·P(A)/P(D) = (0.05×0.25)/0.0345
P(A|D) = 0.0125/0.0345 = 0.3623 (36.23%)
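
The total-probability and Bayes steps can be sketched as follows; the function name `posterior` and the dictionary layout are assumptions for this example.

```python
def posterior(priors, likelihoods):
    """Return (P(D), {machine: P(machine | D)}) using Bayes' theorem."""
    joint = {m: priors[m] * likelihoods[m] for m in priors}   # P(M) * P(D|M)
    total = sum(joint.values())                               # P(D)
    return total, {m: joint[m] / total for m in joint}

priors = {"A": 0.25, "B": 0.35, "C": 0.40}
defect_rates = {"A": 0.05, "B": 0.04, "C": 0.02}
p_defective, post = posterior(priors, defect_rates)
print(f"P(D) = {p_defective:.4f}")
for machine, prob in post.items():
    print(f"P({machine}|D) = {prob:.4f}")
```

The three posterior probabilities sum to 1, which is a quick sanity check in an exam answer.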
Q2. A random variable X has the PDF f(x) = cx(2−x) for 0 ≤ x ≤ 2, zero otherwise. Find c, F(x), P(X < 1), Mean and Variance. 13M TYPE
∫₀² cx(2−x)dx = 1
c∫₀²(2x−x²)dx = c[x²−x³/3]₀² = c[4 − 8/3] = c[4/3] = 4c/3 = 1
c = 3/4
f(x) = (3/4)x(2−x) = (3/4)(2x−x²)
F(x) = ∫₀ˣ (3/4)(2t−t²)dt = (3/4)[t²−t³/3]₀ˣ = (3/4)(x²−x³/3) = 3x²/4 − x³/4
P(X<1) = F(1) = 3/4 − 1/4 = 2/4 = 0.5
E(X) = ∫₀² x·(3/4)(2x−x²)dx = (3/4)∫₀²(2x²−x³)dx
= (3/4)[2x³/3−x⁴/4]₀² = (3/4)[16/3−4] = (3/4)(4/3) = 1
E(X²) = ∫₀² x²·(3/4)(2x−x²)dx = (3/4)[x⁴/2−x⁵/5]₀² = (3/4)[8−32/5] = (3/4)(8/5) = 6/5
Mean = 1, Var = E(X²)−[E(X)]² = 6/5 − 1 = 1/5 = 0.2
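
Because the density is a polynomial, every integral above can be verified exactly with rational arithmetic. The sketch below uses only the standard library; `integrate_poly` is a hypothetical helper that integrates a polynomial given by its coefficient list.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Exactly integrate sum(coeffs[k] * x**k) from a to b."""
    def antiderivative(x):
        return sum(Fraction(c) * Fraction(x) ** (k + 1) / (k + 1)
                   for k, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)

base = [0, 2, -1]                              # x(2 - x) = 2x - x^2
c = 1 / integrate_poly(base, 0, 2)             # normalising constant
pdf = [c * coef for coef in base]              # coefficients of f(x)
mean = integrate_poly([0] + pdf, 0, 2)         # prepending 0 multiplies by x
second_moment = integrate_poly([0, 0] + pdf, 0, 2)
variance = second_moment - mean ** 2
p_less_than_1 = integrate_poly(pdf, 0, 1)      # F(1)
print(c, mean, second_moment, variance, p_less_than_1)
```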
Q3. A discrete RV X has PMF: P(X=x) = (1/2)^x for x=1,2,3,... Find the MGF and hence the Mean. 8M
M_X(t) = Σₓ₌₁^∞ e^(tx)·(1/2)^x = Σ(e^t/2)^x
= (e^t/2)/(1 − e^t/2) [geometric series, valid for |e^t/2|<1, i.e., t<ln2]
M_X(t) = e^t/(2−e^t)
M'_X(t) = [e^t(2−e^t) − e^t(−e^t)] / (2−e^t)²
= [2e^t − e^(2t) + e^(2t)] / (2−e^t)²
= 2e^t / (2−e^t)²
Mean = M'_X(0) = 2×1/(2−1)² = 2/1 = 2
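
A numerical cross-check of this result, offered as a sketch: differentiate the closed-form MGF at t=0 by a central difference and compare with the mean summed directly from the PMF. The step size `h` and the truncation at 200 terms are arbitrary choices for illustration.

```python
from math import exp

def mgf(t):
    """Closed-form MGF derived above; valid only for t < ln 2."""
    return exp(t) / (2 - exp(t))

h = 1e-5
mean_from_mgf = (mgf(h) - mgf(-h)) / (2 * h)                  # approximates M'(0)
mean_from_pmf = sum(x * 0.5 ** x for x in range(1, 200))      # truncated sum of x * P(X=x)
total_probability = sum(0.5 ** x for x in range(1, 200))      # should be very close to 1
print(mean_from_mgf, mean_from_pmf, total_probability)
```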

QB Solutions — Unit 3: Normal Distribution

Probability calculations, moments, MGF — full solutions

Q1. The marks of 1000 students in an examination follow a normal distribution with mean 70 and standard deviation 10. Find the number of students who scored (i) less than 55, (ii) between 60 and 80, (iii) more than 90. 8M

X ~ N(70, 100), μ=70, σ=10, N=1000

(i) P(X<55): Z=(55−70)/10 = −1.5
P(X<55) = P(Z<−1.5) = 1−Φ(1.5) = 1−0.9332 = 0.0668
Students: 1000×0.0668 = 67 students
(ii) P(60<X<80): Z₁=(60−70)/10=−1, Z₂=(80−70)/10=+1
P = Φ(1)−Φ(−1) = 0.8413−0.1587 = 0.6826
Students: 1000×0.6826 = 683 students
(iii) P(X>90): Z=(90−70)/10=2.0
P(X>90) = 1−Φ(2.0) = 1−0.9772 = 0.0228
Students: 1000×0.0228 = 23 students
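
Table look-ups like these can be checked with Python's standard-library `statistics.NormalDist` (available from Python 3.8). The variable names are illustrative.

```python
from statistics import NormalDist

marks = NormalDist(mu=70, sigma=10)
total = 1000

below_55 = total * marks.cdf(55)
between_60_80 = total * (marks.cdf(80) - marks.cdf(60))
above_90 = total * (1 - marks.cdf(90))

print(round(below_55), round(between_60_80), round(above_90))
```

The counts agree with the table-based answers after rounding to whole students.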
Q2. Find the MGF of Normal Distribution X ~ N(μ, σ²) and hence find its mean and variance. 8M
M_X(t) = E(e^tX) = ∫₋∞^∞ e^tx · (1/σ√2π)·e^{−(x−μ)²/2σ²} dx
Combining exponents: tx − (x−μ)²/2σ²
Complete the square: = −[x−(μ+σ²t)]²/(2σ²) + μt + σ²t²/2
M_X(t) = e^{μt+σ²t²/2} · ∫₋∞^∞ (1/σ√2π)e^{−[x−(μ+σ²t)]²/2σ²} dx
The integral = 1 (normal PDF integrates to 1)
M_X(t) = e^{μt + σ²t²/2}
M'_X(t) = (μ+σ²t)·e^{μt+σ²t²/2} → Mean = M'_X(0) = μ
M''_X(t) = [σ²+(μ+σ²t)²]·e^{μt+σ²t²/2} → E(X²) = M''_X(0) = σ²+μ²
Variance = E(X²)−[E(X)]² = σ²+μ²−μ² = σ² ✓
Q3. For a normal distribution with mean 5 and variance 9, find (i) P(X>8), (ii) P(3<X<7), (iii) the value of x₀ such that P(X<x₀) = 0.90. 8M

X ~ N(5, 9), μ=5, σ=3

(i) P(X>8): Z=(8−5)/3 = 1.0
P(X>8)=1−Φ(1)=1−0.8413=0.1587
(ii) P(3<X<7): Z₁=(3−5)/3=−0.667, Z₂=(7−5)/3=0.667
P=Φ(0.667)−Φ(−0.667)=2Φ(0.667)−1=2(0.7476)−1=0.4952
(iii) P(X<x₀)=0.90 → Φ(z₀)=0.90 → z₀=1.28
1.28 = (x₀−5)/3 → x₀ = 5 + 3×1.28
x₀ = 5 + 3.84 = 8.84
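
Part (iii) is an inverse look-up, which `statistics.NormalDist.inv_cdf` performs directly. This sketch recomputes all three parts; small differences from the hand answers come from table rounding (z₀=1.28 versus about 1.2816).

```python
from statistics import NormalDist

dist = NormalDist(mu=5, sigma=3)        # variance 9 means sigma = 3

p_above_8 = 1 - dist.cdf(8)
p_3_to_7 = dist.cdf(7) - dist.cdf(3)
x0 = dist.inv_cdf(0.90)                 # value with 90% of the area to its left

print(f"P(X>8)   = {p_above_8:.4f}")
print(f"P(3<X<7) = {p_3_to_7:.4f}")
print(f"x0       = {x0:.3f}")
```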

QB Solutions — Unit 4: Testing of Hypothesis

t-test, F-test, Chi-square — full step-by-step solutions

Q1. A sample of 16 items gives mean = 2.5 kg and SD = 2.5 kg. Can this sample be regarded as taken from a population with mean 3 kg? Test at 5% level. 8M
H₀: μ=3, H₁: μ≠3 (two-tailed test)
n=16, x̄=2.5, s=2.5, μ₀=3, α=0.05
t = (x̄−μ₀)/(s/√n) = (2.5−3)/(2.5/√16) = −0.5/(2.5/4) = −0.5/0.625 = −0.8
|t_calc| = 0.8, df = n−1 = 15
t_table(0.05, 15df, two-tailed) = 2.131
|t_calc|=0.8 < 2.131 → Do NOT reject H₀

Conclusion: The sample could have come from a population with mean 3 kg. No significant difference at 5% level.
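
The test statistic is a one-line calculation. In the sketch below the critical value is simply the table value quoted above, entered as a constant; `one_sample_t` is a hypothetical helper name.

```python
from math import sqrt

def one_sample_t(x_bar, s, n, mu0):
    """t statistic using s / sqrt(n) as the standard error, as in the working above."""
    return (x_bar - mu0) / (s / sqrt(n))

T_CRITICAL = 2.131     # table value: 15 df, two-tailed, 5% (quoted above)

t_calc = one_sample_t(x_bar=2.5, s=2.5, n=16, mu0=3)
reject_h0 = abs(t_calc) > T_CRITICAL
print(f"t = {t_calc:.3f}, reject H0: {reject_h0}")
```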

Q2. Two types of drugs A and B were used on 5 and 7 patients respectively for reducing weight. Drug A gave mean reduction of 6.25 kg (s²=4.5) and Drug B gave 4.38 kg (s²=3.6). Test if drugs differ significantly (5%). 8M
H₀: μ₁=μ₂, H₁: μ₁≠μ₂ | n₁=5, x̄₁=6.25, s₁²=4.5 | n₂=7, x̄₂=4.38, s₂²=3.6
s_p² = [(n₁−1)s₁²+(n₂−1)s₂²]/(n₁+n₂−2) = [4×4.5+6×3.6]/10 = [18+21.6]/10 = 3.96
s_p = √3.96 = 1.99
t = (x̄₁−x̄₂)/[s_p·√(1/n₁+1/n₂)] = (6.25−4.38)/[1.99·√(1/5+1/7)]
√(1/5+1/7)=√(0.2+0.143)=√0.343=0.586
t = 1.87/(1.99×0.586) = 1.87/1.166 = 1.603
df=5+7−2=10, t_table(0.05,10df)=2.228
t_calc=1.603 < 2.228 → Do NOT reject H₀ (drugs do not differ significantly)
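
The pooled-variance t statistic can be sketched as follows, taking the given s² values as the variances to be pooled exactly as in the working above. `pooled_t` is a hypothetical helper name.

```python
from math import sqrt

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)   # pooled variance
    se = sqrt(sp2 * (1 / n1 + 1 / n2))                          # standard error of the difference
    return (mean1 - mean2) / se, n1 + n2 - 2

t_stat, df = pooled_t(5, 6.25, 4.5, 7, 4.38, 3.6)
print(f"t = {t_stat:.3f} with {df} df")
```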
Q3. In a survey of 200 persons, their opinion about a new tax policy was recorded. Test if opinion is independent of gender at 5% level. Male: For=80, Against=40 | Female: For=50, Against=30. 8M
Gender | For | Against | Total
Male | 80 (O) | 40 (O) | 120
Female | 50 (O) | 30 (O) | 80
Total | 130 | 70 | 200
H₀: Gender and opinion are independent
E(M,For)=120×130/200=78, E(M,Ag)=120×70/200=42
E(F,For)=80×130/200=52, E(F,Ag)=80×70/200=28
χ²=(80−78)²/78+(40−42)²/42+(50−52)²/52+(30−28)²/28
= 4/78 + 4/42 + 4/52 + 4/28 = 0.051+0.095+0.077+0.143
= 0.366
df=(2−1)(2−1)=1, χ²_table(0.05,1df)=3.841
0.366 < 3.841 → Do NOT reject H₀ (opinion is independent of gender)
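
The expected-frequency and χ² steps generalise to any r×c table. A minimal sketch, with `chi_square_independence` as an assumed name:

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand   # expected count under independence
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

opinion = [[80, 40],    # Male: For, Against
           [50, 30]]    # Female: For, Against
chi2, chi_df = chi_square_independence(opinion)
print(f"chi-square = {chi2:.3f}, df = {chi_df}")
```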

QB Solutions — Unit 5: Design of Experiments

CRD, RBD, LSD — full ANOVA table solutions

Q1. The following data represents yield (kg) of crops under 4 treatments in 5 replications (CRD). Perform one-way ANOVA and test at 5% LOS. T1: 6,7,5,6,4 | T2: 8,6,7,9,7 | T3: 5,4,6,4,5 | T4: 9,10,8,9,10. 8M
T1 | T2 | T3 | T4
6 | 8 | 5 | 9
7 | 6 | 4 | 10
5 | 7 | 6 | 8
6 | 9 | 4 | 9
4 | 7 | 5 | 10
T₁=28 | T₂=37 | T₃=24 | T₄=46
k=4, n=5 each, N=20, T=28+37+24+46=135
CF=135²/20=18225/20=911.25
TSS=(36+49+25+36+16+64+36+49+81+49+25+16+36+16+25+81+100+64+81+100)−CF
=(985)−911.25=73.75
SST=(28²+37²+24²+46²)/5−CF=(784+1369+576+2116)/5−911.25
=4845/5−911.25=969−911.25=57.75
SSE=73.75−57.75=16.00
MST=57.75/3=19.25 | MSE=16/16=1.00
F=19.25/1.00=19.25 | F_table(3,16) at 5%=3.24
F_calc=19.25 > 3.24 → REJECT H₀ — Treatment effects differ significantly
Source | SS | df | MS | F
Treatment | 57.75 | 3 | 19.25 | 19.25*
Error | 16.00 | 16 | 1.00
Total | 73.75 | 19
Q2. Three varieties of wheat (V1, V2, V3) are tested in 4 blocks (RBD). Yields: B1: 48,42,44 | B2: 50,44,46 | B3: 52,46,48 | B4: 46,40,42. Perform two-way ANOVA at 5%. 8M
Block | V1 | V2 | V3 | Block Total
B1 | 48 | 42 | 44 | 134
B2 | 50 | 44 | 46 | 140
B3 | 52 | 46 | 48 | 146
B4 | 46 | 40 | 42 | 128
Treat Total | 196 | 172 | 180 | T=548
k=3 varieties, b=4 blocks, N=12
CF=548²/12=300304/12=25025.33
TSS=(48²+42²+44²+50²+44²+46²+52²+46²+48²+46²+40²+42²)−CF
=(2304+1764+1936+2500+1936+2116+2704+2116+2304+2116+1600+1764)−25025.33
=25160−25025.33=134.67
SST=(196²+172²+180²)/4−CF=(38416+29584+32400)/4−25025.33
=100400/4−25025.33=25100−25025.33=74.67
SSB=(134²+140²+146²+128²)/3−CF=(17956+19600+21316+16384)/3−25025.33
=75256/3−25025.33=25085.33−25025.33=60.00
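SSE=134.67−74.67−60.00=0.00 (the data are perfectly additive: each block differs from B1 by the same amount for every variety)
MST=74.67/2=37.33, MSB=60.00/3=20.00, MSE=0/6=0 → the F ratios are undefined (division by zero)
With no residual variation, all observed variation is attributed to varieties and blocks; both effects are taken as significant, although no formal comparison with the F table is possible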
Q3. Explain the Latin Square Design with a 4×4 example. State its advantages and ANOVA table structure. 8M

Definition: LSD is a p×p arrangement where p treatments appear exactly once in each row and column, controlling two extraneous variables simultaneously.

Example 4×4 LSD layout:

Row \ Col | Col1 | Col2 | Col3 | Col4
Row1 | A | B | C | D
Row2 | B | C | D | A
Row3 | C | D | A | B
Row4 | D | A | B | C

ANOVA Table for 4×4 LSD:

Source | SS | df | MS | F
Rows | SSR | 3 | MSR | MSR/MSE
Columns | SSC | 3 | MSC | MSC/MSE
Treatments | SST | 3 | MST | MST/MSE
Error | SSE | 6 | MSE
Total | TSS | 15

Advantages: Controls two sources of variation; smaller error MS → more sensitive test; efficient when p is small (3–8).

Limitations: Requires p² observations; number of treatments equals rows = columns; assumes no interaction.

QB Solutions — Unit 6: Statistical Quality Control

Control charts with full limit calculations and process-in-control checks

Q1. Samples of size 4 are drawn every hour from a process. The mean and range values for 10 samples are given below. Construct X̄ and R charts and comment on control. X̄: 14.5,14.8,15.2,15.0,14.6,14.9,15.1,14.7,15.3,14.9 | R: 0.5,0.6,0.4,0.7,0.5,0.6,0.5,0.4,0.6,0.4. 8M
ΣX̄=14.5+14.8+15.2+15.0+14.6+14.9+15.1+14.7+15.3+14.9=149.0
X̄̄=149.0/10=14.90
ΣR=0.5+0.6+0.4+0.7+0.5+0.6+0.5+0.4+0.6+0.4=5.2
R̄=5.2/10=0.52
For n=4: A₂=0.729, D₃=0, D₄=2.282
X̄-chart: UCL=14.90+0.729×0.52=14.90+0.379=15.279
X̄-chart: LCL=14.90−0.379=14.521, CL=14.90
R-chart: UCL=2.282×0.52=1.187, LCL=0, CL=0.52
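Sample 1: X̄=14.5 < LCL=14.521 → OUT OF CONTROL
Sample 9: X̄=15.3 > UCL=15.279 → OUT OF CONTROL
All R values in [0, 1.187] → R-chart: IN CONTROL

Conclusion: Variability is in control, but the process mean is not. Samples 1 and 9 fall outside the X̄-chart limits, so assignable causes should be investigated.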

Q2. The following table gives the number of defectives in 10 samples each of size 50. Draw the p-chart and state if the process is in control. Defectives: 3,5,2,6,4,3,7,4,5,3. 8M
n=50, k=10, Σd=3+5+2+6+4+3+7+4+5+3=42
p̄=42/(10×50)=42/500=0.084
q̄=1−0.084=0.916
√(p̄q̄/n)=√(0.084×0.916/50)=√(0.07694/50)=√0.001539=0.03923
UCL=0.084+3×0.03923=0.084+0.1177=0.2017
LCL=0.084−0.1177=−0.0337 → take 0
CL=0.084, UCL=0.2017, LCL=0

Sample proportions: 0.06, 0.10, 0.04, 0.12, 0.08, 0.06, 0.14, 0.08, 0.10, 0.06

All proportions lie within [0, 0.2017]. Process is IN CONTROL.

Q3. The number of defects observed in 12 units of cloth (each 50m length) are: 3,4,2,5,6,3,4,2,3,5,4,3. Construct c-chart and check for statistical control. 8M
k=12, Σc=3+4+2+5+6+3+4+2+3+5+4+3=44
c̄=44/12=3.667
√c̄=√3.667=1.914
UCL=3.667+3×1.914=3.667+5.742=9.409
LCL=3.667−5.742=−2.075 → take 0
CL=3.667, UCL=9.409, LCL=0

All defect counts (max=6) lie within [0, 9.409]. Process is IN CONTROL. No assignable causes detected.

End Semester Examination — Apr/May 2023
U18MAI4201 — Probability & Statistics | Regulation: 2018
ESE Apr/May 2023 Max: 100 Marks Time: 3 Hours
Part A Answer ALL — 2M each (10 × 2 = 20 Marks)
1. Define Spearman's rank correlation. Write its formula.
2M

A non-parametric measure of correlation between two ranked variables. Formula (no ties):

ρ = 1 − 6Σd²/[n(n²−1)]

d = difference in ranks; n = number of pairs. Range: −1 ≤ ρ ≤ +1.

2. When do the two regression lines coincide?
2M

The two regression lines coincide when r = +1 or r = −1 (perfect linear correlation). Both lines always pass through (x̄, ȳ).

3. Define probability density function (PDF) and state its properties.
2M

A function f(x) for a continuous RV X such that: f(x) ≥ 0, ∫₋∞^∞ f(x)dx = 1, P(a≤X≤b) = ∫ₐᵇ f(x)dx.

4. State Bayes' theorem.
2M

If B₁,...,Bₙ are mutually exclusive and exhaustive events and A is any event:

P(Bₖ|A) = P(A|Bₖ)·P(Bₖ) / Σᵢ P(A|Bᵢ)·P(Bᵢ)
5. Define Moment Generating Function (MGF).
2M
M_X(t) = E(e^{tX}) = Σ e^{tx}·p(x) [discrete] or ∫ e^{tx}·f(x) dx [continuous]

The r-th raw moment: μ'ᵣ = [dʳM_X(t)/dtʳ]_{t=0}

6. State the properties of Normal distribution and the 68-95-99.7 area rule.
2M
  • Bell-shaped, symmetric about μ; Mean = Median = Mode = μ; Skewness = 0; Kurtosis β₂ = 3
  • P(μ±σ) = 68.27%; P(μ±2σ) = 95.45%; P(μ±3σ) = 99.73%
7. Define level of significance and critical region.
2M

Level of significance (α): Probability of rejecting H₀ when it is true (Type I error). Common: 5% (α=0.05), 1% (α=0.01).

Critical region: Set of values of test statistic for which H₀ is rejected.

8. Define Latin Square Design (LSD). State its uses.
2M

An experimental design with k² units in k rows and k columns where each treatment appears exactly once in each row and column. Used to eliminate two sources of variation simultaneously.

9. Write the control limits for X̄ chart and R chart.
2M
Chart | UCL | CL | LCL
X̄ | X̄̄ + A₂R̄ | X̄̄ | X̄̄ − A₂R̄
R | D₄R̄ | R̄ | D₃R̄
10. Distinguish between assignable causes and chance causes in SQC.
2M
Chance Causes | Assignable Causes
Always present; random; cannot be eliminated | Specific, identifiable; can be detected and removed
Small, inevitable variation | Large, avoidable variation
Process under control | Process out of control
Part B Answer any FIVE — 8M/16M each (5 × 16 = 80 Marks)
11a. Calculate Spearman's rank correlation (X: 10,15,12,17,13,16,24,14,22 | Y: 30,42,45,46,33,34,40,35,39)
8M

Q: Calculate Spearman's rank correlation for the data:
X: 10, 15, 12, 17, 13, 16, 24, 14, 22
Y: 30, 42, 45, 46, 33, 34, 40, 35, 39

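X | Y | Rank X (Rx) | Rank Y (Ry) | d = Rx−Ry | d²
10 | 30 | 1 | 1 | 0 | 0
15 | 42 | 5 | 7 | −2 | 4
12 | 45 | 2 | 8 | −6 | 36
17 | 46 | 7 | 9 | −2 | 4
13 | 33 | 3 | 2 | 1 | 1
16 | 34 | 6 | 3 | 3 | 9
24 | 40 | 9 | 6 | 3 | 9
14 | 35 | 4 | 4 | 0 | 0
22 | 39 | 8 | 5 | 3 | 9
Σd² = 72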
n = 9, Σd² = 72 (no ties — all ranks distinct)
ρ = 1 − 6Σd² / [n(n²−1)] = 1 − 6×72 / [9×(81−1)]
= 1 − 432/720 = 1 − 0.6
ρ = 0.4 (Moderate positive correlation)
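
The ranking and Σd² steps can be automated as a check. The sketch below assumes no tied values (as in this data) and ranks in ascending order, matching the table; `ranks_ascending` and `spearman_rho` are hypothetical helper names.

```python
def ranks_ascending(values):
    """Rank 1 = smallest value. Assumes all values are distinct (no ties)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Return (sum of d^2, rho) using the no-ties formula."""
    rx, ry = ranks_ascending(xs), ranks_ascending(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return sum_d2, 1 - 6 * sum_d2 / (n * (n * n - 1))

x = [10, 15, 12, 17, 13, 16, 24, 14, 22]
y = [30, 42, 45, 46, 33, 34, 40, 35, 39]
sum_d2, rho = spearman_rho(x, y)
print(f"sum d^2 = {sum_d2}, rho = {rho:.2f}")
```

Ranking both variables in the same direction (both ascending or both descending) gives the same ρ; mixing directions flips its sign.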
12a. Bayes' Theorem — Machines A, B, C produce 25%, 35%, 40% output. Defective rates 5%, 4%, 2%. Find P(machine | defective item).
8M

Q: Machines A, B, C produce 25%, 35%, 40% of output respectively. Defective rates: A→5%, B→4%, C→2%. An item drawn at random is found defective. Find probability it came from A, B, or C.

Machine | P(M) | P(D|M) | P(M)·P(D|M)
A | 0.25 | 0.05 | 0.0125
B | 0.35 | 0.04 | 0.0140
C | 0.40 | 0.02 | 0.0080
P(D) = Total | | | 0.0345
P(D) = 0.0125 + 0.0140 + 0.0080 = 0.0345
P(A|D) = 0.0125 / 0.0345 = 0.3623 (36.23%)
P(B|D) = 0.0140 / 0.0345 = 0.4058 (40.58%)
P(C|D) = 0.0080 / 0.0345 = 0.2319 (23.19%)
Most likely from Machine B (40.58%)
12b. MGF of Geometric Distribution — P(X=x) = (1/2)^x. Find MGF and mean.
8M

Q: A RV X has PMF P(X=x) = 1/2ˣ, x = 1, 2, 3, … Find the MGF and hence the mean.

M_X(t) = E(e^{tX}) = Σ_{x=1}^{∞} e^{tx} · (1/2^x)
= Σ_{x=1}^{∞} (e^t/2)^x
This is a geometric series whose first term and common ratio are both e^t/2. It converges for |e^t/2| < 1, i.e. t < ln 2.
Sum = (e^t/2) / (1 − e^t/2) = e^t / (2 − e^t)
M_X(t) = e^t / (2 − e^t)
Mean: μ'₁ = M'_X(0)
M'_X(t) = [(2−e^t)e^t − e^t(−e^t)] / (2−e^t)²
= [2e^t − e^{2t} + e^{2t}] / (2−e^t)² = 2e^t / (2−e^t)²
M'_X(0) = 2(1) / (2−1)² = 2/1
Mean = E(X) = 2
13. Two-Sample t-Test — Sample 1: n₁=10, x̄₁=15, SS₁=90 | Sample 2: n₂=12, x̄₂=14, SS₂=108. Test at 5%.
16M

Q: Two independent samples are drawn. Test if they come from the same population (α = 5%).
Sample 1: n₁=10, x̄₁=15, Σ(x₁−x̄₁)²=90
Sample 2: n₂=12, x̄₂=14, Σ(x₂−x̄₂)²=108

H₀: μ₁ = μ₂ (no significant difference)
H₁: μ₁ ≠ μ₂ (two-tailed test)
Pooled variance: s_p² = (SS₁ + SS₂) / (n₁+n₂−2) = (90+108) / (10+12−2) = 198/20 = 9.9
s_p = √9.9 = 3.146
SE = s_p × √(1/n₁ + 1/n₂) = 3.146 × √(1/10 + 1/12) = 3.146 × √(0.1833) = 3.146 × 0.4282 = 1.347
t = (x̄₁ − x̄₂) / SE = (15 − 14) / 1.347 = 1/1.347 = 0.742
df = n₁+n₂−2 = 20; t_table (0.05, 20 df, two-tail) = 2.086
t_calc = 0.742 < t_table = 2.086 → Fail to reject H₀
Conclusion: No significant difference between the two population means.
14. Latin Square Design — 4×4 LSD, paddy yield (T=1950, N=16). Construct ANOVA table and test at 5%.
16M

Q: Paddy yield (kg/plot) under 4 treatments in 4×4 LSD. Grand total T = 1950, N = 16. Construct ANOVA table and test at 5%.

CF = T²/N = 1950²/16 = 3,802,500/16 = 237,656.25
TSS = ΣΣy² − CF = (computed from data) = 35.75
SSR (Rows) = (sum of row totals²)/k − CF = 24.75
SSC (Columns) = (sum of col totals²)/k − CF = 2.75
SST (Treatments) = (sum of treatment totals²)/k − CF = 4.25
SSE = TSS − SSR − SSC − SST = 35.75 − 24.75 − 2.75 − 4.25 = 4.0
Source | SS | df | MS | F_calc
Rows | 24.75 | 3 | 8.25
Columns | 2.75 | 3 | 0.917
Treatments | 4.25 | 3 | 1.417 | 1.417/0.667 = 2.124
Error | 4.00 | 6 | 0.667
Total | 35.75 | 15
F_table (0.05; df₁=3, df₂=6) = 4.76
F_calc = 2.124 < F_table = 4.76 → Fail to reject H₀
Conclusion: No significant difference among the 4 treatments.
15. X̄ and R Control Charts — 10 samples of n=5. Construct charts; identify out-of-control samples.
16M

Q: The following data gives the means and ranges of 10 samples (n=5 per sample). Construct X̄ and R charts. State which samples are out of control.

Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
X̄ | 43.0 | 44.0 | 40.8 | 43.0 | 43.5 | 43.0 | 43.0 | 44.0 | 42.0 | 42.1
R | 3 | 3 | 4 | 3 | 3 | 2 | 3 | 3 | 3 | 3
ΣX̄ = 428.4, X̄̄ = 428.4/10 = 42.84
ΣR = 30, R̄ = 30/10 = 3.0
For n=5: A₂ = 0.577, D₃ = 0, D₄ = 2.114
X̄ Chart: UCL = 42.84 + 0.577×3.0 = 44.571
CL = 42.84; LCL = 42.84 − 1.731 = 41.109
R Chart: UCL = 2.114×3.0 = 6.342; CL = 3.0; LCL = 0
Sample 3: X̄ = 40.8 < LCL = 41.109 → OUT OF CONTROL
All R values ≤ 4 < UCL=6.342 → R chart: all in control
Action: Investigate Sample 3 — assignable cause present
16a. Normal Distribution — Bulb lifetime N(μ=2040, σ=60). Find count in ranges from 2000 bulbs.
8M

Q: The lifetime of electric bulbs follows N(μ=2040, σ=60) hours. From 2000 bulbs, find: (i) bulbs lasting >2150 hrs, (ii) bulbs lasting <1950 hrs, (iii) bulbs lasting 1920–2160 hrs.

Standardise: z = (X − μ)/σ = (X − 2040)/60. Here φ(z) denotes the table area between 0 and z.
(i) P(X > 2150): z = (2150−2040)/60 = 110/60 = 1.833
P(z > 1.833) = 0.5 − φ(1.83) = 0.5 − 0.4664 = 0.0336
No. of bulbs = 2000 × 0.0336 ≈ 67 bulbs
(ii) P(X < 1950): z = (1950−2040)/60 = −90/60 = −1.5
P(z < −1.5) = 0.5 − φ(1.5) = 0.5 − 0.4332 = 0.0668
No. of bulbs = 2000 × 0.0668 ≈ 134 bulbs
(iii) P(1920 < X < 2160):
z₁ = (1920−2040)/60 = −2.0; z₂ = (2160−2040)/60 = +2.0
P(−2 < z < 2) = 2×φ(2) = 2×0.4772 = 0.9544
No. of bulbs = 2000 × 0.9544 ≈ 1909 bulbs
End Semester Examination — Nov/Dec 2023
U18MAI4201 — Probability & Statistics | Regulation: 2018
ESE Nov/Dec 2023 Max: 100 Marks Time: 3 Hours
Part A Answer ALL — 2M each (10 × 2 = 20 Marks)
1. State the relationship between regression coefficients and correlation coefficient.
2M
r = ±√(b_{yx}·b_{xy})  where  b_{yx} = r·σ_y/σ_x, b_{xy} = r·σ_x/σ_y

Both b values have same sign as r. AM of regression coeff ≥ |r|.

2. Define conditional probability and state multiplication theorem.
2M
P(A|B) = P(A∩B)/P(B), P(B)≠0

Multiplication theorem: P(A∩B) = P(A)·P(B|A) = P(B)·P(A|B)

3. Define CDF of a discrete random variable. State its properties.
2M

F(x) = P(X ≤ x) = Σ_{t≤x} p(t). Properties: 0 ≤ F(x) ≤ 1; F(−∞)=0, F(+∞)=1; non-decreasing; right-continuous.

4. Define expected value and variance of a random variable.
2M
E(X) = Σ x·p(x) [discrete]; Var(X) = E(X²) − [E(X)]²
5. State the conditions for applying the t-test.
2M
  • Population must be normally distributed
  • Sample size small (n < 30) — large n use z-test
  • Population SD unknown (estimated from sample)
  • Two-sample: both populations have equal variances
6. Define null hypothesis and alternate hypothesis.
2M

H₀ (Null hypothesis): Statement of no effect or no difference; assumed true initially.

H₁ (Alternate hypothesis): Accepted when H₀ is rejected; what we are trying to prove.

7. State the advantages of Randomized Block Design (RBD) over CRD.
2M
  • RBD eliminates one source of variation (blocks) → more precise results
  • More efficient when experimental material is heterogeneous
  • Handles missing observations more easily
8. Write the ANOVA table for one-way classification (CRD).
2M
Source | SS | df | MS | F
Between | SSC | k−1 | MSC | MSC/MSE
Within (Error) | SSE | N−k | MSE
Total | SST | N−1
9. Define c-chart. When is it used?
2M

Control chart for the number of defects per unit. Used when multiple defects can occur in a single item and subgroup size is constant. Based on Poisson distribution.

UCL = c̄ + 3√c̄ | CL = c̄ | LCL = c̄ − 3√c̄ (≥0)
10. Distinguish between p-chart and np-chart.
2M
p-chart | np-chart
Plots fraction (proportion) defective | Plots number of defectives
Can be used when subgroup size varies | Used when subgroup size is constant
Limits: p̄ ± 3√(p̄q̄/n) | Limits: np̄ ± 3√(np̄q̄)
Part B Answer any FIVE — 8M/16M each
11b. Regression from Equations — 4x−5y+33=0 and 20x−9y=107, Var(X)=25. Find x̄, ȳ, r, σ_y.
8M

Q: Two regression lines are 4x − 5y + 33 = 0 and 20x − 9y = 107. Variance of X = 25. Find: x̄, ȳ, r, σ_y.

Step 1 — Find means (both lines pass through (x̄, ȳ)):
4x − 5y = −33 … (1); 20x − 9y = 107 … (2)
(1) × 5: 20x − 25y = −165; Subtract (2): −16y = −272 → ȳ = 17
From (1): 4x = 5(17)−33 = 52 → x̄ = 13
Step 2 — Identify regression lines:
Line of y on x: 4x−5y+33=0 → 5y=4x+33 → y=(4/5)x+33/5; b_{yx}=4/5=0.8
Line of x on y: 20x−9y=107 → 20x=9y+107; b_{xy}=9/20=0.45
Step 3 — Correlation: r = ±√(b_{yx}·b_{xy}) = √(0.8×0.45) = √0.36
r = 0.6 (positive, since both b values are positive)
Step 4 — Find σ_y: b_{yx} = r·σ_y/σ_x; σ_x = √25 = 5
0.8 = 0.6 × σ_y/5 → σ_y = 0.8×5/0.6 = 4/0.6
σ_y = 20/3 ≈ 6.67
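The four steps above can be checked numerically. This is a minimal sketch in plain Python; the helper name `solve_2x2` is hypothetical, and the assignment of each line to "y on x" or "x on y" is taken from Step 2 rather than derived by the code.

```python
from fractions import Fraction
from math import sqrt

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# Lines written as 4x - 5y = -33 and 20x - 9y = 107
x_bar, y_bar = solve_2x2(4, -5, -33, 20, -9, 107)

b_yx = Fraction(4, 5)    # slope of the y-on-x line (Step 2)
b_xy = Fraction(9, 20)   # slope of the x-on-y line (Step 2)
r = sqrt(b_yx * b_xy)    # positive root, since both slopes are positive

sigma_x = sqrt(25)                   # Var(X) = 25
sigma_y = float(b_yx) * sigma_x / r  # from b_yx = r * sigma_y / sigma_x

print(x_bar, y_bar, round(r, 3), round(sigma_y, 2))
```

Using `Fraction` for the means keeps x̄ and ȳ exact, which makes it easy to see that both lines really do pass through the same point.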
15. X̄–R Chart (10 samples n=5) + c-Chart (12 lots of 400 items). Construct both, identify out-of-control.
16M

Part i — X̄–R Chart (n=5 per sample):

Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
X̄ | 43 | 49 | 37 | 44 | 45 | 37 | 51 | 46 | 43 | 47
R | 5 | 6 | 5 | 7 | 7 | 4 | 8 | 6 | 4 | 6
X̄̄ = ΣX̄/10 = 442/10 = 44.2; R̄ = ΣR/10 = 58/10 = 5.8
For n=5: A₂=0.577, D₃=0, D₄=2.114
X̄ Chart: UCL = 44.2 + 0.577×5.8 = 44.2+3.347 = 47.547
CL = 44.2; LCL = 44.2 − 3.347 = 40.853
R Chart: UCL = 2.114×5.8 = 12.261; LCL = 0
Out of control (X̄ chart):
Sample 2: X̄=49 > UCL=47.547 ❌
Sample 3: X̄=37 < LCL=40.853 ❌
Sample 6: X̄=37 < LCL=40.853 ❌
Sample 7: X̄=51 > UCL=47.547 ❌

Part ii — c-Chart (defects per lot, k=12 lots of 400 items):

Number of defects: 24, 36, 35, 25, 25, 30, 42, 52, 20, 16, 20, 24

Σc = 24+36+35+25+25+30+42+52+20+16+20+24 = 349
c̄ = 349/12 = 29.08
UCL = c̄ + 3√c̄ = 29.08 + 3×5.393 = 29.08+16.18 = 45.26
LCL = c̄ − 3√c̄ = 29.08 − 16.18 = 12.90
Lot 8: c=52 > UCL=45.26 → OUT OF CONTROL
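The X̄–R limits from Part i can be reproduced with a few lines of Python. This is only a sketch: the constants A₂, D₃, D₄ are the n=5 values quoted in the solution, and the variable names are illustrative.

```python
# Sample means and ranges from Part i (subgroup size n = 5)
means = [43, 49, 37, 44, 45, 37, 51, 46, 43, 47]
ranges = [5, 6, 5, 7, 7, 4, 8, 6, 4, 6]

# Control-chart constants for n = 5, as quoted in the solution
A2, D3, D4 = 0.577, 0.0, 2.114

grand_mean = sum(means) / len(means)
r_bar = sum(ranges) / len(ranges)

ucl_x = grand_mean + A2 * r_bar
lcl_x = grand_mean - A2 * r_bar
ucl_r = D4 * r_bar
lcl_r = D3 * r_bar

# 1-based sample numbers that fall outside the limits
out_x = [i for i, m in enumerate(means, start=1) if m > ucl_x or m < lcl_x]
out_r = [i for i, r in enumerate(ranges, start=1) if r > ucl_r or r < lcl_r]

print(round(ucl_x, 3), round(lcl_x, 3), round(ucl_r, 3), out_x, out_r)
```

The R chart is checked as well as the X̄ chart, since a process is judged in control only when both charts show no violations.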
End Semester Examination — Nov/Dec 2024
U18MAI4201 — Probability & Statistics | Regulation: 2018
ESE Nov/Dec 2024 Max: 100 Marks Time: 3 Hours
Part A Answer ALL — 2M each (10 × 2 = 20 Marks)
1. Define Karl Pearson's coefficient of correlation. State its limits.
2M
r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)²·Σ(y−ȳ)²]

Measures degree and direction of linear relationship. −1 ≤ r ≤ +1.

2. State the properties of regression coefficients.
2M
  • r = ±√(b_{yx}·b_{xy}); both have same sign as r
  • AM of regression coeff ≥ |r|: (b_{yx}+b_{xy})/2 ≥ |r|
  • If one regression coeff >1, the other must be <1
3. Define mutually exclusive events. Give an example.
2M

Events A and B are mutually exclusive if A∩B = φ. P(A∪B) = P(A)+P(B).

Example: Head (H) and Tail (T) in a single coin toss.

4. Define mean and variance of a Poisson distribution.
2M

P(X=x) = e^{−λ}·λˣ/x!, x=0,1,2,…

Mean = λ   Variance = λ   (Mean = Variance)
5. Write the formula for chi-square test and state the degrees of freedom.
2M
χ² = Σ(O−E)²/E   df = (r−1)(c−1) for r×c contingency table

E_{ij} = (Row_i total × Col_j total) / Grand total

6. Define Type I and Type II errors.
2M
Decision | H₀ True | H₀ False
Reject H₀ | Type I error (α) | Correct (Power)
Accept H₀ | Correct | Type II error (β)
7. State the three principles of experimental design.
2M
  • Replication: Repeat each treatment to estimate experimental error
  • Randomization: Assign treatments randomly to eliminate bias
  • Local Control: Group units into homogeneous blocks
8. Define F-test (variance ratio test).
2M

Tests if two populations have equal variances. H₀: σ₁²=σ₂².

F = s₁²/s₂² (s₁² ≥ s₂²); df₁=n₁−1, df₂=n₂−1
9. Define Statistical Process Control (SPC).
2M

Uses statistical methods to monitor and control a manufacturing process. Key tools: control charts, histograms, Pareto charts. Purpose: detect out-of-control conditions early.

10. Write the control limits for the p-chart.
2M
p̄ = Total defectives / Total inspected   q̄ = 1−p̄
UCL = p̄ + 3√(p̄q̄/n)   CL = p̄   LCL = p̄ − 3√(p̄q̄/n) [≥0]
Part B Answer any FIVE — 8M/16M each
16a. Karl Pearson's Correlation — X (age): 65,67,66,71,67,70,68,69 | Y (BP): 67,68,68,70,64,67,72,70
8M

Q: Find the correlation coefficient between X (age) and Y (BP) for 8 persons:

X: 65, 67, 66, 71, 67, 70, 68, 69
Y: 67, 68, 68, 70, 64, 67, 72, 70

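X | Y | x=X−x̄ | y=Y−ȳ | xy | x² | y²
65 | 67 | −2.875 | −1.25 | 3.594 | 8.266 | 1.563
67 | 68 | −0.875 | −0.25 | 0.219 | 0.766 | 0.063
66 | 68 | −1.875 | −0.25 | 0.469 | 3.516 | 0.063
71 | 70 | 3.125 | 1.75 | 5.469 | 9.766 | 3.063
67 | 64 | −0.875 | −4.25 | 3.719 | 0.766 | 18.063
70 | 67 | 2.125 | −1.25 | −2.656 | 4.516 | 1.563
68 | 72 | 0.125 | 3.75 | 0.469 | 0.016 | 14.063
69 | 70 | 1.125 | 1.75 | 1.969 | 1.266 | 3.063
Total: 543 | 546 | 0 | 0 | 13.25 | 28.875 | 41.5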
x̄ = 543/8 = 67.875; ȳ = 546/8 = 68.25
r = Σxy / √(Σx²·Σy²) = 13.25 / √(28.875 × 41.5)
= 13.25 / √(1198.3125) = 13.25 / 34.616
r ≈ 0.383 (Weak positive correlation)
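The deviation-from-mean method used in the table can be written as a short function. This is a minimal sketch; `pearson_r` is an illustrative name, not a library call.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

age = [65, 67, 66, 71, 67, 70, 68, 69]
bp = [67, 68, 68, 70, 64, 67, 72, 70]

r = pearson_r(age, bp)
print(round(r, 3))
```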
End Semester Examination — Nov/Dec 2025
U18MAI4201 — Probability & Statistics | Regulation: 2018
ESE Nov/Dec 2025 Max: 100 Marks Time: 3 Hours
Part A Answer ALL — 2M each (10 × 2 = 20 Marks)
1. Define correlation. Distinguish positive and negative correlation.
2M

Statistical measure of the linear relationship between two variables. Range: −1 ≤ r ≤ +1.

  • Positive: Both variables increase together (r > 0)
  • Negative: One increases as other decreases (r < 0)
2. Write the normal equations for fitting a straight line y = a + bx.
2M
Σy = na + bΣx    Σxy = aΣx + bΣx²

Solve simultaneously for a (intercept) and b (slope).

3. State addition theorem of probability for two events.
2M
P(A∪B) = P(A) + P(B) − P(A∩B)

For mutually exclusive events: P(A∪B) = P(A) + P(B)

4. Define Binomial distribution. State its mean and variance.
2M

P(X=x) = ⁿCₓ · pˣ · qⁿ⁻ˣ, x=0,1,...,n; q=1−p

Mean = np    Variance = npq    SD = √(npq)
5. Write the formula for standard normal variate (z-score).
2M
z = (X − μ) / σ

Z ~ N(0,1); z-values read from standard normal table. Area under curve = probability.

6. Define paired t-test. When is it used?
2M

Used when two samples are related (before-after studies, matched pairs).

t = d̄/(s_d/√n), df = n−1    where d = difference in pairs
7. Write the ANOVA table for two-way classification (RBD).
2M
Source | SS | df | MS | F
Blocks | SSR | r−1 | MSR | MSR/MSE
Treatments | SSC | k−1 | MSC | MSC/MSE
Error | SSE | (r−1)(k−1) | MSE
Total | SST | rk−1
8. Define 3-sigma limits. State their importance.
2M

Control limits set at μ ± 3σ. Points outside = OUT OF CONTROL (false alarm probability = 0.27%).

9. Write the control limits for np-chart.
2M
UCL = np̄ + 3√(np̄q̄)    CL = np̄    LCL = np̄ − 3√(np̄q̄)

np̄ = average defectives per subgroup; q̄ = 1 − p̄.

10. Define quality and quality control.
2M

Quality: Degree to which a product meets specified requirements.

Quality Control: Activities to ensure products conform to quality standards.

Part B Answer any FIVE — 8M/16M each (5 × 16 = 80 Marks)
11a. Karl Pearson's Correlation — n=5: x: 1,2,3,4,5 | y: 1,1,2,4,5
8M
x | y | xy | x² | y²
1 | 1 | 1 | 1 | 1
2 | 1 | 2 | 4 | 1
3 | 2 | 6 | 9 | 4
4 | 4 | 16 | 16 | 16
5 | 5 | 25 | 25 | 25
Total: 15 | 13 | 50 | 55 | 47
Numerator: nΣxy − ΣxΣy = 5×50 − 15×13 = 55
√[(nΣx²−(Σx)²)(nΣy²−(Σy)²)] = √[(50)(66)] = √3300 = 57.45
r = 55/57.45 ≈ 0.957 (Strong positive correlation)
15. p-Chart — 10 samples of n=50, defectives: 5,4,6,8,7,5,6,8,7,5
16M
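Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Defectives | 5 | 4 | 6 | 8 | 7 | 5 | 6 | 8 | 7 | 5
p=d/n | 0.10 | 0.08 | 0.12 | 0.16 | 0.14 | 0.10 | 0.12 | 0.16 | 0.14 | 0.10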
p̄ = 61/(10×50) = 0.122; q̄ = 0.878
√(p̄q̄/n) = √(0.122×0.878/50) = 0.04629
UCL = 0.122 + 3×0.04629 = 0.122 + 0.139 = 0.261
CL = 0.122; LCL = max(0, 0.122−0.139) = 0
All sample p-values (max=0.16) within limits. Process is in control.
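The p-chart limits can be checked with the sketch below. It assumes a constant subgroup size n = 50, as in the question; the variable names are illustrative.

```python
from math import sqrt

defectives = [5, 4, 6, 8, 7, 5, 6, 8, 7, 5]
n = 50  # items inspected per sample

p_bar = sum(defectives) / (n * len(defectives))
sigma_p = sqrt(p_bar * (1 - p_bar) / n)

ucl = p_bar + 3 * sigma_p
lcl = max(0.0, p_bar - 3 * sigma_p)  # a proportion cannot be negative

p_values = [d / n for d in defectives]
out_of_control = [i for i, p in enumerate(p_values, start=1)
                  if p > ucl or p < lcl]

print(round(p_bar, 3), round(ucl, 3), lcl, out_of_control)
```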
Continuous Assessment Test I — April 2023
U18MAI4201 — Probability & Statistics | Regulation: 2018
CAT-I Apr 2023 Max: 50 Marks Time: 2 Hours
Part A Answer ALL — 2M each (5 × 2 = 10 Marks)
1. Define Karl Pearson's coefficient of correlation.
2M
r = Σ(x−x̄)(y−ȳ) / √[Σ(x−x̄)²·Σ(y−ȳ)²]

Measures degree and direction of linear relationship. −1 ≤ r ≤ +1.

2. State the properties of regression lines.
2M
  • Both lines pass through (x̄, ȳ)
  • Lines coincide if |r|=1; perpendicular if r=0
  • r = ±√(b_yx · b_xy)
3. Distinguish between rank correlation and Karl Pearson's correlation.
2M
Karl Pearson's | Spearman's Rank
Parametric; requires normality | Non-parametric; no assumption
Works on actual values | Works on ranks
4. Write the normal equations for fitting y = a + bx² (parabola).
2M
Σy = na + bΣx²    Σx²y = aΣx² + bΣx⁴
5. State any two properties of correlation coefficient.
2M
  • r is a pure number; −1 ≤ r ≤ +1
  • r is independent of change of origin and scale
  • r = ±1 for perfect linear; r = 0 for no linear relationship
Part B Answer any TWO — 8M/16M each
Q. Regression from Two Equations — 8x−10y+66=0 and 40x−18y=214, Var(X)=9. Find x̄, ȳ, b_yx, b_xy, r, σ_y.
16M
Step 1 — Find means (intersection = (x̄, ȳ)):
(1)×5: 40x−50y=−330; subtract (2): −32y=−544 → ȳ = 17
From (1): 8x=10(17)−66=104 → x̄ = 13
Step 2 — b_yx: y on x: 8x−10y+66=0 → y=(4/5)x+33/5 → b_yx = 0.8
Step 3 — b_xy: x on y: 40x=18y+214 → x=(9/20)y+107/20 → b_xy = 0.45
Step 4 — r: r = √(0.8×0.45) = √0.36
r = 0.6 (positive)
Step 5 — σ_y: b_yx = r·σ_y/σ_x → 0.8 = 0.6×σ_y/3
σ_y = 4
Summary: x̄=13, ȳ=17, b_yx=0.8, b_xy=0.45, r=0.6, σ_y=4
Continuous Assessment Test II — Winter 2024
U18MAI4201 — Probability & Statistics | Regulation: 2018
CAT-II Winter 2024 Max: 50 Marks Time: 2 Hours
Part A MCQ — 1M each (selected questions shown)
MCQ 2. P(53 Mondays in a leap year) = ?
1M
Leap year = 52 weeks + 2 extra days → 7 possible extra day pairs
Pairs with Monday: (Sun,Mon), (Mon,Tue) → 2 pairs
P(53 Mondays) = 2/7
MCQ 3. If f(x) = k(1+x), 2 ≤ x ≤ 5 is a PDF, find k.
1M
∫₂⁵ k(1+x) dx = 1 → k[x + x²/2]₂⁵ = k×13.5 = 1
k = 2/27
MCQ 5. Ram P(hit)=2/5, Sam P(hit)=3/4. Both fire. P(target hit) = ?
1M
P(hit) = 1 − P(both miss) = 1 − (3/5)×(1/4) = 1 − 3/20
P(target hit) = 17/20
MCQ. P(X=x) = x/15 for x=1,2,3,4,5. Find P(X=1 or 2).
1M
P(X=1)=1/15, P(X=2)=2/15
P(X=1 or 2) = 3/15 = 1/5
MCQ. Var(X)=2, Var(Y)=3 (independent). Find Var(3X+4Y).
1M
Var(3X+4Y) = 9×Var(X) + 16×Var(Y) = 9×2 + 16×3 = 18+48
Var(3X+4Y) = 66
Part B Answer any TWO — 8M/16M each
Q. Discrete PMF — P(X=x) given for x=1..7. Find k, P(X<6), P(X≥6), min a s.t. P(X≤a)≥0.51
8M
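X | 1 | 2 | 3 | 4 | 5 | 6 | 7
P(X=x) | k | 2k | 2k | 3k | k² | 2k² | 7k²+k
(i) k: ΣP = 1 → 9k + 10k² = 1 → 10k²+9k−1=0 → k = (−9+11)/20 (the negative root k = −1 is rejected)
k = 0.1
P-table: P(1)=0.1, P(2)=0.2, P(3)=0.2, P(4)=0.3, P(5)=0.01, P(6)=0.02, P(7)=0.17
(ii) P(X<6) = 0.81   (iii) P(X≥6) = 1 − 0.81 = 0.19   (iv) P(X≤3) = 0.5 < 0.51 and P(X≤4) = 0.8 ≥ 0.51 → min a = 4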
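The same answers can be checked in Python. This sketch assumes the PMF entries k, 2k, 2k, 3k, k², 2k², 7k²+k, which is what the P-table above implies.

```python
from math import sqrt

# Normalisation: 9k + 10k^2 = 1  ->  10k^2 + 9k - 1 = 0
a, b, c = 10, 9, -1
k = (-b + sqrt(b * b - 4 * a * c)) / (2 * a)  # positive root only

pmf = {1: k, 2: 2 * k, 3: 2 * k, 4: 3 * k,
       5: k ** 2, 6: 2 * k ** 2, 7: 7 * k ** 2 + k}

p_less_6 = sum(p for x, p in pmf.items() if x < 6)
p_ge_6 = 1 - p_less_6

# Smallest a with P(X <= a) >= 0.51
cumulative = 0.0
for x in sorted(pmf):
    cumulative += pmf[x]
    if cumulative >= 0.51:
        min_a = x
        break

print(round(k, 3), round(p_less_6, 2), round(p_ge_6, 2), min_a)
```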
Q. c-Chart — 15 subgroups, defects: 6,4,9,10,11,12,10,9,15,10,15,20,15,10,12
16M
Σc = 168; k=15; c̄ = 168/15 = 11.2; √c̄ = 3.347
UCL = 11.2 + 3×3.347 = 21.24
CL = 11.2; LCL = 11.2 − 10.04 = 1.16
All counts (max=20 < 21.24; min=4 > 1.16) within limits. Process is in control.
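A c-chart calculation like this one is easy to script. The function below is a sketch with an illustrative name (`c_chart_limits`); it uses the Poisson-based limits c̄ ± 3√c̄ and floors the LCL at zero.

```python
from math import sqrt

def c_chart_limits(counts):
    """Return (centre line, UCL, LCL) for a c-chart."""
    c_bar = sum(counts) / len(counts)
    spread = 3 * sqrt(c_bar)
    return c_bar, c_bar + spread, max(0.0, c_bar - spread)

counts = [6, 4, 9, 10, 11, 12, 10, 9, 15, 10, 15, 20, 15, 10, 12]
c_bar, ucl, lcl = c_chart_limits(counts)

# 1-based subgroup numbers outside the limits
out_of_control = [i for i, c in enumerate(counts, start=1)
                  if c > ucl or c < lcl]

print(round(c_bar, 2), round(ucl, 2), round(lcl, 2), out_of_control)
```

The same function can be reused for any other list of defect counts in these papers by changing `counts`.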
End Semester Examination — Apr/May 2025
U18MAI4201 — Probability & Statistics | Regulation: 2018
ESE Apr/May 2025 Max: 100 Marks Time: 3 Hours
Part A Answer ALL — 2M each (10 × 2 = 20 Marks)
1. Given regression lines 4x+3y=27 and 3x+4y=28. Find the correlation coefficient.
2M
Solve for (x̄, ȳ): 4x̄+3ȳ=27, 3x̄+4ȳ=28 → x̄ = 24/7 ≈ 3.43, ȳ = 31/7 ≈ 4.43
Trial 1 (line 1 as y on x, line 2 as x on y): b_yx = −4/3, b_xy = −4/3 → b_yx·b_xy = 16/9 > 1, impossible since r² ≤ 1
Trial 2 (line 1 as x on y, line 2 as y on x): 4x+3y=27 → x = (−3/4)y + 27/4 → b_xy = −3/4; 3x+4y=28 → y = (−3/4)x + 7 → b_yx = −3/4
r = −√(b_yx × b_xy) = −√(9/16) (negative, since both regression coefficients are negative)
r = −3/4 = −0.75
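Deciding which line is "y on x" can be done mechanically: try both assignments and keep the one whose product b_yx·b_xy does not exceed 1. A minimal sketch of that check, with the hypothetical helper `slopes`:

```python
from fractions import Fraction
from math import sqrt

def slopes(line_y_on_x, line_x_on_y):
    """Each line is (a, b, c), meaning a*x + b*y = c."""
    a1, b1, _ = line_y_on_x
    a2, b2, _ = line_x_on_y
    b_yx = Fraction(-a1, b1)  # solve the first line for y
    b_xy = Fraction(-b2, a2)  # solve the second line for x
    return b_yx, b_xy

line1 = (4, 3, 27)
line2 = (3, 4, 28)

# Try both assignments; a valid one must satisfy r^2 = b_yx * b_xy <= 1
for first, second in ((line1, line2), (line2, line1)):
    b_yx, b_xy = slopes(first, second)
    if b_yx * b_xy <= 1:
        break

sign = -1 if b_yx < 0 else 1  # r carries the sign of the coefficients
r = sign * sqrt(b_yx * b_xy)

print(b_yx, b_xy, r)
```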
2. State the formula for Spearman's rank correlation coefficient.
2M
ρ = 1 − (6Σd²) / (n(n²−1))

d = difference in ranks of corresponding values; n = number of pairs.

3. State any two properties of the Normal distribution.
2M
  • Bell-shaped, symmetric about the mean μ
  • Mean = Median = Mode; total area under curve = 1
  • 68-95-99.7% rule: μ±σ covers 68%, μ±2σ covers 95%, μ±3σ covers 99.7%
4. Define Type I and Type II errors in hypothesis testing.
2M
  • Type I error (α): Rejecting H₀ when it is actually true (false positive).
  • Type II error (β): Failing to reject H₀ when it is actually false (false negative).
5. A sample of 9 students has mean height 165 cm, SD=9 cm. Test if it differs from population mean 160 cm — give t statistic.
2M
t = (x̄ − μ) / (s/√n) = (165 − 160) / (9/√9) = 5/3
t = 1.667, df = 8
6. State the three basic principles of experimental design.
2M
  • Replication: Repeating the experiment to estimate error
  • Randomisation: Random allocation to eliminate bias
  • Local Control: Reducing experimental error by grouping similar units
7. Is 2×2 Latin Square Design possible? Why?
2M

No. In a 2×2 LSD there are no degrees of freedom for error (df_error = (p−1)(p−2) = 0 for p=2). This means no valid F-test can be conducted.

8. Differentiate between control charts for variables and control charts for attributes.
2M
Variables Charts | Attributes Charts
Measurable data (length, weight) | Countable data (defects, defectives)
X̄ chart, R chart, s chart | p-chart, np-chart, c-chart, u-chart
9. Define Statistical Process Control (SPC) and its purpose.
2M

SPC is the application of statistical methods to monitor and control a manufacturing process. Purpose: detect and eliminate special (assignable) causes of variation to maintain process in a state of statistical control.

10. Differentiate between a process under control and a process out of control.
2M
  • In control: Only common (chance) causes; all points within control limits; variation is random and predictable.
  • Out of control: Assignable causes present; one or more points outside control limits or non-random patterns visible.
Part B Answer any FIVE — 16M each (5 × 16 = 80 Marks)
11a. Correlation — Study hours vs exam marks: X: 5,8,9,10,5 | Y: 50,80,80,70,75
8M
X | Y | XY | X² | Y²
5 | 50 | 250 | 25 | 2500
8 | 80 | 640 | 64 | 6400
9 | 80 | 720 | 81 | 6400
10 | 70 | 700 | 100 | 4900
5 | 75 | 375 | 25 | 5625
Total: 37 | 355 | 2685 | 295 | 25825
n=5; Numerator = 5×2685 − 37×355 = 13425 − 13135 = 290
√[(5×295−37²)(5×25825−355²)] = √[(1475−1369)(129125−126025)]
= √[106 × 3100] = √328600 = 573.24
r = 290/573.24 ≈ 0.506 (Moderate positive correlation)
11b. Regression — Age (X) vs BP (Y): X: 18,30,40,50,60 | Y: 120,125,130,135,140. Predict BP at age 42.
8M
n=5; ΣX=198, ΣY=650, ΣXY=26260, ΣX²=8924
b = (nΣXY−ΣXΣY)/(nΣX²−(ΣX)²) = (5×26260−198×650)/(5×8924−198²)
= (131300−128700)/(44620−39204) = 2600/5416 ≈ 0.480
a = ȳ − b·x̄ = 130 − 0.480×39.6 = 130 − 19.01 = 110.99
Regression line: Y = 110.99 + 0.480X
At X=42: Y = 110.99 + 0.480×42 = 110.99 + 20.16 = 131.15 mm Hg
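Because the sums ΣXY and ΣX² are easy to get wrong by hand, it helps to recompute them directly from the data. The sketch below fits the line of Y on X by least squares; the variable names are illustrative.

```python
ages = [18, 30, 40, 50, 60]
bps = [120, 125, 130, 135, 140]

n = len(ages)
sum_x = sum(ages)
sum_y = sum(bps)
sum_xy = sum(x * y for x, y in zip(ages, bps))
sum_xx = sum(x * x for x in ages)

# Slope and intercept of the regression line Y = a + bX
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = sum_y / n - slope * sum_x / n

predicted = intercept + slope * 42  # predicted BP at age 42

print(sum_xy, sum_xx, round(slope, 3), round(intercept, 2), round(predicted, 2))
```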
12a. Bayes' Theorem — Machines A(50%), B(30%), C(20%). Defect rates 2%, 3%, 5%. Item found defective — find P(machine | defective).
8M
P(D) = 0.5×0.02 + 0.3×0.03 + 0.2×0.05 = 0.01+0.009+0.01 = 0.029
P(A|D) = (0.5×0.02)/0.029 = 0.01/0.029 ≈ 0.345
P(B|D) = (0.3×0.03)/0.029 = 0.009/0.029 ≈ 0.310
P(C|D) = (0.2×0.05)/0.029 = 0.01/0.029 ≈ 0.345
Machines A and C are equally likely sources of the defective item.
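The Bayes calculation follows one pattern: prior × likelihood, divided by the total probability of a defect. A small sketch using dictionaries keyed by machine name:

```python
priors = {"A": 0.50, "B": 0.30, "C": 0.20}       # share of production
defect_rate = {"A": 0.02, "B": 0.03, "C": 0.05}  # P(defective | machine)

# Total probability of a defective item
p_defect = sum(priors[m] * defect_rate[m] for m in priors)

# Posterior P(machine | defective) by Bayes' theorem
posterior = {m: priors[m] * defect_rate[m] / p_defect for m in priors}

for machine, prob in posterior.items():
    print(machine, round(prob, 3))
```

The posteriors always sum to 1, which is a quick check on the arithmetic.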
12b. PMF of support calls: P(0)=0.1, P(1)=0.2, P(2)=0.3, P(3)=0.2, P(4)=0.15, P(5)=0.05. Find E(X) and Var(X).
8M
E(X) = 0×0.1+1×0.2+2×0.3+3×0.2+4×0.15+5×0.05
= 0+0.2+0.6+0.6+0.6+0.25 = 2.25
E(X²) = 0+0.2+1.2+1.8+2.4+1.25 = 6.85
Var(X) = E(X²) − [E(X)]² = 6.85 − 2.25² = 6.85 − 5.0625
Var(X) = 1.7875
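The mean and variance of a discrete PMF can be computed directly from the definition. A minimal sketch:

```python
pmf = {0: 0.10, 1: 0.20, 2: 0.30, 3: 0.20, 4: 0.15, 5: 0.05}

mean = sum(x * p for x, p in pmf.items())               # E(X)
second_moment = sum(x * x * p for x, p in pmf.items())  # E(X^2)
variance = second_moment - mean ** 2                    # Var(X)

print(round(mean, 4), round(second_moment, 4), round(variance, 4))
```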
13a. Exponential Distribution — f(x) = (1/4)e^(−x/4), x≥0. P(lasts >15 hrs) and mean life.
8M
P(X>15) = ∫₁₅^∞ (1/4)e^(−x/4) dx = e^(−15/4) = e^(−3.75)
P(X>15) = e^(−3.75) ≈ 0.0235
For Exponential with param λ=1/4: Mean = 1/λ = 4
Mean life = 4 hours
13b. Normal Distribution — Marks N(60, 16²). Find % scoring >72, between 50–70, and count above 87 in 1000.
8M
μ=60, σ=16
(i) P(X>72): z=(72−60)/16=0.75 → P(z>0.75) = 1−0.7734 = 22.66%
(ii) P(50<X<70): z₁=(50−60)/16=−0.625, z₂=(70−60)/16=0.625
P = P(−0.625<z<0.625) = 2×0.2340 = 46.8%
(iii) P(X>87): z=(87−60)/16=1.6875 ≈ 1.69 → P = 1−0.9545 = 0.0455
Count = 0.0455 × 1000 ≈ 46 students
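The table look-ups can be cross-checked with the exact normal CDF, which Python's standard library can express through `math.erf`. This is a sketch; `normal_cdf` is an illustrative helper, and the results differ from table values only in the third or fourth decimal place because tables round z to two decimals.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 60, 16

p_above_72 = 1 - normal_cdf(72, mu, sigma)
p_50_to_70 = normal_cdf(70, mu, sigma) - normal_cdf(50, mu, sigma)
p_above_87 = 1 - normal_cdf(87, mu, sigma)
expected_count = 1000 * p_above_87  # students above 87 out of 1000

print(round(p_above_72, 4), round(p_50_to_70, 4), round(expected_count))
```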
14a. Two-sample t-test — Method A: 23,40,60,78,80 | Method B: 18,30,60,68,75. Test at 5% significance.
8M
x̄₁ = (23+40+60+78+80)/5 = 281/5 = 56.2
x̄₂ = (18+30+60+68+75)/5 = 251/5 = 50.2
SS₁ = Σ(x−x̄₁)² = 2420.8; SS₂ = Σ(x−x̄₂)² = 2472.8
Sp² = (SS₁+SS₂)/(n₁+n₂−2) = 4893.6/8 = 611.7 → Sp = 24.73
t = (x̄₁−x̄₂) / (Sp√(1/n₁+1/n₂)) = 6/(24.73×√0.4) = 6/15.64
t = 0.384; t_table(8df, 5%) = 2.306 → Fail to reject H₀. No significant difference.
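The sums of squared deviations are the step most prone to slips, so it is worth recomputing them from the raw scores. The sketch below runs the pooled-variance (equal variances assumed) two-sample t-test; the critical value 2.306 is the table value quoted in the solution, not computed by the code.

```python
from math import sqrt

method_a = [23, 40, 60, 78, 80]
method_b = [18, 30, 60, 68, 75]

def mean(values):
    return sum(values) / len(values)

def sum_sq_dev(values):
    """Sum of squared deviations about the sample mean."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values)

n1, n2 = len(method_a), len(method_b)
pooled_var = (sum_sq_dev(method_a) + sum_sq_dev(method_b)) / (n1 + n2 - 2)
t_stat = (mean(method_a) - mean(method_b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

t_critical = 2.306  # two-tailed table value, 8 df, 5% level
reject_h0 = abs(t_stat) > t_critical

print(round(pooled_var, 1), round(t_stat, 3), reject_h0)
```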
14b. Chi-square test — Gender vs product preference: Male(Prefer 80, Dislike 20) | Female(Prefer 70, Dislike 30). Test independence at 5%.
8M
Total = 200; Expected for each cell = (Row total × Col total)/Grand total
E(Male,Prefer) = 100×150/200=75; E(Male,Dislike)=25; E(Female,Prefer)=75; E(Female,Dislike)=25
χ² = (80−75)²/75 + (20−25)²/25 + (70−75)²/75 + (30−25)²/25
= 25/75 + 25/25 + 25/75 + 25/25 = 0.333+1+0.333+1 = 2.667
χ² = 2.667; χ²_table(1df, 5%) = 3.841 → Fail to reject H₀. Gender and preference are independent.
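The expected frequencies and the χ² statistic can be generated from the observed table alone. A minimal sketch for any r×c table, shown here with the 2×2 data from the question:

```python
observed = [[80, 20],   # Male:   prefer, dislike
            [70, 30]]   # Female: prefer, dislike

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)

print(round(chi_square, 3), df)
```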
Continuous Assessment Test II — June 2023
U18MAI4201 — Probability & Statistics | AY 2022-23 | Regulation: 2018
CAT-II Jun 2023 Max: 50 Marks Time: 2 Hours
Part A MCQ / Objective — 1M each (10 × 1 = 10 Marks)
1. Match List — E(9)=9, E(3X+4)=3E(X)+4, Var(9)=0, Var(3x+4)=9Var(x). Find correct sequence.
1M

List-I: A. E(9)=? B. E(3X+4)=? C. Var(9)=? D. Var(3x+4)=?

List-II: i. 3E(X)+4   ii. 9   iii. 9Var(x)   iv. 0

Correct: A-ii, B-i, C-iv, D-iii (sequence: ii, i, iv, iii)
2. Correct sequence of steps for commenting on process state using Control Charts.
1M

1. Calculate CL, LCL, UCL   2. Comment on state of control viewing plotted points   3. Sketch graph with data points on XOY axis   4. Draw CL, LCL, UCL lines.

Correct sequence: 1-3-4-2 (Calculate limits → Sketch graph with data points → Draw CL, LCL, UCL lines → Comment on state of control)
3. Assertion: For Normal distribution, mean=0 and variance=1 is called Standard Normal. Reason: Normal with mean=0, variance=1 is called Standard Normal.
1M
Both A and R are true, and R is the correct reason for A.
4. If A and B are independent events, P(A∩B) is equal to:
1M
P(A∩B) = P(A)·P(B) — definition of independence.
5. Range of 10, 20, 40, 50, 129, 20 is:
1M
Range = Max − Min = 129 − 10
Range = 119
6. Write the control limits for the p-chart.
1M
UCL = p̄ + 3√(p̄q̄/n)   CL = p̄   LCL = p̄ − 3√(p̄q̄/n) [≥0]
7. If A and B are mutually exclusive, P(A∩B) = ?
1M
P(A∩B) = 0 — mutually exclusive events cannot occur simultaneously.
8. Which chart is used to plot the actual number of defects per unit?
1M
c-chart — used to monitor the count of defects (nonconformities) per inspection unit.
Part B Answer ALL — ≤40 words each (5 × 2 = 10 Marks)
11. A hits target in 2/5 shots; B hits in 3/4 shots. Both fire. Find P(both hit).
2M
P(both hit) = P(A) × P(B) = (2/5) × (3/4) = 6/20
P(both hit) = 3/10 = 0.3
12. Write the LCL, UCL, CL formula for R-chart.
2M
UCL_R = D₄R̄    CL_R = R̄    LCL_R = D₃R̄

D₃, D₄ are control chart constants depending on sample size n.

13. A die is thrown. X = number that turns up. Find E(X²).
2M
E(X²) = Σx²·P(x) = (1+4+9+16+25+36)/6 = 91/6
E(X²) = 91/6 ≈ 15.17
14. Continuous RV X has PDF f(x)=3x², 0≤x≤1. Find b such that P(X>b)=0.64.
2M
P(X>b) = ∫_b^1 3x² dx = [x³]_b^1 = 1 − b³ = 0.64
b³ = 0.36 → b = ∛0.36
b ≈ 0.711
15. Explain the different types of control charts.
2M
  • Variable charts: X̄-chart (mean), R-chart (range) — for measurable data
  • Attribute charts: p-chart (fraction defective), np-chart (number defective), c-chart (defects per unit), u-chart (defects per unit, variable n)
Part C Answer any THREE — ≤300 words each (3 × 10 = 30 Marks)
16a. Urns I, II, III have (1W,2B,3R), (2W,1B,3R), (4W,5B,3R) balls. Urn chosen at random, 2 balls drawn (both white). Find P(Urn I, II, or III).
10M
P(each urn) = 1/3
P(2W|I) = C(1,2)/C(6,2) = 0/15 = 0 (only 1 white in urn I)
P(2W|II) = C(2,2)/C(6,2) = 1/15
P(2W|III) = C(4,2)/C(12,2) = 6/66 = 1/11
P(2W) = (1/3)[0 + 1/15 + 1/11] = (1/3)[(11+15)/165] = 26/495
P(I|2W) = 0; P(II|2W) = (1/3×1/15)/(26/495) = (11/495)/(26/495)
P(Urn II|2W) = 11/26 ≈ 0.423; P(Urn III|2W) = (15/495)/(26/495) = 15/26 ≈ 0.577
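Exact fractions make it easy to confirm the posteriors. The sketch below uses `math.comb` for the counts of white pairs and `fractions.Fraction` for exact arithmetic; the dictionary layout is illustrative.

```python
from fractions import Fraction
from math import comb

# (white balls, total balls) in each urn
urns = {"I": (1, 6), "II": (2, 6), "III": (4, 12)}
prior = Fraction(1, 3)  # each urn equally likely

# P(both white | urn); comb(1, 2) is 0, so urn I contributes nothing
likelihood = {u: Fraction(comb(w, 2), comb(t, 2)) for u, (w, t) in urns.items()}

evidence = sum(prior * lk for lk in likelihood.values())  # P(2W)
posterior = {u: prior * lk / evidence for u, lk in likelihood.items()}

for urn, prob in posterior.items():
    print(urn, prob, round(float(prob), 3))
```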
16b. c-Chart — 20 pieces of cloth from different rolls; 8 total imperfections. Check if process is in statistical control.
10M
c̄ = 8/20 = 0.4 (average defects per piece)
UCL = c̄ + 3√c̄ = 0.4 + 3×0.632 = 0.4 + 1.897 = 2.297
CL = 0.4; LCL = max(0, 0.4−1.897) = 0
Conclusion: the process is in control provided no single piece has 3 or more imperfections (any count above UCL = 2.297).
16c. Leap year selected at random. Will it contain 53 Sundays? Find probability.
10M
Leap year = 366 days = 52 complete weeks + 2 extra days
Possible extra pairs: (Mon,Tue),(Tue,Wed),(Wed,Thu),(Thu,Fri),(Fri,Sat),(Sat,Sun),(Sun,Mon)
Pairs containing Sunday: (Sat,Sun) and (Sun,Mon) → 2 pairs out of 7
P(53 Sundays in a leap year) = 2/7
17a. Discrete RV with PMF f(x) = k(x+1)/2^x, x=0,1,2,3,... Find k and mean.
10M
Σf(x) = k Σ(x+1)/2^x = 1. Using the series Σ(x+1)rˣ = 1/(1−r)² with r = 1/2: Σ(x+1)/2^x = 4 (for x=0,1,2,...)
4k = 1 → k = 1/4
E(X) = Σ x·(1/4)(x+1)/2^x = (1/4)·Σ x(x+1)rˣ; using Σ x(x+1)rˣ = 2r/(1−r)³ = 8 at r = 1/2: E(X) = 8/4
E(X) = 2 (Mean = 2)
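Both series can be sanity-checked numerically by summing enough terms. This is a sketch only: truncating at 200 terms is an arbitrary choice that is far more than needed, since the terms decay like 1/2^x.

```python
terms = range(200)  # truncation point; later terms are negligible

total = sum((x + 1) / 2 ** x for x in terms)  # should approach 4
k = 1 / total                                 # normalising constant

mean = sum(x * k * (x + 1) / 2 ** x for x in terms)  # E(X)

print(round(total, 6), round(k, 6), round(mean, 6))
```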
17b. Normal Distribution — Bulbs: mean=800 hrs, SD=40 hrs. P(burn out >834 hrs) and P(778<X<834).
10M
μ=800, σ=40
(i) z=(834−800)/40=0.85 → P(X>834) = 1−P(z<0.85) = 1−0.8023
P(burn out >834 hrs) ≈ 0.1977
(ii) z₁=(778−800)/40=−0.55; z₂=(834−800)/40=0.85
P = P(z<0.85)−P(z<−0.55) = 0.8023−0.2912
P(778<X<834) ≈ 0.5111
18a. X̄ and R Chart — 5 samples each of 5 items. Measurements given. Construct charts and comment on control.
10M
Sample | 1 | 2 | 3 | 4 | 5
Measurements | 46,41,40,42,43 | 45,41,40,43,44 | 44,44,40,42,47 | 43,42,43,43,47 | 42,46,41,44,45
x̄ values: 42.4, 42.6, 43.4, 43.6, 43.6 → X̄̄ = 43.12
R values: 6, 5, 7, 5, 5 → R̄ = 5.6
For n=5: A₂=0.577, D₃=0, D₄=2.114
UCL_X̄ = 43.12+0.577×5.6=46.35; LCL_X̄=43.12−3.23=39.89
UCL_R = 2.114×5.6=11.84; LCL_R=0
All sample means and ranges within limits → Process is in control.
19a. CDF F(x) = 1 − (1+x)e^(−x), x>0. Find the PDF f(x).
10M
f(x) = F'(x) = d/dx [1 − (1+x)e^(−x)]
= −[e^(−x) + (1+x)(−e^(−x))] = −e^(−x) + (1+x)e^(−x)
= e^(−x)[−1+1+x] = xe^(−x)
f(x) = xe^(−x), x>0 (Gamma distribution with α=2, β=1)
19c. c-Chart — 12 lots of 400 items, defectives: 24,16,36,25,25,30,42,52,20,16,20,24. Plot and comment.
10M
Σc = 24+16+36+25+25+30+42+52+20+16+20+24 = 330
c̄ = 330/12 = 27.5; √c̄ = 5.244
UCL = 27.5 + 3×5.244 = 27.5 + 15.73 = 43.23
CL = 27.5; LCL = 27.5 − 15.73 = 11.77
Lot 7 (42) lies just below the UCL; Lot 8 (52) exceeds UCL = 43.23.
Process is OUT OF CONTROL. Investigate lot 8.