U18MAI4201 — Probability & Statistics

Unit 1: Correlation & Regression

Karl Pearson's Coefficient • Spearman's Rank Correlation • Regression Lines • Regression Coefficients

2-Mark Q&A 10 Questions

1. Define Correlation. HOT

Correlation is a statistical measure that expresses the strength and direction of the linear relationship between two variables X and Y.

If both variables change in the same direction → Positive correlation (r > 0)
If variables change in opposite directions → Negative correlation (r < 0)
No relationship → Zero correlation (r = 0)
Range: −1 ≤ r ≤ +1

2. Define Karl Pearson's Coefficient of Correlation. Give its formula. HOT

Karl Pearson's coefficient of correlation (r) measures the degree of linear association between two variables X and Y.

Computation Formula (preferred for calculation):

r = (n·ΣXY − ΣX·ΣY) / √[(n·ΣX² − (ΣX)²)(n·ΣY² − (ΣY)²)]

n = number of pairs of observations
Range: −1 ≤ r ≤ +1; r = ±1 means perfect correlation
r is dimensionless (no units)

3. Define Spearman's Rank Correlation Coefficient. Give its formula. HOT

Spearman's Rank Correlation (ρ) measures correlation when data is qualitative or when the exact values are replaced by ranks.

ρ = 1 − (6·Σd²) / (n(n²−1))

d = difference in ranks of corresponding values (d = R₁ − R₂)
n = number of pairs
Range: −1 ≤ ρ ≤ +1
Used when data is ordinal or distribution is non-normal

4. State the properties of Karl Pearson's Correlation Coefficient.

Range: −1 ≤ r ≤ +1
r = +1: Perfect positive linear correlation
r = −1: Perfect negative linear correlation
r = 0: No linear correlation
r is independent of change of origin and scale
r is symmetric: rXY = rYX
r is a pure number (dimensionless)
|r| > 0.7 → High correlation; 0.5 < |r| ≤ 0.7 → Moderate; |r| ≤ 0.5 → Low

5. What are Regression Lines? Why are there two regression lines? HOT

Regression lines are lines of best fit that describe the relationship between two variables.

Line 1 — Regression of Y on X: Used to estimate Y for a given X. Minimises sum of squared errors in the Y direction.
Line 2 — Regression of X on Y: Used to estimate X for a given Y. Minimises sum of squared errors in the X direction.

There are two lines because each minimises a different type of error. They coincide only when r = ±1 (perfect correlation).

6. Define Regression Coefficients. Give their formulas. HOT

b_yx (regression coefficient of Y on X): b_yx = r · (σ_y/σ_x) or b_yx = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²)
b_xy (regression coefficient of X on Y): b_xy = r · (σ_x/σ_y) or b_xy = (n·ΣXY − ΣX·ΣY) / (n·ΣY² − (ΣY)²)

r² = b_yx × b_xy ⟹ r = ±√(b_yx · b_xy)

Both regression coefficients must have the same sign as r

7. When do the two Regression Lines coincide?

The two regression lines coincide (become the same line) when r = +1 or r = −1 (perfect correlation). In this case there is a perfect linear relationship between X and Y — knowing one variable completely determines the other, so both estimation lines are identical.

8. What is the Coefficient of Determination? HOT

The coefficient of determination is r². It measures the proportion of the total variation in Y that is explained by the linear regression on X.

r² = 0.81 means 81% of variation in Y is explained by X
Range: 0 ≤ r² ≤ 1
Closer to 1 → better fit of regression line

9. Give the equations of the two Regression Lines. HOT

Regression of Y on X:

y − ȳ = b_yx(x − x̄)

Regression of X on Y:

x − x̄ = b_xy(y − ȳ)

Both lines pass through the point (x̄, ȳ) — the means of X and Y.

10. What is the formula for Spearman's Rank Correlation when ranks are tied? HOT

When two or more observations have the same value (tied ranks), each tied observation gets the average of the positions they would have occupied. A correction factor is added:

ρ = 1 − 6[Σd² + Σm(m²−1)/12] / [n(n²−1)]

where m = number of observations tied at a particular rank. Add one correction factor for each group of tied ranks.

8-Mark Topics

1. Karl Pearson's Coefficient of Correlation — Theory & Numerical 8M

Definition & Formula

Karl Pearson's correlation coefficient r is the ratio of the covariance of X and Y to the product of their standard deviations.

Definition form:

r = Cov(X,Y) / (σ_x · σ_y) = Σ(X−X̄)(Y−Ȳ) / √[Σ(X−X̄)² · Σ(Y−Ȳ)²]

Computation form (use this in exam):

r = (n·ΣXY − ΣX·ΣY) / √[(n·ΣX² − (ΣX)²)(n·ΣY² − (ΣY)²)]

Properties

−1 ≤ r ≤ +1 (always)
r = +1: Perfect positive linear correlation
r = −1: Perfect negative linear correlation
r = 0: No linear relationship (variables may still be related non-linearly)
Independent of origin & scale: If U = (X−a)/h and V = (Y−b)/k with h, k > 0, then r(X,Y) = r(U,V)
Symmetric: r_XY = r_YX
r is a pure number (no units)

Worked Numerical Example

Find the correlation coefficient between X and Y:

X	Y	X²	Y²	XY
1	5	1	25	5
2	4	4	16	8
3	3	9	9	9
4	2	16	4	8
5	1	25	1	5
ΣX=15	ΣY=15	ΣX²=55	ΣY²=55	ΣXY=35

n = 5, ΣX=15, ΣY=15, ΣX²=55, ΣY²=55, ΣXY=35

n·ΣXY − ΣX·ΣY = 5×35 − 15×15 = 175 − 225 = −50

n·ΣX² − (ΣX)² = 5×55 − 225 = 275 − 225 = 50

n·ΣY² − (ΣY)² = 5×55 − 225 = 50

√(50 × 50) = √2500 = 50

r = −50 / 50 = −1.0 (Perfect negative correlation)
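
The computation form can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name pearson_r is an illustrative choice, not part of the syllabus.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r from the computation formula (raw sums)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    return numerator / denominator

X = [1, 2, 3, 4, 5]
Y = [5, 4, 3, 2, 1]
r = pearson_r(X, Y)
print(r)  # -1.0
```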

Change of Origin & Scale Method (Shortcut)

When values are large, substitute U = X − A, V = Y − B (or U = (X−A)/h etc.) to simplify calculations. The value of r remains unchanged.

Tip: In exams, always make a table with columns X, Y, X², Y², XY and sum all columns. Then substitute directly into the formula.

2. Spearman's Rank Correlation — Theory & Numerical 8M

When to Use Spearman's Rank Correlation?

Data is qualitative (e.g., rankings of beauty, intelligence)
Data does not follow normal distribution
Data has extreme outliers (rank correlation is more robust)
Actual values are not available but ranks are given

Formula

Without tied ranks: ρ = 1 − (6·Σd²) / (n(n²−1))

With tied ranks: ρ = 1 − 6[Σd² + m₁(m₁²−1)/12 + m₂(m₂²−1)/12 + ...] / [n(n²−1)]

Tied ranks: assign each tied item the mean of their ranks
m = number of items tied at a rank

Worked Numerical Example

10 students are ranked in Mathematics (X) and Physics (Y). Find Spearman's ρ:

Student	X (Math)	Y (Phys)	d = X−Y	d²
A	1	3	−2	4
B	2	5	−3	9
C	3	4	−1	1
D	4	1	3	9
E	5	2	3	9
F	6	7	−1	1
G	7	9	−2	4
H	8	6	2	4
I	9	8	1	1
J	10	10	0	0
Σd²				42

n = 10, Σd² = 42

ρ = 1 − 6×42 / (10×(100−1))

ρ = 1 − 252 / (10×99)

ρ = 1 − 252/990

ρ = 1 − 0.2545 = 0.7455 (High positive correlation)
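
The same calculation can be sketched in Python for untied ranks. The function name spearman_rho and the list names are assumptions made for illustration.

```python
def spearman_rho(rank_x, rank_y):
    """Spearman's rho for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

math_ranks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
phys_ranks = [3, 5, 4, 1, 2, 7, 9, 6, 8, 10]
rho = spearman_rho(math_ranks, phys_ranks)
print(round(rho, 4))  # 0.7455
```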

Tied Ranks Example

If 3 values are tied at positions 4, 5, 6 → each gets rank = (4+5+6)/3 = 5. Correction = m(m²−1)/12 = 3(9−1)/12 = 2. Add this to Σd² in numerator.
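
A small sketch of the tied-rank rule follows. It assumes the largest value receives rank 1 (a common exam convention, not stated above), and the scores are hypothetical data chosen so that three values tie at positions 4, 5, 6.

```python
from collections import Counter

def average_ranks(values):
    """Rank in descending order (assumption); tied values share the mean position."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

scores = [90, 85, 80, 70, 70, 70, 60]  # hypothetical scores
ranks = average_ranks(scores)
correction = tie_correction(scores)
print(ranks)       # [1.0, 2.0, 3.0, 5.0, 5.0, 5.0, 7.0]
print(correction)  # 2.0
```

The correction is computed separately for each variable and every tied group, then all corrections are added to Σd².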

3. Lines of Regression — Theory & Numerical 8M

Two Lines of Regression

Regression of Y on X (estimate Y given X):

y − ȳ = b_yx(x − x̄) where b_yx = r·(σ_y/σ_x)

Regression of X on Y (estimate X given Y):

x − x̄ = b_xy(y − ȳ) where b_xy = r·(σ_x/σ_y)

Key Relationships & Properties

Both lines pass through (x̄, ȳ)
r² = b_yx · b_xy → r = ±√(b_yx · b_xy)
b_yx and b_xy have the same sign as r
If r = 0 → lines are perpendicular to each other
If r = ±1 → lines coincide
AM of regression coefficients ≥ |r| in magnitude: |b_yx + b_xy|/2 ≥ |r| (for positive coefficients this is (b_yx + b_xy)/2 ≥ r)

Angle Between the Two Regression Lines

tan θ = [(1−r²)/r] · [σ_x·σ_y/(σ_x²+σ_y²)]

θ = 0° when r = ±1 (lines coincide)
θ = 90° when r = 0 (lines perpendicular)

Worked Numerical Example

Given data: n=5, ΣX=25, ΣY=30, ΣX²=145, ΣY²=220, ΣXY=158. Find the two regression lines.

x̄ = ΣX/n = 25/5 = 5 | ȳ = ΣY/n = 30/5 = 6

b_yx = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²)

b_yx = (5×158 − 25×30) / (5×145 − 625)

b_yx = (790 − 750) / (725 − 625) = 40/100 = 0.4

b_xy = (n·ΣXY − ΣX·ΣY) / (n·ΣY² − (ΣY)²)

b_xy = 40 / (5×220 − 900) = 40/(1100−900) = 40/200 = 0.2

r = √(b_yx × b_xy) = √(0.4 × 0.2) = √0.08 ≈ 0.283

Regression of Y on X: y − 6 = 0.4(x − 5) → y = 0.4x + 4

Regression of X on Y: x − 5 = 0.2(y − 6) → x = 0.2y + 3.8

To estimate Y when X=7: y = 0.4(7) + 4 = 6.8
To estimate X when Y=8: x = 0.2(8) + 3.8 = 5.4
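
The steps above can be sketched in Python directly from the given sums. Variable and function names here are illustrative assumptions.

```python
from math import sqrt, copysign

# Summary data from the worked example
n, sx, sy, sxx, syy, sxy = 5, 25, 30, 145, 220, 158

x_bar, y_bar = sx / n, sy / n
common = n * sxy - sx * sy               # shared numerator (40)
b_yx = common / (n * sxx - sx ** 2)      # slope of Y on X
b_xy = common / (n * syy - sy ** 2)      # slope of X on Y
r = copysign(sqrt(b_yx * b_xy), b_yx)    # r takes the sign of the coefficients

def predict_y(x):
    """Regression of Y on X: y = y_bar + b_yx (x - x_bar)."""
    return y_bar + b_yx * (x - x_bar)

def predict_x(y):
    """Regression of X on Y: x = x_bar + b_xy (y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)

print(b_yx, b_xy, round(r, 3))   # 0.4 0.2 0.283
print(round(predict_y(7), 2))    # 6.8
print(round(predict_x(8), 2))    # 5.4
```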

Unit 2: Probability & Random Variables

Axioms • Conditional Probability • Total Probability • Bayes' Theorem • PMF • PDF • CDF • Moments • MGF

2-Mark Q&A 10 Questions

1. State the Axioms of Probability (Kolmogorov's Axioms). HOT

For a sample space S and any event A:

Axiom 1 (Non-negativity): P(A) ≥ 0
Axiom 2 (Certainty): P(S) = 1 (probability of the entire sample space is 1)
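Axiom 3 (Additivity): If A and B are mutually exclusive (A∩B = ∅), then P(A∪B) = P(A) + P(B). More generally, for any countable sequence of mutually exclusive events A₁, A₂, ..., P(A₁∪A₂∪...) = P(A₁) + P(A₂) + ...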

From these, all other probability rules are derived.

2. Define Conditional Probability. HOT

The conditional probability of event A given that event B has already occurred is:

P(A|B) = P(A∩B) / P(B) , provided P(B) > 0

Restricts the sample space to event B
Similarly, P(B|A) = P(A∩B) / P(A)
Multiplication rule: P(A∩B) = P(A|B)·P(B) = P(B|A)·P(A)

3. State the Theorem of Total Probability. HOT

If B₁, B₂, ..., Bₙ are mutually exclusive and exhaustive events (a partition of S) with P(Bᵢ) > 0, then for any event A:

P(A) = P(A|B₁)·P(B₁) + P(A|B₂)·P(B₂) + ... + P(A|Bₙ)·P(Bₙ) = Σᵢ P(A|Bᵢ)·P(Bᵢ)

This is used to compute P(A) by conditioning on the partition events.

4. State Bayes' Theorem. HOT

If B₁, B₂, ..., Bₙ form a partition of S, and A is any event with P(A) > 0, then:

P(Bₖ|A) = P(A|Bₖ)·P(Bₖ) / [Σᵢ P(A|Bᵢ)·P(Bᵢ)]

P(Bₖ) = Prior probability (before observing A)
P(Bₖ|A) = Posterior probability (after observing A)
Used to update probability based on new evidence

5. Define a Random Variable. HOT

A random variable X is a real-valued function defined on a sample space S that assigns a numerical value to each outcome of a random experiment.

Discrete RV: Takes countable values (e.g., number of heads in 3 tosses: 0, 1, 2, 3)
Continuous RV: Takes any value in a continuous interval (e.g., height, temperature)

6. Define Probability Mass Function (PMF). State its properties. HOT

The PMF of a discrete random variable X is a function p(x) such that:

p(x) = P(X = x) for each value x that X can take
Property 1: p(x) ≥ 0 for all x
Property 2: Σ p(x) = 1 (sum over all possible values)
P(a ≤ X ≤ b) = Σ p(x) for x in [a, b]

7. Define Probability Density Function (PDF). State its properties. HOT

The PDF of a continuous random variable X is a function f(x) such that:

Property 1: f(x) ≥ 0 for all x
Property 2: ∫₋∞^∞ f(x) dx = 1
P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
Note: P(X = a) = 0 for any specific value a (continuous distribution)

8. Define the Distribution Function (CDF) and state its properties.

The Cumulative Distribution Function F(x) = P(X ≤ x) for all x ∈ ℝ.

0 ≤ F(x) ≤ 1
F(−∞) = 0 and F(+∞) = 1
F(x) is non-decreasing: if a < b then F(a) ≤ F(b)
F(x) is right-continuous: F(x⁺) = F(x)
For continuous RV: f(x) = dF(x)/dx
P(a < X ≤ b) = F(b) − F(a); for a continuous RV this also equals P(a ≤ X ≤ b)

9. Define Moments of a Random Variable. HOT

r-th Raw Moment (about origin): μ'ᵣ = E(Xʳ)
r-th Central Moment (about mean): μᵣ = E[(X − μ)ʳ] where μ = E(X)
μ'₁ = E(X) = Mean
μ₂ = E[(X−μ)²] = Variance = E(X²) − [E(X)]²
μ₃ used for skewness; μ₄ for kurtosis
Relation: μ₂ = μ'₂ − (μ'₁)²

10. Define the Moment Generating Function (MGF). HOT

The MGF of a random variable X is:

M_X(t) = E(e^(tX)) = Σ e^(tx)·p(x) (discrete) or ∫ e^(tx)·f(x) dx (continuous)

The r-th moment: E(Xʳ) = d^rM_X(t)/dt^r evaluated at t=0
M_X(0) = 1 always
Uniqueness: if two RVs have the same MGF, they have the same distribution

8-Mark Topics

1. Bayes' Theorem — Theory, Proof & Numerical 8M

Statement

Let B₁, B₂, ..., Bₙ be mutually exclusive and exhaustive events with P(Bᵢ) > 0. If A is any event with P(A) > 0, then:

P(Bₖ|A) = P(A|Bₖ)·P(Bₖ) / Σᵢ[P(A|Bᵢ)·P(Bᵢ)] for k = 1, 2, ..., n

Proof

By definition of conditional probability:

P(Bₖ|A) = P(Bₖ∩A) / P(A) ... (i)

P(Bₖ∩A) = P(A|Bₖ)·P(Bₖ) ... (ii) [multiplication rule]

By Total Probability Theorem:

P(A) = Σᵢ P(A|Bᵢ)·P(Bᵢ) ... (iii)

Substituting (ii) and (iii) into (i):

P(Bₖ|A) = P(A|Bₖ)·P(Bₖ) / Σᵢ[P(A|Bᵢ)·P(Bᵢ)] ∎

Numerical Example

Three machines I, II, III produce 50%, 30%, 20% of total output. The percentage of defectives are 3%, 4%, 5% respectively. An item is selected and found defective. Find the probability it came from machine II.

Let B₁=Machine I, B₂=Machine II, B₃=Machine III, A=Defective

P(B₁)=0.5, P(B₂)=0.3, P(B₃)=0.2

P(A|B₁)=0.03, P(A|B₂)=0.04, P(A|B₃)=0.05

P(A) = 0.5×0.03 + 0.3×0.04 + 0.2×0.05

P(A) = 0.015 + 0.012 + 0.010 = 0.037

P(B₂|A) = P(A|B₂)·P(B₂) / P(A)

P(B₂|A) = (0.04 × 0.3) / 0.037 = 0.012/0.037 ≈ 0.3243

There is a 32.43% probability the defective item came from Machine II.
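
The calculation can be sketched in Python so that all three posteriors come out at once. The dictionary names are assumptions made for illustration.

```python
# Priors P(B_i) and likelihoods P(A | B_i) from the example
priors = {"I": 0.5, "II": 0.3, "III": 0.2}
defect_rates = {"I": 0.03, "II": 0.04, "III": 0.05}

# Total probability: P(A) = sum of P(A | B_i) P(B_i)
p_defective = sum(priors[m] * defect_rates[m] for m in priors)

# Bayes' theorem for every machine
posteriors = {m: priors[m] * defect_rates[m] / p_defective for m in priors}

print(round(p_defective, 3))       # 0.037
print(round(posteriors["II"], 4))  # 0.3243
```

The posteriors always sum to 1, which is a quick check on the arithmetic.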

2. Random Variables — PMF, PDF, CDF & Numerical 8M

Discrete RV — PMF Example

A RV X has PMF p(x) = kx for x = 1,2,3,4. Find k, P(X≤2), Mean, Variance.

Σp(x)=1: k(1+2+3+4)=1 → 10k=1 → k=0.1

PMF: p(1)=0.1, p(2)=0.2, p(3)=0.3, p(4)=0.4

P(X≤2) = p(1)+p(2) = 0.1+0.2 = 0.3

E(X) = 1×0.1 + 2×0.2 + 3×0.3 + 4×0.4 = 0.1+0.4+0.9+1.6 = 3.0

E(X²) = 1×0.1 + 4×0.2 + 9×0.3 + 16×0.4 = 0.1+0.8+2.7+6.4 = 10.0

Var(X) = E(X²)−[E(X)]² = 10 − 9 = 1.0
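
A short check of this example using exact fractions is sketched below. The variable names are illustrative.

```python
from fractions import Fraction

support = [1, 2, 3, 4]
k = Fraction(1, sum(support))         # from sum of k*x = 1
pmf = {x: k * x for x in support}     # p(x) = k*x

p_le_2 = pmf[1] + pmf[2]
mean = sum(x * p for x, p in pmf.items())
second_moment = sum(x * x * p for x, p in pmf.items())
variance = second_moment - mean ** 2

print(k, p_le_2, mean, variance)  # 1/10 3/10 3 1
```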

Continuous RV — PDF Example

A RV X has PDF f(x) = kx² for 0 ≤ x ≤ 1, 0 otherwise. Find k, P(0.2 ≤ X ≤ 0.5), Mean, Variance.

∫₀¹ kx² dx = 1 → k[x³/3]₀¹ = 1 → k/3 = 1 → k = 3

f(x) = 3x² for 0 ≤ x ≤ 1

P(0.2≤X≤0.5) = ∫₀.₂^0.5 3x² dx = [x³]₀.₂^0.5 = 0.125 − 0.008 = 0.117

E(X) = ∫₀¹ x·3x² dx = 3∫₀¹ x³ dx = 3[x⁴/4]₀¹ = 3/4 = 0.75

E(X²) = ∫₀¹ x²·3x² dx = 3∫₀¹ x⁴ dx = 3/5 = 0.6

Var(X) = E(X²)−[E(X)]² = 0.6 − 0.5625 = 0.0375

CDF from PDF

For f(x) = 3x² on [0,1]: F(x) = ∫₀ˣ 3t² dt = x³ for 0 ≤ x ≤ 1

Check: F(0)=0, F(1)=1 ✓

P(X > 0.7) = 1 − F(0.7) = 1 − 0.343 = 0.657
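
These integrals can be verified numerically. The sketch below uses a simple midpoint rule written for this purpose (the integrate helper is an assumption, not a library function), so the results are approximations that agree with the exact values to several decimal places.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def pdf(x):
    return 3 * x * x  # f(x) = 3x^2 on [0, 1]

total = integrate(pdf, 0, 1)
prob = integrate(pdf, 0.2, 0.5)
mean = integrate(lambda x: x * pdf(x), 0, 1)
second_moment = integrate(lambda x: x * x * pdf(x), 0, 1)
variance = second_moment - mean ** 2
tail = 1 - integrate(pdf, 0, 0.7)     # P(X > 0.7) = 1 - F(0.7)

print(round(total, 4), round(prob, 4))     # 1.0 0.117
print(round(mean, 4), round(variance, 4))  # 0.75 0.0375
print(round(tail, 4))                      # 0.657
```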

3. Moments & Moment Generating Function (MGF) 8M

Raw Moments & Central Moments

Moment	Discrete	Continuous	Meaning
μ'₁ (1st raw)	Σx·p(x)	∫x·f(x)dx	Mean (μ)
μ'₂ (2nd raw)	Σx²·p(x)	∫x²·f(x)dx	E(X²)
μ₂ (2nd central)	E[(X−μ)²] = μ'₂ − (μ'₁)²		Variance (σ²)
μ₃ (3rd central)	E[(X−μ)³]		Skewness
μ₄ (4th central)	E[(X−μ)⁴]		Kurtosis

Key relations:

μ₂ = μ'₂ − (μ'₁)² | μ₃ = μ'₃ − 3μ'₂μ'₁ + 2(μ'₁)³ | μ₄ = μ'₄ − 4μ'₃μ'₁ + 6μ'₂(μ'₁)² − 3(μ'₁)⁴

Moment Generating Function (MGF)

M_X(t) = E(e^(tX))

Expanding e^(tX) as a series: e^(tX) = 1 + tX + t²X²/2! + t³X³/3! + ...

So: M_X(t) = 1 + tμ'₁ + t²μ'₂/2! + t³μ'₃/3! + ...

Therefore: μ'ᵣ = [d^r M_X(t)/dt^r] evaluated at t = 0

MGF Numerical Example

X has PMF: p(0)=1/4, p(1)=1/2, p(2)=1/4. Find MGF and first two moments.

M_X(t) = Σ e^(tx)·p(x) = e^0·(1/4) + e^t·(1/2) + e^(2t)·(1/4)

M_X(t) = 1/4 + (1/2)e^t + (1/4)e^(2t)

M'_X(t) = (1/2)e^t + (1/2)e^(2t)

μ'₁ = M'_X(0) = 1/2 + 1/2 = 1 → Mean = 1

M''_X(t) = (1/2)e^t + e^(2t)

μ'₂ = M''_X(0) = 1/2 + 1 = 3/2 → Var = 3/2 − 1² = 1/2
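
As a rough numerical check, the derivatives of the MGF at t = 0 can be approximated by central finite differences. This is a sketch only: the step size h is an arbitrary assumption, and the results match the exact moments up to small numerical error.

```python
from math import exp

def mgf(t):
    """MGF of the PMF p(0)=1/4, p(1)=1/2, p(2)=1/4."""
    return 0.25 + 0.5 * exp(t) + 0.25 * exp(2 * t)

h = 1e-4  # finite-difference step (assumed)
m1 = (mgf(h) - mgf(-h)) / (2 * h)            # approximates M'(0)
m2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h**2  # approximates M''(0)
variance = m2 - m1 ** 2

print(round(m1, 4), round(m2, 4), round(variance, 4))  # 1.0 1.5 0.5
```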

Properties of MGF

M_X(0) = 1 (always)
If Y = aX + b, then M_Y(t) = e^(bt)·M_X(at)
If X and Y are independent: M_{X+Y}(t) = M_X(t)·M_Y(t)
MGF uniquely determines the distribution

Unit 3: Normal Distribution

Normal Distribution • PDF • Properties • Area Rule • Moments • Moment Generating Function

2-Mark Q&A 10 Questions

1. Define Normal Distribution. HOT

A continuous random variable X is said to follow a Normal Distribution with mean μ and variance σ² (written X ~ N(μ, σ²)) if its PDF is:

f(x) = (1 / (σ√(2π))) · exp[−(x−μ)² / (2σ²)], −∞ < x < ∞

Parameters: μ (mean), σ² (variance), σ (standard deviation)
Bell-shaped, symmetric about x = μ
Also called the Gaussian Distribution

2. Define Standard Normal Distribution. HOT

The Standard Normal Distribution is a special case of the normal distribution with mean = 0 and variance = 1, denoted Z ~ N(0,1).

φ(z) = (1/√(2π)) · exp(−z²/2), −∞ < z < ∞

Standardisation: Z = (X − μ) / σ

Probabilities are found using the Standard Normal Table (z-table).

3. State the properties of the Normal Distribution. HOT

Mean = Median = Mode = μ (perfectly symmetric)
Curve is bell-shaped and symmetric about x = μ
Total area under curve = 1
The curve is asymptotic to the x-axis (never touches)
Skewness β₁ = 0; Kurtosis β₂ = 3 (mesokurtic)
All odd central moments = 0
Linear combination of independent normal RVs is also normal

4. State the Area (68-95-99.7) Rule for Normal Distribution. HOT

For X ~ N(μ, σ²):

Interval	Probability	Percentage
μ − σ to μ + σ	P(μ−σ < X < μ+σ)	68.27%
μ − 2σ to μ + 2σ	P(μ−2σ < X < μ+2σ)	95.45%
μ − 3σ to μ + 3σ	P(μ−3σ < X < μ+3σ)	99.73%
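
The three percentages can be reproduced with the error function from Python's standard library, using the identity P(|Z| < k) = erf(k/√2). The helper name within_k_sigma is an illustrative assumption.

```python
from math import erf, sqrt

def within_k_sigma(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sigma(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```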

5. What is the MGF of Normal Distribution? HOT

The Moment Generating Function of X ~ N(μ, σ²) is:

M_X(t) = exp(μt + σ²t²/2)

For Standard Normal Z ~ N(0,1):

M_Z(t) = exp(t²/2)

This is used to derive all moments of the normal distribution.

6. Write the Central Moments of Normal Distribution. HOT

For X ~ N(μ, σ²):

μ₁ = 0 (all odd moments = 0 due to symmetry)
μ₂ = σ² (variance)
μ₃ = 0 (zero skewness)
μ₄ = 3σ⁴
In general: μ₂ₙ₊₁ = 0 and μ₂ₙ = 1·3·5···(2n−1)·σ²ⁿ

β₁ = μ₃²/μ₂³ = 0 (Skewness) | β₂ = μ₄/μ₂² = 3 (Kurtosis)

7. What are the Raw Moments of Normal Distribution?

The raw moments are obtained by differentiating the MGF M_X(t) = e^{μt + σ²t²/2}:

μ'₁ = M'_X(0) = μ (Mean)
μ'₂ = M''_X(0) = μ² + σ² (Second raw moment)
Var(X) = μ'₂ − (μ'₁)² = (μ²+σ²) − μ² = σ² ✓

8. How do you find P(a < X < b) for a Normal Distribution? HOT

Step 1: Convert to standard normal: Z = (X−μ)/σ

P(a < X < b) = P((a−μ)/σ < Z < (b−μ)/σ) = Φ(z₂) − Φ(z₁)

where Φ(z) = P(Z ≤ z) is read from the Standard Normal table.

P(Z < 0) = 0.5 (by symmetry)
P(Z < −z) = 1 − P(Z < z) = 1 − Φ(z)

9. State the reproductive property of Normal Distribution.

If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²) are independent, then:

X₁ + X₂ ~ N(μ₁+μ₂, σ₁²+σ₂²)

More generally: a₁X₁ + a₂X₂ ~ N(a₁μ₁+a₂μ₂, a₁²σ₁²+a₂²σ₂²)
This is called the reproductive (additive) property of normal distribution.

10. What is the point of inflection of the Normal curve? HOT

The normal curve has two points of inflection (where the curve changes concavity) at:

x = μ − σ and x = μ + σ

At these points, the curve changes from concave down (between μ − σ and μ + σ) to concave up (in the tails). The horizontal distance from the mean to each inflection point equals one standard deviation (σ).

8-Mark Topics

1. Normal Distribution — PDF, Properties & MGF Derivation 8M

Probability Density Function

PDF of Normal Distribution X ~ N(μ, σ²):

f(x) = (1/(σ√(2π))) · e^{−(x−μ)²/(2σ²)}, −∞ < x < ∞

μ = mean (location parameter) — shifts curve left/right
σ = standard deviation (scale parameter) — controls spread
Larger σ → flatter, wider curve; Smaller σ → taller, narrower

Properties of Normal Distribution

Symmetry: f(μ+x) = f(μ−x) — symmetric about x = μ
Mean = Median = Mode = μ
Maximum: Curve is maximum at x = μ, maximum value = 1/(σ√(2π))
Asymptotic: Curve approaches x-axis but never touches it
Inflection Points: At x = μ ± σ
Area = 1: ∫₋∞^∞ f(x) dx = 1
Moments: All odd central moments = 0; μ₂ = σ², μ₄ = 3σ⁴
Kurtosis β₂ = 3 (mesokurtic — neither flat nor peaked)

MGF Derivation

M_X(t) = E(e^(tX)) = ∫₋∞^∞ e^(tx) · (1/(σ√(2π))) · e^(−(x−μ)²/(2σ²)) dx

Combine exponents: e^(tx) · e^(−(x−μ)²/(2σ²)) = e^[tx − (x−μ)²/(2σ²)]

Complete the square in x: tx − (x−μ)²/(2σ²)

= −(1/(2σ²))[x² − 2(μ + σ²t)x + μ²]

= −(1/(2σ²))[(x−(μ+σ²t))² − (μ+σ²t)² + μ²]

= −(x−(μ+σ²t))²/(2σ²) + μt + σ²t²/2

So M_X(t) = e^(μt+σ²t²/2) · ∫₋∞^∞ (1/(σ√(2π)))·e^(−(x−(μ+σ²t))²/(2σ²)) dx

The integral = 1 (it's a normal PDF with mean μ+σ²t)

∴ M_X(t) = e^(μt + σ²t²/2)

Deriving Moments from MGF

M_X(t) = e^(μt + σ²t²/2)

M'_X(t) = (μ + σ²t) · e^(μt + σ²t²/2)

μ'₁ = M'_X(0) = μ · e^0 = μ ✓

M''_X(t) = σ²·e^(μt+σ²t²/2) + (μ+σ²t)²·e^(μt+σ²t²/2)

μ'₂ = M''_X(0) = σ² + μ²

Var(X) = μ'₂ − (μ'₁)² = σ² + μ² − μ² = σ² ✓

2. Normal Distribution — Finding Probabilities (Numerical) 8M

Standard Normal Table Usage

The z-table gives Φ(z) = P(Z ≤ z) for Z ~ N(0,1). Key symmetry rules:

P(Z ≤ 0) = 0.5
P(Z ≤ −z) = 1 − P(Z ≤ z) = 1 − Φ(z)
P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
P(Z ≥ z) = 1 − Φ(z)

Worked Example 1

X ~ N(50, 100) [i.e. μ=50, σ=10]. Find (i) P(X < 65), (ii) P(40 < X < 60), (iii) P(X > 72).

(i) P(X<65): Z = (65−50)/10 = 1.5

P(X<65) = P(Z<1.5) = Φ(1.5) = 0.9332

(ii) P(40<X<60): Z₁=(40−50)/10=−1, Z₂=(60−50)/10=1

P(40<X<60) = Φ(1) − Φ(−1) = 0.8413 − 0.1587 = 0.6826

(iii) P(X>72): Z = (72−50)/10 = 2.2

P(X>72) = 1 − Φ(2.2) = 1 − 0.9861 = 0.0139
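
Table look-ups can be cross-checked with statistics.NormalDist from the Python standard library (Python 3.8+). The values differ from the z-table only in the fourth decimal place because the table is rounded.

```python
from statistics import NormalDist

X = NormalDist(mu=50, sigma=10)   # N(50, 100)

p_i = X.cdf(65)                   # P(X < 65)
p_ii = X.cdf(60) - X.cdf(40)      # P(40 < X < 60)
p_iii = 1 - X.cdf(72)             # P(X > 72)

print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4))
# 0.9332 0.6827 0.0139
```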

Worked Example 2 — Finding the Value Given Probability

X ~ N(30, 25) [μ=30, σ=5]. Find x₀ such that P(X > x₀) = 0.05.

P(X > x₀) = 0.05 → P(X ≤ x₀) = 0.95

→ P(Z ≤ z₀) = 0.95 → z₀ = 1.645 (from z-table)

z₀ = (x₀ − μ)/σ → 1.645 = (x₀ − 30)/5

x₀ = 30 + 1.645 × 5 = 30 + 8.225 = 38.225
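
The inverse look-up can be sketched with NormalDist.inv_cdf from the standard library. It uses the unrounded z-value 1.64485..., so the answer differs slightly from the table-based 38.225.

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=5)    # N(30, 25)
x0 = X.inv_cdf(0.95)              # P(X <= x0) = 0.95, so P(X > x0) = 0.05

print(round(x0, 3))               # 38.224
```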

Worked Example 3 — Expected Number of Students in a Range

In an exam, scores are normally distributed with mean 70 and SD 15. If 500 students appeared, how many scored between 55 and 85?

Z₁ = (55−70)/15 = −1.0 | Z₂ = (85−70)/15 = +1.0

P(55<X<85) = P(−1<Z<1) = Φ(1) − Φ(−1)

= 0.8413 − 0.1587 = 0.6826

Number of students = 500 × 0.6826 ≈ 341 students

Common z-values to memorize (P = Φ(z) = P(Z ≤ z)):
z = 1.28 → P = 0.90 | z = 1.645 → P = 0.95 | z = 1.96 → P = 0.975 | z = 2.33 → P = 0.99 | z = 2.576 → P = 0.995

3. Moments of Normal Distribution — All Central & Raw Moments 8M

Central Moments Using MGF

For standard normal Z ~ N(0,1), M_Z(t) = e^(t²/2). Expanding as a power series:

M_Z(t) = e^(t²/2) = 1 + t²/2 + (t²/2)²/2! + ... = Σ t^(2k)/(2^k · k!)

From the series: coefficient of t^r/r! gives μ'_r

μ'₁ = 0 (Mean of Z)

μ'₂ = 1 (Variance of Z = 1)

μ'₃ = 0 (all odd moments = 0)

μ'₄ = 3 (from coefficient of t⁴: 1/8 × 4! = 3)

Moment	For N(0,1)	For N(μ,σ²)
μ'₁ (mean)	0	μ
μ₂ (variance)	1	σ²
μ₃	0	0
μ₄	3	3σ⁴
β₁ = μ₃²/μ₂³	0	0 (symmetric)
β₂ = μ₄/μ₂²	3	3 (mesokurtic)

General formula: For N(μ, σ²), the (2n)th central moment = 1·3·5···(2n−1)·σ²ⁿ = (2n)!/(2ⁿ·n!) · σ²ⁿ
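
The two forms of the general formula can be compared in a few lines of Python. The function names are illustrative assumptions; math.prod needs Python 3.8 or later.

```python
from math import factorial, prod

def odd_product(n):
    """1 * 3 * 5 * ... * (2n - 1)."""
    return prod(range(1, 2 * n, 2))

def even_central_moment(n, sigma=1):
    """(2n)-th central moment of N(mu, sigma^2): (2n)! / (2^n n!) * sigma^(2n)."""
    return factorial(2 * n) // (2 ** n * factorial(n)) * sigma ** (2 * n)

for n in (1, 2, 3):
    print(2 * n, odd_product(n), even_central_moment(n))
# 2 1 1
# 4 3 3
# 6 15 15
```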

Unit 4: Testing of Hypothesis

t-test (Single Mean, Difference, Paired) • F-test (Variance Ratio) • Chi-Square Test

2-Mark Q&A 10 Questions

1. Define Null and Alternative Hypothesis. HOT

Null Hypothesis (H₀): A statement of no difference or no effect. It is the hypothesis being tested. Assumed true until evidence suggests otherwise. Example: H₀: μ = 50
Alternative Hypothesis (H₁ or Hₐ): The claim accepted if H₀ is rejected. Example: H₁: μ ≠ 50 (two-tailed) or H₁: μ > 50 (one-tailed)

2. Define Level of Significance (α) and Critical Region. HOT

Level of Significance (α): The probability of rejecting H₀ when it is actually true (Type I error probability). Common values: α = 0.05 (5%) or α = 0.01 (1%)
Critical Region (Rejection Region): The set of values of the test statistic for which H₀ is rejected.
Critical Value: The boundary value separating acceptance and rejection regions.

3. Define Type I and Type II errors. HOT

Error	When?	Probability	Name
Type I (α)	Reject H₀ when H₀ is TRUE	α (LOS)	False Positive
Type II (β)	Accept H₀ when H₀ is FALSE	β	False Negative

Power of the test = 1 − β = P(reject H₀ | H₁ is true)

4. Define t-distribution. When is it used? HOT

Student's t-distribution is a probability distribution used for small samples (n < 30) when population standard deviation σ is unknown and the population is normally distributed.

t = (X̄ − μ) / (s/√n) ~ t(n−1 df)

where s = sample standard deviation = √[Σ(xᵢ−x̄)²/(n−1)]

5. What is a Paired t-test? When is it used? HOT

Paired t-test is used when two related samples are compared — the same subjects measured twice (before/after), or matched pairs.

t = d̄ / (s_d/√n) ~ t(n−1 df)

d = difference for each pair (d = x₁ − x₂)
d̄ = mean of differences
s_d = standard deviation of differences

6. Define F-distribution. What is the F-test used for? HOT

An F-variate is the ratio of two independent chi-square variates, each divided by its own degrees of freedom; its distribution is called the F-distribution.

F = s₁² / s₂² (put larger variance in numerator)

Used for Variance Ratio Test — testing if two populations have equal variances
df = (n₁−1, n₂−1)
F ≥ 1 always (numerator has larger variance)
Also used in ANOVA

7. Define Chi-Square test. State its uses. HOT

χ² = Σ (O − E)² / E

where O = Observed frequency, E = Expected frequency.

Test for Independence: Tests if two attributes are independent (contingency table)
Goodness of Fit: Tests if observed data fits a theoretical distribution
df for independence = (r−1)(c−1)
df for goodness of fit = n−1, where n = number of classes (or n−k−1 if k parameters are estimated from the data)

8. Distinguish between one-tailed and two-tailed tests. HOT

Feature	One-tailed	Two-tailed
H₁	μ > μ₀ or μ < μ₀	μ ≠ μ₀
Critical region	One side only	Both sides
Critical value (α=0.05, large sample)	z = 1.645 (sign follows H₁)	z = ±1.96
Use	Direction known	Direction unknown

9. State the formula for t-test for difference of two means (independent samples).

For testing H₀: μ₁ = μ₂ with two independent samples:

t = (X̄₁ − X̄₂) / [s_p · √(1/n₁ + 1/n₂)]

where the pooled variance:

s_p² = [(n₁−1)s₁² + (n₂−1)s₂²] / (n₁+n₂−2)

Degrees of freedom = n₁ + n₂ − 2

10. State the conditions for applying Chi-Square test. HOT

Observations must be independent
Total frequency N must be reasonably large (N ≥ 50)
No expected frequency should be less than 5. If E < 5, merge adjacent classes (pooling)
Data must be in frequencies (not percentages or ratios)
Sample must be drawn by random sampling

8-Mark Topics

1. t-Test: Single Mean & Difference of Means — Theory & Numericals 8M

t-Test for Single Mean

Tests whether sample mean x̄ differs significantly from hypothesised population mean μ₀.

H₀: μ = μ₀ | Test Statistic: t = (X̄ − μ₀) / (s/√n) ~ t_(n−1)

s = √[Σ(xᵢ−x̄)²/(n−1)] or s = √[(Σxᵢ² − n·x̄²)/(n−1)]

Rejection rule: Reject H₀ if |t_calc| > t_table(α, n−1 df)

Numerical Example — Single Mean

A sample of 10 observations has mean 52 and SD 8. Test H₀: μ = 50 at 5% level of significance.

n=10, x̄=52, s=8, μ₀=50, α=0.05

t = (x̄ − μ₀)/(s/√n) = (52−50)/(8/√10) = 2/(8/3.162) = 2/2.530 = 0.790

df = n−1 = 9

t_table(0.05, 9df, two-tailed) = 2.262

|t_calc| = 0.790 < 2.262 → Do NOT reject H₀

Conclusion: There is no significant difference between sample mean and population mean at 5% level.
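
The test statistic can be sketched in Python. The critical value is taken from the t-table quoted above rather than computed, and the function name one_sample_t is an assumption.

```python
from math import sqrt

def one_sample_t(x_bar, mu0, s, n):
    """t = (x_bar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (x_bar - mu0) / (s / sqrt(n))

t_calc = one_sample_t(x_bar=52, mu0=50, s=8, n=10)
t_table = 2.262                     # two-tailed 5% value for 9 df (from the table)
reject_h0 = abs(t_calc) > t_table

print(round(t_calc, 3), reject_h0)  # 0.791 False
```

The unrounded statistic is 0.7906; the 0.790 in the hand calculation comes from rounding √10 to 3.162.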

t-Test for Difference of Two Means (Independent Samples)

t = (X̄₁ − X̄₂) / [s_p√(1/n₁+1/n₂)] where s_p² = [(n₁−1)s₁²+(n₂−1)s₂²]/(n₁+n₂−2)

Numerical: n₁=8, x̄₁=14.5, s₁²=4 | n₂=10, x̄₂=12.8, s₂²=3.5. Test H₀: μ₁=μ₂ at 5%.

s_p² = [(8−1)×4 + (10−1)×3.5]/(8+10−2) = [28+31.5]/16 = 59.5/16 = 3.719

s_p = √3.719 = 1.929

√(1/8+1/10) = √(0.125+0.1) = √0.225 = 0.474

t = (14.5−12.8)/(1.929×0.474) = 1.7/0.914 = 1.86

df = 8+10−2 = 16 | t_table(0.05, 16df) = 2.120

|t_calc|=1.86 < 2.120 → Do NOT reject H₀ (no significant difference)
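
A minimal sketch of the pooled-variance calculation, with the table value hard-coded from the text; the function name pooled_t is illustrative.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t with pooled variance; returns (t, degrees of freedom)."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    t = (mean1 - mean2) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, df

t_calc, df = pooled_t(14.5, 4, 8, 12.8, 3.5, 10)
t_table = 2.120                     # two-tailed 5% value for 16 df (from the table)

print(round(t_calc, 2), df, abs(t_calc) > t_table)  # 1.86 16 False
```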

2. Paired t-Test — Theory & Numerical 8M

Paired t-Test Procedure

Compute d = x₁ − x₂ for each pair
Compute d̄ = Σd/n
Compute s_d = √[Σ(d−d̄)²/(n−1)] = √[(Σd² − n·d̄²)/(n−1)] (divisor n−1, matching df = n−1)
t = d̄/(s_d/√n) with df = n−1

s_d² = (Σd² − n·d̄²)/(n−1) → this is the shortcut

Numerical Example — Paired t-Test

A drug is given to 8 patients. Blood pressure (BP) is recorded before and after. Test if the drug reduces BP at 5% LOS (H₀: μ_d = 0 against H₁: μ_d > 0, where d = before − after).

Patient	Before (x₁)	After (x₂)	d=x₁−x₂	d²
1	145	138	7	49
2	152	143	9	81
3	138	136	2	4
4	160	150	10	100
5	148	140	8	64
6	130	128	2	4
7	155	146	9	81
8	142	137	5	25
Sum			52	408

n=8, Σd=52, Σd²=408

d̄ = 52/8 = 6.5

s_d² = [Σd² − n·(d̄)²]/(n−1) = [408 − 8×42.25]/7 = [408−338]/7 = 70/7 = 10

s_d = √10 = 3.162

t = d̄/(s_d/√n) = 6.5/(3.162/√8) = 6.5/(3.162/2.828) = 6.5/1.118 = 5.814

df = n−1 = 7, t_table(0.05, 7df, one-tailed) = 1.895

|t_calc|=5.814 > 1.895 → REJECT H₀

Conclusion: The drug significantly reduces blood pressure at 5% level.
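
The same test can be sketched with the standard library; statistics.stdev uses the n−1 divisor, which is the one required here. The one-tailed critical value is taken from the table above.

```python
from math import sqrt
from statistics import mean, stdev

before = [145, 152, 138, 160, 148, 130, 155, 142]
after = [138, 143, 136, 150, 140, 128, 146, 137]

d = [b - a for b, a in zip(before, after)]  # d = x1 - x2
d_bar = mean(d)
s_d = stdev(d)                              # sample SD (divisor n - 1)
t_calc = d_bar / (s_d / sqrt(len(d)))
t_table = 1.895                             # one-tailed 5% value for 7 df (from the table)

print(d_bar, round(s_d, 3), round(t_calc, 3), t_calc > t_table)
# 6.5 3.162 5.814 True
```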

3. F-Test (Variance Ratio Test) — Theory & Numerical 8M

F-Test Procedure

Tests H₀: σ₁² = σ₂² (two population variances are equal).

F = s₁²/s₂² (always put larger variance in numerator, so F ≥ 1)

df₁ = n₁ − 1 (numerator), df₂ = n₂ − 1 (denominator)
Reject H₀ if F_calc > F_table(df₁, df₂) at α level

Numerical Example — F-Test

Sample 1: n₁=10, s₁²=28.5 | Sample 2: n₂=14, s₂²=12.6. Test equality of variances at 5%.

H₀: σ₁² = σ₂² | H₁: σ₁² ≠ σ₂²

Since s₁² > s₂², put s₁² in numerator

F = s₁²/s₂² = 28.5/12.6 = 2.262

df₁ = 10−1 = 9, df₂ = 14−1 = 13

F_table(9, 13) at 5% = 2.71 [from F-table]

F_calc = 2.262 < 2.71 → Do NOT reject H₀ (variances are equal)
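
A small sketch of the variance-ratio rule follows. The helper variance_ratio is an assumed name; it swaps the samples when needed so the larger variance is on top, and the critical value is the table figure quoted above.

```python
def variance_ratio(var1, n1, var2, n2):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var1 >= var2:
        return var1 / var2, n1 - 1, n2 - 1
    return var2 / var1, n2 - 1, n1 - 1

f_calc, df1, df2 = variance_ratio(28.5, 10, 12.6, 14)
f_table = 2.71                      # 5% value for (9, 13) df (from the table)

print(round(f_calc, 3), df1, df2, f_calc > f_table)  # 2.262 9 13 False
```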

4. Chi-Square Test — Independence & Goodness of Fit 8M

Chi-Square Test for Independence of Attributes

χ² = Σ (O−E)²/E ~ χ²_{(r−1)(c−1)}

Expected frequency: E_ij = (Row i total × Column j total) / Grand Total N

Numerical: 200 patients classified by gender and recovery:

Gender / Recovery	Recovered	Not Recovered	Total
Male	70	30	100
Female	60	40	100
Total	130	70	200

E(Male, Recovered) = 100×130/200 = 65

E(Male, Not) = 100×70/200 = 35

E(Female, Recovered) = 100×130/200 = 65

E(Female, Not) = 100×70/200 = 35

χ² = (70−65)²/65 + (30−35)²/35 + (60−65)²/65 + (40−35)²/35

= 25/65 + 25/35 + 25/65 + 25/35 = 0.385+0.714+0.385+0.714 = 2.198

df = (2−1)(2−1) = 1, χ²_table(0.05, 1df) = 3.841

χ²_calc = 2.198 < 3.841 → Do NOT reject H₀ (attributes are independent)

2×2 Shortcut Formula: χ² = N(ad−bc)² / [(a+b)(c+d)(a+c)(b+d)]
Here: N=200, a=70, b=30, c=60, d=40
χ² = 200×(70×40−30×60)² / (100×100×130×70) = 200×(2800−1800)² / 91000000 = 200×1000000/91000000 ≈ 2.198 ✓
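
Both routes to χ² can be sketched in Python. The function below works for any r×c table of observed counts; its name is an illustrative assumption.

```python
def chi_square_independence(table):
    """Return (chi-square, df) for an r x c table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df

observed = [[70, 30], [60, 40]]
chi2, df = chi_square_independence(observed)

# 2x2 shortcut formula as a cross-check
(a, b), (c, d) = observed
n = a + b + c + d
shortcut = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi2, 3), df, round(shortcut, 3))  # 2.198 1 2.198
```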

Chi-Square Goodness of Fit Test

Tests whether observed data follows a specified theoretical distribution.

Example: A die is thrown 120 times. Test if the die is fair (uniform distribution expected).

Face	1	2	3	4	5	6
Observed (O)	25	17	15	23	24	16
Expected (E)	20	20	20	20	20	20

χ² = (25−20)²/20 + (17−20)²/20 + (15−20)²/20 + (23−20)²/20 + (24−20)²/20 + (16−20)²/20

= 25/20 + 9/20 + 25/20 + 9/20 + 16/20 + 16/20

= (25+9+25+9+16+16)/20 = 100/20 = 5.0

df = 6−1 = 5, χ²_table(0.05, 5df) = 11.07

χ²_calc = 5.0 < 11.07 → Do NOT reject H₀ (die is fair)
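
The goodness-of-fit statistic for the die can be sketched as follows; the critical value is the table figure quoted above.

```python
observed = [25, 17, 15, 23, 24, 16]
expected = [sum(observed) / len(observed)] * len(observed)  # 20 per face if fair

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
chi2_table = 11.07                  # 5% value for 5 df (from the table)

print(round(chi2, 2), df, chi2 > chi2_table)  # 5.0 5 False
```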

Unit 5: Design of Experiments

ANOVA (One-Way & Two-Way) • CRD • RBD • LSD — Full ANOVA Tables with Numericals

2-Mark Q&A 10 Questions

1. Define Analysis of Variance (ANOVA). HOT

ANOVA is a statistical technique used to test whether the means of three or more populations are equal by analysing the variation in data and partitioning it into components.

Total Variation = Variation due to treatments + Variation due to error (random)
Uses the F-statistic (ratio of treatment variance to error variance)
H₀: μ₁ = μ₂ = ... = μₖ (all treatment means are equal)

2. State the basic assumptions of ANOVA. HOT

Normality: Each population follows normal distribution
Homogeneity of variance: All populations have the same variance (σ²)
Independence: Observations are independent of each other
Additivity: Effects are additive (no interaction, in two-way)

3. What is Completely Randomized Design (CRD)? HOT

CRD is the simplest experimental design where treatments are assigned completely at random to all experimental units. It controls only one source of variation (treatment).

Used when experimental units are homogeneous
One-way ANOVA is applied
df: Treatment = k−1, Error = N−k, Total = N−1 (k = number of treatments, N = total observations)

4. What is Randomized Block Design (RBD)? HOT

RBD is a design where experimental units are grouped into homogeneous blocks (to control one extraneous variable), and treatments are randomly assigned within each block.

Controls for one extraneous variable (block effect)
Two-way ANOVA (without interaction) applied
df: Treatment = k−1, Block = b−1, Error = (k−1)(b−1), Total = kb−1

5. What is Latin Square Design (LSD)? HOT

LSD is a design that controls for two extraneous variables simultaneously (rows and columns), with treatments arranged so each appears exactly once in each row and column.

p×p square: p treatments, p rows, p columns
df: Treatment = p−1, Row = p−1, Column = p−1, Error = (p−1)(p−2), Total = p²−1
More efficient than RBD when two blocking factors exist

6. What is the Correction Factor (CF)? Write its formula. HOT

The Correction Factor (CF) is used to simplify ANOVA calculations:

CF = T² / N = (Grand Total)² / (Total number of observations)

TSS = ΣΣ x²ᵢⱼ − CF (Total Sum of Squares)
SST = Σ (Tᵢ²/nᵢ) − CF (Sum of Squares due to Treatments)
SSE = TSS − SST (Error Sum of Squares)

7. Write the ANOVA table for CRD (One-Way). HOT

Source	SS	df	MS	F
Treatment	SST	k−1	MST=SST/(k−1)	MST/MSE
Error	SSE	N−k	MSE=SSE/(N−k)	—
Total	TSS	N−1	—	—

Reject H₀ if F_calc > F_table(k−1, N−k) at α level.

8. Write the ANOVA table for RBD (Two-Way without Interaction). HOT

Source	SS	df	MS	F
Treatment	SST	k−1	MST	MST/MSE
Block	SSB	b−1	MSB	MSB/MSE
Error	SSE	(k−1)(b−1)	MSE	—
Total	TSS	kb−1	—	—

9. Compare CRD, RBD and LSD. HOT

Feature	CRD	RBD	LSD
Blocking	None	One direction	Two directions
Extraneous variables	0	1	2
Error df	N−k	(k−1)(b−1)	(p−1)(p−2)
Efficiency	Lowest	Medium	Highest
ANOVA type	One-way	Two-way	Three-way

10. Write the formulae for SSR (rows) and SSC (columns) in Latin Square Design.

For a p×p LSD with grand total T and N=p²:

CF = T²/N = T²/p²

SSR (Rows) = (1/p)·Σ Rᵢ² − CF where Rᵢ = sum of row i

SSC (Columns) = (1/p)·Σ Cⱼ² − CF where Cⱼ = sum of column j

SST (Treatments) = (1/p)·Σ Tₖ² − CF where Tₖ = sum of treatment k

SSE = TSS − SSR − SSC − SST

8-Mark Topics

1. One-Way ANOVA (CRD) — Theory & Full Numerical 8M

Key Formulas for One-Way ANOVA

CF = T²/N (T = Grand Total, N = total observations)

TSS = ΣΣ xᵢⱼ² − CF

SST = Σ(Tᵢ²/nᵢ) − CF (Tᵢ = treatment total, nᵢ = treatment size)

SSE = TSS − SST

MST = SST/(k−1), MSE = SSE/(N−k)

F = MST/MSE ~ F(k−1, N−k)

Worked Numerical — CRD

Three fertilisers are applied to crops in 4 plots each. Yields (kg) are: A: 14,16,18,12 | B: 10,12,14,10 | C: 20,18,22,16. Test at 5% if yields differ.

	A	B	C
Obs 1	14	10	20
Obs 2	16	12	18
Obs 3	18	14	22
Obs 4	12	10	16
Total (Tᵢ)	60	46	76

k=3, N=12, Grand Total T=60+46+76=182

CF = T²/N = 182²/12 = 33124/12 = 2760.33

TSS = (14²+16²+18²+12²+10²+12²+14²+10²+20²+18²+22²+16²) − CF

= (196+256+324+144+100+144+196+100+400+324+484+256) − 2760.33

= 2924 − 2760.33 = 163.67

SST = (60²/4 + 46²/4 + 76²/4) − CF = (900+529+1444) − 2760.33 = 2873 − 2760.33 = 112.67

SSE = TSS − SST = 163.67 − 112.67 = 51.00

MST = 112.67/(3−1) = 112.67/2 = 56.33

MSE = 51.00/(12−3) = 51.00/9 = 5.67

F = 56.33/5.67 = 9.93

F_table(2, 9) at 5% = 4.26

F_calc=9.93 > 4.26 → REJECT H₀ — Significant difference in yields

Source	SS	df	MS	F
Treatment	112.67	2	56.33	9.93*
Error	51.00	9	5.67	—
Total	163.67	11	—	—

* Significant at 5% level. Fertiliser type significantly affects yield.
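
The hand computation above can be cross-checked with a short script. This is a minimal Python sketch and not part of the syllabus; the function name `one_way_anova` is an illustrative choice. Because it carries full precision, F comes out near 9.94 rather than the 9.93 obtained from the rounded mean squares.

```python
def one_way_anova(groups):
    """Return (SST, SSE, TSS, F) for a list of treatment samples (CRD)."""
    all_obs = [x for g in groups for x in g]
    N, k = len(all_obs), len(groups)
    cf = sum(all_obs) ** 2 / N                             # correction factor T^2/N
    tss = sum(x * x for x in all_obs) - cf                 # total sum of squares
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # treatment sum of squares
    sse = tss - sst                                        # error sum of squares
    f = (sst / (k - 1)) / (sse / (N - k))                  # MST / MSE
    return sst, sse, tss, f

# Fertiliser yields A, B, C from the worked numerical
yields = [[14, 16, 18, 12], [10, 12, 14, 10], [20, 18, 22, 16]]
sst, sse, tss, f = one_way_anova(yields)
print(round(sst, 2), round(sse, 2), round(tss, 2), round(f, 2))
```

The same function accepts unequal group sizes, since SST divides each Tᵢ² by its own nᵢ.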

2. Two-Way ANOVA (RBD) — Theory & Full Numerical 8M

Key Formulas for RBD

k = number of treatments, b = number of blocks, N = kb

CF = T²/N

TSS = ΣΣ xᵢⱼ² − CF

SST (Treatment) = (1/b)·ΣTᵢ² − CF (Tᵢ = treatment column total)

SSB (Block) = (1/k)·ΣBⱼ² − CF (Bⱼ = block row total)

SSE = TSS − SST − SSB

df: Treatment=k−1, Block=b−1, Error=(k−1)(b−1), Total=N−1

Worked Numerical — RBD

Four varieties of rice (T1,T2,T3,T4) tested in 3 blocks. Yield data (quintals):

Block\Treatment	T1	T2	T3	T4	Block Total (Bⱼ)
B1	25	23	20	27	95
B2	22	19	18	24	83
B3	28	26	23	29	106
Treat Total (Tᵢ)	75	68	61	80	T=284

k=4 treatments, b=3 blocks, N=12

CF = 284²/12 = 80656/12 = 6721.33

TSS = (25²+22²+28²+23²+19²+26²+20²+18²+23²+27²+24²+29²) − CF

= (625+484+784+529+361+676+400+324+529+729+576+841) − 6721.33

= 6858 − 6721.33 = 136.67

SST = (75²+68²+61²+80²)/3 − CF = (5625+4624+3721+6400)/3 − 6721.33

= 20370/3 − 6721.33 = 6790 − 6721.33 = 68.67

SSB = (95²+83²+106²)/4 − CF = (9025+6889+11236)/4 − 6721.33

= 27150/4 − 6721.33 = 6787.5 − 6721.33 = 66.17

SSE = 136.67 − 68.67 − 66.17 = 1.83

MST=68.67/3=22.89, MSB=66.17/2=33.09, MSE=1.83/6=0.305

F(treatment)=22.89/0.305=75.05, F(block)=33.09/0.305=108.5

F_table(3,6)=4.76, F_table(2,6)=5.14 at 5%

Both F values >> table → Significant treatment AND block effects

Source	SS	df	MS	F
Treatment	68.67	3	22.89	75.05*
Block	66.17	2	33.09	108.5*
Error	1.83	6	0.305	—
Total	136.67	11	—	—
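
The RBD sums of squares can be checked the same way. The sketch below is illustrative only: the name `rbd_anova` and the row-per-block layout of the input are assumptions. With unrounded mean squares the F ratios come out near 74.9 and 108.3, slightly different from the hand values because MSE was rounded to 0.305 there.

```python
def rbd_anova(table):
    """table[j][i] = observation in block j (row) under treatment i (column)."""
    b, k = len(table), len(table[0])
    cf = sum(sum(row) for row in table) ** 2 / (b * k)
    tss = sum(x * x for row in table for x in row) - cf
    treat_totals = [sum(row[i] for row in table) for i in range(k)]
    block_totals = [sum(row) for row in table]
    sst = sum(t * t for t in treat_totals) / b - cf   # each treatment total has b values
    ssb = sum(t * t for t in block_totals) / k - cf   # each block total has k values
    sse = tss - sst - ssb
    mse = sse / ((k - 1) * (b - 1))
    return {"SST": sst, "SSB": ssb, "SSE": sse,
            "F_treat": (sst / (k - 1)) / mse,
            "F_block": (ssb / (b - 1)) / mse}

# Rice yields: rows are blocks B1..B3, columns are treatments T1..T4
rice = [[25, 23, 20, 27], [22, 19, 18, 24], [28, 26, 23, 29]]
print({name: round(value, 2) for name, value in rbd_anova(rice).items()})
```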

3. Latin Square Design (LSD) — Theory & Full Numerical 8M

Structure & Key Formulas

In a p×p LSD, each treatment appears exactly once in each row and each column.

CF = T²/p² (T = Grand Total)

TSS = ΣΣ xᵢⱼ² − CF

SSR = (1/p)·ΣRᵢ² − CF (Row sums)

SSC = (1/p)·ΣCⱼ² − CF (Column sums)

SST = (1/p)·ΣTₖ² − CF (Treatment sums)

SSE = TSS − SSR − SSC − SST

df: Row=p−1, Col=p−1, Treatment=p−1, Error=(p−1)(p−2), Total=p²−1

Worked Numerical — 3×3 LSD

A 3×3 Latin Square experiment on crop yield (A, B, C = three fertilisers):

\	Col 1	Col 2	Col 3	Row Total
Row 1	A=17	B=14	C=12	R₁=43
Row 2	B=13	C=11	A=16	R₂=40
Row 3	C=10	A=18	B=15	R₃=43
Col Total	C₁=40	C₂=43	C₃=43	T=126

Treatment totals: A=17+16+18=51, B=14+13+15=42, C=12+11+10=33

p=3, N=9, T=126, CF=126²/9=15876/9=1764

TSS=(17²+14²+12²+13²+11²+16²+10²+18²+15²)−CF

=(289+196+144+169+121+256+100+324+225)−1764=1824−1764=60

SSR=(43²+40²+43²)/3−CF=(1849+1600+1849)/3−1764=5298/3−1764=1766−1764=2

SSC=(40²+43²+43²)/3−CF=(1600+1849+1849)/3−1764=5298/3−1764=2

SST=(51²+42²+33²)/3−CF=(2601+1764+1089)/3−1764=5454/3−1764=1818−1764=54

SSE=60−2−2−54=2

MST=54/2=27, MSE=2/2=1, F=27/1=27

F_table(2,2) at 5% = 19.00

F_calc=27 > 19 → REJECT H₀ — Fertiliser effect is significant

Source	SS	df	MS	F
Row	2	2	1	1
Column	2	2	1	1
Treatment	54	2	27	27*
Error	2	2	1	—
Total	60	8	—	—
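
A Latin square needs the treatment layout as well as the observations. The following minimal sketch assumes the layout is given as one string of treatment letters per row; `lsd_anova` is a hypothetical helper name.

```python
def lsd_anova(values, layout):
    """values[i][j] = observation, layout[i][j] = treatment label (row i, column j)."""
    p = len(values)
    cells = [(values[i][j], layout[i][j]) for i in range(p) for j in range(p)]
    cf = sum(v for v, _ in cells) ** 2 / (p * p)
    tss = sum(v * v for v, _ in cells) - cf
    ssr = sum(sum(row) ** 2 for row in values) / p - cf
    ssc = sum(sum(values[i][j] for i in range(p)) ** 2 for j in range(p)) / p - cf
    totals = {}
    for v, label in cells:                      # accumulate treatment totals
        totals[label] = totals.get(label, 0) + v
    sst = sum(t * t for t in totals.values()) / p - cf
    sse = tss - ssr - ssc - sst
    f = (sst / (p - 1)) / (sse / ((p - 1) * (p - 2)))
    return ssr, ssc, sst, sse, f

values = [[17, 14, 12], [13, 11, 16], [10, 18, 15]]
layout = ["ABC", "BCA", "CAB"]
print(lsd_anova(values, layout))
```

For the 3×3 square above this reproduces SSR=2, SSC=2, SST=54, SSE=2 and F=27.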

Unit 6: Statistical Quality Control

Process Control • X̄ & R Charts • p-chart • np-chart • c-chart — Control Limit Formulas + Numericals

2-Mark Q&A 10 Questions

1. Define Statistical Quality Control (SQC). HOT

SQC is the application of statistical methods to monitor and control a manufacturing process to ensure the product meets quality standards.

Distinguishes between chance (common) causes and assignable (special) causes of variation
Main tool: Control Charts (Shewhart charts)
Goal: Keep process in a state of statistical control

2. What is a Control Chart? Name its main components. HOT

A control chart is a graph that plots a quality characteristic over time with three horizontal lines:

UCL — Upper Control Limit (3σ above centre)
CL — Centre Line (process mean)
LCL — Lower Control Limit (3σ below centre)

If all points lie within UCL and LCL → process is in control. Any point outside → out of control (assignable cause present).

3. Distinguish between Variable and Attribute Control Charts. HOT

Variable Charts	Attribute Charts
For measurable characteristics (length, weight)	For counted characteristics (defects, defectives)
X̄-chart, R-chart	p-chart, np-chart, c-chart
Used when continuous measurement possible	Used for go/no-go, pass/fail data

4. Write the control limits for X̄-chart and R-chart. HOT

Chart	UCL	CL	LCL
X̄-chart	X̄̄ + A₂·R̄	X̄̄ (Grand Mean)	X̄̄ − A₂·R̄
R-chart	D₄·R̄	R̄ (Mean Range)	D₃·R̄

X̄̄ = mean of sample means, R̄ = mean of sample ranges
A₂, D₃, D₄ are control chart constants depending on subgroup size n
Common constants for n=5: A₂=0.577, D₃=0, D₄=2.115

5. Write the control limits for p-chart (fraction defective). HOT

The p-chart monitors the proportion (fraction) of defective items in a sample.

p̄ = Total defectives / Total items inspected = Σdᵢ / Σnᵢ

UCL	CL	LCL
p̄ + 3√(p̄(1−p̄)/n)	p̄	p̄ − 3√(p̄(1−p̄)/n) (min 0)

If LCL is negative, take LCL = 0.

6. Write the control limits for np-chart (number defective). HOT

The np-chart monitors the number of defective items (used when sample size n is constant).

np̄ = Total defectives / Number of samples = Σdᵢ/k

UCL	CL	LCL
np̄ + 3√(np̄(1−p̄))	np̄	np̄ − 3√(np̄(1−p̄)) (min 0)

where p̄ = np̄/n.

7. Write the control limits for c-chart (number of defects). HOT

The c-chart monitors the number of defects per unit (based on Poisson distribution).

c̄ = Total defects / Number of units = Σcᵢ/k

UCL	CL	LCL
c̄ + 3√c̄	c̄	c̄ − 3√c̄ (min 0)

Difference from p/np: c-chart counts defects (not defectives); one item can have multiple defects

8. Distinguish between p-chart and c-chart. HOT

p-chart	c-chart
Fraction/proportion of defective items	Number of defects per unit
Based on Binomial distribution	Based on Poisson distribution
Sample size can vary	Unit of inspection is constant
Example: % rejected bolts per batch	Example: scratches per car door

9. What are chance causes and assignable causes of variation? HOT

Chance (Common) Causes: Natural, unavoidable variation inherent in any process. Process is still "in control". Cannot be eliminated without redesigning the process. Example: minor machine vibration, raw material variation.
Assignable (Special) Causes: Specific, identifiable causes that push the process out of control. Can and should be identified and eliminated. Example: worn tool, untrained operator, faulty material batch.

10. Give the control chart constants A₂, D₃, D₄ for n=4 and n=5. HOT

n	A₂	D₃	D₄
2	1.880	0	3.267
3	1.023	0	2.574
4	0.729	0	2.282
5	0.577	0	2.115
6	0.483	0	2.004

Note: D₃=0 for n≤6 means LCL of R-chart = 0 (no lower limit needed).

8-Mark Topics

1. X̄-Chart and R-Chart — Theory & Full Numerical 8M

Control Limits Summary

X̄-chart: CL = X̄̄ | UCL = X̄̄ + A₂·R̄ | LCL = X̄̄ − A₂·R̄

R-chart: CL = R̄ | UCL = D₄·R̄ | LCL = D₃·R̄

Worked Numerical — X̄ and R Charts (n=5)

10 samples of size 5 are taken. Sample means (X̄) and ranges (R) are:

Sample	1	2	3	4	5	6	7	8	9	10
X̄	42	45	41	43	46	44	40	43	45	41
R	5	6	4	7	6	5	4	6	5	4

ΣX̄ = 42+45+41+43+46+44+40+43+45+41 = 430

X̄̄ = 430/10 = 43.0 (Grand Mean)

ΣR = 5+6+4+7+6+5+4+6+5+4 = 52

R̄ = 52/10 = 5.2 (Mean Range)

For n=5: A₂=0.577, D₃=0, D₄=2.115

X̄-chart: CL=43.0, UCL=43+0.577×5.2=43+3.0=46.0, LCL=43−3.0=40.0

R-chart: CL=5.2, UCL=2.115×5.2=11.0, LCL=0×5.2=0

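Check: Sample 5 (X̄=46) and sample 7 (X̄=40) sit right at the limits. Unrounded, A₂·R̄ = 0.577×5.2 = 3.0004, so UCL = 46.0004 and LCL = 39.9996, and both points fall just inside. All R values lie between 0 and 11. Process is IN CONTROL, but samples 5 and 7 should be watched carefully.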
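
When points sit close to a limit, a scripted check avoids rounding slips. This is a minimal sketch; the names `CONSTANTS`, `xbar_r_limits` and `out_of_control` are illustrative, and the constants are the standard table values for n = 2 to 6.

```python
# (A2, D3, D4) for subgroup sizes 2..6, standard Shewhart table values
CONSTANTS = {2: (1.880, 0, 3.267), 3: (1.023, 0, 2.574), 4: (0.729, 0, 2.282),
             5: (0.577, 0, 2.115), 6: (0.483, 0, 2.004)}

def xbar_r_limits(means, ranges, n):
    """Return ((LCL, CL, UCL) for the X-bar chart, (LCL, CL, UCL) for the R chart)."""
    a2, d3, d4 = CONSTANTS[n]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    xbar = (grand_mean - a2 * r_bar, grand_mean, grand_mean + a2 * r_bar)
    r = (d3 * r_bar, r_bar, d4 * r_bar)
    return xbar, r

def out_of_control(points, lcl, ucl):
    """1-based sample numbers that fall strictly outside [lcl, ucl]."""
    return [i for i, x in enumerate(points, start=1) if x < lcl or x > ucl]

means = [42, 45, 41, 43, 46, 44, 40, 43, 45, 41]
ranges = [5, 6, 4, 7, 6, 5, 4, 6, 5, 4]
(x_lcl, x_cl, x_ucl), (r_lcl, r_cl, r_ucl) = xbar_r_limits(means, ranges, n=5)
print(out_of_control(means, x_lcl, x_ucl), out_of_control(ranges, r_lcl, r_ucl))
```

Treating a point exactly on a limit as inside is a design choice of this sketch; some texts flag such points for investigation.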

2. p-Chart (Fraction Defective) — Theory & Full Numerical 8M

When to Use p-chart

Quality characteristic is attribute (defective / non-defective)
Sample size n can be variable or constant
Based on Binomial distribution

p̄ = Σdᵢ / Σnᵢ | UCL = p̄+3√(p̄q̄/n) | LCL = p̄−3√(p̄q̄/n) | q̄=1−p̄

Worked Numerical — p-Chart (Constant n)

10 batches of 100 items each inspected. Number of defectives:

Batch	1	2	3	4	5	6	7	8	9	10
Defectives (d)	7	5	4	8	6	3	9	5	4	9

n=100, k=10, Σd=7+5+4+8+6+3+9+5+4+9=60

p̄ = 60/(10×100) = 60/1000 = 0.06

q̄ = 1 − 0.06 = 0.94

√(p̄q̄/n) = √(0.06×0.94/100) = √(0.0564/100) = √0.000564 = 0.02375

UCL = 0.06 + 3×0.02375 = 0.06 + 0.0713 = 0.1313

LCL = 0.06 − 0.0713 = −0.0113 → take LCL = 0

CL=0.06, UCL=0.1313, LCL=0

Batch proportions: 0.07, 0.05, 0.04, 0.08, 0.06, 0.03, 0.09, 0.05, 0.04, 0.09 — all within [0, 0.1313]. Process IN CONTROL.
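
The p-chart limits for constant n reduce to a few lines of code. The sketch below is illustrative (the name `p_chart_limits` is an assumption) and clamps a negative LCL to 0 as in the rule above.

```python
from math import sqrt

def p_chart_limits(defectives, n):
    """Return (LCL, CL, UCL) for a p-chart with constant sample size n."""
    p_bar = sum(defectives) / (n * len(defectives))
    half_width = 3 * sqrt(p_bar * (1 - p_bar) / n)
    lcl = max(0.0, p_bar - half_width)          # a negative LCL is taken as 0
    return lcl, p_bar, p_bar + half_width

d = [7, 5, 4, 8, 6, 3, 9, 5, 4, 9]
lcl, cl, ucl = p_chart_limits(d, 100)
flagged = [i for i, x in enumerate(d, start=1) if not lcl <= x / 100 <= ucl]
print(round(cl, 4), round(ucl, 4), lcl, flagged)
```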

3. c-Chart (Number of Defects) & np-Chart — Theory & Numerical 8M

c-Chart — Number of Defects per Unit

c̄ = Σcᵢ/k | UCL = c̄+3√c̄ | CL = c̄ | LCL = c̄−3√c̄ (min 0)

Numerical: Number of defects (scratches) found in 10 car panels:

Panel	1	2	3	4	5	6	7	8	9	10
Defects (c)	4	3	6	2	5	4	7	3	4	2

k=10, Σc=4+3+6+2+5+4+7+3+4+2=40

c̄ = 40/10 = 4.0

√c̄ = √4 = 2.0

UCL = 4 + 3×2 = 4+6 = 10

LCL = 4 − 3×2 = 4−6 = −2 → take LCL = 0

CL=4, UCL=10, LCL=0

All 10 panels have defects between 0 and 10. Process is IN CONTROL.
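
The c-chart check can be scripted in the same way. A minimal sketch, with `c_chart_limits` as a hypothetical helper name:

```python
from math import sqrt

def c_chart_limits(counts):
    """Return (LCL, CL, UCL) for a c-chart; a negative LCL is taken as 0."""
    c_bar = sum(counts) / len(counts)
    spread = 3 * sqrt(c_bar)                    # Poisson: variance equals the mean
    return max(0.0, c_bar - spread), c_bar, c_bar + spread

scratches = [4, 3, 6, 2, 5, 4, 7, 3, 4, 2]
lcl, cl, ucl = c_chart_limits(scratches)
outside = [i for i, c in enumerate(scratches, start=1) if not lcl <= c <= ucl]
print(lcl, cl, ucl, outside)
```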

np-Chart — Number Defective (Constant n)

np̄ = Σdᵢ/k | UCL = np̄+3√(np̄·q̄) | LCL = np̄−3√(np̄·q̄)

Use np-chart instead of p-chart when sample size n is constant and you prefer plotting the actual count (not fraction). Both charts give the same conclusions — np-chart is simpler to compute.

For the p-chart example above (n=100, p̄=0.06, q̄=0.94):

np̄ = 100×0.06 = 6.0

√(np̄·q̄) = √(6×0.94) = √5.64 = 2.375

UCL = 6 + 3×2.375 = 6+7.125 = 13.125

LCL = 6 − 7.125 = −1.125 → take LCL = 0 | CL=6

Quick Summary — Which chart to use?
Measurement data → X̄ & R charts
Proportion defective, variable n → p-chart
Count defective, fixed n → np-chart
Count defects per unit → c-chart

QB Solutions: Unit 1, Correlation & Regression

Q1. The following data gives the experience (in years) and salary (in thousands) of 6 employees. Find Karl Pearson's correlation coefficient. 13M TYPE

Experience (X): 5, 3, 7, 2, 8, 6 | Salary (Y): 40, 30, 55, 20, 60, 45

X	Y	X²	Y²	XY
5	40	25	1600	200
3	30	9	900	90
7	55	49	3025	385
2	20	4	400	40
8	60	64	3600	480
6	45	36	2025	270
31	250	187	11550	1465

n=6, ΣX=31, ΣY=250, ΣX²=187, ΣY²=11550, ΣXY=1465

n·ΣXY − ΣX·ΣY = 6×1465 − 31×250 = 8790 − 7750 = 1040

n·ΣX² − (ΣX)² = 6×187 − 961 = 1122 − 961 = 161

n·ΣY² − (ΣY)² = 6×11550 − 62500 = 69300 − 62500 = 6800

√(161 × 6800) = √1094800 ≈ 1046.33

r = 1040/1046.33 ≈ 0.994 (Very high positive correlation)

Strong positive correlation — as experience increases, salary increases significantly.
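
The computation formula translates directly into code. This is a minimal sketch for checking the table totals; `pearson_r` is an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from the computation formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

experience = [5, 3, 7, 2, 8, 6]
salary = [40, 30, 55, 20, 60, 45]
print(round(pearson_r(experience, salary), 3))
```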

Q2. Calculate Spearman's rank correlation for the following data on marks in two subjects. Marks in Maths: 78, 89, 56, 45, 90, 70. Marks in Science: 84, 92, 60, 48, 88, 75. 8M

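Math (X)	Science (Y)	Rank X (R₁)	Rank Y (R₂)	d=R₁−R₂	d²
78	84	3	3	0	0
89	92	2	1	1	1
56	60	5	5	0	0
45	48	6	6	0	0
90	88	1	2	−1	1
70	75	4	4	0	0
Total				0	Σd²=2

Ranks are assigned in descending order (highest mark = rank 1). In Science, 92 is the highest mark (rank 1) and 88 is the second highest (rank 2).

n=6, Σd²=2

ρ = 1 − (6×2) / (6×35) = 1 − 12/210 = 1 − 0.0571 = 0.9429

ρ ≈ 0.943 → Very high positive rank correlation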
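
Ranking by hand is the step most prone to slips, so it is worth checking the ranks and Σd² by machine. The sketch below assumes there are no tied values (ties need the corrected formula); `ranks` and `spearman_rho` are illustrative names.

```python
def ranks(values):
    """Rank 1 = largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho = 1 - 6*sum(d^2) / (n(n^2 - 1)) for untied data."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))

maths = [78, 89, 56, 45, 90, 70]
science = [84, 92, 60, 48, 88, 75]
print(ranks(maths), ranks(science), spearman_rho(maths, science))
```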

Q3. From the following data, find (i) the two regression equations (ii) estimate Y when X=20 (iii) estimate X when Y=25. n=5, X̄=10, Ȳ=14, σx=3, σy=4, r=0.8. 8M

b_yx = r·(σy/σx) = 0.8 × (4/3) = 0.8 × 1.333 = 1.067

b_xy = r·(σx/σy) = 0.8 × (3/4) = 0.8 × 0.75 = 0.6

Regression of Y on X: y − 14 = 1.067(x − 10)

→ y = 1.067x − 10.67 + 14 = 1.067x + 3.33

Regression of X on Y: x − 10 = 0.6(y − 14)

→ x = 0.6y − 8.4 + 10 = 0.6y + 1.6

When X=20: y = 1.067×20 + 3.33 = 21.34 + 3.33 = 24.67

When Y=25: x = 0.6×25 + 1.6 = 15 + 1.6 = 16.6

Verify r: r = √(b_yx × b_xy) = √(1.067 × 0.6) = √0.64 = 0.8 ✓
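
Both regression lines follow from the five summary values alone. A minimal sketch, with `regression_lines` as a hypothetical helper that returns each line as a (slope, intercept) pair:

```python
def regression_lines(x_bar, y_bar, sx, sy, r):
    """Return (slope, intercept) for Y on X and for X on Y."""
    b_yx = r * sy / sx                           # regression coefficient of Y on X
    b_xy = r * sx / sy                           # regression coefficient of X on Y
    return (b_yx, y_bar - b_yx * x_bar), (b_xy, x_bar - b_xy * y_bar)

(y_slope, y_int), (x_slope, x_int) = regression_lines(10, 14, 3, 4, 0.8)
y_at_20 = y_slope * 20 + y_int                   # estimate of Y when X = 20
x_at_25 = x_slope * 25 + x_int                   # estimate of X when Y = 25
print(round(y_at_20, 2), round(x_at_25, 2))
```

Note that each estimate uses its own line: Y on X for predicting Y, X on Y for predicting X.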

QB Solutions: Unit 2, Probability & Random Variables

Q1. In a bolt factory, machines A, B, C produce 25%, 35%, 40% of the total production. Of their outputs, 5%, 4%, 2% are defective bolts. A bolt is drawn at random and found defective. Find the probability it was produced by machine A. 8M

P(A)=0.25, P(B)=0.35, P(C)=0.40

P(D|A)=0.05, P(D|B)=0.04, P(D|C)=0.02

P(D) = 0.25×0.05 + 0.35×0.04 + 0.40×0.02

P(D) = 0.0125 + 0.0140 + 0.0080 = 0.0345

P(A|D) = P(D|A)·P(A)/P(D) = (0.05×0.25)/0.0345

P(A|D) = 0.0125/0.0345 = 0.3623 (36.23%)
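
Bayes' theorem gives the posterior for all three machines at once. The sketch below is illustrative; `posterior` is an assumed name and the dictionaries simply restate the given data.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem: P(M|D) = P(M)·P(D|M) / sum over all machines."""
    joint = {m: priors[m] * likelihoods[m] for m in priors}
    total = sum(joint.values())                  # total probability P(D)
    return {m: j / total for m, j in joint.items()}

priors = {"A": 0.25, "B": 0.35, "C": 0.40}
defect_rates = {"A": 0.05, "B": 0.04, "C": 0.02}
post = posterior(priors, defect_rates)
print({m: round(p, 4) for m, p in post.items()})
```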

Q2. A random variable X has the PDF f(x) = cx(2−x) for 0 ≤ x ≤ 2, zero otherwise. Find c, F(x), P(X < 1), Mean and Variance. 13M TYPE

∫₀² cx(2−x)dx = 1

c∫₀²(2x−x²)dx = c[x²−x³/3]₀² = c[4 − 8/3] = c[4/3] = 4c/3 = 1

c = 3/4

f(x) = (3/4)x(2−x) = (3/4)(2x−x²)

F(x) = ∫₀ˣ (3/4)(2t−t²)dt = (3/4)[t²−t³/3]₀ˣ = (3/4)(x²−x³/3) = 3x²/4 − x³/4 for 0 ≤ x ≤ 2, with F(x) = 0 for x < 0 and F(x) = 1 for x > 2

P(X<1) = F(1) = 3/4 − 1/4 = 2/4 = 0.5

E(X) = ∫₀² x·(3/4)(2x−x²)dx = (3/4)∫₀²(2x²−x³)dx

= (3/4)[2x³/3−x⁴/4]₀² = (3/4)[16/3−4] = (3/4)(4/3) = 1

E(X²) = ∫₀² x²·(3/4)(2x−x²)dx = (3/4)[x⁴/2−x⁵/5]₀² = (3/4)[8−32/5] = (3/4)(8/5) = 6/5

Mean = 1, Var = E(X²)−[E(X)]² = 6/5 − 1 = 1/5 = 0.2
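
Since the density is a polynomial, every integral above can be verified exactly with rational arithmetic. This is a sketch under that assumption; `integrate_poly` is a hypothetical helper that takes coefficients in increasing powers of x.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Exact integral from a to b of sum(coeffs[i] * x**i)."""
    def anti(x):
        return sum(Fraction(c) * Fraction(x) ** (i + 1) / (i + 1)
                   for i, c in enumerate(coeffs))
    return anti(b) - anti(a)

shape = [0, 2, -1]                               # x(2 - x) = 2x - x^2
c = 1 / integrate_poly(shape, 0, 2)              # normalising constant
pdf = [c * a for a in shape]
p_lt_1 = integrate_poly(pdf, 0, 1)               # P(X < 1) = F(1)
mean = integrate_poly([0] + pdf, 0, 2)           # multiplying by x shifts coefficients
second = integrate_poly([0, 0] + pdf, 0, 2)      # E(X^2)
variance = second - mean ** 2
print(c, p_lt_1, mean, variance)
```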

Q3. A discrete RV X has PMF: P(X=x) = (1/2)^x for x=1,2,3,... Find the MGF and hence the Mean. 8M

M_X(t) = Σₓ₌₁^∞ e^(tx)·(1/2)^x = Σ(e^t/2)^x

= (e^t/2)/(1 − e^t/2) [geometric series, valid for |e^t/2|<1, i.e., t<ln2]

M_X(t) = e^t/(2−e^t)

M'_X(t) = [e^t(2−e^t) − e^t(−e^t)] / (2−e^t)²

= [2e^t − e^(2t) + e^(2t)] / (2−e^t)²

= 2e^t / (2−e^t)²

Mean = M'_X(0) = 2×1/(2−1)² = 2/1 = 2
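
The result can be sanity-checked numerically in two independent ways: a finite-difference estimate of M'_X(0) and a truncated sum of x·P(X=x). This is only a sketch; the step size `h` and the cut-off at 200 terms are arbitrary choices.

```python
from math import exp

def mgf(t):
    """M_X(t) = e^t / (2 - e^t), valid for t < ln 2."""
    return exp(t) / (2 - exp(t))

h = 1e-6
mean_from_mgf = (mgf(h) - mgf(-h)) / (2 * h)               # central difference for M'(0)
mean_from_pmf = sum(x * 0.5 ** x for x in range(1, 200))   # truncated sum of x·P(X=x)
print(round(mean_from_mgf, 4), round(mean_from_pmf, 4))
```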

QB Solutions: Unit 3, Normal Distribution

Q1. The marks of 1000 students in an examination follow a normal distribution with mean 70 and standard deviation 10. Find the number of students who scored (i) less than 55, (ii) between 60 and 80, (iii) more than 90. 8M

X ~ N(70, 100), μ=70, σ=10, N=1000

(i) P(X<55): Z=(55−70)/10 = −1.5

P(X<55) = P(Z<−1.5) = 1−Φ(1.5) = 1−0.9332 = 0.0668

Students: 1000×0.0668 = 67 students

(ii) P(60<X<80): Z₁=(60−70)/10=−1, Z₂=(80−70)/10=+1

P = Φ(1)−Φ(−1) = 0.8413−0.1587 = 0.6826

Students: 1000×0.6826 = 683 students

(iii) P(X>90): Z=(90−70)/10=2.0

P(X>90) = 1−Φ(2.0) = 1−0.9772 = 0.0228

Students: 1000×0.0228 = 23 students
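
Table look-ups can be checked with the error function, since Φ(z) = ½[1 + erf(z/√2)]. A minimal sketch; `phi` is an illustrative name.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma, students = 70, 10, 1000
below_55 = students * phi((55 - mu) / sigma)
between = students * (phi((80 - mu) / sigma) - phi((60 - mu) / sigma))
above_90 = students * (1 - phi((90 - mu) / sigma))
print(round(below_55), round(between), round(above_90))
```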

Q2. Find the MGF of Normal Distribution X ~ N(μ, σ²) and hence find its mean and variance. 8M

M_X(t) = E(e^{tX}) = ∫₋∞^∞ e^{tx} · (1/(σ√(2π)))·e^{−(x−μ)²/(2σ²)} dx

Combining exponents: tx − (x−μ)²/(2σ²)

Complete the square: = −[x−(μ+σ²t)]²/(2σ²) + μt + σ²t²/2

M_X(t) = e^{μt+σ²t²/2} · ∫₋∞^∞ (1/(σ√(2π)))·e^{−[x−(μ+σ²t)]²/(2σ²)} dx

The integral = 1 (normal PDF integrates to 1)

M_X(t) = e^{μt + σ²t²/2}

M'_X(t) = (μ+σ²t)·e^{μt+σ²t²/2} → Mean = M'_X(0) = μ

M''_X(t) = [σ²+(μ+σ²t)²]·e^{μt+σ²t²/2} → E(X²) = M''_X(0) = σ²+μ²

Variance = E(X²)−[E(X)]² = σ²+μ²−μ² = σ² ✓

Q3. For a normal distribution with mean 5 and variance 9, find (i) P(X>8), (ii) P(3<X<7), (iii) the value of x₀ such that P(X<x₀) = 0.90. 8M

X ~ N(5, 9), μ=5, σ=3

(i) P(X>8): Z=(8−5)/3 = 1.0

P(X>8)=1−Φ(1)=1−0.8413=0.1587

(ii) P(3<X<7): Z₁=(3−5)/3=−0.667, Z₂=(7−5)/3=0.667

P=Φ(0.667)−Φ(−0.667)=2Φ(0.667)−1=2(0.7476)−1=0.4952

(iii) P(X<x₀)=0.90 → Φ(z₀)=0.90 → z₀=1.28

1.28 = (x₀−5)/3 → x₀ = 5 + 3×1.28

x₀ = 5 + 3.84 = 8.84
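
For the inverse problem in part (iii), Python's standard library offers `statistics.NormalDist` (Python 3.8+), whose `inv_cdf` method returns the quantile directly. The sketch below is a cross-check of the table-based answers; small differences in the third decimal come from table rounding.

```python
from statistics import NormalDist

X = NormalDist(mu=5, sigma=3)                    # variance 9, so sigma = 3
p_above_8 = 1 - X.cdf(8)
p_3_to_7 = X.cdf(7) - X.cdf(3)
x0 = X.inv_cdf(0.90)                             # value with P(X < x0) = 0.90
print(round(p_above_8, 4), round(p_3_to_7, 4), round(x0, 2))
```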

QB Solutions: Unit 4, Testing of Hypothesis

Q1. A sample of 16 items gives mean = 2.5 kg and SD = 2.5 kg. Can this sample be regarded as taken from a population with mean 3 kg? Test at 5% level. 8M

H₀: μ=3, H₁: μ≠3 (two-tailed test)

n=16, x̄=2.5, s=2.5, μ₀=3, α=0.05

t = (x̄−μ₀)/(s/√n) = (2.5−3)/(2.5/√16) = −0.5/(2.5/4) = −0.5/0.625 = −0.8

|t_calc| = 0.8, df = n−1 = 15

t_table(0.05, 15df, two-tailed) = 2.131

|t_calc|=0.8 < 2.131 → Do NOT reject H₀

Conclusion: The sample could have come from a population with mean 3 kg. No significant difference at 5% level.
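
The test statistic and decision can be expressed in a few lines. This sketch follows the formula used above, t = (x̄−μ₀)/(s/√n), and takes the critical value 2.131 from the t-table rather than computing it; `one_sample_t` is an illustrative name.

```python
from math import sqrt

def one_sample_t(x_bar, s, n, mu0):
    """t statistic for H0: mu = mu0, using s/sqrt(n) as the standard error."""
    return (x_bar - mu0) / (s / sqrt(n))

t = one_sample_t(x_bar=2.5, s=2.5, n=16, mu0=3)
reject = abs(t) > 2.131                          # two-tailed critical value, 15 df, 5%
print(round(t, 3), reject)
```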

Q2. Two types of drugs A and B were used on 5 and 7 patients respectively for reducing weight. Drug A gave mean reduction of 6.25 kg (s²=4.5) and Drug B gave 4.38 kg (s²=3.6). Test if drugs differ significantly (5%). 8M

H₀: μ₁=μ₂, H₁: μ₁≠μ₂ | n₁=5, x̄₁=6.25, s₁²=4.5 | n₂=7, x̄₂=4.38, s₂²=3.6

s_p² = [(n₁−1)s₁²+(n₂−1)s₂²]/(n₁+n₂−2) = [4×4.5+6×3.6]/10 = [18+21.6]/10 = 3.96

s_p = √3.96 = 1.99

t = (x̄₁−x̄₂)/[s_p·√(1/n₁+1/n₂)] = (6.25−4.38)/[1.99·√(1/5+1/7)]

√(1/5+1/7)=√(0.2+0.143)=√0.343=0.586

t = 1.87/(1.99×0.586) = 1.87/1.166 = 1.603

df=5+7−2=10, t_table(0.05,10df)=2.228

t_calc=1.603 < 2.228 → Do NOT reject H₀ (drugs do not differ significantly)
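
The pooled two-sample statistic can be checked similarly. A minimal sketch with a hypothetical helper `pooled_t`; computed without intermediate rounding, t is about 1.605, which leads to the same decision.

```python
from math import sqrt

def pooled_t(n1, m1, v1, n2, m2, v2):
    """Return (t, df) for two independent samples with a pooled variance."""
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)   # pooled variance
    t = (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t, df = pooled_t(5, 6.25, 4.5, 7, 4.38, 3.6)
print(round(t, 3), df, abs(t) > 2.228)           # 2.228 = table value, 10 df, 5%
```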

Q3. In a survey of 200 persons, their opinion about a new tax policy was recorded. Test if opinion is independent of gender at 5% level. Male: For=80, Against=40 | Female: For=50, Against=30. 8M

	For	Against	Total
Male	80(O)	40(O)	120
Female	50(O)	30(O)	80
Total	130	70	200

H₀: Gender and opinion are independent

E(M,For)=120×130/200=78, E(M,Ag)=120×70/200=42

E(F,For)=80×130/200=52, E(F,Ag)=80×70/200=28

χ²=(80−78)²/78+(40−42)²/42+(50−52)²/52+(30−28)²/28

= 4/78 + 4/42 + 4/52 + 4/28 = 0.051+0.095+0.077+0.143

= 0.366

df=(2−1)(2−1)=1, χ²_table(0.05,1df)=3.841

0.366 < 3.841 → Do NOT reject H₀ (opinion is independent of gender)
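
Expected frequencies and the χ² statistic generalise to any r×c table. The following is a minimal sketch, with `chi_square_independence` as an illustrative name; the critical value still comes from the χ² table.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n    # expected count under H0
            chi2 += (o - e) ** 2 / e
    return chi2, (len(observed) - 1) * (len(observed[0]) - 1)

chi2, df = chi_square_independence([[80, 40], [50, 30]])
print(round(chi2, 3), df, chi2 > 3.841)
```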

QB Solutions: Unit 5, Design of Experiments

Q1. The following data represents yield (kg) of crops under 4 treatments in 5 replications (CRD). Perform one-way ANOVA and test at 5% LOS. T1: 6,7,5,6,4 | T2: 8,6,7,9,7 | T3: 5,4,6,4,5 | T4: 9,10,8,9,10. 8M

T1	T2	T3	T4
6	8	5	9
7	6	4	10
5	7	6	8
6	9	4	9
4	7	5	10
T₁=28	T₂=37	T₃=24	T₄=46

k=4, n=5 each, N=20, T=28+37+24+46=135

CF=135²/20=18225/20=911.25

TSS=(36+49+25+36+16+64+36+49+81+49+25+16+36+16+25+81+100+64+81+100)−CF

=(985)−911.25=73.75

SST=(28²+37²+24²+46²)/5−CF=(784+1369+576+2116)/5−911.25

=4845/5−911.25=969−911.25=57.75

SSE=73.75−57.75=16.00

MST=57.75/3=19.25 | MSE=16/16=1.00

F=19.25/1.00=19.25 | F_table(3,16) at 5%=3.24

F_calc=19.25 > 3.24 → REJECT H₀ — Treatment effects differ significantly

Source	SS	df	MS	F
Treatment	57.75	3	19.25	19.25*
Error	16.00	16	1.00	—
Total	73.75	19	—	—

Q2. Three varieties of wheat (V1, V2, V3) are tested in 4 blocks (RBD). Yields: B1: 48,42,44 | B2: 50,44,46 | B3: 52,46,48 | B4: 46,40,42. Perform two-way ANOVA at 5%. 8M

Block	V1	V2	V3	Block Total
B1	48	42	44	134
B2	50	44	46	140
B3	52	46	48	146
B4	46	40	42	128
Treat Total	196	172	180	T=548

k=3 varieties, b=4 blocks, N=12

CF=548²/12=300304/12=25025.33

TSS=(48²+42²+44²+50²+44²+46²+52²+46²+48²+46²+40²+42²)−CF

=(2304+1764+1936+2500+1936+2116+2704+2116+2304+2116+1600+1764)−25025.33

=25160−25025.33=134.67

SST=(196²+172²+180²)/4−CF=(38416+29584+32400)/4−25025.33

=100400/4−25025.33=25100−25025.33=74.67

SSB=(134²+140²+146²+128²)/3−CF=(17956+19600+21316+16384)/3−25025.33

=75256/3−25025.33=25085.33−25025.33=60.00

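SSE=134.67−74.67−60.00=0.00 (the data are perfectly additive: the varieties differ by the same amounts in every block)

MST=74.67/2=37.33, MSB=60.00/3=20.00, MSE=0/6=0

Because MSE=0, the ratios MST/MSE and MSB/MSE are undefined (infinitely large), so no comparison with the F-table is needed.

Conclusion: Both variety (treatment) and block effects are significant; varieties and blocks account for all of the variation, leaving no residual error.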

Q3. Explain the Latin Square Design with a 4×4 example. State its advantages and ANOVA table structure. 8M

Definition: LSD is a p×p arrangement where p treatments appear exactly once in each row and column, controlling two extraneous variables simultaneously.

Example 4×4 LSD layout:

	Col1	Col2	Col3	Col4
Row1	A	B	C	D
Row2	B	C	D	A
Row3	C	D	A	B
Row4	D	A	B	C

ANOVA Table for 4×4 LSD:

Source	SS	df	MS	F
Rows	SSR	3	MSR	MSR/MSE
Columns	SSC	3	MSC	MSC/MSE
Treatments	SST	3	MST	MST/MSE
Error	SSE	6	MSE	—
Total	TSS	15	—	—

Advantages: Controls two sources of variation; smaller error MS → more sensitive test; efficient when p is small (3–8).

Limitations: Requires p² experimental units; the number of treatments must equal the number of rows and the number of columns; assumes no interaction between rows, columns and treatments.

QB Solutions: Unit 6, Statistical Quality Control

Q1. Samples of size 4 are drawn every hour from a process. The mean and range values for 10 samples are given below. Construct X̄ and R charts and comment on control. X̄: 14.5,14.8,15.2,15.0,14.6,14.9,15.1,14.7,15.3,14.9 | R: 0.5,0.6,0.4,0.7,0.5,0.6,0.5,0.4,0.6,0.4. 8M

ΣX̄=14.5+14.8+15.2+15.0+14.6+14.9+15.1+14.7+15.3+14.9=149.0

X̄̄=149.0/10=14.90

ΣR=0.5+0.6+0.4+0.7+0.5+0.6+0.5+0.4+0.6+0.4=5.2

R̄=5.2/10=0.52

For n=4: A₂=0.729, D₃=0, D₄=2.282

X̄-chart: UCL=14.90+0.729×0.52=14.90+0.379=15.279

X̄-chart: LCL=14.90−0.379=14.521, CL=14.90

R-chart: UCL=2.282×0.52=1.187, LCL=0, CL=0.52

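Sample 1 (X̄=14.5) lies below LCL=14.521 and sample 9 (X̄=15.3) lies above UCL=15.279 → X̄-chart: OUT OF CONTROL

All R values in [0, 1.187] → R-chart: IN CONTROL

Conclusion: Process variability is in control, but the process mean is not. Samples 1 and 9 fall outside the X̄-chart limits, which points to assignable causes that should be investigated.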

Q2. The following table gives the number of defectives in 10 samples each of size 50. Draw the p-chart and state if the process is in control. Defectives: 3,5,2,6,4,3,7,4,5,3. 8M

n=50, k=10, Σd=3+5+2+6+4+3+7+4+5+3=42

p̄=42/(10×50)=42/500=0.084

q̄=1−0.084=0.916

√(p̄q̄/n)=√(0.084×0.916/50)=√(0.07694/50)=√0.001539=0.03923

UCL=0.084+3×0.03923=0.084+0.1177=0.2017

LCL=0.084−0.1177=−0.0337 → take 0

CL=0.084, UCL=0.2017, LCL=0

Sample proportions: 0.06, 0.10, 0.04, 0.12, 0.08, 0.06, 0.14, 0.08, 0.10, 0.06

All proportions lie within [0, 0.2017]. Process is IN CONTROL.

Q3. The number of defects observed in 12 units of cloth (each 50m length) are: 3,4,2,5,6,3,4,2,3,5,4,3. Construct c-chart and check for statistical control. 8M

k=12, Σc=3+4+2+5+6+3+4+2+3+5+4+3=44

c̄=44/12=3.667

√c̄=√3.667=1.915

UCL=3.667+3×1.915=3.667+5.745=9.412

LCL=3.667−5.745=−2.078 → take 0

CL=3.667, UCL=9.412, LCL=0

All defect counts (max=6) lie within [0, 9.412]. Process is IN CONTROL. No assignable causes detected.

Chance Causes	Assignable Causes
Always present; random; cannot be eliminated	Specific, identifiable; can be detected and removed
Small, inevitable variation	Large, avoidable variation
Process under control	Process out of control

p-chart	np-chart
Plots fraction (proportion) defective	Plots number of defectives
Used when subgroup size varies	Used when subgroup size is constant
Control limits: p̄ ± 3√(p̄q̄/n)	Control limits: np̄ ± 3√(np̄q̄)

	H₀ True	H₀ False
Reject H₀	Type I error (α)	Correct (Power)
Accept H₀	Correct	Type II error (β)

X	Y	x=X−x̄	y=Y−ȳ	xy	x²	y²
65	67	−2.875	−1.25	3.594	8.266	1.563
67	68	−0.875	−0.25	0.219	0.766	0.063
66	68	−1.875	−0.25	0.469	3.516	0.063
71	70	3.125	1.75	5.469	9.766	3.063
67	64	−0.875	−4.25	3.719	0.766	18.063
70	67	2.125	−1.25	−2.656	4.516	1.563
68	72	0.125	3.75	0.469	0.016	14.063
69	70	1.125	1.75	1.969	1.266	3.063
543	546	0	0	13.25	28.875	41.5

Karl Pearson's	Spearman's Rank
Parametric; requires normality	Non-parametric; no assumption
Works on actual values	Works on ranks


Machine	P(M)	P(D|M)	P(M)·P(D|M)
A	0.25	0.05	0.0125
B	0.35	0.04	0.0140
C	0.40	0.02	0.0080
P(D) = Total			0.0345

Source	SS	df	MS	F_calc
Rows	24.75	3	8.25	—
Columns	2.75	3	0.917	—
Treatments	4.25	3	1.417	1.417/0.667 = 2.124
Error	4.00	6	0.667	—
Total	35.75	15	—	—

Source	SS	df	MS	F
Between	SSC	k−1	MSC	MSC/MSE
Within (Error)	SSE	N−k	MSE	—
Total	SST	N−1	—	—

Source	SS	df	MS	F
Blocks	SSR	r−1	MSR	MSR/MSE
Treatments	SSC	k−1	MSC	MSC/MSE
Error	SSE	(r−1)(k−1)	MSE	—
Total	SST	rk−1	—	—

Variables Charts	Attributes Charts
Measurable data (length, weight)	Countable data (defects, defectives)
X̄ chart, R chart, s chart	p-chart, np-chart, c-chart, u-chart

X	Y	XY	X²	Y²
5	50	250	25	2500
8	80	640	64	6400
9	80	720	81	6400
10	70	700	100	4900
5	75	375	25	5625
37	355	2685	295	25825