Unit 1: Correlation & Regression
Karl Pearson's Coefficient • Spearman's Rank Correlation • Regression Lines • Regression Coefficients
- If both variables change in the same direction → Positive correlation (r > 0)
- If variables change in opposite directions → Negative correlation (r < 0)
- No relationship → Zero correlation (r = 0)
- Range: −1 ≤ r ≤ +1
- n = number of pairs of observations
- Range: −1 ≤ r ≤ +1; r = ±1 means perfect correlation
- r is dimensionless (no units)
- d = difference in ranks of corresponding values (d = R₁ − R₂)
- n = number of pairs
- Range: −1 ≤ ρ ≤ +1
- Used when data is ordinal or distribution is non-normal
- Range: −1 ≤ r ≤ +1
- r = +1: Perfect positive linear correlation
- r = −1: Perfect negative linear correlation
- r = 0: No linear correlation
- r is independent of change of origin and scale
- r is symmetric: rXY = rYX
- r is a pure number (dimensionless)
- |r| > 0.7 → High correlation; 0.5 < |r| ≤ 0.7 → Moderate; |r| ≤ 0.5 → Low
- Line 1 — Regression of Y on X: Used to estimate Y for a given X. Minimises sum of squared errors in the Y direction.
- Line 2 — Regression of X on Y: Used to estimate X for a given Y. Minimises sum of squared errors in the X direction.
- byx (regression coefficient of Y on X): byx = r · (σy/σx) or byx = (n·ΣXY − ΣX·ΣY) / (n·ΣX² − (ΣX)²)
- bxy (regression coefficient of X on Y): bxy = r · (σx/σy) or bxy = (n·ΣXY − ΣX·ΣY) / (n·ΣY² − (ΣY)²)
- Both regression coefficients must have the same sign as r
- r² = 0.81 means 81% of variation in Y is explained by X
- Range: 0 ≤ r² ≤ 1
- Closer to 1 → better fit of regression line
8-Mark Topics
Karl Pearson's correlation coefficient r is the ratio of the covariance of X and Y to the product of their standard deviations.
- −1 ≤ r ≤ +1 (always)
- r = +1: Perfect positive linear correlation
- r = −1: Perfect negative linear correlation
- r = 0: No linear relationship (variables may still be related non-linearly)
- Independent of origin & scale: If U = (X−a)/h and V = (Y−b)/k with h, k > 0, then r(X,Y) = r(U,V)
- Symmetric: rXY = rYX
- r is a pure number (no units)
Find the correlation coefficient between X and Y:
| X | Y | X² | Y² | XY |
|---|---|---|---|---|
| 1 | 5 | 1 | 25 | 5 |
| 2 | 4 | 4 | 16 | 8 |
| 3 | 3 | 9 | 9 | 9 |
| 4 | 2 | 16 | 4 | 8 |
| 5 | 1 | 25 | 1 | 5 |
| ΣX=15 | ΣY=15 | ΣX²=55 | ΣY²=55 | ΣXY=35 |
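The column totals in the table can be plugged into the raw-sums form of Pearson's formula. The sketch below is illustrative only; the function name `pearson_r` is hypothetical, and the formula is the standard "direct method" r = (nΣXY − ΣXΣY) / √[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)].

```python
import math

# Data from the table above
x = [1, 2, 3, 4, 5]
y = [5, 4, 3, 2, 1]

def pearson_r(x, y):
    """Pearson's r from raw sums (hypothetical helper name)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy                      # 5*35 - 15*15 = -50
    den = math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))  # sqrt(50*50) = 50
    return num / den

r = pearson_r(x, y)
print(round(r, 4))  # -1.0, i.e. perfect negative linear correlation
```

Here Y falls by exactly 1 whenever X rises by 1, so r = −1 is the expected result.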
When values are large, substitute U = X − A, V = Y − B (or U = (X−A)/h etc.) to simplify calculations. The value of r remains unchanged.
- Data is qualitative (e.g., rankings of beauty, intelligence)
- Data does not follow normal distribution
- Data has extreme outliers (rank correlation is more robust)
- Actual values are not available but ranks are given
- Tied ranks: assign each tied item the mean of their ranks
- m = number of items tied at a rank
10 students are ranked in Mathematics (X) and Physics (Y). Find Spearman's ρ:
| Student | X (Math) | Y (Phys) | d = X−Y | d² |
|---|---|---|---|---|
| A | 1 | 3 | −2 | 4 |
| B | 2 | 5 | −3 | 9 |
| C | 3 | 4 | −1 | 1 |
| D | 4 | 1 | 3 | 9 |
| E | 5 | 2 | 3 | 9 |
| F | 6 | 7 | −1 | 1 |
| G | 7 | 9 | −2 | 4 |
| H | 8 | 6 | 2 | 4 |
| I | 9 | 8 | 1 | 1 |
| J | 10 | 10 | 0 | 0 |
| Σd² | | | | 42 |
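With no tied ranks, the usual formula ρ = 1 − 6Σd² / (n(n² − 1)) applies to this table. A minimal sketch follows; `spearman_rho` is a hypothetical helper name and the code assumes the inputs are already ranks.

```python
# Ranks from the table above (students A..J)
math_rank = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
phys_rank = [3, 5, 4, 1, 2, 7, 9, 6, 8, 10]

def spearman_rho(rank_x, rank_y):
    """Spearman's rho for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

sum_d2 = sum((a - b) ** 2 for a, b in zip(math_rank, phys_rank))
rho = spearman_rho(math_rank, phys_rank)
print(sum_d2, round(rho, 4))  # 42 0.7455
```

ρ ≈ 0.75 indicates a fairly strong positive agreement between the two rankings.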
If 3 values are tied at positions 4, 5, 6 → each gets rank = (4+5+6)/3 = 5. Correction factor = m(m²−1)/12 = 3(9−1)/12 = 2. Add one such correction to Σd² in the numerator for every group of tied ranks.
- Both lines pass through (x̄, ȳ)
- r² = byx · bxy → r = ±√(byx · bxy)
- byx and bxy have the same sign as r
- If r = 0 → lines are perpendicular to each other
- If r = ±1 → lines coincide
- AM of regression coefficients is at least |r| in magnitude: |byx + bxy|/2 ≥ |r|
- θ = 0° when r = ±1 (lines coincide)
- θ = 90° when r = 0 (lines perpendicular)
Given data: n=5, ΣX=25, ΣY=30, ΣX²=145, ΣY²=220, ΣXY=158. Find the two regression lines.
To estimate X when Y=8: x = 0.2(8) + 3.8 = 5.4
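The coefficients behind this estimate can be checked from the given sums. The sketch below uses the raw-sum formulas for byx and bxy and the fact that both lines pass through (x̄, ȳ); variable names are illustrative.

```python
import math

# Given sums
n, sx, sy, sxx, syy, sxy = 5, 25, 30, 145, 220, 158

x_bar, y_bar = sx / n, sy / n                     # 5.0, 6.0
byx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # 40/100 = 0.4
bxy = (n * sxy - sx * sy) / (n * syy - sy ** 2)   # 40/200 = 0.2

# Y on X: y = byx*x + a_yx ;  X on Y: x = bxy*y + a_xy
a_yx = y_bar - byx * x_bar                        # 4.0
a_xy = x_bar - bxy * y_bar                        # 3.8

r = math.sqrt(byx * bxy)        # positive root, since both coefficients are positive
x_at_y8 = bxy * 8 + a_xy

print(f"Y on X: y = {byx:.1f}x + {a_yx:.1f}")
print(f"X on Y: x = {bxy:.1f}y + {a_xy:.1f}")
print(round(r, 4), round(x_at_y8, 2))
```

This gives y = 0.4x + 4.0 and x = 0.2y + 3.8, with r = √0.08 ≈ 0.283.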
Unit 2: Probability & Random Variables
Axioms • Conditional Probability • Total Probability • Bayes' Theorem • PMF • PDF • CDF • Moments • MGF
- Axiom 1 (Non-negativity): P(A) ≥ 0
- Axiom 2 (Certainty): P(S) = 1 (probability of the entire sample space is 1)
- Axiom 3 (Additivity): If A and B are mutually exclusive (A∩B = ∅), then P(A∪B) = P(A) + P(B)
- Restricts the sample space to event B
- Similarly, P(B|A) = P(A∩B) / P(A)
- Multiplication rule: P(A∩B) = P(A|B)·P(B) = P(B|A)·P(A)
- P(Bₖ) = Prior probability (before observing A)
- P(Bₖ|A) = Posterior probability (after observing A)
- Used to update probability based on new evidence
- Discrete RV: Takes countable values (e.g., number of heads in 3 tosses: 0, 1, 2, 3)
- Continuous RV: Takes any value in a continuous interval (e.g., height, temperature)
- p(x) = P(X = x) for each value x that X can take
- Property 1: p(x) ≥ 0 for all x
- Property 2: Σ p(x) = 1 (sum over all possible values)
- P(a ≤ X ≤ b) = Σ p(x) for x in [a, b]
- Property 1: f(x) ≥ 0 for all x
- Property 2: ∫₋∞^∞ f(x) dx = 1
- P(a ≤ X ≤ b) = ∫ₐᵇ f(x) dx
- Note: P(X = a) = 0 for any specific value a (continuous distribution)
- 0 ≤ F(x) ≤ 1
- F(−∞) = 0 and F(+∞) = 1
- F(x) is non-decreasing: if a < b then F(a) ≤ F(b)
- F(x) is right-continuous: F(x⁺) = F(x)
- For continuous RV: f(x) = dF(x)/dx
- P(a < X ≤ b) = F(b) − F(a); for a continuous RV this also equals P(a ≤ X ≤ b)
- r-th Raw Moment (about origin): μ'ᵣ = E(Xʳ)
- r-th Central Moment (about mean): μᵣ = E[(X − μ)ʳ] where μ = E(X)
- μ'₁ = E(X) = Mean
- μ₂ = E[(X−μ)²] = Variance = E(X²) − [E(X)]²
- μ₃ used for skewness; μ₄ for kurtosis
- Relation: μ₂ = μ'₂ − (μ'₁)²
- The r-th raw moment: E(Xʳ) = dʳM_X(t)/dtʳ evaluated at t = 0
- M_X(0) = 1 always
- Uniqueness: if two RVs have the same MGF, they have the same distribution
8-Mark Topics
Let B₁, B₂, ..., Bₙ be mutually exclusive and exhaustive events with P(Bᵢ) > 0. If A is any event with P(A) > 0, then:
Three machines I, II, III produce 50%, 30%, 20% of total output. The percentage of defectives are 3%, 4%, 5% respectively. An item is selected and found defective. Find the probability it came from machine II.
There is a 32.43% probability the defective item came from Machine II.
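The 32.43% figure comes from the total probability of a defective in the denominator and the Machine II term in the numerator. A short sketch, with illustrative dictionary names:

```python
# Priors P(B_k) and likelihoods P(defective | B_k) from the problem statement
prior = {"I": 0.50, "II": 0.30, "III": 0.20}
defect_rate = {"I": 0.03, "II": 0.04, "III": 0.05}

# Total probability: P(D) = sum of P(B_k) * P(D | B_k)
p_defective = sum(prior[m] * defect_rate[m] for m in prior)   # 0.037

# Bayes' theorem: P(B_k | D) = P(B_k) * P(D | B_k) / P(D)
posterior = {m: prior[m] * defect_rate[m] / p_defective for m in prior}

for machine, p in posterior.items():
    print(machine, round(p, 4))   # I 0.4054, II 0.3243, III 0.2703
```

The three posteriors sum to 1, which is a quick sanity check on the arithmetic.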
A RV X has PMF p(x) = kx for x = 1,2,3,4. Find k, P(X≤2), Mean, Variance.
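One way to check the answers to this problem is exact arithmetic with fractions. The sketch below is illustrative; it uses Σp(x) = 1 to find k and then the definitions of mean and variance.

```python
from fractions import Fraction

values = [1, 2, 3, 4]

# sum of k*x over all x must equal 1, so k = 1 / (1+2+3+4)
k = Fraction(1, sum(values))                        # 1/10
pmf = {x: k * x for x in values}

p_le_2 = pmf[1] + pmf[2]                            # 3/10
mean = sum(x * p for x, p in pmf.items())           # 3
e_x2 = sum(x * x * p for x, p in pmf.items())       # 10
variance = e_x2 - mean ** 2                         # 1

print(k, p_le_2, mean, variance)  # 1/10 3/10 3 1
```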
A RV X has PDF f(x) = kx² for 0 ≤ x ≤ 1, 0 otherwise. Find k, P(0.2 ≤ X ≤ 0.5), Mean, Variance.
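Because the density is a single power of x, every integral here follows from the power rule. The sketch below does the arithmetic exactly; `integral_of_power` is a hypothetical helper, not a library function.

```python
from fractions import Fraction

def integral_of_power(p, a, b):
    """Exact integral of x**p over [a, b] using the power rule."""
    a, b = Fraction(a), Fraction(b)
    return (b ** (p + 1) - a ** (p + 1)) / (p + 1)

# Normalisation: k * integral of x^2 over [0, 1] = 1
k = 1 / integral_of_power(2, 0, 1)                                # 3

prob = k * integral_of_power(2, Fraction(1, 5), Fraction(1, 2))   # P(0.2 <= X <= 0.5)
mean = k * integral_of_power(3, 0, 1)                             # E(X)   = 3/4
e_x2 = k * integral_of_power(4, 0, 1)                             # E(X^2) = 3/5
variance = e_x2 - mean ** 2                                       # 3/80

print(k, prob, mean, variance)  # 3 117/1000 3/4 3/80
```

So k = 3, P(0.2 ≤ X ≤ 0.5) = 0.117, mean = 0.75 and variance = 0.0375.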
For f(x) = 3x² on [0,1]: F(x) = ∫₀ˣ 3t² dt = x³ for 0 ≤ x ≤ 1
Check: F(0)=0, F(1)=1 ✓
P(X > 0.7) = 1 − F(0.7) = 1 − 0.343 = 0.657
| Moment | Discrete | Continuous | Meaning |
|---|---|---|---|
| μ'₁ (1st raw) | Σx·p(x) | ∫x·f(x)dx | Mean (μ) |
| μ'₂ (2nd raw) | Σx²·p(x) | ∫x²·f(x)dx | E(X²) |
| μ₂ (2nd central) | Σ(x−μ)²·p(x) | ∫(x−μ)²·f(x)dx | Variance (σ²) = μ'₂ − (μ'₁)² |
| μ₃ (3rd central) | Σ(x−μ)³·p(x) | ∫(x−μ)³·f(x)dx | Measures skewness |
| μ₄ (4th central) | Σ(x−μ)⁴·p(x) | ∫(x−μ)⁴·f(x)dx | Measures kurtosis |
Expanding e^(tX) as a series: e^(tX) = 1 + tX + t²X²/2! + t³X³/3! + ...
Taking expectations term by term: M_X(t) = E[e^(tX)] = 1 + tμ'₁ + t²μ'₂/2! + t³μ'₃/3! + ...
Therefore: μ'ᵣ = [dʳM_X(t)/dtʳ] evaluated at t = 0
X has PMF: p(0)=1/4, p(1)=1/2, p(2)=1/4. Find MGF and first two moments.
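For this PMF the MGF is M_X(t) = 1/4 + (1/2)e^t + (1/4)e^(2t) = ((1 + e^t)/2)². The sketch below computes the first two raw moments directly and then checks them against numerical derivatives of the MGF at t = 0. The finite-difference step `h` is an arbitrary choice for illustration.

```python
import math

pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def mgf(t):
    """M_X(t) = E[e^(tX)] for the discrete PMF above."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

# Moments straight from the definition
m1 = sum(x * p for x, p in pmf.items())        # E(X)   = 1.0
m2 = sum(x * x * p for x, p in pmf.items())    # E(X^2) = 1.5
variance = m2 - m1 ** 2                        # 0.5

# Numerical derivatives of the MGF at t = 0 (central differences)
h = 1e-4
d1 = (mgf(h) - mgf(-h)) / (2 * h)              # approximates M'(0)
d2 = (mgf(h) - 2 * mgf(0) + mgf(-h)) / h ** 2  # approximates M''(0)

print(m1, m2, variance)
print(round(d1, 4), round(d2, 4))
```

The derivative estimates agree with E(X) = 1 and E(X²) = 1.5 to within the finite-difference error.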
- M_X(0) = 1 (always)
- If Y = aX + b, then M_Y(t) = e^(bt)·M_X(at)
- If X and Y are independent: M_(X+Y)(t) = M_X(t)·M_Y(t)
- MGF uniquely determines the distribution
Unit 3: Normal Distribution
Normal Distribution • PDF • Properties • Area Rule • Moments • Moment Generating Function
- Parameters: μ (mean), σ² (variance), σ (standard deviation)
- Bell-shaped, symmetric about x = μ
- Also called the Gaussian Distribution
- Mean = Median = Mode = μ (perfectly symmetric)
- Curve is bell-shaped and symmetric about x = μ
- Total area under curve = 1
- The curve is asymptotic to the x-axis (never touches)
- Skewness β₁ = 0; Kurtosis β₂ = 3 (mesokurtic)
- All odd central moments = 0
- Linear combination of independent normal RVs is also normal
| Interval | Probability | Percentage |
|---|---|---|
| μ − σ to μ + σ | P(μ−σ < X < μ+σ) | 68.27% |
| μ − 2σ to μ + 2σ | P(μ−2σ < X < μ+2σ) | 95.45% |
| μ − 3σ to μ + 3σ | P(μ−3σ < X < μ+3σ) | 99.73% |
- μ₁ = 0 (all odd central moments = 0 due to symmetry)
- μ₂ = σ² (variance)
- μ₃ = 0 (zero skewness)
- μ₄ = 3σ⁴
- In general: μ₂ₙ₊₁ = 0 and μ₂ₙ = 1·3·5···(2n−1)·σ²ⁿ
- μ'₁ = M'_X(0) = μ (Mean)
- μ'₂ = M''_X(0) = μ² + σ² (Second raw moment)
- Var(X) = μ'₂ − (μ'₁)² = (μ²+σ²) − μ² = σ² ✓
- P(Z < 0) = 0.5 (by symmetry)
- P(Z < −z) = 1 − P(Z < z) = 1 − Φ(z)
This is called the reproductive (additive) property of normal distribution.
8-Mark Topics
- μ = mean (location parameter) — shifts curve left/right
- σ = standard deviation (scale parameter) — controls spread
- Larger σ → flatter, wider curve; Smaller σ → taller, narrower
- Symmetry: f(μ+x) = f(μ−x) — symmetric about x = μ
- Mean = Median = Mode = μ
- Maximum: Curve is maximum at x = μ, maximum value = 1/(σ√(2π))
- Asymptotic: Curve approaches x-axis but never touches it
- Inflection Points: At x = μ ± σ
- Area = 1: ∫₋∞^∞ f(x) dx = 1
- Moments: All odd central moments = 0; μ₂ = σ², μ₄ = 3σ⁴
- Kurtosis β₂ = 3 (mesokurtic — neither flat nor peaked)
The z-table gives Φ(z) = P(Z ≤ z) for Z ~ N(0,1). Key symmetry rules:
- P(Z ≤ 0) = 0.5
- P(Z ≤ −z) = 1 − P(Z ≤ z) = 1 − Φ(z)
- P(a ≤ Z ≤ b) = Φ(b) − Φ(a)
- P(Z ≥ z) = 1 − Φ(z)
X ~ N(50, 100) [i.e. μ=50, σ=10]. Find (i) P(X < 65), (ii) P(40 < X < 60), (iii) P(X > 72).
X ~ N(30, 25) [μ=30, σ=5]. Find x₀ such that P(X > x₀) = 0.05.
In an exam, scores are normally distributed with mean 70 and SD 15. If 500 students appeared, how many scored between 55 and 85?
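The three problems above all reduce to standardising with z = (x − μ)/σ and reading Φ(z). As a cross-check on table look-ups, the sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

# Problem 1: X ~ N(50, 10^2)
X = NormalDist(mu=50, sigma=10)
p_lt_65 = X.cdf(65)                      # Phi(1.5), about 0.9332
p_40_60 = X.cdf(60) - X.cdf(40)          # Phi(1) - Phi(-1), about 0.6827
p_gt_72 = 1 - X.cdf(72)                  # 1 - Phi(2.2), about 0.0139

# Problem 2: X ~ N(30, 5^2), find x0 with P(X > x0) = 0.05
x0 = NormalDist(mu=30, sigma=5).inv_cdf(0.95)    # 30 + 1.645*5, about 38.22

# Problem 3: scores ~ N(70, 15^2), 500 students, between 55 and 85
scores = NormalDist(mu=70, sigma=15)
expected_count = 500 * (scores.cdf(85) - scores.cdf(55))   # about 341

print(round(p_lt_65, 4), round(p_40_60, 4), round(p_gt_72, 4))
print(round(x0, 2), round(expected_count))
```

Problem 3 is the 68.27% rule in disguise, since 55 and 85 are exactly μ ± σ.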
Common z-table values, Φ(z) = P(Z ≤ z): Φ(1.28) ≈ 0.90 | Φ(1.645) ≈ 0.95 | Φ(1.96) ≈ 0.975 | Φ(2.33) ≈ 0.99 | Φ(2.576) ≈ 0.995
For standard normal Z ~ N(0,1), M_Z(t) = e^(t²/2). Expanding as a power series:
| Moment | For N(0,1) | For N(μ,σ²) |
|---|---|---|
| μ'₁ (mean) | 0 | μ |
| μ₂ (variance) | 1 | σ² |
| μ₃ | 0 | 0 |
| μ₄ | 3 | 3σ⁴ |
| β₁ = μ₃²/μ₂³ | 0 | 0 (symmetric) |
| β₂ = μ₄/μ₂² | 3 | 3 (mesokurtic) |
Unit 4: Testing of Hypothesis
t-test (Single Mean, Difference, Paired) • F-test (Variance Ratio) • Chi-Square Test
- Null Hypothesis (H₀): A statement of no difference or no effect. It is the hypothesis being tested. Assumed true until evidence suggests otherwise. Example: H₀: μ = 50
- Alternative Hypothesis (H₁ or Hₐ): The claim accepted if H₀ is rejected. Example: H₁: μ ≠ 50 (two-tailed) or H₁: μ > 50 (one-tailed)
- Level of Significance (α): The probability of rejecting H₀ when it is actually true (Type I error probability). Common values: α = 0.05 (5%) or α = 0.01 (1%)
- Critical Region (Rejection Region): The set of values of the test statistic for which H₀ is rejected.
- Critical Value: The boundary value separating acceptance and rejection regions.
| Error | When? | Probability | Name |
|---|---|---|---|
| Type I (α) | Reject H₀ when H₀ is TRUE | α (LOS) | False Positive |
| Type II (β) | Accept H₀ when H₀ is FALSE | β | False Negative |
- Power of the test = 1 − β = P(reject H₀ | H₁ is true)
- d = difference for each pair (d = x₁ − x₂)
- d̄ = mean of differences
- sd = standard deviation of differences
- Used for Variance Ratio Test — testing if two populations have equal variances
- df = (n₁−1, n₂−1)
- F ≥ 1 by convention: the larger sample variance is placed in the numerator
- Also used in ANOVA
- Test for Independence: Tests if two attributes are independent (contingency table)
- Goodness of Fit: Tests if observed data fits a theoretical distribution
- df for independence = (r−1)(c−1)
- df for goodness of fit = n−1, where n = number of classes (or n−k−1 if k parameters are estimated from the data)
| Feature | One-tailed | Two-tailed |
|---|---|---|
| H₁ | μ > μ₀ or μ < μ₀ | μ ≠ μ₀ |
| Critical region | One side only | Both sides |
| Critical value (α=0.05, large sample) | z = 1.645 (right tail) or z = −1.645 (left tail) | z = ±1.96 |
| Use | Direction known | Direction unknown |
- Observations must be independent
- Total frequency N must be reasonably large (N ≥ 50)
- No expected frequency should be less than 5. If E < 5, merge adjacent classes (pooling)
- Data must be in frequencies (not percentages or ratios)
- Sample must be drawn by random sampling
8-Mark Topics
Tests whether sample mean x̄ differs significantly from hypothesised population mean μ₀.
A sample of 10 observations has mean 52 and SD 8. Test H₀: μ = 50 at 5% level of significance.
Conclusion: There is no significant difference between sample mean and population mean at 5% level.
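The statistic behind this conclusion depends on how the quoted SD of 8 was computed, which the problem does not say. The sketch below shows both common conventions as an assumption; either way |t| is well below the two-tailed table value t(9, 0.05) = 2.262.

```python
import math

n, x_bar, s, mu0 = 10, 52, 8, 50
T_CRIT = 2.262   # two-tailed 5% critical value for df = 9, from standard t-tables

# Convention A (assumed): s was computed with divisor n, so t uses sqrt(n - 1)
t_divisor_n = (x_bar - mu0) / (s / math.sqrt(n - 1))          # 0.75

# Convention B (assumed): s was computed with divisor n - 1, so t uses sqrt(n)
t_divisor_n_minus_1 = (x_bar - mu0) / (s / math.sqrt(n))      # about 0.79

for t in (t_divisor_n, t_divisor_n_minus_1):
    decision = "reject H0" if abs(t) > T_CRIT else "do not reject H0"
    print(round(t, 3), decision)
```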
Numerical: n₁=8, x̄₁=14.5, s₁²=4 | n₂=10, x̄₂=12.8, s₂²=3.5. Test H₀: μ₁=μ₂ at 5%.
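A possible working for this numerical, assuming the pooled-variance two-sample t-test with df = n₁ + n₂ − 2 = 16. Whether the given s² values use divisor n or n−1 is not stated, so the hypothetical helper `pooled_t` takes a flag for both readings.

```python
import math

def pooled_t(n1, mean1, var1, n2, mean2, var2, unbiased=True):
    """Two-sample t statistic with pooled variance (hypothetical helper).

    unbiased=True : var1, var2 were computed with divisor n - 1
    unbiased=False: var1, var2 were computed with divisor n
    """
    if unbiased:
        pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    else:
        pooled = (n1 * var1 + n2 * var2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled * (1 / n1 + 1 / n2))

T_CRIT = 2.120   # two-tailed 5% critical value for df = 16, from standard t-tables

t_unbiased = pooled_t(8, 14.5, 4, 10, 12.8, 3.5, unbiased=True)    # about 1.86
t_biased = pooled_t(8, 14.5, 4, 10, 12.8, 3.5, unbiased=False)     # about 1.75

print(round(t_unbiased, 3), round(t_biased, 3))
print("reject H0" if abs(t_unbiased) > T_CRIT else "do not reject H0")
```

Under both readings |t| < 2.120, so H₀: μ₁ = μ₂ is not rejected at the 5% level.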
- Compute d = x₁ − x₂ for each pair
- Compute d̄ = Σd/n
- Compute sd = √[Σ(d−d̄)²/(n−1)] = √[(Σd² − n·d̄²)/(n−1)] (divisor n−1, not n)
- t = d̄/(sd/√n) with df = n−1
A drug is given to 8 patients. Blood pressure (BP) is recorded before and after. Test if the drug reduces BP at 5% LOS.
| Patient | Before (x₁) | After (x₂) | d=x₁−x₂ | d² |
|---|---|---|---|---|
| 1 | 145 | 138 | 7 | 49 |
| 2 | 152 | 143 | 9 | 81 |
| 3 | 138 | 136 | 2 | 4 |
| 4 | 160 | 150 | 10 | 100 |
| 5 | 148 | 140 | 8 | 64 |
| 6 | 130 | 128 | 2 | 4 |
| 7 | 155 | 146 | 9 | 81 |
| 8 | 142 | 137 | 5 | 25 |
| Sum | | | 52 | 408 |
Conclusion: The drug significantly reduces blood pressure at 5% level.
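The paired-test steps can be traced on this data with the standard library. In the sketch below, `statistics.stdev` uses the n−1 divisor, and the one-tailed table value t(7, 0.05) = 1.895 is hard-coded from standard tables.

```python
import math
import statistics

before = [145, 152, 138, 160, 148, 130, 155, 142]
after = [138, 143, 136, 150, 140, 128, 146, 137]

d = [b - a for b, a in zip(before, after)]   # 7, 9, 2, 10, 8, 2, 9, 5
n = len(d)
d_bar = statistics.mean(d)                   # 52/8 = 6.5
s_d = statistics.stdev(d)                    # sqrt(70/7) = sqrt(10), divisor n - 1
t = d_bar / (s_d / math.sqrt(n))             # about 5.81, df = n - 1 = 7

T_CRIT = 1.895   # one-tailed 5% critical value for df = 7
print(sum(d), sum(x * x for x in d), round(t, 3))
print("reject H0" if t > T_CRIT else "do not reject H0")
```

Since t ≈ 5.81 far exceeds 1.895, H₀ (no reduction) is rejected.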
Tests H₀: σ₁² = σ₂² (two population variances are equal).
- df₁ = n₁ − 1 (numerator), df₂ = n₂ − 1 (denominator)
- Reject H₀ if Fcalc > Ftable(df₁, df₂) at α level
Sample 1: n₁=10, s₁²=28.5 | Sample 2: n₂=14, s₂²=12.6. Test equality of variances at 5%.
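A short working for this example, assuming the given s² values are already the unbiased variance estimates (if they were computed with divisor n, each would first be scaled by n/(n−1)). The table value F(9, 13) at 5% is hard-coded.

```python
n1, var1 = 10, 28.5
n2, var2 = 14, 12.6

# Larger variance goes in the numerator, so F >= 1
if var1 >= var2:
    f_calc, df = var1 / var2, (n1 - 1, n2 - 1)
else:
    f_calc, df = var2 / var1, (n2 - 1, n1 - 1)

F_CRIT = 2.71   # F(9, 13) at the 5% level, from standard F-tables

print(round(f_calc, 3), df)   # 2.262 (9, 13)
print("reject H0" if f_calc > F_CRIT else "do not reject H0")
```

F ≈ 2.26 is below 2.71, so equality of variances is not rejected at 5% under this assumption.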
Expected frequency: Eij = (Row i total × Column j total) / Grand Total N
Numerical: 200 patients classified by gender and recovery:
| Gender / Recovery | Recovered | Not Recovered | Total |
|---|---|---|---|
| Male | 70 | 30 | 100 |
| Female | 60 | 40 | 100 |
| Total | 130 | 70 | 200 |
Here: N=200, a=70, b=30, c=60, d=40
χ² = 200×(70×40−30×60)² / (100×100×130×70) = 200×(2800−1800)² / 91000000 = 200×1000000/91000000 ≈ 2.198 ✓
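The same value can be reached the long way, through expected frequencies Eij = (row total × column total)/N and χ² = Σ(O − E)²/E. The sketch below is illustrative; the 5% table value for df = 1 is 3.841.

```python
observed = [
    [70, 30],   # Male:   recovered, not recovered
    [60, 40],   # Female: recovered, not recovered
]

row_totals = [sum(row) for row in observed]          # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]    # [130, 70]
N = sum(row_totals)                                  # 200

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / N        # expected frequency E_ij
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)    # (r-1)(c-1) = 1
CHI2_CRIT = 3.841

print(round(chi2, 3), df)   # 2.198 1
print("reject H0" if chi2 > CHI2_CRIT else "do not reject H0")
```

Because 2.198 < 3.841, independence of gender and recovery is not rejected at 5%.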
Tests whether observed data follows a specified theoretical distribution.
Example: A die is thrown 120 times. Test if the die is fair (uniform distribution expected).
| Face | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Observed (O) | 25 | 17 | 15 | 23 | 24 | 16 |
| Expected (E) | 20 | 20 | 20 | 20 | 20 | 20 |
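For the die data, χ² = Σ(O − E)²/E with df = 6 − 1 = 5. A minimal sketch, with the 5% table value 11.070 hard-coded:

```python
observed = [25, 17, 15, 23, 24, 16]
total = sum(observed)                           # 120 throws
expected = [total / 6] * 6                      # 20 per face for a fair die

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                          # no parameters estimated
CHI2_CRIT = 11.070                              # chi-square table, df = 5, 5% level

print(round(chi2, 2), df)   # 5.0 5
print("reject H0" if chi2 > CHI2_CRIT else "do not reject H0")
```

χ² = (25 + 9 + 25 + 9 + 16 + 16)/20 = 5.0 < 11.070, so the data are consistent with a fair die.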
Unit 5: Design of Experiments
ANOVA (One-Way & Two-Way) • CRD • RBD • LSD — Full ANOVA Tables with Numericals
- Total Variation = Variation due to treatments + Variation due to error (random)
- Uses the F-statistic (ratio of treatment variance to error variance)
- H₀: μ₁ = μ₂ = ... = μₖ (all treatment means are equal)
- Normality: Each population follows normal distribution
- Homogeneity of variance: All populations have the same variance (σ²)
- Independence: Observations are independent of each other
- Additivity: Effects are additive (no interaction, in two-way)
- Used when experimental units are homogeneous
- One-way ANOVA is applied
- df: Treatment = k−1, Error = N−k, Total = N−1 (k = number of treatments, N = total observations)
- Controls for one extraneous variable (block effect)
- Two-way ANOVA (without interaction) applied
- df: Treatment = k−1, Block = b−1, Error = (k−1)(b−1), Total = kb−1
- p×p square: p treatments, p rows, p columns
- df: Treatment = p−1, Row = p−1, Column = p−1, Error = (p−1)(p−2), Total = p²−1
- More efficient than RBD when two blocking factors exist
- TSS = ΣΣ x²ᵢⱼ − CF (Total Sum of Squares)
- SST = Σ (Tᵢ²/nᵢ) − CF (Sum of Squares due to Treatments)
- SSE = TSS − SST (Error Sum of Squares)
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Treatment | SST | k−1 | MST=SST/(k−1) | MST/MSE |
| Error | SSE | N−k | MSE=SSE/(N−k) | — |
| Total | TSS | N−1 | — | — |
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Treatment | SST | k−1 | MST | MST/MSE |
| Block | SSB | b−1 | MSB | MSB/MSE |
| Error | SSE | (k−1)(b−1) | MSE | — |
| Total | TSS | kb−1 | — | — |
| Feature | CRD | RBD | LSD |
|---|---|---|---|
| Blocking | None | One direction | Two directions |
| Extraneous variables | 0 | 1 | 2 |
| Error df | N−k | (k−1)(b−1) | (p−1)(p−2) |
| Efficiency | Lowest | Medium | Highest |
| ANOVA type | One-way | Two-way | Three-way (rows, columns, treatments) |
8-Mark Topics
Three fertilisers are applied to crops in 4 plots each. Yields (kg) are: A: 14,16,18,12 | B: 10,12,14,10 | C: 20,18,22,16. Test at 5% if yields differ.
| | A | B | C |
|---|---|---|---|
| Obs 1 | 14 | 10 | 20 |
| Obs 2 | 16 | 12 | 18 |
| Obs 3 | 18 | 14 | 22 |
| Obs 4 | 12 | 10 | 16 |
| Total (Tᵢ) | 60 | 46 | 76 |
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Treatment | 112.67 | 2 | 56.33 | 9.94* |
| Error | 51.00 | 9 | 5.67 | — |
| Total | 163.67 | 11 | — | — |
* Significant at 5% level. Fertiliser type significantly affects yield.
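The ANOVA table entries follow from CF = T²/N, TSS = Σx² − CF, SST = Σ(Tᵢ²/nᵢ) − CF and SSE = TSS − SST. The sketch below reproduces them; `one_way_anova` is a hypothetical helper and the table value F(2, 9) = 4.26 is hard-coded.

```python
groups = {
    "A": [14, 16, 18, 12],
    "B": [10, 12, 14, 10],
    "C": [20, 18, 22, 16],
}

def one_way_anova(groups):
    """Sums of squares and F for a CRD (hypothetical helper name)."""
    all_obs = [x for obs in groups.values() for x in obs]
    N, k = len(all_obs), len(groups)
    cf = sum(all_obs) ** 2 / N                                    # correction factor
    tss = sum(x * x for x in all_obs) - cf
    sst = sum(sum(obs) ** 2 / len(obs) for obs in groups.values()) - cf
    sse = tss - sst
    mst, mse = sst / (k - 1), sse / (N - k)
    return {"TSS": tss, "SST": sst, "SSE": sse, "MST": mst, "MSE": mse, "F": mst / mse}

result = one_way_anova(groups)
F_CRIT = 4.26   # F(2, 9) at the 5% level, from standard F-tables

for name, value in result.items():
    print(name, round(value, 2))
print("significant" if result["F"] > F_CRIT else "not significant")
```

The computed F of about 9.94 exceeds 4.26, which matches the conclusion that the fertilisers differ.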
Four varieties of rice (T1,T2,T3,T4) tested in 3 blocks. Yield data (quintals):
| Block\Treatment | T1 | T2 | T3 | T4 | Block Total (Bⱼ) |
|---|---|---|---|---|---|
| B1 | 25 | 23 | 20 | 27 | 95 |
| B2 | 22 | 19 | 18 | 24 | 83 |
| B3 | 28 | 26 | 23 | 29 | 106 |
| Treat Total (Tᵢ) | 75 | 68 | 61 | 80 | T=284 |
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Treatment | 68.67 | 3 | 22.89 | 74.91* |
| Block | 66.17 | 2 | 33.08 | 108.27* |
| Error | 1.83 | 6 | 0.305 | — |
| Total | 136.67 | 11 | — | — |
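The RBD sums of squares can be recomputed from the yield table. This sketch keeps full precision until the end, so the F ratios can differ slightly from values obtained by dividing rounded mean squares. Table values F(3, 6) = 4.76 and F(2, 6) = 5.14 are hard-coded.

```python
# Rows = blocks B1..B3, columns = treatments T1..T4
data = [
    [25, 23, 20, 27],
    [22, 19, 18, 24],
    [28, 26, 23, 29],
]
b, k = len(data), len(data[0])            # 3 blocks, 4 treatments
N = b * k

grand_total = sum(sum(row) for row in data)                    # 284
cf = grand_total ** 2 / N                                      # correction factor
tss = sum(x * x for row in data for x in row) - cf
ssb = sum(sum(row) ** 2 for row in data) / k - cf              # blocks
sst = sum(sum(col) ** 2 for col in zip(*data)) / b - cf        # treatments
sse = tss - sst - ssb

mst, msb = sst / (k - 1), ssb / (b - 1)
mse = sse / ((k - 1) * (b - 1))
f_treat, f_block = mst / mse, msb / mse

F_CRIT_TREAT, F_CRIT_BLOCK = 4.76, 5.14   # F(3, 6) and F(2, 6) at 5%
print(round(sst, 2), round(ssb, 2), round(sse, 2), round(tss, 2))
print(round(f_treat, 2), round(f_block, 2))
```

Both F ratios are far above their critical values, so varieties and blocks both differ significantly.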
In a p×p LSD, each treatment appears exactly once in each row and each column.
A 3×3 Latin Square experiment on crop yield (A, B, C = three fertilisers):
| \ | Col 1 | Col 2 | Col 3 | Row Total |
|---|---|---|---|---|
| Row 1 | A=17 | B=14 | C=12 | R₁=43 |
| Row 2 | B=13 | C=11 | A=16 | R₂=40 |
| Row 3 | C=10 | A=18 | B=15 | R₃=43 |
| Col Total | C₁=40 | C₂=43 | C₃=43 | T=126 |
Treatment totals: A=17+16+18=51, B=14+13+15=42, C=12+11+10=33
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Row | 2 | 2 | 1 | 1 |
| Column | 2 | 2 | 1 | 1 |
| Treatment | 54 | 2 | 27 | 27* |
| Error | 2 | 2 | 1 | — |
| Total | 60 | 8 | — | — |
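The Latin Square table can be verified the same way, with one extra sum of squares for treatments read off the letter layout. The sketch below is illustrative; F(2, 2) at 5% is 19.00 from standard tables.

```python
layout = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]
yields = [
    [17, 14, 12],
    [13, 11, 16],
    [10, 18, 15],
]
p = len(yields)

grand_total = sum(sum(row) for row in yields)                   # 126
cf = grand_total ** 2 / (p * p)                                 # 1764
tss = sum(x * x for row in yields for x in row) - cf            # 60

ss_row = sum(sum(row) ** 2 for row in yields) / p - cf          # 2
ss_col = sum(sum(col) ** 2 for col in zip(*yields)) / p - cf    # 2

treat_totals = {}
for i in range(p):
    for j in range(p):
        treat_totals[layout[i][j]] = treat_totals.get(layout[i][j], 0) + yields[i][j]
ss_treat = sum(t * t for t in treat_totals.values()) / p - cf   # 54

ss_err = tss - ss_row - ss_col - ss_treat                       # 2
df_err = (p - 1) * (p - 2)                                      # 2
f_treat = (ss_treat / (p - 1)) / (ss_err / df_err)              # 27

F_CRIT = 19.00   # F(2, 2) at the 5% level
print(treat_totals, round(f_treat, 2), f_treat > F_CRIT)
```

With only 2 error degrees of freedom the critical value is large, yet F = 27 still exceeds it.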
Unit 6: Statistical Quality Control
Process Control • X̄ & R Charts • p-chart • np-chart • c-chart — Control Limit Formulas + Numericals
- Distinguishes between chance (common) causes and assignable (special) causes of variation
- Main tool: Control Charts (Shewhart charts)
- Goal: Keep process in a state of statistical control
- UCL — Upper Control Limit (3σ above centre)
- CL — Centre Line (process mean)
- LCL — Lower Control Limit (3σ below centre)
| Variable Charts | Attribute Charts |
|---|---|
| For measurable characteristics (length, weight) | For counted characteristics (defects, defectives) |
| X̄-chart, R-chart | p-chart, np-chart, c-chart |
| Used when continuous measurement possible | Used for go/no-go, pass/fail data |
| Chart | UCL | CL | LCL |
|---|---|---|---|
| X̄-chart | X̄̄ + A₂·R̄ | X̄̄ (Grand Mean) | X̄̄ − A₂·R̄ |
| R-chart | D₄·R̄ | R̄ (Mean Range) | D₃·R̄ |
- X̄̄ = mean of sample means, R̄ = mean of sample ranges
- A₂, D₃, D₄ are control chart constants depending on subgroup size n
- Common constants for n=5: A₂=0.577, D₃=0, D₄=2.115
| UCL | CL | LCL |
|---|---|---|
| p̄ + 3√(p̄(1−p̄)/n) | p̄ | p̄ − 3√(p̄(1−p̄)/n) (min 0) |
| UCL | CL | LCL |
|---|---|---|
| np̄ + 3√(np̄(1−p̄)) | np̄ | np̄ − 3√(np̄(1−p̄)) (min 0) |
| UCL | CL | LCL |
|---|---|---|
| c̄ + 3√c̄ | c̄ | c̄ − 3√c̄ (min 0) |
- Difference from p/np: c-chart counts defects (not defectives); one item can have multiple defects
| p-chart | c-chart |
|---|---|
| Fraction/proportion of defective items | Number of defects per unit |
| Based on Binomial distribution | Based on Poisson distribution |
| Sample size can vary | Unit of inspection is constant |
| Example: % rejected bolts per batch | Example: scratches per car door |
- Chance (Common) Causes: Natural, unavoidable variation inherent in any process. Process is still "in control". Cannot be eliminated without redesigning the process. Example: minor machine vibration, raw material variation.
- Assignable (Special) Causes: Specific, identifiable causes that push the process out of control. Can and should be identified and eliminated. Example: worn tool, untrained operator, faulty material batch.
| n | A₂ | D₃ | D₄ |
|---|---|---|---|
| 2 | 1.880 | 0 | 3.267 |
| 3 | 1.023 | 0 | 2.574 |
| 4 | 0.729 | 0 | 2.282 |
| 5 | 0.577 | 0 | 2.115 |
| 6 | 0.483 | 0 | 2.004 |
8-Mark Topics
10 samples of size 5 are taken. Sample means (X̄) and ranges (R) are:
| Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| X̄ | 42 | 45 | 41 | 43 | 46 | 44 | 40 | 43 | 45 | 41 |
| R | 5 | 6 | 4 | 7 | 6 | 5 | 4 | 6 | 5 | 4 |
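A sketch of the control-limit calculation for this data, using the n = 5 constants A₂ = 0.577, D₃ = 0, D₄ = 2.115. Variable names are illustrative.

```python
xbar = [42, 45, 41, 43, 46, 44, 40, 43, 45, 41]   # sample means
ranges = [5, 6, 4, 7, 6, 5, 4, 6, 5, 4]           # sample ranges

A2, D3, D4 = 0.577, 0.0, 2.115                    # control chart constants for n = 5

grand_mean = sum(xbar) / len(xbar)                # 430/10 = 43.0
r_bar = sum(ranges) / len(ranges)                 # 52/10 = 5.2

x_ucl = grand_mean + A2 * r_bar                   # about 46.00
x_lcl = grand_mean - A2 * r_bar                   # about 40.00
r_ucl = D4 * r_bar                                # about 11.00
r_lcl = D3 * r_bar                                # 0.0

out_x = [i + 1 for i, m in enumerate(xbar) if not (x_lcl <= m <= x_ucl)]
out_r = [i + 1 for i, r in enumerate(ranges) if not (r_lcl <= r <= r_ucl)]

print(round(x_lcl, 2), grand_mean, round(x_ucl, 2))
print(r_lcl, r_bar, round(r_ucl, 2))
print(out_x, out_r)   # samples outside the limits, if any
```

No sample falls outside either chart, although samples 5 (X̄ = 46) and 7 (X̄ = 40) sit almost exactly on the X̄-chart limits and would merit a closer look in practice.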
- Quality characteristic is attribute (defective / non-defective)
- Sample size n can be variable or constant
- Based on Binomial distribution
10 batches of 100 items each inspected. Number of defectives:
| Batch | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Defectives (d) | 7 | 5 | 4 | 8 | 6 | 3 | 9 | 5 | 4 | 9 |
Batch proportions: 0.07, 0.05, 0.04, 0.08, 0.06, 0.03, 0.09, 0.05, 0.04, 0.09. All lie within [LCL, UCL] = [0, 0.1312], so the process is IN CONTROL.
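The limits used in this check come from p̄ ± 3√(p̄(1−p̄)/n), with a negative lower limit replaced by 0. A minimal sketch with illustrative names:

```python
import math

defectives = [7, 5, 4, 8, 6, 3, 9, 5, 4, 9]
n = 100                                           # items per batch

p_bar = sum(defectives) / (n * len(defectives))   # 60/1000 = 0.06
sigma_p = math.sqrt(p_bar * (1 - p_bar) / n)      # about 0.02375

ucl = p_bar + 3 * sigma_p                         # about 0.1312
lcl = max(0.0, p_bar - 3 * sigma_p)               # negative, so set to 0

proportions = [d / n for d in defectives]
out_of_control = [i + 1 for i, p in enumerate(proportions) if not (lcl <= p <= ucl)]

print(round(lcl, 4), round(p_bar, 4), round(ucl, 4))
print(out_of_control)   # batches outside the limits, if any
```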
Numerical: Number of defects (scratches) found in 10 car panels:
| Panel | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Defects (c) | 4 | 3 | 6 | 2 | 5 | 4 | 7 | 3 | 4 | 2 |
All 10 panels have defects between 0 and 10. Process is IN CONTROL.
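The limits 0 and 10 follow from c̄ ± 3√c̄. A short sketch of that calculation:

```python
import math

defects = [4, 3, 6, 2, 5, 4, 7, 3, 4, 2]

c_bar = sum(defects) / len(defects)               # 40/10 = 4.0
ucl = c_bar + 3 * math.sqrt(c_bar)                # 4 + 3*2 = 10.0
lcl = max(0.0, c_bar - 3 * math.sqrt(c_bar))      # 4 - 6 < 0, so set to 0

out_of_control = [i + 1 for i, c in enumerate(defects) if not (lcl <= c <= ucl)]
print(lcl, c_bar, ucl, out_of_control)            # 0.0 4.0 10.0 []
```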
Use np-chart instead of p-chart when sample size n is constant and you prefer plotting the actual count (not fraction). Both charts give the same conclusions — np-chart is simpler to compute.
For the p-chart example above (n=100, p̄=0.06, q̄=0.94):
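With these values the np-chart limits are np̄ ± 3√(np̄·q̄). The sketch below reuses the defective counts from the p-chart example and is illustrative only.

```python
import math

n, p_bar = 100, 0.06
q_bar = 1 - p_bar
defectives = [7, 5, 4, 8, 6, 3, 9, 5, 4, 9]       # counts from the p-chart example

centre = n * p_bar                                # 6.0
sigma_np = math.sqrt(n * p_bar * q_bar)           # sqrt(5.64), about 2.375

ucl = centre + 3 * sigma_np                       # about 13.12
lcl = max(0.0, centre - 3 * sigma_np)             # negative, so set to 0

out_of_control = [i + 1 for i, d in enumerate(defectives) if not (lcl <= d <= ucl)]
print(round(lcl, 2), round(centre, 2), round(ucl, 2), out_of_control)
```

Dividing these limits by n = 100 gives back the p-chart limits, which is why both charts lead to the same conclusion.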
Measurement data → X̄ & R charts
Proportion defective, variable n → p-chart
Count defective, fixed n → np-chart
Count defects per unit → c-chart