Up Learn – A Level maths (edexcel) – Outliers and Standard Deviation
Outliers and Standard Deviation Summary
Here’s a summary of everything you need to know about outliers and standard deviation for A Level.
Here’s a reminder of the key points you should know about outliers and standard deviation.
It’s possible for a dataset to have extreme values, called outliers.
…Which can make some descriptive statistics misleading.
An outlier that is the result of an error is called an anomaly.
If an outlier is an anomaly, we should clean the data by removing it.
And if it isn’t, we need to leave it in, because it gives us important information, even if it skews our descriptive statistics.
To figure out if there are outliers in a data set, statisticians create cut-off values called fences.
Any data point lower than the lower fence… or greater than the upper fence… is an outlier.
Now, there are two strategies to determine where these fences should go.
One strategy is to use rules built from quartiles.
One for the lower fence:
$$Q_1 - 1.5\,(Q_3 - Q_1)$$
And one for the upper fence:
$$Q_3 + 1.5\,(Q_3 - Q_1)$$
Where $Q_3 - Q_1$ is the interquartile range (IQR), so the fences become $Q_1 - 1.5 \times \text{IQR}$ and $Q_3 + 1.5 \times \text{IQR}$.
And the number multiplying the IQR is a constant, $k$.
…Which is often 1.5 but can vary, depending on the dataset.
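As a rough illustration of the quartile strategy, here is a minimal Python sketch. It assumes the quartiles have already been found by whatever method your course uses; the function names `quartile_fences` and `find_outliers` and the sample data are made up for this example.

```python
def quartile_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) built from the quartiles.

    k is the constant multiplying the IQR; 1.5 is a common choice.
    """
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, lower, upper):
    """Return every value strictly below the lower fence or above the upper fence."""
    return [x for x in data if x < lower or x > upper]


# Hypothetical data set with quartiles Q1 = 9.5 and Q3 = 13.5
data = [2, 9, 10, 11, 12, 13, 14, 30]
lower, upper = quartile_fences(q1=9.5, q3=13.5)

print(lower, upper)                        # 3.5 19.5
print(find_outliers(data, lower, upper))   # [2, 30]
```

Passing a different `k` moves both fences, which is why the same data point can be an outlier under one rule and not under another.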
A second strategy is to use rules in this form: $\bar{x} \pm k\sigma$, where $\bar{x}$ is the mean and $k$ is a constant.
Where $\sigma$ represents a measure of spread called the standard deviation.
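A small Python sketch of this second strategy follows. It assumes the rule takes the common form mean ± k × standard deviation, with the mean, the standard deviation, and the constant `k` all given in advance; the name `sd_fences` and the numbers are invented for illustration.

```python
def sd_fences(mean, sd, k):
    """Return (lower_fence, upper_fence) as mean - k*sd and mean + k*sd.

    Assumption: the outlier rule has the form mean +/- k standard deviations.
    """
    return mean - k * sd, mean + k * sd


# Hypothetical summary statistics and data
lower, upper = sd_fences(mean=50, sd=4, k=2)
data = [41, 45, 50, 59]
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper)  # 42 58
print(outliers)      # [41, 59]
```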
The standard deviation measures, roughly, the typical distance between data points and the mean.
And to find the standard deviation of a dataset…
Start by finding all of the deviance scores, which are the distances between each data point and the mean.
We can represent deviance scores using this notation: $x - \bar{x}$.
Once we have the deviance scores for all data points…
Square them…
And add them.
We can also represent the sum of squared deviations like this: $\sum (x - \bar{x})^2$, also written $S_{xx}$.
Finally, divide this by the total number of data points, $n$, to find the variance:
$$\text{variance} = \frac{\sum (x - \bar{x})^2}{n}$$
Which is the standard deviation squared:
$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{n}$$
So to find the standard deviation, just take the square root of this:
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$
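The steps above can be sketched in Python as follows. This is only an illustration: the function name `variance_from_definition` and the data are made up, and the code divides by the number of data points, as the transcript does.

```python
import math


def variance_from_definition(data):
    """Variance as the sum of squared deviations divided by n."""
    n = len(data)
    mean = sum(data) / n
    deviations = [x - mean for x in data]   # deviance scores, x - mean
    sxx = sum(d ** 2 for d in deviations)   # square them and add them
    return sxx / n                          # divide by the number of data points


data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = variance_from_definition(data)
sd = math.sqrt(variance)

print(variance)  # 4.0
print(sd)        # 2.0
```

Python's built-in `statistics.pvariance` and `statistics.pstdev` also divide by n, whereas `statistics.variance` and `statistics.stdev` divide by n - 1 and so give slightly larger values.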
Both standard deviation and variance are examples of measures of spread.
Now, it can take an unnecessarily long time to start by finding the sum of squared deviations.
Fortunately, there’s a shortcut we can use to find standard deviation.
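First, calculate the total number of data points, $n$, the sum of all the data points, $\sum x$, and the sum of squares, $\sum x^2$.
Second, find the mean of the squares, $\frac{\sum x^2}{n}$… and the square of the mean, $\left(\frac{\sum x}{n}\right)^2$…
Third, use this formula to find the variance:
$$\sigma^2 = \frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2$$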
And take the square root if you want the standard deviation.
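Here is a minimal Python sketch of the shortcut, assuming it is the usual "mean of the squares minus the square of the mean" rule. The function name `variance_shortcut` and the data are hypothetical.

```python
import math


def variance_shortcut(data):
    """Variance as (mean of the squares) - (square of the mean)."""
    n = len(data)                           # total number of data points
    sum_x = sum(data)                       # sum of all the data points
    sum_x2 = sum(x ** 2 for x in data)      # sum of squares
    mean_of_squares = sum_x2 / n
    square_of_mean = (sum_x / n) ** 2
    return mean_of_squares - square_of_mean


data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = variance_shortcut(data)

print(variance)             # 4.0
print(math.sqrt(variance))  # 2.0
```

Only three totals are needed, so no individual deviations have to be worked out.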
When the dataset is given in a frequency table, we can use the same shortcut to find the variance and standard deviation.
Though now, the number of data points is given by the sum of the frequency counts, $\sum f$.
And since we need to multiply each data point, $x$, by its frequency count, $f$, first, we represent the formula using $\sum fx$ and $\sum fx^2$, like this:
$$\sigma^2 = \frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2$$
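For a frequency table, the same idea might look like this in Python. The table is represented here as a list of `(x, f)` pairs, which is just one convenient choice for the sketch, and the function name and values are made up.

```python
def variance_from_frequency_table(table):
    """Variance from (value, frequency) pairs using the shortcut formula."""
    sum_f = sum(f for x, f in table)              # number of data points
    sum_fx = sum(f * x for x, f in table)         # sum of f times x
    sum_fx2 = sum(f * x ** 2 for x, f in table)   # sum of f times x squared
    return sum_fx2 / sum_f - (sum_fx / sum_f) ** 2


# Hypothetical table: the value 4 occurs three times, 5 occurs twice, and so on
table = [(2, 1), (4, 3), (5, 2), (7, 1), (9, 1)]
variance = variance_from_frequency_table(table)

print(variance)         # 4.0
print(variance ** 0.5)  # 2.0
```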
And when the dataset is given in a grouped frequency table, we can only estimate the variance and standard deviation.
First, find the midpoint of each class interval.
Then use these midpoints as the $x$ values in the frequency table formula.
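A possible Python sketch of the grouped case is shown below. It assumes the standard approach of treating each class midpoint as the x value in the frequency table formula; the class boundaries, frequencies, and function name are invented for illustration.

```python
import math


def estimate_variance_grouped(classes):
    """Estimate the variance from (lower_boundary, upper_boundary, frequency) rows.

    Each class is represented by its midpoint, so the result is only an estimate.
    """
    rows = [((low + high) / 2, f) for low, high, f in classes]  # (midpoint, f)
    sum_f = sum(f for x, f in rows)
    sum_fx = sum(f * x for x, f in rows)
    sum_fx2 = sum(f * x ** 2 for x, f in rows)
    return sum_fx2 / sum_f - (sum_fx / sum_f) ** 2


# Hypothetical grouped table: 0-10 (f = 2), 10-20 (f = 5), 20-30 (f = 3)
classes = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
variance = estimate_variance_grouped(classes)

print(variance)             # 49.0
print(math.sqrt(variance))  # 7.0
```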
Finally, in your exam, you could be asked to compare two data sets.
And in that case, you need to compare one measure of central location, and one measure of spread.
If the datasets don’t have outliers, use the mean and standard deviation.
But if there are outliers, the mean and standard deviation can be misleading, so it’s more appropriate to use the median and interquartile range.
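To see why outliers matter for this choice, here is a small Python sketch using the standard library's `statistics` module on made-up data. It compares the mean and median before and after one extreme value is added.

```python
import statistics

clean = [10, 11, 12, 13, 14]
with_outlier = clean + [100]   # same data plus one extreme value

for name, data in [("clean", clean), ("with outlier", with_outlier)]:
    # The mean is pulled towards the extreme value; the median barely moves.
    print(name, "mean:", statistics.mean(data), "median:", statistics.median(data))
```

In this example the median only shifts from 12 to 12.5, while the mean more than doubles.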