Transform Tab in Query Editor: The "User Statistics Option of Transform in Query Editor in Power BI" Lesson-7 P-7
"Statistics" option within Power BI's Query Editor Transform tab: Lession-7 P-7
Of course! Here's a summary of the "Statistics" option within Power BI's Query Editor Transform tab:
the aim:
The "Statistics" option provides a quick way to calculate and add summary statistics columns to your existing data table. These statistics help you gain a better understanding of the distribution and central trends of your data.
How to access:
In the Power Query Editor: Select the column for which you want to calculate statistics.
Transform Tab: Go to the "Transform" tab in the ribbon.
Statistics: Click the "Statistics" button.
Available data:
- Count: The number of non-zero values in the column.
- Average: Average value.
- Median Calculation: The median value of sorted data.
- Sum: The total of all values.
- Minimum: The smallest value.
- Maximum: The largest value.
- Standard Deviation: A measure of spread data.
- Variance: Measures the variation from the mean or mean.
Example use cases:
Data Cleaning: Quickly identify potential outliers based on minimum and maximum values.
Analysis: Calculate the mean for sales data, the median for the income distribution, or the standard deviation for quality control purposes.
Feature engineering: Creating new features for machine learning models based on statistical computations.
Important points:
The "Statistics" option calculates a new temporary table of statistics that you can merge back into your original table to create additional columns with the desired statistics.
You can select multiple columns at once to calculate statistics for multiple fields at once.
Lab Work
Statistics Menu:
1:- Sum:
- The total of all values
Steps:
- first of all Load data table
- transform data
- Select table's colum which you want to total
- Select Transform tab from query editor
- Select Statistics icon
- click on sum
Then you have
2:- Minimum:
- Return the minimum value from selected Colum's Values
- Transform --> Statistics -->Minimum
Final view for Minimum value from selected column
3:- Maximum:
- Return the maximum value from selected Colum's Values
- Transform --> Statistics -->Maximum
4:- Median: (The median is the middle value of a set of sorted numbers)
- Return Median value from selected Colum
- Transform --> Statistics -->Median
What is the median?
Calculate the median of these numbers {12, 13, 111, 113, 126, 134, 147}, which in this example is 113 because there are three numbers on both sides.
To find the middle value in a list with the same number of numbers, it is necessary to determine the middle pair, add them, and divide by two.
5:- Average: (Return the average of all values)
- Return Average from selected Colum
- Transform --> Statistics -->Average
What is the Average?
The mean (average) of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set.
Then we have a view
6:- Standard Deviation (Return the average of all values)
- Return Standard Deviation from selected Colum
- Transform --> Statistics --> Standard Deviation
What is the Standard Deviation ?
Standard deviation is a statistic that measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance.
Calculation by hand to calculate Standard Deviation
Steps to calculate (standard deviation)
The standard deviation is usually calculated automatically by whichever software you use for your statistical analysis. But you can also calculate it by hand to better understand how the formula works.
There are six main steps for finding the standard deviation by hand. We’ll use a small data set of 6 scores to walk through the steps.
Data set | |||||
---|---|---|---|---|---|
46, | 69, | 32, | 60, | 52, | 41 |
Step 1: Find the mean
To find the mean, add up all the scores, then divide them by the number of scores.
Mean (x̅) |
---|
Step 2: Find each score’s deviation from the mean
Subtract the mean from each score to get the deviations from the mean.
Since x̅ = 50, here we take away 50 from each score.
Score | Deviation from the mean |
---|---|
46 | 46 – 50 = -4 |
69 | 69 – 50 = 19 |
32 | 32 – 50 = -18 |
60 | 60 – 50 = 10 |
52 | 52 – 50 = 2 |
41 | 41 – 50 = -9 |
Step 3: Square each deviation from the mean
Multiply each deviation from the mean by itself. This will result in positive numbers.
Squared deviations from the mean |
---|
(-4)2 = 4 × 4 = 16 |
192 = 19 × 19 = 361 |
(-18)2 = -18 × -18 = 324 |
102 = 10 × 10 = 100 |
22 = 2 × 2 = 4 |
(-9)2 = -9 × -9 = 81 |
Step 4: Find the sum of squares
Add up all of the squared deviations. This is called the sum of squares.
Sum of squares |
---|
16 + 361 + 324 + 100 + 4 + 81 = 886 |
Step 5: Find the variance
Divide the sum of the squares by n – 1 (for a sample) or N (for a population) – this is the variance.
Since we’re working with a sample size of 6, we will use n – 1, where n = 6.
Variance |
---|
Step 6: Find the square root of the variance
To find the standard deviation, we take the square root of the variance.
Standard deviation |
---|
From learning that SD = 13.31, we can say that each score deviates from the mean by 13.31 points on average.
Calculation in Power BI (Standard Deviation)
View to calculate Standard Deviation in Power BI
- Select required column from table
- Select Tranform Tab of query editor
- Select Statistics option
- click on Standard Deviation option
Then we have a view
7:- Count Values (Count number of values)
- Return the number of non-null values from selected Colum
- Transform --> Statistics --> Count Values
Then you have final view of count values
7:- Count Distinct Values (Count number of unique values)
- Return the number of unique non-null values from selected Colum
- Transform --> Statistics --> Count Distinct Values
then you have
Thanks: