Cleaning data in Power BI: Transform (Edit) Data Lesson-5
Cleaning Data in Power BI
Data cleaning is an essential step in data analysis, and Power BI offers powerful tools to transform your raw data into a clear and consistent format. Here is a comprehensive guide:
Problem identification:
- Incomplete data: missing values, empty entries, or inconsistent formats.
- Inaccurate data: typos, errors in data entry, or outdated information.
- Conflicting data: Different units, formatting styles, or naming conventions.
- Duplicate data: The same data points appear multiple times.
Common cleaning techniques:
- Filtering: Remove irrelevant rows or columns based on specified criteria.
- Substituting values: Replace missing values with suitable substitutes such as mean or median.
- Formatting: Ensure consistent date, time, number, and text formats.
- Column Splitting: Split a column into separate columns for clarity.
- Merging Columns: Combine related columns for easier analysis.
- Creating calculated columns: Derive new data points based on existing columns.
- Duplicate Removal: Identify duplicate rows or entries and eliminate them.
Power BI Tools:
Power Query Editor: Offers a rich toolbox for cleaning and transforming data.
Conditional Formatting: Visually highlight potential problems such as missing values.
Data modeling tools: Create relationships between tables and manage data types.
Cleaning Points:
- Connect to your data source.
- Analyze data: Identify areas that need cleaning using Power BI visual or data profiling tools.
- Open the Power Query Editor: Transform data in Home Tab your data using the available options.
- Filter out irrelevant data.
- Replace missing values: Choose appropriate replacement or removal strategies.
- Format and Clean Data Types: Ensure consistency and compatibility.
- Split or merge columns: Refine the data structure for clarity.
- Create Calculated Columns: Gain new insights from existing data.
- Remove duplicates: Ensure data integrity.
- Review and Improve: Validate your cleaning process and make adjustments if needed.
Suggestions:
- Document your cleaning steps for future reference.
- Consider using data profiling tools for a comprehensive analysis.
- Start with small datasets to practice and build your skills.
- Take advantage of online resources and community forums for support and learning.
- Remember: Data cleaning is an iterative process. Continually review your data and refine your cleaning techniques for optimal results.
Lab Work
first of all connect required data file and then follow these steps:
With this option (Transform data) we can change:
- Heading (fields)
- Remove row
- Remove column
- Change data type
- Keep Rows
- Keep Columns
- No change table data records and More...
1:- Start:
- Home -> Transform data -> Transform data
- Display this table view window (Which is editable mode)
- Now you can modify your table
2:- Change or rename your field name (Heading)
- Double click or Right Click on required field and change it.
3:- Undo
- if you want to "undo" then you click on cross from "APPLIED STEPS" box
- you can undo one by one
How to change datatypes (like string or numaric)
- We can change the data type of table headings
- click on right top corner icon(ABC123) of any headin and change type
- i.e 123 means numaric type & ABC means string (Text type)
4:- Remove Rows
Remove Rows from table
How many rows to remove from the top (start)
- Home -> Remove Rows -> Remove Top Rows
- Type number you want to how many rows to delete from start (From Top) and press ok button
How many rows to remove from Bottom(end) to top
- Home -> Remove Rows -> Remove Bottom Rows
How many rows to remove from - to (OR) From Mid
- Home -> Remove Rows -> Remove Alternate
Remove Duplicate Rows
- Remove all duplicate(Repeated) rows(Records) from table
- Home -> Remove Rows -> Remove Duplicates
Remove Blank Rows
- Remove all blank(Empty) rows from table
- Home -> Remove Rows -> Remove Blank Rows
Remove Errors
- Remove all errors from table
- Home -> Remove Rows -> Remove Errors
5:- Keep Rows from table
Keep Top Rows
- keep only the top N rows from table
- Home -> Keep Rows -> Keep Top Rows
Keep Bottom Rows
- keep only the bottom N Rows fron this table
- Home -> Keep Rows -> Keep Bottom Rows
Keep Range of Rows
- Specify the number of rows to keep from start to end
- Home -> Keep Rows -> Keep Range of Rows
Keep Duplicate
- keep duplicate rows of selected column
- Home -> Keep Rows -> Keep Duplicats
Keep Errors
- not remove error from selected columns
- Home -> Keep Rows -> Keep Errors