Data cleaning refers to removing redundant or invalid values from a dataset. It is important to remove data that were entered by mistake or that are not related to the hypotheses under investigation. Cleaning data also involves identifying ‘extreme values’, called outliers, and isolating them from the rest of the data before analysis. Scatter plots are a useful tool for identifying such extreme values.
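As an illustration only, the sketch below shows the two cleaning steps described above in plain Python: dropping redundant (duplicate) and invalid records, then separating outliers from the remaining values. The record layout of `(id, value)` pairs, the valid range, and the function names `clean_records` and `split_outliers` are hypothetical. The outlier rule is Tukey's 1.5 × IQR fences, a numerical stand-in for the visual inspection of scatter plots mentioned above, not a method this page prescribes.

```python
from statistics import quantiles


def clean_records(records, valid_range):
    """Drop exact duplicate records and records whose value is missing or out of range.

    `records` is a hypothetical list of (record_id, value) pairs.
    """
    lo, hi = valid_range
    seen = set()
    cleaned = []
    for rec_id, value in records:
        if (rec_id, value) in seen:
            continue  # redundant: exact duplicate of an earlier record
        seen.add((rec_id, value))
        if value is None or not (lo <= value <= hi):
            continue  # invalid: missing or impossible value (e.g. a data-entry error)
        cleaned.append((rec_id, value))
    return cleaned


def split_outliers(values, k=1.5):
    """Separate values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    inliers = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return inliers, outliers


# Hypothetical survey ages: one duplicate, one impossible value, one missing value.
records = [(1, 34), (2, 29), (2, 29), (3, -5), (4, 31),
           (5, 30), (6, 33), (7, 28), (8, 95), (9, None)]
cleaned = clean_records(records, valid_range=(0, 120))
inliers, outliers = split_outliers([v for _, v in cleaned])
print("cleaned:", cleaned)
print("outliers set aside:", outliers)
```

Here the age of 95 is plausible, so it passes the validity check, but it lies far outside the other values and is set aside as an outlier rather than deleted, so a researcher can still inspect it separately.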
Researchers often spend most of their time developing statistical hypotheses and methodologies and tend to neglect proper data cleaning. If the data are unreliable because they were cleaned improperly or carelessly, the results are bound to be compromised. It is highly advisable to speak with one of our consultants today for assistance with cleaning your data.
© 2006 StatsBusters, LLC. StatsBusters is an American and British cooperative.
The United States and United Kingdom member firms are separate and independent legal entities and each describes itself as such. All rights reserved.