Data cleaning starts with prevention: strategies that stop errors before they occur (see the data quality control procedures later in the document). However, error-prevention strategies can reduce but not eliminate common errors, and many data errors will be detected incidentally, for example:
– When collecting or entering data
– When transforming/extracting/transferring data
– When exploring or analysing data
– When the draft report is peer reviewed
Even with the best error-prevention strategies in place, errors and problems still need to be searched for, detected and remedied actively, systematically and in a planned way. Data cleaning involves repeated cycles of screening, diagnosis and treatment, together with documentation of the process. As patterns of errors are identified, data collection and entry procedures should be adapted to correct those patterns and reduce future errors.
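The cycle of screening, diagnosis, treatment and documentation can be illustrated with a minimal Python sketch. It is only one possible arrangement, not a prescribed procedure. The function names, the "age" field, the plausible range of 0 to 120 and the rule "set out-of-range values to missing" are all assumptions made for the example; in practice, diagnosis usually requires a person to check the value against its source.

```python
def screen(records, column, low, high):
    """Screening: return the indexes of rows whose value is missing or out of range."""
    flagged = []
    for i, row in enumerate(records):
        value = row.get(column)
        if value is None or not (low <= value <= high):
            flagged.append(i)
    return flagged


def diagnose(value):
    """Diagnosis: classify the problem (hypothetical two-way classification)."""
    return "missing" if value is None else "out_of_range"


def treat(row, column, problem):
    """Treatment: set an out-of-range value to missing rather than guess a
    replacement. Returns True if the row was changed."""
    if problem == "out_of_range":
        row[column] = None
        return True
    return False


def clean(records, column, low, high, max_cycles=5):
    """Repeat screening, diagnosis and treatment until a cycle changes nothing,
    documenting every change in a log."""
    log = []
    for cycle in range(1, max_cycles + 1):
        changed = False
        for i in screen(records, column, low, high):
            old = records[i].get(column)
            problem = diagnose(old)
            if treat(records[i], column, problem):
                changed = True
                # Documentation: keep the old value so the change can be audited.
                log.append({
                    "cycle": cycle,
                    "row": i,
                    "column": column,
                    "old": old,
                    "new": records[i][column],
                    "problem": problem,
                })
        if not changed:
            break
    return log


# Made-up example data, for illustration only.
records = [{"age": 34}, {"age": 250}, {"age": None}, {"age": -1}]
log = clean(records, "age", low=0, high=120)
for entry in log:
    print(entry)
```

The log records only values that were changed, together with their original content, so each treatment can be reviewed or reversed later. Counting log entries by problem type is one simple way to spot the recurring error patterns that should feed back into data collection and entry procedures.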