“We all say we like data, but we don’t. We like getting insight out of data. That’s not quite the same as liking the data itself. In fact, I dare say that I don’t quite care for data. It sounds like I’m not alone. It’s tough to nail down a precise definition of ‘Bad Data.’ Some people consider it a purely hands-on, technical phenomenon: missing values, malformed records, and cranky file formats. Sure, that’s part of the picture, but Bad Data is so much more. It includes data that eats up your time, causes you to stay late at the office, drives you to tear out your hair in frustration. It’s data that you can’t access, data that you had and then lost, data that’s not the same today as it was yesterday… In short, Bad Data is data that gets in the way. There are so many ways to get there, from cranky storage, to poor representation, to misguided policy. If you stick with this data science bit long enough, you’ll certainly encounter your fair share”
http://www.amazon.com/Bad-Data-Handbook-Cleaning-Back/dp/1449321887/ref=sr_1_…