I’ve been thinking about the book “Everybody Lies…” One thing the author relies on heavily is data from many different sources. I guess good work deserves recognition even when it rests mostly on the diligent output of many other people, in this case data of all sorts.
Looking at recent Kaggle competitions, it seems that companies are starting to notice this too. Some competitions, such as the Zillow $1mm competition, not only allow competitors to use outside data but encourage them to bring in new data sources.
That’s very interesting. This Kaggle competition encourages not only competition in model building but also data harvesting: finding and using mature but previously unused data.
This trend may well continue for some time as we find new ways to treat more and more objects and information as data.
What will be harvested next?