So true: as with any new approach or technique, the art lies in integrating the new, uncharted territory with old, well-known practice. With Big Data, we will need to climb a learning curve. As a free climber, you have a lot of flexibility in how you approach it. But if our goal is to make way for other climbers behind us, shouldn't it be our responsibility to ensure a safe climb for them? Training / test / POC data is fine for learning how to climb, but it likely falls short when it comes to securing a good path …
•Overfitting is a general phenomenon that plagues all machine learning methods.
•That is one reason why you must never evaluate on the training data set.
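•Overfitting can also occur more generally, e.g. when you try many machine learning methods and choose the one that performs best on your data.
•The chosen method's measured performance is then optimistic, so you cannot expect it to be as good on new test data.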
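•Divide the data into training, validation and test sets: fit on the training set, choose the method on the validation set, and report performance only on the test set.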
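The points above can be illustrated with a small sketch. It makes several assumptions that do not come from the text. The data is a hypothetical noisy 1-D toy dataset. A k-nearest-neighbour classifier stands in for "a method". The split is 60/20/20. The names `make_data`, `knn_predict`, `accuracy` and `split` are illustrative only. 1-NN scores perfectly on its own training data but not on held-out data. The number of neighbours k is chosen on the validation set, and only the final choice is scored on the untouched test set.

```python
import random


def make_data(n, noise, rng):
    """Hypothetical toy data: label is 1 if x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise that a flexible model can memorise
        data.append((x, y))
    return data


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (k is odd, so no ties)."""
    neighbours = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(y for _, y in neighbours)
    return 1 if votes * 2 > k else 0


def accuracy(train, data, k):
    """Fraction of points in `data` that a k-NN model fitted on `train` labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)


def split(data, train_pct=60, val_pct=20):
    """Split already-shuffled data into training, validation and test sets."""
    n_train = len(data) * train_pct // 100
    n_val = len(data) * val_pct // 100
    return data[:n_train], data[n_train:n_train + n_val], data[n_train + n_val:]


rng = random.Random(0)
data = make_data(500, noise=0.2, rng=rng)  # points are i.i.d., so slicing acts as a random split
train, val, test = split(data)

candidates = [1, 3, 5, 9, 15, 25]
# Evaluating on the training set: 1-NN finds each point itself, so it looks perfect.
train_acc = {k: accuracy(train, train, k) for k in candidates}
# Model selection uses the validation set only.
val_acc = {k: accuracy(train, val, k) for k in candidates}
best_k = max(candidates, key=lambda k: val_acc[k])
# The test set is touched once, for the final, unbiased estimate.
test_acc = accuracy(train, test, best_k)

for k in candidates:
    print(f"k={k:2d}  train={train_acc[k]:.2f}  validation={val_acc[k]:.2f}")
print(f"chosen k={best_k}, test accuracy={test_acc:.2f}")
```

Note that `val_acc[best_k]` is itself slightly optimistic, because it is the maximum over several candidates. That is the "more general" overfitting described above, and it is why the separate test set is kept for the final report.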