Once a scholar has identified a suitable mathematical function or a suitable set of dependent or independent variables, she can begin to look for a causal story to provide an intuition to back the findings. When she writes up the results for publication, the sequence is often reversed. She will state that she started with a causal theory; then looked for the most plausible way of transforming it into a formal hypothesis; and then found it confirmed by the data. This is bogus science. In the natural sciences there is no need for the “logic of justification” to match or reflect “the logic of discovery.” Once a hypothesis is stated in its final form, its genesis is irrelevant. What matters are its downstream consequences, not its upstream origins. This is so because the hypothesis can be tested on an indefinite number of observations over and above those that inspired the scholar to think of it in the first place. In the social sciences (and in the humanities), most explanations use a finite data set. Because procedures of data collection often are nonstandardized, scholars may not be able to test their hypotheses against new data.

[Footnote:] One could get around or at least mitigate this problem by exercising self-restraint. If one has a sufficiently large data set, one can first concentrate on a representative sample and ignore the rest. Once one has done one’s best to explain the subset of observations, one can take the explanation to the full data set and see whether it holds up. If it does, it is less likely to be spurious. Another way of keeping scholars honest would be if journals refused to consider articles submitted for publication unless the hypotheses to be tested together with the procedures for testing them had been deposited with the editor (say) two years in advance.
Jon Elster, Explaining Social Behavior: More Nuts and Bolts for the Social Sciences, Cambridge University Press, 2007, pp. 48–49