The following is a comment on an article published online in Wired magazine.
I think many of the commentators on this thread share my bewilderment at the muddled point this essay is trying to make.
It bears restating emphatically that data is certainly NOT knowledge.
"Models", interpreted most liberally, are anything and everything that bridge the two.
Any synthesis, any definition, any organisation is a model. Any "whither-thence", "if-then", "sometimes-if" clause is a model. Any unit of language is a model. Every work of art is a model - of an experience, a reality, a feeling.
Most generally - all of science (and, perhaps, art, and, even, bloviation in magazines) is a form of modeling. Box's quote ("All models are wrong, but some are useful") is a concise, appropriately humble reminder that "reality" is something that is continuously approached, never attained.
Google isn't science. Like the natural world, Google is an event and a source of data in its own right - and, to any working (i.e. synthesizing, distilling, modeling) scientist, a useful tool.
The author notes that Mendelian genetics is a simplistic model in view of recent discoveries. Yes. And any refinement of the first model will necessarily take (and most certainly has already taken) the form of refined models. One cannot "fix" the model by simply drowning it in data.
Certainly, the nature of modeling depends on the quality of the available data and on the computational resources at hand.
As a basic example, the long-standing statistical assumption of normally distributed (bell-shaped) errors has been a useful, mathematically tractable and very widely used "model" for comparing sets of data, finding correlations and rejecting basic null hypotheses. This approach is arguably on the verge of being made obsolete, now that randomization and Monte Carlo techniques can quickly and easily estimate error structures with no prior assumptions about their shape.
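To make the contrast concrete, here is a minimal sketch of one such randomization technique, a two-sample permutation test for a difference in means. The comment does not name a specific procedure, so this is only an illustration: the function name `permutation_test` and the sample values are hypothetical. Rather than assuming normal errors (as a t-test does), it builds the null distribution empirically by repeatedly shuffling the group labels.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch).

    No distributional shape is assumed: the null distribution of the
    statistic is estimated by randomly reassigning observations to groups.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_resamples + 1)


# Hypothetical measurements, made up purely for illustration.
group_a = [4.1, 5.3, 6.0, 5.5, 4.8, 6.2]
group_b = [6.9, 7.4, 6.5, 8.1, 7.7, 7.0]
obs, p = permutation_test(group_a, group_b)
print(f"observed difference = {obs:.2f}, p = {p:.4f}")
```

The only modeling choice left is the test statistic (here, the absolute difference in means); the error structure itself comes from resampling rather than from a bell-curve assumption.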
At this simplest level, the nature of the model has changed. But the need to model remains. As a previous poster noted, the growth in the volume and accessibility of data is, on the contrary, a clarion call to propose new processes, to account for all the interdependencies, to tease out all the interactions, to identify the dynamics, to pounce and tear away and suck out of the bloated Google-sack of data whatever actual knowledge about the natural world might lie therein, in pregnant waiting.
In short, to model like never before.
Eli 13:51, 2 July 2008 (PDT)