Ideas and Data: A Case for Duality
Posted by: rsr
Subject tags: society, science, biology
on Dec 28, 2008
Those familiar with Open2it from its early days will readily discern that this blog represents a small change of format. After considering my place in this community, I decided to move my blog into the threaded community blog. I really like the intermingling of blog posts by all Open2it members and, frankly, I felt a bit left out in having a separate blog. From now on, my shorter contributions will appear among those of other members, while my longer notes will be assigned as either Articles or Essays.
Something that has fired more than a few neurons in my brain is a fundamental duality in science: the relative importance of ideas and data. Science, for those who are rusty on the nuts and bolts, is about hypothesis testing. Scientific research follows a procedural formula that proceeds from ideas to data gathering, to analysis, to answering questions, and finally to asking new questions based on what was learned. The process never truly ends, which is appealing to some, but it boils down to a back-and-forth between ideas and data.
When I was in graduate school, a prominent professor at Berkeley, David Wake, offered me this wisdom: “Ideas are cheap; data are precious.” Being erudite and proper, he used the correct verb form for a plural: data are..., a datum is.... But it wasn’t his proper grammar that left an impression on me. It was the profundity of that succinct statement. Back then, in the mid-1990s, it was most certainly true, and in some ways it is true still.
Without data we can never get past ideas. We cannot move on to questions about facts because the facts never present themselves. Data are precious because, through analysis, we come to understand more about the world we live in. When we understand patterns through analysis, we open the floodgates for new ideas and the process renews itself. Look around you. Nearly everything you see has come into being through this interplay between ideas and data.
What makes something precious is its rarity and, to be sure, back in the 1990s, before computers and advanced technologies were used to full advantage in science, data were indeed somewhat rare, at least in the biological sciences. But a new dawn has arrived and data are no longer so hard to come by in many areas of science. In fact, we are overrun with certain types of data, such as genetic sequence data, climate data, brain imaging data, histological section archives, psychological profiles, anthropological data, demographic information, radio telescope data, and so many others that I dare not attempt to list them all.
Massive databases are used to store terabytes of unanalyzed data and a relatively new discipline—informatics—has emerged in science to deal with the flood of raw data. The discipline emphasizes the study of methods for storing, processing, and communicating about information, and naturally it focuses on computer technology as its primary means of implementing its aims. Methods that streamline the discovery process are prized and they generally involve sophisticated algorithms that extract patterns from data sets that could take a lifetime to parse manually.
I can’t recall ever hearing the term informatics as a graduate student, but now it is omnipresent. Different areas of science use a modifier or compound word, such as medical informatics or bioinformatics, to distinguish between different foci, but there is no escaping the term in its various forms. Numerous job postings seek warm bodies to staff the many nascent university programs in informatics, and while data might still be precious, ideas about how to deal with large data sets are in much shorter supply. If the trend continues, I predict that some learned professor somewhere will offer a student the opposite of the sentiment Dr. Wake offered me: “Data are cheap; ideas are precious.”
The problem is more serious than one might think, and it is not limited to scientific data; it encompasses all of human knowledge. How can we efficiently retrieve needed information from scientific journals and scholarly books? Certainly the internet and powerful search engines are a giant leap forward, but if you have done any research recently, you know how long it takes to adequately assimilate just the main literature on a topic, let alone the rest of the relevant information. Review articles were once considered second-class publications, but now there are numerous journals that publish only reviews, and these journals are popping up in nearly every field of science. One day, I predict, review articles will be more highly esteemed than primary literature, for assimilating a topic will take months or years without them.
We need ideas about handling data more than ever and we need innovative ways to synthesize massive amounts of information. While we have become increasingly sophisticated at gathering data, our methods for synthesis are essentially the same as those of Galileo and Newton—we read and think. Of course, we have more to read and think about, but I dare say that no modern human really knows any more than did ancient scholars. Nowadays, we specialize in a defined area and we know more about less. Does anyone else see the dilemma here? Synthesis requires broad thinking and if we know more about less we are somewhat narrow in our perspectives.
One result of this narrowing has been the slow slide toward extinction of a discipline once highly esteemed and populated by such figures as Charles Darwin, Edward O. Wilson, John J. Audubon, and Jane Goodall, to name a few—that discipline is natural history. Some might doubt that naturalists and natural historians are a dying breed, but I assure you that there is so little funding available to those who aspire to work as naturalists that, at least institutionally, the discipline will soon be all but gone.
The problem is that experimental data have come to rule the roost and observation has fallen into disrepute among scientists. To be sure, it is a false logic that drives this bias, but it is no less false to state that ideas are somehow less important than data. Where once we could not move past ideas without data, now we cannot move past data without ideas.
In our search for the holy grail of informatics, we might pause to reflect on how ideas and data complement each other. It will not be data that liberate us from the problems we face in informatics; it will be ideas. We must lift our heads from the fine details, open our eyes, and make some observations about ourselves. Let us be naturalists for long enough to discover how we might turn a deluge of data into crystalline human understanding. To some degree, informaticians are thinking like naturalists, but we have a long way to go and not nearly enough broad thinking to help us navigate the data-strewn landscape.
