(This post was originally published on the Wolfram Blog.)
The creation of large data repositories has been a key historical indicator of social and intellectual development—and indeed perhaps one of the defining characteristics of the whole progress of civilization.
And through our work on Wolfram|Alpha—with its insatiable appetite for systematic data—we have gained a uniquely broad view of the many great data repositories that exist in the world today.
Some of these repositories are maintained by national or international agencies, some by companies and other organizations, and some by individuals. A few of the repositories are quite new, but many date back 40 or more years, and some well over a century. But there is one thing in common across essentially every great data repository: a core of diligent and committed people who have carefully shepherded its development.
Curiously, though, few of these people have ever met their counterparts in other domains of data. And in our work on Wolfram|Alpha we are almost certainly the first group ever to have had the pleasure of getting to know such a broad range of leaders of great data repositories.
And one of the things that we have discovered is that there is much in common in both the methods used and the issues faced by these data repositories. So as part of our contribution to the worldwide data community we have decided to sponsor a data summit to bring together for the first time the leaders of today’s great data repositories.
The Wolfram Data Summit 2010 will be held in Washington, DC, on September 9–10.
We have invited leaders of data repositories in all areas: socioeconomic, scientific, financial, medical, geographic, commercial, lexicographic, cultural, biographical, mathematical, and others. And we already know that many data repositories will be represented, including for example the BBC, Bowker, CABI, CDC, comScore, CRC, DataONE, Encyclopedia of Life, FBI, Federal Reserve Bank, Gale, IMF, Internet Archive, Moody’s, NASA, NCBI, NIST, NREL, NSF, US Office of Management and Budget, Open Library, OpenStreetMap, ProQuest, Protein Data Bank, Smithsonian Institution, Sunlight Foundation, Thomson Reuters, UNESCO, UNICEF, US Census, US Department of Transportation, US Department of Education, World Bank, and World Conservation Monitoring Centre, as well as many others.
There is quite a lot to discuss at the Data Summit. Experiences and best practices in data curation. How data should be combined, validated, and standardized. How things from automated sensors to crowdsourcing affect data collection. How governmental and organizational data policies are and should be evolving. What can be done with data that is not yet in digital form. How privacy and commercial issues affect data dissemination. And much, much more.
This is a unique time in the history of data: as scientific and analytical methods become more and more prominent and successful in the world at large, a growing number of important decisions are being made on the basis of data, by both organizations and individuals. And as computers, the web, and now mobile devices have become ubiquitous, data can be disseminated vastly more widely than ever before.
It is a difficult matter, though, to do this in a way that is immediately useful to a broad range of people. And that is part of what we are trying to achieve by making knowledge—and data—computable in Wolfram|Alpha.
And in fact, in doing this, we see something else too: that if data can be made uniformly computable, it routinely becomes possible to derive completely new facts and knowledge by combining very different kinds of data—thereby generating vastly more value than could be obtained from any data repository on its own.
It is truly impressive how much data has been carefully collected and organized over the course of many years in the world’s great data repositories. And today this data is poised to become dramatically more relevant and significant in the daily lives of people around the world.
Our hope is that our Data Summit this September will help highlight the great achievements of the worldwide data community to date, and will serve as a catalyst in the next phase of the community’s development.
I myself have been a lifelong enthusiast of systematic data—as well as being directly responsible over the course of several decades for the collection of large amounts of mathematical and computational data. For me, the great data repositories are wonders of the modern world—pure yet tangible instantiations of what our civilization has achieved in many different areas.
And I look forward to the progress that we can make with our Data Summit this September—as well as to hearing all those fascinating tales from the front lines of the world of data.