
The breathtaking view you may enjoy during
coffee breaks in S. Domenico Lecture Hall

DATA MINING IN CRYSTALLOGRAPHY

the 29th crystallographic course at the "E. Majorana" Centre, Erice, Trapani, Sicily, Italy
and a Summer School sponsored by the TMR Programme, DG XII, EC, Bruxelles
and by the International Union of Crystallography

12 May (arrival) to 20 May (departure) 1999


Directors: Tom L. Blundell, Cambridge, UK and Suzanne Fortier, Kingston, Ontario, Canada

Scientific Report

Since their inception, crystallographic meetings in Erice have striven to explore frontier topics.

The course on Data Mining was held in parallel with the course on Crystal Engineering. Only a few lectures were held jointly.

Crystallography was among the first disciplines to recognize the importance of computer storage of data. Indeed, crystallographic databases were established in the early 1970s and have since grown at a fast pace, providing an opportunity to mine the wealth of knowledge they contain. The purpose of the School was to provide an overview of the methodologies currently used for data mining in the crystallographic domain and to outline the advances needed for a more effective and efficient exploitation of our databases.

The first part of the School was devoted to data mining methodologies and tools. It provided a survey of techniques for data acquisition, data validation and knowledge representation, and an introduction to the many paradigms of machine learning. This was followed by a journey through the various stages of data mining, from prospecting to sampling, knowledge extraction and refinement. This section of the program clearly displayed the variety and ingenuity of the approaches developed to mine the databases. A plethora of choices exists for extracting knowledge from the databases, from visual and statistical techniques to artificial intelligence approaches. Having such choices is viewed as essential, as it is becoming more and more apparent that no single approach will perform well on every problem. In fact, combining results obtained using different approaches can increase the accuracy and usefulness of the data mining exercise.

The second part of the School focused on applications of data mining to four specific areas: structure classification and prediction, structure-activity relationships, materials design and biotechnology. Superb examples of these applications were presented in topics ranging from supramolecular chemistry and genomics to intermolecular interactions and drug design. These presentations clearly demonstrated the incredible progress made in a relatively short period of time and highlighted the many rewards that are obtained when digging deeper and deeper into our databases.

Some clear messages came through from the presentations. Data mining starts with deposits. This speaks to the absolute necessity of providing incentives to the scientific community to deposit their data. Good results, however, can only be obtained from good data, which points to the importance of cleaning and validating the data. Also, while several methodologies have already proven useful for mining the crystallographic databases, significant progress will require adopting a common language and common standards and establishing test datasets, so as to put in place a more rigorous process for assessing the performance of the various tools. Finally, as databases continue to grow at an increasing rate and as initiatives such as genomics and proteomics progress, mining the databases has become both a necessity and an opportunity.

The School, in bringing together crystallographers and computer scientists in the beautiful surroundings of Erice, was described as a "first date" and something of a "blind date". The clear wish for further interactions, expressed by the majority of the participants, speaks well of the commitment of the two communities to join forces in extracting the rich scientific knowledge embedded in our databases.

The photo below shows the group gathered in S. Rocco Court.

Suzanne Fortier, Course co-director

Note added by the editor: there were 64 attendees from 17 countries. The social highlight of the meeting was the 25th anniversary celebration, since the International School of Crystallography at the Ettore Majorana Centre started in March 1974. A brief description of the event, called "La Notte d'Argento", is available.

Posted on 10 Oct 1997 and edited on 4 May and 30 Aug 1998; changed radically during Oct 1998, edited also on March 21, 1999. Report on 24 June 1999. Photos added on 18 Aug and improved on 30 Sept 1999.

maintained by Lodovico Riva di Sanseverino, fax +39 0512094904

   riva@geomin.unibo.it