Exploring High Dimensional Complex Data For Hidden Structure

Wednesday, January 16, 2013 - 3:00pm

Marvin Weinstein, SLAC Theoretical Physics 

Advances in fields such as physics, astronomy, biology, epidemiology, medicine and seismology lead to – and indeed, require – experiments that produce massive amounts of data. Concurrent advances in computer technology make it possible to record, warehouse and access data on a scale that would have been inconceivable even a few short years ago. Unfortunately, despite these remarkable advances, there remains a dearth of tools that can be used to explore this vast ocean of data for unexpected treasures. This state of affairs is encapsulated by the frequently voiced lament “We are drowning in data but thirsting for information”.

Dynamic Quantum Clustering (DQC) is a powerful tool designed to meet the challenge of finding hidden structure in massive datasets. Unlike familiar statistical methods, which seek to establish whether a dataset supports a pre-existing hypothesis, DQC is an exploratory tool. At its heart, DQC is a density-based clustering method founded on ideas drawn from quantum mechanics. My talk will introduce the method and then focus on the application of DQC to data obtained at the TXM-XANES microscope at the Stanford Synchrotron Radiation Lightsource. The goal was to study the chemical phases (i.e., the various oxidation states of iron) present in a sample from a piece of Roman pottery. I will show how DQC handles ~700,000 noisy spectra (each with 148 energy points) and produces an unbiased and extremely sensitive result. I will also report preliminary results from the analysis of a dataset from an LCLS pump-probe experiment.
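To make "density-based clustering founded on quantum mechanics" a little more concrete, here is a minimal sketch, under stated assumptions, of the static quantum-clustering potential (Horn and Gottlieb) on which DQC builds. The data density is estimated as a sum of Gaussians of width σ, ψ(x) = Σ_i exp(−|x − x_i|²/2σ²), and one asks for the potential V that makes ψ the ground state of H = −(σ²/2)∇² + V. Up to an additive constant this gives V(x) = Σ_i |x − x_i|² exp(−|x − x_i|²/2σ²) / (2σ² ψ(x)), whose minima sit in dense regions of the data. As a simplification, each point here simply slides downhill on V by gradient descent; DQC itself instead evolves Gaussian wave packets under the time-dependent Schrödinger equation. All function names, the width σ, the step size and the grouping tolerance are hypothetical illustration choices, and this O(N²) pure-Python loop is not meant to scale to ~700,000 spectra.

```python
import math

# Hypothetical helper names; sigma is the assumed Gaussian (Parzen) width.


def quantum_potential(x, data, sigma):
    """Quantum-clustering potential V(x), dropping the additive constant E - d/2."""
    psi = 0.0       # Parzen density estimate: sum of Gaussians centred on the data
    weighted = 0.0  # sum of |x - x_i|^2 times the corresponding Gaussian
    for p in data:
        r2 = sum((a - b) ** 2 for a, b in zip(x, p))
        g = math.exp(-r2 / (2 * sigma ** 2))
        psi += g
        weighted += r2 * g
    return weighted / (2 * sigma ** 2 * psi)


def numerical_gradient(f, x, h=1e-5):
    """Central finite-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2 * h))
    return grad


def flow_points(data, sigma, step=0.1, iters=200):
    """Move every point downhill on V (a simplified stand-in for DQC's evolution)."""
    def potential(y):
        return quantum_potential(y, data, sigma)

    moved = []
    for p in data:
        x = list(p)
        for _ in range(iters):
            g = numerical_gradient(potential, x)
            x = [xi - step * gi for xi, gi in zip(x, g)]
        moved.append(x)
    return moved


def group_by_proximity(points, tol):
    """Give the same label to points that ended up within tol of the same centre."""
    centers, labels = [], []
    for x in points:
        for k, c in enumerate(centers):
            if math.dist(x, c) < tol:
                labels.append(k)
                break
        else:
            centers.append(x)
            labels.append(len(centers) - 1)
    return labels


if __name__ == "__main__":
    # Two small, well-separated groups of 2-D points (illustrative data only).
    data = [(0, 0), (0.2, 0), (0, 0.2), (0.2, 0.2),
            (3, 3), (3.2, 3), (3, 3.2), (3.2, 3.2)]
    moved = flow_points(data, sigma=0.5)
    print(group_by_proximity(moved, tol=0.1))
```

On this toy input the first four points should collapse near (0.1, 0.1) and the last four near (3.1, 3.1), so the two groups receive two distinct labels. The choice of σ sets the resolution: a smaller width resolves finer structure, a larger one merges nearby groups.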
