Publications by Author: Goodman, Alyssa A

2014
Goodman AA, Pepe A, Blocker A, Borgman CL, Cranmer K, Crosas M, Stefano RD, Gil Y, Groth P, Hedstrom M, et al. Ten Simple Rules for the Care and Feeding of Scientific Data. PLoS Computational Biology [Internet]. 2014;10(4):e1003542.
Beaumont CN, Goodman AA, Kendrew S, Williams JP, Simpson R. The Milky Way Project: Leveraging Citizen Science and Machine Learning to Detect Interstellar Bubbles. The Astrophysical Journal Supplement Series [Internet]. 2014;214:3.

We present Brut, an algorithm to identify bubbles in infrared images of the Galactic midplane. Brut is based on the Random Forest algorithm, and uses bubbles identified by >35,000 citizen scientists from the Milky Way Project to discover the identifying characteristics of bubbles in images from the Spitzer Space Telescope. We demonstrate that Brut's ability to identify bubbles is comparable to that of expert astronomers. We use Brut to re-assess the bubbles in the Milky Way Project catalog, and find that 10%-30% of the objects in this catalog are non-bubble interlopers. Relative to these interlopers, high-reliability bubbles are more confined to the mid-plane, and display a stronger excess of young stellar objects along and within bubble rims. Furthermore, Brut is able to discover bubbles missed by previous searches, particularly bubbles near bright sources which have low contrast relative to their surroundings. Brut demonstrates the synergies that exist between citizen scientists, professional scientists, and machine learning techniques. In cases where "untrained" citizens can identify patterns that machines cannot detect without training, machine learning algorithms like Brut can use the output of citizen science projects as input training sets, offering tremendous opportunities to speed the pace of scientific discovery. A hybrid model of machine learning combined with crowdsourced training data from citizen scientists can not only classify large quantities of data, but also address the weakness of each approach if deployed alone.

2013
Beaumont CN, Offner SSR, Shetty R, Glover SCO, Goodman AA. Quantifying Observational Projection Effects Using Molecular Cloud Simulations. The Astrophysical Journal [Internet]. 2013;777:173.

The physical properties of molecular clouds are often measured using spectral-line observations, which provide the only probes of the clouds' velocity structure. It is hard, though, to assess whether and to what extent intensity features in position-position-velocity (PPV) space correspond to "real" density structures in position-position-position (PPP) space. In this paper, we create synthetic molecular cloud spectral-line maps of simulated molecular clouds, and present a new technique for measuring the reality of individual PPV structures. Using a dendrogram algorithm, we identify hierarchical structures in both PPP and PPV space. Our procedure projects density structures identified in PPP space into corresponding intensity structures in PPV space and then measures the geometric overlap of the projected structures with structures identified from the synthetic observation. The fractional overlap between a PPP and PPV structure quantifies how well the synthetic observation recovers information about the three-dimensional structure. Applying this machinery to a set of synthetic observations of CO isotopes, we measure how well spectral-line measurements recover mass, size, velocity dispersion, and virial parameter for a simulated star-forming region. By disabling various steps of our analysis, we investigate how much opacity, chemistry, and gravity affect measurements of physical properties extracted from PPV cubes. For the simulations used here, which offer a decent, but not perfect, match to the properties of a star-forming region like Perseus, our results suggest that superposition induces a ~40% uncertainty in masses, sizes, and velocity dispersions derived from 13CO (J = 1-0). As would be expected, superposition and confusion are worst in regions where the filling factor of emitting material is large. The virial parameter is most affected by superposition, such that estimates of the virial parameter derived from PPV and PPP information typically disagree by a factor of ~2. This uncertainty makes it particularly difficult to judge whether gravitational or kinetic energy dominates a given region, since the majority of virial parameter measurements fall within a factor of two of the equipartition level α ~ 2.
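
The overlap measurement described in this abstract can be illustrated with a minimal sketch. The abstract does not give the formula, so the definition below (shared voxels divided by the size of the observed PPV structure) is an assumption, and the function name and toy voxel sets are hypothetical rather than taken from the paper.

```python
def fractional_overlap(projected, observed):
    """Fraction of the observed structure's voxels also covered by the projected one.

    Assumed definition for illustration only: |projected & observed| / |observed|.
    """
    projected, observed = set(projected), set(observed)
    if not observed:
        return 0.0
    return len(projected & observed) / len(observed)


# Toy PPV voxels as (x, y, v) index triples; both structures are made up.
projected_ppp = {(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 1, 1)}
observed_ppv = {(0, 1, 0), (1, 1, 0), (1, 1, 1), (2, 1, 1)}

overlap = fractional_overlap(projected_ppp, observed_ppv)
print(overlap)
```

Under this assumed definition, a value near 1 would mean that the observed structure is almost entirely accounted for by the projected density structure, and a value near 0 would mean the two barely coincide.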

Sanders NE, Faesi C, Goodman AA. A New Approach to Developing Interactive Software Modules through Graduate Education. arXiv.org. 2013.
We discuss a set of fifteen new interactive, educational, online software modules developed by Harvard University graduate students to demonstrate various concepts related to astronomy and physics. Their achievement demonstrates that online software tools for education and outreach on specialized topics can be produced while simultaneously fulfilling project-based learning objectives. We describe a set of technologies suitable for module development and present in detail four examples of modules developed by the students. We offer recommendations for incorporating educational software development within a graduate curriculum and conclude by discussing the relevance of this novel approach to new online learning environments like edX.
2012
Goodman AA. Principles of High-Dimensional Data Visualization in Astronomy. Astronomische Nachrichten [Internet]. 2012;333(5-6):505-514.

[...] sets, though, interactive exploratory data visualization can give far more insight than an approach where data processing and statistical analysis are followed, rather than accompanied, by visualization. This paper attempts to chart a course toward “linked view” systems, where multiple views of high-dimensional data sets update live as a researcher selects, highlights, or otherwise manipulates one of several open views. For example, imagine a researcher looking at a 3D volume visualization of simulated or observed data, and simultaneously viewing statistical displays of the data set’s properties (such as an x-y plot of temperature vs. velocity, or a histogram of vorticities). Then, imagine that when the researcher selects an interesting group of points in any one of these displays, the same points become a highlighted subset in all other open displays. Selections can be graphical or algorithmic, and they can be combined and saved. For tabular (ASCII) data, this kind of analysis has long been possible, even though it has been under-used in Astronomy. The bigger issue for Astronomy and several other “high-dimensional” fields is the need for systems that allow full integration of images and data cubes within a linked-view environment. The paper concludes its history and analysis of the present situation with suggestions that look toward cooperatively-developed open-source modular software as a way to create an evolving, flexible, high-dimensional, linked-view visualization environment useful in astrophysical research.
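
The linked-view idea in this abstract can be sketched in a few lines for tabular data. This is only an illustration under stated assumptions: the `LinkedSelection` and `View` classes, the column names, and the toy rows are all hypothetical and do not come from the paper or from any existing tool.

```python
class View:
    """One display of a single column; it only tracks which rows are highlighted."""

    def __init__(self, name, rows, column):
        self.name = name
        self.rows = rows
        self.column = column
        self.highlighted = []

    def refresh(self, selected):
        # Recompute the highlighted subset from the shared selection.
        self.highlighted = [(i, self.rows[i][self.column]) for i in sorted(selected)]


class LinkedSelection:
    """Shared selection of row indices that pushes every change to all attached views."""

    def __init__(self):
        self.selected = set()
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def select(self, row_ids):
        self.selected = set(row_ids)
        for view in self._views:
            view.refresh(self.selected)


# Made-up table: each row is one data point.
rows = [
    {"temperature": 12.0, "velocity": 1.5},
    {"temperature": 25.0, "velocity": -0.3},
    {"temperature": 31.0, "velocity": 2.2},
    {"temperature": 18.0, "velocity": 0.7},
]

selection = LinkedSelection()
temp_view = View("temperature histogram", rows, "temperature")
vel_view = View("velocity histogram", rows, "velocity")
selection.attach(temp_view)
selection.attach(vel_view)

# Two algorithmic selections, combined by intersection.
hot = {i for i, r in enumerate(rows) if r["temperature"] > 20.0}
fast = {i for i, r in enumerate(rows) if r["velocity"] > 1.0}
selection.select(hot & fast)

print(temp_view.highlighted, vel_view.highlighted)
```

Keeping the selection as a set of row indices, separate from any one display, is what lets selections be combined with ordinary set operations and saved for later reuse.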

2011
Beaumont CN, Williams JP, Goodman AA. Classifying Structures in the Interstellar Medium with Support Vector Machines: The G16.05-0.57 Supernova Remnant. The Astrophysical Journal [Internet]. 2011;741:14.

We apply Support Vector Machines (SVMs), a machine learning algorithm, to the task of classifying structures in the interstellar medium (ISM). As a case study, we present a position-position-velocity (PPV) data cube of 12CO J = 3-2 emission toward G16.05-0.57, a supernova remnant that lies behind the M17 molecular cloud. Despite the fact that these two objects partially overlap in PPV space, the two structures can easily be distinguished by eye based on their distinct morphologies. The SVM algorithm is able to infer these morphological distinctions, and associate individual pixels with each object at >90% accuracy. This case study suggests that similar techniques may be applicable to classifying other structures in the ISM, a task that has thus far proven difficult to automate.

Goodman AA. A Guide to Comparisons of Star Formation Simulations with Observations. Computational Star Formation [Internet]. 2011.

We review an approach to observation-theory comparisons we call "Taste-Testing." In this approach, synthetic observations are made of numerical simulations, and then both real and synthetic observations are "tasted" (compared) using a variety of statistical tests. We first lay out arguments for bringing theory to observational space rather than observations to theory space. Next, we explain that generating synthetic observations is only a step along the way to the quantitative, statistical, taste tests that offer the most insight. We offer a set of examples focused on polarimetry, scattering and emission by dust, and spectral-line mapping in star-forming regions. We conclude with a discussion of the connection between statistical tests used to date and the physics we seek to understand. In particular, we suggest that the "lognormal" nature of molecular clouds can be created by the interaction of many random processes, as can the lognormal nature of the IMF, so that the fact that both the "Clump Mass Function" (CMF) and IMF appear lognormal does not necessarily imply a direct relationship between them.
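
The closing point of this abstract, that many interacting random processes can produce a lognormal distribution, can be checked with a small numerical sketch. It assumes the processes act as independent multiplicative factors, so that the logarithm of their product is a sum of many random terms and tends toward a normal distribution by the central limit theorem. The uniform factor range, the sample sizes, and the function names are arbitrary choices for illustration, not values from the paper.

```python
import math
import random
import statistics


def product_of_random_factors(rng, n_factors):
    """Multiply together n_factors independent random factors (assumed uniform on [0.5, 1.5])."""
    value = 1.0
    for _ in range(n_factors):
        value *= rng.uniform(0.5, 1.5)
    return value


def skewness(values):
    """Population skewness: mean cubed deviation in units of the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [product_of_random_factors(rng, 50) for _ in range(5000)]
logs = [math.log(s) for s in samples]

# The raw products have a long right tail; their logarithms are close to symmetric.
print(skewness(samples), skewness(logs))
```

Because any quantity built up this way looks lognormal regardless of the physics behind each factor, two lognormal distributions need not share a common origin, which is the caution the abstract raises about the CMF and the IMF.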

2009
Goodman AA, Wong C. Bringing the Night Sky Closer: Discoveries in the Data Deluge. In: The Fourth Paradigm: Data-Intensive Scientific Discovery. 2009.
Throughout history, astronomers have been accustomed to data falling from the sky. But our relatively newfound ability to store the sky's data in "clouds" offers us fascinating new ways to access, distribute, use, and analyze data, both in research and in education. Here we consider three interrelated questions: (1) What trends have we seen, and will soon see, in the growth of image and data collection from telescopes? (2) How might we address the growing challenge of finding the proverbial needle in the haystack of this data to facilitate scientific discovery? (3) What visualization and analytic opportunities does the future hold?
Goodman AA. Seeing Science. Proceedings of the International Festival of Scientific Visualization [Internet]. 2009.
The ability to represent scientific data and concepts visually is becoming increasingly important due to the unprecedented exponential growth of computational power during the present digital age. The data sets and simulations scientists in all fields can now create are literally thousands of times as large as those created just 20 years ago. Historically successful methods for data visualization can, and should, be applied to today's huge data sets, but new approaches, also enabled by technology, are needed as well. Increasingly, "modular craftsmanship" will be applied, as relevant functionality from the graphically and technically best tools for a job is combined as needed, without low-level programming.