Publications by Author: Alberto Pepe

Goodman AA, Pepe A, Blocker A, Borgman CL, Cranmer K, Crosas M, Stefano RD, Gil Y, Groth P, Hedstrom M, et al. Ten Simple Rules for the Care and Feeding of Scientific Data. PLoS Computational Biology [Internet]. 2014;10(4):e1003542.
Pepe A, Goodman A, Muench A, Crosas M, Erdmann C. How Do Astronomers Share Data? Reliability and Persistence of Datasets Linked in AAS Publications and a Qualitative Study of Data Practices among US Astronomers. PLoS ONE [Internet]. 2014;9(8):e104798.

We analyze data sharing practices of astronomers over the past fifteen years. An analysis of URL links embedded in papers published by the American Astronomical Society reveals that the total number of links included in the literature rose dramatically from 1997 until 2005, when it leveled off at around 1500 per year. The analysis also shows that the availability of linked material decays with time: in 2011, 44% of links published a decade earlier, in 2001, were broken. A rough analysis of link types reveals that links to data hosted on astronomers' personal websites become unreachable much faster than links to datasets on curated institutional sites. To gauge astronomers' current data sharing practices and preferences further, we performed in-depth interviews with 12 scientists and online surveys with 173 scientists, all at a large astrophysical research institute in the United States: the Harvard-Smithsonian Center for Astrophysics, in Cambridge, MA. Both the in-depth interviews and the online survey indicate that, in principle, there is no philosophical objection to data sharing among astronomers at this institution. Key reasons that more data are not presently shared more efficiently in astronomy include: the difficulty of sharing large data sets; over-reliance on non-robust, non-reproducible mechanisms for sharing data (e.g. emailing it); unfamiliarity with options that make data sharing easier (faster) and/or more robust; and, lastly, a sense that other researchers would not want the data to be shared. We conclude with a short discussion of a new effort to implement an easy-to-use, robust system for data sharing in astronomy, at, and we analyze the uptake of that system to date.

Pepe A, Goodman A, Muench A. The ADS All-Sky Survey, in Astronomical Data Analysis Software and Systems XX. Paris, France; 2012.

The ADS All-Sky Survey (ADSASS) is an ongoing effort aimed at turning the NASA Astrophysics Data System (ADS), widely known for its unrivaled value as a literature resource for astronomers, into a data resource. The ADS is not a data repository per se, but it implicitly contains valuable holdings of astronomical data, in the form of images, tables and object references contained within articles. The objective of the ADSASS effort is to extract these data and make them discoverable and available through existing data viewers. The resulting ADSASS data layer promises to greatly enhance workflows and enable new research by tying astronomical literature and data assets into one resource.

Goodman A, Fay J, Muench A, Pepe A, Udomprasert P, Wong C. WorldWide Telescope in Research and Education. In: Egret D, Gabriel C ADASS XXI. San Francisco: Astronomical Society of the Pacific; 2012. pp. tba.

The WorldWide Telescope computer program, released to researchers and the public as a free resource in 2008 by Microsoft Research, has changed the way the ever-growing Universe of online astronomical data is viewed and understood. The WWT program can be thought of as a scriptable, interactive, richly visual browser of the multi-wavelength Sky as we see it from Earth, and of the Universe as we would travel within it. In its web API format, WWT is being used as a service to display professional research data. In its desktop format, WWT works in concert (thanks to SAMP and other IVOA standards) with more traditional research applications such as ds9, Aladin and TOPCAT. The WWT Ambassadors Program (founded in 2009) recruits and trains astrophysically-literate volunteers (including retirees) who use WWT as a teaching tool in online, classroom, and informal educational settings. Early quantitative studies of WWTA indicate that student experiences with WWT enhance science learning dramatically. Thanks to the wealth of data it can access, and the growing number of services to which it connects, WWT is now a key linking technology in the Seamless Astronomy environment we seek to offer researchers, teachers, and students alike.

Rodriguez MA, Pepe A, Shinavier J. The dilated triple. In: Chbeir B, Hassanien A Emergent Web Intelligence: Advanced Semantic Technologies. Springer; 2010. pp. 3-16.
The basic unit of meaning on the Semantic Web is the RDF statement, or triple, which combines a distinct subject, predicate and object to make a definite assertion about the world. A set of triples constitutes a graph, to which they give a collective meaning. It is upon this simple foundation that the rich, complex knowledge structures of the Semantic Web are built. Yet the very expressiveness of RDF, by inviting comparison with real-world knowledge, highlights a fundamental shortcoming, in that RDF is limited to statements of absolute fact, independent of the context in which a statement is asserted. This is in stark contrast with the thoroughly context-sensitive nature of human thought. The model presented here provides a particularly simple means of contextualizing an RDF triple by associating it with related statements in the same graph. This approach, in combination with a notion of graph similarity, is sufficient to select only those statements from an RDF graph which are subjectively most relevant to the context of the requesting process.
Pepe A, Mayernik MS, Borgman CL, Sompel HVD. From Artifacts to Aggregations: Modeling Scientific Life Cycles on the Semantic Web. Journal of the American Society for Information Science and Technology [Internet]. 2010;61.
In the process of scientific research, many information objects are generated, all of which may remain valuable indefinitely. However, artifacts such as instrument data and associated calibration information may have little value in isolation; their meaning is derived from their relationships to each other. Individual artifacts are best represented as components of a life cycle that is specific to a scientific research domain or project. Current cataloging practices do not describe objects at a sufficient level of granularity nor do they offer the globally persistent identifiers necessary to discover and manage scholarly products with World Wide Web standards. The Open Archives Initiative's Object Reuse and Exchange data model (OAI-ORE) meets these requirements. We demonstrate a conceptual implementation of OAI-ORE to represent the scientific life cycles of embedded networked sensor applications in seismology and environmental sciences. By establishing relationships between publications, data, and contextual research information, we illustrate how to obtain a richer and more realistic view of scientific practices. That view can facilitate new forms of scientific research and learning. Our analysis is framed by studies of scientific practices in a large, multi-disciplinary, multi-university science and engineering research center, the Center for Embedded Networked Sensing (CENS).