A logical approach to data provenance.
- Speaker: James Cheney, Informatics, University of Edinburgh
- Date & Time: Friday 17 November 2006, 14:00 - 15:00
- Venue: FW11
Abstract
While scientific computation has historically been synonymous with large-scale numerical computation and supercomputing, scientists are now using increasingly sophisticated computational techniques such as databases and decentralized “Grid” computation. However, scientific data is expected to meet rigorous standards of data integrity, and this is difficult to achieve using current tools. One important ingredient of scientific data integrity is that data should be accompanied by documentation of the process by which it was recorded: for example, creation/modification timestamps, descriptions of any operations performed, and identities of authors and intermediate sources. This information is often called *provenance* or *lineage*.
Although many provenance-tracking systems and provenance data models have been proposed, most of them are based on ad hoc definitions of provenance, some of which depend on the syntax of the program (rather than its semantics). Thus, the behavior of such systems varies widely, and we lack a uniform framework to compare the correctness and expressiveness of various approaches.
We will describe a new approach which is based on the idea that provenance should reflect the ways the output of a function depends on its input; in particular, it should reflect counterfactual information (i.e., tell us something about what would have happened if the input were changed). We formalize this approach by defining a logic whose models are functions and whose formulas are assertions about the dependence behavior of the function. This provides a general approach to defining and reasoning about the correctness of provenance-tracking techniques.
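To make the counterfactual idea concrete, here is a minimal brute-force sketch. It is not the logic described in the talk. It assumes functions that map fixed-length tuples to fixed-length tuples over a small finite domain, and the helper names `depends` and `provenance` are hypothetical. An output position counts as depending on an input position only if changing that input alone can change that output.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Set, Tuple

Vec = Tuple[int, ...]


def depends(f: Callable[[Vec], Vec], i: int, j: int, n: int,
            domain: Sequence[int]) -> bool:
    """Hypothetical helper: True if output position j of f counterfactually
    depends on input position i, i.e. some input x and some alternative
    value v for x[i] yield a different f(x)[j]."""
    for x in product(domain, repeat=n):
        base = f(x)[j]
        for v in domain:
            if v == x[i]:
                continue
            # Change only position i and compare the j-th output.
            x_changed = x[:i] + (v,) + x[i + 1:]
            if f(x_changed)[j] != base:
                return True
    return False


def provenance(f: Callable[[Vec], Vec], n: int, m: int,
               domain: Sequence[int]) -> Dict[int, Set[int]]:
    """Hypothetical helper: map each of the m output positions to the set of
    the n input positions it counterfactually depends on (exhaustive check)."""
    return {j: {i for i in range(n) if depends(f, i, j, n, domain)}
            for j in range(m)}
```

For example, `provenance(lambda x: (x[0] * 0 + x[1],), 2, 1, [0, 1, 2])` reports that the single output depends only on input 1, even though `x[0]` appears in the expression. This mirrors the point that provenance should follow a function's semantics rather than its syntax. Exhaustive enumeration is exponential in the input length, so this is only an illustration of the definition, not a practical tracking technique.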
This talk describes joint work with Peter Buneman (U. Edinburgh), Stijn Vansummeren (U. Hasselt, Belgium), and Adriane Chapman (U. Michigan).
Series
This talk is part of the Logic and Semantics Seminar (Computer Laboratory) series.
Included in Lists
- All Talks (aka the CURE list)
- bld31
- Cambridge talks
- Computing and Mathematics
- Department of Computer Science and Technology talks and seminars
- FW11
- Interested Talks
- Logic and Semantics Seminar (Computer Laboratory)
- Martin's interesting talks
- School of Technology
- tcw57's list
- Trust & Technology Initiative - interesting events
- yk373's list
- yk449