A Scalable Approach for Managing Unstructured Information
- Speaker: Kim Keeton (HP Palo Alto)
- Date & Time: Friday 09 March 2012, 11:00 - 12:00
- Venue: SS03, Computer Lab, William Gates Building
Abstract
Digital data is being generated in mind-boggling amounts: 15 petabytes—more than 8X the information contained in all US libraries—is created daily. The data landscape is shifting—in addition to structured data in databases, organizations are increasingly dealing with unstructured data such as email, documents, spreadsheets, blogs, Web pages and media files. Unstructured information comprises 80% of most organizations’ information today, and it is growing at an annual rate of 60%. Users are demanding increasing sophistication in the level of information processing that storage and information management systems provide. In addition to the traditional challenges of storing the bytes and searching and classifying the content, they need to leverage their information to provide relevant and timely insights that improve the outcomes of the tasks that they undertake.
In this talk, I will describe recent work at HP Labs on unstructured information management, including SCAN-lite, an extensible framework for gathering structured metadata from unstructured documents, and LazyBase, a scalable database system for ingesting, storing and querying the resulting metadata. Leveraging the high degree of replication present in the enterprise, SCAN-lite uses a two-phase scanning policy (e.g., an initial phase to identify duplicate content and a second phase to do more complicated analysis) that considers client priority classes and idle time to minimize the impact on client foreground workloads. LazyBase is a scalable NoSQL database system that provides extremely high ingest rates, a strong consistency model (as contrasted with eventual consistency), and an explicit per-query tradeoff between freshness and query speed.
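As a rough illustration of the two-phase idea described above, the following Python sketch groups documents by a content hash in a cheap first pass and then runs a costlier analysis once per unique content in a second pass. This is not SCAN-lite's implementation: the function names, the use of SHA-256, and the `word_count` analysis are all assumptions made for the example, and the client priority classes and idle-time scheduling mentioned in the abstract are left out.

```python
import hashlib
from collections import defaultdict


def phase_one_group_duplicates(documents):
    """Phase 1 (cheap): group document ids by a hash of their content."""
    groups = defaultdict(list)
    for doc_id, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        groups[digest].append(doc_id)
    return groups


def phase_two_analyze(documents, groups, analyze):
    """Phase 2 (expensive): analyze one representative per duplicate group
    and share the result with every document in that group."""
    metadata = {}
    for doc_ids in groups.values():
        result = analyze(documents[doc_ids[0]])
        for doc_id in doc_ids:
            metadata[doc_id] = result
    return metadata


def word_count(content):
    """Hypothetical stand-in for a costly content analysis."""
    return {"words": len(content.split())}


# Hypothetical sample data: two copies of one file plus one distinct file.
docs = {
    "alice/report.txt": b"quarterly results are up",
    "bob/report-copy.txt": b"quarterly results are up",
    "carol/notes.txt": b"meeting at noon",
}
groups = phase_one_group_duplicates(docs)
metadata = phase_two_analyze(docs, groups, word_count)
```

In this sketch the expensive step runs twice for three documents, which is where a high degree of replication would pay off.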
Bio: Dr. Kimberly Keeton is a Principal Researcher in the Storage and Information Management Platform group at HP Labs in Palo Alto, CA, USA. Her research focuses on simplifying the management of enterprise information systems, including system design and implementation, modeling, and optimization techniques that automatically design systems to meet users’ goals (e.g., dependability or information quality).
Series
This talk is part of the Computer Laboratory Systems Research Group Seminar series.