Building and using the Finnish Internet Parsebank
- Speaker: Filip Ginter (University of Turku)
- Date & Time: Thursday 20 October 2016, 11:00 - 12:00
- Venue: GR05, English Faculty, 9 West Road (Sidgwick Site)
Abstract
The Finnish Internet Parsebank is a corpus of 270M Finnish sentences of Internet crawl data, syntactically analysed in the Universal Dependencies representation. I will present the parsebank, some of the lessons learned when crawling and analysing the data, the tools and derived resources we developed, and some of the uses the parsebank has seen. In particular, I will focus on the syntax query tools which can efficiently handle a corpus of over 4 billion tokens of syntactically analysed data. I will also mention some future directions aiming at a similar parsebank for the majority of the languages in Universal Dependencies.
Series
This talk is part of the Language Technology Lab Seminars series.