
women@CL talklet -- NetOS, Security, and NLP group


If you have a question about this talk, please contact Helen Yannakoudakis.

Speaker: Desislava Hristova (NetOS)

Title: Measuring Urban Social Diversity Using Interconnected Geo-Social Networks

Abstract: Large metropolitan cities bring together diverse individuals, creating opportunities for cultural and intellectual exchange that can ultimately lead to social and economic enrichment. I will present a novel network perspective on the interconnected nature of people and places, which allows us to capture the social diversity of urban locations through the social network and mobility patterns of their visitors. Using a dataset of approximately 37K users and 42K venues in London, we build a network of Foursquare places and link it, through check-ins, to the parallel Twitter social network of their visitors. I will describe four metrics of the social diversity of places, which relate to their social brokerage role, their entropy, the homogeneity of their visitors, and the number of serendipitous encounters they are able to induce. These metrics allow us to distinguish places that bring together strangers from those that tend to bring together friends, as well as places that attract diverse individuals from those that attract regulars. We correlate these properties with wellbeing indicators for London neighbourhoods and discover signals of gentrification in deprived areas with high entropy and brokerage, where an influx of more affluent and diverse visitors points to an overall improvement in the area's rank on the UK Index of Multiple Deprivation over the five-year census period.
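To make the entropy metric concrete, here is a minimal sketch. It assumes (our assumption, not necessarily the speaker's exact definition) that a place's entropy is the Shannon entropy of the distribution of its check-ins over distinct visitors: a venue visited evenly by many different people scores high, while one dominated by a few regulars scores low. The function name `place_entropy` and the check-in lists are hypothetical.

```python
import math
from collections import Counter


def place_entropy(visitor_ids):
    """Shannon entropy (in bits) of a venue's check-ins over distinct visitors.

    Assumption: each element of visitor_ids records one check-in by that user.
    """
    counts = Counter(visitor_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no check-ins: treat as zero diversity
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical venues: four one-off visitors vs. one regular plus one other visitor.
cafe = ["u1", "u2", "u3", "u4"]
pub = ["u1", "u1", "u1", "u2"]
print(place_entropy(cafe))           # 2.0
print(round(place_entropy(pub), 3))  # 0.811
```

Under this definition, higher entropy signals a venue whose visits are spread across many different people rather than concentrated among regulars.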

Speaker: Sheharbano Khattak (Security)

Title: Do You See What I See? Differential Treatment of Anonymous Users

Abstract: The utility of anonymous communication is undermined by a growing number of websites that treat users of such services in a degraded fashion. This second-class treatment of anonymous users ranges from outright rejection to limiting their access to a subset of the service’s functionality or imposing hurdles such as CAPTCHA-solving. To date, observations of such practices have relied on anecdotal reports catalogued by frustrated anonymity users. We present a study that methodically enumerates and characterizes, in the context of Tor, the treatment of anonymous users as second-class Web citizens.

We focus on first-line blocking: at the transport layer, through reset or dropped connections, and at the application layer, through explicit blocks served from website home pages. Our study draws on several data sources: comparisons of Internet-wide port scans from Tor exit nodes versus control hosts; scans of the home pages of the Alexa top-1,000 websites through every Tor exit; and analysis of nearly a year of historical HTTP crawls from the Tor network and from control hosts. We develop a methodology to distinguish censorship events from incidental failures, such as those caused by packet loss or network outages, and account for the endemic churn in web-accessible services both over time and across geographic locations. We find clear evidence of Tor blocking on the Web, including at 3.5% of the Alexa top-1,000 sites. Some blocks specifically target Tor, while others result from fate-sharing, when abuse-based automated blockers are triggered by misbehaving web sessions that share the same exit node.
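The exit-versus-control comparison can be illustrated with a minimal sketch. It uses a simple rule of our own, not necessarily the study's methodology: a site is flagged as blocking Tor only if several repeated fetches through Tor exits all fail while at least one fetch from a control host succeeds, so one-off failures from packet loss, and sites that are down for everyone, are not counted as blocks. The `Probe` record, `classify_site`, and the `min_attempts` threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """One fetch attempt of a site's home page (hypothetical record format)."""
    via_tor: bool  # True if fetched through a Tor exit, False if from a control host
    ok: bool       # True if the page loaded without a reset, drop, or block page


def classify_site(probes, min_attempts=3):
    """Label a site 'blocked', 'accessible', or 'inconclusive' (illustrative rule)."""
    tor = [p.ok for p in probes if p.via_tor]
    control = [p.ok for p in probes if not p.via_tor]
    if not tor or not control or not any(control):
        # No working baseline: the site may simply be unreachable for everyone.
        return "inconclusive"
    if len(tor) >= min_attempts and not any(tor):
        # Consistent failure via Tor while the control host succeeds.
        return "blocked"
    if any(tor):
        return "accessible"
    # Too few failing Tor attempts to rule out transient loss.
    return "inconclusive"


probes = [Probe(via_tor=True, ok=False) for _ in range(3)] + [Probe(via_tor=False, ok=True)]
print(classify_site(probes))  # blocked
```

Requiring repeated failures and a successful control fetch trades some missed blocks for fewer false positives caused by incidental network failures.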

Speaker: Menglin Xia (NLP)

Title: Text Readability Assessment for Second Language Learners

Abstract: Developing reading ability is an essential part of language acquisition. However, finding suitable reading materials for training language learners at a specific level of proficiency is a demanding and time-consuming task for English instructors as well as for the readers themselves. To automate the selection of reading material and the assessment of reading ability for non-native learners, one can develop a system that focuses on text readability analysis for second language (L2) learners.

One of the major challenges in assessing the readability of texts aimed at L2 learners is the lack of sufficiently large level-annotated data. For the present work, we collected a dataset of CEFR-graded texts tailored to learners of English as an L2 and investigated text readability assessment for both native and L2 learners. We applied a generalization method to adapt models trained on larger native corpora to estimate text readability for learners, and we explored domain adaptation and self-learning techniques that exploit the native data to improve system performance on the limited L2 data.
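Self-learning (self-training) is a general semi-supervised technique: train on the small labelled set, pseudo-label the unlabelled texts the model is most confident about, add them to the training set, and retrain. The sketch below shows only that loop. It assumes a toy nearest-centroid classifier over a single hypothetical difficulty feature per text, which stands in for whatever model the work actually used; the feature values, levels, and `min_margin` threshold are illustrative, and `fit_centroids`, `predict`, and `self_train` are hypothetical names.

```python
from statistics import mean


def fit_centroids(examples):
    """Map each level to the mean feature value of its (feature, level) examples."""
    by_level = {}
    for x, level in examples:
        by_level.setdefault(level, []).append(x)
    return {level: mean(xs) for level, xs in by_level.items()}


def predict(centroids, x):
    """Return (nearest level, margin between the nearest and second-nearest distances)."""
    dists = sorted((abs(x - c), level) for level, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labelled, unlabelled, min_margin=1.0, rounds=5):
    """Repeatedly add confidently pseudo-labelled items, then refit the centroids."""
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(rounds):
        centroids = fit_centroids(labelled)
        confident, remaining = [], []
        for x in pool:
            level, margin = predict(centroids, x)
            (confident if margin >= min_margin else remaining).append((x, level))
        if not confident:
            break  # nothing left that the model is confident about
        labelled.extend(confident)
        pool = [x for x, _ in remaining]
    return fit_centroids(labelled)


# Hypothetical data: a few level-annotated texts plus unlabelled texts.
labelled = [(2.0, "A2"), (3.0, "A2"), (8.0, "B2"), (9.0, "B2")]
unlabelled = [1.0, 5.5, 10.0]
print(self_train(labelled, unlabelled))  # {'A2': 2.0, 'B2': 9.0}
```

The ambiguous item (5.5, equidistant from both centroids) is never pseudo-labelled, which is the point of the confidence margin: only clear-cut predictions are fed back into training.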

This talk is part of the Women@CL Events series.



© 2006-2019, University of Cambridge.