BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:The Law of Large Populations: The return of the long-ignored N and
  how it can affect our 2020 vision - Xiao-Li Meng\, Whipple V. N. Jones Pr
 ofessor of Statistics\, Harvard University
DTSTART:20180312T170000Z
DTEND:20180312T180000Z
UID:TALK101758@talks.cam.ac.uk
CONTACT:Emily Brown
DESCRIPTION:For over a century now\, we statisticians have successfully
  convinced ourselves\, and almost everyone else\, that in statistical
  inference the size of the population\, N\, can be ignored\, especially
  when it is large. Instead\, we have focused on the size of the sample\,
  n\, the key driving force behind both the Law of Large Numbers and the
  Central Limit Theorem. We were thus taught that the statistical error
  (standard error) goes down with n\, typically at the rate of 1/√n.
  However\, all of this relies on the presumption that our data are of
  perfect quality\, in the sense of being equivalent to a probabilistic
  sample. A largely overlooked statistical identity\, a potential
  counterpart to the Euler identity in mathematics\, reveals a Law of
  Large Populations (LLP)\, a law that we should all be afraid of. That
  is\, once we lose control over data quality\, the systematic error
  (bias) in the usual estimators\, relative to the benchmark standard
  error from simple random sampling\, goes up with N at the rate of √N.
  The coefficient in front of √N can be viewed as a data defect index\,
  which is the simple Pearson correlation between the reporting/recording
  indicator and the value reported/recorded. Because of the multiplier
  √N\, a seemingly tiny correlation\, say 0.005\, can have a detrimental
  effect on the quality of inference. Without an understanding of this
  LLP\, “big data” can do more harm than good because of the drastically
  inflated precision assessment and hence gross overconfidence\, setting
  us up to be caught by surprise when reality unfolds\, as we all
  experienced during the 2016 US presidential election. Data from the
  Cooperative Congressional Election Study (CCES\, conducted by Stephen
  Ansolabehere\, Douglas Rivers and others\, and analyzed by Shiro
  Kuriwaki) are used to estimate the data defect index for the 2016 US
  election\, with the aim of gaining a clearer vision for the 2020 US
  election and beyond.
LOCATION:LT4\, Simon Sainsburys Building\, Cambridge Judge Business School
END:VEVENT
END:VCALENDAR
