Research Master Defense

Predicting Personality Scores from Parliamentary Speeches


  • Series
    Research Master Defense
  • Speakers
    Paul Stroet
  • Field
    Data Science
  • Location
    Online
  • Date and time
    August 30, 2021
    09:00 - 10:00

This paper presents a novel method for predicting the personality scores of political elites. The method circumvents survey-based measurement and instead relies on text-based measurement of personality traits. The novelty lies in machine-encoding features from texts by means of Latent Dirichlet Allocation (LDA), rather than relying on manually encoded features such as LIWC, MRC and prosodic features. The feature-engineered variables extracted by LDA provide linguistic cues of personality traits and capture the different political portfolios of the MPs, thereby accounting for variation in text across political domains. Next, the paper overcomes the challenges of current predictive modeling approaches by developing a predictive model that automatically detects interaction effects and offers more flexibility to accommodate the complex structure of text data. Support vector machines, currently the most successful approach to predicting personality scores, serve as the benchmark for assessing predictive models that are believed to have superior statistical properties. Indeed, random forests and neural networks fed with the machine-encoded input data predict personality scores more accurately than support vector machines fed with LIWC features. The data comprise personality scores of Belgian MPs, and the text data are web-scraped from parliamentary speeches.
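The abstract does not say how the LDA features were computed. The sketch below shows one standard way to turn speeches into per-document topic proportions, which is the kind of machine-encoded input a random forest or neural network could then take. It uses collapsed Gibbs sampling, one of several common ways to fit LDA, and may not be what the author used. The function name `lda_doc_topic_features`, the hyperparameters (`alpha`, `beta`, `n_iter`, `seed`) and the toy speeches are all hypothetical.

```python
import random


def lda_doc_topic_features(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Hypothetical helper: fit LDA by collapsed Gibbs sampling and return
    per-document topic proportions (theta) to use as model features.

    docs: list of token lists, e.g. the pooled speeches of one MP per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    n_dk = [[0] * n_topics for _ in docs]       # tokens per (document, topic)
    n_kw = [[0] * V for _ in range(n_topics)]   # tokens per (topic, word)
    n_k = [0] * n_topics                        # tokens per topic
    z = []                                      # topic assignment of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Posterior-mean estimate of each document's topic proportions.
    return [
        [(n_dk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]


# Toy, made-up "speeches" for illustration only.
speeches = [
    "budget tax deficit budget spending tax".split(),
    "hospital care patients health care".split(),
    "tax budget spending deficit".split(),
    "health patients hospital care nurses".split(),
]
X = lda_doc_topic_features(speeches, n_topics=2, n_iter=100)
for row in X:
    print([round(p, 2) for p in row])
```

Each row of `X` sums to one and could serve as a feature vector for a regressor predicting a personality score. The pure-Python sampler is only meant to make the technique visible. For real corpora, an optimized library would be the usual choice; scikit-learn's `LatentDirichletAllocation`, for example, fits LDA by variational Bayes rather than Gibbs sampling.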

Keywords: Automated Content Analysis; Topic Modeling; Web-Scraping