Natural Language Processing
The course is scheduled for the period May 4 to July 15 (day and time to be announced). Course locations: Zuidas Amsterdam and the Erasmus University Rotterdam campus.
Registration for this course
The registration deadline is April 19, 2020. Go to the registration page. Maximum capacity: 25 students.
Natural Language Processing (NLP) comprises statistical and machine learning tools for automatically analysing text data to derive useful insights from it. Vast amounts of information are stored in this form, and hence NLP has become one of the essential technologies of the big data age. In this course, core concepts and techniques from the area will be studied, with a focus on methods that are popular in business applications. These include n-gram models, word vectors, sentiment analysis and topic modelling.
This course offers students a theoretically informed understanding of NLP. It aims to broaden their knowledge of the methods involved in NLP and to provide hands-on experience with the steps required in an NLP project. We focus on three aspects:
- developing a deeper understanding of the main methods in NLP (n-grams, the lexicon approach, word2vec and other advanced machine learning methods);
- gaining experience in scraping and cleaning data yourself;
- applying this knowledge and experience in a group assignment that gives you the opportunity to show your creativity.
By the end of this course, you will be able to analyse and evaluate NLP approaches. Moreover, you will apply this knowledge and these skills in a real-life setting, enabling you to put theoretical knowledge into practice.
- Information theory, regular expressions and scraping (tokenization, stemming, lemmatization, parsing)
- Word vectors and dimension reduction based on bag of words (n-grams, PCA)
- Sentiment analysis (lexicon-based vs model-based)
- Sentence completion (hidden Markov model, GPT and BERT)
- Topic models (LDA) and word embeddings (GloVe, Word2Vec)
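To give a flavour of the first topics, the sketch below tokenizes a sentence with a regular expression, counts bigrams, and computes a lexicon-based sentiment score. It is a minimal illustration and not course material: the function names, the example review and the four-word lexicon are all hypothetical, and a real project would use an established lexicon and a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens with a regular expression."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return all n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {"good": 1, "great": 1, "bad": -1, "terrible": -1}


def lexicon_sentiment(tokens):
    """Sum the lexicon scores of all tokens; unknown words count as 0."""
    return sum(LEXICON.get(token, 0) for token in tokens)


# Made-up example text, for illustration only.
review = "The service was great, but the food was bad. Really bad."
tokens = tokenize(review)
bigram_counts = Counter(ngrams(tokens, 2))
score = lexicon_sentiment(tokens)

print(tokens)
print(bigram_counts.most_common(3))
print(score)
```

A model-based approach would replace the fixed lexicon with a classifier trained on labelled text, using n-gram counts such as these as input features.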
By the end of the course, students will be able to:
- understand the fundamentals of natural language processing including different ways of representing text data for statistical analysis,
- discuss and apply different sentiment analysis and topic modelling techniques,
- program selected algorithms involved in these methods, and
- be acquainted with NLP research areas.
Coordinator/Lecturer: prof. dr. Bas Donkers (EUR)
Lecturer: dr. Meike Morren (VU)
Course fee for internal research master and PhD students: € 1.000,-
Course fee for external PhD students: € 1.250,-
Recommended knowledge: Programming Basics, Mathematics, Statistics, Econometrics I, Supervised and Unsupervised Machine Learning
Required knowledge: Linear algebra, Regression, Machine Learning (e.g., classification, random forests, support vector machines)
Link to course manual