At Blendle, we have a large and constant stream of all sorts of data flowing in: new articles, new recommendations from our editors, new users, traces of users interacting with the platform. And that’s not even all of it. We have more than 1 million users, over 4 million news items and close to a billion events.
Work at Blendle is fast-paced. Just getting coffee and playing Candy Crush isn’t part of the deal: when you’re interning with us, we want you to work with our real data and experiment on our actual users. You’re part of our team. To make that work, we need you to work independently and ask for help when you need it. To secure a happily-ever-after match for all of us, we have an interview process for each of our internships.
Below you will find some project ideas; other projects are of course possible as well. We are mainly interested in bachelor or master level graduation projects, with a duration of >6 weeks to a couple of months. If you are interested, please send an email to firstname.lastname@example.org.
Computer Vision: ‘Share from the newspaper’
Taking a picture of a physical article in the newspaper and getting the Blendle link to it (which can then be shared). That’s the goal. This project involves computer vision techniques, possibly some OCR, information retrieval and maybe even a bit of app building with our iOS and Android teams.
Computer Vision: Find Replacement Images
Due to copyright restrictions, we cannot always show the original image from an article on Blendle. Using computer vision, we can automatically find public domain images that closely resemble the original picture. Currently, our editorial team selects new public domain pictures for selected articles. We’d love you to help us create an automated process to make this (pre)selection. The editors’ existing selections provide great training material for an automated approach.
Computer Vision: Obituaries
Obituaries are among the most viewed items in historical newspapers; a very large portion of views in the newspaper archives of the Dutch Royal Library are for these items. Within Blendle, obituaries are currently not viewable beyond the full newspaper overviews we provide in our kiosk. The reason is that obituaries are provided to us as (non-vector) images. This project looks into OCR-ing and indexing obituaries as well as birth notices and other relatively structured and often occurring non-textual items in our newspapers. Some experience with computer vision and processing large amounts of data is required.
Computer Vision: Sudoku
Our content contains many sudokus and other games. Currently, these can’t be interacted with as they are merely images. It would be awesome to recognize and interpret these games, starting with sudokus, so that we can create interactive versions out of them. This project involves computer vision techniques, data modeling and maybe even some user interaction design.
Machine Learning: Editorial Pick Prediction
Predict which articles will be picked by our editorial team. Perhaps also predict the number of reads an article will have. A classical machine learning project with a lot of real and noisy data, and with large potential impact.
Recommendations: Blinder™, a Tinder-like app for reading articles
Provide Blendle users with a very direct feedback mechanism: swipe right for articles you like, left for articles you hate. Using a bandit algorithm, a recommendation model can be trained real time so that the reward for the user’s effort is instant. This project requires some knowledge of reinforcement learning techniques.
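To give a feel for the bandit idea, here is a minimal sketch in Python using Thompson sampling with a Beta prior per arm. It is only an illustration under assumptions: the arms are hypothetical article categories, the class and function names (`SwipeBandit`, `simulate`) are made up for this example, and the simulated user with fixed like-rates stands in for real swipes. Which bandit algorithm and which arms fit Blendle best is part of the project.

```python
import random


class SwipeBandit:
    """Beta-Bernoulli Thompson sampling over hypothetical arms (e.g. categories)."""

    def __init__(self, arms, seed=None):
        self.rng = random.Random(seed)
        # Per arm: [likes + 1, dislikes + 1], i.e. a uniform Beta(1, 1) prior.
        self.counts = {arm: [1, 1] for arm in arms}

    def choose(self):
        # Draw a plausible like-rate for every arm and show the best draw.
        samples = {
            arm: self.rng.betavariate(likes, dislikes)
            for arm, (likes, dislikes) in self.counts.items()
        }
        return max(samples, key=samples.get)

    def update(self, arm, liked):
        # Swipe right bumps the first counter, swipe left the second.
        self.counts[arm][0 if liked else 1] += 1

    def like_rate(self, arm):
        # Posterior mean of the like-rate for this arm.
        likes, dislikes = self.counts[arm]
        return likes / (likes + dislikes)


def simulate(true_rates, swipes=2000, seed=0):
    """Run the bandit against a simulated user with fixed (assumed) like-rates."""
    bandit = SwipeBandit(true_rates, seed=seed)
    user = random.Random(seed + 1)
    shown = {arm: 0 for arm in true_rates}
    for _ in range(swipes):
        arm = bandit.choose()
        shown[arm] += 1
        bandit.update(arm, liked=user.random() < true_rates[arm])
    return bandit, shown


if __name__ == "__main__":
    bandit, shown = simulate({"politics": 0.8, "sports": 0.2, "tech": 0.5})
    print(shown)
```

Because the model is updated after every single swipe, the arms a user likes are shown more often almost immediately, which is the instant reward described above.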
Recommendations: Provide explanations of why we recommend particular items to particular users
When properly explained, a user might accept a recommendation more easily and provide more detailed feedback to tweak the recommendation algorithm. In this project, you will work closely with our recommendation team to understand how recommendations are made and translate this into an explanation we can show to our users. Such an explanation could be: ‘because you often read from X’ or ‘because you’ve read 63% of the author’s previous articles’.
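As a rough illustration of how such explanations could be derived, the Python sketch below turns a reading history into one of the two example sentences. Everything in it is an assumption made for the example: the `explain` function, the dictionary fields (`author`, `source`, `read`) and the `min_share` threshold are hypothetical and do not describe how our recommender actually works.

```python
from collections import Counter


def explain(recommended, history, min_share=0.5):
    """Return a short explanation string, or None if no rule applies.

    recommended: dict with hypothetical keys 'author' and 'source'.
    history: list of dicts with 'author', 'source' and a boolean 'read'.
    """
    # Rule 1: the user has read a large share of this author's earlier articles.
    by_author = [h for h in history if h["author"] == recommended["author"]]
    if by_author:
        share = sum(h["read"] for h in by_author) / len(by_author)
        if share >= min_share:
            return f"because you've read {share:.0%} of the author's previous articles"

    # Rule 2: the article comes from the source the user reads most.
    reads_per_source = Counter(h["source"] for h in history if h["read"])
    if reads_per_source:
        top_source, _ = reads_per_source.most_common(1)[0]
        if top_source == recommended["source"]:
            return f"because you often read from {top_source}"

    return None


if __name__ == "__main__":
    history = [
        {"author": "A", "source": "X", "read": True},
        {"author": "A", "source": "X", "read": True},
        {"author": "A", "source": "X", "read": False},
        {"author": "B", "source": "Y", "read": True},
    ]
    print(explain({"author": "A", "source": "X"}, history))
```

A real explanation would have to reflect the signals the recommendation model actually used, which is exactly what this project sets out to investigate.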
Recommendations/NLP: Get to know our users through Twitter
When our users sign up, we know nothing about them. But they can link Facebook and Twitter. Your task would be to build a profile based on what users say on Twitter by scraping their profiles and analyzing their tweets: what they tweet, which links they share and who they follow.
NLP: ‘Here’s the Blendle link’ Twitter bot
Actively crawl Twitter for articles that we offer in Blendle. We can map links from publishers to the Blendle articles or analyze the content of a tweet for a reference. We could even use the techniques from the ‘Share from the newspaper’ project to find pictures of articles.
NLP: Reactions on Twitter
On the Blendle platform, users do not have an option to post comments on our articles. But users do respond on other platforms, for instance on Twitter, and not just to the links we post, but also to links shared by others, such as the publishers. Taking our own shares on Twitter as a starting point, but potentially continuing with the output of the ‘Here’s the Blendle link’ project, this project would gather responses on Twitter. These responses should then be aggregated into our item profiles.
NLP: ‘Author pages’
Everything by and about an author on a single page: this could be their home on Blendle and the first stop for fans. This project is about understanding what information we have about authors and how authors would use such a page.
NLP: AI Editors
Our poor editors wake up at 5am and have about 2 hours to select the best articles of the day to recommend in our newsletter. In this project, you get to work with our editorial team to understand how they curate Blendle content and develop ways to make their lives easier. A first exploratory analysis of building classifiers for their activities shows that we can support, but certainly not replace, them. An automated approach could be used for preselection of articles for the editors or as complementary automated recommendations.
NLP: Clustering articles
We receive articles from hundreds of sources, and often they describe the same real-life events, but from different perspectives. In this project, you would develop an approach to automatically cluster these articles. These clusters can then be used by editors to select the best article on a topic and by our users to find alternative perspectives on a topic. Ambitious students could work on automatically extracting different viewpoints and clustering articles across languages.
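One simple baseline for this kind of clustering is sketched below in Python: each article becomes a bag-of-words vector, and an article joins the most similar existing cluster if the cosine similarity with that cluster's summed word counts reaches a threshold. This is a sketch under assumptions, not a proposed solution: the function names and the threshold of 0.3 are arbitrary choices for the example, and a serious approach would likely use TF-IDF weighting or learned embeddings and handle multiple languages.

```python
import math
import re
from collections import Counter


def vectorize(text):
    # Lowercased bag of words; a real system would at least drop stop words.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering; returns lists of indices into texts."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for i, text in enumerate(texts):
        vec = vectorize(text)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # Nothing is similar enough: start a new cluster.
            clusters.append({"centroid": Counter(vec), "members": [i]})
        else:
            # Add the article and fold its words into the cluster's counts.
            best["centroid"].update(vec)
            best["members"].append(i)
    return [c["members"] for c in clusters]


if __name__ == "__main__":
    headlines = [
        "Parliament votes on the new climate law",
        "New climate law passes vote in parliament",
        "Ajax wins the cup final after penalties",
        "Penalties decide cup final as Ajax wins",
    ]
    print(cluster(headlines))
```

The result of a single pass depends on the order in which articles arrive, which suits a continuous stream but is one of the trade-offs a project like this would need to evaluate.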
In this project your focus is on the robust enrichment of articles with Wikidata entities. Several sub-problems can be investigated, e.g. detection of new (potentially popular) entities, inferring an entity hierarchy, semantic expansion of entities, etc.