At Predict Effect we are interested in exploring novel algorithmic solutions for audience acquisition and monetization. One of the fundamental problems we are trying to solve is building a robust collective intelligence graph that captures all user activity (topics, interests, sentiment, etc.). An approach we have applied with success is embedding internet activities from different sources in a latent representation space. Imagine a map where you can insert a pin for every document, website, user, internet activity, etc., such that two pins are closer in the map if they are related in some way. For example, pins corresponding to Hillary Clinton and Ted Cruz will be close to each other since both of them are running in the next presidential election. Similarly, both of these pins will be close to pins corresponding to news articles covering the election.
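To make the "map" intuition concrete, here is a minimal sketch in Python. The entities and their tiny 3-dimensional vectors are made up purely for illustration; real embeddings are learned and much higher-dimensional. Closeness is measured with cosine similarity, one common choice.

```python
import math

# Toy embeddings: each entity is a "pin" in the map.
# These vectors are illustrative, not real model output.
embeddings = {
    "hillary_clinton":  [0.9, 0.8, 0.1],
    "ted_cruz":         [0.8, 0.9, 0.2],
    "election_article": [0.85, 0.85, 0.15],
    "cat_video":        [0.1, 0.0, 0.9],
}

def cosine_similarity(a, b):
    """Similarity of two pins: near 1.0 means closely related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Two candidates in the same election sit close together in the map...
print(cosine_similarity(embeddings["hillary_clinton"], embeddings["ted_cruz"]))
# ...while an unrelated item sits far away.
print(cosine_similarity(embeddings["hillary_clinton"], embeddings["cat_video"]))
```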
Having created this map using our embedding algorithm, we can find similar documents, users, actions, etc. for a given tweet, Facebook post, Amazon review, user, and so on. This allows us to place relevant ads, suggest new items to users, create personalized content, and much more. For example, we use this embedding algorithm to generate Facebook interests for targeting. These interests carry contextual content that reaches a new, relevant audience similar to our current users.
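A similarity query over such a map can be sketched as a nearest-neighbor lookup. The item names and 2-dimensional vectors below are hypothetical, and the brute-force search is just for clarity; at production scale an approximate nearest-neighbor index would be used instead.

```python
import math

# Hypothetical pins in the map (illustrative vectors).
pins = {
    "tweet_fed_rates":     [0.9, 0.1],
    "article_fed_meeting": [0.85, 0.2],
    "review_coffee_maker": [0.1, 0.9],
    "post_interest_rates": [0.8, 0.15],
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, k=2):
    """Return the k items closest to `query` in the map."""
    others = [(name, euclidean(pins[query], vec))
              for name, vec in pins.items() if name != query]
    return [name for name, _ in sorted(others, key=lambda t: t[1])[:k]]

# Items about the same subject come back first; the coffee-maker review does not.
print(nearest("tweet_fed_rates"))
```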
Naturally, doing all of this at scale requires processing a large volume of data, and we have a smart and experienced team of engineers and scientists to do so. We handle around 300-400 million documents (and other internet phenomena) every day. To handle this volume in a practical and timely manner, we tailor our machine learning algorithms accordingly.
Our team works in the areas of topic modeling, semantics, compositional models, and embedding and metric learning algorithms. The following are some of the problems our team works on:
Topic Modeling: Representing documents in our map by topic is an effective representation choice. This means that the pins for two news articles will be close if the articles are about the same topic, e.g., the Fed and interest rates.
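As a deliberately tiny illustration of representing documents by topic, the sketch below scores documents against hand-picked keyword sets, giving each document a topic vector that could serve as its pin. The topics and keywords are invented for this example; real topic models learn topics from the data rather than using fixed lists.

```python
# Hand-picked topic keyword sets (illustrative only).
TOPICS = {
    "monetary_policy": {"fed", "interest", "rates", "inflation"},
    "sports":          {"game", "score", "team", "season"},
}

def topic_vector(text):
    """Count keyword overlaps per topic: a crude document-topic vector."""
    words = set(text.lower().split())
    return {topic: len(words & kws) for topic, kws in TOPICS.items()}

doc_a = "The Fed signals higher interest rates amid inflation"
doc_b = "Fed holds interest rates steady"
doc_c = "The team clinched the season with a final game score"

# Both Fed articles score high on the same topic, so their pins land together.
print(topic_vector(doc_a))
print(topic_vector(doc_b))
print(topic_vector(doc_c))
```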
Semantics: We perform semantic parsing and interpretation of documents to understand what a document is trying to say. For example, the tweet "I can't stand Bieber" contains the keyword "Bieber", but we don't want to place an ad for Justin Bieber's music on this tweet.
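The Bieber example shows why keyword matching alone is not enough. Here is a hypothetical, oversimplified negation check that blocks ad placement when a keyword appears in a negative context; the cue list and the rule itself are illustrative, and real semantic parsing is far richer than this.

```python
# Illustrative negative-context cues (not a production lexicon).
NEGATION_CUES = {"can't stand", "hate", "dislike", "worst"}

def safe_to_advertise(text, keyword):
    """Only advertise when the keyword appears outside a negative context."""
    text = text.lower()
    if keyword.lower() not in text:
        return False  # no keyword match at all
    return not any(cue in text for cue in NEGATION_CUES)

# Keyword matches, but the sentiment is negative: no ad.
print(safe_to_advertise("I can't stand Bieber", "Bieber"))
# Keyword matches in a positive context: ad is fine.
print(safe_to_advertise("Loved the new Bieber album", "Bieber"))
```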
Compositional Models: A general internet activity can be thought of as a composition of other sub-activities. For example, a Twitter or Facebook user can be thought of as a composition of the documents (posts, tweets) that the person authored, the users they follow, the comments they make, and so on. We use compositional models to embed any arbitrary internet activity.
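One minimal way to sketch composition is to embed a user as the average of the embeddings of their sub-activities. Averaging is just one simple compositional choice among many, and the activity names and vectors below are made up for illustration.

```python
def average_embedding(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

# A user's sub-activities, each with its own (illustrative) embedding.
user_activities = {
    "tweet_about_rates": [0.9, 0.1, 0.0],
    "follows_fed_news":  [0.8, 0.2, 0.1],
    "comment_on_budget": [0.7, 0.0, 0.2],
}

# The user's pin lands in the middle of their activities' pins.
user_embedding = average_embedding(list(user_activities.values()))
print(user_embedding)
```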
Metric Learning: We want to improve our future performance based on past results. We use feedback such as clicks, views, and other actions to re-evaluate our metric.
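A hedged sketch of what feedback-driven metric learning can look like: keep per-dimension weights in a weighted distance, and nudge them so that clicked (relevant) pairs score closer while ignored pairs score farther. The update rule and learning rate here are illustrative toys, not the production algorithm.

```python
def weighted_sq_distance(a, b, w):
    """Squared distance where each dimension has its own weight."""
    return sum(wi * (x - y) ** 2 for wi, x, y in zip(w, a, b))

def update_weights(w, a, b, clicked, lr=0.1):
    """Shrink weights on dimensions where a clicked pair differs,
    grow them where an ignored pair differs (clamped at zero)."""
    sign = -1.0 if clicked else 1.0
    return [max(0.0, wi + sign * lr * (x - y) ** 2)
            for wi, x, y in zip(w, a, b)]

weights = [1.0, 1.0]
ad, user = [0.9, 0.1], [0.8, 0.7]

before = weighted_sq_distance(ad, user, weights)
weights = update_weights(weights, ad, user, clicked=True)
after = weighted_sq_distance(ad, user, weights)

# After observing a click, the metric scores this ad/user pair as closer.
print(before, after)
```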
We also plan to expand into sentiment modeling and other user behavior modeling algorithms to continue improving performance.