Project: Libera
Project: Libera is an intelligent user-content connector that I built as my passion project during my three months at Metis’ Data Science Bootcamp in Chicago. Powered by a custom-designed web crawler and the magic of machine learning, Project: Libera is a tool for automatically seeking out new web content related to data science.
Why Libera?
As an aspiring data scientist, I’m constantly trying to find new articles, blog posts, and other information to give me a leg up on the competition, but the payoff between time spent looking for articles and quality articles actually read is sporadic. Some days you hit gold. Some days you spend 4 hours looking at videos of munchkin cats doing cute things, forgetting why you’re on the internet in the first place.
Libera was designed to eliminate the risk of distraction and to expand my blog collection beyond my own social network and the list of sites I already visit. Because isn’t that the real beauty of machine learning? Making a computer do something for you, faster than you, and potentially even better than you. But you can’t train a machine without data. So I went to my old data science haunts, collected a good number of blog posts, and then let a blind web crawler follow all of their direct links. I hand-labelled the resulting pages with a simple Flask app I made for quick data editing, and before I knew it I had a web crawler that used a Naive Bayes classification model to determine whether a blog was related to data science.
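To make the classification step concrete, here is a minimal sketch of a multinomial Naive Bayes text classifier with add-one smoothing, written in plain Python. It is not the code behind Libera: the class name, the label names, and the four training titles are all made up for illustration, and a real model would be trained on the full hand-labelled set of crawled pages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        # Words never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, then add each word's log likelihood.
            score = math.log(doc_count / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical hand-labelled examples standing in for crawled pages.
training_pages = [
    ("Tuning a random forest model with cross validation in Python", "data_science"),
    ("Cleaning messy data with pandas before training a regression model", "data_science"),
    ("Ten munchkin cat videos that will make you smile", "other"),
    ("Our favorite weeknight pasta recipes for busy families", "other"),
]
texts, labels = zip(*training_pages)
clf = NaiveBayesTextClassifier().fit(texts, labels)

prediction = clf.predict("How to evaluate a classification model with cross validation")
```

In a crawler, a check like `clf.predict(page_text) == "data_science"` could decide whether a newly discovered page is kept or discarded.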
Taking it Further
While having a web crawler that could get me new content started to address the time suck that is web browsing, I quickly realized it wasn’t nearly enough. What if I’m on a data viz kick? Or maybe I’m trying to buckle down on big data. Perhaps I just need some statistics. Just having a collection of blogs about data science didn’t work if I was looking for something more specific. To address this, I took my blog text corpus and used natural language processing, dimensionality reduction, and unsupervised learning techniques to automatically identify subtopics. With automated collection and subtopic identification in place, I set out to make a Flask-based front-end web app to tie it all together.
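One common way to chain those three steps is TF-IDF features, followed by truncated SVD, followed by k-means clustering. The sketch below uses scikit-learn and is only an assumption about what such a pipeline could look like; the post does not say which specific techniques Libera used, and the four example posts and the choice of two clusters are invented for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical blog snippets: two about data viz, two about big data.
posts = [
    "Choosing colors and axis labels for a bar chart plot",
    "A scatter plot chart needs clear axis labels and colors",
    "Running Spark jobs on a distributed Hadoop cluster",
    "Partitioning data across a distributed Hadoop cluster with Spark",
]

subtopic_pipeline = make_pipeline(
    # NLP step: turn raw text into weighted word counts.
    TfidfVectorizer(stop_words="english"),
    # Dimensionality reduction: compress the sparse matrix to 2 components.
    TruncatedSVD(n_components=2, random_state=0),
    # Unsupervised learning: group the posts into 2 clusters.
    KMeans(n_clusters=2, n_init=10, random_state=0),
)

# One cluster id per post; the ids themselves are arbitrary.
subtopic_ids = subtopic_pipeline.fit_predict(posts)
```

The cluster ids carry no meaning on their own, so a human still has to look at the posts in each cluster and give it a name such as "data viz" or "big data".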
A Look at the Front End
With a large collection of posts and segmentation into subtopics complete, I turned to making Libera more user friendly. Searching for web pages in a MongoDB database isn’t exactly the best UX for my purposes. So I turned to Flask and made a landing page, user sign-up form, and recommendation feed. When a user signs up for recommendations, they are given an interest form to select exactly what they’re looking for.
Once they’ve selected their interests, they’re brought to a feed of recommendations. These recommendations are presented using Embedly cards to give snapshots of the content, and the icons on the left allow for user interaction.
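For a rough idea of how a feed like this can be wired up in Flask, here is a minimal sketch that filters posts by the interests a user picked. Everything in it is hypothetical: the `/feed` route, the `interest` query parameter, and the in-memory `POSTS` list (a stand-in for the MongoDB collection) are assumptions, and it returns plain JSON rather than rendering Embedly cards.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical stand-in for the MongoDB collection of classified posts.
POSTS = [
    {"title": "Picking a color palette", "url": "https://example.com/palette", "topic": "data-viz"},
    {"title": "Getting started with Spark", "url": "https://example.com/spark", "topic": "big-data"},
    {"title": "Confidence intervals explained", "url": "https://example.com/ci", "topic": "statistics"},
]


@app.route("/feed")
def feed():
    # Interests arrive as repeated query parameters, e.g.
    # /feed?interest=data-viz&interest=statistics
    interests = set(request.args.getlist("interest"))
    matches = [post for post in POSTS if post["topic"] in interests]
    return jsonify(matches)
```

Saved as a module and served with the `flask run` command, a request to `/feed?interest=data-viz` would return only the data viz post.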