For my DWD final, I made a web-based book called “The Complete Truth of Life.”
The complete truth does not exist. However, people have never stopped discussing it and looking for answers. The book gathers and curates the internet's collective wit, words that will eventually outlive us. It is a book that can only exist in the digital realm.
I used Cheerio.js to scrape data from Quora, Reddit, and Yahoo Answers.
Some websites, such as Quora, only load more content as you scroll down the page, which makes web scraping difficult. Cheerio parses static HTML and does not run scripts or scroll, so it only sees what was in the initial page. For example, a question might have 700 answers, but I could only scrape the first 7 because the rest of the HTML doesn't load until you scroll down. I got around this problem by saving the HTML to a local file and scraping the data from that file.
Save data to MongoDB
Serve the website
The website was served with Express and EJS templates.
The front end was written in HTML, CSS, and jQuery, with turn.js for the page-flip effect.
Search keywords
I created a search function using MongoDB query operators and regular expressions.
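One way to combine query operators with regular expressions is to turn each keyword into a case-insensitive `$regex` and join them with `$and` and `$or`. This is a minimal sketch, not the project's actual query: the field names `question` and `answer` and the function names are hypothetical. Keywords are escaped first so that a search like “why?” matches the literal text instead of being read as a pattern.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a MongoDB filter: every keyword must appear (case-insensitively)
// in at least one of the given fields. An empty search returns {},
// which matches every document.
function buildSearchQuery(input, fields = ["question", "answer"]) {
  const keywords = input.trim().split(/\s+/).filter(Boolean);
  if (keywords.length === 0) return {};
  return {
    $and: keywords.map((word) => ({
      $or: fields.map((field) => ({
        [field]: { $regex: escapeRegex(word), $options: "i" },
      })),
    })),
  };
}
```

With the official Node.js driver, the result could be passed straight to `collection.find(buildSearchQuery(userInput))`.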
- I wanted to make this live: when people search for a keyword, they would get “life advice” scraped in real time from the answer forums.
- As I worked on this project, I found the analogy between a physical book and an internet forum very interesting. Writers write books that continue to influence people after they die, and in the age of the internet, we all leave words and traces online that will eventually outlive us. I would love to make a series of books on different topics from internet data.