Day 3 of 100 Days of Code

It's day 3 of 100 Days of Code, and I thought I'd try something fun today to break up the sheer amount of JavaScript I've been writing. I was also encouraged by the progress of my good friend Justin Hennessey, who has had his own very productive 100 Days of Code.

Machine learning has always been something I've wanted to do more of. In previous jobs I've come across the theory and used Amazon SageMaker, but apart from running pre-built tutorials I haven't really ventured out to build something myself.

I've been fascinated by AI not just extracting information from content (image recognition, speech recognition, and so on) but also creating content. The idea of a model that can generate content is not just interesting but a little scary. Humans have so far had a monopoly on creativity; how far can AI now go in creating content indistinguishable from human work?

So I needed somewhere I could get plenty of data. Social media immediately came to mind: sites like Twitter and Facebook have content being generated literally by the second.

Facebook provides the ability to download your personal data, which gave me an idea: could I train a model to generate content that felt like me? I could then outsource my Facebook Messenger chats to a bot, freeing up time to finish Star Wars Jedi: Fallen Order.
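
As a rough sketch of the kind of preprocessing this needs: the Facebook JSON export stores each conversation under messages/inbox/<thread>/message_1.json with sender_name and content fields (at least in my understanding of the export format), and flattening those into a single plain-text file is enough to feed a language model. The paths below are placeholders:

```python
import json
from pathlib import Path

# Hypothetical layout of the Facebook JSON export on disk
EXPORT_DIR = Path("facebook-export/messages/inbox")
OUT_FILE = Path("messages.txt")

lines = []
for thread_file in sorted(EXPORT_DIR.glob("*/message_*.json")):
    thread = json.loads(thread_file.read_text(encoding="utf-8"))
    # Messages appear newest-first in the export, so reverse for chronological order
    for msg in reversed(thread.get("messages", [])):
        text = msg.get("content")
        if text:  # skip photos, stickers, reactions, etc.
            lines.append(f"{msg['sender_name']}: {text}")

OUT_FILE.write_text("\n".join(lines), encoding="utf-8")
print(f"Wrote {len(lines)} messages to {OUT_FILE}")
```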

My search for an algorithm led me to GPT-2, a model trained simply to predict the next word in 40GB of Internet text. OpenAI also stated that "due to our concerns about malicious applications of the technology, we are not releasing the trained model", which meant that I would need to train my own.

Thankfully, I came across someone else who had already attempted this, which saved me a lot of trial and error and let me focus on learning what the Python code in the Jupyter notebook was actually doing. Note: there are some minor mistakes in the Python code in the notebook from the DevopsStar site; I've uploaded the amended file here.
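
I won't reproduce the notebook here, but to give a feel for the shape of the training step, here is a minimal sketch using the gpt-2-simple library, which fine-tunes one of the released GPT-2 checkpoints on a plain text file. This is one way to do it and not necessarily what the DevopsStar notebook does; messages.txt and the seed line are placeholders:

```python
import gpt_2_simple as gpt2

# Fetch the smallest released GPT-2 checkpoint (124M parameters)
gpt2.download_gpt2(model_name="124M")

sess = gpt2.start_tf_sess()

# Fine-tune on the flattened Messenger export
gpt2.finetune(sess,
              dataset="messages.txt",
              model_name="124M",
              steps=1000,        # more steps = less gibberish, more overfitting risk
              sample_every=200,  # print a sample conversation as it trains
              save_every=500)

# Generate a fake conversation, seeded with a hypothetical opening line
print(gpt2.generate(sess,
                    prefix="Elgin: hey, what do you want for dinner?",
                    length=200,
                    temperature=0.8,
                    return_as_list=True)[0])
```

In practice this is fine-tuning: OpenAI did release the smaller GPT-2 checkpoints, so you adapt one of them to your own text rather than training from scratch.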

Problems encountered:

  • My dataset was larger than 5GB, which meant I ran out of space on the default notebook instance configuration. Thankfully, I was able to stop the notebook instance, increase the storage and restart it (sketched after this list).
  • Downloading the 5.5GB dataset from Facebook and then re-uploading it to S3 took quite a while, even with Singapore's internet speeds; I sped it up by spinning up an EC2 instance and doing the transfer from there (also sketched below).
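
For anyone hitting the same storage wall, the stop → resize → restart sequence can also be scripted with boto3 rather than clicked through in the console. The instance name and volume size below are placeholders:

```python
import boto3

sagemaker = boto3.client("sagemaker")
NAME = "gpt2-notebook"  # placeholder notebook instance name

# Stop the instance and wait until it has fully stopped
sagemaker.stop_notebook_instance(NotebookInstanceName=NAME)
sagemaker.get_waiter("notebook_instance_stopped").wait(NotebookInstanceName=NAME)

# Grow the attached EBS volume (size in GB)
sagemaker.update_notebook_instance(NotebookInstanceName=NAME, VolumeSizeInGB=20)

# Start it back up and wait until Jupyter is reachable again
sagemaker.start_notebook_instance(NotebookInstanceName=NAME)
sagemaker.get_waiter("notebook_instance_in_service").wait(NotebookInstanceName=NAME)
```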
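
The EC2-to-S3 copy itself is just a large upload from an instance in the same region, e.g. with boto3 (bucket and key names are placeholders):

```python
import boto3

s3 = boto3.client("s3")
# boto3 switches to multipart upload automatically for files this size
s3.upload_file("facebook-export.zip", "my-training-data-bucket", "raw/facebook-export.zip")
```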

The first run of fake conversations created by the model made my wife and me laugh so much we cried. It was generating pure gibberish, but through the nonsense we could recognise our colloquialisms. I hope the quality gets better from here! The model is still training; after a few more runs the conversations should get a lot more "real". Here is a fake conversation between my wife and me, generated by the model:

A fake conversation between my wife and me, generated by the model

Tomorrow, I will be creating a front-end using React to call my nonsense conversation API.
