I’m not going to copy the Wikipedia article on AI here. I roughly know what it is and I assume you do too.
If you’re into philosophising about AI, that’s cool but this is a practical guide (mainly written for myself but you’re welcome to tag along).
I’m probably going to do some coding, and I’ll use some mathematical formulas along the way. I’ll explain my thought process every step of the way.
Cool. My aim is to be able to gather a LOT of knowledge about this topic in the timespan of one year.
I won’t just rehash theory here. I will try to find interesting datasets and solve real-life problems.
Which language should I use?
There are so many: C#, C++, Python, Matlab, Java, Lisp, Erlang,…
Most people on Quora seem to agree it heavily depends on the goal of the project, but generally they seem to prefer Python.
Another person checked which languages were used by the top contestants in the Google AI Challenge. The winners seem to be Java, C++ and Python, in that order.
Python has a lot of cool libraries like NumPy, SciPy and PyBrain. Matlab is proprietary and expensive. C++ is very fast but low-level, so slow to write. Java is very general-purpose but has lots of boilerplate and is a bit clumsy in some respects, such as passing closures. Erlang seems to be good for parallel processes, not so much for computationally expensive tasks.
From everything I’ve read so far, it seems the logical choice would be either Python or C++.
According to this ranking, Python is the second most popular language right now.
For now, I’ll stick with Python and check out some libraries. I might switch to C++ later on if needed. I’ve used Matlab a fair bit in my studies so might occasionally use that as well.
What should I read?
I need some basics first. Let’s see. This list of deep learning topics is quite intimidating.
Maybe I should set a goal for myself, otherwise I might get demotivated.
It seems deep learning is a new buzzword, on Google Trends it’s growing very quickly.
According to this Nvidia article, the field of AI has been progressing ridiculously fast since 2015.
> until recently neural networks were all but shunned by the AI research community. They had been around since the earliest days of AI, and had produced very little in the way of “intelligence.” The problem was even the most basic neural networks were very computationally intensive, it just wasn’t a practical approach. Still, a small heretical research group led by Geoffrey Hinton at the University of Toronto kept at it, finally parallelizing the algorithms for supercomputers to run and proving the concept, but it wasn’t until GPUs were deployed in the effort that the promise was realized.
It seems deep neural networks just use more layers and more processing power and slightly different algorithms:
> as the network is getting tuned or “trained” it’s coming up with wrong answers — a lot. What it needs is training. It needs to see hundreds of thousands, even millions of images, until the weightings of the neuron inputs are tuned so precisely that it gets the answer right practically every time — fog or no fog, sun or rain. It’s at that point that the neural network has taught itself what a stop sign looks like; or your mother’s face in the case of Facebook; or a cat, which is what Andrew Ng did in 2012 at Google.

> Ng’s breakthrough was to take these neural networks, and essentially make them huge, increase the layers and the neurons, and then run massive amounts of data through the system to train it. In Ng’s case it was images from 10 million YouTube videos. Ng put the “deep” in deep learning, which describes all the layers in these neural networks.
OK. I’ll shift my focus to deep learning, more specifically deep neural networks. It seems deep learning is mostly about neural networks anyway, on the Wikipedia page there’s only one non-neural network algorithm mentioned: multilayer kernel machines.
Here’s a chart of deep learning software. Almost all of these are in Python or C++, which confirms our earlier suspicions.
This is a cool page with only the best of the best deep learning papers. That’s assuming more citations means better.
Still quite intimidating. I’ll start by reading the Wiki page on Deep Learning.
To keep it simple for now: a feature or attribute simply means a specific input to the algorithm. For example, if you’re processing an image – let’s say to classify it as a banana or a non-banana – the features would be all the pixels of the image. If you have a 100×100 pixel image, you’d have an input of 10,000 features (also called 10,000-dimensional) – actually 30,000, because you have 3 color values per pixel, but let’s ignore that for now.
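To make that concrete, here’s a minimal sketch using NumPy. The images here are just random arrays standing in for a real photo:

```python
import numpy as np

# A 100x100 grayscale image: each pixel is one feature (one input dimension).
gray = np.random.rand(100, 100)
features = gray.flatten()
print(features.shape)  # (10000,) -> 10,000 features

# The same image with 3 color channels (RGB) has 30,000 features.
rgb = np.random.rand(100, 100, 3)
print(rgb.flatten().shape)  # (30000,)
```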
From the page on feature learning:
Feature learning is motivated by the fact that machine learning tasks such as classification often require input that is mathematically and computationally convenient to process. However, real-world data such as images, video, and sensor measurement is usually complex, redundant, and highly variable.
In our 10,000-pixel picture of a banana, there are going to be a lot of redundant pixels, which just means you don’t need all of them. In other words, the input space is not well suited to the task of recognising a banana.
- You don’t need all the pixels, you can probably remove a lot of them
- Pixels lying next to each other are probably related (if a pixel is yellow, there’s a high chance its neighbours will be yellow as well, except on the edges of the banana)
- You probably don’t need the noise (high frequency information) to recognise the banana (for example the brown patches on the banana or small background features), but unfortunately all this info is included in the 10,000-pixel representation
- So you can probably transform the pixels to a completely different coordinate system that’s more suitable for the job. For example, you could transform the image to the frequency spectrum using a Fourier transform and then remove the 50% highest-frequency dimensions. That would leave you with only 5,000 remaining attributes instead of 10,000.
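The idea in those bullets can be sketched with NumPy’s FFT. This is only a toy version of the approach: the image is random data standing in for a real banana photo, and I keep a 50×50 low-frequency block (how much you keep is a tuning choice, not a fixed rule):

```python
import numpy as np

img = np.random.rand(100, 100)

# 2D Fourier transform; fftshift moves the low frequencies to the center.
spectrum = np.fft.fftshift(np.fft.fft2(img))

# Keep only a central block of low frequencies, zeroing out the rest.
mask = np.zeros_like(spectrum)
c, k = 50, 25  # center of the shifted spectrum, half-width of the kept block
mask[c - k:c + k, c - k:c + k] = 1
compressed = spectrum * mask  # only 50x50 = 2,500 coefficients remain non-zero

# Transform back: a smoothed image reconstructed from far fewer numbers.
smoothed = np.fft.ifft2(np.fft.ifftshift(compressed)).real
print(img.shape, smoothed.shape)  # (100, 100) (100, 100)
```

The reconstructed image has the same size as the original, but all its information now lives in a much smaller set of low-frequency coefficients – the noise and fine detail are gone.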
Intermezzo: What’s the Frequency Domain?
This is important to grasp and I will probably use it a lot in the future, so I need to explain it.
Everything in our universe can be represented in 2 ways. The normal, intuitive way is spatial. This is how we perceive everyday life. This can be 1D (sound waves), 2D (images), 3D (real life) or higher dimensional.
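A quick 1D illustration, again assuming NumPy: a pure 50 Hz tone looks like a wiggly line in the time (spatial) domain, but in the frequency domain it collapses to a single spike at 50 Hz:

```python
import numpy as np

sample_rate = 1000                     # samples per second
t = np.arange(0, 1, 1 / sample_rate)   # one second of time
signal = np.sin(2 * np.pi * 50 * t)    # a 50 Hz tone in the time domain

# Switch to the frequency domain with a real-input FFT.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), 1 / sample_rate)

# All the energy sits in a single frequency bin: 50 Hz.
print(freqs[np.argmax(spectrum)])  # 50.0
```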