Fountain of codes
I recently stumbled upon a very interesting, possibly the most interesting algorithm in the field of coding theory – Fountain codes.
As the name suggests, it does not involve making a fountain. But technically it does. I mean, not a real fountain but a programmatic fountain. So cool!
The problem statement that let to this idea was,
Is it possible to encode a file into N chunks such that no matter what subset of the chunks you chose, you can reconstruct the whole file.
*wait for it* and yeah that’s how some nasty scientists came up with fountain codes. What a name! What a sense of humor!
I won’t go deep into the details of its algorithm, but for reference you can check out the simplest fountain code – Luby transform code
- In a quick rap, it splits the file into N chunks.
- Picks up D chunks at random using a good distribution function.
- XOR’s the shit out of the D chunks and creates a packet.
- Hooks up the indices of the chunk to the packet that were XOR’ed and
- dispatch!
- This keeps happening as long as the receiver doesn’t raise a message saying “Bingo, got the file”.
On the receiver end, there are two queues. I call it the decoded chunk queue and In progress message queue. Decoded chunk queue has all the chunks that are completely decoded. So if the length of the decoded chunk queue is N, that’s your termination condition! Send “Bingo, got the file” to the sender.
- As long as the length of the decoded chunk is not N,
- Keep reading a new packet.
- If the packet is made of only one chunk, put it in the decoded chunk queue
- If it is made out of more than one chunk, then see if any existing decoded blocks have been used to create this packet. If yes, then XOR it with the packet. Remove the index from the chunk index list that was hooked up while encoding. If the list length is one, you have decoded another chunk. Place it in the decoded chunk queue, otherwise put it in the In progress message queue.
- Whenever a new chunk is found, iterate through the In progress message queue and XOR the new chunk with pending messages wherever possible. If any chunk is successfully decoded in this step, put it in the decoded chunk queue too.
*heavy breathing, pause please*
One might argue, hey is this even complete? Will it ever end? In a perfectly ideal world, yes. But our world is so not ideal. So we need to make use of a good random function for choosing the D random chunks. One that was suggested was the ideal soliton distribtion.
I wondered what applications it might have. Here are few,
- File transfer using super fast unreliable UDP. Since receiver do not have to acknowledge each packet, we can use the speed of UDP for file transfers. In fact this algorithm was made in order to send data through connections that are very lossy.
- This one is a copy. BitTorrents can heavily use this algorithm to reconstruct sparsely seeded chunk of a file.
Computer science is awesome!