Programming challenges at PDF Butler
Here are the few challenges one can face at PDF Butler, two of them were from the past week.
Challenge 1
You are given rectangles that are axis aligned as a tuple of integers (left, top, right, bottom).
Your objective is to tell if the two rectangles are crossing each other. To cross each other, two rectangles have to cross each other sides. Two coincident sides are not crossing.
Here is the situation with 4 crossings (in yellow).
One idea is to check if segments of red rectangle intercept segment of brown rectangle. Because we are working with a discrete plane and axis aligned rectangles, you could solve this by using intervals.
Given two lines, you can compare relative positions as such that:
Then you swap the pair for the other test. It's quite messy and not intuitive:
(b.left < r.left < b.right || b.left < r.right < b.right) && (r.top > b.top > r.bottom || r.top > b.bottom > r.bottom)
A simple trick is understanding you don't need to even find the intersection. So, there should be an easier way.
You can partition the space of relative position of two rectangles and handle 2 of the three cases. There are indeed two much easier cases to deal with.
One is finding out if one rectangle is enclosed in another, for the purpose of partitioning, we treat coincident sides as contained. This has the undesirable consequence that the rectangle is contained in itself. In the input (Excel file), this situation can happen, but in a template, it doesn't make sense!
You should be able to quickly infer the condition from the drawing.
The other is finding out if the rectangles are disjoint (outside of each other). You should be able to quickly infer the condition from the drawing too.
Crossing being the third partition, you can work by excluding the two first partitions: if two rectangles are neither contained or disjoints, they are overlapping. Now, it's intuitive and there are even less cases to be accounted.
Challenge 2
That one is simpler to explain with a picture.
You are given non crossing rectangles (it's what the crossing detection is for) and you need to convert them as a tree as depicted.
You can imagine the yellow part as the root and the protrusing parts as branches.
Here, the difficulty is that pairs of numbers are not completely ordered. It's even more true for quartet.
For one dimension, you can use the well documented nested sets:
You may think of GIS data-structures, but we needed something fast to build and yielding a tree compatible with our Word model.
Challenge 3
This one is a bit easier, but can be done in an efficient way; That's where the difficulty resides.
You are given a list of list and you need to iterate it as such that you can display its columns as rows.
So:
[
[1,2,3,4],
[5,6],
[7,8,9]
]
Becomes:
1,5,7
2,6,8
3,,9
4,,
Challenge 4
Parsing and updating formula. When cells are moving around, formula have to also be "moved around". All references have to be updated to the location of the new cells. So, not only the formula have to be parsed, but no information can be lost when updating the references.
This involves producing a BNF matching Excel formula in a good enough way, manipulating the AST (again, a tree) and spitting back the formula.
Conclusions
We work with a variety of technology and we are also building novel solutions. Those solution comes with novel problems, which means you can't always find help in other places.
It helps greatly to not stop at your usual management system knowledge, but know about existing solution across different fields. Being able to design your own data-structure and deal with recursion is not a once in a year exercise, but happens everyday.
Those are great exercises to brush up your skills with real life problems.