Computing With Histograms

Computing With Histograms

Histograms and bar graphs are familiar concepts to anyone working with data. They are used to convey information about the frequency or other measure for individually separate and distinct categories of data. Word clouds serve a similar purpose in the particular context of words.

Even though not entirely correct, we will use the term histogram to refer to an assignment of values to a finite number of bins or buckets. The assignments can be absolute, for example the quantities of ingredients in a recipe, or relative, for example the percentages used in resource allocations, or poll numbers. For the more mathematically inclined readers, polynomials can also be viewed as histograms, as they are assignments of coefficients to the powers of the variable(s). Fourier series decompose complex waves into simpler ones, and they too are histograms assigning coefficients to the cosine and sine of the multiples of the fundamental frequency. Vectors can also be regarded as histograms where the coordinates are the values.

In fact, many systems we interact with, and activities we perform all the time, like accounting systems, portfolio balancing, risk assessment, are related to histograms. Even the state of a quantum system is a histogram, which is one of the ideas contained in my recent presentation on simulating quantum computing. The real interesting part is dealing with changing histograms that evolve in time. There are a few simple rules that define histogram evolution, which we will explore next.

Resource Allocation

It may seem straightforward to update a histogram that represents the state of a system by just replacing the current histogram with a new one. That is easy to perform as a computation, but in reality things are not that simple. Take the example of balancing an investment portfolio. In this case the histogram represents relative allocations specified as percentages, so they have to add up to one.

After specifying new allocations, the system has to sell and buy individual investment instruments. Assuming that one of the instruments in a cache account, one could sell instruments that need to have less allocations, put the money in the cache account and then buy the instruments that need to have more allocations in the target state. The general idea is that a reallocation step is needed for each instrument. Even though it is not efficient, we allow each of the instruments to be reallocated across all available instruments. Each such reallocation is a histogram itself, with the values adding up to one. It is also important to note that the reallocations are relative, they are independent of the market value in the reallocated instrument. Then a general histogram transformation looks like this:

The operator >>= denotes the application of a transformation specified in the middle by a reallocation of each bucket. If you think this is similar to linear algebra, you are right. Each transformation step is a column in a matrix that yields the new state when multiplied with the current state.

Accounting Systems

The need to allow any kind of reallocation of one bucket across all available buckets is even more clear if we look at the example of an accounting system. In this case the buckets are the accounts in the system, and the assigned values are the balances in the accounts. It is convenient for accounting systems to enforce the invariant that the sum of all account balances is zero by adding synthetic accounts if necessary. The state of an accounting system can then be represented by a histogram like the one pictured here, where segments on the right side are representing positive values, and the ones on the left side are representing negative values. An account owner can make payments to other accounts as a transformation step corresponding to that account, and an entry with a negative value for that account has to be included in the transformation step. This way the invariant of having a zero sum is satisfied for each step.

Quantum State

The state of a quantum system is a vector that can also be viewed as a histogram. In a quantum system composed of quantum bits, the buckets are the binary strings representing the possible outputs of the system after a measurement is done, and the values are the complex numbers, called amplitudes, associated to each of the possible outputs. Since complex numbers are the same as arrows in a plane, we can represent them as slanted segments as in the picture here. As in the previous examples, quantum state has its own invariant: the squared lengths of the amplitudes add up to one.

Transforming quantum state is similar to portfolio balancing, except complex numbers are used instead of real ones. But the code implementation is the same, and reduces to linear algebra. A quantum bit can be thought of as a coin that has an arrow or complex number associated to each face, called amplitude. The probability of each side turning when the coin is tossed is the square of the amplitude's length.

When multiple quantum bits are used amplitudes have to be assigned to all possible combinations of coin sides.


General Case

Going back to the general case of transforming histograms, we already stated that we need to specify a reallocation of each bucket value across all available buckets. Then the same framework can be applied to all the cases we mentioned above, which combines direct transfers of value from one bucket to another. The way these direct transfers are created and how they are combined is specific to each use case. The code below provides details about the common abstraction and the specific rules of each case.

Probabilistic State

Probabilistic state evolving according to Bayes' Rule fits the framework described here. The histogram specifies the probabilities associated to every hypothesis (possible outcome). But the transformation steps are not probabilities, they are likelihoods that don't add up to one, so a normalization procedure is needed in this case. A transformation is equivalent to a diagonal matrix. Below is the example of the evolution of the probabilities of five dice shaped like the platonic solids being used in a sequence of throws. Every time a new data point is available, the probabilistic state is updated.


Code

The superclass representing the common abstraction looks like this in Scala:

The subclass for resource allocation (e.g. portfolio balancing):

The one for accounting systems:

The one for probabilistic state:

And finally the one for quantum state:


To view or add a comment, sign in

Others also viewed

Explore content categories