Algorithms and Data Structures - Vectors
This article is a brief overview across these two domains, which has a huge importance in software development, where even so, tons of developers doesn't embrace their knowledge, losing the opportunity to really make difference in the art of problem solving with high performance.
WARNING: This is a SIMPLE ARTICLE about Algorithms and Data Structures. If these are subjects which you master, DO NOT GO AHEAD, because here I only specify basic concepts to help people that are entering into the domain, to become more aware about what these things are, and how they work together.
First of all: A brief overview about the concept of algorithm
Well, the time has arrived for me, and this subject / knowledge area simply showed me that without it, I wouldn't be able to do too much if I am an ambitious software developer wanting to create something that really impact the human being, some society or anything like that.
Let's talk about those crazy stuff, generally related to those stereotyped fat, bald, weird and genius nerds (which is pure stereotype since that it has lots of good looking people which also knows these concepts [LOL]).
The algorithms.
So, what the hell is an algorithm?
"An algorithm is a set of procedures, executed step-by-step, with the goal of solving a well-defined problem"
Normally, this term is always related with computers. You can even think that it has people where when read this word, automatically starts to imagine the traditional binary matrix image from "The Matrix" movie, but guess what: an algorithm doesn't necessarily must be related with computers, or mathematics only.
I could create an algorithm which instructs a person on how to open a door, or on how to open a bottle of a delicious Belgian beer (I simply love the Belgian beer recipes, where this love is so bigger that I used this example on the detailed example below), and so on. So in fact, what it matters to think about algorithms is that it is a concept related with the sequence of steps of something in some environment to solve a problem, where in the software engineering area, of course, it is completely related with computer programming (man, I love those crazy english words soup).
Of course that, nowadays, keep the idea of a computer program being just a simple set of instructions that were implemented sequentially is a complete mistake, since that even the way that we write those instructions was improved, with the goal to increase the quality and the time needed to build a solution / program / application / whatever.
So lets build a simple example of an algorithm which solves the following beautiful problem:
we're seated at the living room and we need to drink a bottle of beer, to become happier than we normally are, considering that it is being kept inside of the kitchen's refrigerator
(I really think that the world would be a better place to live if most of the problems could be like this one)
So let's take a look at the algorithm below:
// The algorithm starts here
Our brain feels the following need 'Yeah! I want a beer!'
We get up from the living room couch
We walk into the kitchen's direction and enter into it
We look for the refrigerator
We find the refrigerator and walk into its direction
We stop in front of the refrigerator and open it
We look for the beer bottle
After we've found the beer bottle and do a smile, we get it
We close the refrigerator
We look for the bottle's opener gadget
After we've found the opened gadget we open the bottle of beer
We drink the bottle and fly to the moon
// The algorithm ends here
So as you can see at the list of the phrases above, you find a bunch of phrases, which is a set of instructions telling "how do we get a beer from the kitchen's refrigerator, from the living room's couch".
A bizarre example, I know, but this is an algorithm. We described the set of instructions which solves the well-defined problem above.
Now, considering its structure, in other words, considering that it is only a list of instructions, giving kind of "orders" on how to solve the problem step-by-step, we say that this form of instructions follows a paradigm. Its paradigm is the "Imperative Paradigm".
Okay, now let's go to another example, but with a truly real problem to be solved in software engineering and also in real-life: data sorting.
In our example, it will be used a very simple data set as an example, because the goal here is only to give an idea of the algorithm concept in practice.
Let's imagine that we have a list of values in pairs, containing an index number and a name in each row:
Right, now let's imagine that we need to sort that list in an ascendent order by their first letter. How would we do that?
So, let's describe this procedure step-by-step.
1st) We get the first element "1. Jason" as reference. It contains the letter J as the first letter.
2nd) We get the second element "2. Steven" and compare with the first element "1. Jason" as reference. It contains the letter S as the first letter. Then we compare: the J letter comes first than S in the alphabet? The answer is yes. So we leave in the way that it is.
3rd) We get the third element "3. Dennis" and compare with the first element "1. Jason" as reference. It contains the letter D as the first letter. Then we compare: the J letter comes first than D in the alphabet? The answer is no. So we change the value positions: Dennis goes to 1 and Jason goes to 3, where the list becomes:
4th) We get the fourth element "4. Rubens" and compare with the first element "1. Dennis" as reference. It contains the letter R as the first letter. Then we compare: the D letter comes first than R in the alphabet? The answer is yes. So we leave in the way that it is.
5th) We get the fifth element "5. Oziris" and compare with the first element "1. Dennis" as reference. It contains the letter O as the first letter. Then we compare: the D letter comes first than O in the alphabet? The answer is yes. So we leave in the way that it is.
6th) We get the sixth element "6. Antonietus" and compare with the first element "1. Dennis" as reference. It contains the letter A as the first letter. Then we compare: the D letter comes first than A in the alphabet? The answer is no. So we change the value positions: Antonietus goes to 1 and Dennis goes to 6, where the list becomes:
7th) The list has ended. Now, you noticed that I underlined the "first element" and "as reference" words in the previous steps right? So, I did that to highlight the fact that we always keep a reference to compare, a position reference. So, now that the list has ended, we move our reference to the second position, where everything that is before this position reference, can be considered as ordered, and then we get the third element "3. Jason" and compare with the second element "2. Steven" as reference. It contains the letter J as the first letter. Then we compare: the S letter comes first than J in the alphabet? The answer is no. So we change the value positions: Jason goes to 2 and Steven goes to 3, where the list becomes:
8th) We get the fourth element "4. Rubens" and compare with the second element "2. Jason" as reference. It contains the letter R as the first letter. Then we compare: the J letter comes first than R in the alphabet? The answer is yes. So we leave in the way that it is.
9th) We get the fifth element "5. Oziris" and compare with the second element "2. Jason" as reference. It contains the letter O as the first letter. Then we compare: the J letter comes first than O in the alphabet? The answer is yes. So we leave in the way that it is.
10th) We get the sixth element "6. Dennis" and compare with the second element "2. Jason" as reference. It contains the letter D as the first letter. Then we compare: the J letter comes first than D in the alphabet? The answer is no. So we change the value positions: Dennis goes to 2 and Jason goes to 6, where the list becomes:
11th) The list has ended. Now we move our reference to the third position, where everything that is before this position reference, can be considered as ordered, and then we get the fourth element "4. Rubens" and compare with the third element "3. Steven" as reference. It contains the letter R as the first letter. Then we compare: the S letter comes first than R in the alphabet? The answer is no. So we change the value positions: Rubens goes to 3 and Steven goes to 4, where the list becomes:
12th) We get the fifth element "5. Oziris" and compare with the third element "3. Rubens" as reference. It contains the letter O as the first letter. Then we compare: the R letter comes first than O in the alphabet? The answer is no. So we change the value positions: Oziris goes to 3 and Rubens goes to 5, where the list becomes:
13th) We get the sixth element "6. Jason" and compare with the third element "3. Oziris" as reference. It contains the letter J as the first letter. Then we compare: the O letter comes first than J in the alphabet? The answer is no. So we change the value positions: Jason goes to 3 and Oziris goes to 6, where the list becomes:
14th) The list has ended. Now we move our reference to the fourth position, where everything that is before this position reference can be considered as ordered, and then we get the fifth element "5. Rubens" and compare with the fourth element "4. Steven" as reference. It contains the letter R as the first letter. Then we compare: the S letter comes first than R in the alphabet? The answer is no. So we change the value positions: Rubens goes to 4 and Steven goes to 5, where the list becomes:
15th) We get the sixth element "6. Oziris" and compare with the fourth element "4. Rubens" as reference. It contains the letter O as the first letter. Then we compare: the R letter comes first than O in the alphabet? The answer is no. So we change the value positions: Oziris goes to 4 and Rubens goes to 6, where the list becomes:
16th) The list has ended. Now we move our reference to the fifth position, where everything that is before this position reference, can be considered as ordered, and then we get the sixth element "6. Rubens" and compare with the fifth element "5. Steven" as reference. It contains the letter R as the first letter. Then we compare: the S letter comes first than R in the alphabet? The answer is no. So we change the value positions: Rubens goes to 5 and Steven goes to 6, where the list becomes:
So, the process finally has been finished, and as you can see, our list is in ascending order considering only the first letter of each name.
You must be thinking: My Gosh! You are saying that to write an algorithm I must write all of those stuff? Giving orders to the computer to do things? In some parts, yes! But of course that in computer programming, we have different ways to write the same thing, and for your lucky: very much simpler than that. In fact, that's the reason that we have so many different paradigms and coding styles through the coding universe.
The algorithm described above has a name. It's called by "Bubble Sorting", and is not a very optimized algorithm for sorting, but it is very good way to use as an algorithm example.
Now, another question comes in mind: "A paradigm? What the heck is this?"
Programming Paradigm: The System of Ideas Used to Guide the Design of Programming Languages
In spite of we define algorithm as being a set of instructions to solve a well-defined problem, we can also say that we can find different ways to write them. This was because across time, the problems using software has increased as its complexity too, so new ways of writing code had to be created to ensure quality and speed during the development process.
Ideas had been used on the design of new languages, creating different language concepts that we call as paradigms. We have the following main programming paradigms at our disposal (it is important to know that we can find lots of more paradigms, but I will be focusing only on the two below):
- Imperative Paradigm (or Procedural)
- Object-Oriented Paradigm
So, what is a paradigm? There's a lot of different definitions for the subject, but I could say that paradigm, in programming domain, is a way of thinking to solve a problem. Depending of the problem, we need to think in a way or in another way, to solve it. So, a set of classifications were created to separate those different ways of thinking. We have only two below, but if you do a brief research through the internet, you'll figure out that lots of different paradigms exists.
Imperative Paradigm: The imperative way of programming is just like our example on the sorting problem above. It's a way where we build lines of commands saying what the computer should do, and its main problem is on the statement repetitions, needing lots and lots of lines repeating a few statements when needed. It stores data into variables to process them.
Object-Oriented: The object-oriented way of programming envolves the idea of mixing real world things into the programming concepts. We have self-contained units (called by objects) which handles different variables (called as attributes) with different types, representing something in the problem's domain. When we implement a solution using the object-oriented paradigm, we frequently say the following names: class, attributes, methods and instances.
Doing an analogy with an architectural building project, we could say that the class is the building project drafting, the attributes are the characteristics of the building such as its dimensions, the number of rooms on it, its size, its color and so on, the methods would be their functions such as open the entrance door, close the entrance door, open the window, close the window, turn on the lights, turn off the lights and so on and the instance would be the building itself, the result of the project execution. With one project you can have lots of different buildings created from it. The same is with a class. We can have different "instances" of the same class.
Ok! I got a tiny taste of what an algorithm and paradigm is, and what about data structures?
Let's think about a program which its main goal is to save informations about a company employees. What kind of information an employee can have? Well, an employee has a name, birthdate, id, home address, phone number, cell phone number, and so on. But let's keep only those "fields" as enough data to each employee.
The id. We can use numbers to set the id values, just like an index on a list. So, the computer stores the number corresponding to the id on a single place of the memory and done!
The name. Hum. This one is a bit tuff. Well, what a name is? For instance "Rubens" name. What we see? A set of letters right? In the computer world, we could save each letter in a different place of the memory right? Yes, sure, but it would be terrible. We would fill all the computer memory opening an e-book or something, and of course, it wouldn't be productive to fulfill most of the memory addresses only to save a single letter in each, so we have to find a way to save this set of letters into a single memory block. How? Using our first data structure: a vector.
Vector
The computer use vectors to save a set of letters on a single memory address. Think about a vector as a list with an index column, just like below:
When we have a word, in fact we are dealing with a vector of letters, but in the programming universe we call these elements a bit differently. Instead of calling a set of letters, we say that is a String. Not only words or name are Strings, but entire phrases as well, even all this text can be considered a String (of course, excluding the formatting data and the images as well).
A bit about the Strings
Instead calling "letters", we call characters, because when we say characters, we can consider more than alphabetic letters. We can have symbols such as "#", or "$", or also numbers "1". Yes, numbers. It's crazy I know, but in the computer world we can represent a number as a String. The difference between a number 1 and a character 1? We can say the difference between then when we try to do some "sum" operation. First of all, Strings always (or at least, almost always) is represented by a set of characters between double quotes, so we can say that: "1" + "1" results differently than 1 + 1. One results in "11" (what we call as concatenation of characters), where the other results in the mathematical result of 1 + 1, which is 2.
So this is it, Strings is nothing more than a vector of characters. This is how we can have all the letters, words, phrases, being saved into the same memory address.
Now, have you realized the similarity between our vector friend above, and the list of names used on our sorting algorithm? Have you realized that this is how the computer process a set of letters (or vector of characters) to sort the elements inside of it? Let's imagine that we want to invert how a name is written. Well, we just need to create another vector/array with the same length, and then add each letter, sequentially, but getting them in backwards from the vector/String used as the source. So, we iterate through the vector starting from the end until reach the beginning. Magic!
Let's simulate that (I know, this is example is incredibly obvious and simple, but as I said in the beginning of the article: This text is focused to the ones that doesn't have any idea of that these subjects are). So, first of all, let's create another empty vector/array with the same size:
Once done that, we will start to fill the new vector/array, but from backwards, how? Let's do step by step:
1st) Let's get the last letter from the vector/array, considering that it has 6 letters, let's start getting the 6th letter, which is S.
Put the letter S into the position 1 of the new vector/array.
2nd) Now, we get the 5th letter from the source, which is the letter N, then we put that letter into the position 2 of the new vector/array.
3rd) We get now the 4th letter from the source, which is the letter E, then we put that letter into the position 3 of the new vector/array.
4th) We get now the 3rd letter from the source, which is the letter B, then we put that letter into the position 4 of the new vector/array.
5h) We get now the 2nd letter from the source, which is the letter U, then we put that letter into the position 5 of the new vector/array.
6th) We get now the 1st and last letter from the source, which is the letter R, then we put that letter into the position 6 of the new vector/array.
And done!
Well, here we are AGAIN! TONS of instructions and phrases, together with a vector/array, in other words: algorithm working together with a data structure. Spooky isn't it? Well, I will be remembering that fact later!
We processed a String just as an example, to reinforce what a String really is: a vector of characters.
Here we've shown a vector of characters, but vectors can have other data types too, such as integers, floats, booleans, and so on.
Note that for vectors, since that it contains index, we can get an element in the middle of it, directly, just using the index number. We say that, for vectors/arrays, doing what we call as asymptotic analysis, we have constant access to the elements, getting it instantly. In other data structures which keeps a set of data, unfortunately, this process can take more time to execute.
To simulate that, normally in most program languages, to get a value in the middle of the vector, we can simply define vectorName[3], we access the value in the position 3 directly, or if we need to specify a value in a specific position we also can do something like "vectorName[2] = 'Value'", so, what we're doing there is to set the value "Value" in the position 2, directly, in the speed of light (lol).
Well, I think that this is it for now. I will be writing more articles regarding the subject, talking about others data structures more complex, which works together with algorithms, and then you will start to understand the reason that you can find so many books about data structures that always talks about algorithms too, because they kind of depend each other to work!
See you in another article!