Data Structures

Tarcísio Marinho

Published Jan 14, 2018

Data structures is a important and common topic in computer science. It’s about how to store and manipulate the data inside a computer architecture.

Data is a way to transform information in a efficient form for processing by computers. At the lowest level (computer science concern), raw data is binary digits (0’s and 1’s) since that’s all computers process. The basic data types are integer, float, char, boolean. But it can get a lot more complex.

Arrays and linked lists

Since the limited size and only one data type, the array sometimes isn’t the best approach while coding. When the size of an vector can change between program runtime and you want to have multiple data types, we use a structure called linked lists.

Here’s how linked list is defined and represented in C.

With linked lists you can allocate and delete how much nodes you need in runtime. The basic linked list operations are: Insertion, deletion and search. This structure can be also double linked, and circular. The linked list solves several problems, and can represent a good structure to store data.

Both array and linked lists are linear data types this means that insertion complexity time is constant O(1), but for search is linear time O(n). Linked list isn’t the best structure for searching data inside.

Trees

Another data structure widely used are trees, and there are a lot of them! It’s used in databases, file systems, routing algorithms, decision making algorithms, games and much more.

The most basic tree is the binary tree and the binary search tree (BST). The difference is that the binary search tree uses the binary search algorithm to speed up searching inside the tree with time complexity O(log n).

When a new element or node is inserted, it’s inserted sorted in the right place. Here’s tree logic: insert a new number on the tree, first check if the new number is less or bigger than the current node, if its less than go to left child, otherwise go to right child, if it’s null, than allocate. This is a recursive insertion.

After inserting a several nodes, some times the tree can get unbalanced and became a linked list, affecting the binary search algorithm and being O(n).

AVL Tree

For this issue above we have balancing algorithms such as left rotation and right rotation. That helps the tree stay partial/fully balanced.

Each new insertion we have to check if the tree branch needs to be balanced. The left and right rotations have time complexity of O(1), average insertion in the AVL tree is O(log n).

The AVL tree is used when your tree operations are basically search, because the binary search algorithm takes full advantage from a balanced tree, the downside is the insertion or deletion operations, that costs a bit more. If you’re doing more insertion or deletion operations AVL isn’t the best tree to use.

Red Black Tree

The red black tree uses the balancing technique and also an integer to mark if the node is black or red (0 or 1). The tree proprieties ensure that the longest path from root to leaf is maximum two times bigger than any other path from other leaf, making the tree approximately balanced.

The insertion and deletion of a node on the RB tree uses balancing and recoloring to make a tree partly balanced. Since recoloring is just value attribution, it takes no time comparing to rotations. This makes the tree less rotations and not all insertions have to rotate, differing from AVL.

If your application uses more insertion and deletion operations than the search, the red black tree is your pick choice.

B and B+ Trees

All the trees so far are use in principal memory (RAM), but what if our tree is huge that doesn’t fit on the RAM memory, some TBytes of databases and we want to store the tree on secondary memory (HD) which is a slow access memory? we cannot use AVL nor Red black tree, because each node access (“points”) to a place on the file, called offset (“fseek”) and would make slow the search on secondary memory. Here’s where B and B+ comes in.

Their design makes file search more efficient by bringing the page concept.

Each node has an array of 3 keys and an array of 4 pointers (Implementation variant). Split and concatenation makes the B or B+ tree to be balanced.

All the B+ keys are on the leafs, the non-leaf nodes isn’t on the file (removed). The all the leaf’s node have a pointer to the next leaf node.

This makes the insertion and search more efficient from O(log n) to O(1).

Graphs

Different from trees, graphs aren’t hierarchical structure, it’s formed by nodes and vertices (Connections)

A lot of real world problems can be solved using graphs: Router path, shortest path between cities, Google maps, circuits design, friendship relations. And there is a lot of problems hard (“complexity talking”) graph problems, such as: clique, Traveling salesman problem, 3-SAT, and much more others.

There are some important algorithms for graph search and finding shortest path between two nodes, such as: A*, Bellman-Ford, Dijkstra’s.

There are two types of graphs: directed graphs and non-directed graphs and the graphs can also have a weight between connections.

Data Structures

Tarcísio Marinho

Arrays and linked lists

Trees

AVL Tree

Red Black Tree

B and B+ Trees

Graphs

More articles by this author

Others also viewed

Over the edge: Learn how GraphDB can make you weightless and direct you to a node

Polars Vs Spark

Data Engineering Is Not Just SQL. And That Changed How I Think.

The Future of Data Science, Part II: Tools of Tomorrow

Not always "More resources == More Performance" (Spark Optimization)

From Code to Commentary: Using Claude 3.7 Sonnet Natively in Databricks

Big Data Love Triangle

Implementing SCD Type 2 in PySpark — A Practical Guide for Data Engineers

Big Data Analytics, Where are We Going?

Lakes, Lakehouses, Warehouse and.....MDM?

Explore content categories

Arrays and linked lists

Trees

AVL Tree

Red Black Tree

B and B+ Trees

Graphs

How can a malware encrypt a company existence ?

Dec 1, 2019

Ransomware Encryption Techniques

Feb 5, 2019

Criptografia Assimétrica RSA

Jan 14, 2018

How ransomware work’s and GonnaCry linux ransomware

Jan 14, 2018

Limits of computation

Jan 14, 2018

Others also viewed

Over the edge: Learn how GraphDB can make you weightless and direct you to a node

Polars Vs Spark

Data Engineering Is Not Just SQL. And That Changed How I Think.

The Future of Data Science, Part II: Tools of Tomorrow

Not always "More resources == More Performance" (Spark Optimization)

From Code to Commentary: Using Claude 3.7 Sonnet Natively in Databricks

Big Data Love Triangle

Implementing SCD Type 2 in PySpark — A Practical Guide for Data Engineers

Big Data Analytics, Where are We Going?

Lakes, Lakehouses, Warehouse and.....MDM?

Similar topics

How Data Structures Affect Programming Performance

Common Data Structure Questions

Google SWE-II Data Structures Interview Preparation

Explore content categories