Creating Intuition for Data Structures and Algorithms in Salesforce Apex.
Collections are the heart of the Apex language. We have prebuilt data structures such as List, Set, and Map (collections) built into the language. So do we still need knowledge of data structures and algorithms? We will try to find the answer in this blog.
We use collections in Apex for almost every task, whether it is bulkification, storing objects, storing Ids, DML, and so on. We also follow patterns such as no SOQL inside a for loop, no deeply nested for loops, and no deep if-else branching.
At this point we should stop and ask why these patterns actually work.
So let's start with some basic code patterns we write and try to understand why some code is more efficient than other code, and by what measure.
1. No SOQL inside the for loop.
Explanation 1: the maximum number of SOQL queries allowed in a single transaction is 100.
Explanation 2: a query is a database read operation and is very expensive, so it should not be used inside an iteration.
Let's say the execution time of one query is c seconds and the loop size is n.
n = 1, time = c
n = 10, time = 10*c
n = 100, time = 100*c
n = 10000, time = 10000*c
n = n, time = n*c
For small n the time is low, but as n increases the time grows at the same rate, paying the full query cost at every step.
So the solution here is to use a collection: fetch the results in one query and then iterate over the collection.
List<Account> accountsList = [SELECT Id, Name FROM Account];
for (Account ac : accountsList) {
    // business logic
}
Now the time will be c + accountsList.size() seconds,
i.e. the time is reduced from c*n to c + n.
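The difference can be sketched outside of Apex as well. Below is a minimal Java sketch (Java mirrors Apex closely); the QueryCostDemo class and its query counter are invented stand-ins for a database, not a real API, but they make the n*c versus c+n costs countable.

```java
import java.util.ArrayList;
import java.util.List;

public class QueryCostDemo {
    static int queryCount = 0;

    // Hypothetical stand-in for a SOQL query: each call is one database round trip.
    static List<String> runQuery() {
        queryCount++;
        return new ArrayList<>(List.of("a1", "a2", "a3"));
    }

    // Naive pattern: one query per iteration -> n queries (time ~ n * c).
    static int naive(int n) {
        queryCount = 0;
        for (int i = 0; i < n; i++) {
            runQuery();
        }
        return queryCount;
    }

    // Bulkified pattern: one query up front, then iterate (time ~ c + n).
    static int bulkified(int n) {
        queryCount = 0;
        List<String> records = runQuery();
        for (String r : records) {
            // business logic per record; no queries in here
        }
        return queryCount;
    }

    public static void main(String[] args) {
        System.out.println("naive(100) issues " + naive(100) + " queries");
        System.out.println("bulkified(100) issues " + bulkified(100) + " query");
    }
}
```

Counting queries instead of wall-clock time keeps the comparison deterministic: the naive loop issues n round trips, the bulkified version always issues exactly one.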
....
2. No deeply nested for loops.
for (Account ac : accounts) {
    for (Contact ct : contacts) {
        for (ChildOfContact ch : ct.ChildOfContacts) {
            // business logic
        }
    }
}
Let's assume the size of accounts is n,
the average number of contacts per account is m,
the average number of child contacts per contact is k,
and the business logic takes z seconds.
Then the total execution time will be n*m*k*z seconds.
If n = 1000, m = 100, and k = 50, then time = 5,000,000*z seconds.
If n = m = k, the cost is n*n*n*z.
Time with the optimized solutions:
1. If you optimize to 1 query and two nested for loops, the time will be
c seconds for the query + n*n*z seconds.
2. If you can optimize to 1 query and one for loop, the time will be
c seconds for the query + n*z seconds.
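The usual way to flatten those nested scans is to build a map from parent key to child records in one pass. Here is a Java sketch of that idea; the Contact record with an accountId field is a made-up minimal stand-in for the sObject relationship.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupingDemo {
    // Hypothetical minimal stand-in for a Contact holding its parent Account Id.
    record Contact(String id, String accountId) {}

    // One linear pass builds accountId -> contacts, instead of rescanning the
    // whole contact list for every account (the n * m nested-loop cost).
    static Map<String, List<Contact>> groupByAccount(List<Contact> contacts) {
        Map<String, List<Contact>> byAccount = new HashMap<>();
        for (Contact ct : contacts) {
            byAccount.computeIfAbsent(ct.accountId(), k -> new ArrayList<>()).add(ct);
        }
        return byAccount;
    }

    public static void main(String[] args) {
        List<Contact> contacts = List.of(
            new Contact("c1", "a1"),
            new Contact("c2", "a1"),
            new Contact("c3", "a2"));
        Map<String, List<Contact>> grouped = groupByAccount(contacts);
        System.out.println(grouped.get("a1").size() + " contacts under a1"); // 2 contacts under a1
    }
}
```

After grouping, the per-account loop looks up its children in constant time, which is exactly the "1 query and 1 for loop" shape costed above.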
....
3. List is ordered, but Set and Map are not.
Ordered: we can determine the order of the elements and can access an element by
specifying an index.
Unordered: we cannot determine the order of the elements and cannot access an
element by specifying an index.
To understand this we need to first understand the process of storing the
elements in each of these collections.
1. List: We can think of a list as a dynamic array. An array is a collection of
elements of the same type; a dynamic array is a resizable array, i.e. it can grow
as elements are added and shrink as elements are removed.
add(ele): the add operation appends the element to the end of the list, so
the list preserves insertion order.
List<Integer> ints = new List<Integer>(); // []
ints.add(1); // [1]
ints.add(2); // [1,2]
ints.add(3); // [1,2,3]
Hence the order of the list matches the order of input, and we can specify an
index to get an element.
2. Set: A set is also backed by a dynamic array, but the process of populating it
with values is different from that of a list. It uses the concept of hashing to
store the values.
Hashing: in hashing we map a set of values to a set of keys, using a hash
function to compute the key for a value.
e.g. let the hash function be h(ele) = ele % 10, the remainder when dividing
by 10.
h(1) = 1 % 10 = 1, h(5) = 5, h(9) = 9 % 10 = 9
for ele = 1, key = 1, value = 1
for ele = 9, key = 9, value = 9
We then use this key as the index at which the value is stored in the set's
backing array.
Set<Integer> intSet = new Set<Integer>();
intSet.add(5); // can be viewed as bucket[h(5)] = bucket[5] = 5
intSet.add(2); // can be viewed as bucket[h(2)] = bucket[2] = 2
intSet.add(9); // can be viewed as bucket[h(9)] = bucket[9] = 9
Notice that we add the elements in the order (5, 2, 9), but the values are
stored at positions determined by the hash function. So we have no idea in what
order the values sit inside the set, i.e. it is unordered, and we cannot access
elements of a set by specifying an index.
*10 is the number of buckets here; we can choose another size as well. It is
called the bucket size.
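The bucket picture above can be made concrete with a toy Java sketch. This is deliberately simplified, a 10-slot array with h(ele) = ele % 10, matching the example; real Set implementations use more elaborate hashing and collision handling.

```java
public class BucketDemo {
    // Toy hash table with 10 buckets and h(ele) = ele % 10.
    static Integer[] buckets = new Integer[10];

    static void add(int ele) {
        buckets[ele % 10] = ele; // the hash picks the slot, not insertion order
    }

    public static void main(String[] args) {
        add(5);
        add(2);
        add(9);
        // Insertion order was 5, 2, 9, but storage order follows the hash:
        // slot 2 holds 2, slot 5 holds 5, slot 9 holds 9.
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] != null) {
                System.out.println("slot " + i + " -> " + buckets[i]);
            }
        }
    }
}
```

Walking the buckets yields 2, 5, 9, which is why iterating a set need not reflect the order in which you added elements.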
3. Map: A map also uses the concept of hashing to store its keys and values. The
difference between a set and a map is that a set stores only keys, while a map
stores (key, value) pairs.
Map<Integer, Integer> intMap = new Map<Integer, Integer>(); // {}
intMap.put(5, 10); // {5=10}
The map calculates the hash of the key 5 using the hash function h(5) and stores
the pair (5, 10) at the position given by h(5).
So a map is also unordered, as we don't know where the keys are being stored.
...
4. A Set does not contain duplicate values.
A set stores values based on the hash function.
So if you provide a duplicate value, the key (index) will be the same, and the
set just replaces the old value with the identical new value.
intSet.add(2); // bucket[h(2)] = bucket[2 % 10] = bucket[2] = 2
If we add 2 to the set again, it just sets bucket[2] = 2 once more.
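A quick Java sketch shows the same behavior with the standard HashSet (Java's add also reports whether anything new was stored, which Apex's Set.add does as well):

```java
import java.util.HashSet;
import java.util.Set;

public class SetDedupDemo {
    public static void main(String[] args) {
        Set<Integer> intSet = new HashSet<>();
        intSet.add(2);                 // stored at the slot chosen by the hash
        boolean added = intSet.add(2); // same hash, same slot: nothing new stored
        System.out.println("size = " + intSet.size() + ", second add returned " + added);
        // size = 1, second add returned false
    }
}
```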
...
5. Keys must be unique in a Map.
Keys should be unique so that values don't get overridden.
intMap.put(2, 4); // bucket[h(2)] = bucket[2 % 10] = bucket[2] = (2, 4)
intMap.put(2, 5); // bucket[h(2)] = bucket[2 % 10] = bucket[2] = (2, 5)
// The old value is replaced by the new value with the same key.
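In Java the overwrite is directly observable, because put returns the previous value for the key (null when the key is new):

```java
import java.util.HashMap;
import java.util.Map;

public class MapOverwriteDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> m = new HashMap<>();
        m.put(2, 4);                    // key 2 hashes to a slot; stores (2, 4)
        Integer previous = m.put(2, 5); // same key, same slot: value is replaced
        System.out.println("previous = " + previous + ", now = " + m.get(2));
        // previous = 4, now = 5
    }
}
```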
...
6. How a Map uses an object (non-primitive type) as a key:
As we have already seen, the key must be unique enough to produce a unique hash
value.
1. sObject type:
Let's look at this code:
Map<Account, Integer> objectMap = new Map<Account, Integer>();
Account a = new Account(Name = 'kumar');
Account b = new Account(Name = 'kumar');
objectMap.put(a, 1);
objectMap.put(b, 1);
System.debug(objectMap); // output: {Account:{Name=kumar}=1}
Even though a and b are two different objects, the size of the map is only one.
The hash values for these two objects are the same because a.Name and b.Name are
both 'kumar'.
For sObjects, the default hash value (the uniqueness of the key) is calculated
from the object's field values.
2. Other non-primitive types:
For other non-primitive types, such as user-defined types, we can define the
hash function (the key's uniqueness) by overriding the equals and hashCode
methods:
https://developer.salesforce.com/docs/atlas.en-us.apexcode.meta/apexcode/langCon_apex_collections_maps_keys_userdefined.htm
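The same equals/hashCode contract exists in Java, so the idea can be sketched there; the Person class below is a made-up user-defined type whose key identity is its name field.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyDemo {
    // User-defined type: equality and hashing are based on the name field,
    // analogous to overriding equals/hashCode on an Apex class used as a map key.
    static class Person {
        final String name;

        Person(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Person)) {
                return false;
            }
            return name.equals(((Person) o).name);
        }

        @Override
        public int hashCode() {
            return name.hashCode();
        }
    }

    public static void main(String[] args) {
        Map<Person, Integer> byPerson = new HashMap<>();
        byPerson.put(new Person("kumar"), 1);
        byPerson.put(new Person("kumar"), 1); // equal fields -> same key -> one entry
        System.out.println("entries = " + byPerson.size()); // entries = 1
    }
}
```

Two distinct instances with equal fields collapse to one map entry, exactly like the two Account objects above.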
...
7. Does Set.contains(ele) take less time than List.contains(ele)?
Suppose we have a collection of Ids, Collection<Id> (Set<Id> or List<Id>),
and a list of contacts: List<Contact> ctList.
We need to filter the contacts whose Id is in that collection.
Which will give better performance, the set or the list?
for (Contact ct : ctList) {
    if (idCollection.contains(ct.Id)) {
        // do something
    }
}
1. If the collection is a list:
List<Integer> intList = new List<Integer>{1, 2, 3, 4};
intList.contains(4) can be viewed as:
for (Integer i : intList) {
    if (i == 4) {
        return true;
    }
}
i.e. it searches the list element by element. So if the size of the list is
n = 1000, in the worst case (the last element) it iterates through all 1000
elements: order O(n).
2. If the collection is a set:
Set<Integer> setInt = new Set<Integer>{1, 2, 3, 4};
If the hash function is h(ele) = ele % 10, then h(3) = 3 % 10 = 3,
and setInt.contains(3) can be viewed as:
if (bucket[h(3)] != null) return true;
It takes constant time to return true or false: c1 time to calculate the hash
value and c2 time to access the value at that index.
Then c1 + c2 = c seconds, which is constant time of order O(1).
So set.contains(ele) is clearly much faster than list.contains(ele) for large
collections.
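Rather than timing (which is noisy), the O(n) cost can be made visible by counting comparisons in a hand-rolled linear search, which is essentially what List.contains does, and contrasting it with a HashSet lookup. A Java sketch:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContainsDemo {
    static int comparisons = 0;

    // Linear search, what List.contains does under the hood: up to n comparisons.
    static boolean linearContains(List<Integer> list, int target) {
        comparisons = 0;
        for (int i : list) {
            comparisons++;
            if (i == target) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) {
            list.add(i);
        }
        Set<Integer> set = new HashSet<>(list);

        linearContains(list, 1000); // worst case: the last element
        System.out.println("list comparisons = " + comparisons); // list comparisons = 1000

        // set.contains(1000) jumps straight to the hashed slot: O(1) expected time.
        System.out.println("set contains 1000: " + set.contains(1000));
    }
}
```

The linear scan touches all 1000 elements in the worst case, while the set lookup computes one hash and checks one bucket, which is why Set<Id> is the idiomatic choice for membership filters in Apex.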
....
8. Is the enhanced for loop better than the traditional for loop?
It depends on your use case.
In a traditional for loop you have access to the index, so you can customize
things: replace values, adjust the counter, and so on. It is, however, more
error prone.
An enhanced for loop is less error prone and easier to write than the
traditional for loop. It is less customizable and gives direct access to each
element of the collection.
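A short Java sketch of the trade-off (the helper names are invented for illustration): the traditional loop's index lets us replace elements in place, while the enhanced loop is the cleaner choice for read-only traversal.

```java
import java.util.ArrayList;
import java.util.List;

public class LoopStylesDemo {
    // Traditional loop: index access lets us replace elements in place.
    static void doubleAll(List<Integer> values) {
        for (int i = 0; i < values.size(); i++) {
            values.set(i, values.get(i) * 2);
        }
    }

    // Enhanced loop: direct element access, no index bookkeeping to get wrong,
    // but we cannot replace elements through the loop variable.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>(List.of(1, 2, 3));
        doubleAll(values);
        System.out.println(values + " sum = " + sum(values)); // [2, 4, 6] sum = 12
    }
}
```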
...
We have seen that even though we are happily using collections, we are actually using data structure and algorithm concepts every day. As time passes, the complexity of the problems will grow, so knowledge of data structures becomes necessary for developers.