Security Implementation on Your Cloud Journey – 2 Parallel Paths
For the last few years there has been a significant shift from building and maintaining applications on premises to migrating existing applications, and building new ones, on the cloud. This shift has introduced numerous challenges and risks in protecting applications and keeping data safe. Migrating to the cloud has been one of the key requirements of digital transformation for our clients.
Over the last few years we have been migrating multiple applications to a hybrid cloud architecture; in our case the hyperscaler is Azure. As the implementation progressed and the complications and solutions piled up, I felt it prudent to keep track of and organize our findings. This article focuses solely on the security-related aspects of migrating to the cloud.
I initially envisioned this as a brief reference document for our next cloud migration project, but it evolved into a larger article. It is intended to help technical folks and managers be better prepared for dealing with security during a cloud migration. It summarizes key technical implementation points from the perspective of development/SDLC and infrastructure management, and should help you prepare for an exciting journey to the cloud.
Problem Statement
There were a host of reasons for migrating the identified applications to the cloud: a) legacy technologies b) escalating costs c) operational inefficiencies d) inability to scale and innovate, and most importantly e) the need to consolidate and create a public utility model (SaaS).
It was therefore necessary to employ multiple transformation methodologies due to a) different technologies (e.g. mainframe, VB.net, applets, Java) b) the short life span of some of the applications c) duplicate features that needed consolidation d) a lack of subject matter experts and reference documents.
Amongst the 5 Rs of migrating applications to the cloud suggested by Gartner, we banked on Rearchitect, Replace, and Revise & Refactor. Rehosting was not an option we considered, since it would have prevented us from taking full advantage of the cloud.
Despite the above complexities, our optimistic assumption was that the transformation journey would be orderly, albeit with a certain number of knowns and a buffer to handle the unknowns. But the migration challenges we encountered made for an extremely rough and interesting journey. With multiple scrum teams, the journey sometimes resembled a horde rather than an orderly caravan.
Migration Challenges
We had listed some challenges before migrating the applications. The list below has some that we planned and some that surprised us.
These challenges kept us grappling for solutions. But amidst all this, it was the constant worry of ensuring security and compliance that stood out. Since this was the first time these applications and data would reside outside the data center, it was critical for us to get it right.
In addition, the applications belonged to the financial domain, where things can get a lot more complicated in a hybrid multi-cloud environment.
My experience has mostly been in financial services, where security regulations and compliance requirements are extremely strict. I am sure that might be the case in other domains too, but hosting applications with customers' financial data on the cloud carries multiple security-related risks.
A few years ago, financial institutions were reluctant to move to a public cloud, and our work was mostly migrating applications to a private cloud, mainly due to security-related concerns and legacy technology. But subsequently we have seen a shift in preference towards the public cloud, due to the convincing security strategies it provides and the costs that can be saved. The hyperscalers are continuously investing significantly in cloud security, thereby ensuring evolving security fixes. The confidence in public clouds is such that we now receive requirements to move even core applications to the cloud.
As we progressed in the journey we had to tread very carefully and treat every architecture and design decision from a security perspective. But at the same time, we had to ensure it didn't impede our delivery speed. The risks from security attacks, ransomware attacks and other nefarious activities can be quite calamitous for a financial institution. Therefore, the regulatory compliance is quite strict.
For example, the below is from the American Bankers Association Consortium:
Banks have the highest level of security among critical U.S. industries—and the most stringent regulatory requirements.
Our threat levels increased due to a landscape that combined legacy systems that were cloud-washed, a few cloud-native systems, hybrid systems, and a mix of IaaS, PaaS & SaaS models.
All this meant we needed a structured way of architecting, designing, and developing applications on the cloud.
5 Pillars of a Well Architected Application
Nearly all the hyperscalers recommend well-evolved methodologies, including tried and tested best practices, to create, implement and operate a cloud topology. Organizing our thoughts, processes, teams and methodologies around these benefited us hugely.
This article is based on the second pillar, "Security", and consolidates the gamut of security-related aspects that need to be taken care of. This is not a complete list by any stretch of the imagination, but it will give you a start in determining the security aspects you must line up during a cloud journey, and hopefully a head start to explore further. The threats and the consequent solutions are constantly evolving, so the strategy needs to be tweaked frequently.
If you find most of the references in the subsequent paragraphs referring to Azure, that's because our biggest focus has been on Azure for the moment.
2 Tracks For Implementing Security
Our experience from the program is that security should be applied along two parallel tracks:
1 - Secure Software Development Lifecycle
Security should be deeply embedded in the SDLC process just like what Microsoft recommends as part of their Security Development Lifecycle (SDL).
Perhaps it should be called the Secure Software Development Life Cycle (SSDL)!
Let us go through the various stages from the perspective of security.
Requirements
Define the security and compliance requirements upfront. Having said that, this must be a continuous process. The hyper scalers spend billions of dollars on security research alone – a constant feedback loop will be useful to check on what new threats have come up and what innovative strategies are available for adoption. For example, we had to scramble when the Log4j vulnerability came up.
Keep in mind that a hybrid cloud architecture and BYOD needs will bring about new requirements that need to be laid out. For example, resources like printers that were seamlessly available on-prem will need an alternate mechanism; even an IPP protocol for remote printing might not suffice, as files will need to be secured before they reach the printer. BYOD has its own security risks to be handled; for example, we nearly missed implementing a virus scanner for file uploads.
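To make the virus-scanner point concrete, here is a minimal sketch of an upload gate, with a hypothetical stub in place of a real engine such as ClamAV; the function names and the EICAR-marker check are illustrative only.

```python
# Sketch of a file-upload pipeline with a virus-scan gate. scan_for_malware
# is a hypothetical stand-in; a real deployment would call a scanning engine.

EICAR_MARKER = b"EICAR-STANDARD-ANTIVIRUS-TEST-FILE"

def scan_for_malware(payload: bytes) -> bool:
    """Stand-in scanner: flags the standard EICAR test marker."""
    return EICAR_MARKER in payload

def accept_upload(filename: str, payload: bytes) -> dict:
    """Reject infected files before they ever reach storage."""
    if scan_for_malware(payload):
        return {"accepted": False, "reason": f"{filename}: malware detected"}
    return {"accepted": True, "reason": "clean"}

if __name__ == "__main__":
    print(accept_upload("report.pdf", b"%PDF-1.7 harmless content"))
    print(accept_upload("bad.exe", b"xx" + EICAR_MARKER + b"xx"))
```

The key design point is that the scan sits in front of storage, so an infected file is never persisted, not cleaned up after the fact.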
Engage the operations team early to define the metrics and reporting that are required. Define and agree on a SIEM strategy. Observability requirements need to be discussed and agreed early in the transformation journey – what metrics and logs need to be collected – and an early dashboard wireframe for this data will be useful.
Design
Even if an agile process is followed, it is recommended to have upfront designs, or at least high-level thoughts lined up, for certain security requirements – for example identity management, credential management, managed identity, the process for certifying 3rd-party components, and the security model for DBaaS if you choose to go the DBaaS way. It will be useful to have all of these in your backlog right from day one and tracked rigorously.
Implementing these later as add-ons will be tricky and prone to performance and testing delays. Application migration itself will be quite complex, and there is a general tendency to postpone security and high-availability pain points to a later stage. At minimum, the security points should be in the backlog at the start of the sprints. Certain security implementations, for example SSL or Key Vault management, may impact performance, and it is useful to encounter these problems early.
Compliance
Also, certain compliance requirements will need up-front design. For example, if you are using Kafka on the cloud you may need to be careful about message retention, as the default retention setting might not meet security compliance. Cloud providers can also help in ensuring compliance implementation; Azure, for example, supports validating against various regulatory standards like GDPR, MeitY etc.
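The Kafka retention point lends itself to a pre-deployment check. Below is a minimal sketch that flags topics whose `retention.ms` exceeds a compliance ceiling; the seven-day limit and the topic names are illustrative assumptions, not an actual policy.

```python
# Sketch: flag Kafka topic configs whose message retention exceeds a
# compliance ceiling (the 7-day limit below is an assumed policy).

MAX_RETENTION_MS = 7 * 24 * 60 * 60 * 1000  # 604,800,000 ms = 7 days

def non_compliant_topics(topic_configs: dict) -> list:
    """Return topics whose retention.ms is missing or above the ceiling.

    Kafka's broker default retention applies when the key is absent,
    so a missing value is treated as a finding too.
    """
    findings = []
    for topic, config in topic_configs.items():
        retention = config.get("retention.ms")
        if retention is None or int(retention) > MAX_RETENTION_MS:
            findings.append(topic)
    return sorted(findings)

if __name__ == "__main__":
    configs = {
        "payments.audit": {"retention.ms": "604800000"},  # exactly 7 days: ok
        "payments.raw": {"retention.ms": "2592000000"},   # 30 days: too long
        "alerts": {},                                     # broker default
    }
    print(non_compliant_topics(configs))  # ['alerts', 'payments.raw']
```

A check like this can run in the pipeline against exported topic configs, so a non-compliant retention setting never silently reaches production.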
Internal and external/domain-specific compliance requirements need to be collated early. For example, what are the various role-based access controls (RBAC) that need to be provided for operations, developers, infrastructure providers and customers? What is the governance mechanism for provisioning VMs and databases?
Azure Blueprints and Azure Policy can be leveraged to enforce implementation of both internal and external policies.
Threat Modelling
Threat modelling is an important design aspect that needs to be considered. It involves identifying various threats and incidents and preparing solutions to prevent them – for example DDoS attacks, stealing of certificates, VMs compromised by brute-force attacks, or data being accessed and locked out.
The security testing process can then validate these threats and solutions.
The MITRE organization, for instance, has exhaustive threat modeling resources. Some, if not all, can be considered as part of your threat model, and some of them can impact your design decisions.
Work with your Risk and Compliance (R&C) team to set up a threat modelling exercise to identify security risks. This requires a cybersecurity expert from R&C to be part of the team right from kick-off; don't just depend on your application architects, as a lot of deep diving is required which only a dedicated expert can bring.
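A threat model can be kept as a simple, prioritized register. The sketch below organizes entries by STRIDE category and ranks them by likelihood times impact; the specific threats, scores and mitigations are illustrative placeholders, not our actual register.

```python
# Sketch of a lightweight threat register organized by STRIDE category.
# Entries and risk scores are illustrative placeholders.

from dataclasses import dataclass

STRIDE = ("Spoofing", "Tampering", "Repudiation",
          "Information Disclosure", "Denial of Service",
          "Elevation of Privilege")

@dataclass(frozen=True)
class Threat:
    category: str        # one of STRIDE
    description: str
    likelihood: int      # 1 (rare) .. 5 (frequent)
    impact: int          # 1 (minor) .. 5 (calamitous)
    mitigation: str

    @property
    def risk(self) -> int:
        return self.likelihood * self.impact

def prioritise(threats: list) -> list:
    """Highest-risk first, so mitigations land in the backlog in order."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)

if __name__ == "__main__":
    register = [
        Threat("Denial of Service", "Layer 7 flood on public API", 3, 4,
               "DDoS protection plan + rate limiting"),
        Threat("Information Disclosure", "Stolen storage credentials", 2, 5,
               "Managed identity + Key Vault, no secrets in code"),
        Threat("Elevation of Privilege", "Brute-forced admin port on a VM", 3, 3,
               "Just-in-time VM access, lock down RDP/SSH"),
    ]
    for t in prioritise(register):
        print(t.risk, t.category, "->", t.mitigation)
```

The point of ranking is that the security testing process described below can then validate the highest-risk items first.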
Development
Clean code training was a prerequisite amongst our teams. Everyone from novices to experts, including managers, was required to undergo a certification. This is required apart from any application-stack-specific training.
But now, for all cloud-related programs, we intend to have the following as prerequisites apart from clean code training.
Training the team on secure coding principles ensures secure coding standards are followed right from the word go: input validations are solid, credentials are handled appropriately, sensitive data is not exposed, and secure logging standards establish compliance. Logging for security and compliance is a specialized subject by itself – here is a useful article. Mandatory security training can impart crucial information, like the usage of key stores and leveraging managed identity, early in the process. Retrofitting these later will be difficult, error-prone and costly.
The OWASP site has great resources on secure coding practices. Training on security ensures that security gotchas in the application are shifted left rather than found in systems integration testing or, worse, user acceptance testing.
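As one concrete example of the secure logging standards mentioned above, here is a minimal sketch of a logging filter that redacts credential-like patterns before they reach any sink; the regex patterns are illustrative and deliberately crude, not a production ruleset.

```python
# Sketch of a logging filter that redacts obvious credential patterns
# before messages reach log sinks (patterns are illustrative, not exhaustive).

import logging
import re

SECRET_PATTERNS = [
    re.compile(r"(password|passwd|secret|token)\s*[=:]\s*\S+", re.IGNORECASE),
    re.compile(r"\b\d{13,19}\b"),  # crude card-number-like digit runs
]

class RedactingFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()
        for pattern in SECRET_PATTERNS:
            message = pattern.sub("[REDACTED]", message)
        record.msg, record.args = message, None
        return True  # never drop the record, only scrub it

if __name__ == "__main__":
    logger = logging.getLogger("app")
    handler = logging.StreamHandler()
    handler.addFilter(RedactingFilter())
    logger.addHandler(handler)
    logger.warning("login failed, password=hunter2 for card 4111111111111111")
```

Attaching the filter at the handler level means every message is scrubbed centrally, instead of trusting each developer to remember not to log a secret.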
Large teams that run migration factories are recommended to include certified architects, DevOps and security experts. It is also highly recommended for a large percentage of the team to be certified on a fundamentals certification like AZ-900. This creates a general awareness of what is available on the cloud in terms of cost optimization, security, reliability etc. A fundamentals training also eases communication between the cloud experts and the rest of the team.
Engineering teams are generally focused only on application migration, but a certified operations team that runs the opex budget can validate whether cost is optimized with the levers provided by the hyperscaler, and whether the appropriate security compliances are taken care of.
Testing Application Security
Testing for security, apart from application testing, needs to be embedded in the secure software development cycle. Some of it can be automated, some requires specialist teams, and some may congest the development pipeline due to the time it takes to run. A testing strategy with the aid of proper tools will go a long way towards alleviating these issues.
The obvious and recommended tests are:
These fall under the realm of Application Security Testing and are implemented using various tools, which we will investigate below. An interesting point to note is that the application security testing market is set to grow by USD 13.2B by 2025.
It is recommended to employ an external security testing team to conduct penetration testing and other security-related testing.
One of the things we took advantage of was getting the application audited for a Microsoft badge. This is helping us validate our architecture and design against a checklist. Getting an application audit done before it goes into full production is highly recommended. There are other benefits too, like showcasing your application once it's certified.
Additionally, if your application is integrating with a clearing house or exchange, it might be required to obtain an external audit certificate before going live.
Also, most of the hyperscalers provide tools and frameworks to validate that your application is compliant with industry standards – for example, Microsoft Defender.
Deployment/Operations
This is the stage where the rubber hits the road. It is in production that the application becomes visible to the outside world and actual users start coming in. By this time the application security monitoring systems, mitigation factors and operational processes should be finely tuned. The SIEM (Security Information & Event Management) tool(s) will play a critical part in this stage.
As mentioned earlier, the SIEM strategy should be worked out during the requirements phase itself, and the engineering, operations and infrastructure teams should work as a 3-in-a-box to set it up. This can, and will, need to be refined as you go forward in the migration journey.
It will be useful to define and test a security incident response operating procedure, including setting up access for teams who can ably conduct forensic analysis, because the engineering team may not be available after go-live. Depending on the number of applications and the threat levels, it will be useful to set up a Security Operations Centre (SecOps) for this.
With a hybrid cloud architecture, multiple applications, multiple APIs, logs and databases, it was imperative for us to set up a central dashboard. A SIEM tool is helping us achieve this, though we are still in the process of implementing it completely. Even though we had decided on a SIEM tool, lack of communication meant it was not being used appropriately, and in some cases not used at all!
Tip: Tools like Defender and Sentinel (Azure SIEM) come with a monthly cost – you may want to look around to see if there are appropriate alternatives without compromising security.
It is in post-production that actual security risk monitoring and response kick into gear. Once the engineering team has moved on, the operations team will need to put a process in place to stay constantly updated on new threats, malicious attacks and the innovative technologies that counter them.
Least Privileged Access
We have put in strict controls and walk a very fine line between agile engineering teams and cloud infrastructure teams. The engineering teams expect agility in implementing PoCs, installing software, provisioning hardware etc., but tight control by the infrastructure teams meant agility was curtailed a fair bit.
This creates a lot of angst between the teams and impacts development speed, but with security being paramount, it's a price that needs to be paid.
The infrastructure team takes over provisioning, and access is granted according to organizational guidelines; this can be restrictive and involve multiple approvals. The whole process needs to be managed carefully, utilizing cloud-provided features like Azure Blueprints, policies, tags etc. to ensure consistency in security, compliance and cost.
Restrictive access might also impede developer productivity. You may not even be able to utilize the cloud's test environment freely, because it will fall under the same management group and can therefore still be restrictive.
We found it useful to have a separate cloud subscription that is governed and managed by the engineering team themselves. This can be used purely for conducting proofs-of-concept, small out-of-application development etc. But it introduces additional cost and security management that needs to be handled. One needs to be careful here not to let VMs and applications run without monitoring. Even if the subscription is purely for R&D, ensure security is in place and, most importantly, that there are no runaway costs.
Infrastructure, Tools and Architecture
Now coming to the alternate path, or perspective, for implementing security: while the previous section was more from a process point of view, below are the technical layers in which security needs to be implemented.
There are some key principles that need to be applied when implementing a hybrid/multi-cloud. These are extremely critical: as the number of attack points, or the attack surface, grows, the risks are elevated.
· Zero Trust Architecture: A "never trust & always verify" principle that provides least-privileged access to connections, networks, users and devices. Applications and data are walled off with restricted access using policies. A Zero Trust operating model also needs to be implemented, where access rights are validated frequently.
· Defense in Depth: This is a strategy to wrap the innermost, most vulnerable part – usually data – with multiple layers of physical security, network security, access management, safe applications etc. A good analogy for layered security is the architecture of castles, wherein the innermost sanctum is protected by multiple layers and protective systems. An attacker has to go through a gamut of security layers before reaching the crown jewel: data.
· Posture Management: Cloud Security Posture Management (CSPM) is the use of tools and processes to continuously monitor and remediate configuration issues. Leverage industry and domain best practices by mapping your architecture against them. Tools built specifically for this help automate the management; tools like Azure Defender continuously monitor the landscape and provide a Secure Score.
This is where the importance of looking at security from the infrastructure and architecture perspective comes into play. Let us take a layered approach.
Physical Security
For the cloud infrastructure we rely on the hyperscaler's expertise and the physical constraints they have in place.
But since it's a hybrid cloud, we have put in rigorous mechanisms to check and monitor access to our development centers and data centers. Our infrastructure management team is kept separate and is governed by the organization's risk and compliance teams.
Access to certain servers/VMs is controlled either by VPNs or by access from clean rooms. We refer to these as ODCs; though ODC stands for Offshore Development Centre, the rigors of setting one up in terms of physical access, barring of personal devices etc. were reused here.
Perimeter
The first line of defense against a remote attack will need the following primary defenders:
DDoS Protection – Prevents denial-of-service attacks. In this case moving to the cloud is actually beneficial, as one of the preventive measures is to increase bandwidth, which the cloud can provide in addition to the tools available from the cloud provider. These attacks are categorized into Layer 3/4 or Layer 7 attacks, and the implementation of a tool/plan depends on the requirement.
Firewall – Firewalls have long been available to monitor and prevent unauthorized network traffic, and multiple firewalls were used to create demilitarized zones; but with current security threats they can also provide L3–L7 filtering and prevent application-layer attacks. It will be necessary to have a firewall with intrusion prevention and intrusion detection capabilities. Even then, some firewalls may not support payload monitoring. There are also web application firewalls to eliminate SQL injection attacks and protect against bots, crawlers etc.
Payload monitoring and a hardware-based rather than software-based firewall are typical requirements for financial services compliance. The full feature set required for compliance in the financial domain may not be serviced by the hyperscaler, and you might need to evaluate a 3rd-party specialist provider.
Network
As a follow-on to perimeter security, network security is a large and complex area. Some of the security features you could look at:
TLS – In this day and age TLS is a mandatory requirement for encryption in transit. There are several TLS/SSL best practices that need to be followed, for instance the implementation of HTTP Strict Transport Security (HSTS).
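HSTS is simply a response header that tells browsers to refuse plain HTTP for a period. The sketch below, assuming a framework-agnostic list-of-tuples header model, shows the idea; the one-year `max-age` value is an illustrative policy choice.

```python
# Sketch of a middleware-style helper that enforces HTTP Strict Transport
# Security on every response (max-age value is an assumed policy).

HSTS_VALUE = "max-age=31536000; includeSubDomains"  # one year

def add_hsts(headers: list) -> list:
    """Append the HSTS header unless the app already set one."""
    if any(name.lower() == "strict-transport-security" for name, _ in headers):
        return headers
    return headers + [("Strict-Transport-Security", HSTS_VALUE)]

if __name__ == "__main__":
    print(add_hsts([("Content-Type", "text/html")]))
```

In practice you would let the gateway or framework middleware set this globally, so no individual endpoint can forget it.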
Routing Manager – For example, Azure Traffic Manager can be used to manage your loads and your disaster recovery. It can also be used to improve your security posture.
Virtual Private Cloud Setup – Use best practices to build landing zones and VPCs, e.g. network segmentation, role-based access control (RBAC) using network groups, least-privileged access using a zero trust model, etc.
Hybrid/Multi-Cloud Communication – A critical bridge between the remote networks; look at the usage of VPNs, private connections (e.g. ExpressRoute, though expensive), bastion hosts and hardened VPN appliances. It is preferable to encrypt the data before it is transferred and decrypt it at the cloud, and vice versa.
Cloud Network Monitor – An AI-based tool to monitor and alert on network attacks and vulnerabilities.
Infrastructure/Compute
Unless, as an exception, your architecture is based purely on cloud-native principles, you will still be managing VMs, containers and clusters yourself. This compute layer will therefore require the following security guard rails:
Container & Cluster Security – This site has some good starting points for Kubernetes and Docker security. Falco, a CNCF project, is recommended for container and cluster security.
Patching – Leverage an automated patching mechanism for your VMs.
VM/Compute Security – Use Antivirus/Antimalware to protect your compute infrastructure.
One more item we would like to evaluate is Open Policy Agent (OPA), as recommended by the CNCF. It provides policy-based control across the cloud-native stack. (It was still in incubation when I last checked.)
An important set of guidelines to consider is the 12-Factor App principles for development on the cloud. Though, unfortunately, security is not addressed explicitly, the guidelines help ensure best practices are followed while building cloud-native applications. Points like port binding ensure you don't hard-code your ports, and the config principle keeps credentials out of code, pointing you toward secure key management tools like Azure Key Vault or GCP's Cloud Key Management Service.
As part of VM protection, guard against brute-force attacks. Standard admin ports can be timed out rather than kept open all the time; Azure provides Just-In-Time VM access to protect ports.
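The just-in-time idea can be illustrated with a small sketch: an admin port is reachable only inside an explicitly approved, short-lived window. `JitAccess` is a hypothetical helper for illustration, not Azure's actual API.

```python
# Sketch of a just-in-time access check: an admin port is open only inside
# an approved, short-lived window (class and times are illustrative).

from datetime import datetime, timedelta

class JitAccess:
    """Track time-boxed approvals for admin ports, JIT-style."""

    def __init__(self) -> None:
        self._windows = {}  # port -> expiry datetime

    def approve(self, port: int, minutes: int, now: datetime) -> None:
        self._windows[port] = now + timedelta(minutes=minutes)

    def is_open(self, port: int, now: datetime) -> bool:
        expiry = self._windows.get(port)
        return expiry is not None and now < expiry

if __name__ == "__main__":
    jit = JitAccess()
    t0 = datetime(2024, 1, 1, 9, 0)
    jit.approve(22, minutes=60, now=t0)
    print(jit.is_open(22, t0 + timedelta(minutes=30)))  # True
    print(jit.is_open(22, t0 + timedelta(minutes=90)))  # False
    print(jit.is_open(3389, t0))                        # False: never approved
```

The deny-by-default posture is the point: a port with no active approval is simply closed, which drastically shrinks the brute-force window.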
Application/SecDevOps
With security being the prime concern, expert opinion is to prioritize security over the rest, bring it to the "left", and practice SecDevOps rather than just DevSecOps. Though the application is not a tangible layer, it can be thought of as a virtual one.
It helps in putting guard rails and best practices earlier in the development lifecycle. As is often advised, fixing an issue earlier in the SDLC is much cheaper than fixing it later.
SecDevOps needs to be implemented from a tooling perspective also.
Secure your IDE
From a tooling perspective it is recommended to look at open-source tools like SonarLint or invest in commercial code analysis tools. These can be integrated into your IDE and act as an add-on to the secure code training discussed earlier. Anything you can catch before code check-in is a bonus.
Secure your CI/CD Pipeline
The pipe to production needs to be secured automatically. Some of the guard rails you can place are:
Static Application Security Testing (SAST) tools like HCL AppScan allow security vulnerabilities to be identified without running the application. These tools help shift the vulnerability check to the left.
We put in gates during code check-in and pre-deployment to ensure vulnerabilities are not shipped.
Dynamic Application Security Testing (DAST) tools look for security vulnerabilities by running the application. These are mostly black-box tests, as they check for vulnerabilities from the outside.
But many of these tools can take a serious amount of time to run, which might choke your pipeline and require an alternate strategy.
Some key points to secure your pipeline
When lifting and shifting legacy code without an existing CI/CD pipeline, implementing a new pipeline might throw up bucketloads of vulnerabilities. Then "it's working now and will work the same way on the cloud" will not cut it. The legacy vulnerabilities that sneaked in over the years, which you were living with on premises, will need to be taken care of when moving to the cloud. Our gate ensured "zero" vulnerabilities before going into production on the cloud.
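A zero-tolerance gate of the kind described above reduces to a simple rule over scanner output. The sketch below shows such a gate over a normalized findings list; the severity scale and finding IDs are illustrative, not the schema of any particular scanner.

```python
# Sketch of a CI gate that fails the pipeline when scanner findings at or
# above a severity threshold remain open (scale and IDs are illustrative).

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def gate(findings: list, fail_at: str = "low") -> tuple:
    """Return (passed, blocking ids). fail_at='low' means zero tolerance."""
    threshold = SEVERITY_RANK[fail_at]
    blocking = [f["id"] for f in findings
                if SEVERITY_RANK[f["severity"]] >= threshold]
    return (not blocking, blocking)

if __name__ == "__main__":
    findings = [
        {"id": "CVE-2021-44228", "severity": "critical"},  # e.g. Log4j
        {"id": "SAST-0042", "severity": "low"},
    ]
    passed, blocking = gate(findings, fail_at="low")
    print(passed, blocking)  # False ['CVE-2021-44228', 'SAST-0042']
```

In a real pipeline this would parse the SAST/DAST report format and exit non-zero on failure, so the deployment step never runs with open findings.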
Identity & Access Management
Role-based access control (RBAC) is the industry standard for securing your networks, databases, applications etc. RBAC with Active Directory provides granular control of access using roles and permissions.
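The RBAC model itself is small enough to sketch: roles map to permissions, and a request is allowed only if some assigned role grants it. The roles and permission strings below are illustrative examples, not an Active Directory schema.

```python
# Sketch of a role-based access check: deny by default, allow only via an
# assigned role (roles and permission names are illustrative).

ROLE_PERMISSIONS = {
    "developer": {"repo.read", "repo.write", "pipeline.run"},
    "operations": {"vm.restart", "logs.read", "alerts.ack"},
    "auditor": {"logs.read"},
}

def is_allowed(user_roles: set, permission: str) -> bool:
    """Least privilege: a permission is granted only via an assigned role."""
    return any(permission in ROLE_PERMISSIONS.get(role, set())
               for role in user_roles)

if __name__ == "__main__":
    print(is_allowed({"auditor"}, "logs.read"))   # True
    print(is_allowed({"auditor"}, "vm.restart"))  # False
```

The deny-by-default shape is what makes RBAC auditable: every grant is traceable to a role assignment, never to ad-hoc per-user permissions.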
When we started off our migration, Azure AD was not meeting our expectations, and we had to build our own IaaS-based Active Directory and self-manage syncing between the various geos.
We are now replacing all our IaaS Active Directories with Azure AD, considering extra features like stronger password protection, extended multi-factor authentication and support for SaaS apps using OAuth2 and SAML. More importantly, for managing our secrets and credentials, the combination of Azure Key Vault and Azure AD was the right fit.
In a hybrid cloud you will also need to strategize syncing between on-premises AD and Azure AD using Azure AD Connect.
This will also impact your policy management: do you want to replicate existing policies, or take advantage of new features and create new ones?
Data
This is the core layer and the one most sought after by attackers. From a financial services perspective data needs to be encrypted; we leverage Database-as-a-Service features to encrypt our data.
Transparent Data Encryption and Always Encrypted are two different encryption options.
It's not sufficient to just encrypt the data; it is equally, if not more, important to securely store the encryption keys and credentials. We leverage Azure Key Vault for secure storage of keys and credentials.
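The essential pattern is that application code holds only a key name, and fetches the key material from the vault at runtime. Below is a minimal sketch of that pattern; `VaultStub` is a hypothetical in-memory stand-in for a service such as Azure Key Vault (which in real code you would reach via its SDK), used here only to show the shape of the flow.

```python
# Sketch of the "keys live in a vault, not in code" pattern. VaultStub is a
# hypothetical in-memory stand-in for a real secrets service.

import hashlib
import hmac

class VaultStub:
    """In-memory stand-in: the application only ever holds a key *name*."""

    def __init__(self) -> None:
        self._keys = {}

    def set_secret(self, name: str, value: bytes) -> None:
        self._keys[name] = value

    def get_secret(self, name: str) -> bytes:
        return self._keys[name]

def sign_record(vault: VaultStub, key_name: str, record: bytes) -> str:
    """Integrity-protect a record with a key fetched by name at runtime."""
    key = vault.get_secret(key_name)  # never hard-coded in the application
    return hmac.new(key, record, hashlib.sha256).hexdigest()

if __name__ == "__main__":
    vault = VaultStub()
    vault.set_secret("txn-signing-key", b"rotate-me-regularly")
    tag = sign_record(vault, "txn-signing-key", b"transfer:100:acct-42")
    print(len(tag))  # 64 hex characters for SHA-256
```

Because only the key name appears in code and config, rotating the key is a vault operation with no redeployment, and a leaked repository leaks no key material.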
Storage of critical data – Sometimes it might be mandated to store only mapping IDs of critical client data in the cloud and refer to on-prem systems for the client data itself. This has implications for your overall architecture and design; you need to check whether having only the IDs will be sufficient to carry out the business process. Though cloud storage may be cheaper for archiving data, it may be required to return archived data to on-prem.
Disk Encryption – Ensure you leverage disk encryption if storing data in your VMs.
Ransomware Protection
If all else fails and data has been compromised, it is extremely beneficial to have a fallback option to get your data back, so that you are not held hostage.
An interesting offering is the framework provided by Sheltered Harbor, a financial-industry-led initiative which prescribes standards for protecting against such calamitous events. One of its key architectural recommendations is to create an offline data vault: encrypted, separated and kept offline from the enterprise's infrastructure, it can be the response to a security breach. If your threat modelling shows security to be an overriding concern, and cost is to be managed, then a data vault makes good sense.
Secure your operations
Once the systems are guard-railed, the operations teams take over, and there are a multitude of acronyms/frameworks to help run operations securely:
ASOC (Application Security Orchestration and Correlation) – Consolidation of security findings from across IAST, DAST and SAST tools.
SIEM (Security Information & Event Management) – Consolidation of security events from networks, firewalls, DDoS prevention tools etc.
SOAR (Security Orchestration, Automation & Response) – An automated response to SIEM events; since it includes orchestration, there is a blurring of features between SIEM and SOAR.
Then there is SASE (pronounced "sassy") – Secure Access Service Edge, a term coined by Gartner describing a technology that delivers security controls as a service (so cloud benefits like scalability and elasticity are included) directly to the source using software-defined access. This trend is picking up, and I believe our next transformation program will need to consider it as remote and hybrid working become the norm.
A table of phase-wise security guard rails.
Note: Though there are quite a few references to specific cloud providers' products, from a financial-domain perspective you may need an architecture that is cloud agnostic. In case you do decide to leverage a cloud provider's product, you may need to be prepared with an exit strategy and an exit plan. For example, the European Banking Authority has a set of recommendations/rules for outsourcing to cloud providers.
Preventing malicious attacks and nefarious activities and plugging vulnerabilities is complex. Hopefully a structured methodology of implementing preventive measures along the parallel tracks of process and infrastructure, as described above, will be beneficial. Threats are constantly evolving, so preventive measures can never stop and must constantly evolve too.
The above should give you the gist of the key technical points. I hope the links and thoughts will prompt you to research further into each of these topics.