Back to Basics: System Architecture

In my previous post of the "back to basics" series, I wrote about software architecture. If you'd like to read it before proceeding, it's found here.

Communication

The purpose of any client-server software application is to serve a representation of information via a transport protocol. An example is serving a JSON representation of customer data over HTTP.

The transport layer can exchange data in two schemes:

  • Stream-oriented, which requires an end-to-end connection between client and server (e.g. TCP).
  • Message-oriented, where a prior connection is not needed (e.g. a message broker).

Because attacks on software systems are on the rise, clear-text communication is strongly discouraged: anyone who can observe the traffic can read or alter it. Improve your transport security by encrypting traffic end-to-end between client and server, for example with TLS.
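
As a minimal sketch of encrypted transport, assuming a Python client and TLS as the encryption scheme (the article doesn't prescribe either), the standard library can build a context that verifies the server's certificate and hostname. The function name make_client_tls_context is my own for illustration.

```python
import ssl


def make_client_tls_context() -> ssl.SSLContext:
    """Build a client-side TLS context with certificate and hostname checks on."""
    # create_default_context() enables certificate verification and
    # hostname checking for client connections.
    context = ssl.create_default_context()
    # Assumption: TLS 1.2 is the oldest protocol version we are willing to accept.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_client_tls_context()
```

The context can then be handed to whatever opens the connection, for example context.wrap_socket(sock, server_hostname=...) for a raw socket.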

Validation

Validating user input is a critical function of any application architecture. Input validation achieves two objectives:

  • Ensures data is stored in proper format and representation.
  • Reduces attempts of invasive access (e.g. prevent SQL injection).

If you have control over client requests (e.g. web application, client SDK), push as much of the validation to the client, as it would provide rapid feedback.

But be cautious about skipping the seemingly redundant validation on the server side. Operate on a no-trust basis with any user input, because a user can attempt to circumvent client-side validation. The purpose of client-side validation is to improve the user experience and reduce the number of round-trips to the server; it's not meant to replace server-side validation.

Users of a system are a main vector of attacks against it, and they are a source of vulnerability. Validation can reduce such risk from users.
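
Server-side validation could look roughly like the sketch below. The payload fields (username, age), the rules, and the users table are hypothetical, chosen only to illustrate the idea. The parameterized query is a second line of defense against SQL injection, on top of the format checks.

```python
import re
import sqlite3

# Hypothetical rule: usernames are 3-20 letters, digits or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{3,20}")


def validate_signup(payload: dict) -> list:
    """Return a list of validation errors; an empty list means the input is valid."""
    errors = []
    username = payload.get("username")
    if not isinstance(username, str) or not USERNAME_PATTERN.fullmatch(username):
        errors.append("username must be 3-20 letters, digits or underscores")
    age = payload.get("age")
    # bool is a subclass of int in Python, so reject it explicitly.
    if not isinstance(age, int) or isinstance(age, bool) or not 0 < age < 150:
        errors.append("age must be an integer between 1 and 149")
    return errors


def store_signup(conn: sqlite3.Connection, payload: dict) -> None:
    """Validate on the server, then insert using placeholders, never string concatenation."""
    errors = validate_signup(payload)
    if errors:
        raise ValueError("; ".join(errors))
    conn.execute(
        "INSERT INTO users (username, age) VALUES (?, ?)",
        (payload["username"], payload["age"]),
    )
```

The same rules can be mirrored on the client for fast feedback, but the server runs them regardless of what the client claims to have checked.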

The Application

As demand on a software application grows (more relevant to SaaS systems), a single instance of the application in a production environment may not be sufficient to handle the load. To scale the system to demand, make use of a load-balancer, which is a proxy between the client and the application. It allows multiple nodes of the application to run concurrently, and thus, handle more requests. The load-balancer can employ one of a number of strategies to distribute the load between the nodes. Benefits of using a load-balancer include:

  • The system can scale up as demand grows (and scale down if demand declines, especially when pay-as-you-go cloud services are used to host the application).
  • Reduces downtime, since each node can be exchanged without stopping the system.
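
Round-robin is one of the simplest distribution strategies a load-balancer can use. The sketch below is a toy illustration, not how any particular load-balancer is implemented; the class name and the node names (app-1, app-2, app-3) are made up.

```python
import itertools


class RoundRobinBalancer:
    """Hand out application nodes in a fixed rotation."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        # cycle() repeats the node list indefinitely, in order.
        self._rotation = itertools.cycle(self._nodes)

    def next_node(self):
        """Return the node that should receive the next request."""
        return next(self._rotation)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.next_node() for _ in range(5)]
print(picks)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```

Real load-balancers typically add health checks and may weigh nodes by capacity or current connections, which this sketch leaves out.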

The Database

Almost all software systems store and retrieve data from some storage. In the case of relational databases, it's best to normalize the schema:

  • To reduce data redundancy.
  • To allow the schema to be easily extensible and scalable.

One strategy to speed up lookups on a database is to index the columns that are read frequently. But limit indexing to such columns only, because every index must be updated on each write, which increases write time to the table.
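
To see an index at work, here is a small sketch using SQLite from Python's standard library. The customers table and its columns are hypothetical. The query plan reports whether the lookup uses the index instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, notes TEXT)")
# Index only the column that is looked up frequently; 'notes' stays unindexed.
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM customers WHERE email = ?",
    ("a@example.com",),
).fetchall()
# The human-readable description is the last column of each plan row.
plan_text = " ".join(row[-1] for row in plan_rows)
print(plan_text)
```

The exact wording of the plan varies between SQLite versions, but it should name idx_customers_email when the index is used.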

When querying the database for a dataset, it's best to avoid complex queries (e.g. sub-queries, loops, using stored procedures or functions). It's best to apply atomic queries and leave the filtering to the application, in order to reduce connection time and CPU load on the database. Since there usually is one instance of the database, complex queries can overtax its resources. Many application nodes can be spun up to do the work instead.

Many applications reach a relational database through a connection pool, which limits the number of concurrent connections. It's best to limit the pool size to a small number and set the connection timeout to a few seconds, so that a request fails quickly instead of holding on to a connection until it times out. With this approach, database resources are released faster for other clients, which reduces the chance of exhausting them.
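
The fail-fast idea can be sketched with a small, bounded pool. This is an illustration under my own assumptions: the class name ConnectionPool, the factory argument and the acquire_timeout parameter are not from any specific library, and production systems would normally use the pool that ships with their database driver or framework.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """A fixed-size pool that fails fast when no connection is free."""

    def __init__(self, factory, size=2, acquire_timeout=2.0):
        self._acquire_timeout = acquire_timeout
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open all connections up front

    @contextmanager
    def connection(self):
        """Borrow a connection, returning it to the pool when the block exits."""
        try:
            conn = self._idle.get(timeout=self._acquire_timeout)
        except queue.Empty:
            # Fail quickly rather than waiting indefinitely for a connection.
            raise TimeoutError("no database connection available") from None
        try:
            yield conn
        finally:
            self._idle.put(conn)
```

A caller would write something like "with pool.connection() as conn: ..." and handle TimeoutError by returning an error to the client or retrying later.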

To maintain an accurate timeline of CRUD operations, it's best practice to use UTC timestamps. Organizations that rely on local timestamps find it difficult to trace back what happened when, particularly across time zones and daylight-saving changes. UTC timestamps remove that ambiguity.
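
In Python, for example, producing and normalizing UTC timestamps takes a few lines. The helper names utc_now_iso and to_utc are mine, and the sample date is arbitrary.

```python
from datetime import datetime, timedelta, timezone


def utc_now_iso() -> str:
    """Current time as a timezone-aware ISO 8601 string in UTC."""
    return datetime.now(timezone.utc).isoformat()


def to_utc(moment: datetime) -> datetime:
    """Convert a timezone-aware datetime to UTC; refuse naive values."""
    if moment.tzinfo is None:
        raise ValueError("naive datetime: the original time zone is unknown")
    return moment.astimezone(timezone.utc)


# 09:30 in a UTC-5 zone is 14:30 UTC.
local = datetime(2024, 3, 1, 9, 30, tzinfo=timezone(timedelta(hours=-5)))
print(to_utc(local).isoformat())  # 2024-03-01T14:30:00+00:00
```

Converting to local time then becomes a display concern, handled at the edge of the system.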

Failover Redundancy

No system is perfect, and failures happen. One of these failures could affect access to data. To protect against such failure, it's prudent to have a back-up of the database that is updated in real-time or near real-time, and could be swapped with the main database in case of failure, to ensure continuous operation of the system with minimal disruption.

Database Capacity

As database records grow, available storage space shrinks. This calls for strategies to avoid hitting the ceiling of available space. A couple of such strategies:

  • Warehousing historical records that serve no further use in transactions.
  • Sharding, which splits records across multiple database instances and is the better strategy for scaling the database.
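
A common way to decide which shard holds a record is to hash a stable key. The sketch below assumes hash-based sharding on a customer identifier; the function name shard_for and the shard count are illustrative.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a shard index in the range [0, shard_count)."""
    if shard_count < 1:
        raise ValueError("shard_count must be positive")
    # A cryptographic hash gives a stable, evenly spread value across runs
    # (Python's built-in hash() is randomized per process for strings).
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count
```

One caveat with this simple modulo scheme: changing shard_count moves most keys to a different shard, so growth has to be planned for (for example with consistent hashing or a lookup table).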

Data is the center around which everything else in a system orbits. It's the primary target of unauthorized access attempts. I described the importance of securing data in transport; it is equally important to secure data at rest. Below are some methods to achieve that:

Data Encryption

Apply encryption as much as possible. At minimum, it should apply to passwords and any Personally Identifiable Information (PII). Data encryption can take two forms:

  • Encryption by key, where it can be decrypted using the key.
  • Hash function, which is a one-way transformation that can't be reversed (strictly speaking it is not encryption, since there is no key to decrypt with). The original data is mapped to its hash, but not the opposite.

Hashing is ideal for data that the business would not need to read back (e.g. passwords). For data such as a customer phone number, key-based encryption is appropriate.
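
For passwords, a salted, deliberately slow hash is the usual choice. Here is a minimal sketch using PBKDF2 from Python's standard library; the iteration count is an illustrative value to tune for your hardware, and the function names are my own.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative; tune upward as hardware allows


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Only the salt and digest are stored. At login the candidate password is hashed again and compared, so the original password never needs to be recovered.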

An example of data security is a company that hosts identity and user management API. After a user password is hashed, a portion of the hash is stored in one database, and the rest of it is stored in a different database altogether, not co-located with the first one.

Database Encryption

To take database security a step further, encrypt the database files themselves. Thereby, even if some unauthorized user gets hold of the database itself, they will have a difficult time extracting its content.

Proxy Access

To take database security even further, firewall the database behind a DMZ, and limit all access to it through a proxy. With the proxy:

  • No user is allowed to access the database directly, which obscures the source of data.
  • Every access to the database leaves an audit trail.

Caching

Caching data is a good practice, because it reduces access time to data and reduces load on the database. Light-weight memory-caching databases have low overhead on writes, and have fast lookup times. Caching non-transactional data that is accessed frequently and seldom changed can provide a great boost to system performance. Beware of having cached data become stale and out of date. Update the cache periodically or as needed.
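
One way to keep cached data from going stale is to give each entry a time-to-live (TTL). The sketch below is an in-process illustration of that idea, not a stand-in for a dedicated caching database; the class name TTLCache and the loader callback are my own.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds, reloading them after expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}   # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)  # e.g. a database lookup
        self._entries[key] = (now + self._ttl, value)
        return value
```

The database is hit only on the first request and again after the TTL has passed; everything in between is served from memory.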

Using client-side caching can save the client from server-side lookup. Non-sensitive and non-critical data (e.g. user settings) are well suited for client-side caching. Client-side caching tools vary from HTTP cookies to light-weight storage.

Analytics

It's a good practice to store analytical data in a separate database server from transactional data. It frees transactional database resources to serve client requests.

For non-real-time analytics, update the analytics database when the transactional database is least active (e.g. overnight). Run cron jobs to read the data, anonymize it and roll it up for analytics.
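
The roll-up step of such a job might look like the sketch below. The order fields (day, customer_id, amount_cents) are hypothetical. Anonymization here is done by aggregation: only per-day totals leave the function, never individual customer identifiers.

```python
from collections import defaultdict


def daily_rollup(orders):
    """Aggregate orders into per-day totals without exposing customer identifiers."""
    buckets = defaultdict(lambda: {"orders": 0, "revenue_cents": 0, "customers": set()})
    for order in orders:
        bucket = buckets[order["day"]]
        bucket["orders"] += 1
        bucket["revenue_cents"] += order["amount_cents"]
        bucket["customers"].add(order["customer_id"])
    # Keep only the count of distinct customers, not who they were.
    return {
        day: {
            "orders": bucket["orders"],
            "revenue_cents": bucket["revenue_cents"],
            "distinct_customers": len(bucket["customers"]),
        }
        for day, bucket in buckets.items()
    }
```

A cron job would read the day's transactions, pass them through a function like this, and write the result to the analytics database.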

Logs

Logs are nowadays customary in system architecture. They should record enough detail to provide a raw record of all access to the application. Logs are also a good reference for tracing and debugging issues.

That said, logging sensitive data is a common mistake. When an application secures data in transport and at rest, storing plain-text data in log entries defeats the whole purpose (some major data breaches took place by scraping logs containing sensitive data). For example:

  • Delegating password hashing to the database means the plain-text password is sent in the query, so database logs may capture it. Hash the password in the application.
  • It's common for ISPs, proxies and web servers to log the URLs of requests they process. Accepting a user password or token as a URL query string parameter means those logs would capture it. Place it in the body of the HTTP request instead of the URL.
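
As a safety net inside the application, a logging filter can mask sensitive values before they are written. The sketch below uses Python's logging module; the pattern (password= and token= pairs) is an assumption about how such values appear in messages and would need adapting to your own log format.

```python
import logging
import re

# Assumption: sensitive values show up as key=value pairs in log messages.
SENSITIVE = re.compile(r"(password|token)=([^&\s]+)", re.IGNORECASE)


class RedactingFilter(logging.Filter):
    """Mask password and token values in log messages before they are emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first, so values passed as args are covered too.
        record.msg = SENSITIVE.sub(r"\1=[REDACTED]", record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


logger = logging.getLogger("app")
logger.addFilter(RedactingFilter())
```

A filter like this is a backstop, not a substitute for keeping secrets out of log statements in the first place.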

As with database timestamps, it's good practice to tag log entries with UTC timestamps too.

Authentication and Permission

Last but not least, authentication and permission. They are some of the principal tools for upholding the CIA information security triad:

  • Confidentiality: authentication and access control of Personally Identifiable Information and business information.
  • Integrity: protect data from unauthorized modifications.
  • Availability: ensuring data is available when needed.

Permission to access data should apply the principle of least privilege, which means:

  • Time: shortest time necessary.
  • Scope: smallest scope required.
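
Both dimensions, time and scope, can be captured in a grant that expires and lists exactly what it permits. The sketch below is illustrative only; the Grant type, the scope strings such as "orders:read", and the function names are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    subject: str
    scopes: frozenset   # smallest set of permissions required
    expires_at: float   # shortest lifetime necessary, as a Unix timestamp


def issue_grant(subject, scopes, ttl_seconds, now=None):
    """Create a grant limited to the given scopes and lifetime."""
    issued_at = time.time() if now is None else now
    return Grant(subject, frozenset(scopes), issued_at + ttl_seconds)


def is_allowed(grant, scope, now=None):
    """Allow an action only if it is in scope and the grant has not expired."""
    current = time.time() if now is None else now
    return scope in grant.scopes and current < grant.expires_at
```

For example, a reporting job could receive a five-minute grant with only "orders:read": it can't write, and the grant is useless once the job's window has passed.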

Finally, don't wait until your systems are overwhelmed, hacked or leaked. It will be a lot more expensive to restore people's faith and trust.
