Programming/designing for scale

There are some terms or skills (e.g. scale, cloud-centric) that appear in almost every job description for senior developers (by developers, I mean anyone involved in developing a software application: implementers, designers/junior architects, and dev managers). I wish these skills could be learnt by following a particular language or framework, but that is not the case. So we face two very basic questions:

  a. What is expected when you are asked to code, design, or architect for scale (which can also be rephrased as 'for the cloud')?
  b. How do you transition from single-process thinking to the uncertain and complex distributed world?

 

A disclaimer first: I don't claim to be an expert who can answer these questions, but I have definitely been looking for answers for quite some time. My knowledge comes from extensive reading and some implementations limited to the JEE stack. If you use an application server (e.g. WebLogic/JBoss for JEE applications, or IIS for Microsoft technologies), it helps solve many of the problems in distributed applications, such as transaction management, security, and session management (prefix the word 'distributed' to each of these terms and they become really complex). There is plenty of technical material on how each solution addresses these problems, but here we care about the problems, not the solutions. Being aware of the problems is more important; the solutions will come.

 

Coming back to the two questions now. The answer to 'a' can be summed up as being aware of the problems in a distributed application, because these are the bottlenecks in scaling your application. Following is a limited list of suggestions for developing a web application:

  • Session Management - Avoid sessions if your application doesn't really need them; stateless applications scale best. Session management is an expensive job for which you depend on the application server. If sessions are unavoidable, keep as little data in them as possible; this helps your session management component sync session data across the nodes of the cluster faster. Persisting session data on the file system has its own issues: for short-lived sessions, if the node handling a session's requests goes down, the session info persisted on its file system can only be read after the node restarts, by which time it may be too late.
  • Security - Move most security-related concerns to a separate service outside your web application (SSO is one way to do this). This lets you modify and apply your security settings in one place only, without any downtime for existing nodes.
  • Deployment strategy - How do you add a new node to your cluster, and how easy is it? Does your application need too many parameters at start-up, or does it depend on files on a (shared) file system? Both of these make it difficult to scale. Try to minimize the number of classpath variables and arguments your web application requires.
  • Move configuration outside application code - The web application archive (WAR in JEE terminology) should not contain any configuration files that you expect to change on the fly. All such files should be centralized, since the configuration must be the same across the entire cluster. Apache ZooKeeper is one solution that provides distributed configuration management, but the same can be accomplished by storing all configuration in your database and maintaining a distributed cache of it, so that any change can be notified to all nodes in the cluster. The latter should be cheaper, as most applications already maintain a cache for their domain data.
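To make the deployment point concrete, here is a minimal Java sketch, assuming each node reads what it needs from environment variables with sensible defaults, so that adding a node means starting the same build with no extra arguments. The variable names (`APP_DB_URL`, `APP_HTTP_PORT`, `APP_CONFIG_CACHE_HOST`) and the `AppSettings` class are hypothetical, chosen only for illustration.

```java
import java.util.Map;

// Hypothetical settings holder: every value comes from the environment
// with a default, so a new node needs no start-up arguments.
final class AppSettings {
    final String dbUrl;
    final int httpPort;
    final String configCacheHost;

    private AppSettings(String dbUrl, int httpPort, String configCacheHost) {
        this.dbUrl = dbUrl;
        this.httpPort = httpPort;
        this.configCacheHost = configCacheHost;
    }

    // Takes any map so it can be tested; production code passes System.getenv().
    static AppSettings from(Map<String, String> env) {
        return new AppSettings(
            env.getOrDefault("APP_DB_URL", "jdbc:example://db-host/app"), // placeholder URL
            Integer.parseInt(env.getOrDefault("APP_HTTP_PORT", "8080")),
            env.getOrDefault("APP_CONFIG_CACHE_HOST", "cache-host:11211")); // placeholder host
    }
}

class AppSettingsDemo {
    public static void main(String[] args) {
        AppSettings s = AppSettings.from(System.getenv());
        System.out.println("port=" + s.httpPort + ", db=" + s.dbUrl);
    }
}
```

Keeping the only input a map of name/value pairs means the same artifact runs unchanged on every node; anything that differs per environment is injected by whatever starts the node.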
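The database-plus-cache approach to centralized configuration can be sketched as follows. This is a single-JVM illustration under stated assumptions: `CentralConfigStore` stands in for the configuration table in your database, and the direct call to each node's `onChange` stands in for whatever invalidation or pub-sub mechanism your distributed cache provides. Both classes are hypothetical, not a library API, and the sketch ignores concurrency edge cases such as a change arriving while a node is joining.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-in for the central store (e.g. a configuration table in the database).
// In a real cluster the notification would travel over the distributed cache's
// invalidation mechanism; here it is a plain method call.
final class CentralConfigStore {
    private final Map<String, String> values = new ConcurrentHashMap<>();
    private final List<NodeConfigCache> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(NodeConfigCache node) {
        subscribers.add(node);
        node.reloadFrom(values); // warm the node's cache when it joins
    }

    // The single place where configuration changes for the whole cluster.
    void put(String key, String value) {
        values.put(key, value);
        for (NodeConfigCache node : subscribers) {
            node.onChange(key, value);
        }
    }
}

// Each node keeps a local copy so reads never hit the database.
final class NodeConfigCache {
    final String nodeName;
    private final Map<String, String> local = new ConcurrentHashMap<>();

    NodeConfigCache(String nodeName) { this.nodeName = nodeName; }

    void reloadFrom(Map<String, String> snapshot) {
        local.clear();
        local.putAll(snapshot);
    }

    void onChange(String key, String value) { local.put(key, value); }

    String get(String key, String fallback) { return local.getOrDefault(key, fallback); }
}

class ConfigDemo {
    public static void main(String[] args) {
        CentralConfigStore store = new CentralConfigStore();
        store.put("feature.newCheckout", "false"); // hypothetical key

        NodeConfigCache a = new NodeConfigCache("node-a");
        NodeConfigCache b = new NodeConfigCache("node-b");
        store.subscribe(a);
        store.subscribe(b);

        // One change, no redeploy: both nodes see the new value.
        store.put("feature.newCheckout", "true");
        System.out.println(a.get("feature.newCheckout", "false")); // true
        System.out.println(b.get("feature.newCheckout", "false")); // true
    }
}
```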

 

There is a big list of TODOs (and, more importantly, NOT-TODOs), but I would rather hear from you before adding more.

 

The answer to 'b' is related to 'a'. Apart from taking care of the above, I would suggest you start installing clusters. Since the early days of our careers (and also for productivity), we developers install single nodes on our systems and hence never face these issues. Learning to install clusters and load balancers will take you closer to how things actually work in real life. Apart from the steps mandated by the particular load-balancing software you use, there should not be much to it, and the effort is worth it.
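Installing a real cluster is the real lesson, but the first surprise it usually delivers can be simulated in a few lines of Java. The sketch below assumes a naive round-robin load balancer with no sticky sessions; `Node`, `RoundRobinBalancer`, and `ClusterDemo` are hypothetical classes, and a shared `ConcurrentHashMap` stands in for a distributed session store.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// A node serves requests; where it keeps session data is pluggable.
final class Node {
    final String name;
    private final Map<String, String> sessions; // sessionId -> user

    Node(String name, Map<String, String> sessions) {
        this.name = name;
        this.sessions = sessions;
    }

    void login(String sessionId, String user) { sessions.put(sessionId, user); }

    // Returns the logged-in user, or null if this node has never seen the session.
    String whoAmI(String sessionId) { return sessions.get(sessionId); }
}

// Naive round-robin balancer without sticky sessions.
final class RoundRobinBalancer {
    private final List<Node> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinBalancer(List<Node> nodes) { this.nodes = nodes; }

    Node pick() {
        // floorMod keeps the index valid even after the counter overflows.
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }
}

class ClusterDemo {
    static String simulate(RoundRobinBalancer lb) {
        lb.pick().login("s1", "alice"); // request 1 lands on the first node
        return lb.pick().whoAmI("s1");  // request 2 lands on the second node
    }

    public static void main(String[] args) {
        // Each node has its own session map.
        RoundRobinBalancer local = new RoundRobinBalancer(List.of(
            new Node("a", new ConcurrentHashMap<>()),
            new Node("b", new ConcurrentHashMap<>())));
        System.out.println("node-local sessions: " + simulate(local));

        // Both nodes share one store.
        Map<String, String> shared = new ConcurrentHashMap<>();
        RoundRobinBalancer sharedLb = new RoundRobinBalancer(List.of(
            new Node("a", shared), new Node("b", shared)));
        System.out.println("shared session store: " + simulate(sharedLb));
    }
}
```

Running `main` prints `node-local sessions: null` followed by `shared session store: alice`: with node-local state, the second request lands on a node that has never seen the session. Sticky sessions at the load balancer are another common workaround, but they bring back the failure case described under Session Management when the node holding the session goes down.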

 

This post is just a thought that has been on my mind, and I could compile it only because I woke up toooo early this morning :)

Hope it strikes a chord!

 


More articles by Kuldeep Tiwari
