Decoding Performance: A Scalability Showdown Between CalculiX and Code_Aster

In the demanding world of Finite Element Analysis (FEA), performance is king. When tackling complex engineering problems, especially with large datasets and intricate geometries, how efficiently your software scales with increasing computational resources can make or break your analysis time. This is precisely what we set out to investigate: a deep dive into the scalability of two prominent open-source FEA solvers, CalculiX and Code_Aster.

Our journey was driven by a common pain point for many users of open-source software: the sheer number of configuration options. With variables like geometry, mesh, loads, and conditions, it’s easy to get lost in the labyrinth of possibilities. To bring clarity, we performed a rigorous study, testing over 300 variations of the same FEA problem to establish some practical guidelines for users, even if they are highly specific to our benchmark.

Understanding Scalability and Speedup

Before we dive into the results, let’s quickly define our key metrics:

  • Scalability: Simply put, scalability is about how much the solution time decreases as we increase the number of hardware resources (like CPU cores) used.
  • Speedup (S): This measures how much faster a problem is solved as the number of CPUs increases. The formula is S = T1 / TN, where T1 is the time taken with one CPU and TN is the time taken with N CPUs.
  • Efficiency (E): This tells us how effectively the speedup is achieved relative to the increase in CPUs. The formula is E = S / N = T1 / (N * TN).
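
In code, these definitions are just a couple of one-liners (a minimal Python sketch; the timings below are hypothetical placeholders, not measurements from this study):

```python
def speedup(t1, tn):
    """Speedup S = T1 / TN."""
    return t1 / tn

def efficiency(t1, tn, n):
    """Efficiency E = S / N = T1 / (N * TN)."""
    return speedup(t1, tn) / n

# Hypothetical timings: 1 CPU takes 1200 s, 8 CPUs take 200 s.
s = speedup(1200, 200)        # 6.0
e = efficiency(1200, 200, 8)  # 0.75 — each CPU is used at 75% effectiveness
```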

As we increase the number of CPUs, speedup generally increases until a certain limit is hit. Pushing beyond this limit can lead to diminishing returns, meaning low efficiency, no speedup at all, or even a reduction in performance. This limit, however, is not fixed and depends on several factors, including the software version, hardware used, and the nature of the simulation (e.g., the models involved and the mesh density).
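
This plateau is well captured by Amdahl's law, a standard model (not one used in the study itself): if a fraction p of the work is parallelizable, the best achievable speedup on N CPUs is S(N) = 1 / ((1 - p) + p/N). A quick sketch shows the diminishing returns:

```python
def amdahl_speedup(p, n):
    """Upper bound on speedup when a fraction p of the work is parallel (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelized, returns diminish quickly:
for n in (1, 2, 4, 8, 16, 64):
    print(n, round(amdahl_speedup(0.95, n), 2))
# The limit as n grows is 1 / (1 - p) = 20x, no matter how many CPUs are added.
```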

Our Testbed: Hardware and Software

To conduct our experiments, we utilized a Cloud HPC environment, specifically leveraging:

  • CPUs: AMD EPYC 7B13 @ 2.8 GHz and Intel Xeon Platinum 8581C @ 2.9 GHz. We explored configurations with varying core counts and threads per core (denoted XX-YY, where XX is the number of cores and YY the number of threads per core).
  • RAM: 8 GB per vCPU allocated.

On the software front, we tested:

  • Code_Aster 17.0: Utilizing a Singularity container compiled in-house.
  • CalculiX 2.21: Publicly available version, as well as versions compiled in-house with PARDISO (v2.19) and PARDISO with MPI (v2.18).

The FEM Model: A Consistent Benchmark

To ensure a fair comparison, we employed a single, well-defined FEA model:

  • Analysis Type: Static linear analysis.
  • Elements: 3D elements with a tetrahedral mesh.
  • Loading: Constrained with gravity force only.
  • Post-processing: Not considered; only displacement was calculated.
  • Size: 3,984,763 cells and 865,432 nodes.


We also explored different solver configurations within both Code_Aster and CalculiX, including various MUMPS and PETSc options for Code_Aster, and default, Spooles, PARDISO MultiThread, and PARDISO MPI for CalculiX.
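
For orientation, switching solvers in Code_Aster is a one-keyword change in the command file (which uses Python syntax). The fragment below is a hedged sketch following standard Code_Aster command syntax; MODELE, CHMAT, and CHARG are placeholder objects that would be defined earlier in a real command file, not definitions from this study:

```python
# Code_Aster command-file fragment (not standalone Python): the linear
# solver is chosen via the SOLVEUR keyword of MECA_STATIQUE.
RESU = MECA_STATIQUE(
    MODELE=MODELE,                # model defined earlier in the command file
    CHAM_MATER=CHMAT,             # material field assignment
    EXCIT=_F(CHARGE=CHARG),       # the gravity load in our benchmark
    SOLVEUR=_F(METHODE='MUMPS'),  # or METHODE='PETSC' for the PETSc options
)
```

For CalculiX, the thread count used by the multithreaded solvers is typically controlled through the OMP_NUM_THREADS environment variable before launching ccx.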

Key Findings and Observations:

Our extensive testing revealed some compelling trends:

1. Code_Aster Scalability with Hyperthreading: Increasing the number of threads in Code_Aster reduced solution times, but only up to a limited number of threads per core (about 4). Solver configurations such as PETS-4 and MULT-5 / MUMP-2 showed the most promising hyperthreading scalability within this range.


2. Code_Aster Scalability with Multi-Core: The time reduction with increasing CPU cores was most significant when combined with 2 threads per core. As with hyperthreading, certain solver configurations such as PETS-4 and PETS-3 demonstrated better multi-core scalability under these conditions.


3. Intel vs. AMD Performance: When comparing Intel and AMD processors, we found no major difference in computational speed for the tested configurations. The primary distinction lay in the LLC (Last Level Cache) allocation, which didn’t translate into significant performance disparities in our benchmark.


4. NITER and Scalability: Testing with NITER = 10 (both hyperthread and multicore) provided further insights into how the number of iterations affects scalability. The results generally aligned with previous observations, with specific solvers showing better scaling tendencies.


5. STAT vs. MECA Scalability: Comparing Code_Aster's MECA_STATIQUE and STAT_NON_LINE analysis operators, we observed that MECA_STATIQUE generally offered better scalability across the tested configurations.


6. CalculiX vs. Code_Aster Comparison: This was a crucial part of our study. We observed that CalculiX, particularly with its PARDISO solver (with and without MPI), often demonstrated superior scalability compared to Code_Aster in our benchmark. CalculiX with PARDISO remained effective at up to 200,000 nodes per core, while Code_Aster's thread scalability hit its limit sooner (around 2-4 threads; some solvers handled up to 100,000 nodes per core, but with significant RAM consumption).
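
To relate those nodes-per-core figures back to this benchmark's mesh, a small helper converts a per-core workload into a core count (a sketch; the function name and the interpretation of the ceilings are ours, and the limits are specific to this benchmark):

```python
import math

def cores_for_workload(num_nodes, nodes_per_core):
    """Core count at which the per-core workload drops to nodes_per_core."""
    return math.ceil(num_nodes / nodes_per_core)

NODES = 865_432  # node count of the benchmark model

# CalculiX + PARDISO stayed efficient at up to ~200k nodes per core:
print(cores_for_workload(NODES, 200_000))  # 5
# Code_Aster's better configurations reached ~100k nodes per core:
print(cores_for_workload(NODES, 100_000))  # 9
```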


Conclusions and Recommendations:

Our in-depth analysis leads to these key takeaways:

For Code_Aster:

  • MECA_STATIQUE is the preferred analysis operator for scalability when the problem is linear.
  • PETSc solvers also show good performance.
  • Be mindful of thread limits: beyond 2 threads per core (4 at most), speedups stop being proportional and RAM consumption can rise sharply.
  • Intel and AMD architectures showed comparable performance in our tests.

For CalculiX:

  • The PARDISO solver (with or without MPI) is a strong contender for achieving excellent scalability.
  • CalculiX demonstrates impressive scalability, handling up to 200,000 nodes per core effectively.

This study underscores the importance of understanding your software’s scalability characteristics and how they interact with your hardware and the specific problem you’re trying to solve. While our findings are based on a specific benchmark, they offer valuable insights for users looking to optimize their FEA workflows with open-source tools.

Have you had similar experiences with CalculiX or Code_Aster? Share your thoughts and findings in the comments below!

