Thursday, November 19, 2009

The birth of QuickThread

Most parallel programming extensions (OpenMP, Threading Building Blocks, Cilk, etc.) share a design heritage of incrementally adding parallel constructs onto the primary programming language (e.g. C, C++, FORTRAN) without considering the specific requirements of any particular application. Each of these extensions is the culmination of those incremental steps: as features are required, they may or may not get incorporated into the extension.

Although QuickThread has a birth date of November 18, 2009 (the date it became available for purchase), it had a rather long gestation period and courtship. Development took over four years. The initial design criteria were to solve a specific problem, taking into account how that problem's requirements would evolve over a period of 10 to 15 years. The architectural design and principal functionality of QuickThread were laid out in 2006, based on how I envisioned computing platforms would evolve through 2020. The goal was to solve a particularly difficult engineering problem that needs to be solved within that time frame.

The engineering problem is a high fidelity simulation of a 2nd generation space elevator. Whereas a 1st generation space elevator is generally a passive tethered satellite using one tether of approximately 100,000 km, a 2nd generation space elevator is constructed using several dynamic tethers (tethers in motion). The total length of tether material for a 2nd generation space elevator might reach 300,000 km.

A high fidelity simulation might require examining the dynamics of the tether using 1 m segments. That is about 300 million tether segments, each requiring 100 to 300 data points, with simulated (virtual) run times of multiple years. This requires a massive amount of computation.
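To put rough numbers on this, here is a back-of-the-envelope sketch of the segment count and the memory needed for one snapshot of the simulation state. The function names are hypothetical, and the storage size is an assumption: each data point is taken to be one 8-byte double, which the figures above do not specify.

```cpp
#include <cstdint>

// ASSUMPTION (not from the simulation itself): one data point = one 8-byte double.
constexpr std::uint64_t kBytesPerValue = 8;

// Number of segments for a tether of tetherKm kilometers,
// cut into pieces of segmentMeters meters each.
constexpr std::uint64_t segmentCount(std::uint64_t tetherKm,
                                     std::uint64_t segmentMeters) {
    return tetherKm * 1000 / segmentMeters;
}

// Bytes needed to hold one snapshot of the state:
// segments * data points per segment * bytes per data point.
constexpr std::uint64_t stateBytes(std::uint64_t segments,
                                   std::uint64_t valuesPerSegment) {
    return segments * valuesPerSegment * kBytesPerValue;
}
```

Under that assumption, segmentCount(300000, 1) gives 300 million segments, and stateBytes for 100 to 300 data points per segment gives roughly 240 GB to 720 GB for a single copy of the state, before any working storage.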

The computational requirements of the simulation were amenable to an SMP processing architecture, but not to a message passing architecture. At the time QuickThread was designed, AMD had NUMA capable Opteron dual core processors. I anticipated that Intel would offer many core NUMA capable processors in the then near future, and that it would be reasonable to expect 8, 16 or 32 core NUMA capable processors in 2010 to 2015. Systems configured using multiples of these processors would be suitable for computing mid-fidelity simulations of 2nd generation space elevators.

With the computational requirements I personally demanded, the internals of QuickThread had to be fully NUMA capable and support multiple levels of processor cache and multiple levels of NUMA separation. This also required the thread scheduler to be fully cognizant of the platform capabilities, without requiring programming changes as platforms evolved, and to be able to schedule tasks to a specific cache level and/or NUMA node.

QuickThread also had to be an asynchronous tasking system with independent task group synchronization capabilities. This was the principal design requirement of the thread scheduler within QuickThread. A secondary, but nearly as important, design requirement was that integrating the language extension be relatively easy for a programmer working with large legacy applications. In my case, the simulation code I began with had roots extending back to the mid-1980s and consisted of approximately 750 source modules and 600,000 lines of code.

This was the basis for the initial design requirements of QuickThread. Additional posts will not delve too much into the historical aspects of QuickThread, but rather into its features, capabilities and ease of programming.

Jim Dempsey
