

Biography
Michael Quell was born in 1993 in Vienna, Austria. He received his Bachelor's degree and the degree of Diplomingenieur in Technical Mathematics from the Technische Universität Wien in 2016 and 2018, respectively. After finishing his studies he joined the Institute for Microelectronics in June 2018, where he is currently working on his doctoral degree. He is researching high-performance algorithms and data structures within the scope of the Christian Doppler Laboratory for High Performance TCAD.
A Parallel Velocity Extension for Level-Set-Based Material Flow on Hierarchical Meshes in Process TCAD
The level-set method is widely used for high-accuracy three-dimensional topography simulations in process technology computer-aided design (TCAD) due to its robustness to topological changes. Particularly challenging are material flow processes, such as oxidation, reflow and silicidation, as these require the solution of intricate physical models and the extension of the model-dependent velocity fields to the entire simulation domain at every time step in order to accurately compute advection.
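To make the idea of a velocity extension concrete, the following is a minimal one-dimensional sketch in Python. It is not the algorithm used in this work: it assumes a brute-force closest-point rule in which every grid point copies the velocity of its nearest interface position. The names `extend_velocity_1d` and `interface_velocity` are hypothetical and chosen only for illustration.

```python
def extend_velocity_1d(phi, interface_velocity, dx=1.0):
    """Extend interface velocities to all points of a 1D grid.

    phi: level-set values at the grid points (sign changes mark the interface).
    interface_velocity: hypothetical callable mapping an interface position
        to the velocity given by the physical model at that position.
    dx: grid spacing.
    """
    # Locate interface positions by linear interpolation between sign changes.
    crossings = []
    for i in range(len(phi) - 1):
        a, b = phi[i], phi[i + 1]
        if a == 0.0:
            crossings.append(i * dx)
        elif a * b < 0.0:
            crossings.append((i + a / (a - b)) * dx)
    if len(phi) > 0 and phi[-1] == 0.0:
        crossings.append((len(phi) - 1) * dx)
    if not crossings:
        raise ValueError("no interface found in phi")

    # Brute force (assumption of this sketch): each grid point takes the
    # velocity of the closest interface position.
    extended = []
    for i in range(len(phi)):
        x = i * dx
        nearest = min(crossings, key=lambda c: abs(c - x))
        extended.append(interface_velocity(nearest))
    return extended


# Two interfaces, at x = 2.5 and x = 5.5, with different velocities.
phi = [-2.5, -1.5, -0.5, 0.5, 1.5, 0.5, -0.5, -1.5]
velocities = extend_velocity_1d(phi, lambda x: 1.0 if x < 4.0 else 3.0)
```

The brute-force search costs one pass over all interface positions per grid point, which is why production implementations rely on ordered schemes such as the fast marching method instead.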
The velocity extension is an additional computational task at every time step. Its cost is significant, given that material flow simulations can easily require several hundred time steps and are applied multiple times in cutting-edge fabrication processes of integrated circuits. Therefore, existing scalar and vector velocity extension algorithms for level-set-based material flow simulations on hierarchical meshes are optimized and parallelized, reducing the overall turnaround time of TCAD workflows.
The performance of the algorithm is evaluated by investigating a representative material flow simulation of a three-dimensional thermal oxidation process of silicon at 1000 °C for 15 minutes. The initial material layout and the final material layout are shown in Fig. 1. Fig. 2 shows a parallel speedup of 7.1 for the vector-valued extension and 6.6 for the scalar-valued extension for 10 threads; the latter outperforms a previous approach by up to 60%. The performance gain of the scalar velocity extension is attributed to two changes: i) a reduction in global synchronization barriers in the data exchange step between different meshes of the hierarchical mesh; and ii) a dynamic splitting of the workload of a task if it exceeds a certain threshold.
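The second change, splitting a task once its workload exceeds a threshold, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation evaluated here: a task is modeled as a half-open index range, it is split by recursive halving, and `split_task`, `process_tasks`, `work` and `threshold` are hypothetical names. Python threads are used only to show the scheduling structure; they do not deliver the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def split_task(start, stop, threshold):
    """Halve the index range [start, stop) until no piece exceeds threshold."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if stop - start <= threshold:
        return [(start, stop)]
    mid = (start + stop) // 2
    return split_task(start, mid, threshold) + split_task(mid, stop, threshold)


def process_tasks(tasks, work, threshold, workers=4):
    """Split oversized tasks, then run all pieces on a thread pool.

    tasks: list of (start, stop) index ranges, one per original task.
    work: hypothetical callable work(start, stop) returning a partial result.
    """
    pieces = [piece for (a, b) in tasks for piece in split_task(a, b, threshold)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: work(*p), pieces))
    return pieces, partials


# One large task and one small task; only the large one is split.
pieces, partials = process_tasks(
    [(0, 10), (10, 13)],
    work=lambda a, b: sum(range(a, b)),
    threshold=4,
)
```

Splitting only the oversized tasks keeps the number of scheduled pieces small while preventing a single large task from leaving the remaining threads idle.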
The original vector velocity extension uses a serial implementation based on the fast marching method, whereas the new implementation employs the same parallelization strategies as the new scalar velocity extension. The reason for a better parallel speedup of the vector-valued extension over the scalar-valued extension is the higher workload per point of the vector-valued extension (three times as many computations), which reduces the synchronization overhead in relation to the total workload.
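The effect of a higher workload per point can be illustrated with a toy cost model. The model is an assumption made for this sketch and is not taken from the measurements: the parallel run time is the work divided by the thread count plus a synchronization overhead that does not grow with the work. The overhead value below is illustrative only, and the function names are hypothetical.

```python
def modeled_speedup(work, sync_overhead, threads):
    """Speedup under the assumed model T(p) = work / p + sync_overhead."""
    return work / (work / threads + sync_overhead)


def parallel_efficiency(speedup, threads):
    """Fraction of the ideal linear speedup that is actually achieved."""
    return speedup / threads


# Illustrative values: same overhead, three times the work per point.
scalar_model = modeled_speedup(work=1.0, sync_overhead=0.05, threads=10)
vector_model = modeled_speedup(work=3.0, sync_overhead=0.05, threads=10)

# Efficiencies implied by the speedups reported for 10 threads.
vector_efficiency = parallel_efficiency(7.1, 10)
scalar_efficiency = parallel_efficiency(6.6, 10)
```

In this model the tripled workload always yields the larger speedup, because the fixed overhead makes up a smaller share of the total run time. The reported speedups of 7.1 and 6.6 correspond to parallel efficiencies of about 71% and 66%.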
Fig. 1: Material layout before (left) and after (right) the thermal oxidation process.
Fig. 2: Comparison of the parallel speedup for the scalar and vector velocity extension implementations.