Research Papers

Stochastic Simulations With Graphics Hardware: Characterization of Accuracy and Performance

Author and Article Information
Arvind Balijepalli

Manufacturing Engineering Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899; Department of Mechanical Engineering, University of Maryland, College Park, MD 20742
arvind@nist.gov

Thomas W. LeBrun

Manufacturing Engineering Laboratory, National Institute of Standards and Technology, Gaithersburg, MD 20899
lebrun@nist.gov

Satyandra K. Gupta¹

Department of Mechanical Engineering, University of Maryland, College Park, MD 20742
skgupta@umd.edu

Certain commercial entities, equipment, or materials may be identified in this document in order to describe an experimental procedure or concept adequately. Such an identification is not intended to imply recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that the entities, materials, or equipment are necessarily the best available for the purpose.


¹Corresponding author.

J. Comput. Inf. Sci. Eng. 10(1), 011010 (Mar 10, 2010) (11 pages) doi:10.1115/1.3270248
History: Received May 27, 2008; Revised January 07, 2009; Published March 10, 2010; Online March 10, 2010

Methods to implement stochastic simulations on the graphics processing unit (GPU) have been developed. These algorithms are used in a simulation of microassembly and nanoassembly with optical tweezers, but they are also directly compatible with simulations of a wide variety of other assembly techniques that rely on electrophoretic, magnetic, or other trapping methods. Stochastic particle simulations can be sped up significantly by running them on the GPU, which is included in most personal computers (PCs), rather than on the central processing unit (CPU) that handles most calculations. However, a careful analysis of the accuracy and precision of stochastic simulations run on the GPU has been lacking; that analysis is provided here. A stochastic simulation of spherical particles has been developed and mapped onto the stages of the GPU hardware that provide the best performance. The results from the CPU and GPU implementations are compared with each other and with well-established theory. The error in the mean ensemble energy and in the diffusion constant is measured for both implementations. The time taken to complete several simulation experiments on each platform has also been measured, and the speedup attained by the GPU has been calculated.

Copyright © 2010 by American Society of Mechanical Engineers
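The abstract names two accuracy metrics, the mean ensemble energy and the diffusion constant. A minimal Python sketch of how such metrics might be estimated from particle states follows; it is not the authors' code. It assumes that the energy of interest is the kinetic energy per particle, whose equipartition value is (3/2)k_B T, and that D is obtained from the three-dimensional Einstein relation <|r(t) - r(0)|^2> = 6Dt. The function names are hypothetical, and the synthetic data only checks that the estimators recover known values; it is not a particle simulation.

```python
import math
import random

def mean_kinetic_energy(velocities, mass):
    """Ensemble average of (1/2) m |v|^2; velocities is a list of (vx, vy, vz)."""
    total = sum(0.5 * mass * (vx * vx + vy * vy + vz * vz) for vx, vy, vz in velocities)
    return total / len(velocities)

def diffusion_from_msd(start, end, elapsed):
    """3D Einstein relation: D = <|r(t) - r(0)|^2> / (6 t)."""
    msd = sum((xe - xs) ** 2 + (ye - ys) ** 2 + (ze - zs) ** 2
              for (xs, ys, zs), (xe, ye, ze) in zip(start, end)) / len(start)
    return msd / (6.0 * elapsed)

def relative_error(simulated, theoretical):
    """Signed error of a simulated quantity relative to its theoretical value."""
    return (simulated - theoretical) / theoretical

# Synthetic check in reduced units (k_B T = 1, m = 1, D = 1): draw Maxwell-Boltzmann
# velocities and free-diffusion displacements, then recover the theoretical values.
rng = random.Random(42)
kT, mass, D, t, n = 1.0, 1.0, 1.0, 0.5, 20000
sigma_v = math.sqrt(kT / mass)       # per-axis velocity spread
sigma_x = math.sqrt(2.0 * D * t)     # per-axis displacement spread for free diffusion
velocities = [tuple(rng.gauss(0.0, sigma_v) for _ in range(3)) for _ in range(n)]
start = [(0.0, 0.0, 0.0)] * n
end = [tuple(rng.gauss(0.0, sigma_x) for _ in range(3)) for _ in range(n)]

energy_error = relative_error(mean_kinetic_energy(velocities, mass), 1.5 * kT)
diffusion_error = relative_error(diffusion_from_msd(start, end, t), D)
print(f"energy error {energy_error:+.3%}, diffusion error {diffusion_error:+.3%}")
```

With 20,000 samples both errors should be well under one percent; in a real simulation the same estimators would be applied to the integrated states instead of the synthetic draws.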



Figure 1

Conceptual map of the GPU rendering pipeline. The three main stages are physically represented in the center of the figure while examples of their memory models (scatter and gather) are shown at the bottom.

Figure 2

Computing z = dx/dt + y in the postprocessing stage. The green squares in the figure are the program's inputs and the orange squares are its outputs.
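In a gather-style pass (one of the memory models shown in Figure 1), each output element is computed independently from values read at the same index of the input arrays, which is what lets the GPU evaluate all elements in parallel. The Python sketch below mimics that pattern on the CPU for z = dx/dt + y. It assumes, for illustration only, that dx/dt is approximated by a backward difference of two stored position arrays; the names `fragment_program` and `run_pass` are hypothetical.

```python
def fragment_program(x_prev, x_curr, y, dt):
    """Per-element kernel: reads only its own inputs and writes one output."""
    return (x_curr - x_prev) / dt + y

def run_pass(x_prev, x_curr, y, dt):
    """Apply the kernel at every index; on a GPU these evaluations run in parallel."""
    return [fragment_program(a, b, c, dt) for a, b, c in zip(x_prev, x_curr, y)]

z = run_pass([0.0, 1.0], [0.5, 1.0], [1.0, 2.0], dt=0.5)
print(z)  # [2.0, 2.0]
```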

Figure 3

Three user-defined programs chained together to implement the velocity Verlet integrator in the postprocessing stage of the GPU.
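A velocity Verlet step splits naturally into three data-parallel passes: update positions, evaluate accelerations at the new positions, and update velocities with the average of the old and new accelerations. The Python sketch below chains three such functions. The split, the function names, and the deterministic 1D harmonic-trap force are illustrative assumptions; the paper's programs also have to handle drag and random forcing, which this sketch omits.

```python
import math

def position_pass(x, v, a, dt):
    """Program 1: x(t+dt) = x + v dt + (1/2) a dt^2."""
    return [xi + vi * dt + 0.5 * ai * dt * dt for xi, vi, ai in zip(x, v, a)]

def acceleration_pass(x, k, m):
    """Program 2: accelerations at the new positions (hypothetical trap force F = -k x)."""
    return [-k * xi / m for xi in x]

def velocity_pass(v, a_old, a_new, dt):
    """Program 3: v(t+dt) = v + (1/2)(a(t) + a(t+dt)) dt."""
    return [vi + 0.5 * (ao + an) * dt for vi, ao, an in zip(v, a_old, a_new)]

def velocity_verlet(x, v, k, m, dt, steps):
    a = acceleration_pass(x, k, m)
    for _ in range(steps):
        x = position_pass(x, v, a, dt)       # each pass writes a fresh array,
        a_new = acceleration_pass(x, k, m)   # much like rendering into a new texture
        v = velocity_pass(v, a, a_new, dt)
        a = a_new
    return x, v

x, v = velocity_verlet([1.0, 0.5], [0.0, 0.0], k=1.0, m=1.0, dt=0.01, steps=1000)
energy = [0.5 * vi * vi + 0.5 * xi * xi for xi, vi in zip(x, v)]
print(x[0], math.cos(10.0))  # numerical vs exact position at t = 10
```

Because each pass reads complete input arrays and writes a new one, no element depends on another element computed in the same pass, which is the property that lets each stage map onto a single GPU program.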

Figure 4

Log-linear plot of the normalized energy (E_simulated/E_theoretical) and the normalized diffusion constant (D_simulated/D_theoretical) against the time-step δt for an ensemble of 30 glass particles of radius 500 nm suspended in a water bath at 293 K. The data in this plot are used to choose the parameters for the full simulation with 900 particles.
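The theoretical references for the normalized quantities in Figure 4 can be computed from the Stokes-Einstein relation, D = k_B T/(6πηr), and from equipartition. The Python sketch below assumes that the plotted energy is the mean kinetic energy per particle, (3/2)k_B T, and uses an assumed viscosity of 1.002 mPa·s for water near 293 K; neither assumption is stated in the caption, and the helper names are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def stokes_einstein_diffusion(temperature, viscosity, radius):
    """D = k_B T / (6 pi eta r) for a sphere in a viscous fluid (m^2/s)."""
    return K_B * temperature / (6.0 * math.pi * viscosity * radius)

def equipartition_kinetic_energy(temperature, dof=3):
    """Mean kinetic energy per particle, (dof / 2) k_B T (J)."""
    return 0.5 * dof * K_B * temperature

def normalized(simulated, theoretical):
    """Ratio of the kind plotted in Figure 4; 1.0 means agreement with theory."""
    return simulated / theoretical

# Parameters from the Figure 4 caption; the viscosity is an assumed textbook value.
T = 293.0             # K
ETA_WATER = 1.002e-3  # Pa s, approximate viscosity of water near 293 K (assumption)
RADIUS = 500e-9       # m

D_theory = stokes_einstein_diffusion(T, ETA_WATER, RADIUS)
E_theory = equipartition_kinetic_energy(T)
print(f"D_theory = {D_theory:.4e} m^2/s, E_theory = {E_theory:.4e} J")
```

A simulated ensemble value would then be passed to `normalized` together with the matching theoretical value.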

Figure 5

Log-log plot of the average particle displacement after 1×10^6 time-steps at δt = 0.01τ as a function of particle diameter, from Eq. 6. The gray region shows the effect of a ±10% error in the diffusion constant. Smaller particles exhibit larger excursions from their initial positions relative to their diameter; however, the average displacement is no larger than d/10, where d is the particle diameter, for the finite-time simulation.
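Eq. 6 is not reproduced in this section. As a stand-in, the Python sketch below estimates the root-mean-square displacement sqrt(6Dt) relative to the particle diameter d. It assumes that τ is the momentum relaxation time m/(6πηr), that the particles are glass with a density of 2500 kg/m³, and that water has a viscosity of 1.002 mPa·s; the RMS value is used in place of the average. Under these assumptions D ∝ 1/d and τ ∝ d², so the displacement over a fixed number of steps scales as d^(1/2) and its ratio to d as d^(-1/2), which is why smaller particles move farther relative to their size. The absolute values depend on the assumptions and need not match Figure 5.

```python
import math

K_B = 1.380649e-23   # J/K
T = 293.0            # K, as in Figure 4
ETA = 1.002e-3       # Pa s, water near 293 K (assumed value)
RHO_GLASS = 2500.0   # kg/m^3, assumed density of glass
STEPS = 1_000_000    # time-steps, from the Figure 5 caption
DT_OVER_TAU = 0.01   # delta t = 0.01 tau, from the Figure 5 caption

def rms_displacement_over_diameter(diameter):
    """sqrt(6 D t) / d, assuming tau = m / (6 pi eta r) and t = STEPS * 0.01 tau."""
    r = 0.5 * diameter
    drag = 6.0 * math.pi * ETA * r                    # Stokes drag coefficient
    mass = (4.0 / 3.0) * math.pi * r ** 3 * RHO_GLASS
    tau = mass / drag                                 # momentum relaxation time (assumed)
    D = K_B * T / drag                                # Stokes-Einstein relation
    t = STEPS * DT_OVER_TAU * tau                     # total simulated time
    return math.sqrt(6.0 * D * t) / diameter

for d in (0.5e-6, 1e-6, 2e-6, 4e-6):
    print(f"d = {d * 1e6:.2f} um -> rms displacement = {rms_displacement_over_diameter(d):.3f} d")
```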

Figure 6

Relative error obtained by subtracting a reference CPU64 position trajectory for a single 500 nm particle from the CPU32 and GPU32 trajectories. All trajectories are generated from the same list of random deviates. Panel (a) shows the error for the GPU32 case at three time-step values, and panel (b) shows the error for the CPU32 case.
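The comparison in Figure 6 can be imitated on a small scale by driving a double-precision and a single-precision integration with one shared list of random deviates and subtracting the trajectories. The Python sketch below emulates IEEE 754 single precision on the CPU by rounding after every operation with `struct`. It uses an overdamped Brownian update in reduced units as a simplifying assumption, reports the signed difference rather than a normalized error, and does not model any GPU-specific rounding behavior, so it corresponds to the CPU32-versus-CPU64 comparison rather than the GPU32 one. The helper names are hypothetical.

```python
import math
import random
import struct

def to_f32(value):
    """Round a double to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def to_f64(value):
    """Identity rounding: Python floats are already IEEE 754 doubles."""
    return value

def brownian_trajectory(deviates, x0, diffusion, dt, rounding):
    """Overdamped walk x_{n+1} = x_n + sqrt(2 D dt) * xi_n, rounding every operation."""
    step = rounding(math.sqrt(2.0 * diffusion * dt))
    x = rounding(x0)
    path = [x]
    for xi in deviates:
        x = rounding(x + rounding(step * rounding(xi)))
        path.append(x)
    return path

# One shared list of random deviates drives both runs, as in Figure 6.
rng = random.Random(2010)
deviates = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

reference = brownian_trajectory(deviates, 1.0, 1.0, 1e-4, to_f64)  # "CPU64"-like
single = brownian_trajectory(deviates, 1.0, 1.0, 1e-4, to_f32)     # "CPU32"-like
errors = [s - r for s, r in zip(single, reference)]
print(f"max |error| = {max(abs(e) for e in errors):.3e}")
```

Rounding each sum and product of two single-precision values from double to single gives the correctly rounded single-precision result, so the emulation matches IEEE single-precision arithmetic for these operations.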

Figure 7

Speedup (calculated as the ratio of the single-precision CPU simulation time to the GPU simulation time) as a function of ensemble size. The three curves show the results when the output was sampled every 10, 100, and 500 simulation time-steps (δt).
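Speedup is taken here as the CPU time divided by the GPU time, so values above 1 favor the GPU. The Python sketch below pairs that definition with a hypothetical cost model in which every output sample costs a fixed transfer time, one plausible way the sampling interval could affect the measured speedup. All costs are placeholders, not measurements from the paper, and the model assumes the CPU run pays no transfer cost.

```python
def speedup(cpu_seconds, gpu_seconds):
    """Speedup of the GPU run over the single-precision CPU run (> 1 favors the GPU)."""
    return cpu_seconds / gpu_seconds

def modeled_time(steps, sample_every, step_cost, readback_cost):
    """Hypothetical cost model: per-step compute plus one output transfer per sample."""
    samples = steps // sample_every
    return steps * step_cost + samples * readback_cost

# Placeholder costs in seconds, chosen only to illustrate the trend; not measured values.
STEPS = 10_000
CPU_STEP, GPU_STEP, GPU_READBACK = 1e-3, 1e-5, 1e-3

cpu_time = STEPS * CPU_STEP
for every in (10, 100, 500):
    gpu_time = modeled_time(STEPS, every, GPU_STEP, GPU_READBACK)
    print(f"sample every {every:>3} steps: speedup {speedup(cpu_time, gpu_time):.1f}")
```

Under this model, sampling less often amortizes the transfer cost over more compute steps, so the speedup approaches the ratio of the per-step costs.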



