Large-scale finite element analysis (FEA) with millions of degrees of freedom (DOF) is becoming commonplace in solid mechanics. The primary computational bottleneck in such problems is the solution of large linear systems of equations. In this paper, we propose an assembly-free version of the deflated conjugate gradient (DCG) method for solving such systems, in which neither the stiffness matrix nor the deflation matrix is assembled. While assembly-free FEA is a well-known concept, the novelty pursued in this paper is the use of assembly-free deflation. The resulting implementation is particularly well suited to large-scale problems and can be easily ported to multicore central processing unit (CPU) and graphics processing unit (GPU) architectures. For demonstration, we show that one can solve a system with 50 × 10^6 degrees of freedom on a single GPU card equipped with 3 GB of memory. The second contribution is an extension of the “rigid-body agglomeration” concept used in DCG to a “curvature-sensitive agglomeration.” The latter exploits classic plate and beam theories for efficient deflation of highly ill-conditioned problems arising from thin structures.
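To make the idea of assembly-free deflation concrete, the following is a minimal sketch and not the implementation described in this paper. It assumes a one-dimensional bar fixed at one end, in which rigid-body agglomeration reduces to piecewise-constant deflation vectors over groups of nodes. The names `apply_K`, `restrict`, `prolong`, and `deflated_cg` are hypothetical. The stiffness operator is applied element by element, and the deflation matrix W is represented only by an integer array `group` that maps each DOF to its agglomerate; only the small coarse matrix W^T K W is formed.

```python
import numpy as np

def apply_K(v, k):
    """Element-by-element product K @ v for a 1D bar (assumed model problem).
    DOF i is node i+1; node 0 is fixed. k[e] is the stiffness of element e."""
    y = np.zeros_like(v)
    y[0] += k[0] * v[0]                # element 0 joins the fixed node and DOF 0
    d = k[1:] * (v[1:] - v[:-1])       # internal force in elements 1..n-1
    y[1:] += d
    y[:-1] -= d
    return y

def restrict(r, group, m):
    """W^T r without forming W: sum the entries of r within each agglomerate."""
    return np.bincount(group, weights=r, minlength=m)

def prolong(mu, group):
    """W mu without forming W: copy each coarse value to its agglomerate."""
    return mu[group]

def deflated_cg(k, b, group, m, tol=1e-10, max_iter=1000):
    """Deflated CG for K x = b; K and W are never stored as matrices."""
    # Small m-by-m coarse matrix E = W^T K W, built one column at a time.
    E = np.empty((m, m))
    for j in range(m):
        w_j = (group == j).astype(float)
        E[:, j] = restrict(apply_K(w_j, k), group, m)

    def coarse_solve(g):
        return np.linalg.solve(E, g)

    # Initial guess chosen so that the initial residual satisfies W^T r = 0.
    x = prolong(coarse_solve(restrict(b, group, m)), group)
    r = b - apply_K(x, k)
    # Initial search direction, made K-orthogonal to the deflation space.
    p = r - prolong(coarse_solve(restrict(apply_K(r, k), group, m)), group)
    rr = r @ r
    b_norm = np.linalg.norm(b)
    for it in range(max_iter):
        if np.sqrt(rr) <= tol * b_norm:
            return x, it
        Kp = apply_K(p, k)
        alpha = rr / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rr_new = r @ r
        # Deflation correction: solve E mu = W^T K r, then subtract W mu.
        mu = coarse_solve(restrict(apply_K(r, k), group, m))
        p = (rr_new / rr) * p + r - prolong(mu, group)
        rr = rr_new
    return x, max_iter

if __name__ == "__main__":
    n, m = 1000, 20                    # fine DOFs and number of agglomerates
    k = np.ones(n)                     # uniform element stiffness
    b = np.ones(n)                     # uniform nodal load
    group = np.arange(n) * m // n      # contiguous, nonempty agglomerates
    x, iters = deflated_cg(k, b, group, m)
    print("iterations:", iters)
```

For clarity, this sketch spends one extra operator application per iteration to form W^T K r; a practical code would likely avoid that cost. In two and three dimensions, each agglomerate would carry several rigid-body modes rather than a single constant vector, so `restrict` and `prolong` would operate on small per-group blocks.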