We study three wave function optimization methods based on energy minimization in a variational Monte Carlo framework: the Newton, linear, and perturbative methods. In the Newton method, the parameter variations are calculated from the energy gradient and Hessian, using a reduced-variance statistical estimator for the latter. In the linear method, the parameter variations are found by diagonalizing a nonsymmetric estimator of the Hamiltonian matrix in the space spanned by the wave function and its derivatives with respect to the parameters, making use of a strong zero-variance principle. In the less computationally expensive perturbative method, the parameter variations are calculated by approximately solving the generalized eigenvalue equation of the linear method by a nonorthogonal perturbation theory. These general methods are illustrated here by the optimization of wave functions consisting of a Jastrow factor multiplied by an expansion in configuration state functions (CSFs) for the C2 molecule, including both valence and core electrons in the calculation. The Newton and linear methods are very efficient for the optimization of the Jastrow, CSF, and orbital parameters. The perturbative method is a good alternative for the optimization of just the CSF and orbital parameters. Although the optimization is performed at the variational Monte Carlo level, we observe, for the C2 molecule studied here and for other systems we have studied, that as more parameters in the trial wave functions are optimized, the diffusion Monte Carlo total energy improves monotonically, implying that the nodal hypersurface also improves monotonically.
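As a rough illustration of the Newton and linear updates described above, the following minimal sketch uses small, exact matrices in place of Monte Carlo estimates. The function names (`newton_step`, `linear_method_step`), the 3x3 toy Hamiltonian `A`, and the starting parameters `p` are all hypothetical and chosen only for illustration. In an actual variational Monte Carlo calculation the gradient, Hessian, Hamiltonian, and overlap matrices would be estimated from sampled configurations, and the Hamiltonian estimator would generally be nonsymmetric.

```python
import numpy as np


def newton_step(gradient, hessian):
    """Newton parameter variations: solve hessian @ delta = -gradient."""
    return -np.linalg.solve(hessian, gradient)


def linear_method_step(h_matrix, s_matrix):
    """Parameter variations from the lowest eigenpair of H c = E S c.

    Index 0 refers to the current wave function and indices 1.. to its
    derivatives with respect to the parameters. H need not be symmetric,
    so a general (non-Hermitian) eigensolver is used.
    """
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(s_matrix, h_matrix))
    lowest = np.argmin(eigvals.real)
    c = eigvecs[:, lowest].real
    # Normalize so the current wave function has coefficient 1; the
    # remaining coefficients are then the parameter variations.
    return eigvals[lowest].real, c[1:] / c[0]


# Toy model (an assumption, not from the paper): a wave function that is
# linear in its parameters, Psi(p) = |0> + p[0]|1> + p[1]|2>, with an exact
# symmetric Hamiltonian A in an orthonormal basis.
A = np.array([[-1.0, 0.2, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 1.0]])
p = np.array([0.3, -0.2])

# Columns of B are the current wave function and its two derivatives.
B = np.eye(3)
B[1:, 0] = p

H = B.T @ A @ B  # Hamiltonian matrix in the (Psi, dPsi/dp) space
S = B.T @ B      # overlap matrix in the same space

energy, delta = linear_method_step(H, S)
p_new = p + delta

# Newton step on a simple quadratic model with gradient g and Hessian Q.
g = np.array([-1.0, -2.0])
Q = np.array([[2.0, 0.0],
              [0.0, 4.0]])
newton_delta = newton_step(g, Q)
```

Because the toy wave function is linear in its parameters, a single linear-method step lands on the exact ground state of `A`. For nonlinear parameters such as Jastrow or orbital parameters, the step would instead be iterated and typically stabilized.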