Individuals can vary drastically in their response to the same treatment, and this heterogeneity has driven the push for more personalized medicine. Achieving this goal requires accurate and interpretable methods for identifying subgroups whose response to treatment differs from the population average. The Virtual Twins (VT) method is widely cited and implemented for subgroup identification because of its intuitive framework. However, many researchers still rely heavily on the modeling choices suggested in the original publication rather than examining newer and more powerful alternatives, leaving much of the method's potential untapped. We comprehensively evaluate the performance of VT under different combinations of methods for each of its component steps, across a collection of linear and nonlinear problem settings. Our simulations show that the choice of method for Step 1 of VT, in which dense models with high predictive performance are fit for the potential outcomes, strongly influences the overall accuracy of the method, and that Superlearner is a promising choice. We illustrate our findings by using VT to identify subgroups with heterogeneous treatment effects in a randomized, double-blind trial of very low nicotine content cigarettes.
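To make the two-step structure of VT concrete, the following is a minimal, standard-library-only Python sketch on simulated data. It is an illustration under stated assumptions, not the implementation evaluated here. In Step 1, a separate k-nearest-neighbours regressor for each treatment arm stands in for the dense learners (such as random forests or Superlearner) and is used to predict both potential outcomes for every subject. In Step 2, a single-split regression tree (a stump) is fit to the estimated individual effects Z = Yhat(1) - Yhat(0). The data-generating model and all function names (`knn_predict`, `step1_twin_effects`, `step2_stump`) are hypothetical.

```python
import random
from statistics import mean


def knn_predict(train_x, train_y, x, k=10):
    """Mean outcome of the k training points whose covariate is closest to x.
    (Hypothetical stand-in for a dense Step 1 learner.)"""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)


def step1_twin_effects(xs, ts, ys, k=10):
    """VT Step 1 (sketch): fit one outcome model per arm, predict both potential
    outcomes for every subject, and return Z = Yhat(1) - Yhat(0)."""
    x1 = [x for x, t in zip(xs, ts) if t == 1]
    y1 = [y for y, t in zip(ys, ts) if t == 1]
    x0 = [x for x, t in zip(xs, ts) if t == 0]
    y0 = [y for y, t in zip(ys, ts) if t == 0]
    return [knn_predict(x1, y1, x, k) - knn_predict(x0, y0, x, k) for x in xs]


def step2_stump(xs, zs):
    """VT Step 2 (sketch): depth-1 regression tree on Z. Returns the cut point
    that minimises within-node squared error and the mean Z on each side."""
    best = None
    for cut in sorted(set(xs))[1:]:  # skip the minimum so both nodes are non-empty
        left = [z for x, z in zip(xs, zs) if x < cut]
        right = [z for x, z in zip(xs, zs) if x >= cut]
        m_l, m_r = mean(left), mean(right)
        sse = sum((z - m_l) ** 2 for z in left) + sum((z - m_r) ** 2 for z in right)
        if best is None or sse < best[0]:
            best = (sse, cut, m_l, m_r)
    return best[1:]


# Hypothetical data: one covariate x, a 1:1 randomised treatment t, and a
# treatment effect of 2 only in the subgroup x > 0.5.
random.seed(0)
n = 400
xs = [random.random() for _ in range(n)]
ts = [random.randint(0, 1) for _ in range(n)]
ys = [x + (2.0 if x > 0.5 else 0.0) * t + random.gauss(0, 0.1) for x, t in zip(xs, ts)]

zs = step1_twin_effects(xs, ts, ys)
cut, effect_below, effect_above = step2_stump(xs, zs)
print(f"split at x = {cut:.2f}: effect {effect_below:.2f} below, {effect_above:.2f} above")
```

Because the simulated effect is confined to x > 0.5, the stump should place its split close to 0.5, with an estimated effect near 0 below the cut and near 2 above it. Replacing `knn_predict` with a stronger learner is exactly the kind of Step 1 choice whose influence on overall accuracy the simulations examine.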