Uplift modeling aims to optimize treatment policies and is a promising method for causal personalization in domains such as medicine and marketing. However, applying it to real-world problems faces challenges, including the impossibility of validation and the restriction to binary treatments. The Contextual Treatment Selection (CTS) algorithm was proposed to overcome the binary-treatment restriction and achieved state-of-the-art results. However, previous experiments suggest that CTS is cost-ineffective because it requires a large amount of training data. In this paper, we show that the estimator maximized in CTS is biased with respect to the true metric. We then propose a variance-reduced estimator, based on the doubly robust estimation technique, that is unbiased and has desirable variance. We further propose a treatment policy optimization algorithm, VAriance Reduced Treatment Selection (VARTS), which maximizes this estimator. Experiments on synthetic and real-world datasets show that our method outperforms existing methods, particularly under realistic conditions such as small sample sizes and high noise levels. These theoretical and empirical results suggest that our method can overcome critical challenges of uplift modeling and should be the first choice for optimizing personalization in various fields.
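The following minimal sketch illustrates the doubly robust technique that the proposed estimator builds on. It computes the standard doubly robust estimate of a treatment policy's expected outcome when there are multiple treatments. It is not the VARTS estimator itself. The function name `dr_policy_value` and its array layout are hypothetical. It assumes the logging propensities are known, as in a randomized experiment, and that an outcome model has already been fitted.

```python
import numpy as np


def dr_policy_value(y, t, policy_t, propensity, mu_hat):
    """Doubly robust estimate of the mean outcome under a treatment policy.

    Hypothetical helper for illustration only.
    y          : (n,) observed outcomes
    t          : (n,) observed treatment indices in {0, ..., K-1}
    policy_t   : (n,) treatment the evaluated policy assigns to each unit
    propensity : (n, K) probability e(k | x_i) that unit i received treatment k
    mu_hat     : (n, K) outcome-model prediction mu(x_i, k)
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=int)
    policy_t = np.asarray(policy_t, dtype=int)
    propensity = np.asarray(propensity, dtype=float)
    mu_hat = np.asarray(mu_hat, dtype=float)
    idx = np.arange(y.shape[0])

    # Direct-method term: model prediction for the treatment the policy picks.
    direct = mu_hat[idx, policy_t]
    # Correction term: inverse-propensity-weighted residual, nonzero only for
    # units whose observed treatment matches the policy's choice.
    match = (t == policy_t).astype(float)
    correction = match / propensity[idx, t] * (y - mu_hat[idx, t])
    return float(np.mean(direct + correction))


# Usage: 4 units, 3 treatments assigned uniformly at random (e = 1/3).
y = [1.0, 0.0, 2.0, 1.0]
t = [0, 1, 2, 0]
policy_t = [0, 1, 1, 2]
propensity = np.full((4, 3), 1.0 / 3.0)
mu_hat = np.zeros((4, 3))  # an uninformative outcome model
print(dr_policy_value(y, t, policy_t, propensity, mu_hat))
```

With a zero outcome model, the estimate reduces to inverse propensity weighting. Here that gives 0.75, because only units 0 and 1 match the policy and their weighted residuals are 3.0 and 0.0. The estimate stays unbiased if either the propensities or the outcome model are correct. A good outcome model shrinks the residuals and therefore the variance.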