Multisite randomized trials are increasingly used in evaluations of educational programs. Multisite designs provide unique opportunities for investigating between-site heterogeneity in the mediation mechanism that characterizes an educational process central to a program theory. Re-analyzing data from the National Job Corps Study, a multisite randomized evaluation, this dissertation develops methods for empirically examining the Job Corps program theory. Job Corps is the nation’s largest education and training program for disadvantaged youths aged 16 to 24, most of whom had dropped out of high school. Previous research has suggested that Job Corps generated a positive average impact in promoting economic independence. However, the impact was not uniform across sites. The multisite data allow us to further investigate whether the central program element, i.e., educational and vocational training, played the same mediating role across sites, and whether the role of other program elements was consistent across sites. Such evidence will be crucial both for enriching theoretical understanding and for informing the design and implementation of education programs. However, due to important constraints of existing analytic tools, analysts have rarely investigated between-site heterogeneity of mediation mechanisms in multisite program evaluations.

To enable researchers to assess the generalizability of an education program theory across a wide range of contexts, this dissertation develops a comprehensive weighting-based analytic procedure for multisite causal mediation analysis. The procedure utilizes a propensity score-based weighting strategy to flexibly decompose the average program impact at each site into a direct effect and one or two indirect effects, the latter being transmitted through one or two hypothesized focal mediators.
To enhance the external and internal validity of causal conclusions, I further incorporate a sample weight to adjust for complex sample and survey designs and employ an estimated non-response weight to account for non-random attrition in the longitudinal follow-ups. Extending a theoretical model of causal inference under the potential outcomes framework, I conceptualize the population average and the between-site variance of the direct and indirect effects and identify them based on the above weights. For the estimation and inference of the causal parameters, I develop a method-of-moments procedure that takes into account the sampling variability of the estimated weights. Finally, I use a weighting-based balance checking procedure to assess whether the weighting adjustment effectively reduces selection bias associated with the observed covariates, and I adopt a weighting-based sensitivity analysis strategy to assess the consequences of potential violations of key identification assumptions.

I employ the proposed analytic procedure in an in-depth evaluation of Job Corps. The empirical results lend support to the program theory that Job Corps promotes economic well-being among disadvantaged youths through education and training. The results also highlight the crucial role of support services in reducing behavioral and health risks and reveal the need for standardizing the quantity and quality of such services across Job Corps centers.