Estimating the parametric Accelerated Failure Time (AFT) model is not straightforward. Much of the research points to two main methods. The first is that of Koul et al. (1981), who use a synthetic data approach; the second is that of Stute (1993, 1996), who uses weighted least squares (WLS) regression. These are often preferred to other estimators because of their computational simplicity.
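To make the WLS side concrete, here is a minimal sketch of Stute-type weights, taken as the jumps of the Kaplan-Meier estimator of the response distribution, followed by a weighted least squares fit. The function names `stute_weights` and `stute_wls` are hypothetical, `event` is assumed to be 1 for an uncensored observation and 0 for a censored one, and ties among the responses are assumed away.

```python
import numpy as np

def stute_weights(y, event):
    """Kaplan-Meier jump sizes at each observation (zero for censored rows).

    Assumes event == 1 means uncensored and that there are no ties in y.
    """
    y = np.asarray(y, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(y)
    order = np.argsort(y, kind="stable")
    d = event[order]                      # indicators in sorted order
    i = np.arange(1, n + 1)               # ranks 1..n
    # Factor contributed by each sorted observation: ((n-i)/(n-i+1))^d_i
    factors = ((n - i) / (n - i + 1.0)) ** d
    # Product over strictly earlier ranks (empty product = 1 for the first)
    prod_prev = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w_sorted = d / (n - i + 1.0) * prod_prev
    w = np.empty(n)
    w[order] = w_sorted                   # back to the original row order
    return w

def stute_wls(X, y, event):
    """Weighted least squares using the Kaplan-Meier jumps as weights."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(stute_weights(y, event))
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```

With no censoring every weight equals 1/n, so the fit reduces to ordinary least squares.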
However, in all cases the method of Koul et al. performs worse than that of Stute. This is because estimating an AFT model requires an estimate of the CDF to obtain the survival probabilities. In their theoretical paper, Koul et al. do not use the Kaplan-Meier (KM) estimator because, as they note, their proof requires taking a logarithm, and a KM estimate can lead to taking the log of 0. However, if we replace Koul et al.'s estimate of the survival probabilities with the Kaplan-Meier estimate, the bias in the estimates essentially goes away. Furthermore, by assuming that max(dependent_variable) is uncensored, we can avoid the divide-by-zero problem. To add to the ambiguity, other papers that cite Koul et al. claim that they use the Kaplan-Meier estimate when they actually do not.
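For reference, the KM-substituted variant I have in mind can be sketched as follows. This is not the estimator as published by Koul et al. (1981); it is only an illustration under my own assumptions: the synthetic response is taken to be the observed response times the censoring indicator, divided by a Kaplan-Meier estimate of the censoring survival function; `event` is 1 for uncensored rows; there are no ties; and the largest observation is treated as uncensored. The function name `koul_synthetic_km` is hypothetical.

```python
import numpy as np

def koul_synthetic_km(y, event):
    """Synthetic responses y* = event * y / G_hat(y), with G_hat the
    Kaplan-Meier estimate of the censoring survival function.

    Assumes event == 1 means uncensored and that there are no ties in y.
    """
    y = np.asarray(y, dtype=float)
    event = np.asarray(event, dtype=int).copy()
    # Convention from the text: treat the largest observation as uncensored,
    # so the estimated censoring survival never reaches zero.
    event[np.argmax(y)] = 1
    n = len(y)
    order = np.argsort(y, kind="stable")
    cens_sorted = 1 - event[order]        # censoring plays the role of the event
    at_risk = n - np.arange(n)            # n, n-1, ..., 1
    g_sorted = np.cumprod(1.0 - cens_sorted / at_risk)
    g = np.empty(n)
    g[order] = g_sorted                   # back to the original row order
    return event * y / g

# The synthetic responses are then regressed on the covariates by ordinary
# least squares, e.g. np.linalg.lstsq(X, koul_synthetic_km(y, event), rcond=None).
```

Censored rows get a synthetic response of zero and uncensored rows are inflated by the inverse of the estimated censoring survival probability.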
My issue is that if I cannot find a theoretical justification for using the KM estimate in Koul et al.'s method, I cannot use it in my code. That said, is there any theoretical justification for adapting the synthetic data method of Koul et al. by replacing the survival probabilities with the KM estimate? I have looked but have not been able to find any.