Tree-based reinforcement learning for estimating optimal dynamic treatment regimes