The impact of imbalanced training data on machine learning for author name disambiguation