Tedious tuning

XGBoost has plenty of adjustable parameters, which is what makes it powerful. The problem arises when we want to fine-tune them: a full grid search takes ages (e.g., testing 3 values for each of 10 parameters gives 3^10 = 59,049 combinations).
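The size of an exhaustive grid can be checked directly. Below is a minimal sketch with placeholder values, where only the count matters. Each tuple produced by `itertools.product` stands for one model that would have to be trained and scored:

```python
from itertools import product

# Hypothetical grid: 3 placeholder values for each of 10 parameters.
n_params = 10
values_per_param = 3
grid = [range(values_per_param)] * n_params

# An exhaustive grid search trains and scores one model per combination.
n_combinations = sum(1 for _ in product(*grid))
print(n_combinations)  # 59049, i.e. 3 ** 10
```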

I've started using another method from Analytics Vidhya, in which parameters are adjusted pairwise, from the most influential to the least influential:

  1. n_estimators
  2. max_depth, min_child_weight
  3. gamma, min_child_samples
  4. n_estimators
  5. subsample, colsample_bytree
  6. reg_alpha, reg_lambda
  7. n_estimators, learning_rate
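The procedure above can be sketched as a greedy, step-by-step grid search. This is a minimal illustration, assuming a generic `score(params)` callable (higher is better) in place of a cross-validated XGBoost model. `SCHEDULE`, `stepwise_search` and the toy objective are hypothetical names. This is neither Analytics Vidhya's code nor the library's API. The parameter names are copied verbatim from the list:

```python
from itertools import product

# Tuning order from the list above; the parameters in each step are searched
# jointly while all other parameters stay at their current best values.
SCHEDULE = [
    ("n_estimators",),
    ("max_depth", "min_child_weight"),
    ("gamma", "min_child_samples"),
    ("n_estimators",),
    ("subsample", "colsample_bytree"),
    ("reg_alpha", "reg_lambda"),
    ("n_estimators", "learning_rate"),
]


def stepwise_search(params, candidates, score, schedule=SCHEDULE):
    """Greedy step-by-step grid search; a higher score is better.

    params:     dict of starting parameter values (a tuned copy is returned)
    candidates: dict mapping each parameter name to the values to try
    score:      callable(params) -> float, e.g. a cross-validated metric
    """
    best = dict(params)
    best_score = score(best)
    for step in schedule:
        # Small grid over this step's parameters only.
        for combo in product(*(candidates[name] for name in step)):
            trial = dict(best, **dict(zip(step, combo)))
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best, best_score


# Toy usage: the objective below is hypothetical and simply peaks at TARGET.
names = sorted({name for step in SCHEDULE for name in step})
TARGET = {name: i % 3 for i, name in enumerate(names)}
candidates = {name: [0, 1, 2] for name in names}
start = {name: 0 for name in names}


def toy_score(p):
    return -sum((p[k] - TARGET[k]) ** 2 for k in names)


best, best_score = stepwise_search(start, candidates, toy_score)
print(best == TARGET, best_score)  # True 0
```

The toy objective is separable, so each step can find its own optimum independently. With real models, parameters interact, which is why the ordering and the repeated n_estimators steps matter.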

It significantly reduces the complexity: testing 3 values for each of 10 parameters, split into 5 pairs, gives 3^2 × 5 = 45 combinations. I've been using this method for a year, and the results are satisfying, but there is one drawback: pairwise searching takes time. You need to wait for the result of the first pair, update the model and then test the next pair. The wait is long enough to get bored, but too short to get involved in other activities.
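The pairwise count can be reproduced by enumerating the trials. This minimal sketch uses hypothetical parameter names `p0` to `p9` grouped into 5 pairs, with 3 values each:

```python
from itertools import product

# Hypothetical setup: 10 parameters grouped into 5 pairs, 3 candidate values each.
pairs = [("p%d" % (2 * i), "p%d" % (2 * i + 1)) for i in range(5)]
values = [0, 1, 2]

# Each pair gets its own 3 x 3 grid while the other 8 parameters stay fixed.
trials = [dict(zip(pair, combo)) for pair in pairs for combo in product(values, repeat=2)]
print(len(trials))             # 45 = 5 * 3 ** 2
print(3 ** 10 // len(trials))  # 1312: roughly how many times smaller than the full grid
```

The trade-off is that pairs are optimized one at a time, so interactions between parameters in different pairs are not explored jointly.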

xgboost-AutoTune

To improve this, I wrote a library, xgboost-AutoTune, which automatically chooses xgboost.XGBRegressor parameters pairwise. The only inputs the user must provide are:

  • model (with or without initial parameters),
  • scoring method,
  • input data,
  • output data,
  • min_loss.

min_loss is crucial. If the newly chosen parameters improve the score by more than min_loss, the algorithm performs a more precise search. Otherwise, it moves on to tune the next pair.
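The min_loss rule can be illustrated with a small sketch. This is a hypothetical interpretation, not the library's implementation. `tune_pair`, its step-halving refinement and the toy objective are all assumptions, and it presumes continuous parameters and a score where higher is better. Each round tries a 3 × 3 grid around the current best. While the gain exceeds `min_loss`, the grid is made finer. Otherwise the search stops and the next pair can be tuned:

```python
from itertools import product


def tune_pair(params, pair, step_sizes, score, min_loss, max_rounds=10):
    """Hypothetical sketch: refine one pair of continuous parameters.

    Each round tries current value -/0/+ one step for both parameters.
    If the best trial beats the previous best by more than min_loss, the
    step sizes are halved (a more precise search); otherwise we stop.
    """
    best = dict(params)
    best_score = score(best)
    steps = dict(step_sizes)
    for _ in range(max_rounds):
        round_best, round_score = best, best_score
        for da, db in product((-1, 0, 1), repeat=2):
            trial = dict(best)
            trial[pair[0]] += da * steps[pair[0]]
            trial[pair[1]] += db * steps[pair[1]]
            trial_score = score(trial)
            if trial_score > round_score:
                round_best, round_score = trial, trial_score
        gain = round_score - best_score
        best, best_score = round_best, round_score
        if gain <= min_loss:
            break  # improvement too small: move on to the next pair
        steps = {name: size / 2 for name, size in steps.items()}  # finer grid
    return best, best_score


def toy_score(p):
    # Hypothetical objective standing in for a cross-validated score; peak at a=1, b=2.
    return -((p["a"] - 1.0) ** 2 + (p["b"] - 2.0) ** 2)


best, best_score = tune_pair({"a": 0.0, "b": 0.0}, ("a", "b"),
                             {"a": 1.0, "b": 1.0}, toy_score, min_loss=1e-3)
print(best)  # {'a': 1.0, 'b': 1.984375}
```

A smaller min_loss buys more refinement rounds, and therefore more model fits, per pair. A larger one moves on sooner.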

The library is designed for Python 3.5. The code and implementation details are available here.