Photo by Matt Artz on Unsplash.

Extremely Fast and Cheap Decision Trees

Ziheng Wang
3 min read · Jul 14, 2021

Gradient-boosted trees (GBTs) dominate most data science applications in industry (and on Kaggle) thanks to their superior accuracy and interpretability. Once deployed, however, their latency and cost can become concerns, especially in enterprise settings where thousands of models may be served simultaneously. Large data-driven enterprises can easily spend upwards of tens of thousands of dollars per month crunching these trees.

Recently there have been several academic and industrial attempts at optimizing decision trees for inference, the most notable being Treelite. Treelite is used by AWS SageMaker and is typically the first tool people reach for. Unfortunately, its performance still falls short for many heavy GBT users in finance and retail, who have handcrafted their own real-time tree-inference libraries.

If your organization could benefit from cheaper and faster GBT inference but does not wish to embark on the expensive and risky endeavor of engineering an in-house tree-inference library, you are in luck. We at 172.ai (oneseventwoai.com) will soon release the beta version of our GBT optimization API, 172Trees! Our API currently targets batch deployment scenarios, with batch sizes ranging from 1 to millions.

How does our performance stack up against Treelite? Here are some comparisons. All experiments were run on a c5.xlarge AWS instance with four threads, using an Xgboost model with default parameters (max_depth = 6, n_estimators = 100) trained on random data with varying numbers of features. The following chart showcases the speedups 172Trees achieves over Treelite on a batch of 5,000 data points. All timing measurements use the Python API of the respective libraries.
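For readers who want to run comparable measurements themselves, the timing methodology can be sketched as follows. This is an illustrative harness only: the helper name and the stand-in linear scorer are ours (no actual GBT model or library from the benchmark is reproduced here), but the pattern of discarding warm-up runs and taking the median applies to any of the libraries mentioned.

```python
import time
import numpy as np

def time_batch_predict(predict, X, n_warmup=3, n_runs=10):
    """Time a batch-prediction callable; return median seconds per batch.

    Warm-up runs are discarded so one-time costs (JIT compilation,
    lazy initialization) do not skew the measurement.
    """
    for _ in range(n_warmup):
        predict(X)
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict(X)
        timings.append(time.perf_counter() - start)
    return float(np.median(timings))

# Stand-in predictor: a random linear scorer over 50 features, echoing
# the "random data, batch of 5000" setup; swap in model.predict for a
# real comparison.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 50)).astype(np.float32)
w = rng.standard_normal(50).astype(np.float32)
baseline = time_batch_predict(lambda X: X @ w, X)
```

The median is preferred over the mean here because batch-inference timings on shared cloud instances tend to have occasional large outliers.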

We see that in this setting, Treelite performs more than 2x better than native Xgboost. However, 172Trees still achieves more than 2.5x speedup over Treelite in all cases. 172Trees is able to accomplish these speedups through a series of tricks including eager evaluation of tree nodes and just-in-time code compilation targeting the latest instruction-set architectures.
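172Trees' internals are not public, but one common ingredient of fast tree inference, storing the tree in flat arrays and advancing a whole batch of rows through it in lockstep rather than chasing pointers row by row, can be sketched with NumPy. This is a generic illustration of the technique, not 172Trees' actual implementation; the depth-2 tree below is hypothetical.

```python
import numpy as np

# A hypothetical depth-2 tree stored in flat arrays: node i splits on
# feature[i] at threshold[i], with children left[i]/right[i]; leaves
# (marked by is_leaf) carry the prediction in value[i].
feature   = np.array([0, 1, 1, -1, -1, -1, -1])
threshold = np.array([0.5, 0.2, 0.8, 0.0, 0.0, 0.0, 0.0])
left      = np.array([1, 3, 5, -1, -1, -1, -1])
right     = np.array([2, 4, 6, -1, -1, -1, -1])
value     = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 4.0])
is_leaf   = np.array([False, False, False, True, True, True, True])

def predict_flat(X, max_depth=2):
    """Advance every row one tree level per iteration, in lockstep."""
    node = np.zeros(len(X), dtype=np.int64)  # all rows start at the root
    for _ in range(max_depth):
        go_left = X[np.arange(len(X)), feature[node]] < threshold[node]
        # Rows already at a leaf stay put; the rest step to a child.
        node = np.where(is_leaf[node], node,
                        np.where(go_left, left[node], right[node]))
    return value[node]

X = np.array([[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9]])
print(predict_flat(X))  # → [1. 2. 3. 4.]
```

The flat layout keeps node data contiguous in memory and replaces per-row branching with vectorized array operations; code generation takes this a step further by compiling the splits themselves into straight-line native code.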

In a high-throughput streaming use case, this means 2.5x fewer CPU-hours are needed to host the GBT model in production, saving 60% of the cost. In finance and other latency-sensitive use cases, it reduces prediction latency and allows larger, more accurate models to run under the same latency constraints.
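The cost arithmetic is worth spelling out: a 2.5x speedup means paying for only 1/2.5 = 40% of the original CPU-hours, i.e. a 60% saving.

```python
# A 2.5x speedup → the optimized model needs 1/2.5 of the CPU-hours.
speedup = 2.5
cost_fraction = 1 / speedup    # 0.4: pay for 40% of the hours
savings = 1 - cost_fraction    # 0.6: a 60% cost reduction
print(f"cost fraction: {cost_fraction:.0%}, savings: {savings:.0%}")
# → cost fraction: 40%, savings: 60%
```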

The 172Trees API works similarly to the compilation phase of Treelite. It takes a serialized Xgboost/LightGBM/Scikit-learn/SparkML model and returns an optimized shared-library object that can be used with a free Python or C++ wrapper library. You could also write your own wrapper to suit your use case. You can upload your model to our beta endpoint, or host the service yourself, locally or in the cloud, under license.
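The 172Trees wrapper itself is not yet public, but loading any compiled shared object from Python follows a standard ctypes pattern. As a stand-in, the sketch below loads the system C math library and declares one function's signature; a model wrapper would do the same against the `.so` returned by the API, with its own (here unspecified) entry points.

```python
import ctypes
import ctypes.util

# Stand-in for a compiled model library: the system C math library.
# A real wrapper would CDLL() the shared object returned by the API.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature before calling, exactly as a wrapper would
# declare its prediction entry point.
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))  # → 1.0
```

Writing your own wrapper amounts to declaring the library's exported prediction function in this way and marshalling your feature arrays into the pointer types it expects.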

We are currently welcoming beta customers to trial the API for free, and we are more than happy to discuss your use case with you. If you are interested, please contact us at oneseventwoai.com.
