Modern black-box models are recognized for their predictive accuracy and their ability to capture complex interactions in data. While these models excel at prediction, they are difficult to interpret and may fail to uncover useful relationships in the data. They also incur high storage and evaluation costs due to their massive size. Using the example of decision tree ensembles (random forests, boosting), we propose a framework to extract compact sets of decision rules at the post-training stage. Our estimator is given by solutions to a discrete optimization problem that balances multiple considerations: prediction accuracy, the number of rules, the interaction depth of each rule, the number of features used, and so on. The optimization framework can also accommodate somewhat non-standard constraints, such as stability of the selected rules and routing patterns of the samples. We develop specialized algorithms to solve these optimization problems at a scale that appears to be well beyond the capabilities of off-the-shelf commercial solvers, and we discuss their statistical properties. Time permitting, I will discuss how related compression problems arise in sparsifying large neural networks/LLMs, where the main goal is to improve model efficiency, and how computational and statistical tools from sparse regularization appear promising.
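
To make the rule-extraction idea concrete, here is a minimal sketch (not the authors' algorithm) of the underlying selection problem: given a 0/1 matrix `R` recording which candidate rules (e.g. leaves harvested from a tree ensemble) fire on which samples, pick a small subset of rules that predicts the response well. The greedy forward selection below is only an illustrative stand-in for the discrete optimization balancing accuracy against the number of rules; all names and the data-generating setup are hypothetical.

```python
import numpy as np

def greedy_rule_selection(R, y, k):
    """Select up to k rules by greedy forward selection.

    R : (n_samples, n_rules) 0/1 rule-activation matrix
    y : (n_samples,) response vector
    Returns the indices of the chosen rules, picked one at a time
    to minimize the least-squares fit error of the selected columns.
    """
    n, m = R.shape
    selected = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(m):
            if j in selected:
                continue
            cols = R[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            err = np.sum((y - cols @ coef) ** 2)
            if err < best_err:
                best_err, best_j = err, j
        selected.append(best_j)
    return selected

# Synthetic example: 30 candidate rules, only 3 of which truly matter.
rng = np.random.default_rng(0)
R = rng.integers(0, 2, size=(200, 30)).astype(float)
beta = np.zeros(30)
beta[[3, 11, 20]] = [2.0, -1.5, 1.0]
y = R @ beta + 0.01 * rng.standard_normal(200)
print(sorted(greedy_rule_selection(R, y, 3)))
```

A real version of this problem would be posed (and solved) as a discrete optimization with additional penalties and constraints (interaction depth, feature budget, stability), which is where the specialized algorithms mentioned above come in; greedy selection is just the simplest baseline.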