The ability to perform in-context learning — solving new tasks by conditioning on a prompt containing instructions and in-context training examples — is a remarkable property of language models such as GPT-4. Yet how this ability emerges from large-scale training remains a mystery. In this talk, I will discuss two projects that make progress towards unraveling it. First, we show that when the pre-training distribution is a mixture of HMMs, in-context learning can be interpreted as implicit Bayesian inference, and we develop a small synthetic dataset on which both Transformers and LSTMs exhibit in-context learning. Second, we consider in-context learning of well-defined function classes such as linear regressors, decision trees, and neural networks. We show that Transformers can be trained to learn these function classes in context, carrying out the learning entirely within a single forward pass. The algorithms the Transformer implements are non-trivial: they can exploit sparsity, exhibit double descent, and outperform existing decision tree learning algorithms.
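
To make the second project's setup concrete, here is a minimal sketch (not the authors' code) of in-context linear regression: each prompt is a sequence of (x, y) pairs drawn from a freshly sampled linear function, and the model must predict y for a new query input. The helper names, Gaussian sampling, and dimensions below are illustrative assumptions; the least-squares fit serves as the natural baseline that a trained Transformer's forward-pass predictions would be compared against.

```python
# Sketch of the in-context linear regression setup, under assumed
# Gaussian inputs and noiseless labels. Function names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def sample_prompt(dim=20, n_examples=40):
    """Sample one in-context task: a random linear function w, its
    in-context training pairs, and a held-out query point."""
    w = rng.normal(size=dim)                 # task-specific weights
    xs = rng.normal(size=(n_examples, dim))  # in-context inputs
    ys = xs @ w                              # labels y_i = <w, x_i>
    x_query = rng.normal(size=dim)
    return xs, ys, x_query, w @ x_query

def least_squares_predict(xs, ys, x_query):
    """Baseline learner: fit w by least squares on the in-context
    examples, then predict at the query. A Transformer matching this
    is effectively doing linear regression inside its forward pass."""
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return w_hat @ x_query

# Evaluate the baseline over many freshly sampled tasks.
errs = []
for _ in range(1000):
    xs, ys, xq, y_true = sample_prompt()
    errs.append((least_squares_predict(xs, ys, xq) - y_true) ** 2)
print(f"mean squared error of least squares: {np.mean(errs):.3e}")
```

In the actual experiments, a Transformer is trained from scratch on prompts of this form (with the function class varied, e.g. sparse linear functions or decision trees) and evaluated on functions it has never seen, so any successful prediction must come from learning within the prompt rather than memorization.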