The basic idea is to build a deep generative model for unsupervised learning by combining a top-down directed model P and a bottom-up directed model Q into a joint model P*. We show that P* can be trained such that P and Q are useful approximate inference distributions both when we want to sample from the model and when we want to perform inference. We generally observe that BiHMs prefer deep architectures with many layers of latent variables. For example, our best model for the binarized MNIST dataset has 12 layers with 300, 200, 100, 75, 50, 35, 30, 25, 20, 15, 10, and 10 binary latent units. This model reaches a test set log-likelihood of −84.8 nats.
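As an illustration of the kind of architecture described above, the sketch below performs top-down ancestral sampling through a stack of binary stochastic layers with the stated layer sizes. This is a minimal sketch only: the weights and biases are random placeholders, not trained BiHM parameters, and the visible layer and the bottom-up model Q are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent layer sizes from the text, ordered deepest (top) to shallowest.
layer_sizes = [10, 10, 15, 20, 25, 30, 35, 50, 75, 100, 200, 300]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical randomly initialized parameters, for illustration only.
weights = [rng.normal(scale=0.1, size=(layer_sizes[i], layer_sizes[i + 1]))
           for i in range(len(layer_sizes) - 1)]
biases = [rng.normal(scale=0.1, size=n) for n in layer_sizes]

def sample_top_down():
    """Ancestral sampling through a stack of sigmoid belief net layers."""
    # Sample the top layer from its prior (bias-only Bernoulli).
    h = (rng.random(layer_sizes[0]) < sigmoid(biases[0])).astype(float)
    # Propagate downward, sampling each layer conditioned on the one above.
    for W, b in zip(weights, biases[1:]):
        p = sigmoid(h @ W + b)
        h = (rng.random(p.shape) < p).astype(float)
    return h

sample = sample_top_down()
print(sample.shape)  # prints (300,)
```

Each layer is a Bernoulli distribution whose probabilities are a sigmoid function of the layer above, so a single sample is drawn by one pass down the stack.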