Batch gradient descent computes the gradient of the cost function over the entire training dataset before making an update. In theory this sounds good, since we want to model our input dataset, say X, as well as possible; in practice, however, it can be computationally quite expensive. How can we reduce this cost? Well, we could decrease the size of the inputs. But we want … Continue reading Why does batch size matter?
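The full-dataset update described above can be sketched in a few lines. This is a minimal illustration for 1-D linear regression with mean squared error; the toy data, learning rate, and step count are assumptions for the example, not values from the post.

```python
# Minimal sketch of batch gradient descent for y = w * x with MSE loss.
# "Batch" here means each update averages the gradient over the ENTIRE
# dataset, which is exactly the cost discussed above.

def batch_gradient_descent(xs, ys, lr=0.01, steps=1000):
    """Fit the single weight w in y = w * x by gradient descent."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of (1/n) * sum((w*x - y)^2) with respect to w,
        # computed over all n training examples before one update.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w

# Toy data generated from y = 3x; the fit should recover w close to 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]
w = batch_gradient_descent(xs, ys)
```

Note that every single update touches all n examples; with a dataset of millions of points, each step becomes proportionally expensive, which is the motivation for smaller batch sizes.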
In this post I will try to cover some of the widely used graphical models and how we can convert one to another. This post is largely influenced by discussions with my classmate Ankush Ganguly. Some material that I used extensively to learn about this topic, and also strongly recommend, includes: the paper by Brendan J. Frey, "Extending … Continue reading Probabilistic Graphical Models