In general, there are no guidelines on how to determine the number of layers or the number of memory cells in an LSTM.

The number of layers and cells required in an LSTM might depend on several aspects of the problem:

- The *complexity* of the dataset: the number of features, the number of data points, etc.

- The data-generating process. The following example shows how the data-generating process can play a significant part.

  Ex: predicting oil prices versus predicting the GDP of a well-understood economy. The latter is much easier than the former, so predicting oil prices may well need more LSTM memory cells to reach the same accuracy as a GDP forecast.

- The accuracy required for the use case. The number of memory cells will depend **heavily** on this. If the goal is to beat the state of the art, one generally needs more LSTM cells; if the goal is merely to come up with reasonable predictions, fewer cells will do.

I follow these steps when modelling with LSTMs:

Try a single hidden layer with 2 or 3 memory cells and see how it performs against a benchmark. For a time-series problem, I generally use a forecast from classical time-series techniques as the benchmark.
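As a sketch, a minimal starting model of this kind might look like the following in Keras (assuming TensorFlow/Keras is available; the data shapes and the 3-cell layer size are illustrative, not prescriptive):

```python
import numpy as np
from tensorflow import keras

# Toy data: 100 sequences, 10 timesteps, 4 features each (assumed shapes).
X = np.random.rand(100, 10, 4)
y = np.random.rand(100, 1)

# A single hidden LSTM layer with just 3 memory cells, as in the first step.
model = keras.Sequential([
    keras.layers.Input(shape=(10, 4)),
    keras.layers.LSTM(3),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, verbose=0)
```

A model this small trains in seconds, which makes the comparison against a classical benchmark cheap.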

Increase the number of memory cells. If performance stops improving much, move on to the next step.

Start making the network deep, i.e. add another layer with a small number of memory cells.
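The steps above amount to a greedy search: widen a single layer until the gains flatten out, then deepen. A minimal sketch of that loop, where `evaluate` is a stand-in for "train an LSTM with this architecture and return its validation error" (here replaced by a toy function for illustration):

```python
def search_architecture(evaluate, benchmark_error, max_units=32, tol=0.01):
    """Greedy search: widen a single hidden layer first, then go deeper.

    `evaluate(arch)` takes a tuple of memory-cell counts per layer
    (e.g. (3,) or (24, 2)) and returns a validation error; in practice
    it would train an LSTM with that architecture.
    """
    best = ((2,), evaluate((2,)))        # start small: one layer, 2 cells
    # Steps 1-2: widen the single hidden layer while it keeps helping.
    units = 3
    while units <= max_units:
        err = evaluate((units,))
        if best[1] - err < tol:          # not improving much: stop widening
            break
        best = ((units,), err)
        units *= 2
    # Step 3: still behind the benchmark? Add a small second layer.
    if best[1] > benchmark_error:
        deeper = best[0] + (2,)
        err = evaluate(deeper)
        if err < best[1]:
            best = (deeper, err)
    return best

# Toy stand-in for training: error falls with total capacity,
# with diminishing returns (purely illustrative).
toy = lambda arch: 1.0 / (1.0 + sum(arch))
arch, err = search_architecture(toy, benchmark_error=0.03)
# arch == (24, 2): one wide layer plus a small second layer
```

The point of the sketch is the stopping rule, not the numbers: each move (wider, then deeper) has to earn its keep against the previous configuration and the benchmark.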

Aside:

There is no limit to the amount of labor that can be devoted to reaching the global minimum of the loss function and tuning the best hyper-parameters. So the strategy should be to focus on the end goal of the modelling, rather than to push accuracy as high as possible.

Most problems can be handled with 2-3 layers of the network.