How do you choose the number of hidden layers and the number of memory cells in an LSTM?

28

I'm trying to find existing research on how to select the number of hidden layers, and the size of those layers, for an LSTM-based RNN.

Is there a paper that investigates this problem, i.e., how many memory cells should one use? I assume it depends entirely on the application and the context in which the model is used, but what does the research say?

— Stephen Johnson

15

Your question is quite broad, but here are some tips:

For feedforward networks, see this question:

@doug's answer has worked for me. There's one additional rule of thumb that helps for supervised learning problems. The upper bound on the number of hidden neurons that won't result in over-fitting is:

$N_h = \frac{N_s}{(\alpha * (N_i + N_o))}$

$N_i$ = number of input neurons. $N_o$ = number of output neurons. $N_s$ = number of samples in the training data set. $\alpha$ = an arbitrary scaling factor, usually 2-10.
Others recommend setting $\alpha$ to a value between 5 and 10, but I find a value of 2 will often work without overfitting. As explained by this excellent NN Design text, you want to limit the number of free parameters in your model (its degree or number of nonzero weights) to a small portion of the degrees of freedom in your data. The degrees of freedom in your data is the number of samples * degrees of freedom (dimensions) in each sample, or $N_s * (N_i + N_o)$ (assuming they're all independent). So $\alpha$ is a way to indicate how general you want your model to be, or how much you want to prevent overfitting.
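As a quick illustration of the rule of thumb, here is a minimal Python sketch that evaluates the bound for a few values of $\alpha$. The function name and the dataset sizes (10,000 samples, 20 inputs, 1 output) are made up for the example and are not from the quoted answer.

```python
def max_hidden_neurons(n_samples, n_inputs, n_outputs, alpha=2.0):
    """Upper bound N_h = N_s / (alpha * (N_i + N_o)) from the rule of thumb."""
    return n_samples / (alpha * (n_inputs + n_outputs))


# Hypothetical dataset: 10,000 samples, 20 input features, 1 output.
for alpha in (2, 5, 10):
    bound = max_hidden_neurons(10_000, 20, 1, alpha)
    print(f"alpha={alpha:>2}: at most about {int(bound)} hidden neurons")
```

With these made-up numbers the bound drops from roughly 238 neurons at $\alpha = 2$ to roughly 47 at $\alpha = 10$, which shows how strongly the scaling factor controls model size.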

For an automated procedure you'd start with an alpha of 2 (twice as many degrees of freedom in your training data as your model) and work your way up to 10 if the error for training data is significantly smaller than for the cross-validation data set.
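That procedure could be sketched roughly as follows. Everything here is an assumption for illustration: `fit_and_score` is a hypothetical callback that trains a network of the given size and returns the training and validation errors, and `gap_tolerance` is one arbitrary way to decide what "significantly smaller" means.

```python
def choose_alpha(fit_and_score, n_samples, n_inputs, n_outputs,
                 alphas=range(2, 11), gap_tolerance=0.1):
    """Raise alpha (shrinking the model) while the network still overfits.

    fit_and_score(n_hidden) is a hypothetical callback that trains a network
    with n_hidden hidden neurons and returns (train_error, validation_error).
    """
    for alpha in alphas:
        # Size the hidden layer from N_h = N_s / (alpha * (N_i + N_o)).
        n_hidden = max(1, int(n_samples / (alpha * (n_inputs + n_outputs))))
        train_err, val_err = fit_and_score(n_hidden)
        # Stop once training error is no longer much smaller than validation error.
        if val_err - train_err <= gap_tolerance * val_err:
            break
    return alpha, n_hidden
```

If no value of alpha closes the gap, the sketch simply returns the last (largest) alpha it tried, so the caller should still inspect the errors.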

And specifically on LSTMs, you might want to check out this.

But the main point: there is no rule for the number of hidden nodes you should use; it is something you have to figure out for each case by trial and error.

— Thomas W

7

Selecting the number of hidden layers and the number of memory cells in an LSTM always depends on the application domain and the context in which you want to apply the LSTM.

For hidden layers: the introduction of hidden layer(s) makes it possible for the network to exhibit non-linear behaviour.

The optimal number of hidden units could easily be smaller than the number of inputs; there is no rule like multiplying the number of inputs by N. If you have a lot of training examples, you can use multiple hidden units, but sometimes just 2 hidden units work best with little data. Usually people use one hidden layer for simple tasks, but recent research on deep neural network architectures shows that many hidden layers can be fruitful for difficult object, handwritten character, and face recognition problems.

I assume it totally depends on the application and in which context the model is being used.

— Maheshwar Ligade

5

Non-linearity is due to the use of non-linear activation functions. The number of layers only increases the expressivity of the NN. You should correct this answer. Combinations of linear functions are still linear functions (so, if you had multiple layers that only performed a linear combination of the inputs, then the combination of these layers would still be linear).
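A small check of that claim in plain Python, with arbitrarily chosen weights: two stacked layers without an activation function give exactly the same output as a single layer whose weight matrix is the product of the two.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Two "layers" with no activation function (weights chosen arbitrarily).
W1 = [[1, 2], [0, 1], [3, -1]]   # maps 2 inputs to 3 hidden units
W2 = [[2, 0, 1], [-1, 1, 0]]     # maps 3 hidden units to 2 outputs

x = [[4], [5]]                   # one input as a column vector

two_layers = matmul(W2, matmul(W1, x))   # apply the layers one after the other
one_layer = matmul(matmul(W2, W1), x)    # single equivalent weight matrix

print(two_layers == one_layer)  # True
```

Adding a non-linear activation between the two layers is what breaks this equivalence.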

— nbro

4

In general, there are no guidelines on how to determine the number of layers or the number of memory cells in an LSTM.

The number of layers and cells required in an LSTM might depend on several aspects of the problem:

The complexity of the dataset: the number of features, the number of data points, etc.
The data-generating process. The following example shows how the data-generating process can play a significant part.

Example: predicting oil prices compared with predicting the GDP of a well-understood economy. The latter is much easier than the former, so predicting oil prices may well need more LSTM memory cells to reach the same accuracy as the GDP prediction.

The accuracy required for the use case. The number of memory cells will depend heavily on this. If the goal is to beat the state of the art, one generally needs more LSTM cells. Compare that with the goal of coming up with reasonable predictions, which would need fewer LSTM cells.

I follow these steps when modelling with an LSTM:

Try a single hidden layer with 2 or 3 memory cells and see how it performs against a benchmark. For a time series problem, I generally use a forecast from classical time series techniques as the benchmark.
Try increasing the number of memory cells. If the performance does not improve much, move on to the next step.
Start making the network deeper, i.e., add another layer with a small number of memory cells.
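The steps above can be sketched as a greedy search loop. This is only an illustration under assumptions that are not part of the answer: `evaluate` is a hypothetical callback that trains an LSTM with the given layer sizes and returns a validation error, the cell count is doubled at each widening step, and `min_gain` is an arbitrary threshold for "not increasing much".

```python
def grow_lstm(evaluate, start_cells=2, max_cells=64, max_layers=3, min_gain=0.01):
    """Greedy search: widen the last layer while it helps, then add a layer.

    evaluate(layers) is a hypothetical callback that trains an LSTM whose
    i-th layer has layers[i] memory cells and returns a validation error.
    """
    layers = [start_cells]
    best = evaluate(layers)
    while True:
        # Step 2: try doubling the memory cells in the last layer.
        wider = layers[:-1] + [layers[-1] * 2]
        if wider[-1] <= max_cells:
            err = evaluate(wider)
            if best - err > min_gain:
                layers, best = wider, err
                continue
        # Step 3: widening stalled, so try adding a small extra layer.
        if len(layers) >= max_layers:
            break
        deeper = layers + [start_cells]
        err = evaluate(deeper)
        if best - err <= min_gain:
            break
        layers, best = deeper, err
    return layers, best
```

In practice the returned error would still be compared against the benchmark from the first step before deciding whether the larger model is worth keeping.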

Aside:

There is no limit to the amount of labor that can be devoted to reaching the global minimum of the loss function and tuning the hyper-parameters. So the strategy should be to focus on the end goal of the modelling rather than to push the accuracy as high as possible.

Most problems can be handled using 2-3 layers.
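To get a feel for how quickly model size grows with memory cells and layers, the weights can be counted directly. The sketch below assumes a standard LSTM layer with four gates, one bias per gate, and no peepholes or projection (some libraries store an extra bias vector per gate, so their counts differ slightly); the 10 input features are made up for the example.

```python
def lstm_layer_params(n_inputs, n_cells):
    """Weights in one standard LSTM layer: 4 gates, each with input weights,
    recurrent weights, and a bias (assumes no peepholes or projection)."""
    return 4 * (n_cells * (n_inputs + n_cells) + n_cells)


def stacked_lstm_params(n_features, layers):
    """Total weights for stacked LSTM layers; layers[i] = cells in layer i."""
    total, n_in = 0, n_features
    for n_cells in layers:
        total += lstm_layer_params(n_in, n_cells)
        n_in = n_cells  # the next layer reads this layer's outputs
    return total


# Hypothetical input with 10 features per time step.
for layers in ([3], [32], [32, 32], [32, 32, 32]):
    print(layers, stacked_lstm_params(10, layers))
```

With these numbers a single layer of 3 cells has 168 weights, while three layers of 32 cells have 22,144, which is worth keeping in mind when the training set is small.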

— naive

2

Maybe you should have a look at this: https://wiki.inf.ed.ac.uk/twiki/pub/CSTR/ListenTerm1201415/sak2.pdf

Here they show that 2 layers are nice, 5 layers are better and 7 layers are very hard to train.

— Dieshe