Why are bias nodes used in neural networks?


  1. Why are bias nodes used in neural networks?
  2. How many should you use?
  3. In which layers should you use them: all hidden layers and the output layer?

This question is a bit broad for this forum. I think it would be best to consult a textbook discussing neural networks, such as Bishop's Neural Networks for Pattern Recognition or Hagan's Neural Network Design.
Sycorax says Reinstate Monica

FTR, I don't think this is too broad.
gung - Reinstate Monica

Answers:



The bias node in a neural network is a node that is always 'on'. That is, its value is set to 1 without regard for the data in a given pattern. It is analogous to the intercept in a regression model, and serves the same function. If a neural network does not have a bias node in a given layer, it will not be able to produce output in the next layer that differs from 0 (on the linear scale, or the value that corresponds to the transformation of 0 when passed through the activation function) when the feature values are 0.


Consider a simple example: You have a feed-forward perceptron with 2 input nodes x1 and x2, and 1 output node y. x1 and x2 are binary features set at their reference level, x1=x2=0. Multiply those two 0's by whatever weights you like, w1 and w2, sum the products, and pass it through whatever activation function you prefer. Without a bias node, only one output value is possible, which may yield a very poor fit. For instance, using a logistic activation function, y must be .5, which would be awful for classifying rare events.
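
To make this concrete, here is a minimal NumPy sketch of the example above (the weight and bias values are arbitrary illustrations, not from the original answer):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2 = 0.0, 0.0   # both binary features at their reference level
w1, w2 = 0.7, -1.3  # any weights at all

# Without a bias node the pre-activation is forced to 0, so y is stuck at 0.5.
y_no_bias = sigmoid(w1 * x1 + w2 * x2)
print(y_no_bias)    # 0.5, regardless of w1 and w2

# With a bias node (value fixed at 1, weight b), y can be any value in (0, 1).
b = -2.0
y_with_bias = sigmoid(w1 * x1 + w2 * x2 + b * 1.0)
print(y_with_bias)  # sigmoid(-2.0) ≈ 0.119
```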

A bias node provides considerable flexibility to a neural network model. In the example given above, the only predicted proportion possible without a bias node was 50%, but with a bias node, any proportion in (0,1) can be fit for the patterns where x1=x2=0. For each layer, j, in which a bias node is added, the bias node will add N_{j+1} additional parameters / weights to be estimated (where N_{j+1} is the number of nodes in layer j+1). More parameters to be fitted means it will take proportionately longer for the neural network to be trained. It also increases the chance of overfitting, if you don't have considerably more data than weights to be learned.
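
As a back-of-the-envelope illustration of that parameter count (the layer sizes here are made up):

```python
# One fully connected layer j -> j+1 with arbitrary illustrative sizes.
n_j, n_j1 = 10, 5                      # nodes in layer j and layer j+1

params_without_bias = n_j * n_j1       # 50 connection weights
params_with_bias = n_j * n_j1 + n_j1   # the bias node adds N_{j+1} = 5 weights

print(params_without_bias, params_with_bias)  # 50 55
```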

With this understanding in mind, we can answer your explicit questions:

  1. Bias nodes are added to increase the flexibility of the model to fit the data. Specifically, it allows the network to fit the data when all input features are equal to 0, and very likely decreases the bias of the fitted values elsewhere in the data space.
  2. Typically, a single bias node is added for the input layer and every hidden layer in a feedforward network. You would never add two or more to a given layer, but you might add zero. The total number is thus determined largely by the structure of your network, although other considerations could apply. (I am less clear on how bias nodes are added to neural network structures other than feedforward.) See the sketch after this list for how this looks in a typical framework.
  3. Mostly this has been covered, but to be explicit: you would never add a bias node to the output layer; that wouldn't make any sense.
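
As a concrete illustration of points 2 and 3, here is a minimal PyTorch sketch (the layer sizes are arbitrary; this is one common way frameworks express the idea, not the only one). Frameworks generally do not model the bias as an explicit node; instead, each layer takes a per-layer bias flag, and the flag on a layer plays the role of a bias node in the layer feeding it:

```python
import torch.nn as nn

# bias=True adds exactly one bias term per output unit of that layer;
# there is never more than one bias node per layer, and no bias node
# sits "in" the output layer itself.
model = nn.Sequential(
    nn.Linear(2, 8, bias=True),  # input layer's bias node -> hidden layer
    nn.Sigmoid(),
    nn.Linear(8, 1, bias=True),  # hidden layer's bias node -> output layer
    nn.Sigmoid(),
)
```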

Is a CNN different in this regard? When I add bias to my conv layers, the performance (accuracy) degrades! And when I remove it, it actually goes higher!
Rika

@Hossein, not that I know of, but you could ask a new question. I'm not much of an expert there.
gung - Reinstate Monica

Would I still need bias nodes if my inputs never go to 0?
alec_djinn

@alec_djinn, yes. Almost certainly the model would be biased without them, even if you never have 0 for an input value. By analogy, it may help to read: When is it ok to remove the intercept in a linear regression model?
gung - Reinstate Monica

@krupeshAnadkat, "The bias node in a neural network is a node that is always 'on'. That is, its value is set to 1 without regard for the data in a given pattern." So you can connect it if you like, just always change the resulting value of the node back to 1 before you multiply it by the weight, since a bias node is a node whose value is always 1.
gung - Reinstate Monica


Simple, short answers:

  1. To shift the input to the activation function / to make the learned function more flexible.
  2. A single bias node per layer.
  3. Add them to all hidden layers and the input layer - with some footnotes

In a couple of experiments in my master's thesis (e.g. page 59), I found that the bias might be important for the first layer(s), but especially at the fully connected layers at the end it seems not to play a big role. Hence one can have them in the first few layers and not in the last ones. Simply train a network, plot the distribution of the bias nodes' weights, and prune them if the weights seem too close to zero.
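
A rough sketch of that pruning heuristic (the layer names, bias values, and the threshold are all made up for illustration):

```python
import numpy as np

# Hypothetical bias vectors taken from a trained network, one per layer.
bias_weights = {
    "conv1":  np.array([0.91, -0.74, 0.55]),
    "fc_out": np.array([0.003, -0.008, 0.001]),
}

threshold = 1e-2  # arbitrary cut-off; inspect the weight distribution first
for layer, b in bias_weights.items():
    if np.all(np.abs(b) < threshold):
        print(f"{layer}: bias weights ~ 0, candidate for pruning")
    else:
        print(f"{layer}: keep the bias node")
```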

This might be highly dependent on the network architecture / dataset.


Would a bias node have arrows connecting to it from the previous layer? Or does it just contribute to the next layer by multiplying its value of "1" by a weight in the weighted sum passed to the activation? An answer to this will save hours, please do help.
krupesh Anadkat

The bias is just a number added to the next layer's activation. One way to visualize it is as a constant value of 1 in the previous layer with one weight (one bias value) for each of the next layer's neurons.
Martin Thoma


In the context of neural networks, Batch Normalization is currently the gold standard for making smart "bias nodes." Instead of clamping a neuron's bias value, you adjust for the covariance of the neuron's input. So in a CNN, you would apply batch normalization between the convolutional layer and the next fully connected layer (of, say, ReLUs). In theory, all fully connected layers could benefit from Batch Normalization, but in practice this becomes very expensive to implement since each batch normalization carries its own parameters.

Concerning why: most of the answers have already explained that neurons are susceptible to saturated gradients when the input pushes the activation to an extreme. In the case of ReLUs, the activation would be pushed to the left, giving a gradient of 0. In general, when you train a model, you first normalize the inputs to the neural network. Batch Normalization is a way of normalizing the inputs inside the neural network, between layers.
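
A minimal PyTorch sketch of the placement described above (the channel counts and the 28x28 input size are arbitrary assumptions):

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer
    nn.BatchNorm2d(16),  # normalizes each channel across the batch
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 28 * 28, 10),  # the fully connected layer that follows
)
```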

Licensed under cc by-sa 3.0 with attribution required.