To create a Caffe model, you need to define the model architecture in a protocol buffer definition file (prototxt).
Caffe layers and their parameters are defined in the protocol buffer definitions for the project in caffe.proto.
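For illustration, a minimal sketch of such a prototxt is shown below; the layer types and parameter names come from caffe.proto, while the net, blob, and layer names ("MinimalNet", "data", "conv1") and the input shape are placeholders:

```
name: "MinimalNet"              # placeholder net name
layer {
  name: "data"
  type: "Input"                 # deploy-style input layer
  top: "data"
  input_param { shape: { dim: 1 dim: 3 dim: 227 dim: 227 } }   # N x C x H x W
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"                # takes the "data" blob as input
  top: "conv1"                  # produces the "conv1" blob
  convolution_param {
    num_output: 16              # number of learned filters
    kernel_size: 3
    stride: 1
  }
}
```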
Data enters Caffe through data layers: they lie at the bottom of nets. Data can come from efficient databases (LevelDB or LMDB), directly from memory, or, when efficiency is not critical, from files on disk in HDF5 or common image formats.
Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) is available by specifying a TransformationParameter in the layers that support it.
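For example, a Data layer reading an LMDB and applying such preprocessing through its transform_param might look like the sketch below; the database path, batch size, and mean values are placeholders:

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    mirror: true                # random horizontal mirroring
    crop_size: 227              # random crops at TRAIN time, center crops at TEST time
    mean_value: 104             # per-channel mean subtraction
    mean_value: 117
    mean_value: 123
    scale: 0.00390625           # optional rescaling, here 1/255
  }
  data_param {
    source: "path/to/train_lmdb"   # placeholder database path
    backend: LMDB
    batch_size: 32
  }
}
```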
The bias, scale, and crop layers can be helpful for transforming the inputs when a TransformationParameter isn’t available.
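For instance, a fixed rescaling can be expressed with a Scale layer whose parameter is frozen; this is a sketch under the assumption that the input blob is named "data":

```
layer {
  name: "rescale"
  type: "Scale"
  bottom: "data"
  top: "data_scaled"
  param { lr_mult: 0 decay_mult: 0 }    # keep the scale fixed rather than learned
  scale_param {
    filler { type: "constant" value: 0.00390625 }   # multiply by 1/255
    bias_term: false
  }
}
```

The crop layer, by contrast, takes a second bottom blob and crops its first bottom to that blob's spatial size.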
Note that the Python Layer can be useful for creating custom data layers.
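A Python layer is wired into the net with a python_param block. In the sketch below, the module name my_data_layer and class MyDataLayer are hypothetical; the class would derive from caffe.Layer and implement the setup, reshape, forward, and backward methods. Using Python layers also requires building Caffe with the WITH_PYTHON_LAYER option enabled.

```
layer {
  name: "custom_data"
  type: "Python"
  top: "data"
  top: "label"
  python_param {
    module: "my_data_layer"          # hypothetical module importable from PYTHONPATH
    layer: "MyDataLayer"             # hypothetical class deriving from caffe.Layer
    param_str: "{'batch_size': 32}"  # optional string handed to the layer's setup()
  }
}
```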
Vision layers usually take images as input and produce other images as output, although they can take data of other types and dimensions. A typical “image” in the real world may have one color channel (c = 1), as in a grayscale image, or three color channels (c = 3), as in an RGB (red, green, blue) image. But in this context, the distinguishing characteristic of an image is its spatial structure: usually an image has some non-trivial height h > 1 and width w > 1. This 2D geometry naturally lends itself to certain decisions about how to process the input. In particular, most of the vision layers work by applying a particular operation to some region of the input to produce a corresponding region of the output. In contrast, other layers (with few exceptions) ignore the spatial structure of the input, effectively treating it as “one big vector” with dimension c × h × w.
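A convolution layer is the typical example: each output value is computed from a local kernel_size × kernel_size region of the input, and the output keeps a spatial layout. A sketch, with placeholder blob names and filter counts:

```
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"                # input blob of shape N x c x h x w
  top: "conv1"
  convolution_param {
    num_output: 96              # number of filters, i.e. output feature maps
    kernel_size: 11             # each output value sees an 11 x 11 input region
    stride: 4                   # step between neighbouring regions
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}
```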
Layers:
Deconvolution Layer - transposed convolution.
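As a sketch, a Deconvolution layer that doubles the spatial resolution of its input could be written as below; it reuses convolution_param, and the blob names and num_output value are placeholders:

```
layer {
  name: "upsample"
  type: "Deconvolution"
  bottom: "conv1"
  top: "upsampled"
  convolution_param {           # Deconvolution shares ConvolutionParameter
    num_output: 96
    kernel_size: 4
    stride: 2                   # stride > 1 enlarges the output spatially
    pad: 1                      # with kernel 4, stride 2, pad 1 the output is 2x the input
    bias_term: false
    weight_filler { type: "xavier" }
  }
}
```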
The bias and scale layers can be helpful in combination with normalization.
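For example, Caffe's BatchNorm layer only normalizes; a common pattern is to follow it with a Scale layer whose bias_term supplies the learned per-channel scale and shift (a sketch with placeholder blob names):

```
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"                      # normalize in place
}
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"                      # learned per-channel scale ...
  scale_param { bias_term: true }   # ... plus a learned per-channel shift
}
```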
In general, activation / neuron layers are element-wise operators, taking one bottom blob and producing one top blob of the same size. In the layers below, we will ignore the input and output sizes as they are identical.
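Because the operation is element-wise, these layers are commonly applied in place, writing their output back into the input blob to save memory; for example, a ReLU (the blob name is a placeholder):

```
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"        # same name as the bottom: the activation is computed in place
}
```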
Layers:
Silence - prevent top-level blobs from being printed during training.
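A Silence layer simply consumes its bottom blobs and produces no tops; for example, to keep an otherwise unused blob (here a hypothetical "extra_label") out of the training log:

```
layer {
  name: "silence_extra"
  type: "Silence"
  bottom: "extra_label"   # consumed, so it is no longer reported as a network output
}
```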
Loss drives learning by comparing an output to a target and assigning a cost to minimize. The loss itself is computed by the forward pass and the gradient w.r.t. the loss is computed by the backward pass.
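For example, a classification net typically ends with a SoftmaxWithLoss layer comparing the final scores against the labels; a sketch with placeholder blob names:

```
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"            # predicted class scores
  bottom: "label"          # ground-truth labels from the data layer
  top: "loss"
  loss_weight: 1           # weight of this loss in the total objective (1 is the default)
}
```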