Learning Compositional Models
Jeova Farias
Compositional nature of the objects
Natural objects can be represented in terms of object parts and their spatial relations.
These parts are represented recursively in terms of subparts (with spatial relations), and sub-subparts, ...
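As a purely illustrative toy sketch of such a recursive part hierarchy (not taken from any particular model), a part can be stored as a node holding its sub-parts together with their relative poses; the class name, fields, and example objects below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Part:
    """A node of a compositional object model: a part owns its sub-parts and
    their spatial relations (relative position and orientation w.r.t. the
    parent). Leaves stand for primitive features such as oriented edges."""
    name: str
    children: List["Part"] = field(default_factory=list)
    rel_poses: List[Tuple[float, float, float]] = field(default_factory=list)  # (dx, dy, dtheta)

# toy example: a "face" composed of eyes and a mouth, each built from edges
edge = Part("edge")
eye = Part("eye", children=[edge, edge], rel_poses=[(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
face = Part("face", children=[eye, eye, Part("mouth")],
            rel_poses=[(-2.0, -1.0, 0.0), (2.0, -1.0, 0.0), (0.0, 2.0, 0.0)])
```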
Learning Structure
The estimation of such a compositional structure is highly desirable:
Composition allows explicit part-sharing/part-reuse, yielding large gains in computational efficiency.
It makes context transfer, generalization and extrapolation possible.
It also reveals a lot about the object's internal structure.
It would help solve computer vision tasks such as scene sampling and object recognition.
Learning Structure
“How can the graph structure of hierarchical compositional models (HCM) be learned from natural images without detailed human supervision?”
Assumptions:
A1: The structure of the object is known a priori, in terms of the number of parts and their hierarchical relation.
A2: The object can be discriminated from the background solely based on local image information.
A3: The object in the training images is already segmented from the background.
Chicken-and-egg problem:
In order to segment the object from the background, an object model is needed.
In order to learn the graph structure of an HCM, the object must be segmented from the background.
Greedy Structure Learning
The focus in this presentation will be the following paper:
Greedy Structure Learning of Hierarchical Compositional Models. Adam Kortylewski, Clemens Blumer, Aleksander
Wieczorek, Mario Wieser, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, Thomas Vetter. CVPR 2019
The authors propose a framework for learning the graph structure of hierarchical compositional models without relying on the assumptions A1-A3.
Active Basis Model (ABM)
In this model, the image $I$ is decomposed into a set of Gabor basis filters $B_i$ with fixed frequency band, coefficients $c_i$, and a residual image $U$:
$$I = \sum_{i=1}^{N} c_i B_i + U$$
$B_i = (x_i, \theta_i)$ denotes the position and orientation of a basis filter w.r.t. the object.
Defining $B = (B_1, \dots, B_N)$ and $\Lambda = (\lambda_1, \dots, \lambda_N)$, the generative model is:
$$p(I \mid B, \Lambda) = q(I) \prod_{i=1}^{N} \frac{p(c_i \mid \lambda_i)}{q(c_i)}$$
With: $q(I)$ a generic background distribution, $p(c_i \mid \lambda_i) = \frac{1}{Z(\lambda_i)}\, q(c_i)\, \exp\{\lambda_i\, \sigma(|c_i|^2)\}$, and $Z(\lambda_i)$ the normalizing constant, where $\sigma(\cdot)$ is a saturating sigmoid transform of the filter response.
The learning of $\Lambda$ is done via maximum likelihood on the training data, and $B$ is computed via matching pursuit.
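As a rough illustration only (not the authors' implementation), the log-likelihood ratio of the ABM against the background model reduces to a sum of tilted, saturated filter responses. The sketch below assumes precomputed squared Gabor responses and uses the standard saturating transform from the active basis literature; the saturation level and all function names are placeholders:

```python
import numpy as np

def saturate(r, tau=6.0):
    """Saturating sigmoid transform of a squared filter response |<I, B_i>|^2,
    bounding the contribution of very strong edges (tau is a placeholder value)."""
    return tau * (2.0 / (1.0 + np.exp(-2.0 * r / tau)) - 1.0)

def abm_log_score(responses, lambdas, log_Z):
    """Log-likelihood ratio log p(I | B, Lambda) - log q(I) of an active basis
    model: sum_i [ lambda_i * sigma(r_i) - log Z(lambda_i) ]."""
    responses = np.asarray(responses, dtype=float)
    return float(np.sum(lambdas * saturate(responses) - log_Z))

# usage with made-up numbers for three selected filters
print(abm_log_score([4.1, 0.3, 7.9],
                    lambdas=np.array([0.8, 0.2, 1.1]),
                    log_Z=np.array([1.2, 0.1, 1.6])))
```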
Compositional Active Basis Model (CABM)
The hierarchical version of the ABM: higher-layer parts are themselves active bases whose elements are lower-layer parts, placed at relative positions and orientations.
Generative model: the ABM likelihood is applied recursively, each part generating its children relative to its own pose.
Proposed: Multi-Layer CABM
Generalization of the CABM model to an arbitrary number of hierarchical layers L.
Generative model: the recursive ABM likelihood is extended over L layers, with each layer-l part an active basis over layer-(l-1) parts, down to Gabor filters at the lowest layer.
More expressivity means that the full dependency structure of the probabilistic model must be learned, including the number of layers, the number of parts per layer and their hierarchical dependency structure.
Solution: greedy structure learning.
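To make the recursive dependency structure concrete, here is a minimal sketch (my own illustration of the idea, not the paper's code) of how an L-layer model could be represented and scored: leaves stand for Gabor filters, and each higher-layer node tilts the scores of its children, evaluated at poses relative to the parent. Local deformations are omitted for brevity, and all names (CABMNode, compose, gabor_response) are hypothetical:

```python
import numpy as np
from dataclasses import dataclass, field
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)

@dataclass
class CABMNode:
    """A layer-l part: an active basis over layer-(l-1) parts, each placed at a
    relative pose w.r.t. its parent. Leaf nodes stand for Gabor filters."""
    children: List["CABMNode"] = field(default_factory=list)
    rel_poses: List[Pose] = field(default_factory=list)
    lambdas: np.ndarray = None   # exponential-tilting parameters of this basis
    log_Z: np.ndarray = None     # corresponding log-normalizers

def compose(parent: Pose, rel: Pose) -> Pose:
    """Place a child's relative pose in the parent's frame
    (rotation of offsets omitted in this sketch)."""
    return (parent[0] + rel[0], parent[1] + rel[1], parent[2] + rel[2])

def cabm_log_score(node, image, pose, gabor_response):
    """Recursive log-likelihood-ratio score of a multi-layer CABM at `pose`.
    `gabor_response(image, pose)` supplies the leaf-level filter score."""
    if not node.children:                        # leaf: a single Gabor filter
        return gabor_response(image, pose)
    child_scores = np.array([
        cabm_log_score(c, image, compose(pose, rel), gabor_response)
        for c, rel in zip(node.children, node.rel_poses)
    ])
    # ABM-style exponential tilting of the children's scores
    return float(np.sum(node.lambdas * child_scores - node.log_Z))
```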
Greedy structure learning
Two phases:
A bottom-up compositional clustering phase;
A top-down model composition phase.
Bottom-up phase (“EM”): at each layer, select two part models (each learned with matching pursuit) from image patches that are randomly sampled from the training data. Then repeat (a code sketch of this loop follows below):
Detection (E-step): Detect part models in
the training images at different locations
and orientations. Cut out patches at the
detected positions which serve as new
training data for the M-step.
Learning (M-step): Learn a part model from
the training patches with matching pursuit.
Dealing with background:
Update only one of the models at each
iteration. The other only participates in the
E-step.
It will serve as a generic background model.
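A simplified sketch of this bottom-up loop, under my reading of the slide; the callables sample_patches, detect, and mp_learn are hypothetical placeholders for patch sampling, E-step detection, and matching-pursuit learning:

```python
def bottom_up_clustering(images, n_parts, n_iters, sample_patches, detect, mp_learn):
    """Greedy compositional clustering at one layer.

    Part models are grown one at a time; already-learned parts and the generic
    background model are frozen and only compete for data in the E-step."""
    learned = []                                   # frozen part models
    for _ in range(n_parts):
        # initialize a new part model and a generic background model from
        # randomly sampled training patches (both via matching pursuit)
        part = mp_learn(sample_patches(images))
        background = mp_learn(sample_patches(images))
        for _ in range(n_iters):
            # E-step: detect all competing models over locations and
            # orientations; `detect` returns, per model, the patches it won
            patches_per_model = detect(images, [part, background] + learned)
            # M-step: re-learn only the new part model from its patches;
            # the background and previously learned parts stay fixed
            part = mp_learn(patches_per_model[0])
        learned.append(part)
    return learned
```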
Greedy structure learning
Figure: The first t = 22 iterations of the greedy learning scheme. Each row shows the evolution of a part model over time. Each column shows the learning result at one iteration of the learning process. When a new part is initialized (t = 1, 6, 11, ...), also a generic background model is learned from the training image (marked by dashed rectangles). The background model and the learned part models are not adapted in the subsequent iterations (gray background) but serve as competitors for data in the E-step.
Greedy structure learning
Top-down model building (“Composition”)
Dictionaries of part models were learned in the bottom-up phase;
The training images are first aligned by detecting the part model of the highest layer;
The alignment is such that the models are in a canonical orientation and position;
After this alignment step, we proceed in a top-down manner with matching pursuit.
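A corresponding sketch of the top-down phase, again only an illustration of my reading of these steps; align, detect_pose, and compose_layer are hypothetical placeholders:

```python
def top_down_composition(images, layer_dicts, detect_pose, align, compose_layer):
    """Top-down model building.

    `layer_dicts` are the part dictionaries learned bottom-up, ordered from the
    lowest to the highest layer. Images are first brought into a canonical
    position and orientation via the highest-layer part model, then the full
    model is composed layer by layer with matching pursuit."""
    top_parts = layer_dicts[-1]
    # align every training image using the detected pose of the top-layer part
    aligned = [align(img, detect_pose(img, top_parts)) for img in images]
    model = None
    for parts in reversed(layer_dicts):            # proceed top-down
        # select parts from this layer's dictionary (matching pursuit) and
        # attach them to the model composed so far
        model = compose_layer(model, parts, aligned)
    return model
```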
Results
Figure: training samples, the bottom-up learning result, the top-down learning result, and the final CABM result.