Learning Compositional Models
Jeova Farias
Compositional nature of the objects
Natural objects can be represented in terms of object parts and their spatial relations.
These parts are represented recursively in terms of subparts (with spatial relations), and sub-subparts, ...
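As a purely illustrative toy sketch of such a recursive part hierarchy (not taken from any particular model), a part can be stored as a node holding its sub-parts together with their relative poses; the class name, fields, and example objects below are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Part:
    """A node of a compositional object model: a part owns its sub-parts and
    their spatial relations (relative position and orientation w.r.t. the
    parent). Leaves stand for primitive features such as oriented edges."""
    name: str
    children: List["Part"] = field(default_factory=list)
    rel_poses: List[Tuple[float, float, float]] = field(default_factory=list)  # (dx, dy, dtheta)

# toy example: a "face" composed of eyes and a mouth, each built from edges
edge = Part("edge")
eye = Part("eye", children=[edge, edge], rel_poses=[(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
face = Part("face", children=[eye, eye, Part("mouth")],
            rel_poses=[(-2.0, -1.0, 0.0), (2.0, -1.0, 0.0), (0.0, 2.0, 0.0)])
```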
Learning Structure
The estimation of such a compositional structure is highly desirable:
Composition allows explicit part-sharing/part-reuse, yielding large gains in computational efficiency.
It makes context transfer, generalization and extrapolation possible.
It also reveals a lot about the object's internal structure.
It would help solve computer vision tasks such as scene sampling and object recognition.
Learning Structure
“How can the graph structure of hierarchical compositional models (HCM) be learned from natural images without detailed human supervision?”
Assumptions:
A1: The structure of the object is known a priori, in terms of the number of parts and their hierarchical relation.
A2: The object can be discriminated from the background solely based on local image information.
A3: The object in the training images is already segmented from the background.
Chicken-and-egg problem:
In order to segment the object from the background, an object model is needed.
In order to learn the graph structure of an HCM, the object must be segmented from the background.
Greedy Structure Learning
The focus in this presentation will be the following paper:
Greedy Structure Learning of Hierarchical Compositional Models. Adam Kortylewski, Clemens Blumer, Aleksander
Wieczorek, Mario Wieser, Sonali Parbhoo, Andreas Morel-Forster, Volker Roth, Thomas Vetter. CVPR 2019
The authors propose a framework for learning the graph structure of hierarchical compositional models without relying on the assumptions A1-A3.
Active Basis Model (ABM)
In this model, the image $I$ is decomposed into a set of Gabor basis filters $B_i$ with fixed frequency band, coefficients $c_i$, and a residual image $U$:
$$I = \sum_{i=1}^{N} c_i B_i + U$$
$B_i = (x_i, \theta_i)$ denotes the position and orientation of a basis filter w.r.t. the object.
Defining $B = (B_1, \dots, B_N)$ and $\Lambda = (\lambda_1, \dots, \lambda_N)$, the generative model is:
$$p(I \mid B, \Lambda) = q(I) \prod_{i=1}^{N} \frac{p(c_i \mid \lambda_i)}{q(c_i)}$$
With: $q(I)$ a generic background distribution, $p(c_i \mid \lambda_i) = \frac{1}{Z(\lambda_i)}\, q(c_i)\, \exp\{\lambda_i\, \sigma(|c_i|^2)\}$, and $Z(\lambda_i)$ the normalizing constant, where $\sigma(\cdot)$ is a saturating sigmoid transform of the filter response.
The learning of $\Lambda$ is done via maximum likelihood on the training data, and $B$ is computed via matching pursuit.
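As a rough illustration only (not the authors' implementation), the log-likelihood ratio of the ABM against the background model reduces to a sum of tilted, saturated filter responses. The sketch below assumes precomputed squared Gabor responses and uses the standard saturating transform from the active basis literature; the saturation level and all function names are placeholders:

```python
import numpy as np

def saturate(r, tau=6.0):
    """Saturating sigmoid transform of a squared filter response |<I, B_i>|^2,
    bounding the contribution of very strong edges (tau is a placeholder value)."""
    return tau * (2.0 / (1.0 + np.exp(-2.0 * r / tau)) - 1.0)

def abm_log_score(responses, lambdas, log_Z):
    """Log-likelihood ratio log p(I | B, Lambda) - log q(I) of an active basis
    model: sum_i [ lambda_i * sigma(r_i) - log Z(lambda_i) ]."""
    responses = np.asarray(responses, dtype=float)
    return float(np.sum(lambdas * saturate(responses) - log_Z))

# usage with made-up numbers for three selected filters
print(abm_log_score([4.1, 0.3, 7.9],
                    lambdas=np.array([0.8, 0.2, 1.1]),
                    log_Z=np.array([1.2, 0.1, 1.6])))
```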
Compositional Active Basis Model (CABM)
The hierarchical version of the ABM: higher-layer parts are themselves active bases whose elements are lower-layer parts, placed at relative positions and orientations.
Generative model: the ABM likelihood is applied recursively, each part generating its children relative to its own pose.
Proposed: Multi-Layer CABM
Generalization of the CABM model to an arbitrary number of hierarchical layers L.
Generative model: the recursive ABM likelihood is extended over L layers, with each layer-l part an active basis over layer-(l-1) parts, down to Gabor filters at the lowest layer.
More expressivity means that the full dependency structure of the probabilistic model must be learned, including the number of layers, the number of parts per layer and their hierarchical dependency structure.
Solution: greedy structure learning.
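To make the recursive dependency structure concrete, here is a minimal sketch (my own illustration of the idea, not the paper's code) of how an L-layer model could be represented and scored: leaves stand for Gabor filters, and each higher-layer node tilts the scores of its children, evaluated at poses relative to the parent. Local deformations are omitted for brevity, and all names (CABMNode, compose, gabor_response) are hypothetical:

```python
import numpy as np
from dataclasses import dataclass, field
from typing import List, Tuple

Pose = Tuple[float, float, float]  # (x, y, theta)

@dataclass
class CABMNode:
    """A layer-l part: an active basis over layer-(l-1) parts, each placed at a
    relative pose w.r.t. its parent. Leaf nodes stand for Gabor filters."""
    children: List["CABMNode"] = field(default_factory=list)
    rel_poses: List[Pose] = field(default_factory=list)
    lambdas: np.ndarray = None   # exponential-tilting parameters of this basis
    log_Z: np.ndarray = None     # corresponding log-normalizers

def compose(parent: Pose, rel: Pose) -> Pose:
    """Place a child's relative pose in the parent's frame
    (rotation of offsets omitted in this sketch)."""
    return (parent[0] + rel[0], parent[1] + rel[1], parent[2] + rel[2])

def cabm_log_score(node, image, pose, gabor_response):
    """Recursive log-likelihood-ratio score of a multi-layer CABM at `pose`.
    `gabor_response(image, pose)` supplies the leaf-level filter score."""
    if not node.children:                        # leaf: a single Gabor filter
        return gabor_response(image, pose)
    child_scores = np.array([
        cabm_log_score(c, image, compose(pose, rel), gabor_response)
        for c, rel in zip(node.children, node.rel_poses)
    ])
    # ABM-style exponential tilting of the children's scores
    return float(np.sum(node.lambdas * child_scores - node.log_Z))
```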
Greedy structure learning
Two phases:
A bottom-up compositional clustering phase;
A top-down model composition phase.
Bottom-up phase (“EM”): at each layer, select two part models (each learned with matching pursuit) from image patches that are randomly sampled from the training data. Then repeat (a code sketch of this loop follows below):
Detection (E-step): Detect part models in
the training images at different locations
and orientations. Cut out patches at the
detected positions which serve as new
training data for the M-step.
Learning (M-step): Learn a part model from
the training patches with matching pursuit.
Dealing with background:
Update only one of the models at each
iteration. The other only participates in the
E-step.
It will serve as a generic background model.
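A simplified sketch of this bottom-up loop, under my reading of the slide; the callables sample_patches, detect, and mp_learn are hypothetical placeholders for patch sampling, E-step detection, and matching-pursuit learning:

```python
def bottom_up_clustering(images, n_parts, n_iters, sample_patches, detect, mp_learn):
    """Greedy compositional clustering at one layer.

    Part models are grown one at a time; already-learned parts and the generic
    background model are frozen and only compete for data in the E-step."""
    learned = []                                   # frozen part models
    for _ in range(n_parts):
        # initialize a new part model and a generic background model from
        # randomly sampled training patches (both via matching pursuit)
        part = mp_learn(sample_patches(images))
        background = mp_learn(sample_patches(images))
        for _ in range(n_iters):
            # E-step: detect all competing models over locations and
            # orientations; `detect` returns, per model, the patches it won
            patches_per_model = detect(images, [part, background] + learned)
            # M-step: re-learn only the new part model from its patches;
            # the background and previously learned parts stay fixed
            part = mp_learn(patches_per_model[0])
        learned.append(part)
    return learned
```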
Greedy structure learning
Figure: The first t = 22 iterations of the greedy learning scheme. Each row shows the evolution of a part model over time. Each column shows the learning result at one iteration of the learning process. When a new part is initialized (t = 1, 6, 11, ...), also a generic background model is learned from the training image (marked by dashed rectangles). The background model and the learned part models are not adapted in the subsequent iterations (gray background) but serve as competitors for data in the E-step.
Greedy structure learning
Top-down model building (“Composition”)
Dictionaries of part models were learned in the bottom-up phase;
The training images are first aligned by detecting the part model of the highest layer;
The alignment is such that the models are in a canonical orientation and position;
After this alignment step, we proceed in a top-down manner with matching pursuit.
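A corresponding sketch of the top-down phase, again only an illustration of my reading of these steps; align, detect_pose, and compose_layer are hypothetical placeholders:

```python
def top_down_composition(images, layer_dicts, detect_pose, align, compose_layer):
    """Top-down model building.

    `layer_dicts` are the part dictionaries learned bottom-up, ordered from the
    lowest to the highest layer. Images are first brought into a canonical
    position and orientation via the highest-layer part model, then the full
    model is composed layer by layer with matching pursuit."""
    top_parts = layer_dicts[-1]
    # align every training image using the detected pose of the top-layer part
    aligned = [align(img, detect_pose(img, top_parts)) for img in images]
    model = None
    for parts in reversed(layer_dicts):            # proceed top-down
        # select parts from this layer's dictionary (matching pursuit) and
        # attach them to the model composed so far
        model = compose_layer(model, parts, aligned)
    return model
```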
Results
Figure: training samples, the bottom-up learning result, the top-down learning result, and the final CABM result.