Roland BADEAU & Mathieu FONTAINE
{mathieu.fontaine, roland.badeau}@telecom-paris.fr
June 10th and 17th, 2022
Outline
I - Introduction
II - Mathematical reminders
III - Linear instantaneous mixtures
IV - Independent component analysis
V - Second order methods
VI - Time-frequency methods
VII- Convolutive mixtures
VIII- Under-determined mixtures
IX - Conclusion
I - Introduction
Source separation
Art of estimating "source" signals, assumed independent, from the observation
of one or several "mixtures" of these sources
Application examples
Denoising (cocktail party, suppression of vuvuzelas, karaoke)
Separation of the instruments in polyphonic music
Typology of the mixture models (1/2)
Definition of the problem
Observations: $M$ mixtures $x_m(t)$ concatenated in a vector $\bold{x}(t)$
Unknowns: $K$ sources $s_k(t)$ concatenated in a vector $\bold{s}(t)$
General mixture model: function $\mathcal{A}$ which transforms $\bold{s}(t)$ into $\bold{x}(t)$
Property typology
Stationarity: $\mathcal{A}$ is translation invariant
Linearity: $\mathcal{A}$ is a linear map
Memory
Convolutive mixtures
Instantaneous mixtures: $\bold{x}(t)=\bold{A}\bold{s}(t)$
$\quad\rightarrow \mathcal{A}$ is defined by the "mixing matrix" $\bold{A}$ (of dimension $M\times K$)
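A minimal numerical sketch of the instantaneous model $\bold{x}(t)=\bold{A}\bold{s}(t)$, with two synthetic sources and a random mixing matrix (all concrete values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                      # number of samples
K, M = 2, 2                   # K sources, M mixtures

# Two independent synthetic sources s(t), stacked as a K x T array
s = np.vstack([np.sign(rng.standard_normal(T)),   # binary source
               rng.uniform(-1, 1, T)])            # uniform source

A = rng.standard_normal((M, K))  # mixing matrix A (M x K)
x = A @ s                        # instantaneous mixture x(t) = A s(t)

print(x.shape)  # (2, 1000): M mixtures of T samples each
```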
Typology of the mixture models (2/2)
Invertibility
Determined mixtures: $M=K$
Overdetermined mixtures: $M>K$
Under-determined mixtures: $M < K$
Instantaneous linear mixtures
Anechoic linear mixtures
Convolutive linear mixtures
II - Mathematical reminders
Real random vectors
Notations
$\bold{x}$ is a random vector of dimension $M$.
$\phi[\bold{x}]$ denotes a function of $p(\bold{x})$
Mean: $\mu_{x}=\mathbb{E}[\bold{x}]$
BSS problem: estimate $\bold{A}$ and sources $\bold{s}(t)$ given $\bold{x}(t)$
Blind source separation (BSS) (2/2)
Non-mixing matrix
A matrix $\bold{C}$ of dimension $K\times K$ is non-mixing iff. it has a unique non-zero entry in each row and each column
If $\tilde{\bold{s}}(t) = \bold{C}\bold{s}(t)$ and $\tilde{\bold{A}}=\bold{A}\bold{C}^{-1}$, then $\bold{x}(t)= \tilde{\bold{A}}\tilde{\bold{s}}(t)$
is another admissible decomposition of the observations
$\quad\rightarrow$ Sources can be recovered up to a permutation and a multiplicative factor
We can thus assume $\bold{B} = \bold{U}^\top\bold{W}$ where $\bold{U}$ is a rotation matrix
Higher order statistics
One can estimate $\Sigma_{xx}$ from the observations and get $\bold{W}$
The whiteness property (second order cumulants) determines $\bold{W}$ and leaves $\bold{U}$ unknown
If sources are Gaussian, the $z_k$ are independent and $\bold{U}$ cannot be determined
In order to determine the rotation $\bold{U}$, we need to exploit the non-Gaussianity of the
sources and characterize independence using cumulants of order greater than 2.
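The whitening step described above can be sketched numerically: from the sample covariance $\Sigma_{xx}$ we build $\bold{W}=\Sigma_{xx}^{-1/2}$, after which any rotation of the whitened signals is equally white. This is a minimal illustration (Laplacian sources and the mixing matrix are assumed for the demo):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 5000))          # non-Gaussian independent sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])   # mixing matrix
x = A @ s

# Whitening: W = Sigma_xx^{-1/2}, so that z = W x has identity covariance
Sigma_xx = np.cov(x)
eigval, eigvec = np.linalg.eigh(Sigma_xx)
W = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
z = W @ x

# cov(z) is the identity, and U z is equally white for any rotation U:
# second-order statistics alone cannot determine U for IID sources
print(np.round(np.cov(z), 2))
```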
Segmentation of mixture signals and estimation of covariance
matrices $\Sigma_{xx}(t)$ on windows centered at different times $t$
Joint diagonalization of matrices $\Sigma_{xx}(t)$ in a common basis $\bold{B}$
Estimation of source signals via $\bold{y}(t) = \bold{Bx}(t)$
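For two time windows, the joint diagonalization step above reduces to a generalized eigenvalue problem. A sketch, assuming sources whose variances differ between the two windows (all signal parameters are illustrative):

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
T = 20000
# Independent Gaussian sources with different temporal dynamics:
# each source is loud in a different half of the signal
s = rng.standard_normal((2, T))
s[0, :T // 2] *= 3.0
s[1, T // 2:] *= 3.0
A = np.array([[1.0, 0.6], [0.4, 1.0]])
x = A @ s

# Covariance matrices Sigma_xx(t) on two time windows
S1 = np.cov(x[:, :T // 2])
S2 = np.cov(x[:, T // 2:])

# Joint diagonalization of two matrices = generalized eigenproblem S1 v = w S2 v
_, V = eigh(S1, S2)
B = V.T                      # separating matrix: diagonalizes both covariances
y = B @ x                    # estimated sources, decorrelated on both windows

print(np.round(np.corrcoef(y), 2))
```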
Conclusion of the first part
The use of higher order cumulants is only necessary for the
non-Gaussian IID source model
Second order statistics are sufficient for sources that are:
$\quad\rightarrow$ stationary but not IID (→ spectral dynamics)
$\quad\rightarrow$ non-stationary (→ temporal dynamics)
Remember that classical tools (based on second order
statistics) are appropriate for blind separation of independent
(and possibly Gaussian) sources, on condition that the
spectral / temporal source dynamics is taken into account
VI - Time-frequency methods
Time-frequency (TF) representations
Motivations
Spectral and temporal dynamics are highlighted by a TF representation of signals
TF domain: adequate to process convolutive and/or under-determined mixtures
Use of a filter bank (STFT and MDCT)
Decomposition into $F$ sub-bands and decimation by a factor $T \leq F$
Analysis filters $h_f$ and synthesis filters $g_f$
TF representation of mixture: $x_m(f,n) = (h_f\ast x_m)(nT)$
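The analysis filter bank is in practice an STFT; a minimal sketch with `scipy.signal.stft` (sampling rate and window length are assumed for the example):

```python
import numpy as np
from scipy.signal import stft

fs = 16000                       # sampling rate (Hz), illustrative
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)  # one-second 440 Hz tone as a mixture channel

# STFT = analysis filter bank: F sub-bands, hop size T = nperseg - noverlap
f, n, X = stft(x, fs=fs, nperseg=512, noverlap=256)
print(X.shape)   # (F, N): F = 257 frequency bins, N time frames
```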
Instantaneous mixture model: unsuitable for real acoustic mixtures
Mixture of source image model
Let $\bold{x}_k(f,n) \in \mathbb{R}^M$ be the source image of $s_k(f,n)$
$\quad\rightarrow$ received multichannel signal if only source $s_k(f,n)$ was active
Let $s_k(t)$ be $K$ IID sources, among which at most one is Gaussian,
and $\bold{y}(t) = \bold{C} \ast \bold{s}(t)$ with $\bold{C}$ invertible ((over-)determined case).
If signals $y_k(t)$ are independent, then $\bold{C}$ is non-mixing.
Time-Frequency approach
Mixture model and narrow-band approximation
$x_m(t) = \sum_{k=1}^{K}(a_{mk} \ast s_k)(t)$,
the filter bank corresponds to an STFT
the impulse response of $a_{mk}$ is short compared with the window length
$\forall m,k,f, a_{mk}(\nu)$ varies slowly compared with $h_f(\nu)$
Approximation of the convolutive mixture model
$x_m(f,n)=\sum_{k=1}^K a_{mk}(f)s_k(f,n)$ i.e. $\bold{x}(f,n)=\bold{A}(f)\bold{s}(f,n)$
$\quad\rightarrow$ $F$ instantaneous mixture models in every sub-band
$\quad\rightarrow$ we can use any ICA method in every sub-band
Independent component analysis
Let $\bold{y}(f,n) = \bold{B}(f)\bold{x}(f,n)$ where $\bold{B}(f)\in\mathbb{C}^{K\times M}$
Linear separation is feasible if $\bold{A}(f)$ has rank $K$.
$\quad\rightarrow\bold{y}(f,n)=\bold{s}(f,n)$ with
$\bold{B}(f)=
\begin{cases}
\bold{A}(f)^{-1} & \mathrm{if~} M=K \\
\bold{A}(f)^{\dagger} & \mathrm{if~} M>K \\
\emptyset & \mathrm{if~} M< K
\end{cases}
$
In practice $\bold{A}(f)$ is unknown:
$\quad\rightarrow$ We look for $\bold{B}(f)$ that makes $y_k(f,n)$ independent (ICA)
$\quad\rightarrow$ We get $\bold{y}(f,n) = \bold{C}(f)\bold{s}(f,n)$ where $\bold{C}(f) = \bold{B}(f)\bold{A}(f)$
$\quad\rightarrow \bold{C}(f)$ is non-mixing
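The $M=K$ and $M>K$ cases of the separation matrix above are both covered by the pseudo-inverse. A sketch in one sub-band, with an assumed known $\bold{A}(f)$ (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
K, M, N = 2, 3, 100              # K sources, M > K sensors, N time frames

# One sub-band f: known mixing matrix A(f) and source frames s(f, n)
A_f = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
s_f = rng.standard_normal((K, N)) + 1j * rng.standard_normal((K, N))
x_f = A_f @ s_f

# For M >= K with rank K, the pseudo-inverse recovers the sources exactly
B_f = np.linalg.pinv(A_f)        # equals A_f^{-1} when M == K
y_f = B_f @ x_f
print(np.allclose(y_f, s_f))     # True
```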
Indeterminacies
Indeterminacies (permutations and multiplicative factors) in matrices $\bold{C}(f)$
$\forall k$, identify the indexes $k_f$ such that $\forall f, y_{k_f}(f,n)=c_{k_f,k}s_k(f,n)$
identify the multiplicative factors $c_{k_f,k}$
Infinitely many solutions $\implies$ need to constrain the model
Assumptions on the mixture and sources
continuity of the frequency responses $a_{mk}(f)$ with respect to $f$
$\quad\rightarrow$ beamforming model or anechoic model
similarity of the temporal dynamics of $\sigma_k^2(f,n)$ (or NMF model)
Convolutive mixture models
Beamforming model
Assumptions: plane waves, far field, no reverberation, linear antenna
Model: $a_{mk}(f)=e^{-2i\pi f\tau_{mk}}$ where $\tau_{mk}=\frac{d_m}{c}\sin(\theta_k)$
Parameters: positions $d_m$ of the sensors and angles $\theta_k$ of the sources
Anechoic model
Assumptions: point sources, no reverberation
Model: $a_{mk}(f)=\alpha_{mk}e^{-2i\pi f\tau_{mk}}$ where $\tau_{mk}=\frac{r_{mk}}{c}$ and $\alpha_{mk} = \frac{1}{\sqrt{4\pi}r_{mk}}$
Parameters: distances $r_{mk}$ between the sensors and sources
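The anechoic model above can be evaluated directly: each entry of $\bold{A}(f)$ is a delayed, attenuated complex exponential. A sketch with assumed sensor-to-source distances (all values illustrative):

```python
import numpy as np

c = 343.0                        # speed of sound (m/s)
f = 1000.0                       # frequency (Hz)

# Distances r_mk (meters) between M = 2 sensors and K = 2 sources
r = np.array([[1.0, 2.0],
              [1.2, 1.7]])

tau = r / c                                   # propagation delays tau_mk
alpha = 1.0 / (np.sqrt(4 * np.pi) * r)        # attenuations alpha_mk
A_f = alpha * np.exp(-2j * np.pi * f * tau)   # mixing matrix A(f), entries a_mk(f)
print(A_f.shape)  # (2, 2)
```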
VIII - Under-determined mixtures
Under-determined convolutive mixtures
Usual case in audio: monophonic $(M = 1)$ or stereophonic
$(M = 2)$ mixtures, with a number of sources $K > M$
Convolutive mixture model and assumption
$\bold{x}(f,n)=\bold{A}(f)\bold{s}(f,n)$ with $M< K$
We assume that $\bold{A}(f)$ and $\Sigma_{ss}(f,n) = \mathrm{diag}(\sigma^2_k(f,n))$ are known
Even in this case, there is no matrix $\bold{B}(f)$ such that $\bold{B}(f)\bold{A}(f)=\bold{I}_K$
Separation via non-stationary filtering
Let $\bold{y}(f,n) = \bold{B}(f,n)\bold{x}(f,n)$ where $\bold{B}(f,n) \in \mathbb{C}^{K\times M}$
Mean-Squared error estimation
We look for $\bold{B}(f,n)$ which minimizes $\mathbb{E}[\mid\mid\bold{y}(f,n)-\bold{s}(f,n)\mid\mid^2_{2} ]$
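The minimizer of this mean-squared error is the classical multichannel Wiener filter, $\bold{B}(f,n) = \Sigma_{ss}(f,n)\bold{A}(f)^{\mathrm{H}}\big(\bold{A}(f)\Sigma_{ss}(f,n)\bold{A}(f)^{\mathrm{H}}\big)^{-1}$. A numerical sketch for one TF bin, with assumed mixing matrix and source variances:

```python
import numpy as np

rng = np.random.default_rng(4)
M, K = 2, 3                      # under-determined: fewer mixtures than sources

A = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))   # A(f)
sigma2 = np.array([1.0, 4.0, 0.25])      # source variances sigma_k^2(f, n)
Sigma_ss = np.diag(sigma2).astype(complex)

# MMSE (multichannel Wiener) filter: B = Sigma_ss A^H (A Sigma_ss A^H)^{-1}
Sigma_xx = A @ Sigma_ss @ A.conj().T
B = Sigma_ss @ A.conj().T @ np.linalg.inv(Sigma_xx)
print(B.shape)  # (3, 2): K x M, one output per source
```

The orthogonality principle gives a quick check of the solution: $\bold{B}\Sigma_{xx} = \Sigma_{ss}\bold{A}^{\mathrm{H}}$.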
Estimation of $\theta_{k}$ and of the active source $k_{(f,n)}$
$\quad\rightarrow$ computation of the histogram of the angles of vectors $\bold{x}(f,n)$
$\quad\rightarrow$ peak detection in order to estimate $\theta_k$
$\quad\rightarrow$ determination of the active source by proximity with $\theta_{k}$
Source separation: for all $k$,
$\quad\rightarrow$ estimation of source images via binary masking:
$\quad \bold{y}_k(f,n)=
\begin{cases}
0 & \mathrm{if~} k \neq k_{(f,n)}, \\
\bold{x}(f,n) & \mathrm{otherwise}
\end{cases}
$
MMSE estimation of the sources: $y_k(f,n) = \frac{\bold{a}_k(f)^{\mathrm{H}}}{\mid\mid\bold{a}_k(f)\mid\mid^2_2}\bold{y}_{k}(f,n)$
TF synthesis of the sources:
$y_k(t) = \sum_{f=1}^F \sum_{n\in \mathbb{Z}} g_f(t-nT)y_k(f,n)$
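The binary masking step above can be sketched on a toy TF mixture, assuming the active-source index $k_{(f,n)}$ has already been estimated (all shapes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
F, N, M, K = 64, 50, 2, 2

# Toy TF mixture and a per-bin active-source index k_(f, n) in {0, 1}
x_tf = rng.standard_normal((M, F, N)) + 1j * rng.standard_normal((M, F, N))
active = rng.integers(0, K, size=(F, N))      # assumed already estimated

# Binary masking: each TF bin is assigned entirely to its active source
y = np.zeros((K, M, F, N), dtype=complex)
for k in range(K):
    mask = (active == k)                      # boolean mask over TF bins
    y[k] = x_tf * mask                        # zero where another source is active

# The source images sum back to the mixture (the masks form a partition)
print(np.allclose(y.sum(axis=0), x_tf))  # True
```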
IX - Conclusion
Conclusion
Summary
Source separation requires to make assumptions about the
mixture and sources
For an (over-)determined instantaneous linear mixture, the
assumption of independent sources is sufficient
In all other cases, we need to model the mixture and/or the
sources
Perspectives
Non-stationary mixtures (adaptive algorithms)
Informed source separation (e.g. from music score)
Deep learning techniques
Objective assessment of audio source separation
Bibliography
Audio source separation and Blind source separation
Emmanuel Vincent, Tuomas Virtanen, and Sharon Gannot. Audio Source Separation and Speech
Enhancement. Wiley Publishing, 1st edition, 2018.
Jean-François Cardoso. Blind signal separation: statistical principles. Proceedings of the IEEE,
86(10):2009–2025, 1998.
Independent Component Analysis
Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287 –
314, April 1994. Special issue on Higher-Order Statistics.
Pierre Comon and Christian Jutten. Handbook of Blind Source Separation: Independent Component
Analysis and Applications. Academic Press, Inc. (Elsevier), USA, 1st edition, 2010.
Informed Source separation
Sebastian Ewert and Meinard Müller. Multimodal Music Processing, volume 3, chapter Score-
Informed Source Separation for Music Signals, pages 73–94. January 2012.
Simon Leglaive, Roland Badeau, and Gaël Richard. Multichannel audio source separation with
probabilistic reverberation priors. IEEE/ACM Transactions on Audio, Speech, and Language
Processing, 24(12):2453–2465, 2016.
Multichannel Nonnegative Matrix Factorization for audio source separation
Alexey Ozerov and Cédric Févotte. Multichannel nonnegative matrix factorization in convolutive
mixtures for audio source separation. IEEE Transactions on Audio, Speech, and Language Process-
ing, 18(3):550–563, 2010.
Multichannel Deep learning audio source separation
Aditya Arie Nugraha, Antoine Liutkus, Emmanuel Vincent. Multichannel audio source separation
with deep neural networks. [Research Report] RR-8740, INRIA. 2015.