Audio Source Separation

Traitement du Signal et Intelligence artificielle - TSIA206

matfontaine.github.io/MASS_Course

Roland BADEAU & Mathieu FONTAINE
{mathieu.fontaine, roland.badeau}@telecom-paris.fr

June 10th, 17th 2022

Outline

I - Introduction

II - Mathematical reminders

III - Linear instantaneous mixtures

IV - Independent component analysis

V - Second order methods

VI - Time-frequency methods

VII- Convolutive mixtures

VIII- Under-determined mixtures

IX - Conclusion

I - Introduction

Source separation

  • Art of estimating "source" signals, assumed independent, from the observation of one or several "mixtures" of these sources

Applications examples

  • Denoising (cocktail party, suppression of vuvuzela, karaoke)
  • Separation of the instruments in polyphonic music

Typology of the mixture models (1/2)

Definition of the problem

  • Observations: $M$ mixtures $x_m(t)$ concatenated in a vector $\bold{x}(t)$
  • Unknowns: $K$ sources $s_k(t)$ concatenated in a vector $\bold{s}(t)$
  • General mixture model: function $\mathcal{A}$ which transforms $\bold{s}(t)$ into $\bold{x}(t)$

Property typology

  • Stationarity: $\mathcal{A}$ is translation invariant
  • Linearity: $\mathcal{A}$ is a linear map

Memory

  • Convolutive mixtures
  • Instantaneous mixtures: $\bold{x}(t)=\bold{A}\bold{s}(t)$
    $\quad\rightarrow \mathcal{A}$ is defined by the "mixing matrix" $\bold{A}$ (of dimension $M\times K$)

Typology of the mixture models (2/2)

Invertibility

  • Determined mixtures: $M=K$
  • Overdetermined mixtures: $M>K$
  • Under-determined mixtures: $M < K$

Instantaneous linear mixtures

Anechoic linear mixtures

Convolutive linear mixtures

II - Mathematical reminders

Real random vectors

Notations

$\bold{x}$ is a random vector of dimension $M$.
  • $\phi[\bold{x}]$ denotes a functional of the probability density $p(\bold{x})$
  • Mean: $\mu_{x}=\mathbb{E}[\bold{x}]$
  • Covariance matrix: $\Sigma_{xx}=\mathbb{E}[(\bold{x}-\mu_x)(\bold{x}-\mu_x)^\top]$
  • Characteristic function: $\phi_{x}(\bold{f})=\mathbb{E}[e^{-2i\pi\bold{f}^\top\bold{x}}]=\int_{\mathbb{R}^M}p(\bold{x})e^{-2i\pi\bold{f}^\top\bold{x}}d\bold{x}$
  • Probability density function: $p(\bold{x})=\int_{\mathbb{R}^M}\phi_{x}(\bold{f})e^{+2i\pi\bold{f}^\top\bold{x}}d\bold{f}$

Cumulants

  • Definition: $ \ln\left(\phi_x(\bold{f})\right) = \sum\limits_{n=1}^{+\infty}\frac{(-2i\pi)^n}{n!}\sum\limits_{k_1=1}^M\dots\sum\limits_{k_n=1}^M \kappa^{n}_{k_{1}\dots k_{n}}[\bold{x}]f_{k_1}\dots f_{k_n}$
  • $\kappa^n[\bold{x}]$ is an $n^{\mathrm{th}}$ order tensor of coefficients $\kappa^{n}_{k_{1}\dots k_{n}}[\bold{x}]$
  • $\kappa^1[\bold{x}]$ is the mean vector, $\kappa^2[\bold{x}]$ is the covariance matrix
  • $p(\bold{x})$ symmetric $\implies$ $\kappa^{n}[\bold{x}]=0, \forall n~ \mathrm{odd}$
  • The ratio $\kappa^4_{k,k,k,k}[\bold{x}]/(\kappa^2_{k,k}[\bold{x}])^2$ is called the "kurtosis"
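A minimal numpy sketch (not part of the slides) of how this kurtosis can be estimated empirically from samples; the Laplace distribution is an arbitrary non-Gaussian example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(size=(4, 100000))        # M = 4 coordinates, heavy-tailed samples

xc = x - x.mean(axis=1, keepdims=True)   # center each coordinate
var = (xc ** 2).mean(axis=1)             # kappa^2_{k,k}: variance
m4 = (xc ** 4).mean(axis=1)              # 4th central moment
kurt = m4 / var ** 2 - 3.0               # kappa^4_{k,k,k,k} / (kappa^2_{k,k})^2
print(kurt)                              # close to 3 for Laplace, 0 for a Gaussian
```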

Real Gaussian random vector

  • The Gaussian distribution is the only one whose cumulants of order $n>2$ are all zero
  • Characteristic function
    $$ \phi_x(\bold{f})=\exp(-2i\pi\bold{f}^\top\mu_x-2\pi^2\bold{f}^\top\Sigma_{xx}\bold{f}) $$
  • Probability density function (defined if $\Sigma_{xx}$ is invertible)
    $$ p(\bold{x})=\frac{1}{(2\pi)^{\frac{M}{2}}\det(\Sigma_{xx})^{\frac{1}{2}}} \exp(-\frac{1}{2}(\bold{x}-\mu_x)^\top\Sigma_{xx}^{-1}(\bold{x}-\mu_x)) $$

Wide-sense stationary vector processes (WSS-VP)

Definition & Properties

  • $\bold{x}:=\{\bold{x}(t)\}_{t\in\mathbb{Z}}$ WSS-VP $\Leftrightarrow$ $$ \begin{cases} \forall t,\quad~~ \mathbb{E}[\bold{x}(t)] = \mu_{x} \\ \forall t,\tau,~~ \mathbb{E}[(\bold{x}(t+\tau) - \mu_x)(\bold{x}(t) - \mu_x)^\top] = \bold{R}_{xx}(\tau)~~~ \texttt{(autocovariance function)} \end{cases}$$
  • $\bold{x}, \bold{y}$ centered WSS-VP, $\bold{R}_{xy}(\tau) = \mathbb{E}[\bold{x}(t+\tau)\bold{y}(t)^\top] \quad\texttt{(intercovariance function)}$
  • $\bold{R}_{xx}(0)=\Sigma_{xx}$ is Hermitian and positive semi-definite.

Power spectral density (PSD) of a WSS-VP $\bold{x}$

  • $\bold{S}_{xx}(\nu)= \sum\limits_{\tau\in\mathbb{Z}}\bold{R}_{xx}(\tau)e^{-2i\pi\nu\tau} \quad \texttt{(PSD)}$
  • $\forall \nu, \bold{S}_{xx}(\nu)$ is Hermitian and positive semi-definite
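A short numpy sketch (not from the slides) of the corresponding empirical estimators: a biased estimate of $\bold{R}_{xx}(\tau)$ for a centered multichannel signal, and $\bold{S}_{xx}(\nu)$ obtained by summing the truncated series at one frequency; the white-noise test signal is an arbitrary choice.

```python
import numpy as np

def autocovariance(x, max_lag):
    """Biased estimate of R_xx(tau) for tau = -max_lag..max_lag; x is (M, T), centered."""
    M, T = x.shape
    R = np.zeros((2 * max_lag + 1, M, M))
    for i, tau in enumerate(range(-max_lag, max_lag + 1)):
        if tau >= 0:
            R[i] = x[:, tau:] @ x[:, :T - tau].T / T   # E[x(t+tau) x(t)^T]
        else:
            R[i] = x[:, :T + tau] @ x[:, -tau:].T / T
    return R

def psd(R, nu):
    """S_xx(nu) = sum_tau R_xx(tau) exp(-2i pi nu tau), over the available lags."""
    max_lag = (R.shape[0] - 1) // 2
    taus = np.arange(-max_lag, max_lag + 1)
    return sum(R[i] * np.exp(-2j * np.pi * nu * t) for i, t in enumerate(taus))

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 50000))      # spatially and temporally white noise
R = autocovariance(x, max_lag=20)
print(np.round(psd(R, 0.1).real, 2))     # approximately the identity matrix
```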

Information Theory

Shannon entropy

  • $\mathbb{H}[\bold{x}] = -\mathbb{E}[\ln(p(\bold{x}))]~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\texttt{(Shannon entropy)}$
  • $\mathbb{H}[\bold{x}]$ is not necessarily non-negative for a continuous random vector

Kullback-Leibler (KL) divergence

  • $D_{KL}(p\mid\mid q) = \int p(\bold{x})\ln(\frac{p(\bold{x})}{q(\bold{x})})d\bold{x}~~~~~~~~~~~~~~\texttt{(KL-Divergence)}$
  • $D_{KL}(p\mid\mid q) \geq 0, ~ D_{KL}(p\mid\mid q) = 0 ~ \mathrm{iff.} ~ p=q$

Mutual Information

  • $\mathbb{I}[\bold{x}]=D_{KL}(p(\bold{x})\mid\mid p(x_1)\dots p(x_M))~~~~~~\texttt{(Mutual Information)}$
  • $\mathbb{I}[\bold{x}]=0 ~\mathrm{iff.}~ x_1,\dots,x_M$ are mutually independent
  • $\mathbb{I}[\bold{x}]=\sum\limits_{m=1}^{M}\mathbb{H}[x_m]-\mathbb{H}[\bold{x}]$
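In the Gaussian case these quantities have closed forms; a small sketch (the covariance values are arbitrary) combines the last identity above with $\mathbb{H}[\bold{x}]=\frac{1}{2}\ln\left((2\pi e)^M\det\Sigma_{xx}\right)$.

```python
import numpy as np

def gaussian_mutual_information(Sigma):
    """I[x] = sum_m H[x_m] - H[x] = 0.5*(sum_m ln(Sigma_mm) - ln det(Sigma)) for Gaussian x."""
    Sigma = np.asarray(Sigma, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(Sigma))) - np.linalg.slogdet(Sigma)[1])

print(gaussian_mutual_information([[1.0, 0.8], [0.8, 1.0]]))  # > 0: correlated components
print(gaussian_mutual_information(np.eye(2)))                 # 0: independent components
```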

III - Linear instantaneous mixtures

Blind source separation (BSS) (1/2)

Observation model

  • Linear instantaneous mixture model:
    $$\forall t, \bold{x}(t)=\bold{A}\bold{s}(t)$$
    $\quad\rightarrow\bold{A}\in\mathbb{R}^{M\times K}$: "mixing matrix"
  • Sources are assumed mutually independent and IID over time:
    $$p(\{s_k(t)\}_{k,t})=\prod\limits_{k=1}^{K}\prod\limits_{t=1}^{T}p_k(s_k(t))$$
BSS problem: estimate $\bold{A}$ and sources $\bold{s}(t)$ given $\bold{x}(t)$

Blind source separation (BSS) (2/2)

Non-mixing matrix

  • A matrix $\bold{C}$ of dimension $K\times K$ is non-mixing iff. it has exactly one non-zero entry in each row and each column
  • If $\tilde{\bold{s}}(t) = \bold{C}\bold{s}(t)$ and $\tilde{\bold{A}}=\bold{A}\bold{C}^{-1}$, then $\bold{x}(t)= \tilde{\bold{A}}\tilde{\bold{s}}(t)$ is another admissible decomposition of the observations
    $\quad\rightarrow$ Sources can be recovered up to a permutation and a multiplicative factor

Linear separation of sources

Model

  • Let
    $$\bold{y}(t)=\bold{B}\bold{x}(t)$$
    $\quad\rightarrow\bold{B}\in\mathbb{R}^{K\times M}$: "separation matrix"

Feasibility

  • Linear separation is feasible if $\mathrm{rank}(\bold{A})=K$
  • Under the previous condition, we get:
    $$\bold{B} = \begin{cases} \bold{A}^{-1} & \mathrm{if} ~ M=K \\ \bold{A}^{\dagger} = (\bold{A}^\top\bold{A})^{-1}\bold{A}^\top & \mathrm{if} ~ M>K \quad\quad \texttt{(pseudo-inverse)}\\ \emptyset & \mathrm{if} ~ M< K \end{cases} $$
In practice, the matrix $\bold{A}$ is unknown
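A toy numerical sketch of this slide (the dimensions $M=3$, $K=2$ and the Laplace sources are arbitrary choices): when $\bold{A}$ is known and has rank $K$, the pseudo-inverse recovers the sources exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
T, K, M = 1000, 2, 3
s = rng.laplace(size=(K, T))      # K independent source signals
A = rng.standard_normal((M, K))   # mixing matrix, rank K almost surely
x = A @ s                         # observations x(t) = A s(t)

B = np.linalg.pinv(A)             # B = (A^T A)^{-1} A^T
y = B @ x
print(np.allclose(y, s))          # True: exact recovery since rank(A) = K
```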

IV - Independent component analysis

Independent component analysis (ICA) (1/2)

Problem Statement

  • $\bold{A}$ is unknown and we look for a matrix $\bold{B}$ that makes the $y_k(t)$ independent (ICA)
  • We get the equation:
    $$\bold{y}(t) = \bold{C}\bold{s}(t)$$
    $\quad\rightarrow$ where $\bold{C}=\bold{BA}$
    $\quad\rightarrow$ $\bold{C}$ is non-mixing $\implies$ the problem is solved.

Independent component analysis (ICA) (2/2)

Theorem (identifiability)

  • Let $\{s_k(t)\}_{k=1\dots K}$ be $K$ IID sources, among which at most one is Gaussian and:
    $$ \bold{y}(t) = \bold{C}\bold{s}(t) $$
    with $\bold{C}$ invertible (i.e. $M \geq K$).
Theorem: If the signals $y_k(t)$ are independent, then $\bold{C}$ is non-mixing

Whitening (1/3)

Assumption and canonical problem (CP)

  • Suppose that $\mathbb{E}[\bold{s}(t)]=0$, that $M\geq K$, and that $\bold{A}$ has rank $K$.
  • (CP) We assume without loss of generality that:
    $$ \Sigma_{ss} = \mathbb{E}[\bold{s}(t)\bold{s}(t)^\top] = \bold{I}_{K} \quad\quad \texttt{(spatially white)} $$
  • Then
    $$ \Sigma_{xx} = \bold{A}\Sigma_{ss}\bold{A}^\top =\bold{A}\bold{A}^\top $$
    $\quad\rightarrow$ $\bold{A}$ is a square root matrix of $\bold{\Sigma}_{xx}$

Whitening (2/3)

Decorrelation (Whitening) of $\Sigma_{xx}$

  • $\Sigma_{xx}$ is diagonalizable in an orthonormal basis:
    $$ \Sigma_{xx} = \bold{Q}\Lambda^2\bold{Q}^\top $$
    $\quad\rightarrow \Lambda= \mathrm{diag}(\lambda_1, \dots, \lambda_M)$ with $\lambda_1 \geq \dots \geq \lambda_K >\lambda_{K+1} = \dots =\lambda_{M} = 0$ $\quad\quad (\mathrm{rank}(\Sigma_{xx}) = K)$
  • Let $\bold{S} = \bold{Q}_{(:, 1:K)}\Lambda_{(1:K, 1:K)} \in \mathbb{R}^{M\times K}$ then $\Sigma_{xx} = \bold{S}\bold{S}^\top$
  • Let $\bold{W} = \bold{S}^{\dagger}, \bold{z}(t) = \bold{W}\bold{x}(t)$ then:
    $$ \begin{cases} \mathbb{E}[\bold{z}(t)] = 0, \\ \Sigma_{zz} = \bold{W}\Sigma_{xx}\bold{W}^\top = \bold{I} \end{cases}\quad\quad\texttt{(z is a white signal)} $$
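A minimal numpy sketch of this whitening step on simulated data (the unit-variance Laplace sources and the dimensions are arbitrary assumptions): eigendecomposition of the empirical $\Sigma_{xx}$, construction of $\bold{S}$ and $\bold{W}=\bold{S}^{\dagger}$, and a check that $\bold{z}(t)=\bold{Wx}(t)$ is spatially white.

```python
import numpy as np

rng = np.random.default_rng(0)
K, M, T = 2, 3, 100000
s = rng.laplace(size=(K, T)) / np.sqrt(2)   # unit-variance sources: Sigma_ss = I
A = rng.standard_normal((M, K))
x = A @ s                                   # centered observations

Sigma_xx = x @ x.T / T                      # empirical covariance
lam, Q = np.linalg.eigh(Sigma_xx)           # eigenvalues in ascending order
idx = np.argsort(lam)[::-1][:K]             # keep the K largest eigenvalues
S = Q[:, idx] * np.sqrt(lam[idx])           # S = Q_(:,1:K) Lambda_(1:K,1:K)
W = np.linalg.pinv(S)                       # whitening matrix W = S^dagger
z = W @ x
print(np.round(z @ z.T / T, 2))             # approximately I: z is spatially white
```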

Whitening (3/3)

Conclusion

  • Without loss of generality, $\bold{U} := \bold{WA}$ is a rotation matrix ($\bold{UU}^\top = \bold{I}$)
  • Then:
    $$ \bold{y}(t) = \bold{U}^\top \bold{z}(t) = \bold{U}^\top\bold{W}\bold{x}(t) = \bold{s}(t) $$
  • We can thus assume $\bold{B} = \bold{U}^\top\bold{W}$ where $\bold{U}$ is a rotation matrix

Higher order statistics

  • One can estimate $\Sigma_{xx}$ from the observations and get $\bold{W}$
  • The whiteness property (second order cumulants) determines $\bold{W}$ and leaves $\bold{U}$ unknown
  • If sources are Gaussian, the $z_k$ are independent and $\bold{U}$ cannot be determined
In order to determine the rotation $\bold{U}$, we need to exploit the non-Gaussianity of the sources and characterize independence using cumulants of order greater than 2.

Contrast functions

Definition and optimization problem

  • $\phi$ is a contrast function (CF) iff.
    $$ \begin{cases} \phi[\bold{Cs}(t)] \geq \phi[\bold{s}(t)], \forall \bold{C} \\ \phi[\bold{Cs}(t)] = \phi[\bold{s}(t)] \Leftrightarrow \bold{C} \mathrm{\ is\ non-mixing} \end{cases} $$
  • Separation is performed by minimizing $\phi[\bold{y}(t)]$, where $\bold{y}(t)=\bold{Cs}(t)$, with respect to $\bold{U}$ (or $\bold{B}$)

Contrast functions example

  • $\phi_{IM}[\bold{y}(t)]=\mathbb{I}[\bold{y}(t)] \qquad\qquad\qquad\qquad\quad\quad\texttt{(Canonical CF)}$
  • $\phi_{IM}^{\circ}[\bold{y}(t)]=\sum_{k=1}^{K}\mathbb{H}[y_k(t)] \mathrm{~with~} \Sigma_{yy} = \bold{I} \quad\texttt{(Orthogonal contrast)}$
  • $\phi_{ICA}^{\circ}[\bold{y}(t)]=\sum_{ijkl \neq iiii}(\kappa_{ijkl}^4[\bold{y}(t)])^2 \quad~~\quad\texttt{(Approximation of} ~ \phi^{\circ}_{IM}\texttt{)} $

Descent algorithms (minimizing $\phi$ w.r.t. $\bold{B}$ or $\bold{U}$)

  • Gradient algorithm applied to $\bold{B}$
  • Parameterization of $\bold{U}$ as a product of Givens rotations and coordinate descent

Algorithm

  1. Estimation of $\Sigma_{xx}$
  2. Diagonalization: $\Sigma_{xx} = \bold{Q}\Lambda^2\bold{Q}^\top$
  3. Compute $\bold{S}=\bold{Q}_{(:, 1:K)}\Lambda_{(1:K, 1:K)}$ and $\bold{W}= \bold{S}^{\dagger}$
  4. Data whitening: $\bold{z}(t) = \bold{W}\bold{x}(t)$
  5. Estimation of $\bold{U}$ by minimizing a contrast function $\phi$
  6. Estimation of source signals via $\bold{y}(t) = \bold{U}^\top \bold{z}(t)$
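For $K=2$, steps 5 and 6 can be sketched with a one-parameter rotation $\bold{U}(\theta)$ (a single Givens angle) selected by grid search. Maximizing the sum of squared kurtoses of $\bold{y}=\bold{U}^\top\bold{z}$ is used here as a simple stand-in for minimizing the fourth-order contrast under the whiteness constraint; this choice, like the grid search, is an assumption of this sketch rather than the optimizer prescribed above. The variable `z` is the whitened signal from the whitening sketch.

```python
import numpy as np

def kurtosis(y):
    yc = y - y.mean(axis=1, keepdims=True)
    return (yc ** 4).mean(axis=1) / (yc ** 2).mean(axis=1) ** 2 - 3.0

def rotation(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def estimate_rotation(z, n_grid=360):
    """Grid search over the Givens angle; permutation and sign ambiguities remain."""
    thetas = np.linspace(0.0, np.pi / 2, n_grid)
    scores = [np.sum(kurtosis(rotation(t).T @ z) ** 2) for t in thetas]
    return rotation(thetas[int(np.argmax(scores))])

# With z from the whitening sketch: y = estimate_rotation(z).T @ z recovers
# the sources up to permutation and scale.
```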

V - Second order methods

Temporal coherence of sources

Model and assumption

  • Model:
    $ \begin{cases} \mathbb{E}[\bold{s}(t)]=0 \\ \bold{R}_{ss}(\tau) = \mathbb{E}[\bold{s}(t+\tau)\bold{s}(t)^\top] = \mathrm{diag}(r_{s_k}(\tau)) \end{cases} \quad\texttt{(centered WSS vector process)} $
  • Canonical problem: we assume that $\Sigma_{ss}=\bold{R}_{ss}(0)=\bold{I}$

Spatial whitening of $\bold{x}$

  • We define as before $\bold{S}$ a matrix square root of $\Sigma_{xx}$, $\bold{W}=\bold{S}^{\dagger}$ and $\bold{z}(t)=\bold{Wx}(t)$
  • Again because $\Sigma_{xx}=\bold{AA}^\top, \bold{U}:=\bold{WA}$ is a rotation matrix
  • However this time, $\forall \tau \in \mathbb{Z}, \bold{R}_{zz}(\tau) = \bold{U}\bold{R}_{ss}(\tau)\bold{U}^\top$
The joint diagonalization of $\bold{R}_{zz}(\tau)$ for several $\tau$ will provide a rotation matrix $\bold{U}$

Joint diagonalization

Uniqueness theorem

  • Consider a set of matrices $\{\bold{R}_{zz}(\tau)\}_{\tau} \subset \mathbb{R}^{K\times K}$ such that:
    $$ \bold{R}_{zz}(\tau) = \bold{U}\bold{R}_{ss}(\tau)\bold{U}^\top $$
    $\quad\rightarrow \bold{U}$ unitary and $\bold{R}_{ss}(\tau)=\mathrm{diag}(r_{s_k}(\tau))$.
$\bold{U}$ is unique $\Leftrightarrow \forall 1\leq k \neq l \leq K~ \exists \tau, r_{s_k}(\tau) \neq r_{s_l}(\tau)$

Joint diagonalization methods

  • Minimize: $J(\bold{U}) = \sum_\tau\mid\mid \bold{U}^\top\bold{R}_{zz}(\tau)\bold{U} - \mathrm{diag}(\bold{U}^\top\bold{R}_{zz}(\tau)\bold{U}) \mid\mid_{F}^2$
  • Parameterization of $\bold{U}$ as a product of Givens rotations and coordinate descent
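A sketch of the criterion $J(\bold{U})$ and of its minimization for $K=2$, where the product of Givens rotations reduces to a single angle; the grid search below replaces the coordinate-descent sweep that would be used for $K>2$ and is an assumption of this sketch.

```python
import numpy as np

def off_diag_cost(U, R_list):
    """J(U): squared Frobenius norm of the off-diagonal part of U^T R U, summed over lags."""
    cost = 0.0
    for R in R_list:
        D = U.T @ R @ U
        cost += np.sum(D ** 2) - np.sum(np.diag(D) ** 2)
    return cost

def joint_diagonalize_2x2(R_list, n_grid=720):
    thetas = np.linspace(0.0, np.pi, n_grid)
    Us = [np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]) for t in thetas]
    costs = [off_diag_cost(U, R_list) for U in Us]
    return Us[int(np.argmin(costs))]
```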

Second Order Blind Identification (SOBI) algorithm

  1. Estimation of $\Sigma_{xx}$
  2. Diagonalization: $\Sigma_{xx} = \bold{Q}\Lambda^2\bold{Q}^\top$
  3. Compute $\bold{S}=\bold{Q}_{(:, 1:K)}\Lambda_{(1:K, 1:K)}$ and $\bold{W}= \bold{S}^{\dagger}$
  4. Data whitening: $\bold{z}(t) = \bold{W}\bold{x}(t)$
  5. Estimation of $\bold{R}_{zz}(\tau)$ for various delays $\tau$
  6. Approximate joint diagonalization of $\bold{R}_{zz}(\tau)$ in a common basis $\bold{U}$
  7. Estimation of source signals via $\bold{y}(t) = \bold{U}^{\top}\bold{z}(t)$
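An end-to-end numerical sketch of these steps for $M=K=2$, reusing `joint_diagonalize_2x2` from the joint-diagonalization sketch above; the AR(1) sources are an arbitrary choice whose autocovariances $r_{s_k}(\tau)$ differ, so the identifiability condition holds.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
T = 20000
s = np.vstack([lfilter([1.0], [1.0, -a], rng.standard_normal(T))
               for a in (0.9, -0.5)])          # two WSS sources with different spectra
s /= s.std(axis=1, keepdims=True)              # canonical problem: Sigma_ss ~ I
A = rng.standard_normal((2, 2))
x = A @ s                                      # observed mixtures

Sigma_xx = x @ x.T / T                         # steps 1-3: whitening
lam, Q = np.linalg.eigh(Sigma_xx)
W = np.linalg.pinv(Q * np.sqrt(lam))
z = W @ x                                      # step 4

R_list = [z[:, tau:] @ z[:, :T - tau].T / T for tau in (1, 2, 5, 10)]   # step 5
R_list = [0.5 * (R + R.T) for R in R_list]     # symmetrize the estimates
U = joint_diagonalize_2x2(R_list)              # step 6 (defined in the sketch above)
y = U.T @ z                                    # step 7: sources up to permutation/scale
```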

Non-stationarity of sources (1/2)

Model and assumption

  • Model: $ \begin{cases} \mathbb{E}[\bold{s}(t)]=0 \\ \Sigma_{ss}(t) = \mathbb{E}[\bold{s}(t)\bold{s}(t)^\top] \end{cases} \quad\texttt{(centered, independent, non-stationary sources)} $
    $\quad\rightarrow$ the WSS assumptions $\bold{R}_{ss}(\tau) = \mathrm{diag}(r_{s_k}(\tau))$ and $\Sigma_{ss}=\bold{R}_{ss}(0)=\bold{I}$ of the stationary case are dropped

Method

  • Then $\Sigma_{xx}(t) = \bold{A}\Sigma_{ss}(t)\bold{A}^\top$ (we can assume $\Sigma_{xx}=\sum_t\Sigma_{xx}(t)=\bold{A}\bold{A}^\top$)
  • We minimize the criterion:
    $J(\bold{U}) = \sum_t\mid\mid \bold{U}^\top\Sigma_{zz}(t)\bold{U} - \mathrm{diag}(\bold{U}^\top\Sigma_{zz}(t)\bold{U}) \mid\mid_{F}^2$

Non-stationarity of sources (2/2)

Variant SOBI algorithm

  1. Segmentation of mixture signals and estimation of covariance matrices $\Sigma_{xx}(t)$ on windows centered at different times $t$
  2. Joint diagonalization of matrices $\Sigma_{xx}(t)$ in a common basis $\bold{B}$
  3. Estimation of source signals via $\bold{y}(t) = \bold{Bx}(t)$

Conclusion of the first part

  • The use of higher order cumulants is only necessary for the non-Gaussian IID source model
  • Second order statistics are sufficient for sources that are:
    $\quad\rightarrow$ stationary but not IID (→ spectral dynamics)
    $\quad\rightarrow$ non-stationary (→ temporal dynamics)
  • Remember that classical tools (based on second order statistics) are appropriate for blind separation of independent (and possibly Gaussian) sources, on condition that the spectral / temporal source dynamics is taken into account

VI - Time-frequency methods

Time-frequency (TF) representations

Motivations

  • Spectral and temporal dynamics are highlighted by a TF representation of signals
  • TF domain: adequate for processing convolutive and/or under-determined mixtures

Use of a filter bank (STFT and MDCT)

  • Decomposition into $F$ sub-bands and decimation by a factor $T \leq F$
  • Analysis filters $h_f$ and synthesis filters $g_f$
  • TF representation of mixture: $x_m(f,n) = (h_f\ast x_m)(nT)$
  • Perfect reconstruction: $x_m(t) = \sum_{f=1}^F\sum_{n\in \mathbb{Z}} g_f(t-nT)x_m(f,n)$
Then $\forall f,n ~ \bold{x}(f,n) = \bold{As}(f,n)\quad\texttt{(linear instantaneous mixture)}$
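A minimal illustration (not from the slides) using scipy's STFT as one concrete realization of such a filter bank, with an arbitrary window length and hop: analysis yields the sub-band signals and synthesis reconstructs the waveform.

```python
import numpy as np
from scipy.signal import stft, istft

rng = np.random.default_rng(0)
fs = 16000
x = rng.standard_normal(fs)                              # 1 s of an arbitrary test signal

f, n, X = stft(x, fs=fs, nperseg=1024, noverlap=768)     # analysis: X[f, n] = x(f, n)
_, x_rec = istft(X, fs=fs, nperseg=1024, noverlap=768)   # synthesis
print(np.max(np.abs(x_rec[:len(x)] - x)))                # ~0: reconstruction up to numerical error
```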

Non-stationary source model

Assumption: independent and centered second order processes

Model of non-stationary sources

  • if the time frames $n_1$ and $n_2$ are disjoint, then $s_k(., n_1)$ and $s_k(., n_2)$ are uncorrelated and of distinct variances

Model of WSS sources

  • if sub-bands $f_1$ and $f_2$ are disjoint then $s_k(f_1, .)$ and $s_k(f_2, .)$ are WSS, uncorrelated and of distinct variances

Time-frequency source model

  • all $s_k(f,n)$ are uncorrelated for all $n$ and $f$, of distinct variances $\sigma^2_{k}(f,n)$ ($\implies$ time-frequency dynamics)

Separation method

Separation by joint matrix diagonalization

  • We have $\Sigma_{xx}(f,n) = \bold{A}\Sigma_{ss}(f,n)\bold{A}^\top$ where $\Sigma_{ss}(f,n) = \mathrm{diag}(\sigma_k^2(f,n))$

Variant of the SOBI algorithm

  1. TF analysis of the mixtures $x_m(f,n)$
  2. Estimation and diagonalization of $\Sigma_{xx}(f,n)$ in a basis $\bold{B}$
  3. Estimation of the source signals via $\bold{y}(f,n) = \bold{Bx}(f,n)$
  4. TF synthesis of the sources:
    $$ y_k(t) = \sum_{f=1}^{F} \sum_{n\in \mathbb{Z}} g_f(t-nT)y_k(f,n) $$

VII - Convolutive mixtures

Source images

Instantaneous mixture model: unsuitable for real acoustic mixtures

Mixture of source image model

  • Let $\bold{x}_k(f,n) \in \mathbb{R}^M$ be the source image of $s_k(f,n)$
    $\quad\rightarrow$ received multichannel signal if only source $s_k(f,n)$ was active
  • Mixture model: $\bold{x}(f,n) = \sum_{k=1}^K \bold{x}_k(f,n)$

Decomposition of the source separation problem

  • Separation: estimate $\bold{x}_k(f,n)$ from the mixture $\bold{x}(f,n)$
  • Deconvolution: estimate $s_k(f,n)$ from the source image $\bold{x}_k(f,n)$

Convolutive linear mixtures

Convolutive mixture model

  • $x_m(t) = \sum_{k=1}^{K}(a_{mk} \ast s_k)(t)$, i.e. $\bold{x}(t) = \bold{A} \ast \bold{s}(t)$

Identifiability theorem

  • Let $s_k(t)$ be $K$ IID sources, among which at most one is Gaussian, and $\bold{y}(t) = \bold{C} \ast \bold{s}(t)$ with $\bold{C}$ invertible ((over-)determined case). If the signals $y_k(t)$ are independent, then $\bold{C}$ is non-mixing.

Time-Frequency approach

Mixture model and narrow-band approximation

  • $x_m(t) = \sum_{k=1}^{K}(a_{mk} \ast s_k)(t)$,
  • the filter bank corresponds to an STFT
  • the impulse response of $a_{mk}$ is short compared with the window length
  • $\forall m,k,f, a_{mk}(\nu)$ varies slowly compared with $h_f(\nu)$

Approximation of the convolutive mixture model

  • $x_m(f,n)=\sum_{k=1}^K a_{mk}(f)s_k(f,n)$ i.e. $\bold{x}(f,n)=\bold{A}(f)\bold{s}(f,n)$
    $\quad\rightarrow$ $F$ instantaneous mixture models, one in each sub-band
    $\quad\rightarrow$ we can use any ICA method in every sub-band

Independent component analysis

  • Let $\bold{y}(f,n) = \bold{B}(f)\bold{x}(f,n)$ where $\bold{B}(f)\in\mathbb{C}^{K\times M}$
  • Linear separation is feasible if $\bold{A}(f)$ has rank $K$.
    $\quad\rightarrow\bold{y}(f,n)=\bold{s}(f,n)$ with $\bold{B}(f)= \begin{cases} \bold{A}(f)^{-1} & \mathrm{if~} M=K \\ \bold{A}(f)^{\dagger} & \mathrm{if~} M>K \\ \emptyset & \mathrm{if~} M< K \end{cases} $
  • In practice $\bold{A}(f)$ is unknown:
    $\quad\rightarrow$ We look for $\bold{B}(f)$ that makes $y_k(f,n)$ independent (ICA)
    $\quad\rightarrow$ We get $\bold{y}(f,n) = \bold{C}(f)\bold{s}(f,n)$ where $\bold{C}(f) = \bold{B}(f)\bold{A}(f)$
    $\quad\rightarrow \bold{C}(f)$ is non-mixing

Indeterminacies

Indeterminacies (permutations and multiplicative factors) in the matrices $\bold{C}(f)$
  • $\forall k$, identify the indices $k_f$ such that $\forall f, y_{k_f}(f,n)=c_{k_f,k}s_k(f,n)$
  • identify the multiplicative factors $c_{k_f,k}$
Infinitely many solutions $\implies$ need to constrain the model

Assumptions on the mixture and sources

  • continuity of the frequency responses $a_{mk}(f)$ with respect to $f$
    $\quad\rightarrow$ beamforming model or anechoic model
  • similarity of the temporal dynamics of $\sigma_k^2(f,n)$ (or NMF model)

Convolutive mixture models

Beamforming model

  • Assumptions: plane waves, far field, no reverberation, linear antenna
  • Model: $a_{mk}(f)=e^{-2i\pi f\tau_{mk}}$ where $\tau_{mk}=\frac{d_m}{c}\sin(\theta_k)$
  • Parameters: positions $d_m$ of the sensors and angles $\theta_k$ of the sources

Anechoic model

  • Assumptions: point sources, no reverberation
  • Model: $a_{mk}(f)=\alpha_{mk}e^{-2i\pi f\tau_{mk}}$ where $\tau_{mk}=\frac{r_{mk}}{c}$ and $\alpha_{mk} = \frac{1}{\sqrt{4\pi}r_{mk}}$
  • Parameters: distances $r_{mk}$ between the sensors and sources
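A sketch building the frequency responses $a_{mk}(f)$ of both models for a 3-sensor linear array; the geometry, source angles, distances and frequency grid are arbitrary illustration values.

```python
import numpy as np

c = 343.0                                  # speed of sound (m/s)
d = np.array([0.00, 0.05, 0.10])           # sensor positions d_m along the array (m)
theta = np.deg2rad([-30.0, 20.0])          # source angles theta_k
freqs = np.linspace(0.0, 8000.0, 257)      # sub-band frequencies (Hz)

# Beamforming model: a_mk(f) = exp(-2i pi f tau_mk), tau_mk = (d_m / c) sin(theta_k)
tau = np.outer(d, np.sin(theta)) / c                        # (M, K) delays
A_beam = np.exp(-2j * np.pi * freqs[:, None, None] * tau)   # (F, M, K)

# Anechoic model: a_mk(f) = alpha_mk exp(-2i pi f r_mk / c), alpha_mk = 1 / (sqrt(4 pi) r_mk)
r = np.array([[2.0, 3.0], [2.1, 2.9], [2.2, 2.8]])          # distances r_mk (m)
A_anech = np.exp(-2j * np.pi * freqs[:, None, None] * r / c) / (np.sqrt(4 * np.pi) * r)
print(A_beam.shape, A_anech.shape)
```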

VIII - Under-determined mixtures

Under-determined convolutive mixtures

Usual case in audio: monophonic $(M = 1)$ or stereophonic $(M = 2)$ mixtures, with a number of sources $K > M$

Convolutive mixture model and assumption

  • $\bold{x}(f,n)=\bold{A}(f)\bold{s}(f,n)$ with $M< K$
  • We assume that $\bold{A}(f)$ and $\Sigma_{ss}(f,n) = \mathrm{diag}(\sigma^2_k(f,n))$ are known
  • Even in this case, there is no matrix $\bold{B}(f)$ such that $\bold{B}(f)\bold{A}(f)=\bold{I}_K$

Separation via non-stationary filtering

Let $\bold{y}(f,n) = \bold{B}(f,n)\bold{x}(f,n)$ where $\bold{B}(f,n) \in \mathbb{C}^{K\times M}$

Mean-Squared error estimation

  • We look for $\bold{B}(f,n)$ which minimizes $\mathbb{E}[\mid\mid\bold{y}(f,n)-\bold{s}(f,n)\mid\mid^2_{2} ]$
  • Sol.: $\bold{B}(f,n) = \Sigma_{sx}(f,n)\Sigma_{xx}(f,n)^{-1} ~~~~~~ \texttt{(Multichannel Wiener filter)}$
    $\quad\rightarrow \Sigma_{xx}(f,n) = \bold{A}(f)\Sigma_{ss}(f,n)\bold{A}(f)^{\mathrm{H}}, \Sigma_{sx}(f,n) = \Sigma_{ss}(f,n)\bold{A}(f)^{\mathrm{H}} $
  • $\bold{x}(f,n)=\bold{A}(f)\bold{y}(f,n)$ (exact reconstruction)

Particular case: monophonic mixtures

  • Without loss of generality, we define $\bold{A}(f)=[1, \dots, 1] \in \mathbb{R}^{1\times K}$
  • Then $y_k(f,n) = \frac{\sigma_k^2(f,n)}{\sum_{k^\prime=1}^K\sigma_{k^\prime}^2(f,n)} x(f,n) ~~~~~~~\texttt{(Wiener filter)}$
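A sketch of the multichannel Wiener filter at a single TF bin, assuming $\bold{A}(f)$ and the variances $\sigma^2_k(f,n)$ are known as stated above; the monophonic check at the end recovers the Wiener gains of this slide (the numerical values are arbitrary).

```python
import numpy as np

def multichannel_wiener(A_f, sigma2):
    """B(f, n) = Sigma_sx Sigma_xx^{-1}; A_f is (M, K), sigma2 is (K,) at the bin (f, n)."""
    Sigma_ss = np.diag(sigma2)
    Sigma_xx = A_f @ Sigma_ss @ A_f.conj().T
    Sigma_sx = Sigma_ss @ A_f.conj().T
    return Sigma_sx @ np.linalg.inv(Sigma_xx)        # shape (K, M)

# Monophonic special case (M = 1, A = [1, ..., 1]): the rows reduce to the
# Wiener gains sigma_k^2 / sum_k' sigma_k'^2.
A_f = np.ones((1, 3))
sigma2 = np.array([1.0, 4.0, 0.25])
print(multichannel_wiener(A_f, sigma2).ravel())      # equals sigma2 / sigma2.sum()
```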

Separation algorithm

  1. TF analysis: $x_m(f,n) = (h_f \ast x_m)(nT)$
  2. Estimation of $\bold{A}$ and $\sigma^2_k(f,n)$
    $\quad\rightarrow$ instantaneous mixture model
    $\quad\rightarrow$ sparse source model
  3. Compute $\bold{B}(f,n) = \Sigma_{sx}(f,n)\Sigma_{xx}(f,n)^{-1}$
  4. Estimation of the source signals via $\bold{y}(f,n)=\bold{B}(f,n)\bold{x}(f,n)$
  5. TF synthesis of the sources: $y_k(t) = \sum_{f=1}^F \sum_{n\in \mathbb{Z}} g_f(t-nT)y_k(f,n)$

Stereophonic mixtures: temporal sparsity

Case of a linear instantaneous stereophonic mixture: $\bold{x}(t)= \bold{A}\bold{s}(t)$

Sparsity in a transformed domain

Case of a linear instantaneous stereophonic mixture: $\bold{x}(f,n)= \bold{A}\bold{s}(f,n)$

Degenerate Unmixing Estimation Technique (DUET) (1/2)

Mixture model and directivity assumption

  • Model in the stereophonic case $(M=2)$: $\bold{x}(f,n)=\bold{A}\bold{s}(f,n)$
  • Without loss of generality, we assume $\bold{A}_{(:,k)}=[\cos(\theta_k), \sin(\theta_k)]^{\top}, \forall k$

Sparse source model

  • $\forall (f,n), \exists ! k_{(f,n)}$ such that $\sigma^2_{l}(f,n) = 0$ for all $l \neq k_{(f,n)}$, and $\sigma^2_{k_{(f,n)}}(f,n) > 0$
  • If only source $k$ is active at $(f,n),$ then $\bold{x}(f,n) = \bold{a}_{k}s_k(f,n)$

Degenerate Unmixing Estimation Technique (DUET) (2/2)

  1. TF analysis: $x_m(f,n) = (h_f \ast x_m)(nT)$
  2. Estimation of $\theta_{k}$ and of the active source $k_{(f,n)}$
    $\quad\rightarrow$ computation of the histogram of the angles of vectors $\bold{x}(f,n)$
    $\quad\rightarrow$ peak detection in order to estimate $\theta_k$
    $\quad\rightarrow$ determination of the active source by proximity with $\theta_{k}$
  3. Source separation: for all $k$,
    $\quad\rightarrow$ estimation of source images via binary masking:
    $\quad \bold{y}_k(f,n)= \begin{cases} 0 & \mathrm{if~} k \neq k_{(f,n)}, \\ \bold{x}(f,n) & \mathrm{otherwise} \end{cases} $
  4. MMSE estimation of the sources: $y_k(f,n) = \frac{\bold{a}_k(f)^{\mathrm{H}}}{\mid\mid\bold{a}_k(f)\mid\mid^2_2}\bold{y}_{k}(f,n)$
  5. TF synthesis of the sources: $y_k(t) = \sum_{f=1}^F \sum_{n\in \mathbb{Z}} g_f(t-nT)y_k(f,n)$
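A toy numpy sketch of DUET on a synthetic instantaneous stereo mixture with $K=3$ perfectly sparse sources (exactly one active source per TF bin, an idealization of the model above); the angles and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
F, N, K = 257, 200, 3
theta_true = np.deg2rad([15.0, 45.0, 75.0])
A = np.vstack([np.cos(theta_true), np.sin(theta_true)])   # (2, K), unit columns

# Ideally sparse sources: exactly one active source per TF bin (f, n).
active = rng.integers(K, size=(F, N))
s = np.zeros((K, F, N))
s[active, np.arange(F)[:, None], np.arange(N)] = rng.standard_normal((F, N))
x = np.einsum('mk,kfn->mfn', A, s)                        # x(f, n) = A s(f, n)

# Step 2: histogram of the angles of x(f, n) (modulo pi) and crude peak picking.
ang = np.mod(np.arctan2(x[1], x[0]), np.pi)
hist, edges = np.histogram(ang, bins=90, range=(0.0, np.pi))
centers = (edges[:-1] + edges[1:]) / 2
est_theta = np.sort(centers[np.argsort(hist)[-K:]])
print(np.rad2deg(est_theta))                              # close to [15, 45, 75]

# Steps 3-4: binary masking, then MMSE projection a_k^T x / ||a_k||^2 (||a_k|| = 1).
labels = np.argmin(np.abs(ang[None] - est_theta[:, None, None]), axis=0)
y = np.array([np.where(labels == k, np.cos(t) * x[0] + np.sin(t) * x[1], 0.0)
              for k, t in enumerate(est_theta)])          # estimated s_k(f, n)
```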

IX - Conclusion

Conclusion

Summary

  • Source separation requires making assumptions about the mixture and the sources
  • For an (over-)determined instantaneous linear mixture, the assumption of independent sources is sufficient
  • In all other cases, we need to model the mixture and/or the sources

Perspectives

  • Non-stationary mixtures (adaptive algorithms)
  • Informed source separation (e.g. from music score)
  • Deep learning techniques
  • Objective assessment of audio source separation

Bibliography

Audio source separation and Blind source separation

  • Emmanuel Vincent, Tuomas Virtanen, and Sharon Gannot. Audio Source Separation and Speech Enhancement. Wiley Publishing, 1st edition, 2018.
  • Jean-François Cardoso. Blind signal separation: statistical principles. Proceedings of the IEEE, 86(10):2009–2025, 1998.

Independent Component Analysis

  • Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287 – 314, April 1994. Special issue on Higher-Order Statistics.
  • Pierre Comon and Christian Jutten. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, Inc. (Elsevier), USA, 1st edition, 2010.

Informed Source separation

  • Sebastian Ewert and Meinard Müller. Multimodal Music Processing, volume 3, chapter Score-Informed Source Separation for Music Signals, pages 73–94. January 2012.
  • Simon Leglaive, Roland Badeau, Gaël Richard. Multichannel Audio Source Separation with Probabilistic Reverberation Priors. IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2016, 24 (12), pp. 2453-2465.

Multichannel Nonnegative Matrix Factorization for audio source separation

  • Alexey Ozerov and Cédric Févotte. Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 18(3):550–563, 2010.

Multichannel Deep learning audio source separation

  • Aditya Arie Nugraha, Antoine Liutkus, Emmanuel Vincent. Multichannel audio source separation with deep neural networks. [Research Report] RR-8740, INRIA. 2015.

To go further: Toolboxes

  • SOBI algorithm
  • Independent vector analysis, independent low-rank matrix analysis, FastMNMF
  • Matrix/tensor factorization
  • Deep learning techniques