Anatoly Khina

Assistant Professor
School of Electrical Engineering
Tel Aviv University

anatolyk@eng.tau.ac.il

More information can be found here

About the Lab

We study the interface between statistics, information theory, and learning theory. We apply our results in signal processing, communications, and control theory.

If you're interested in the math behind systems research, let us know! (and keep reading...)

In parallel, we are setting up a juggling robot lab. The setup is rather primitive at this point but the work on it is rewarding. Check its page

If you're interested in working on this setup (BSc/MSc project), let us know!

Correlation vs. Causation

Lately, we've been studying various aspects of correlation and causation:

How to measure correlation and dependence?
How to determine causation and its strength? Did the roof collapse under the cat's weight or did the cat simply locate a cozy spot on the roof?
How to distinguish causation and efficacy (and what is it exactly?) from correlation and dependence?

What can be inferred from static data? Is it even possible to determine causation from (cat) still images?
What can be inferred passively and actively for processes in time?

Correlation vs. causation

Initialization of Neural Networks

The performance of neural networks is known to be highly susceptible to initialization. To understand what initialization to use for different types of data, we study the geometry of the outputs of randomly initialized fully-connected neural nets (FNNs) and convolutional neural nets (CNNs) as a function of the inputs (and their type). For example, in a recent paper, we demonstrate that the cosine similarity ρ between two natural images x and y,

ρ = cos(θ) = <x,y> / ‖x‖‖y‖

is essentially preserved, after a CNN layer with ReLU activation. That is, randomly initialized CNNs behave quite diffrently from FNNs, the cosine similarity of which is known to shrink according to a known function. We further notice that, for unstructured (Gaussian i.i.d.) inputs, random CNNs and random FNNs behave the same, and derive inner and upper bound for the similarity contraction after each layer.

More details

Neural net initialization

Information Velocity

What is the maximal speed —the Information Velocity—at which information can propagate reliably in a network through a cascade of links?

Information theory tells us what is the maximal rate of reliable communications through a small number of such links when delay is not important. However, if we want to communicate in real time, the answer to this question becomes much more challenging.

In fact, even for the simple case of transmitting a single bit over a tandem of links, where each flips its input independently (across time & links), an answer was given only very recently.

For more bits/packets and more links, the question becomes much more challenging to answer.

We study this problem for various link & network models. In particular, for the problem of packet-based links with ACK signals in the presence of independent packet erasures, we determine the information velocity. Interestingly, the information velocity does not depend on the exact arrival process but rather only on its arrival rate, when the number of links is large. Moreover, we determine how the end-to-end arrive-failure probability decays when transmission is carried at a speed below the information velocity.

Packet arrival are i.i.d. with probability 0.5. The probability of erasure at each link is 0.01.

Here you see how the probability Pe of arrive-faulire decays as a function of the speed α, which is defined as the ratio between the number of links r and number of time steps N. The step graphs are empirical, whereas the dashed lines are theoretical lower bounds. The vertical black dashed line denotes the information velocity, at which all the curves intersect.

More details

What happens when there are no ACK/NACK signals? What happens when the erasure events depend across time or links? What happens for other links (check our new work on Gaussian channels with feedback)? What happens for more complex networks? These are all great question we wish to study!

Information velocity

Two-Sided Prediction

"It is difficult to make predictions, especially about the future."—Karl Kristian Steincke

What about two-sided prediction, that is, prediction of the present from both the

future and the past?

Why is it interesting?
How does it relate to properties of covariance (sub)matrices?
How can it be used when "learning to control"?

More details

Two-sided prediction

Degrees-of-Freedom Mismatch

Consider the following three riddles.

Riddle 1: We want to transmit one Gaussian source sample S~N (0, 1) over one additive Gaussian noise channel use

Y = X + Z,

where

X = f(S ) is the channel input, subject to a channel-input power constraint E[X ²] ≤ 1
Z ~ N (0, 1/SNR) is Gaussian noise (independent of S )
SNR is the signal-to-noise ratio
Y is the output measurement

Goal: Construct an estimate Ŝ = g(Y ) of S given Y that minimizes the mean square error (MSE): E[(S - Ŝ )²].

Q: What are the mappings f(·) and g(·) that minimize the MMSE?

A: The best you can do is send the source "as is": X = S
(can be proved using information theory)

Answer

That wasn't too surprising. Try now solving the following variations.

Riddle 2: Now we want to transmit one Gaussian source sample S ~ N (0,1) over two additive white Gaussian noise (AWGN) channel uses

Y [1] = X [1] + Z [1]
Y [2] = X [2] + Z [2]

where

X [1] = f (S ), X [2] = h(S ) are the channel inputs, subject to a total input power constraint E{X ²[1]} + E{X ²[2]} ≤ 1
Z [1], Z [2] ~ i.i.d. N (0, 1/SNR) are the Gaussian noises (independent of S )
Y [1], Y [2] are the output measurements

Goal: Construct an estimate Ŝ = g(Y [1], Y [2] ) of S given (Y [1], Y [2] ) that minimizes the mean square error (MSE): E[(S - Ŝ )²].

Q: What is the mapping f(·) that minimizes the MMSE?

A1: X [1] = X [2] = S is bad! It achieves the same performance as in Riddle 1!

Hint

A3: The optimal solution is not known!

A2: Use a non-linear solution!

MSE improves quadratically with the SNR instead of linearly!

Answer(s)

This scenario pops up in control over communication links when the control sampling rate differs from the communications signaling rate. Read more here

Riddle 3: Now we transmit one Gaussian source sample S ~ N (0, 1) over infinitely many AWGN channel uses

Y [t ] = X [t ] + Z [t ], t =1, 2, ...

where

X is the channel input process, subject to a channel-input power constraint ΣE[X ²] ≤ 1
Z ~ i.i.d. N(0, 1/SNR) is Gaussian noise (independent of S )
SNR is the signal-to-noise ratio
Y is the output measurement