Anatoly Khina
Assistant Professor
School of Electrical Engineering
Tel Aviv University
anatolyk@eng.tau.ac.il
More information can be found here
About the Lab
We study the interface between statistics, information theory, and learning theory. We apply our results in signal processing, communications, and control theory.
If you're interested in the math behind systems research, let us know! (and keep reading...)
In parallel, we are setting up a juggling robot lab. The setup is rather primitive at this point but the work on it is rewarding. Check its page
If you're interested in working on this setup (BSc/MSc project), let us know!
Correlation vs. Causation
Lately, we've been studying various aspects of correlation and causation:

How to measure correlation and dependence?

How to determine causation and its strength? Did the roof collapse under the cat's weight or did the cat simply locate a cozy spot on the roof?

How to distinguish causation and efficacy (and what is it exactly?) from correlation and dependence?

What can be inferred from static data? Is it even possible to determine causation from (cat) still images?

What can be inferred passively and actively for processes in time?
Initialization of Neural Networks
The performance of neural networks is known to be highly susceptible to initialization. To understand which initialization to use for different types of data, we study the geometry of the outputs of randomly initialized fully connected neural nets (FNNs) and convolutional neural nets (CNNs) as a function of the inputs (and their type). For example, in a recent paper, we demonstrate that the cosine similarity ρ between two natural images x and y,
ρ = cos(θ) = ⟨x, y⟩ / (‖x‖ ‖y‖)
is essentially preserved after a CNN layer with ReLU activation. That is, randomly initialized CNNs behave quite differently from FNNs, for which the angle between inputs is known to shrink according to a known map. We further observe that, for unstructured (Gaussian i.i.d.) inputs, random CNNs and random FNNs behave the same, and derive lower and upper bounds on the similarity contraction after each layer.
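As a sanity check on the FNN behavior, here is a minimal numpy sketch (our own illustration; the dimension, width, and test vectors are arbitrary choices, not taken from the paper): a single wide, randomly initialized ReLU layer maps the cosine similarity according to the well-known degree-1 arccosine-kernel formula.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Two unit vectors with prescribed cosine similarity rho (arbitrary test choice).
d, width, rho = 16, 200_000, 0.8
x = np.zeros(d); x[0] = 1.0
y = np.zeros(d); y[0] = rho; y[1] = np.sqrt(1 - rho**2)

# One wide, randomly initialized fully connected layer with ReLU activation.
W = rng.standard_normal((width, d))
rho_out = cosine(np.maximum(W @ x, 0.0), np.maximum(W @ y, 0.0))

# Closed-form map for ReLU (degree-1 arccosine kernel):
# rho -> (sin(theta) + (pi - theta) * cos(theta)) / pi,  theta = arccos(rho).
theta = np.arccos(rho)
rho_pred = (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

print(rho_out, rho_pred)  # the empirical and predicted values agree closely
```

At large width the empirical similarity concentrates tightly around the closed-form prediction; iterating the map over many layers drives the geometry of the inputs toward its fixed point.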
Information Velocity
What is the maximal speed (the information velocity) at which information can propagate reliably in a network through a cascade of links?
Information theory tells us the maximal rate of reliable communication through a small number of such links when delay is not important. However, if we want to communicate in real time, answering this question becomes much more challenging.
In fact, even for the simple case of transmitting a single bit over a tandem of links, where each flips its input independently (across time & links), an answer was given only very recently.
For more bits/packets and more links, the question becomes much more challenging to answer.
We study this problem for various link & network models. In particular, for packet-based links with ACK signals in the presence of independent packet erasures, we determine the information velocity. Interestingly, when the number of links is large, the information velocity does not depend on the exact arrival process but only on its arrival rate. Moreover, we determine how the end-to-end arrival-failure probability decays when transmission is carried out at a speed below the information velocity.
Packet arrivals are i.i.d. with probability 0.5, and the probability of erasure at each link is 0.01.
Here you see how the probability Pe of arrival failure decays as a function of the speed α, defined as the ratio between the number of links r and the number of time steps N. The step graphs are empirical, whereas the dashed lines are theoretical lower bounds. The vertical black dashed line denotes the information velocity, at which all the curves intersect.
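A toy Monte Carlo sketch of the cascade (our own simplification, much cruder than the model above: a single packet and no arrival process, so it does not compute the information velocity itself; it only illustrates how the crossing speed of a tandem of ACK-ed erasure links concentrates):

```python
import numpy as np

rng = np.random.default_rng(2)
eps, r, trials = 0.01, 100, 2000   # erasure prob., number of links, Monte Carlo runs

# With per-hop ACK/retransmission, the time to cross one link is
# Geometric(1 - eps); the end-to-end delay is the sum over the r links.
delays = rng.geometric(1 - eps, size=(trials, r)).sum(axis=1)
speed = r / delays.mean()          # links crossed per time step

print(speed)  # concentrates near 1 - eps = 0.99
```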
What happens when there are no ACK/NACK signals? What happens when the erasure events are dependent across time or links? What happens for other links (check our new work on Gaussian channels with feedback)? What happens for more complex networks? These are all great questions we wish to study!
Two-Sided Prediction
"It is difficult to make predictions, especially about the future."—Karl Kristian Steincke

What about two-sided prediction, that is, prediction of the present from both the
future and the past?

Why is it interesting?

How does it relate to properties of covariance (sub)matrices?

How can it be used when "learning to control"?
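A minimal example of why two-sided prediction is interesting, using a stationary AR(1) process as a toy model (our own illustration; the coefficient and noise level are arbitrary). Standard Gaussian conditioning gives the two-sided predictor a/(1+a²)·(x[t−1] + x[t+1]) with error variance σ²/(1+a²), strictly smaller than the one-sided error σ²:

```python
import numpy as np

rng = np.random.default_rng(1)
a, sigma2, n = 0.9, 1.0, 200_000

# Simulate a stationary AR(1) process: x[t] = a*x[t-1] + w[t].
x = np.empty(n)
x[0] = rng.normal(scale=np.sqrt(sigma2 / (1 - a**2)))  # stationary start
w = rng.normal(scale=np.sqrt(sigma2), size=n)
for t in range(1, n):
    x[t] = a * x[t - 1] + w[t]

past, present, future = x[:-2], x[1:-1], x[2:]

# One-sided (causal) prediction from the past alone: MSE = sigma2.
mse_causal = np.mean((present - a * past) ** 2)

# Two-sided prediction from past AND future: MSE = sigma2 / (1 + a^2).
pred = a / (1 + a**2) * (past + future)
mse_two_sided = np.mean((present - pred) ** 2)

print(mse_causal, mse_two_sided)
```

The two-sided weights are exactly what Gaussian conditioning on the relevant covariance submatrix produces, which hints at the connection to covariance structure mentioned above.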
Degrees-of-Freedom Mismatch
Consider the following three riddles.
Riddle 1: We want to transmit one Gaussian source sample S ~ N(0, 1) over one additive Gaussian noise channel use
Y = X + Z,
where

X = f(S) is the channel input, subject to a channel-input power constraint E[X²] ≤ 1

Z ~ N (0, 1/SNR) is Gaussian noise (independent of S )

SNR is the signaltonoise ratio

Y is the output measurement
Goal: Construct an estimate Ŝ = g(Y) of S given Y that minimizes the mean square error (MSE): E[(S − Ŝ)²].
Q: What are the mappings f(·) and g(·) that minimize the MSE?
A: The best you can do is send the source "as is": X = S
(can be proved using information theory)
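This answer is easy to check numerically; here is a quick Monte Carlo sketch of ours (the decoder g is the standard linear MMSE rule E[S|Y] = SNR/(1+SNR)·Y, giving MSE = 1/(1+SNR)):

```python
import numpy as np

rng = np.random.default_rng(3)
snr, n = 10.0, 1_000_000

s = rng.standard_normal(n)                       # source samples, S ~ N(0, 1)
z = rng.normal(scale=np.sqrt(1 / snr), size=n)   # noise, Z ~ N(0, 1/SNR)
y = s + z                                        # send the source "as is": X = S

s_hat = snr / (1 + snr) * y                      # linear MMSE estimator E[S|Y]
mse = np.mean((s - s_hat) ** 2)

print(mse, 1 / (1 + snr))  # both ~ 0.0909: MSE = 1/(1+SNR)
```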
That wasn't too surprising. Try now solving the following variations.
Riddle 2: Now we want to transmit one Gaussian source sample S ~ N (0,1) over two additive white Gaussian noise (AWGN) channel uses
Y [1] = X [1] + Z [1]
Y [2] = X [2] + Z [2]
where

X[1] = f(S), X[2] = h(S) are the channel inputs, subject to a total input power constraint E[X²[1]] + E[X²[2]] ≤ 1

Z [1], Z [2] ~ i.i.d. N (0, 1/SNR) are the Gaussian noises (independent of S )

Y [1], Y [2] are the output measurements
Goal: Construct an estimate Ŝ = g(Y[1], Y[2]) of S given (Y[1], Y[2]) that minimizes the mean square error (MSE): E[(S − Ŝ)²].
Q: What are the mappings f(·), h(·), and g(·) that minimize the MSE?
A1: X[1] = X[2] = S is bad! It achieves the same performance as in Riddle 1!
A2: Use a nonlinear solution! The MSE then improves quadratically with the SNR instead of linearly!
A3: The optimal solution is not known!
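For concreteness, here is one simple nonlinear scheme (a quantize-and-refine construction of our own for this note, not necessarily the scheme from the paper; the quantizer step and the even power split are arbitrary choices): the first channel use carries a coarsely quantized version of S, and the second carries the amplified quantization residual. At moderate-to-high SNR the resulting MSE decays like 1/SNR² instead of 1/SNR.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500_000

def quantize(v, delta):
    return delta * np.round(v / delta)

def quantize_and_refine(snr):
    s = rng.standard_normal(n)
    delta = 12 / np.sqrt(snr)        # quantizer step (shrinks with SNR)
    q = quantize(s, delta)           # coarse description of S
    r = s - q                        # residual, roughly uniform on [-delta/2, delta/2]

    p1 = p2 = 0.5                    # split the unit power budget evenly
    g1 = np.sqrt(p1 / np.mean(q**2))
    g2 = np.sqrt(p2 / np.mean(r**2))

    z1 = rng.normal(scale=np.sqrt(1 / snr), size=n)
    z2 = rng.normal(scale=np.sqrt(1 / snr), size=n)
    y1, y2 = g1 * q + z1, g2 * r + z2

    q_hat = quantize(y1 / g1, delta) # decode the grid point (rarely wrong)
    s_hat = q_hat + y2 / g2          # refine with the second channel use
    return np.mean((s - s_hat) ** 2)

mse_100, mse_1000 = quantize_and_refine(100.0), quantize_and_refine(1000.0)
print(mse_100, mse_1000)  # roughly a 100x drop for a 10x SNR increase
```

Compare with the linear scheme of Riddle 1, whose MSE is 1/(1+SNR): at SNR = 100 the nonlinear scheme already wins, and the gap widens quadratically as the SNR grows.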
This scenario pops up in control over communication links when the control sampling rate differs from the communication signaling rate. Read more here
Riddle 3: Now we transmit one Gaussian source sample S ~ N (0, 1) over infinitely many AWGN channel uses
Y [t ] = X [t ] + Z [t ], t =1, 2, ...
where

X is the channel input process, subject to a total channel-input power constraint ∑ₜ E[X²[t]] ≤ 1

Z ~ i.i.d. N(0, 1/SNR) is Gaussian noise (independent of S )

SNR is the signaltonoise ratio

Y is the output measurement
Goal: Construct an estimate Ŝ = g(Y) of S given the entire process Y that minimizes the mean square error (MSE): E[(S − Ŝ)²].
Q: What are the mappings f(·) and g(·) that minimize the MSE?
Want to know more? Check these:
Control Over Noisy Communication Media (ConCom)
Check our ConCom research project webpage