Work in progress
This document is a curated collection of brainteasers frequently encountered in finance-related job interviews. It originated from recurring student questions and aims to serve as both a reference and a learning tool.
Each problem has been selected not just for its difficulty or elegance, but because it illustrates a specific reasoning technique or mathematical insight that can be valuable to remember. The goal is to help you recognize common patterns, sharpen your problem-solving skills, and build confidence in interview settings.
We flip a fair coin infinitely many times. What is the probability that the sequence HHT appears before the sequence HTT?
To approach this problem, we begin by identifying relevant states that summarize the history of coin flips in a way that captures all the necessary information. These states form the nodes of a decision tree (where heads are placed upwards and tails downwards):
After analysis, the process can be reduced to six key states: \(\{S_0, H, HH, HT, HHT, HTT\}\).
We can now write the transition matrix between these states. Each entry \(p_{ij}\) represents the probability of transitioning from state \(i\) to state \(j\) in the next flip:
\[ M = \begin{array}{c|cccccc} & S_0 & H & HH & HT & HHT & HTT \\ \hline S_0 & 1/2 & 1/2 & 0 & 0 & 0 & 0 \\ H & 0 & 0 & 1/2 & 1/2 & 0 & 0 \\ HH & 0 & 0 & 1/2 & 0 & 1/2 & 0 \\ HT & 0 & 1/2 & 0 & 0 & 0 & 1/2 \\ HHT & 0 & 0 & 0 & 0 & 1 & 0 \\ HTT & 0 & 0 & 0 & 0 & 0 & 1 \\ \end{array} \]
Let \(p_0\) be the initial state distribution, and \(p_n\) the state distribution after \(n\) flips. In our case:
\[p_0 = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}^\top\]
and
\[p_n=(M^{\top})^n p_0 \tag{#eq:dynaMarkov}\]

### Solving
We simulate the evolution of the state probabilities over time using formula @ref{eq:dynaMarkov} to observe the dynamics of the system.
One can see that the probability of obtaining HHT before HTT is 2/3.
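The iteration can be sketched numerically as follows. This is a minimal illustration, not code from the original text; the state ordering \((S_0, H, HH, HT, HHT, HTT)\) and the use of NumPy are my own choices.

```python
import numpy as np

# Transition matrix over states (S0, H, HH, HT, HHT, HTT), as in the text;
# HHT and HTT are absorbing.
M = np.array([
    [0.5, 0.5, 0.0, 0.0, 0.0, 0.0],  # S0
    [0.0, 0.0, 0.5, 0.5, 0.0, 0.0],  # H
    [0.0, 0.0, 0.5, 0.0, 0.5, 0.0],  # HH
    [0.0, 0.5, 0.0, 0.0, 0.0, 0.5],  # HT
    [0.0, 0.0, 0.0, 0.0, 1.0, 0.0],  # HHT (absorbing)
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # HTT (absorbing)
])

p = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 0.0])  # start in S0
for _ in range(200):
    p = M.T @ p  # p_{n+1} = M^T p_n

print(p[4], p[5])  # absorption probabilities: ~2/3 for HHT, ~1/3 for HTT
```

After a few hundred iterations the transient states carry negligible mass and the distribution has effectively converged to the absorption probabilities.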
We flip a fair coin infinitely many times. What is the average number of flips before we get two heads in a row for the first time?
We again define a set of states that capture the key information needed to follow the evolution toward the target pattern HH.
The model in this case is simpler and only involves four states:
- \(S_0\): the starting point, where the last flip (if any) was not a head.
- \(H\): the last flip was a head, and the previous one was not.
- \(HH1\): two consecutive heads have just been observed for the first time.
- \(HH2\): \(HH\) has already occurred before (included for completeness as a terminal state).
The Markov transition matrix becomes:
\[ M = \begin{array}{c|cccc} & S_0 & H & HH1 & HH2 \\ \hline S_0 & 1/2 & 1/2 & 0 & 0 \\ H & 1/2 & 0 & 1/2 & 0\\ HH1 & 0 & 0 & 0 & 1\\ HH2 & 0 & 0 & 0 & 1 \\ \end{array} \]
With the initial distribution
\[p_0 = \begin{pmatrix} 1 & 0 & 0 & 0 \end{pmatrix}^\top\]
we again evolve the state distribution as
\[ p_n=(M^{\top})^n p_0 \]
We are now interested in the expected number of flips until the first occurrence of the \(HH\) pattern.
\[ \begin{aligned} \bar{n} &=\sum_{i=1}^{\infty} i\times p_i(HH1) \\ &= \sum_{i=1}^{\infty} i\times \left( (M^{\top})^i p_0 \right)_3 \end{aligned} \]
We can either find an analytical solution or solve this numerically.
Numerically, the partial sums converge to 6 as more terms are included: the expected number of flips until the first HH is exactly 6.
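The truncated sum can be evaluated directly. The sketch below is my own illustration (the state ordering \((S_0, H, HH1, HH2)\) and the truncation at 2000 terms are assumptions, chosen so the tail is negligible):

```python
import numpy as np

# Transition matrix over states (S0, H, HH1, HH2); HH1 is visited exactly
# once, at the flip where HH first appears, then the chain moves to HH2.
M = np.array([
    [0.5, 0.5, 0.0, 0.0],  # S0
    [0.5, 0.0, 0.5, 0.0],  # H
    [0.0, 0.0, 0.0, 1.0],  # HH1 (first occurrence of HH)
    [0.0, 0.0, 0.0, 1.0],  # HH2 (absorbing)
])

p = np.array([1.0, 0.0, 0.0, 0.0])
expected = 0.0
for i in range(1, 2000):   # truncate the infinite sum
    p = M.T @ p            # p_i = (M^T)^i p_0
    expected += i * p[2]   # p[2] = P(first HH occurs at flip i)

print(expected)  # converges to 6
```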
A standard Brownian motion is the (essentially unique) process \(W_t\) which satisfies:

- \(W_0 = 0\);
- its increments are independent;
- \(W_t - W_s \sim \mathcal{N}(0,\, t-s)\) for \(s \leq t\);
- its paths are almost surely continuous.
The Wiener integral is defined as \[ \begin{align} \int_{s}^{t} f_u \, dW_u \equiv \lim_{n \to \infty} \sum_{i=0}^{n-1} f_{u_i}(W_{u_{i+1}} - W_{u_i}) \end{align} \] where \(f_u\) is an adapted process. Since all the increments \((W_{u_{i+1}} - W_{u_i})\) are independent, so are the \(dW_u\). Hence the Wiener integral can be seen as a sum of independent Gaussian variables of mean 0 and local variance \(f_u^2\,du\), which is deterministic when \(f\) is deterministic.
Itô isometry: the Wiener integral has zero mean and variance \(\int_{s}^{t} f_u^2 \, du\). As a corollary, \[ \begin{align} \operatorname{cov}\left(\int_{s}^{t} f_u \, dW_u,\ \int_{s}^{t} g_u \, dW_u\right)= \int_{s}^{t} f_u g_u \, du. \tag{3.1} \end{align} \]
It is also interesting to note that \(W_t\) can be seen as a Wiener integral with integrand \(f_u=1_{u\leq t}\).
What is the covariance between \(W_t\) and \(W_s\) ? What is the correlation between \(W_t\) and \(W_s\) ?
Because \(W_t=\int_{0}^{\infty}1_{u\leq t}\,dW_u\) and \(W_s=\int_{0}^{\infty}1_{u\leq s}\,dW_u\), a direct application of (3.1) yields:
\[\begin{align} \operatorname{cov}(W_t,W_s) &=\int_{0}^{\infty}1_{u\leq s}1_{u\leq t}\,du \\ &=\min(s,t) \end{align}\]
Clearly also \[Var(W_t)=t\]
and hence, if \(s\leq t\)
\[ \begin{align} \operatorname{cor}(W_t,W_s) &=\frac{\operatorname{cov}(W_t,W_s)}{\sqrt{\operatorname{Var}(W_t)\operatorname{Var}(W_s)}} \\ &= \frac{s}{\sqrt{st}} \\ &=\sqrt{\frac{s}{t}} \end{align} \]
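These identities can be checked by Monte Carlo simulation. The sketch below is my own illustration (path count, step size, seed, and the choice \(s=0.4\), \(t=1\) are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
s, t = 0.4, 1.0
n_paths, n_steps = 100_000, 100
dt = t / n_steps

# Build Brownian paths by cumulating independent Gaussian increments.
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(increments, axis=1)

idx_s = round(n_steps * s / t)  # grid index of time s
W_s = W[:, idx_s - 1]           # value at time s = 0.4
W_t = W[:, -1]                  # value at time t = 1.0

cov_est = np.cov(W_s, W_t)[0, 1]
cor_est = np.corrcoef(W_s, W_t)[0, 1]
print(cov_est)  # theory: min(s, t) = 0.4
print(cor_est)  # theory: sqrt(s / t) = sqrt(0.4) ~ 0.632
```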
We roll two six-faced dice. We are paid the maximum of the two numbers rolled. What is our average gain?
While the outcomes range from 1 to 6, we generalize the result to a die with \(n\) faces.
For a given die, the probability of rolling a number less than or equal to \(i\) is \(P_i=\frac{i}{n}\).
Because the dice are independent, the probability that both rolls are less than or equal to \(i\) is equal to \(P_i^2\).
Note that the probability that the maximum of \(n\) independent random variables is below a certain level equals the probability that each of them is below that level. Hence the CDF of the maximum is the common CDF raised to the power \(n\).
Hence the probability \(p_i\) that the maximum is exactly equal to \(i\) is \[ \begin{align} \forall i \geq 2,\quad p_i &= P_i^2 - P_{i-1}^2\\ &= \frac{1}{n^2} (2i-1) \end{align} \] and \(p_1=\frac{1}{n^2}\) as the base case (consistent with the formula if we set \(P_0=0\)).
Thus, the expected value of the maximum is: \[ \begin{align} \hat{g_n} &= \frac{1}{n^2} \sum_{i=1}^{n} i \cdot(2i - 1) \\ &= \frac{1}{n^2} \left( 2 \sum_{i=1}^{n} i^2 - \sum_{i=1}^{n} i \right) \\ &= \frac{1}{n^2} \left( 2 \cdot \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} \right) \\ &= \frac{(n+1)(4n - 1)}{6n} \end{align} \] In our case with \(n=6\), \(\hat{g_6} = \frac{161}{36} \approx 4.47.\)
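As a quick sanity check of the closed form, a sketch using Python's `fractions` module for exact arithmetic (the variable names are mine):

```python
from fractions import Fraction

n = 6
# P(max of two fair n-sided dice = i) = (i^2 - (i-1)^2) / n^2 = (2i - 1) / n^2
expected_max = sum(Fraction(i * (2 * i - 1), n * n) for i in range(1, n + 1))

# Closed form derived in the text: (n + 1)(4n - 1) / (6n)
closed_form = Fraction((n + 1) * (4 * n - 1), 6 * n)

print(expected_max, float(expected_max))  # 161/36 ~ 4.47
print(expected_max == closed_form)        # True
```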
You are in a place where taxis arrive at a constant rate. At that place, the probability that at least one taxi arrives before 3 minutes is equal to 97.3%. What is the probability that at least one taxi arrives before 1 minute?
Let \(\tau\) be the (random) arrival time of the first taxi. When the arrival rate is constant, \(\tau\) follows an exponential distribution with some rate \(\lambda > 0\), and thus:
\[ \Pr(\tau \geq t) = e^{-\lambda t} \]
The probability that no taxi arrives before 3 minutes is: \[ P_3 = \Pr(\tau \geq 3) = e^{-3\lambda} \]
By the properties of the exponential distribution, we have: \[ \Pr(\tau \geq 1) = \Pr(\tau \geq 3) ^{1/3} \quad \Rightarrow \quad P_1 = P_3^{1/3} \]
In our case, since \(P_3 = 1 - 97.3\% = 2.7\% = 0.027\), we get: \[ P_1 = 0.027^{1/3} = 0.3 \]
Therefore, the probability that at least one taxi arrives before 1 minute is: \[ 1 - P_1 = 1 - 0.3 = \boxed{70\%} \]
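The arithmetic can be verified in a couple of lines (a sketch; the numbers come from the problem statement):

```python
# P(no taxi within 3 min) = 1 - 0.973 = 0.027; under the constant-rate
# (exponential) model, P(no taxi within 1 min) = 0.027 ** (1/3).
p3 = 1 - 0.973
p1 = p3 ** (1 / 3)
print(p1)      # ~ 0.3
print(1 - p1)  # ~ 0.7: probability of at least one taxi within 1 minute
```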
Let \(X\) be an asset and \(X_1,\;X_2\) two other assets. \(X\) has correlation 1/2 with both \(X_i\). What is the range of possible correlations \(c\) between \(X\) and \(\frac{1}{2}(X_1+X_2)\)?
First, we ask which correlations \(b\) between \(X_1\) and \(X_2\) are admissible. Let us solve this for a general correlation \(a\) between \(X\) and each \(X_i\).
Consider the symmetric matrix:
\[ M = \begin{pmatrix} 1 & a & a \\ a & 1 & b \\ a & b & 1 \end{pmatrix} \]
where \( a \in [-1, 1] \) represents the correlation between \(X\) and each \(X_i\), and \(b\) the correlation between \(X_1\) and \(X_2\).
Standard calculations yield: \[ \det M = -\left(b - a^2\right)^2 + (1 - a^2)^2 \]
which is positive if and only if
\[ \boxed{ b \in \left(2a^2 - 1,\; 1\right) } \]
In the case where \(a=1/2\), this gives \(b_{\min}=-1/2\).
\(X_1\) and \(X_2\) can be thought of as two vectors lying on a cone whose axis is \(X\) and whose half-angle is \(\alpha=\arccos(a)\). Let \(\beta=\arccos(b)\) be the angle between \(X_1\) and \(X_2\).
The angle \(\beta\) can vary between \(0\) (when \(X_1\) and \(X_2\) are aligned) and \(2\alpha\) (when \(X_1\) and \(X_2\) are symmetric with respect to \(X\)).
Thus the condition on the angle reads \[0 \leq \beta \leq 2 \alpha\] Applying \(\cos\), which is decreasing on \([0,\pi]\), yields \[ \cos(2\alpha) \leq b \leq 1\] Since \(\cos(2\alpha)=2\cos^2(\alpha)-1=2a^2-1\), this is equivalent to the previous condition.
Assume without loss of generality that all vectors are normalized: \(\left\Vert X\right\Vert=\left\Vert X_1\right\Vert=\left\Vert X_2\right\Vert=1\). Then:
\[ \begin{align} c=\operatorname{cor}(X,\tfrac{1}{2}(X_1+X_2)) & =\operatorname{cor}(X,X_1+X_2) \\ &=\frac{\operatorname{cov}(X,X_1)+\operatorname{cov}(X,X_2)}{\left\Vert X_1+X_2\right\Vert } \end{align} \]

- if \(\operatorname{cor}(X_1,X_2)=1\), then \(\left\Vert X_1+X_2\right\Vert=2\) and \(c=1/2\): adding more of the same changes nothing.
- if \(\operatorname{cor}(X_1,X_2)=-1/2\), then \(\left\Vert X_1+X_2\right\Vert=1\) and \(c=1\).

More generally, \(\left\Vert X_1+X_2\right\Vert=\sqrt{2+2b}\), so \(c=\frac{1}{\sqrt{2+2b}}\) ranges over \([1/2, 1]\) as \(b\) spans \([-1/2, 1]\).
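A small numeric sketch of how the resulting correlation \(c = 2a/\sqrt{2+2b}\) (with normalized vectors and \(a=1/2\) as in the problem) varies with \(b\); the sample values of \(b\) are chosen arbitrarily within the admissible range:

```python
import numpy as np

a = 0.5  # correlation of X with each X_i
# c = (a + a) / ||X1 + X2|| = 2a / sqrt(2 + 2b) for admissible b in (2a^2 - 1, 1)
for b in (-0.49, 0.0, 0.5, 0.99):
    c = 2 * a / np.sqrt(2 + 2 * b)
    print(f"b = {b:+.2f}  ->  c = {c:.3f}")
```

As \(b\) decreases toward \(-1/2\), \(c\) increases toward 1; as \(b\) increases toward 1, \(c\) decreases toward \(1/2\).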
Thus, in this specific situation, adding a new asset to a portfolio can increase correlation, up to the point where that correlation equals one. This happens when the combination of the two assets offsets exactly what made each of them a diversifier relative to the third asset. Even though diversification is often expected to reduce correlation with a reference asset, this example shows that under specific configurations it can actually increase it.
### Comments
For more background on these types of problems, refer to the Khan Academy (Khan Academy n.d.) or any of the many online resources available.