Before Stochastic Differential Equations
When I first read Øksendal’s Stochastic Differential Equations, I kept having to backtrack to piece together various concepts. Here’s what I wish I’d reviewed beforehand; it will make the book much more approachable.
Random Variables
Given a probability space $(\Omega, \mathcal{F}, P)$, a random variable is a measurable mapping $X: \Omega \to \mathbb{R}^n$ such that for all $B \in \mathcal{B}_n$ (the Borel σ-algebra on $\mathbb{R}^n$), we have:
\[X^{-1}(B) \in \mathcal{F}\]In some texts, the term “random variable” is reserved for mappings $X: \Omega \to \mathbb{R}$, while “random vector” is used for mappings to $\mathbb{R}^n$ with $n > 1$. We will not make this distinction here and use “random variable” for both cases.
Note: If $X = (X_1, X_2, \ldots, X_n): \Omega \to \mathbb{R}^n$ is a random variable, then each component $X_i: \Omega \to \mathbb{R}$ is itself a random variable.
This holds because each component is the composition $X_i = \pi_i \circ X$, where $\pi_i: \mathbb{R}^n \to \mathbb{R}$ is the coordinate projection. Projections are linear, hence continuous and Borel measurable, and the composition of measurable maps is measurable.
In fact, the converse is true as well: stacking random variables into a vector gives another random variable.
σ-Algebra Generated by Random Variables
The σ-algebra generated by a random variable $X$, denoted $\sigma(X)$, is the smallest σ-algebra on $\Omega$ that makes $X$ measurable. Formally:
\[\sigma(X) = \{X^{-1}(B) : B \in \mathcal{B}_n\}\]This σ-algebra captures all the information about $X$ that can be observed: together with the restriction of $P$ to $\sigma(X)$, it determines the distribution of $X$, including all moments, probabilities, and other distributional properties.
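To make this concrete, here is a minimal Python sketch (my own toy example, not from Øksendal) that enumerates $\sigma(X)$ for a finite sample space: a die throw with the parity indicator. Since $X$ takes only two values, $\sigma(X)$ consists of just four sets.

```python
from itertools import combinations

# Toy example: a die throw, Omega = {1,...,6}, with the parity indicator
# X(omega) = 1 if omega is even, 0 if omega is odd.
omega = {1, 2, 3, 4, 5, 6}
X = {w: 1 if w % 2 == 0 else 0 for w in omega}

# For a finite-valued X, sigma(X) = {X^{-1}(B) : B Borel} reduces to the
# preimages of subsets of the range of X.
range_X = sorted(set(X.values()))
sigma_X = set()
for r in range(len(range_X) + 1):
    for B in combinations(range_X, r):
        sigma_X.add(frozenset(w for w in omega if X[w] in B))

for A in sorted(sigma_X, key=len):
    print(sorted(A))
# Prints the four sets: the empty set, the odds, the evens, and all of Omega.
```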
Generated σ-Algebras and Preimages
For any function $f: X \to Y$ between measurable spaces $(X, \mathcal{S})$ and $(Y, \mathcal{T})$ ($f$ need not be measurable):
\[\sigma(f) = \{f^{-1}(B) : B \in \mathcal{T}\}\]is the same as:
\[\sigma(\{f^{-1}(B) : B \in \mathcal{G}\})\]where $\sigma(\mathcal{G}) = \mathcal{T}$.
The σ-algebra generated by $f$ (all preimages of measurable sets in $\mathcal{T}$) equals the σ-algebra generated by just the preimages of a generating set $\mathcal{G}$ for $\mathcal{T}$.
Notation: $\sigma(\text{generating set})$ denotes the σ-algebra generated by that set.
This gives us more flexibility in how we describe the σ-algebra generated by a random variable. Typically the random variable maps into Euclidean space equipped with the Borel σ-algebra, which is a product σ-algebra, so we can use measurable rectangles as the generating set. We can restrict even further and use half-infinite boxes (products of intervals $(-\infty, b_i]$) as the generating set.
Alternative Representations
In conclusion, for a random variable $X: \Omega \to \mathbb{R}^n$:
\[\sigma(X) = \{X^{-1}(B): B \in \mathcal{B}_n\}\]or equivalently:
\[\sigma(X) = \sigma(\{X^{-1}(B_1 \times B_2 \times \cdots \times B_n): B_i \in \mathcal{B}_1 \text{ (Borel in } \mathbb{R})\})\]or:
\[\sigma(X) = \sigma(\{X^{-1}((-\infty, b_1] \times (-\infty, b_2] \times \cdots \times (-\infty, b_n]): b_i \in \mathbb{R}\})\]Since we can break the random variable into its components $X = (X_1, X_2, \ldots, X_n)$, where $X_i: \Omega \to \mathbb{R}$ are the component random variables, we can write:
\[X^{-1}(B_1 \times B_2 \times \cdots \times B_n) = X_1^{-1}(B_1) \cap X_2^{-1}(B_2) \cap \cdots \cap X_n^{-1}(B_n)\]Therefore:
\[\sigma(X) = \sigma\left(\{X_1^{-1}(B_1) \cap X_2^{-1}(B_2) \cap \cdots \cap X_n^{-1}(B_n): B_i \in \mathcal{B}_1\}\right)\]which interestingly equals:
\[\sigma(X) = \sigma\left(\bigcup_{i=1}^{n} \{X_i^{-1}(B): B \in \mathcal{B}_1\}\right) = \sigma\left(\bigcup_{i=1}^{n} \sigma(X_i)\right)\]
Collections of Random Variables
Finite Collections
For any finite collection of random variables $\{X_1, X_2, \ldots, X_n\}$, we can treat them as a single random variable $X = (X_1, \ldots, X_n): \Omega \to \mathbb{R}^n$, and the σ-algebra generated by the collection is:
\[\sigma(X_1, X_2, \ldots, X_n) = \sigma(X)\]
Infinite Collections and Cylinder Sets
When the index set is not finite (for example, in a continuous stochastic process where we take $[0, \infty)$ as the time index), we need a different approach. Consider a collection of random variables $\{X_t : t \in T\}$ where $T$ is an arbitrary (possibly uncountable) index set.
The σ-algebra generated by this collection is:
\[\sigma(\{X_t : t \in T\}) = \sigma\left(\bigcup_{t \in T} \sigma(X_t)\right)\]However, working directly with this definition is often difficult. Instead, we use cylinder sets as a generating set.
Cylinder Sets: For a finite subset $\{t_1, t_2, \ldots, t_n\} \subseteq T$ and Borel sets $B_1, B_2, \ldots, B_n \in \mathcal{B}_n$, a cylinder set is defined as:
\[C_{t_1, \ldots, t_n}(B_1, \ldots, B_n) = \{\omega \in \Omega : X_{t_1}(\omega) \in B_1, X_{t_2}(\omega) \in B_2, \ldots, X_{t_n}(\omega) \in B_n\}\] \[= X_{t_1}^{-1}(B_1) \cap X_{t_2}^{-1}(B_2) \cap \cdots \cap X_{t_n}^{-1}(B_n)\]The collection of all cylinder sets forms a generating set for $\sigma(\{X_t : t \in T\})$:
\[\sigma(\{X_t : t \in T\}) = \sigma(\{\text{all cylinder sets}\})\]More precisely:
\[\sigma(\{X_t : t \in T\}) = \sigma\left(\left\{\bigcap_{i=1}^{n} X_{t_i}^{-1}(B_i) : n \in \mathbb{N}, t_1, \ldots, t_n \in T, B_i \in \mathcal{B}_n\right\}\right)\]
Doob-Dynkin Lemma
Having discussed the σ-algebra generated by a random variable in some detail, it is natural to ask what it means for another random variable to be measurable with respect to it.
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $X: \Omega \to S$ and $Y: \Omega \to T$ be random variables, where $S$ and $T$ are measurable spaces.
The following are equivalent:
- Y is $\sigma(X)$-measurable, i.e., Y is measurable with respect to $\sigma(X)$
- $Y = f(X)$ for some measurable function $f: S \to T$
In other words: Y is a measurable function of X if and only if Y is measurable with respect to the σ-algebra generated by X.
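A small finite sanity check of the lemma (my own illustration, not from the book): on a finite $\Omega$, $Y$ is $\sigma(X)$-measurable exactly when $Y$ is constant on each level set of $X$, and in that case the function $f$ with $Y = f(X)$ can be read off directly.

```python
# Finite illustration of the Doob-Dynkin lemma.
# Omega = {1,...,6}, X = parity indicator, Y = 3 * parity.
omega = [1, 2, 3, 4, 5, 6]
X = {w: 1 if w % 2 == 0 else 0 for w in omega}
Y = {w: 3 * X[w] for w in omega}      # Y depends on omega only through X

def is_sigma_X_measurable(X, Y, omega):
    """Y is sigma(X)-measurable iff Y is constant on every level set of X."""
    return all(
        len({Y[w] for w in omega if X[w] == x_val}) == 1
        for x_val in set(X.values())
    )

if is_sigma_X_measurable(X, Y, omega):
    f = {X[w]: Y[w] for w in omega}   # read off f such that Y = f(X)
    assert all(Y[w] == f[X[w]] for w in omega)
    print("Y = f(X) with f =", f)     # f = {0: 0, 1: 3}
```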
Stochastic Process
A stochastic process is a collection of random variables $\{X_t : t \in T\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ and indexed by a set $T$. The index set $T$ is typically interpreted as time and can be:
- Discrete: $T = \{0, 1, 2, \ldots\}$ or $T = \mathbb{Z}$
- Continuous: $T = [0, \infty)$ or $T = \mathbb{R}$
For each fixed $t \in T$, $X_t: \Omega \to \mathbb{R}$ (or $\mathbb{R}^n$) is a random variable. For each fixed $\omega \in \Omega$, the mapping $t \mapsto X_t(\omega)$ is called a sample path or trajectory of the process.
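A quick simulation (my own sketch using NumPy, not from the book) of a symmetric random walk makes the two views explicit: fixing $t$ and varying $\omega$ gives a random variable, while fixing $\omega$ and varying $t$ gives a sample path.

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric random walk: 5 sample points omega, discrete time T = {0, 1, ..., 100}.
n_paths, n_steps = 5, 100
steps = rng.choice([-1, 1], size=(n_paths, n_steps))
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

# Fixed omega (a row): t -> X_t(omega) is a sample path / trajectory.
print("trajectory of the first sample point:", X[0, :6], "...")

# Fixed t (a column): omega -> X_t(omega) is a random variable; across many
# sample points we could study its (empirical) distribution.
print("X_50 across the 5 sample points:", X[:, 50])
```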
Note that if we equip $T$ with the Borel σ-algebra on $[0,\infty)$ (denoted $\mathcal{B}_T$) and the map $X(t,\omega) = X_t(\omega)$ is jointly measurable with respect to $\mathcal{B}_T \otimes \mathcal{F}$ (an additional assumption, satisfied for so-called measurable processes), then the stochastic process is simply a random variable on $(T \times \Omega, \mathcal{B}_T \otimes \mathcal{F}, P')$.
In practice, we are not concerned with what $P'$, or for that matter $P$, actually is. While we are tempted to model any real-world phenomenon as a stochastic process, we should recognize that we often don’t explicitly know the underlying sample space $\Omega$. In practice, we work with the observed values of the process and their distributions, rather than the abstract probability space itself.
The Trajectory Space
As discussed above, modeling a real-world phenomenon as a stochastic process directly may not be feasible. The sample space is hard to define, especially when the range of the variable is uncountable, and making infinitely many observations is impractical. All we can assume is that we can make observations at various time steps: for any time index $t$ and its associated random variable, we can observe $X_t(\omega)$ for an unknown $\omega$.
By fixing a $\omega \in \Omega$, we can create a trajectory function $\phi_\omega: T \to \mathbb{R}$ defined by:
\[\phi_\omega(t) = X_t(\omega)\]Since we don’t know what the sample space is, we can take $(\mathbb{R}^N)^T$, the set of trajectories (functions from $T$ to $\mathbb{R}^N$), as a canonical sample space. The reason is that no matter what the original sample space is, we can transfer the modelling to the canonical space.
Also notice that the mapping $\omega \to \phi_\omega$ need not be one-to-one, as two different sample points may result in the same observations. For example, in a die throw, let the random variable be 1 when an even number is facing upwards and 0 when it is odd. One might ask: if the probability mass is not uniform and observing “2” is more likely than “4”, are we modelling this incorrectly?
The answer is no. What matters is not the original sample space $\Omega$, but the induced probability measure on the canonical space $(\mathbb{R}^N)^T$. We don’t get to “observe” whether it was a 2 or a 4; it’s as if someone else watches the experiment and then tells us whether the outcome was even or odd. All we can observe is the random variable $X(\omega)$! This is actually a good thing: we don’t have to truly understand nature’s generative process in order to approximately generate from it.
The next immediate question is: what is the σ-algebra on the canonical space?
Well, for any random variable $X$, as long as we know $\sigma(X)$ and the probability measure on it, no information is lost. So whatever σ-algebra we consider on the canonical space must at least contain the sets corresponding to $\sigma(X)$.
If we did know the sample space, then $\sigma(X)$ is simply generated by the generating set
\[G = \left\{\bigcap_{i=1}^{n} X_{t_i}^{-1}(B_i) : n \in \mathbb{N}, t_1, \ldots, t_n \in T, B_i \in \mathcal{B}_n\right\}\]where
\[X_{t_i}^{-1}(B_i) = \{ \omega : X_{t_i}(\omega) \in B_i \}\]In the trajectory space, the corresponding set is $\{ \phi : \phi(t_i) \in B_i \}$.
Hence
\[G' = \left\{\bigcap_{i=1}^{n} \{\phi : \phi(t_i) \in B_i\} : n \in \mathbb{N}, t_1, \ldots, t_n \in T, B_i \in \mathcal{B}_n\right\}\]where $\mathcal{B}_n$ denotes the Borel σ-algebra on $\mathbb{R}^N$ for the vector-valued process.
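Continuing the random-walk sketch from above (again my own illustration), a cylinder set constrains a trajectory $\phi$ only at finitely many times, so its probability can be estimated purely from finite-dimensional observations of the process:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate many trajectories phi of a symmetric random walk on T = {0, ..., 100}.
n_paths, n_steps = 100_000, 100
steps = rng.choice([-1, 1], size=(n_paths, n_steps))
X = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(steps, axis=1)], axis=1)

# Cylinder set C = {phi : phi(10) in [-2, 2]} intersected with {phi : phi(50) >= 0}:
# membership depends on the trajectory only at the finite set of times {10, 50}.
in_C = (np.abs(X[:, 10]) <= 2) & (X[:, 50] >= 0)
print("estimated probability of the cylinder set:", in_C.mean())
```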
Kolmogorov’s extension theorem
If we look at a finite collection of random variables $\{X_{t_1}, \ldots, X_{t_k}\}$, $k \in \mathbb{N}$, the pushforward probability measure is of the form:
\[\mu_{t_1, \ldots, t_k}(B_1 \times \cdots \times B_k) = P(\omega : X_{t_1}(\omega) \in B_1, \ldots, X_{t_k}(\omega) \in B_k)\]where $B_i \subseteq \mathbb{R}^n$ are Borel sets, and this defines a probability measure on $(\mathbb{R}^n)^k$ with the Borel σ-algebra.
It’s easy to define a probability measure for a finite collection of random variables: specify it as a function on rectangles and extend it to $\mathcal{B}((\mathbb{R}^n)^k)$ using the Carathéodory extension theorem. So we would like a sort of converse: the existence of a stochastic process on some abstract space, given a family of finite-dimensional measures.
In other words, can we “glue together” a family of finite-dimensional distributions to construct an actual stochastic process?
The answer is yes: under some conditions we can, and that’s what the theorem says.
Consistency Conditions
For the family $\{\mu_{t_1,\ldots,t_k}\}$ to come from a stochastic process, it must satisfy two natural consistency conditions:
Consistency Condition 1 (Symmetry/Permutation Invariance):
For any permutation $\sigma$ of $\{1, 2, \ldots, k\}$:
\[\mu_{t_{\sigma(1)},...,t_{\sigma(k)}}(B_{\sigma(1)} \times \cdots \times B_{\sigma(k)}) = \mu_{t_1,...,t_k}(B_1 \times \cdots \times B_k)\]The order in which we list the random variables shouldn’t matter. If we look at $(X_{t_1}, X_{t_2}, X_{t_3})$ versus $(X_{t_2}, X_{t_1}, X_{t_3})$, we’re looking at the same random vector, just with coordinates permuted.
Consistency Condition 2 (Marginalization/Projection):
For any $t_1, \ldots, t_k, t_{k+1}, \ldots, t_{k+m} \in T$:
\[\mu_{t_1,...,t_k}(B_1 \times \cdots \times B_k) = \mu_{t_1,...,t_k,t_{k+1},...,t_{k+m}}(B_1 \times \cdots \times B_k \times \underbrace{\mathbb{R}^n \times \cdots \times \mathbb{R}^n}_{m \text{ times}})\]If we know the joint distribution of $(X_{t_1}, \ldots, X_{t_k}, X_{t_{k+1}})$, we should be able to recover the distribution of $(X_{t_1}, \ldots, X_{t_k})$ by “ignoring” or “marginalizing out” $X_{t_{k+1}}$. This is just the standard notion of marginal distributions.
Kolmogorov’s Extension Theorem
Theorem: Let $T$ be an arbitrary index set and let $\mathbb{R}^n$ be equipped with its Borel σ-algebra. For each finite subset $\{t_1, \ldots, t_k\} \subseteq T$, suppose we are given a probability measure $\mu_{t_1, \ldots, t_k}$ on $(\mathbb{R}^n)^k$ such that the consistency conditions (1) and (2) above hold.
Then there exists a probability space $(\Omega, \mathcal{F}, P)$ and a stochastic process $\{X_t\}_{t \in T}$ with $X_t : \Omega \to \mathbb{R}^n$ such that:
\[P(X_{t_1} \in B_1, \ldots, X_{t_k} \in B_k) = \mu_{t_1, \ldots, t_k}(B_1 \times \cdots \times B_k)\]for all finite subsets $\{t_1, \ldots, t_k\} \subseteq T$ and all Borel sets $B_i \subseteq \mathbb{R}^n$.
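As a concrete illustration (my own sketch, not code from the book): the finite-dimensional distributions used to build Brownian motion are zero-mean Gaussians with covariance $\min(t_i, t_j)$, and the marginalization condition can be checked numerically by leaving one coordinate unconstrained.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Candidate family of finite-dimensional distributions (Brownian-motion-like):
# mu_{t_1,...,t_k} = N(0, C) with C[i, j] = min(t_i, t_j).
def fdd(times):
    times = np.asarray(times, dtype=float)
    cov = np.minimum.outer(times, times)
    return multivariate_normal(mean=np.zeros(len(times)), cov=cov)

def box_probability(samples, boxes):
    """Monte Carlo estimate of mu(B_1 x ... x B_j), boxes given as (lo, hi) pairs."""
    inside = np.ones(len(samples), dtype=bool)
    for i, (lo, hi) in enumerate(boxes):
        inside &= (samples[:, i] >= lo) & (samples[:, i] <= hi)
    return inside.mean()

# Consistency condition 2: mu_{t1,t2}(B1 x B2) should equal mu_{t1,t2,t3}(B1 x B2 x R).
B = [(-1.0, 1.0), (-0.5, 2.0)]
samples_2 = fdd([0.5, 1.0]).rvs(size=200_000, random_state=0)
samples_3 = fdd([0.5, 1.0, 2.0]).rvs(size=200_000, random_state=1)
print(box_probability(samples_2, B), "~=", box_probability(samples_3, B))
```

Kolmogorov’s theorem then guarantees that a process with exactly these finite-dimensional distributions exists on some probability space; this is essentially how Brownian motion is constructed.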
Filtrations and Martingales
Let’s take a detour and first discuss a few key concepts from measure theory.
Absolute Continuity: Let $(\Omega, \mathcal{F})$ be a measurable space. Let measures $P$ and $Q$ be defined on the space. If $P(A) = 0 \implies Q(A) = 0$, then $Q$ is said to be absolutely continuous with respect to $P$, and we write $Q \ll P$.
An equivalent statement (for finite measures $Q$) that looks at the limiting behavior:
For all $\epsilon > 0$, there exists $\delta > 0$ such that $P(A) < \delta \implies Q(A) < \epsilon$.
This tells us that if one measure is the integral of another, then it is absolutely continuous with respect to it. That is, if $Q$ can be written as an integral with respect to $P$, i.e., there exists a measurable function $f : \Omega \to [0, \infty)$ such that
\[Q(A) = \int_A f \, dP\]for all $A \in \mathcal{F}$, then $Q \ll P$ (i.e., $Q$ is absolutely continuous with respect to $P$).
The converse of this statement, which is not immediately obvious, is known as the Radon-Nikodym theorem.
Radon-Nikodym Theorem: Let $(\Omega, \mathcal{F})$ be a measurable space and let $P$ and $Q$ be σ-finite measures on $(\Omega, \mathcal{F})$. If $Q \ll P$, then there exists a measurable function $f : \Omega \to [0, \infty)$ such that
\[Q(A) = \int_A f \, dP\]for all $A \in \mathcal{F}$. The function $f$ is called the Radon-Nikodym derivative of $Q$ with respect to $P$, denoted $f = \frac{dQ}{dP}$.
Intuition: To see this, we can take any finite partition of $\Omega$ and define a simple function as a sum where the constants are ratios of the measures:
\[f_n = \sum_{i=1}^{n} \frac{Q(A_i)}{P(A_i)} \mathbf{1}_{A_i}\]where $\{A_i\}$ is a partition of $\Omega$ (on cells with $P(A_i) = 0$ we may set the ratio to 0, since $Q \ll P$ forces $Q(A_i) = 0$ there). It is easy to see that the integral equation holds over any set that is a union of partition cells. As we refine the partition, these simple functions converge to the Radon-Nikodym derivative $\frac{dQ}{dP}$.
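A quick numerical version of this intuition (my own sketch, with an arbitrary choice of measures): take $P = N(0,1)$ and $Q = N(1,1)$ on $\mathbb{R}$, partition an interval into cells $A_i$, and compare the ratios $Q(A_i)/P(A_i)$ with the true density ratio $\frac{dQ}{dP}(x) = e^{x - 1/2}$.

```python
import numpy as np
from scipy.stats import norm

# P = N(0, 1) and Q = N(1, 1); then Q << P with dQ/dP(x) = exp(x - 1/2).
P, Q = norm(0, 1), norm(1, 1)

edges = np.linspace(-4, 4, 81)            # partition [-4, 4] into 80 cells A_i
P_mass = np.diff(P.cdf(edges))            # P(A_i)
Q_mass = np.diff(Q.cdf(edges))            # Q(A_i)
f_n = Q_mass / P_mass                     # the simple-function approximation

midpoints = (edges[:-1] + edges[1:]) / 2
true_dQdP = np.exp(midpoints - 0.5)

# The cell-wise ratios track the true derivative, and the error shrinks
# as the partition is refined.
print("max abs error:", np.max(np.abs(f_n - true_dQdP)))
```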
Mutual Singularity: If two measures are concentrated on different disjoint subsets, then they are mutually singular. We write $P \perp Q$ if there exist disjoint sets $A, B \in \mathcal{F}$ with $A \cup B = \Omega$ such that $P(A) = Q(B) = 0$.
Lebesgue Decomposition Theorem: Any two σ-finite measures $P$ and $Q$ on the same measurable space $(\Omega, \mathcal{F})$ can be uniquely decomposed as:
\[Q = Q_{ac} + Q_s\]where $Q_{ac} \ll P$ (the absolutely continuous part) and $Q_s \perp P$ (the singular part).
Continuous Random Variable: A random variable $X : \Omega \to \mathbb{R}^n$ is called continuous if its distribution has a probability density function (PDF). That is, there exists a non-negative measurable function $f_X : \mathbb{R}^n \to [0, \infty)$ such that
\[P(X \in A) = \int_A f_X(x) \, dx\]for all Borel sets $A \subseteq \mathbb{R}^n$. Equivalently, the distribution of $X$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^n$, and $f_X = \frac{dP_X}{dm_n}$ is the Radon-Nikodym derivative.
Conditional Expectation Relative to a σ-field
Expected Value for $\mathbb{R}^n$-valued Random Variables: When $X : \Omega \to \mathbb{R}^n$ is a random variable with $X = (X_1, \ldots, X_n)$, the expected value is defined component-wise:
\[E[X] = (E[X_1], \ldots, E[X_n]) \in \mathbb{R}^n\]where each component $E[X_i] = \int_{\Omega} X_i \, dP$ is the usual expected value of a real-valued random variable.
Equivalently, we can write:
\[E[X] = \int_{\Omega} X \, dP\]where the integral is understood in the Bochner sense (integrating vector-valued functions). The random variable $X$ is integrable if $E[|X|] = \int_{\Omega} |X| \, dP < \infty$, where $|\cdot|$ is the Euclidean norm on $\mathbb{R}^n$.
For a probability space $(\Omega, \mathcal{F}, P)$, we can select any sub-σ-field $\mathcal{G} \subseteq \mathcal{F}$ (a sub-collection that is itself a σ-field).
Given an integrable random variable $X : \Omega \to \mathbb{R}^n$ (i.e., $E[|X|] < \infty$), the conditional expectation of $X$ given $\mathcal{G}$, denoted $E[X \mid \mathcal{G}]$, is a $\mathcal{G}$-measurable random variable taking values in $\mathbb{R}^n$ that satisfies:
\[\int_A E[X \mid \mathcal{G}] \, dP = \int_A X \, dP\]for all $A \in \mathcal{G}$. Both sides are vectors in $\mathbb{R}^n$, and the equality holds component-wise.
For each component $X_i$, define a signed measure on $(\Omega, \mathcal{G})$ by:
\[\nu_i(A) = \int_A X_i \, dP \quad \text{for all } A \in \mathcal{G}\]Since $E[|X_i|] < \infty$, we can decompose $X_i = X_i^+ - X_i^-$ where $X_i^+, X_i^- \geq 0$. Then $\nu_i = \nu_i^+ - \nu_i^-$ where both $\nu_i^+$ and $\nu_i^-$ are finite measures that are absolutely continuous with respect to the restriction of $P$ to $\mathcal{G}$.
By the Radon-Nikodym theorem, there exist $\mathcal{G}$-measurable functions $f_i^+$ and $f_i^-$ such that:
\[\nu_i^+(A) = \int_A f_i^+ \, dP \quad \text{and} \quad \nu_i^-(A) = \int_A f_i^- \, dP\]Setting $f_i = f_i^+ - f_i^-$, we have $\nu_i(A) = \int_A f_i \, dP$ for all $A \in \mathcal{G}$.
The conditional expectation is then $E[X \mid \mathcal{G}] = (f_1, \ldots, f_n)$, which is $\mathcal{G}$-measurable and satisfies the defining property. Uniqueness follows from the fact that if two $\mathcal{G}$-measurable functions agree on all sets in $\mathcal{G}$, they are equal almost surely.
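When $\mathcal{G}$ is generated by a finite partition of $\Omega$, this construction reduces to averaging $X$ over each partition cell. Here is a small sketch of that special case (my own illustration, assuming a finite sample space with uniform weights):

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite special case: Omega = {0, ..., n-1} with uniform P, and G generated
# by a partition of Omega into 4 cells. On each cell, E[X | G] is the
# P-weighted average of X over that cell.
n = 10_000
P = np.full(n, 1.0 / n)
X = rng.normal(size=n)                      # a real-valued random variable on Omega
cells = np.arange(n) % 4                    # which partition cell each omega lies in

E_X_given_G = np.empty(n)
for c in range(4):
    idx = cells == c
    E_X_given_G[idx] = np.sum(X[idx] * P[idx]) / np.sum(P[idx])

# Defining property: for every A in G (a union of cells),
# the integral of E[X | G] over A equals the integral of X over A.
A = (cells == 0) | (cells == 2)
print(np.sum(E_X_given_G[A] * P[A]), "~=", np.sum(X[A] * P[A]))
```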
[Pending]