Data Science Applications to Astronomy

Week 3: Model Building I

Probability

Terminology

We'll use words that sound familiar, but have technical meanings distinct from their everyday usage. E.g.,

  • Model

  • Likelihood

Model

Scientific Model vs Statistical Model

  • What are the differences?

Scientific Model

  • Describes what's happening, irrespective of our measurements

  • Can be extremely well tested or highly uncertain

Statistical Model

  • Describes process for generating data

  • Inevitably includes assumptions

  • Recognizes that uncertainty is inevitable

Think of example sources of uncertainty

  • pause

Example Scientific Models

  • Newton's equation of motion & gravitation

  • Keplerian motion of planets

  • Salpeter initial mass function for stars

  • ΛCDM model for growth of large-scale structure

Example Statistical Models

  • Gaussian or Normal distribution for measurement uncertainties

  • Poisson distribution for number of photons hitting detector given expected photon rate

  • Binomial distribution for number of "successes" given number of trials and probability of success

All models are wrong, but some are useful. – George Box

What is a (good) model useful for?

  • Making predictions

  • Incorporating new data to refine one's knowledge

  • Planning future science

Why are all models wrong?

  • pause

Probability

  • Multiple definitions and use cases

  • Surprisingly deep philosophical question

  • We'll adopt a Bayesian interpretation of probability

  • We'll start with the Objective Bayesian interpretation of probability

  • Later we'll discuss the Subjective Bayesian interpretation

Objective Bayesian

  • Formalization of quantitative logic

  • Results of statistical inference depend only on the model and the data

Probability Function

  • Assigns a weight to each possible event/outcome of an experiment or observation.

Probability Mass Function

  • For discrete events/outcomes

  • Examples: (pause to ask class)

Examples of Probability Mass Functions

  • Flip coin: heads (H) or tails (T). P(H) = P(T) = ½

  • Roll a fair n-sided die: \(P(v) = \frac{1}{n}\) for each face \(v\)

  • Inspect random locations on a globe, asking land or water:

$$p(\mathrm{Water}) = w$$

$$p(\mathrm{Land}) = 1-w$$
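The globe example can be simulated directly; a minimal sketch in Python (the water fraction `w = 0.7` and the sample size are arbitrary illustrative choices, not data):

```python
import random

rng = random.Random(42)      # fixed seed for reproducibility
w = 0.7                      # hypothetical true water fraction (illustrative)
n = 10_000                   # number of inspected locations

# Each inspection returns Water with probability w, Land with probability 1 - w
n_water = sum(1 for _ in range(n) if rng.random() < w)
print(n_water / n)           # empirical Water fraction, close to w for large n
```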

Axioms of Probability

  • Non-negativity: \(p(E)\ge0\)

  • Sum to unity: \(\sum_i p(E_i) = 1\) for complete list of possible events

  • Sum of probabilities of mutually exclusive events: \(p(\bigcup_i E_i) = \sum_i p(E_i)\)
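These axioms are easy to verify numerically for a simple probability mass function; a sketch for a fair six-sided die:

```python
from fractions import Fraction

faces = range(1, 7)                            # outcomes of a fair six-sided die
pmf = {face: Fraction(1, 6) for face in faces}

assert all(p >= 0 for p in pmf.values())       # non-negativity
assert sum(pmf.values()) == 1                  # probabilities sum to unity

# Additivity for mutually exclusive events: P(even) = P(2) + P(4) + P(6)
p_even = sum(pmf[f] for f in (2, 4, 6))
print(p_even)   # 1/2
```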

Probability Density Function (PDF)

  • Allows for continuous events/outcomes

  • Examples: (ask class)

Examples of Probability Density Functions

  • Fraction of a planet covered with water: \(p(w)\)

  • Mass of a star: \(p(M_\star)\)

  • Luminosity of a star: \(p(L_\star)\)

  • Value of future measurement of flux from a star: \(p(f_{\star,obs})\)

Probability function is about the state of knowledge.

It is not an intrinsic property of the system.

Likelihood function (\(\mathcal{L}(\theta)\))

  • Probability distribution for the outcome of an experiment/observation that was made in the past, ignoring any current knowledge about the actual outcome.

  • A likelihood is a probability density function.

  • Not all PDFs are likelihoods.

  • Likelihood function assumes a statistical model (commonly written \(\mathcal{M}\)).

  • Models often have parameters (commonly written as \(\theta\))

  • Likelihood function often written: \(\mathcal{L}(\theta)\) or \(\mathcal{L_M}(\theta)\)

Scientists often perform inference based on \(\mathcal{L}(\theta)\)

  • E.g., Maximum Likelihood Estimate

$$\hat{\theta}_{\mathrm{MLE}} = \mathrm{argmax}_\theta \, \mathcal{L}(\theta)$$

  • Confidence Interval

$$[\theta_l,\theta_u] = \{ \theta \; \mathrm{s.t.} \; \mathcal{L}(\theta) \ge \mathcal{L}(\hat{\theta}_{\mathrm{MLE}}) - \Delta\mathcal{L}_{\mathrm{thresh}} \}$$
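As an illustration, here is a grid-based MLE and likelihood-threshold interval for the globe model. The data (6 Water results in 9 tosses) are hypothetical, and the threshold of 0.5 in log-likelihood corresponds to a ~68% interval for a one-parameter, approximately Gaussian likelihood:

```python
import math

n, k = 9, 6   # hypothetical data: 6 "Water" results in 9 globe tosses

def log_likelihood(w):
    # Binomial log-likelihood, dropping the w-independent binomial coefficient
    return k * math.log(w) + (n - k) * math.log(1 - w)

grid = [i / 1000 for i in range(1, 1000)]      # w values in (0, 1)
w_mle = max(grid, key=log_likelihood)          # maximum likelihood estimate

# Approximate interval: all w whose log-likelihood is within 0.5 of the maximum
thresh = log_likelihood(w_mle) - 0.5
inside = [w for w in grid if log_likelihood(w) >= thresh]
print(w_mle, min(inside), max(inside))
```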

Tip
  • The MLE is a useful tool for your toolbox.

  • For some models, there are very efficient methods for computing \(\hat{\theta}_{\mathrm{MLE}}\).

  • However, it is just a point estimate.

  • Be careful when there are many parameters, few observations, or large uncertainties.

  • We'll use the MLE as a shortcut or starting point for more rigorous analysis.

Tip

In an ideal world, we'd use a Bayesian approach to uncertainty quantification. But confidence intervals are used commonly enough that we at least need to understand what they are (and what they aren't).

Conditional Probability

  • Consider two events: \(A\) & \(B\)

  • Let each event have its own probability function: \(p(A)\) & \(p(B)\)

  • Joint probability of both events: \(p(A,B)\)

  • Conditional probability of \(A\) given \(B\): \(p(A|B)\)

  • Axiom of Conditional Probability: \(p(A,B) = p(A) \, p(B|A)\)

  • Alternatively, conditioning the other way: \(p(A,B) = p(B) \, p(A|B)\)

– Image credit: Wikipedia, CC BY-SA 3.0
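A small numerical check of the axiom of conditional probability, using a toy joint distribution over two binary events (the numbers are arbitrary illustrative values):

```python
# Toy joint distribution over two binary events A and B (numbers are arbitrary)
p_joint = {("A", "B"): 0.12, ("A", "not B"): 0.28,
           ("not A", "B"): 0.18, ("not A", "not B"): 0.42}

p_A = p_joint[("A", "B")] + p_joint[("A", "not B")]   # marginal p(A)
p_B_given_A = p_joint[("A", "B")] / p_A               # conditional p(B|A)

# Axiom of conditional probability: p(A, B) = p(A) p(B|A)
assert abs(p_joint[("A", "B")] - p_A * p_B_given_A) < 1e-12
print(p_A, p_B_given_A)
```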

Bayes Theorem

Start with Law of Conditional Probability:

$$p(A,B) = p(A) p(B|A) = p(B) p(A|B)$$

Or we might explicitly write out the assumption of using model \(M\):

$$p(A,B|M) = p(A|M) p(B|A,M) = p(B|M) p(A|B,M)$$

Can compute the marginal probabilities using law of total (conditional) probability:

  • Discrete case:

$$p(A) = \sum_{i} p(B_i) p(A|B_i)$$

$$p(B) = \sum_{i} p(A_i) p(B|A_i)$$

  • Continuous case:

$$p(A ) = \int dB \, p(B) p(A|B)$$

$$p(B ) = \int dA \, p(A) p(B|A)$$

To derive Bayes' Theorem, substitute:

  • A → \(\theta\): model parameters

  • B → \(D\): observed data

  • Prior PDF for model parameters \(p(\theta|M)\) for \(p(A)\)

  • Likelihood for observed data given model \(\mathcal{L} = p(D|\theta, M)\) for \(p(B|A)\)

into \(p(\theta, D \,|\, M)\), the joint probability for both model parameters and observed data given the model \(M\).

$$p(\theta, D \,|\, M) = p(\theta | M) \, p(D | \theta, M ) = p(D | M ) \, p( \theta| M, D)$$

Solve for \(p( \theta| M, D)\):

$$p( \theta| M, D) = \frac{p(\theta | M) p(D | \theta, M )}{p(D | M )}$$

Interpretation of Bayes Theorem:

$$\mathrm{(Posterior)} = \mathrm{(Prior)} \times \mathrm{(Likelihood)} / \mathrm{(Evidence)}$$

Evidence term is often hard to calculate

  • Evidence is sometimes referred to as the "fully marginalized likelihood"

Discrete case:

$$p(D | M ) = \sum_{i} p(\theta_i|M) p(D|\theta_i,M)$$

Continuous case:

$$p(D | M ) = \int d\theta \, p(\theta|M) p(D|\theta,M)$$
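For a one-dimensional parameter, the evidence integral can be approximated on a grid; a sketch for the globe model with a uniform prior on \(w\) (the data, 6 Water in 9 tosses, are again hypothetical):

```python
import math

n, k = 9, 6                                   # hypothetical globe-tossing data

def likelihood(w):
    return math.comb(n, k) * w**k * (1 - w)**(n - k)

# Uniform prior p(w|M) = 1 on [0, 1], so the evidence is just the integral of
# the likelihood; approximate it with a Riemann sum on a fine grid
m = 1000
evidence = sum(likelihood(i / m) for i in range(m + 1)) / m
print(evidence)   # analytic answer for this model: 1 / (n + 1)
```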

Bayesian Inference

  • Bayesian inference about model parameters is based on the posterior probability distribution.

  • Ideally, one uses the full posterior distribution.

  • Often, it can be helpful to summarize the posterior:

    • Approximate posterior distribution with a Normal Distribution

    • Maximum a posteriori estimator (MAP): \(\hat{\theta}_{\mathrm{MAP}} = \mathrm{argmax}_\theta \, p(\theta) \, \mathcal{L}(\theta)\)

    • Fisher information matrix: curvature of the log posterior, \(\log p(\theta|D,M)\), evaluated at the MAP

$$I_{i,j} = - \left. \frac{\partial^2 \log p(\theta|D,M)}{\partial \theta_i \, \partial \theta_j} \right|_{\hat{\theta}_{\mathrm{MAP}}}$$

  • In many cases, if you have a large, high-quality dataset and a model with few parameters, then the posterior distribution may be well approximated by a Gaussian distribution, and \(\hat{\theta}_{\mathrm{MAP}} \approx \hat{\theta}_{\mathrm{MLE}}\).

  • However, there are also plenty of times when:

    • MAP & MLE differ

    • Both MAP & MLE throw away important information about uncertainties.

    • Posterior might not be well approximated by a Gaussian (e.g., skewed or multimodal).

  • We want to build a toolbox that can work in both regimes.
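A sketch of the Gaussian (Laplace) approximation for the globe model with a flat prior, where the MAP equals the MLE and the width comes from the observed information (minus the second derivative of the log posterior) at the MAP; the data are hypothetical:

```python
import math

n, k = 9, 6                      # hypothetical data; flat prior => MAP = MLE

def log_post(w):
    # Log posterior up to an additive constant (flat prior on (0, 1))
    return k * math.log(w) + (n - k) * math.log(1 - w)

w_map = k / n                    # analytic argmax for a flat prior

# Observed information: minus the second derivative of log_post at the MAP,
# estimated here with a central finite difference
h = 1e-5
info = -(log_post(w_map + h) - 2 * log_post(w_map) + log_post(w_map - h)) / h**2
sigma = 1 / math.sqrt(info)      # width of the Gaussian approximation
print(w_map, sigma)
```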

Simplified Bayesian Model Building

Steps

  1. Develop a statistical model describing how your data are generated. This dictates the likelihood function and model parameters.

  2. Choose prior distributions

  3. Perform computations to estimate (or sample from) the posterior distribution.

  4. Make decisions based on posterior distribution.
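The four steps above can be sketched end-to-end with a simple grid approximation (binomial statistical model, uniform prior, hypothetical data):

```python
import math

# Step 1: statistical model -- binomial likelihood for "Water" counts
n, k = 9, 6                                   # hypothetical data
def likelihood(w):
    return math.comb(n, k) * w**k * (1 - w)**(n - k)

# Step 2: prior -- uniform on [0, 1], evaluated on a grid of midpoints
m = 1000
grid = [(i + 0.5) / m for i in range(m)]
prior = [1.0] * m

# Step 3: posterior on the grid, normalized to sum to 1
unnorm = [p * likelihood(w) for p, w in zip(prior, grid)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# Step 4: summarize for decisions, e.g. the posterior mean of w
post_mean = sum(w * p for w, p in zip(grid, posterior))
print(post_mean)   # analytic answer for this model: (k + 1) / (n + 2)
```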

See Example Applications in Labs

Setup & Helper Functions

Axioms of Conditional Probability

$$p(A|B)\ge0, \; \; \mathrm{if} \, p(B)>0$$


$$\sum_i p(A_i|B) = 1$$

$$p(\bigcup_i A_i | B) = \sum_i p(A_i | B)$$

Built with Julia 1.11.5 and

PlutoTeachingTools 0.3.1
PlutoUI 0.7.60

To run this tutorial locally, download this file and open it with Pluto.jl.