Data Science Applications to Astronomy
Week 7: Project Questions +
(Model Building III: Cross Validation)
Logistics
Remember: the MSEEQ window is closing soon.
Project Questions
For the project, how do we link the git repo containing our code to ROAR?
I created a starter project repository.
What level of interactivity is required for the final dashboard?
Minimum:
Allow users to choose which data are analyzed
Suggested:
Allow users to adjust initial guesses for iterative fitting (if relevant)
Optional (as makes sense for your project):
Allow users to change parameter(s)
Allow users to adjust visualization(s)
How should we physically set up our dashboard so that it's easiest for us to work on it as a group?
Strongly recommend:
Organize code into small functions.
When initially writing functions, it can be convenient to write them in Pluto.
Once you have a working version, move the functions into separate file(s).
Use a Pluto notebook to load/call the functions stored in files and display the dashboard.
Optionally:
I find it's best to organize my file(s) in a src directory and to create a Project.toml, so my code forms an unregistered Julia "package".
The `@ingredients` macro is helpful for having Pluto call code in an unregistered package that I'm developing. (See the example in Lab 2 or the project template repo.)
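As a concrete sketch of this layout, here is what a small module in a `src` file might look like; the file name, module name, and function are all hypothetical placeholders, not code from the course materials:

```julia
# --- src/DashboardCode.jl (illustrative file name) ---
module DashboardCode

export make_plot_data

"Compute the quantities the dashboard will display (placeholder analysis)."
function make_plot_data(x)
    return x .^ 2
end

end # module

# --- In a Pluto notebook cell, one option is a plain include: ---
# include("src/DashboardCode.jl"); using .DashboardCode
# (or use the @ingredients macro so edits to the file are picked up while developing)
```

Keeping the notebook cells thin (just calls into the module) makes it much easier for group members to work on different functions without merge conflicts.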
Overview of Workflow for Assessing Models
Validate methods using simulated data
Validate computation
Compare Likelihood to expected distribution
Sensitivity to individual data points
Sensitivity to assumptions (e.g., prior distributions)
Check Predictive distribution
Cross validation
Numerical experiments
Simulated dataset & best-fit model
$$y_{true} = \sum_{i=0}^{1} a_i x^i$$
Order of polynomial to generate data:
a₀:
Number of observations:
Measurement uncertainty:
Now let's repeat that analysis many times and compare the MLE estimates for each parameter with the true values.
Number of simulated datasets:
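The simulate-and-refit experiment above can be sketched as follows. The helper names, true parameter values, and grid of x values are illustrative stand-ins for the notebook's `generate_data` and fitting code, assuming a first-order polynomial with Gaussian measurement noise:

```julia
using Statistics, Random

# Simulate y = a₀ + a₁ x + Gaussian noise on a grid of x values.
function generate_data(a::Vector{Float64}, n_obs::Int, σ::Float64;
                       rng=Random.default_rng())
    x = collect(range(0.0, 1.0, length=n_obs))
    y_true = [sum(a[i+1] * xi^i for i in 0:length(a)-1) for xi in x]
    return x, y_true .+ σ .* randn(rng, n_obs)
end

# Linear least squares = MLE under iid Gaussian noise.
function fit_mle(x, y)
    X = hcat(ones(length(x)), x)   # design matrix for a 1st-order polynomial
    return X \ y
end

# Repeat the simulate-and-fit experiment many times.
a_true = [1.0, 2.0]                # a₀, a₁ (illustrative values)
σ = 0.1
n_datasets = 500
θ_fits = [fit_mle(generate_data(a_true, 50, σ)...) for _ in 1:n_datasets]

# Compare the distribution of MLEs to the true values.
a0_mean = mean(first.(θ_fits))
a1_mean = mean(last.(θ_fits))
```

If the method and its implementation are sound, the histogram of MLEs should be centered on the true parameters, with a scatter consistent with the measurement uncertainty.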
Step 7: Perform Cross validation
Previously, we fit a model based on all of our observations and then asked if the resulting model made good "postdictions". This is a relatively weak test.
A stronger test would be to see if the model can make good predictions for new data.
Often, acquiring new data is very time consuming/expensive.
This motivates a more powerful method for assessing models: cross validation (CV).
Cross validation of a single model
Divide data into two subsets (e.g., 75% "training", 25% "validation")
Fit data ("train model") using only the training set
Use resulting parameters to make predictions for validation set
How well did the model do? (evaluate "loss function")
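The four steps above might look like this for a simple linear model; the split fractions, loss function, and variable names are illustrative choices, not the notebook's exact code:

```julia
using Random, Statistics

rng = Random.default_rng()
x = collect(range(0, 1, length=100))
y = 1.0 .+ 2.0 .* x .+ 0.1 .* randn(rng, 100)   # toy dataset

# 1. Divide data into training (75%) and validation (25%) subsets.
idx = randperm(rng, length(x))
n_train = round(Int, 0.75 * length(x))
train, valid = idx[1:n_train], idx[n_train+1:end]

# 2. Fit ("train") using only the training set.
design(x, order) = hcat((x .^ i for i in 0:order)...)
θ = design(x[train], 1) \ y[train]

# 3. Use the resulting parameters to predict the validation set.
y_pred = design(x[valid], 1) * θ

# 4. Evaluate a loss function (here, mean squared error).
mse = mean(abs2, y_pred .- y[valid])
```

A good model should give a validation loss comparable to the measurement variance; a much larger validation loss is a warning sign.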
Cross validation when considering multiple models
Divide data into three subsets (e.g., 60% "training", 20% "validation", 20% "test")
For each model you want to consider:
Fit/Train using only data from training set
Validate using only data from validation set
Pick which model you'll use for decisions/publication:
Evaluate predictions for test set.
Report results on test data.
Avoid the temptation to make further "improvements" at that point.
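A minimal sketch of this train/validation/test procedure for choosing a polynomial order follows; the dataset, candidate orders, and split fractions are all made up for illustration:

```julia
using Random, Statistics

rng = Random.MersenneTwister(42)
n = 200
x = collect(range(0, 1, length=n))
y = 1.0 .+ 2.0 .* x .- 3.0 .* x.^2 .+ 0.1 .* randn(rng, n)  # toy quadratic data

# 60% / 20% / 20% split into training, validation, and test sets.
idx = randperm(rng, n)
n_tr, n_va = round(Int, 0.6n), round(Int, 0.2n)
tr = idx[1:n_tr]; va = idx[n_tr+1:n_tr+n_va]; te = idx[n_tr+n_va+1:end]

design(x, order) = hcat((x .^ i for i in 0:order)...)
loss(θ, order, s) = mean(abs2, design(x[s], order) * θ .- y[s])

# Train each candidate model, then pick the one with the lowest validation loss.
orders = 0:5
θs = [design(x[tr], p) \ y[tr] for p in orders]
val_losses = [loss(θs[i], p, va) for (i, p) in enumerate(orders)]
best = argmin(val_losses)

# Report performance on the held-out test set (and then stop tuning!).
test_loss = loss(θs[best], orders[best], te)
```

Because the test set was never used for fitting or for model selection, `test_loss` is an honest estimate of how the chosen model will perform on new data.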
Order of polynomial to use for cross validation:
Any Questions?
Regularization
Project Questions
Helper Code
generate_data (generic function with 1 method)
generate_θ_fit_distribution (generic function with 1 method)
plot_parameter_histograms (generic function with 1 method)
Built with Julia 1.11.5 and
LaTeXStrings 1.4.0
LinearAlgebra 1.11.0
MLUtils 0.4.5
PDMats 0.11.32
Plots 1.40.9
PlutoTeachingTools 0.3.1
PlutoUI 0.7.61
Statistics 1.11.1
To run this tutorial locally, download this file and open it with Pluto.jl.