Data Science Applications to Astronomy
Week 9: Model Building IV:
Classification
Announcements
Penn State AI Week
Events: April 14-17, 2025
Student poster submission deadline: Monday, March 31, 2025
Classification
Why would an astronomer want to classify things?
Binary classification
Input data: \(x_i\)
Label for data: \(Y_i\) (0 or 1)
Predicted category: \(\hat{Y}_i\)
Object id: \(i = 1 \ldots N_{\mathrm{obj}}\)
begin
	regen_data   # Pluto UI element; re-runs this cell to regenerate the data
	n = 100      # number of blue points
	m = 10       # number of red points
	df_blue = DataFrame(repeat([ 1  1], n) .+ randn(n, 2), [:x, :y])
	df_red  = DataFrame(repeat([-1 -1], m) .+ randn(m, 2), [:x, :y])
	df_blue.label = ones(Int8, n)   # blue → label 1
	df_red.label  = zeros(Int8, m)  # red  → label 0
	df = vcat(df_blue, df_red)      # combined dataset
end;
linear_classifer_boundary(x) = β0 + β1*x   # decision boundary; β0, β1 set by sliders in the notebook
| | True Blue | True Red |
|---|---|---|
| Predicted Blue | 84 | 16 |
| Predicted Red | 0 | 10 |
What is suboptimal about the following loss function?
$$\mathrm{loss}_{\mathrm{class}}(\theta) = \frac{1}{N_{\mathrm{targ}}} \sum_{i=1}^{N_{\mathrm{targ}}} \left[1-\delta(Y_i,\hat{Y}_i(\theta))\right] = (1-\mathrm{accuracy})$$
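To see the problem, here is a minimal sketch (with assumed toy data, not the notebook's): the \(1 - \mathrm{accuracy}\) loss is piecewise constant in the parameters, so its gradient is zero almost everywhere and gradient-based optimizers get no signal from it.

```julia
# Sketch with assumed toy data: the (1 - accuracy) loss is piecewise
# constant in the parameters, so its gradient is zero almost everywhere
# and gradient-based optimization gets no useful signal from it.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]      # 1-D features
Y = Int.(x .> 0)                           # labels from a known boundary
predict(β, xi) = Int(β * xi > 0)           # hard 0/1 prediction
loss01(β) = 1 - sum(predict.(β, x) .== Y) / length(Y)
# Nudging β does not change any prediction, so the loss is flat:
loss01(1.0) == loss01(1.001)               # true
```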
Relaxing the outputs
General: \(\hat{y}_i(\beta) = f(x_i)\)
Generalized Linear Model: \(\hat{y}_i(\beta) = f(\beta \cdot x_i)\)
Logistic Regression: \(\hat{y}_i(\beta) = f(\beta \cdot x_i)\), where \(f(z) = \frac{1}{1+\exp\left(-z\right)}\)
Logistic Regression Likelihood
$$L(\beta) = \prod_{i:\; Y_i=1} \hat{y}_i(\beta) \prod_{i:\; Y_i=0} (1-\hat{y}_i(\beta))$$
$$\mathrm{loss}(\beta) = -\sum_{i=1}^{N_{\mathrm{obj}}} \left[ Y_i \ln(\hat{y}_i(\beta)) + (1-Y_i) \ln(1-\hat{y}_i(\beta)) \right]$$
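The loss above is easy to evaluate directly; a minimal sketch with assumed 1-D data and hand-picked (not fitted) coefficients \(\beta = (\beta_0, \beta_1)\):

```julia
# Minimal sketch of the logistic-regression loss (negative log-likelihood)
# for assumed 1-D data; β = (β0, β1) includes an intercept.
logistic(z) = 1 / (1 + exp(-z))
predict_prob(β, xi) = logistic(β[1] + β[2] * xi)    # ŷ_i(β)
function logistic_loss(β, x, Y)
    ŷ = [predict_prob(β, xi) for xi in x]
    # the minus sign turns the log-likelihood into a loss to minimize
    return -sum(@. Y * log(ŷ) + (1 - Y) * log(1 - ŷ))
end
x = [-2.0, -1.0, 1.0, 2.0]
Y = [0, 0, 1, 1]
# A slope aligned with the labels fits better than a flipped one:
logistic_loss([0.0, 1.0], x, Y) < logistic_loss([0.0, -1.0], x, Y)  # true
```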
Common activation functions
logistic(x) = inv(1 + exp(-x))                          # maps ℝ → (0, 1)
relu(x) = x > zero(x) ? x : zero(x)                     # rectified linear unit
leaky_relu(x; α::Real = 1e-3) = x > zero(x) ? x : α*x   # small slope for x < 0
Logistic:
Set a threshold
E.g., \(f(x) > \frac{1}{2}\) for binary classification
Can change threshold based on purpose of the classifier
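A minimal sketch of thresholding (the helper names here are illustrative, not the notebook's):

```julia
# Sketch: mapping the logistic output to a class label with an adjustable
# threshold (names here are illustrative).
logistic(z) = 1 / (1 + exp(-z))
classify_prob(p; threshold = 0.5) = p ≥ threshold ? 1 : 0
p = logistic(0.8)                     # ≈ 0.69
classify_prob(p)                      # 1 with the default threshold of 1/2
classify_prob(p, threshold = 0.9)     # 0: a stricter threshold favors purity
```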
Evaluating Classifiers
Contingency Table / Confusion Matrix
E.g., if goal of classifier is to label stars with a certain type of planet:
| | Truth: Planet | Truth: No Planet |
|---|---|---|
| Detected Planet | True Positives | False Positives |
| Detected No Planet | False Negatives | True Negatives |
Abbreviations:
| | True Yes | True No |
|---|---|---|
| Predicted Yes | TP | FP |
| Predicted No | FN | TN |
Many terms to describe performance characteristics
Accuracy: (TP+TN)/(TP+TN+FP+FN)
Positive Predictive Value (Precision): TP/(TP+FP)
True Positive Rate (Recall, Sensitivity): TP/(TP+FN)
True Negative Rate (Specificity): TN/(TN+FP)
False discovery rate: FP/(TP+FP)
False omission rate: FN/(TN+FN)
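These definitions are easy to check numerically; a sketch with illustrative function names (Base already exports `precision` for floats, so positive predictive value is called `ppv` here):

```julia
# Sketch: the metrics above computed from confusion-matrix counts.
accuracy(TP, FP, FN, TN) = (TP + TN) / (TP + TN + FP + FN)
ppv(TP, FP)              = TP / (TP + FP)   # positive predictive value (precision)
recall(TP, FN)           = TP / (TP + FN)   # true positive rate (sensitivity)
specificity(TN, FP)      = TN / (TN + FP)   # true negative rate
fdr(TP, FP)              = FP / (TP + FP)   # false discovery rate
f_omission(TN, FN)       = FN / (TN + FN)   # false omission rate
# E.g., with counts TP = 84, FP = 16, FN = 0, TN = 10 (the blue/red table above):
accuracy(84, 16, 0, 10)   # ≈ 0.855
recall(84, 0)             # 1.0: no blue points were missed
ppv(84, 16)               # 0.84: some red points were called blue
```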
Complications
Unbalanced datasets
Different cost/reward for FPs vs FNs
Measurement uncertainties
Imperfect labels
Missing labels
Unlabeled data
Non-linear classification boundaries
Non-linear Classification
Interior/Exterior sets
Choosing the transformation
Distance from a point at \((x, y)\), with \(x\) and \(y\) set by sliders in the notebook
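The idea can be sketched as follows, with illustrative data: a circular interior/exterior boundary is nonlinear in \((x, y)\), but after transforming to the distance \(r\) from an assumed center it reduces to a simple 1-D threshold.

```julia
# Sketch (illustrative data): an interior/exterior boundary is not linear
# in (x, y), but it becomes a 1-D threshold in the transformed feature
# r = distance from an assumed center.
center = (0.0, 0.0)
r(x, y) = hypot(x - center[1], y - center[2])        # distance feature
interior = [(0.5cos(θ), 0.5sin(θ)) for θ in range(0, 2π, length = 20)]
exterior = [(2cos(θ), 2sin(θ))     for θ in range(0, 2π, length = 20)]
# a linear (threshold) boundary in the transformed coordinate:
classify_radial(p; r0 = 1.25) = r(p...) < r0 ? 1 : 0
all(classify_radial(p) == 1 for p in interior)       # true
all(classify_radial(p) == 0 for p in exterior)       # true
```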
XOR
Choosing a transformation
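A sketch of why XOR needs a transformation (illustrative data): no single line separates the four corner points, but the product feature \(x \cdot y\) does.

```julia
# Sketch: the XOR pattern is not linearly separable in (x, y), but the
# product feature z = x*y separates it with a single threshold.
corners = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
labels  = [1, 0, 0, 1]                    # XOR: matching signs → class 1
# No line β0 + β1*x + β2*y can separate these four points,
# but the sign of x*y does:
classify_xor(p) = p[1] * p[2] > 0 ? 1 : 0
all(classify_xor.(corners) .== labels)    # true
```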
Other Important Classification Algorithms
Support Vector Machines & the Kernel trick
Provide computationally efficient ways to construct non-linear classifiers
Neural Networks
Replace \(f(i,k)\) with a more complicated function.
A basic neural network is just the composition of an activation function with a linear function of the inputs, repeated several (or many, many!) times.
A lot of AI research goes into designing neural network architectures that are particularly well-suited to a common type of problem (e.g., images, video, audio, text).
Credit: The image above is by Glosser.ca, CC BY-SA 3.0, via Wikimedia Commons, original source
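The composition idea can be sketched with hand-picked (untrained, illustrative) weights; this two-layer network computes XOR on binary inputs:

```julia
# Sketch: a feed-forward network is activation(linear(inputs)) composed
# layer after layer. Weights below are hand-picked, not trained.
logistic(z) = 1 / (1 + exp(-z))
layer(W, b, x) = logistic.(W * x .+ b)          # linear map, then activation
W1 = [20.0 20.0; 20.0 20.0];  b1 = [-10.0, -30.0]   # hidden layer: ~OR and ~AND
W2 = [20.0 -20.0];            b2 = [-10.0]          # output: OR and not-AND → XOR
net(x) = layer(W2, b2, layer(W1, b1, x))[1]     # compose the two layers
round.(net.([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]))   # [0, 1, 1, 0]
```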
Multi-category classification
Input data: \(x_i\)
Label for data: \(Y_i\) (integer)
Convert labels to one-hot encoding: \(y_{i,k}\) (each 0 or 1)
Object id: \(i = 1 \ldots N_{\mathrm{obj}}\)
Category ids: \(k = 1 \ldots K\)
Generalizing the linear model for multi-category classification:
$$f(i,k) = \beta_k \cdot x_i$$
$$\mathrm{loss}(\beta) = - \sum_{i=1}^{N_{\mathrm{obj}}} \sum_{k=1}^K y_{i,k} \ln(\hat{y}_{i,k}(\beta))$$
$$\mathrm{Pr}(Y_i=k) = \frac{e^{f(i,k)}}{\sum_{j=1}^K e^{f(i,j)}}$$
Predicted category:
$$\hat{Y}_i = k \;\; \mathrm{s.t.} \;\; \mathrm{Pr}(Y_i=k) \ge \mathrm{Pr}(Y_i=k') \;\; \forall\, k' \neq k$$
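A sketch of multi-category prediction with softmax probabilities and an argmax over categories (the coefficients \(\beta\) and feature vector \(x\) here are illustrative, untrained values):

```julia
# Sketch: softmax probabilities and argmax prediction for K = 3 categories.
softmax(f) = exp.(f .- maximum(f)) ./ sum(exp.(f .- maximum(f)))  # numerically stable form
β = [ 1.0  0.0;      # β_1: coefficients for category 1
      0.0  1.0;      # β_2
     -1.0 -1.0]      # β_3
x = [2.0, 0.5]       # one object's feature vector
p = softmax(β * x)   # Pr(Y = k) for k = 1..K; sums to 1
Ŷ = argmax(p)        # predicted category: 1
```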
Setup
Functions used (definitions hidden in the notebook): `plot_classifier`, `glm_accuracy`, `classify`
Built with Julia 1.11.5 and
DataFrames 1.7.0
GLM 1.9.0
MLBase 0.9.2
Plots 1.40.9
PlutoTeachingTools 0.3.1
PlutoUI 0.7.61
SpecialFunctions 2.5.0
To run this tutorial locally, download this file and open it with Pluto.jl.