Data Science Applications to Astronomy

Week 9: Model Building IV:

Classification

Announcements

Penn State AI Week

  • Events: April 14-17, 2025

  • Student poster submission deadline: Monday, March 31, 2025

Classification

Why would an astronomer want to classify things?

Binary classification

  • Input Data: \(x_i\)

  • Label for data: \(Y_i\) (0 or 1)

  • Predicted category: \(\hat{Y}_i\)

  • Object id: \(i = 1 \ldots N_{\mathrm{obj}}\)

begin
    regen_data   # Pluto binding that re-runs this cell to regenerate the data
    n = 100      # number of blue points
    m = 10       # number of red points
    # Blue points scattered around (1,1); red points around (-1,-1)
    df_blue = DataFrame(repeat([1 1], n) .+ randn(n,2), [:x, :y])
    df_red  = DataFrame(repeat([-1 -1], m) .+ randn(m,2), [:x, :y])
    df_blue.label = ones(Int8, n)
    df_red.label  = zeros(Int8, m)
    df = vcat(df_blue, df_red)
end;
# Linear decision boundary: classify a point by whether it lies above or below this line
linear_classifer_boundary(x) = β0 + β1*x

(Interactive plot with sliders for the boundary parameters: β0 = 0.0, β1 = 0.0)

                 True Blue   True Red
Predicted Blue       84         16
Predicted Red         0         10

What is suboptimal about the following loss function?

$$\mathrm{loss}_{\mathrm{class}}(\theta) = \frac{1}{N_{\mathrm{targ}}} \sum_{i=1}^{N_{\mathrm{targ}}} \left[1-\delta(Y_i,\hat{Y}_i(\theta))\right] = (1-\mathrm{accuracy})$$
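One answer, sketched below: this zero-one loss is piecewise constant in the parameters, since predictions only change when the decision boundary crosses a data point, so its gradient is zero almost everywhere and gradient-based optimizers get no signal from it. A minimal illustration (the function name is illustrative, not from the notebook):

```julia
# Zero-one loss: fraction of misclassified objects, i.e., 1 - accuracy.
# Piecewise constant in the model parameters, so its gradient is zero
# almost everywhere -- useless for gradient-based optimization.
zero_one_loss(Y, Ŷ) = sum(Y .!= Ŷ) / length(Y)
```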

Relaxing the outputs

  • General: \(\hat{y}_i(\beta) = f(x_i)\)

  • Generalized Linear Model: \(\hat{y}_i(\beta) = f(\beta \cdot x_i)\)

  • Logistic Regression: \(\hat{y}_i(\beta) = f(\beta \cdot x_i)\), where \(f(z) = \frac{1}{1+\exp\left(-z\right)}\)

Logistic Regression Likelihood

$$L(\beta) = \prod_{i:\; Y_i=1} \hat{y}_i(\beta) \prod_{i:\; Y_i=0} (1-\hat{y}_i(\beta))$$

$$\mathrm{loss}(\beta) = -\sum_{i=1}^{N_{\mathrm{obj}}} \left[ Y_i \ln(\hat{y}_i(\beta)) + (1-Y_i) \ln(1-\hat{y}_i(\beta)) \right]$$
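A minimal sketch of evaluating this negative log-likelihood (binary cross-entropy); the helper names here are illustrative, not the notebook's:

```julia
using LinearAlgebra

# Logistic link: maps β⋅x ∈ ℝ to a probability in (0,1)
predict_prob(β, x) = 1 / (1 + exp(-dot(β, x)))

# Negative log-likelihood for binary labels Y ∈ {0,1};
# each row of X is one object's feature vector
function logistic_loss(β, X, Y)
    -sum(Y[i]*log(predict_prob(β, X[i, :])) +
         (1 - Y[i])*log(1 - predict_prob(β, X[i, :]))
         for i in eachindex(Y))
end
```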

Common activation functions

# Logistic (sigmoid): maps ℝ → (0,1)
logistic(x) = inv(1+exp(-x))
# Rectified linear unit: zero for negative inputs, identity otherwise
relu(x) = x > zero(x) ? x : zero(x)
# Leaky ReLU: small slope α for negative inputs avoids vanishing gradients
leaky_relu(x; α::Real = 1e-3) = x > zero(x) ? x : α*x

(Plot comparing activation functions: logistic, tanh, erf, ReLU, leaky ReLU)

Set a threshold

  • E.g., \(f(x) > \frac{1}{2}\) for binary classification

  • Can change threshold based on purpose of the classifier
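For example (a minimal sketch; `classify_at` is an illustrative name):

```julia
# Hard classification from a continuous output ŷ ∈ (0,1).
# Raising the threshold trades false positives for false negatives:
# a high-purity catalog wants a high threshold, a complete one a low threshold.
classify_at(ŷ; threshold = 0.5) = ŷ > threshold ? 1 : 0
```

E.g., `classify_at(0.7)` returns 1, while `classify_at(0.7, threshold = 0.9)` returns 0.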

Evaluating Classifiers

Contingency Table / Confusion Matrix

E.g., if goal of classifier is to label stars with a certain type of planet:

                     Truth: Planet       Truth: No Planet
Detected Planet      True Positives      False Positives
Detected No Planet   False Negatives     True Negatives

Abbreviations:

                True yes   True no
Predicted Yes      TP         FP
Predicted No       FN         TN

Many terms to describe performance characteristics

  • Accuracy: (TP+TN)/(TP+TN+FP+FN)

  • Positive Predictive Value (Precision): TP/(TP+FP)

  • True Positive Rate (Recall, Sensitivity): TP/(TP+FN)

  • True Negative Rate (Specificity): TN/(TN+FP)

  • False discovery rate: FP/(TP+FP)

  • False omission rate: FN/(TN+FN)

  • Even more confusing names
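These definitions translate directly to code (illustrative helper names; note that each ratio is undefined when its denominator is zero):

```julia
# Performance metrics from confusion-matrix counts
accuracy(TP, FP, FN, TN) = (TP + TN) / (TP + TN + FP + FN)
ppv(TP, FP)              = TP / (TP + FP)   # positive predictive value (precision)
recall(TP, FN)           = TP / (TP + FN)   # true positive rate, sensitivity
specificity(TN, FP)      = TN / (TN + FP)   # true negative rate
false_discovery(TP, FP)  = FP / (TP + FP)
false_omission(TN, FN)   = FN / (TN + FN)
```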

Complications

  • Unbalanced datasets

  • Different cost/reward for FPs vs FNs

  • Measurement uncertainties

  • Imperfect labels

  • Missing labels

  • Unlabeled data

  • Non-linear classification boundaries

Non-linear Classification

Interior/Exterior sets

(Interactive 3D plot with sliders for β0, β1 and the viewing angles)

Choosing the transformation

Transformed feature: distance from a point at (x, y) (sliders: x = 0.0, y = 1.0), with a linear classifier boundary applied in the transformed space.

XOR

Choosing a transformation

(Interactive 3D plot of the transformed XOR data, with sliders for β0, β1 and the viewing angles)

Other Important Classification Algorithms

Support Vector Machines & the Kernel trick

  • Provide computationally efficient ways to construct non-linear classifiers

Neural Networks

  • Replace \(f(i,k)\) with a more complicated function.

  • A basic neural network is just the composition of an activation function applied to linear functions of the inputs, repeated several (or many, many) times!

  • Much of AI research is designing neural network architectures that are particularly well-suited to a common type of problem (e.g., images, video, audio, text)
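A literal reading of "composition of activations of linear functions", as a sketch (the weights here are arbitrary placeholders, not a trained model):

```julia
logistic(z) = 1 / (1 + exp(-z))

# One dense layer: an activation applied elementwise to an affine map
dense(W, b, act) = x -> act.(W * x .+ b)

# A tiny network: 2 inputs → 3 hidden units → 1 output, built by composition
W1, b1 = [1.0 -1.0; 0.5 0.5; -1.0 1.0], zeros(3)   # arbitrary example weights
W2, b2 = [1.0 1.0 1.0], zeros(1)
tiny_nn = dense(W2, b2, logistic) ∘ dense(W1, b1, logistic)

tiny_nn([0.2, -0.3])   # 1-element vector with a value in (0,1)
```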

Diagram of small dense neural network

Credit: The image above is by Glosser.ca, CC BY-SA 3.0, via Wikimedia Commons, original source

Multi-category classification

  • Input Data: \(x_i\)

  • Label for data: \(Y_i\) (integer)

  • Convert labels to one hot encoding: \(y_{i,k}\) (each 0 or 1)

  • Object id: \(i = 1 \ldots N_{\mathrm{obj}}\)

  • Category ids: \(k = 1 \ldots K\)
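One-hot encoding can be written as a single comprehension (an illustrative sketch):

```julia
# Convert integer labels Y_i ∈ 1..K into an N×K one-hot matrix y_{i,k}
onehot(Y, K) = [Y[i] == k ? 1 : 0 for i in eachindex(Y), k in 1:K]
```

E.g., `onehot([2, 1, 3], 3)` gives the rows `[0 1 0]`, `[1 0 0]`, `[0 0 1]`.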

Generalized linear model for multi-category classification:

$$f(i,k) = \beta_k \cdot x_i$$

$$\mathrm{loss}(\beta) = - \sum_{i=1}^{N_{obj}} \sum_{k=1}^K y_{i,k} \ln(\hat{y}_{i,k}(\beta))$$

$$\mathrm{Pr}(Y_i=k) = \frac{e^{f(i,k)}}{1+\sum_{j=1}^K e^{f(i,j)}}$$

  • Predicted category:

$$\hat{Y}_i = k \;\; \mathrm{s.t.} \;\; \mathrm{Pr}(Y_i=k) > \mathrm{Pr}(Y_i=k') \;\; \forall \; k' \neq k$$
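Putting the last two formulas together (illustrative names; note that with the extra 1 in the denominator the probabilities sum to less than one, the remainder playing the role of an implicit reference category):

```julia
# Pr(Y_i = k) following the formula above, for scores f[k] = β_k ⋅ x_i
function class_probs(f)
    e = exp.(f)
    e ./ (1 + sum(e))
end

# Predicted category: the k with the largest probability
predicted_category(f) = argmax(class_probs(f))
```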

Setup

Functions used

plot_classifier (generic function with 1 method)
glm_accuracy (generic function with 1 method)
classify (generic function with 1 method)

Built with Julia 1.11.5 and

DataFrames 1.7.0
GLM 1.9.0
MLBase 0.9.2
Plots 1.40.9
PlutoTeachingTools 0.3.1
PlutoUI 0.7.61
SpecialFunctions 2.5.0

To run this tutorial locally, download this file and open it with Pluto.jl.