Data Science Applications to Astronomy

Week 9: Model Building IV:

Classification

Announcements

Penn State AI Week

  • Events: April 14-17, 2025

  • Student poster submission deadline: Monday, March 31, 2025

Classification

Why would an astronomer want to classify things?

Binary classification

  • Input Data: \(x_i\)

  • Label for data: \(Y_i\) (0 or 1)

  • Predicted category: \(\hat{Y}_i\)

  • Object id: \(i = 1 \ldots N_{\mathrm{obj}}\)

begin
    regen_data   # Pluto binding that re-runs this cell to regenerate the data
    n = 100      # number of blue points
    m = 10       # number of red points
    # Blue points scattered around (1,1); red points around (-1,-1)
    df_blue = DataFrame(repeat([1 1], n) .+ randn(n,2), [:x, :y])
    df_red  = DataFrame(repeat([-1 -1], m) .+ randn(m,2), [:x, :y])
    df_blue.label = ones(Int8, n)
    df_red.label  = zeros(Int8, m)
    df = vcat(df_blue, df_red)
end;
# Linear decision boundary: classify a point by whether it lies above or below this line
linear_classifer_boundary(x) = β0 + β1*x

(Interactive plot with sliders for the boundary parameters: β0 = 0.0, β1 = 0.0)

                 True Blue   True Red
Predicted Blue       84         16
Predicted Red         0         10

What is suboptimal about the following loss function?

$$\mathrm{loss}_{\mathrm{class}}(\theta) = \frac{1}{N_{\mathrm{targ}}} \sum_{i=1}^{N_{\mathrm{targ}}} \left[1-\delta(Y_i,\hat{Y}_i(\theta))\right] = (1-\mathrm{accuracy})$$
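One answer, sketched below: this zero-one loss is piecewise constant in the parameters, since predictions only change when the decision boundary crosses a data point, so its gradient is zero almost everywhere and gradient-based optimizers get no signal from it. A minimal illustration (the function name is illustrative, not from the notebook):

```julia
# Zero-one loss: fraction of misclassified objects, i.e., 1 - accuracy.
# Piecewise constant in the model parameters, so its gradient is zero
# almost everywhere -- useless for gradient-based optimization.
zero_one_loss(Y, Ŷ) = sum(Y .!= Ŷ) / length(Y)
```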

Relaxing the outputs

  • General: \(\hat{y}_i(\beta) = f(x_i)\)

  • Generalized Linear Model: \(\hat{y}_i(\beta) = f(\beta \cdot x_i)\)

  • Logistic Regression: \(\hat{y}_i(\beta) = f(\beta \cdot x_i)\), where \(f(z) = \frac{1}{1+\exp\left(-z\right)}\)

Logistic Regression Likelihood

$$L(\beta) = \prod_{i:\; Y_i=1} \hat{y}_i(\beta) \prod_{i:\; Y_i=0} (1-\hat{y}_i(\beta))$$

$$\mathrm{loss}(\beta) = -\sum_{i=1}^{N_{\mathrm{obj}}} \left[ Y_i \ln(\hat{y}_i(\beta)) + (1-Y_i) \ln(1-\hat{y}_i(\beta)) \right]$$
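A minimal sketch of evaluating this negative log-likelihood (binary cross-entropy); the helper names here are illustrative, not the notebook's:

```julia
using LinearAlgebra

# Logistic link: maps β⋅x ∈ ℝ to a probability in (0,1)
predict_prob(β, x) = 1 / (1 + exp(-dot(β, x)))

# Negative log-likelihood for binary labels Y ∈ {0,1};
# each row of X is one object's feature vector
function logistic_loss(β, X, Y)
    -sum(Y[i]*log(predict_prob(β, X[i, :])) +
         (1 - Y[i])*log(1 - predict_prob(β, X[i, :]))
         for i in eachindex(Y))
end
```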

Common activation functions

# Logistic (sigmoid): maps ℝ → (0,1)
logistic(x) = inv(1+exp(-x))
# Rectified linear unit: zero for negative inputs, identity otherwise
relu(x) = x > zero(x) ? x : zero(x)
# Leaky ReLU: small slope α for negative inputs avoids vanishing gradients
leaky_relu(x; α::Real = 1e-3) = x > zero(x) ? x : α*x

(Plot comparing activation functions: logistic, tanh, erf, ReLU, leaky ReLU)

Set a threshold

  • E.g., \(f(x) > \frac{1}{2}\) for binary classification

  • Can change threshold based on purpose of the classifier
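For example (a minimal sketch; `classify_at` is an illustrative name):

```julia
# Hard classification from a continuous output ŷ ∈ (0,1).
# Raising the threshold trades false positives for false negatives:
# a high-purity catalog wants a high threshold, a complete one a low threshold.
classify_at(ŷ; threshold = 0.5) = ŷ > threshold ? 1 : 0
```

E.g., `classify_at(0.7)` returns 1, while `classify_at(0.7, threshold = 0.9)` returns 0.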

Evaluating Classifiers

Contingency Table / Confusion Matrix

E.g., if goal of classifier is to label stars with a certain type of planet:

                     Truth: Planet       Truth: No Planet
Detected Planet      True Positives      False Positives
Detected No Planet   False Negatives     True Negatives

Abbreviations:

                True yes   True no
Predicted Yes      TP         FP
Predicted No       FN         TN

Many terms to describe performance characteristics

  • Accuracy: (TP+TN)/(TP+TN+FP+FN)

  • Positive Predictive Value (Precision): TP/(TP+FP)

  • True Positive Rate (Recall, Sensitivity): TP/(TP+FN)

  • True Negative Rate (Specificity): TN/(TN+FP)

  • False discovery rate: FP/(TP+FP)

  • False omission rate: FN/(TN+FN)

  • Even more confusing names
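These definitions translate directly to code (illustrative helper names; note that each ratio is undefined when its denominator is zero):

```julia
# Performance metrics from confusion-matrix counts
accuracy(TP, FP, FN, TN) = (TP + TN) / (TP + TN + FP + FN)
ppv(TP, FP)              = TP / (TP + FP)   # positive predictive value (precision)
recall(TP, FN)           = TP / (TP + FN)   # true positive rate, sensitivity
specificity(TN, FP)      = TN / (TN + FP)   # true negative rate
false_discovery(TP, FP)  = FP / (TP + FP)
false_omission(TN, FN)   = FN / (TN + FN)
```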

Complications

  • Unbalanced datasets

  • Different cost/reward for FPs vs FNs

  • Measurement uncertainties

  • Imperfect labels

  • Missing labels

  • Unlabeled data

  • Non-linear classification boundaries

Non-linear Classification

Interior/Exterior sets

(Interactive 3D plot with sliders for β0, β1 and the viewing angles)

Choosing the transformation

Transformed feature: distance from a point at (x, y) (sliders: x = 0.0, y = 1.0), with a linear classifier boundary applied in the transformed space.

XOR

Choosing a transformation

(Interactive 3D plot of the transformed XOR data, with sliders for β0, β1 and the viewing angles)

Other Important Classification Algorithms

Support Vector Machines & the Kernel trick

  • Provide computationally efficient ways to construct non-linear classifiers

Neural Networks

  • Replace \(f(i,k)\) with a more complicated function.

  • A basic neural network is just the composition of an activation function applied to linear functions of the inputs, repeated several (or many, many) times!

  • Much of AI research is designing neural network architectures that are particularly well-suited to a common type of problem (e.g., images, video, audio, text)
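A literal reading of "composition of activations of linear functions", as a sketch (the weights here are arbitrary placeholders, not a trained model):

```julia
logistic(z) = 1 / (1 + exp(-z))

# One dense layer: an activation applied elementwise to an affine map
dense(W, b, act) = x -> act.(W * x .+ b)

# A tiny network: 2 inputs → 3 hidden units → 1 output, built by composition
W1, b1 = [1.0 -1.0; 0.5 0.5; -1.0 1.0], zeros(3)   # arbitrary example weights
W2, b2 = [1.0 1.0 1.0], zeros(1)
tiny_nn = dense(W2, b2, logistic) ∘ dense(W1, b1, logistic)

tiny_nn([0.2, -0.3])   # 1-element vector with a value in (0,1)
```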

Diagram of small dense neural network

Credit: The image above is by Glosser.ca, CC BY-SA 3.0, via Wikimedia Commons, original source

Multi-category classification

  • Input Data: \(x_i\)

  • Label for data: \(Y_i\) (integer)

  • Convert labels to one hot encoding: \(y_{i,k}\) (each 0 or 1)

  • Object id: \(i = 1 \ldots N_{\mathrm{obj}}\)

  • Category ids: \(k = 1 \ldots K\)
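One-hot encoding can be written as a single comprehension (an illustrative sketch):

```julia
# Convert integer labels Y_i ∈ 1..K into an N×K one-hot matrix y_{i,k}
onehot(Y, K) = [Y[i] == k ? 1 : 0 for i in eachindex(Y), k in 1:K]
```

E.g., `onehot([2, 1, 3], 3)` gives the rows `[0 1 0]`, `[1 0 0]`, `[0 0 1]`.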

Generalized linear model for multi-category classification:

$$f(i,k) = \beta_k \cdot x_i$$

$$\mathrm{loss}(\beta) = - \sum_{i=1}^{N_{obj}} \sum_{k=1}^K y_{i,k} \ln(\hat{y}_{i,k}(\beta))$$

$$\mathrm{Pr}(Y_i=k) = \frac{e^{f(i,k)}}{1+\sum_{j=1}^K e^{f(i,j)}}$$

  • Predicted category:

$$\hat{Y}_i = k \;\; \mathrm{s.t.} \;\; \mathrm{Pr}(Y_i=k) > \mathrm{Pr}(Y_i=k') \;\; \forall \; k' \neq k$$
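Putting the last two formulas together (illustrative names; note that with the extra 1 in the denominator the probabilities sum to less than one, the remainder playing the role of an implicit reference category):

```julia
# Pr(Y_i = k) following the formula above, for scores f[k] = β_k ⋅ x_i
function class_probs(f)
    e = exp.(f)
    e ./ (1 + sum(e))
end

# Predicted category: the k with the largest probability
predicted_category(f) = argmax(class_probs(f))
```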

Setup

Functions used

plot_classifier (generic function with 1 method)
glm_accuracy (generic function with 1 method)
classify (generic function with 1 method)

Built with Julia 1.11.5 and

DataFrames 1.7.0
GLM 1.9.0
MLBase 0.9.2
Plots 1.40.9
PlutoTeachingTools 0.3.1
PlutoUI 0.7.61
SpecialFunctions 2.5.0

To run this tutorial locally, download this file and open it with Pluto.jl.