Data Science Applications to Astronomy
Week 9: Model Building IV:
Classification
Why is 0.5 the default threshold value?
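One way to look at this question, as a rough sketch: under a logistic model the predicted probability reaches 0.5 exactly when the linear predictor (the log-odds) reaches 0, so a 0.5 threshold assigns each point to whichever class the model considers more likely. That is a natural default when both kinds of misclassification are equally costly. The helper `classify` and its `threshold` keyword are hypothetical names used only for illustration.

```julia
# Logistic function, matching the definition used later in these notes
logistic(x) = inv(1 + exp(-x))

# Hypothetical helper: assign class 1 when the predicted probability reaches the threshold
classify(z; threshold = 0.5) = logistic(z) >= threshold ? 1 : 0

# Example linear-predictor values (log-odds); z = 0 means both classes are equally likely
zs = [-2.0, -0.5, 0.0, 0.5, 2.0]

labels_default = classify.(zs)                    # threshold 0.5 is the same as testing z >= 0
labels_strict  = classify.(zs; threshold = 0.9)   # a stricter threshold labels fewer points as class 1
```

Moving the threshold away from 0.5 trades one kind of error for the other, which can make sense when the classes are unbalanced or the costs are asymmetric.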
Logistic:
What makes logistic regression a strong choice for classification tasks?
When determining classifications, how does one determine how many classes there should be?
Clustering
K-means clustering
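As a minimal sketch of the idea (not the implementation in Clustering.jl, which provides its own `kmeans`), Lloyd's algorithm alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points. The names `sqdist` and `kmeans_sketch`, the fixed iteration count, and the toy data are all assumptions made for illustration.

```julia
using Random

# Hypothetical helper: squared Euclidean distance between two vectors
sqdist(a, b) = sum(abs2, a .- b)

# Minimal Lloyd's algorithm; the columns of X are points
function kmeans_sketch(X::AbstractMatrix, k::Integer; iters::Integer = 20, rng = Random.default_rng())
    n = size(X, 2)
    centers = X[:, randperm(rng, n)[1:k]]    # initialize from k distinct data points
    labels = zeros(Int, n)
    for _ in 1:iters
        # Assignment step: each point joins its nearest center
        for i in 1:n
            labels[i] = argmin([sqdist(X[:, i], centers[:, j]) for j in 1:k])
        end
        # Update step: each center moves to the mean of its members
        for j in 1:k
            members = findall(==(j), labels)
            isempty(members) || (centers[:, j] = vec(sum(X[:, members], dims = 2)) ./ length(members))
        end
    end
    return labels, centers
end

# Two well-separated toy blobs (made-up data)
rng = MersenneTwister(42)
X = hcat(0.1 .* randn(rng, 2, 20) .- 2, 0.1 .* randn(rng, 2, 20) .+ 2)
labels, centers = kmeans_sketch(X, 2; rng = rng)
```

Note that the number of clusters `k` must be chosen in advance, and the result can depend on the initial centers.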
Hierarchical clustering
What's the difference between clustering & classification?
Create simulated dataset
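begin
num_pts = 200                          # total number of simulated points
idx_grps = [1:50, 51:75, 76:num_pts]   # index ranges of the three groups
idx_centers = [-1 0; 0 1; 1 0]'        # 2×3 after the transpose: one column per group center
scale_fac = 0.5                        # scale factor, presumably for the scatter about each center
end;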
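The cell that actually draws the points is not shown here. One plausible way to build such a dataset, assuming isotropic Gaussian scatter of width `scale_fac` around each group's center, is sketched below. The name `pts_sim`, the seed, and the Gaussian assumption are illustrative and may differ from what the notebook does.

```julia
using Random

num_pts = 200
idx_grps = [1:50, 51:75, 76:num_pts]
idx_centers = [-1 0; 0 1; 1 0]'    # 2×3: one column per group center
scale_fac = 0.5

# Assumption: Gaussian scatter with standard deviation scale_fac around each center
rng = MersenneTwister(123)         # arbitrary seed, for reproducibility only
pts_sim = zeros(2, num_pts)        # one point per column
for (g, idx) in enumerate(idx_grps)
    pts_sim[:, idx] = idx_centers[:, g] .+ scale_fac .* randn(rng, 2, length(idx))
end
```

Storing one point per column matches the `dims=2` convention used when computing pairwise distances.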
using Clustering, Distances, StatsPlots, Random
dist_matrix_ordered = pairwise(Euclidean(), pts_ordered, dims=2);
Compute pair-wise distances
heatmap(dist_matrix_ordered)
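For intuition about what the distance matrix contains, here is a small hand-rolled sketch in plain Julia. With `dims=2`, `pairwise` treats each column as one point, and entry (i, j) of the result is the distance between points i and j. The function name `pairwise_euclidean` and the three example points are made up for illustration.

```julia
# Hypothetical helper: Euclidean distance matrix between the columns of X
function pairwise_euclidean(X::AbstractMatrix)
    n = size(X, 2)
    D = zeros(n, n)
    for j in 1:n, i in 1:j-1
        d = sqrt(sum(abs2, X[:, i] .- X[:, j]))
        D[i, j] = d
        D[j, i] = d    # the matrix is symmetric
    end
    return D           # the diagonal stays zero
end

X = [0.0 3.0 0.0;
     0.0 4.0 1.0]      # three made-up 2-D points, one per column
D = pairwise_euclidean(X)
```

When points are stored group by group, a heatmap of such a matrix tends to show block structure along the diagonal, since points within a group are closer to each other than to points in other groups.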
Apply Hierarchical Clustering
hcl_ordered = hclust(dist_matrix_ordered, linkage=:average)
Hclust{Float64}([-132 -169; -85 -164; … ; 196 197; -48 198], [0.017273516842157548, 0.02616261467893843, 0.03537046729533142, 0.03632868074455062, 0.040455044954493063, 0.055759116531823284, 0.05625684453023014, 0.0569594971679231, 0.05820624562044343, 0.06198464043369185 … 1.332431419082579, 1.3389343978382624, 1.6674027121198514, 1.6800152731076536, 1.9307448587555502, 2.2185358454030966, 2.5318136160100755, 2.7760997402959555, 3.102837664599889, 3.4220604197839686], [48, 27, 125, 160, 89, 35, 180, 133, 109, 146 … 139, 69, 177, 181, 165, 166, 121, 167, 20, 94], :average)
plot(hcl_ordered, xticks=:none, xlabel="Point ID", ylabel="Threshold for clustering")
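To make the vertical axis of the dendrogram concrete, the following is a naive sketch of agglomerative clustering with average linkage: start with every point in its own cluster and repeatedly merge the two clusters whose mean pairwise distance is smallest, recording that distance as the merge height. This is only an illustration of the idea, not how `hclust` is implemented, and the names `avg_link` and `agglomerate` and the 1-D example data are hypothetical.

```julia
# Average linkage: mean pairwise distance between the members of two clusters
avg_link(D, a, b) = sum(D[i, j] for i in a, j in b) / (length(a) * length(b))

# Naive agglomerative clustering: merge the closest pair until k clusters remain
function agglomerate(D::AbstractMatrix, k::Integer)
    clusters = [[i] for i in 1:size(D, 1)]
    heights = Float64[]
    while length(clusters) > k
        best = (Inf, 0, 0)
        for a in 1:length(clusters), b in a+1:length(clusters)
            d = avg_link(D, clusters[a], clusters[b])
            d < best[1] && (best = (d, a, b))
        end
        dmin, ia, ib = best
        push!(heights, dmin)                 # height at which this merge happens
        append!(clusters[ia], clusters[ib])  # merge cluster ib into cluster ia
        deleteat!(clusters, ib)
    end
    return clusters, heights
end

# Made-up 1-D example: three visually obvious groups
x = [0.0, 1.0, 2.5, 10.0, 11.0, 20.0]
D = [abs(xi - xj) for xi in x, xj in x]
clusters, heights = agglomerate(D, 3)
```

Cutting the dendrogram at a given threshold corresponds to stopping the merging once the next merge height would exceed that threshold.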
Scramble the order of points
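A sketch of what scrambling might look like, assuming points are stored one per column: draw a random permutation and reindex the columns. The variable names, the seed, and the small random dataset below are illustrative and are not the notebook's own.

```julia
using Random

rng = MersenneTwister(7)                  # arbitrary seed
pts_demo = randn(rng, 2, 5)               # made-up points, one per column
perm = randperm(rng, size(pts_demo, 2))   # random reordering of the point indices
pts_scrambled = pts_demo[:, perm]         # same points, shuffled column order

# Hypothetical helper: Euclidean distance matrix between columns
dist(X) = [sqrt(sum(abs2, X[:, i] .- X[:, j])) for i in 1:size(X, 2), j in 1:size(X, 2)]

# Shuffling the points permutes the rows and columns of the distance matrix identically
same_distances = dist(pts_scrambled) == dist(pts_demo)[perm, perm]
```

Because the set of pairwise distances is unchanged, the clustering should not depend on the input order, although the left-to-right arrangement of leaves in the dendrogram can differ.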
hcl_unordered = hclust(dist_matrix_unordered, linkage=:average, branchorder=:optimal);
plot(hcl_unordered)
Compare results from ordered vs unordered
let
plt1 = plot(hcl_ordered, xticks=:none, title="Ordered")
plt2 = plot(hcl_unordered, xticks=:none, title="Unordered")
plot(plt1,plt2, layout=(2,1))
end
Zoom in on point labels
Limits for Dendrogram: Min
Neural Networks
What is a neural network?
A basic neural network is just the composition of an activation function applied to a linear function of its inputs, repeated several or many, many times!
Lots of AI research is in designing neural network architectures that are particularly well-suited to a common type of problem (e.g., images, video, audio, text).
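The "composition of activations of linear functions" description can be sketched in a few lines. Each layer below applies an activation elementwise to a linear (affine) function of its input, and the network is simply the composition of the layers. The helper `dense`, the layer sizes, and all weights are made up for illustration; real networks learn their weights from data.

```julia
# A dense layer: an activation applied elementwise to a linear (affine) function of the input
dense(W, b, σ) = x -> σ.(W * x .+ b)

relu(x) = x > zero(x) ? x : zero(x)

# Made-up weights for a tiny 2 → 3 → 1 network
layer1 = dense([1.0 -1.0; 0.5 0.5; -1.0 2.0], [0.0, -1.0, 0.5], relu)
layer2 = dense([1.0 2.0 -1.0], [0.1], identity)

network = layer2 ∘ layer1     # composition: layer1 runs first, then layer2
y = network([1.0, 2.0])       # a 1-element vector, approximately [-2.4]
```

Without the nonlinear activation between them, the two layers would collapse into a single linear map, which is why the activation matters.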
Credit: The image above is by Glosser.ca, CC BY-SA 3.0, via Wikimedia Commons, original source
Announcements
Plan for Next week
Notes from Monday
Common activation functions
logistic(x) = inv(1+exp(-x))
logistic (generic function with 1 method)
relu(x) = x > zero(x) ? x : zero(x)
relu (generic function with 1 method)
leaky_relu(x; α::Real = 1e-3) = x > zero(x) ? x : α*x
leaky_relu (generic function with 1 method)
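A brief usage sketch follows, repeating the three definitions above so that it runs on its own. The input values and the alternative `α = 0.1` are arbitrary choices for illustration.

```julia
logistic(x) = inv(1 + exp(-x))
relu(x) = x > zero(x) ? x : zero(x)
leaky_relu(x; α::Real = 1e-3) = x > zero(x) ? x : α * x

xs = [-2.0, 0.0, 3.0]

out_logistic = logistic.(xs)              # values in (0, 1); logistic(0) is exactly 0.5
out_relu     = relu.(xs)                  # negative inputs are clipped to zero
out_leaky    = leaky_relu.(xs)            # negative inputs are scaled by α instead
out_leakier  = leaky_relu.(xs; α = 0.1)   # a larger leak, for comparison
```

Using `zero(x)` rather than a literal `0` keeps the return type consistent with the input type.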
Setup
Built with Julia 1.11.5 and
Clustering 0.15.8
Distances 0.10.12
Plots 1.40.9
PlutoTeachingTools 0.3.1
PlutoUI 0.7.61
Random 1.11.0
SpecialFunctions 2.5.0
StatsPlots 0.15.7
To run this tutorial locally, download this file and open it with Pluto.jl.