Welcome to Astro 416:
Data Science Applications to Astronomy
Week 1
Course Overview
Course Logistics
Instructor: Eric Ford (Davey Lab 408A)
TA: Andrew Pellegrino (Davey Lab 532C)
Class workflow (for a typical week)
Mondays: Introduce topic
Wednesdays: Start computer lab (bring laptop)
Fridays: Discussion (submit question before class)
Following Monday: (deadline for previous lab)
Watch Canvas for deviations
e.g., First lab due on a Wednesday because of MLK Day, etc.
Website
https://psuastro416.github.io/Spring2025/ contains:
Syllabus
Info for each week's lessons (e.g., goals, readings & links to additional resources)
Links & info for labs/homework exercises.
Info for class project (e.g., explanation, key due dates, rubric)
Practical Tips (e.g., step-by-step instructions for accessing Roar Collab)
Canvas
Announcements
Deadlines for assignments
Space to upload some assignments (e.g. project checkpoints)
Links to create your personal GitHub repository for each lab
Links for TopHat Questions
Grades
Most info via embedded version of website
Canvas doesn't allow some pages to be embedded
If you get an error, try following that link in a separate window/tab.
Zoom
Backup plan available for classes if I'm sick, there's a snowstorm, etc.
Creating Accounts:
Roar Collab:
Penn State's supercomputer (for non-sensitive data)
We will use Roar Collab for running labs/homework exercises starting Wednesday (& for class projects)
Please request Roar Collab account by end of today.
Please create a GitHub account (if you don't have one or want to use a separate account for this class)
I'll walk people through the setup steps for Roar Collab and starting labs this Wednesday, so you need your account to be active before class on Wednesday.
TopHat:
Mostly for you to submit your questions (e.g., "Muddiest point")
Might also include some short questions about reading or previous class.
Submit TopHat responses by 9am on the day of the relevant class.
Aim for at least 1 question a week.
I'll drop a few from your grade, no questions asked.
Create TopHat account in time to submit reading question before class on Friday
No need to give them phone number or credit card info.
Textbook & Readings
No need to buy a physical textbook.
Several readings from online sources, e.g.:
Julia Data Science (Storopoli & Huijzer; ISBN-13: 979-8489859165) or online
Think Julia (Lauwens; ISBN: 1492045039) or online
Statistical Rethinking (McElreath; ISBN-13: 978-0367139919) (free PDF of Ch 1 & 2 online)
Safety
Please err on side of caution
Let me know if you'll be missing several classes for health reasons.
You can still submit TopHat questions, labs, etc.
Follow university policy. As of January 2025:
Stay home and away from others if you are experiencing fever or respiratory symptoms such as but not limited to cough, sore throat, runny nose, chills, fatigue, headache, body aches. Return to normal activities when, for at least 24 hours, both are true:
Your symptoms are getting better overall, AND
You have not had a fever (and are not using fever-reducing medication)
Then, take these additional precautions for the next five days to limit the spread of infection:
Wear a well-fitting mask
Keep a distance from others and/or
Get tested to inform your actions to prevent the spread to others
If you begin feeling worse and/or fever returns, stay home and away from others for at least 24 hours until both are true:
Your symptoms are getting better overall, and
You have not had a fever (and are not using fever-reducing medication)
Academic Integrity
Collaboration
Exams/quizzes: No collaboration.
Labs:
Collaboration with classmates is encouraged.
Each student should respond to questions individually.
Make liberal use of acknowledgments.
Project:
Working in teams is strongly encouraged.
Present/submit most parts of project jointly.
In separate final reports/reflections, each student describes their own contributions to the project accurately and gives credit to teammates for their contributions.
Artificial Intelligence (AI)
Exams/quizzes: No AI.
Limited AI Use: Using an AI-based grammar checker is acceptable and does not need to be disclosed for the labs or project in this course.
Disclose AI Use: Otherwise, students must fully disclose any and all use of AI in completing their assignments at the time of submission.
Students may receive reduced credit for assignments where AI tools were used. If you're unsure what's appropriate, then ask in advance of submission.
Introduction to Course
What is Data Science?
What do you think?
One oversimplified take...
The Data Science Venn Diagram is from Drew Conway and is licensed as Creative Commons Attribution-NonCommercial
This class is not meant to teach:
Programming (e.g., CMPSC 121, 122)
Numerical Methods (see ASTRO 410)
Linear Algebra (see MATH 220)
Statistics (e.g., ASTRO 415)
Machine Learning (e.g., DS 310)
Astronomical Techniques (see ASTRO 451)
How to conduct Astronomy Research (e.g., a summer project or thesis)
So what does this class do?
Data Science:
Develop Data Acumen[1]
Model building
Data visualization
Reproducible research
Practice "soft skills"
Technical collaboration
Effective visualization
Scientific communications
Provide taste of Data Science
Data Acumen
[1] "We define data acumen as the ability to make good judgements about the use of data to support problem solutions." (Keller et al. 2020)
Along the way...
You'll learn and/or reinforce foundational concepts that are covered in much more detail in other courses:
Programming (build experience, likely a new language)
Numerical Methods (e.g., integration, sampling)
Linear Algebra (e.g., solving linear systems)
Statistics (e.g., likelihood, priors, distributions)
Machine Learning (e.g., optimization)
Astronomical Techniques (e.g., observational biases)
Course Overview
Students will build practical data science skills (e.g., querying astronomical databases, data storage and manipulation, data visualization, exploratory and explanatory data analysis, Bayesian modeling workflows, and reproducible research practices) and apply these lessons to analyzing data from astronomical surveys.
Goals
Increase their data acumen, and
Appreciate how building data science skills can benefit astronomy & astrophysics research.
Objectives
Ingest and manipulate data from astronomical surveys.
Build, apply, assess and update astrophysically motivated models for astronomical observations.
Create visualizations for exploratory and explanatory data analyses of observations from astronomical surveys.
Synthesize the above into a dashboard to support the efficient analysis of astronomical observations.
Incorporate principles of reproducible research into their class project.
Remember to request your Roar Collab account so you can log in on Wednesday!
Philosophy of Data Analysis
What's the goal of an analysis?
Example Goals
Test a precisely formulated hypothesis
Make predictions with a well-established model
Make predictions, but you don't (yet?) have a trustworthy model
Refine an existing model with additional data
Compare multiple competing models to gain physical insight
Compare multiple models to pick one to use for making predictions
Take a first look at a new dataset and find something interesting
Prioritize what portions of analysis will require the most attention
Different Goals require Different Approaches
Prototypical Approaches
Traditional physics/first-principles approach
Start with a model we trust
Add data to infer model parameters
Machine learning/Data-driven approach
Start with large dataset
Find a model that would have made good predictions
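As a rough illustration of the two prototypical approaches, here is a minimal sketch written in Python purely for illustration. Everything in it is an assumption rather than course material: the synthetic dataset (a straight line plus Gaussian noise), the helper names `fit_line`, `knn_predict`, and `mean_squared_error`, and the choice of a nearest-neighbor average as the stand-in "data-driven" model. The first half trusts a linear model and uses the data only to infer its two parameters; the second half ignores the physics and simply keeps whichever setting would have predicted held-out data best.

```python
import random


def fit_line(xs, ys):
    """Least-squares estimates of slope and intercept for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def knn_predict(train_x, train_y, x, k):
    """Average the y values of the k training points nearest to x."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(predicted, observed):
    """Mean of the squared differences between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(416)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.5) for x in xs]
train_x, train_y = xs[:150], ys[:150]
held_x, held_y = xs[150:], ys[150:]

# First-principles approach: trust the linear model, use data to infer its parameters.
slope, intercept = fit_line(train_x, train_y)

# Data-driven approach: keep whichever k would have made the best held-out predictions.
errors = {
    k: mean_squared_error(
        [knn_predict(train_x, train_y, x, k) for x in held_x], held_y
    )
    for k in (1, 5, 15)
}
best_k = min(errors, key=errors.get)

print(f"inferred slope={slope:.2f}, intercept={intercept:.2f}")
print(f"best k={best_k}, held-out MSE={errors[best_k]:.3f}")
```

The first approach returns interpretable parameters but is only as good as the assumed model; the second can predict well without a trusted model but says little about the underlying physics.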
Limitations of the Prototypical Approaches
First-principles approach
Start with a model we trust
Add data to infer model parameters
Machine learning/Data-driven approach
Start with large dataset
Find a model that would have made good predictions
Prototypical Astronomical Problem
Significant astrophysical knowledge
but model is an approximation & incomplete
Significant quantity of data
observing time is limited,
data collection process is complicated, and
constantly pushing limits of capabilities.
How can we integrate these effectively?
What tool(s) are best for our task?
Learning to apply an existing tool to data is (usually) relatively straightforward.
Developing experience to apply tools wisely is more challenging.
Acquiring the breadth of experience to choose the best tools for a task is a lifetime of learning.
Setup & Helper Code
Built with Julia 1.11.5 and
PlutoTeachingTools 0.3.1
PlutoUI 0.7.60
To run this tutorial locally, download this file and open it with Pluto.jl.