Welcome to Astro 416:

Data Science Applications to Astronomy

Week 1

Course Overview

Course Logistics

Instructor: Eric Ford (Davey Lab 408A)

TA: Andrew Pellegrino (Davey Lab 532C)

Class workflow (for a typical week)

  • Mondays: Introduce topic

  • Wednesdays: Start computer lab (bring laptop)

  • Fridays: Discussion (submit question before class)

  • Following Monday: (deadline for previous lab)

  • Watch Canvas for deviations

    • e.g., First lab due Wednesday due to MLK day, etc.

Website

https://psuastro416.github.io/Spring2025/ contains:

  • Syllabus

  • Info for each week's lessons (e.g., goals, readings & links to additional resources)

  • Links & info for labs/homework exercises

  • Info for class project (e.g., explanation, key due dates, rubric)

  • Practical Tips (e.g., step-by-step instructions for accessing Roar Collab)

Canvas

  • Announcements

  • Deadlines for assignments

  • Space to upload some assignments (e.g. project checkpoints)

  • Links to create your personal GitHub repository for each lab

  • Links for TopHat Questions

  • Grades

  • Most info via embedded version of website

    • Canvas doesn't allow some pages to be embedded

    • If you get an error, try following that link in a separate window/tab.

Zoom

  • Backup plan available for classes if I'm sick, there's a snowstorm, etc.

Creating Accounts:

Roar Collab:

  • Penn State's supercomputer (for non-sensitive data)

  • We will use Roar Collab for running labs/homework exercises starting Wednesday (& for class projects)

  • Please request Roar Collab account by end of today.

  • Please create a GitHub account (if you don't have one or want to use a separate account for this class)

  • I'll walk people through the setup steps for Roar Collab and starting labs this Wednesday, so you need your account to be active before class on Wednesday.

TopHat:

  • Mostly for you to submit your questions (e.g., "Muddiest point")

  • Might also include some short questions about reading or previous class.

  • Submit TopHat responses by 9am on day of relevant class.

  • Aim for at least 1 question a week.

  • I'll drop a few from your grade, no questions asked.

  • Create TopHat account in time to submit reading question before class on Friday

    • No need to give them phone number or credit card info.

Textbook & Readings

  • No need to buy a physical textbook.

Several readings from online sources, e.g.:

Safety

  • Please err on side of caution

  • Let me know if you'll be missing several classes for health reasons.

  • You can still submit TopHat questions, labs, etc.

  • Follow university policy. As of January 2025:

Stay home and away from others if you are experiencing fever or respiratory symptoms such as but not limited to cough, sore throat, runny nose, chills, fatigue, headache, body aches. Return to normal activities when, for at least 24 hours, both are true:

  • Your symptoms are getting better overall, AND

  • You have not had a fever (and are not using fever-reducing medication)

Then, take these additional precautions for the next five days to limit the spread of infection:

  • Wear a well-fitting mask

  • Keep a distance from others and/or

  • Get tested to inform your actions to prevent the spread to others

If you begin feeling worse and/or fever returns, stay home and away from others for at least 24 hours until both are true:

  • Your symptoms are getting better overall, and

  • You have not had a fever (and are not using fever-reducing medication)

Academic Integrity

Collaboration

  • Exams/quizzes: No collaboration.

  • Labs:

    • Collaboration with classmates is encouraged.

    • Each student should respond to questions individually.

    • Make liberal use of acknowledgments.

  • Project:

    • Working in teams is strongly encouraged.

    • Present/submit most parts of project jointly.

    • In separate final reports/reflections, each student describes their own contributions to the project accurately and gives credit to teammates for theirs.

Artificial Intelligence (AI)

  • Exams/quizzes: No AI.

  • Limited AI Use: Using an AI-based grammar checker is acceptable and does not need to be disclosed for the labs or project in this course.

  • Disclose AI Use: Otherwise, students must fully disclose any and all use of AI in completing their assignments at the time of submission.

  • Students may receive reduced credit for assignments where AI tools were used. If you're unsure what's appropriate, then ask in advance of submission.

Introduction to Course

What is Data Science?

What do you think?

  • pause

One oversimplified take...

The Data Science Venn Diagram is from Drew Conway and is licensed as Creative Commons Attribution-NonCommercial

This class is not meant to teach:

  • Programming (e.g., CMPSC 121, 122)

  • Numerical Methods (see ASTRO 410)

  • Linear Algebra (see MATH 220)

  • Statistics (e.g., ASTRO 415)

  • Machine Learning (e.g., DS 310)

  • Astronomical Techniques (see ASTRO 451)

  • How to conduct Astronomy Research (e.g., a summer project or thesis)

So what does this class do?

Data Science:

  • Develop Data Acumen[1]

    • Model building

    • Data visualization

    • Reproducible research

  • Practice "soft skills"

    • Technical collaboration

    • Effective visualization

    • Scientific communications

  • Provide taste of Data Science

Data Acumen

[1] "We define data acumen as the ability to make good judgements about the use of data to support problem solutions." (Keller et al. 2020)

Along the way...

You'll learn and/or reinforce foundational concepts that are covered in much more detail in other courses:

  • Programming (build experience, likely a new language)

  • Numerical Methods (e.g., integration, sampling)

  • Linear Algebra (e.g., solving linear systems)

  • Statistics (e.g., likelihood, priors, distributions)

  • Machine Learning (e.g., optimization)

  • Astronomical Techniques (e.g., observational biases)

Course Overview

Students will build practical data science skills (e.g., querying astronomical databases, data storage and manipulation, data visualization, exploratory and explanatory data analysis, Bayesian modeling workflows, and reproducible research practices) and apply these lessons to analyzing data from astronomical surveys.

Goals

  • Increase their data acumen, and

  • Appreciate how building data science skills can benefit astronomy & astrophysics research.

Objectives

  • Ingest and manipulate data from astronomical surveys.

  • Build, apply, assess and update astrophysically motivated models for astronomical observations.

  • Create visualizations for exploratory and explanatory data analyses of observations from astronomical surveys.

  • Synthesize the above into a dashboard to support the efficient analysis of astronomical observations.

  • Incorporate principles of reproducible research into their class project.

Remember to request your Roar Collab account so you can log in on Wednesday!

Philosophy of Data Analysis

What's the goal of an analysis?

Example Goals

  • Test a precisely formulated hypothesis

  • Make predictions with a well-established model

  • Make predictions, but you don't (yet?) have a trustworthy model

  • Refine an existing model with additional data

  • Compare multiple competing models to gain physical insight

  • Compare multiple models to pick one to use for making predictions

  • Take a first look at a new dataset and find something interesting

  • Prioritize what portions of analysis will require the most attention

Different Goals require Different Approaches

Prototypical Approaches

Traditional physics/first-principles approach

  • Start with a model we trust

  • Add data to infer model parameters

Machine learning/Data-driven approach

  • Start with large dataset

  • Find a model that would have made good predictions
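
A minimal sketch of the two prototypical approaches on the same toy dataset. Everything here is an assumption made for illustration, not course material: the data are synthetic (a straight line through the origin plus Gaussian noise), and the helper names `fit_slope` and `knn_predict` are hypothetical. It is written in Python using only the standard library. The first-principles route assumes the functional form y = a * x and infers the one parameter by least squares. The data-driven route assumes no form and predicts by averaging nearby observations.

```python
import random

random.seed(0)

# Synthetic data (assumed for illustration): y = 2.5 * x plus Gaussian noise.
true_slope = 2.5
xs = [0.5 * i for i in range(1, 41)]          # x from 0.5 to 20.0
ys = [true_slope * x + random.gauss(0.0, 0.5) for x in xs]


def fit_slope(xs, ys):
    """First-principles style: trust the model y = a * x, infer a by least squares."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def knn_predict(x_new, xs, ys, k=3):
    """Data-driven style: average the y values of the k observations nearest x_new."""
    nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x_new))[:k]
    return sum(y for _, y in nearest) / k


slope = fit_slope(xs, ys)

x_inside = 10.0    # well inside the sampled range
x_outside = 40.0   # far outside the sampled range

model_inside = slope * x_inside
model_outside = slope * x_outside
knn_inside = knn_predict(x_inside, xs, ys)
knn_outside = knn_predict(x_outside, xs, ys)

print(f"inferred slope: {slope:.3f}")
print(f"x = {x_inside}: model {model_inside:.2f}, nearest-neighbor {knn_inside:.2f}")
print(f"x = {x_outside}: model {model_outside:.2f}, nearest-neighbor {knn_outside:.2f}")
```

Inside the sampled range the two predictions should roughly agree. Far outside it, the nearest-neighbor average can only return values close to the largest observed y, while the parametric prediction is only as good as the assumed form y = a * x.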

Limitations of the Prototypical Approaches

First-principles approach

  • Start with a model we trust

  • Add data to infer model parameters

Machine learning/Data-driven approach

  • Start with large dataset

  • Find a model that would have made good predictions

Prototypical Astronomical Problem

  • Significant astrophysical knowledge

    • but model is an approximation & incomplete

  • Significant quantity of data

    • observing time is limited,

    • data collection process is complicated, and

    • constantly pushing limits of capabilities.

How can we integrate these effectively?

  • What tool(s) are best for our task?

  • Learning to apply an existing tool to data is (usually) relatively straightforward.

  • Developing experience to apply tools wisely is more challenging.

  • Acquiring the breadth of experience to choose the best tools for a task is a lifetime of learning.

Setup & Helper Code

Built with Julia 1.11.5 and

PlutoTeachingTools 0.3.1
PlutoUI 0.7.60

To run this tutorial locally, download this file and open it with Pluto.jl.