Astro 416

Week 6: Q&A

Stack/Heap

Question:

How does a programmer specify whether their variables are stored on the stack or the heap? What programming techniques/constructs allow for this control?

begin
    # Type is Float64, size is fixed and known at compile time
    x_stack = 5.0      
    sum_stack = 0.0
    
    # Type is Vector{Float64}, size is variable
    x_heap = fill(5.0, 1)   
    sum_heap = fill(0.0, 1)
end
1-element Vector{Float64}:
 0.0
function sum_stack!(sum, x, n)
    # `sum` is rebound to a new local value here, so the caller's variable is not modified
    sum = 0
    for i in 1:n sum += x end
    sum
end
sum_stack! (generic function with 1 method)
function sum_sqrt_stack!(sum, x, n)
    sum = 0
    for i in 1:n sum += sqrt(x) end
    sum
end
sum_sqrt_stack! (generic function with 1 method)
function sum_array!(sum, x, n)
    # Writes go through `sum[1]`, so the caller's array is updated in place
    sum[1] = 0
    for i in 1:n sum[1] += x[1] end
    sum[1]
end
sum_array! (generic function with 1 method)
function sum_sqrt_array!(sum, x, n)
    sum[1] = 0
    for i in 1:n sum[1] += sqrt(x[1]) end
    sum[1]
end
sum_sqrt_array! (generic function with 1 method)

Sum of values

Note: in the benchmarks below, the scalar and MVector cases run 1,000,000 iterations while the Vector cases run 10,000,000, so compare time per iteration rather than total time.

@benchmark sum_stack!($sum_stack, $x_stack, 1_000_000)
BenchmarkTools.Trial: 5013 samples with 1 evaluation per sample.
 Range (min … max):  925.253 μs …   6.529 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     940.210 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   984.911 μs ± 356.939 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                                              
  ██▇▇▇▄▄▃▁▃▁▃▁▁▁▄▁▃▃▁▁▄▁▁▃▁▁▁▁▃▁▃▁▄▃▃▁▁▁▃▁▃▃▃▁▃▁▃▄▃▁▁▁▃▁▁▁▁▁▄▃ █
  925 μs        Histogram: log(frequency) by time       3.84 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark sum_array!($sum_heap, $x_heap, 10_000_000)
BenchmarkTools.Trial: 256 samples with 1 evaluation per sample.
 Range (min … max):   9.430 ms … 42.497 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.488 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   19.534 ms ±  4.624 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂                  █      ▄            ▁                     
  █▄▁▄▁▁▄▁▁▁▁▄▁▇▁▆▁▄▅█▇▁▁▁▅▄█▄▇▁▁▄▇▄▁▁▄▇▄█▁▁▅▄▁▄▁▁▁▁▁▁▁▁▁▁▁▁▄ ▅
  9.43 ms      Histogram: log(frequency) by time      36.5 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Sum of sqrt's

@benchmark sum_sqrt_stack!($sum_stack, $x_stack, 1_000_000)
BenchmarkTools.Trial: 2666 samples with 1 evaluation per sample.
 Range (min … max):  925.272 μs … 10.001 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     946.922 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.862 ms ±  1.476 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                             ▆                               
  █▆▄▃▄▁▄▃▁▃▆▄▅▃▄▃▃▁▄▄▇▁▃▅▃▄▄▅▅▅█▃▅▅▆▄▁▁▁▁▇▁▃▁▁▁▁▁▁▃▅▁▁▁▁▁▁▃▁▆ █
  925 μs        Histogram: log(frequency) by time      6.95 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark sum_sqrt_array!($sum_heap, $x_heap, 10_000_000)
BenchmarkTools.Trial: 99 samples with 1 evaluation per sample.
 Range (min … max):  28.239 ms … 77.458 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     55.384 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   50.593 ms ± 11.857 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆           ▁                       █   ▇▂                   
  █▁▁▅▅▁▁▁▅▅▁▅█▁▁▁▅▁▁▅▇▁▁▁▅▁▁▁▁▅▁█▅▅▁██▇▇▅██▇▁▅▁▅▁▇▁▅▁▁▁▁▁▁▁▅ ▁
  28.2 ms      Histogram: log(frequency) by time      71.8 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Using a fixed-size array (via StaticArrays.jl)

begin
    x_fixed_size = MVector{3,Float64}(5.0,6.0,7.0)
    sum_fixed_size = MVector{1,Float64}(0.0,)
end
1-element MVector{1, Float64} with indices SOneTo(1):
 0.0
@benchmark sum_array!($sum_fixed_size, $x_fixed_size, 1_000_000)
BenchmarkTools.Trial: 3499 samples with 1 evaluation per sample.
 Range (min … max):  925.773 μs … 6.959 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     946.052 μs             ┊ GC (median):    0.00%
 Time  (mean ± σ):     1.420 ms ± 1.139 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                                           ▅                
  ██▅▅▄▃▁▃▃▄▃▅▄▃▃▆▄▃▃▄▄▁▃▁▃▃▃▄▁▆▁▄▃▅▅▅▄▅▄▄▅▄▅▄█▄▃▁▁▃▁▃▁▁▁▁▁▃▄ █
  926 μs       Histogram: log(frequency) by time      4.95 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
@benchmark sum_sqrt_array!($sum_fixed_size, $x_fixed_size, 1_000_000)
BenchmarkTools.Trial: 1114 samples with 1 evaluation per sample.
 Range (min … max):  2.789 ms … 17.471 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     3.144 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.482 ms ±  2.194 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █        ▂       ▂  ▆                                    ▁  
  ████▇█▆▇██▆▄▁▇▆▄▅█▄▄██▆█▄▆▆▇▄▄▆▆▄▆▁▄▄▇▄▅▄▄▄▁▄▄▅▁▄▄▄▆▄▁▄▄▄█ █
  2.79 ms      Histogram: log(frequency) by time     11.5 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.
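
In Julia the programmer chooses a type rather than a storage location, and the compiler decides where each value actually lives. A minimal sketch of how to ask whether a type is "plain data" (and so eligible for stack or register storage), using Base's `isbitstype` and `ismutabletype`. The two struct names are hypothetical and not part of the lab; treat the results as a guide, not a guarantee about where a particular value is stored.

```julia
# Hypothetical types for illustration (not from the lab).
struct PointImmutable        # immutable, and every field is plain bits
    x::Float64
    y::Float64
end

mutable struct PointMutable  # mutable: instances are generally heap-allocated
    x::Float64
    y::Float64
end

println(isbitstype(Float64))             # true
println(isbitstype(NTuple{3,Float64}))   # true
println(isbitstype(PointImmutable))      # true
println(isbitstype(PointMutable))        # false
println(isbitstype(Vector{Float64}))     # false
println(ismutabletype(PointMutable))     # true
```

Scalars, tuples, and immutable structs of such fields are the usual ways to keep small, fixed-size data off the heap, whereas `Vector` has a variable size and is heap-allocated.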

Lab

Question:

Could you please explain why the benchmark numbers at the start of the lab keep changing so much, and by such large margins? If these numbers are not static, how should we go about calculating and estimating the read and write times?
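
Timings vary from run to run because of other processes on the machine, operating-system file caching, and (on the first call) compilation, so a single measurement is rarely meaningful. A minimal sketch, assuming only Base and the Statistics standard library: repeat the measurement and summarize the samples. The helper name `time_samples` and the test data are made up for illustration.

```julia
using Statistics  # standard library, provides `median`

# Hypothetical helper: run `f` several times and summarize the timings (in seconds).
function time_samples(f; nsamples = 20)
    f()  # warm-up call so that compilation time is not included
    times = [@elapsed(f()) for _ in 1:nsamples]
    return (min = minimum(times), median = median(times), max = maximum(times))
end

data = rand(100_000)   # made-up test data (about 0.8 MB of Float64 values)
path = tempname()      # path for a temporary file

write_stats = time_samples(() -> write(path, data))
read_stats  = time_samples(() -> read(path))
rm(path)               # clean up the temporary file

println("write: ", write_stats)
println("read:  ", read_stats)
```

The minimum approximates the best case with little interference, and the median is a more typical value. BenchmarkTools reports both, in the `Range` and `Time (median)` lines of its output.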

File formats

Question:

Besides CSV, JLD2, and FITS, are there any other popular/commonly used file formats in physics/astro/data science?

Question:

Are there many others that can streamline different data types?

Question:

How can HDF5/JLD2 files have the same size as CSV files if they also store all the metadata info?
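
One reason is that text is a less compact encoding of floating-point numbers than binary, which can offset the space taken by metadata. A rough sketch using only Base (it does not call the HDF5 or JLD2 packages, and the random data are chosen purely for illustration):

```julia
n = 1_000
samples = rand(n)   # made-up Float64 data

# Binary storage: each Float64 takes exactly 8 bytes.
binary_bytes = sizeof(samples)

# Text storage: count the characters of each printed number, plus one
# separator (comma or newline) per value, as a CSV file would need.
text_bytes = sum(ncodeunits(string(v)) + 1 for v in samples)

println("binary: ", binary_bytes, " bytes")
println("text:   ", text_bytes, " bytes")
```

For full-precision values the text form needs noticeably more bytes than the binary form, so a binary file has room for metadata before it exceeds the CSV size. Data with few digits (for example, small integers) shifts the balance toward CSV.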

Queries

Question:

What’s the best practice in Julia for handling errors or timeouts when querying an ADQL service? 

Should I use Julia’s try-catch blocks, or are there any more efficient methods for managing these cases?
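
A `try`/`catch` block is the standard tool here, and its overhead is negligible next to a network request. A minimal sketch of retrying a failed query a fixed number of times: `flaky_query` is a hypothetical stand-in that simulates two failures and makes no real request, and `query_with_retries` is an illustrative name, not part of any package. The timeout itself is usually an option of whichever HTTP client performs the request.

```julia
# Hypothetical stand-in for a network query: fails twice, then succeeds.
const attempts = Ref(0)
function flaky_query()
    attempts[] += 1
    attempts[] < 3 && error("simulated timeout")
    return "query result"
end

# Call `query()`, retrying after a short pause if it throws.
function query_with_retries(query; max_attempts = 3, wait_seconds = 0.1)
    for attempt in 1:max_attempts
        try
            return query()
        catch
            # Give up after the last attempt and pass the error to the caller.
            attempt == max_attempts && rethrow()
            @warn "Query failed; retrying" attempt
            sleep(wait_seconds)
        end
    end
end

result = query_with_retries(flaky_query)
println(result)
```

Rethrowing on the final attempt keeps persistent failures visible instead of silently returning nothing.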


Exam

Pick from list of practice exam questions

Project

Question:

How do we choose how much data to download at first if there’s a lot of data but we don’t want to lose too much information?

  • Start small for speed.

  • When you realize you need more, increase it.

  • Compare results when you double or halve the data size.

  • When selecting the top rows, make sure that the order of the data doesn't create a bias.
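
The last two points above can be sketched with made-up data, assuming only the Random and Statistics standard libraries. The sorted column is a hypothetical stand-in for a catalog that comes back in some systematic order.

```julia
using Random, Statistics  # standard libraries

rng = MersenneTwister(42)

# Made-up catalog column, stored in sorted order (as a query result might be).
column = sort(randn(rng, 100_000))

n = 1_000
top_rows     = column[1:n]                           # first n rows: biased by the ordering
shuffled     = column[randperm(rng, length(column))]
small_sample = shuffled[1:n]                         # random subset of size n
large_sample = shuffled[1:2n]                        # doubled sample size

println("mean of top rows:  ", mean(top_rows))
println("mean of random n:  ", mean(small_sample))
println("mean of random 2n: ", mean(large_sample))
```

If the statistic you care about agrees between the `n` and `2n` random samples to within the scatter you can tolerate, the smaller download is probably adequate for that statistic. The top rows of an ordered table can be far off, as the first mean shows.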

Question:

How much time should we spend optimizing the dashboard for speed and memory?

If things run in a few seconds, then the only reason to optimize is to gain experience doing so.  If it takes 30 minutes, then there's more motivation to look for opportunities to improve performance.

Setup

Built with Julia 1.11.5 and

BenchmarkTools 1.6.0
PlutoTeachingTools 0.3.1
PlutoUI 0.7.61
StaticArrays 1.9.11

To run this tutorial locally, download this file and open it with Pluto.jl.