from IPython.display import Image
import astroemperor as emp
import numpy as np
np.random.seed(1234)
sim = emp.Simulation()
sim.load_data('51Peg') # folder read from /datafiles/
I couldnt grab the terminal size. Trying with pandas... Terminal size with pandas successful! ~~ Simulation Successfully Initialized ~~ Reading data from 51peg.vels
General Options¶
We take a deeper look into the Emperor options, starting with parallelisation, and sampler backend.
Parallelisation¶
Parallelisation can be done with several different libraries. The available options are:
multiprocess_method: To change the parallelisation scheme.
cores__: Misnomer for how many threads (and not cores) to use.
| Code | Library | Pool |
|---|---|---|
| 0 | None | None |
| 1 | multiprocessing | Pool |
| 2 | multiprocess | Pool |
| 3 | multiprocessing | ThreadPool |
| 4 | pathos | ProcessingPool |
| 5 | schwimmbad | SerialPool |
| 6 | schwimmbad | JoblibPool |
| 7 | schwimmbad | MultiPool |
If you want a custom parallelisation scheme, you can add the script directly into the astroemperor/support/pools folder, as a file named 08.pool, and then use it directly in EMPEROR with
multiprocess_method = 8.
Sampler backend¶
reddemcee has two different backends. PTBackend is the default, working exclusively with RAM, and HDFBackend, stores the chain in an HDF5 file using h5py, saving it there step-by-step.
The first one is faster, but requires a higher RAM usage.
The second one is safer and less memory-hungry. To use the h5 backend, simply change the backend_bool attribute to True.
sim.multiprocess_method = 1 # multiprocessing Pool
sim.cores__ = 12 # threads for the run
sim.backend_bool = False # True for h5py backend
Engine configuration¶
We add set_engine with some custom options. We will use a different starting temperature ladder. A good initial ladder increases the efficiency of the adaptation, in the same way a good initial guess for the parameters increases the convergence time.
We will use a different adaptation algorithm, based on the specific heat of the system. We also change the adaptation rate, and decay timescale.
ntemps, nwalkers, nsweeps, nsteps = 11, 128, 3000, 1
sim.set_engine('reddemcee')
sim.engine_config['setup'] = [ntemps, nwalkers, nsweeps, nsteps]
sim.engine_config['betas'] = list(np.linspace(1, 0, ntemps))
sim.engine_config['tsw_history'] = True # save temperature swaps per sweep
sim.engine_config['smd_history'] = True # save swap mean distance per sweep
sim.engine_config['adapt_tau'] = 100
sim.engine_config['adapt_nu'] = 1.5
sim.engine_config['adapt_mode'] = 'SAR' # uniform Swap Acceptance Rate
Run Configuration¶
Here we can add keywords for the engine running. For reddemcee, we have:
burnin, adaptation_batches, adaptation_nsweeps, thin, discard, and logger_level.
We will set up a 10% burn-in phase.
sim.run_config['burnin'] = 0.1
Model Comparison¶
In our example, instead of comparing BIC we will compare Evidences directly. We can change the evidence estimation method between Curvature-aware Thermodynamic Integration (TI+), Geometric-Bridge Stepping Stones (SS+), and a Hybrid algorithm (for more details see the reddemcee paper).
sim.set_comparison_criteria('Evidence')
sim.set_tolerance(10) # difference between models
sim.evidence_method = 'ss' # ti, ss, hybrid
Comparison Criteria set to Evidence Comparison Criteria Tolerance set to 10
Model Setup¶
We feed the name of the instrument (optional), as well as the stellar mass for calculating the minimum-mass and semi-major axis, and stellar mass error (optional). We will use the Keplerian parameterisation $(P, K, \phi, e, \omega)$.
We add some boundaries and initial positions to speed up the process:
sim.instrument_names_RV = ['LICK']
sim.starmass = 1.12 # in solar masses
sim.starmass_err = 0.04
sim.keplerian_parameterisation = 0
sim.add_condition(['Period 1', 'limits', [3, 5]])
sim.add_condition(['Amplitude 1', 'limits', [45, 60]])
sim.add_condition(['Eccentricity 1','limits', [0, 0.5]])
sim.add_condition(['Offset 1', 'limits', [-10., 10.]])
sim.add_condition(['Period 1', 'init_pos', [4.1, 4.3]])
sim.add_condition(['Amplitude 1', 'init_pos', [50, 60]])
sim.add_condition(['Eccentricity 1','init_pos', [0, 0.1]])
Plotting Options¶
We add some plotting options to speed up this test a little. We will only plot the posteriors for the cold chain, and two intermediate chains. Also, we won't use the arviz optional plots.
sim.plot_posteriors['temps'] = [0]
sim.plot_trace['plot'] = False
sim.plot_gaussian_mixtures['plot'] = False
Run and Results¶
Finally, we run our simulation (it will take some minutes):
sim.autorun(0, 1)
Offset block added, OffsetBlock Jitter block added, JitterBlock Condition applied: Parameter Offset 1 attribute limits set to [-10.0, 10.0] ~~ Setup Info ~~ Current Engine is reddemcee 0.9.12 Number of cores is 12 Save location is datalogs/51Peg/run_5/k0 Dynamical Criteria is None Posterior fit method is Gaussian Mixtures Limits constrain method is range Model Selection method is Evidence ~~ Automatically Saving ~~ Logger : ✔ Samples : ✘ Posteriors : ✔ Likelihoods : ✔ Plots: Posteriors : ✔ Plots: Keplerian Model : ✔ Plots: Gaussian Mixture : ✘ Plots: Parameter Histograms : ✔ Plots: Corner : ✔ ~~ Pre-Run Info ~~ Parameter Prior Limits ----------- ------------------ ------------- Offset 1 ~𝓤 (-10.0, 10.0) [-10, 10] ----------- ------------------ ------------- Jitter 1 ~𝓝 (5, 5) [0, 75.852] Math for OffsetBlock: γ₀|ᵢ Math for JitterBlock: 𝝈ᵢ Generating Samples
100%|██████████| 33000/33000 [03:01<00:00, 182.21it/s]
temp_script.py took 181.242 seconds
Autocorrelation tolerance=50 fails. Setting to 0.
Calculating Gaussian Mixtures
100%|██████████| 2/2 [00:01<00:00, 1.87it/s]
~~ Best Fit ~~ Parameter Value(max) Range(±σ) Prior Limits ----------- ------------ ------------------ ------------------ ------------- Offset 1 -0.147 [-1.166, 0.966] ~𝓤 (-10.0, 10.0) [-10, 10] ----------- ------------ ------------------ ------------------ ------------- Jitter 1 36.06 [35.398, 36.706] ~𝓝 (5, 5) [0, 75.852] ~~ Run Info ~~ Info Value ----------------------------------- ---------------------------------------------------------------------------------------- Star Name : 51Peg The sample sizes are : [345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600] Temps, Walkers, Sweeps, Steps : [11, 128, 3000, 1] Model used is : ['OffsetBlock', 'JitterBlock'](2) N data : 256 t0 epoch is : 2450002.665695 Number of Dimensions : 2 Degrees of Freedom : 254 ---------------------------------------- Decay Timescale, Rate, Scheme : 100, 1.5, SAR Beta Detail : [1.0, 0.5343, 0.2997, 0.1675, 0.09302, 0.0512, 0.0259, 0.01163, 0.004531, 0.001202, 0.0] Mean Logl Detail : [-1309.583, -1313.134, -1319.912, -1332.029, -1353.087, -1389.564, -1453.304, -1568.904, -1799.818, -2371.170, -3811.617] Mean Acceptance Fraction : [0.712, 0.713, 0.714, 0.712, 0.710, 0.705, 0.698, 0.691, 0.682, 0.677, 0.657] Autocorrelation Time : [15.916, 18.314] Temperature Swap Rate : [0.396, 0.396, 0.396, 0.396, 0.396, 0.396, 0.396, 0.396, 0.396, 0.397] Mean Swap Distance : [1.342, 1.676, 2.008, 2.269, 2.486, 2.635, 2.776, 2.864, 2.978, 3.133] ~~ Statistical Details ~~ Statistic Value ---------------------------- ------------------ Evidence ln(Z) : -1330.415 +- 0.030 Posterior ln(P) (max) : -1333.444 Likelihood ln(L) (max) : -1308.798 Bayes Factor : 21.617 BIC : 2628.687 AIC : 2621.596 DIC : 2632.175 HQIC : 2624.448 χ² : 302.729 Reduced χ² : 1.192 RMSE : 39.917 RMSi : [39.917] Weights (RMS) : [1.] Plotting Posterior Scatter Plot
100%|██████████| 2/2 [00:02<00:00, 1.18s/it]
Plotting Histograms Plot
100%|██████████| 22/22 [00:05<00:00, 3.70it/s]
Plotting Keplerian Models Plotting E[log L](beta) Plot
100%|██████████| 1/1 [00:00<00:00, 7.26it/s]
Plotting Beta Density
100%|██████████| 1/1 [00:00<00:00, 5.30it/s]
Plotting Temperature Rates
100%|██████████| 10/10 [00:01<00:00, 9.43it/s]
Time Table Time RUN : 00:03:03 Time POSTPROCESS : 00:00:02 Time CALCULATE GM : 00:00:01 Time plot_posteriors : 00:00:02 Time plot_histograms : 00:00:06 Time plot_keplerian_model : 00:00:00 Time plot_betas : 00:00:00 Time plot_beta_density : 00:00:00 Time plot_rates : 00:00:01 Time plot_trace : 00:00:00 Time plot_astrometric_model: 00:00:00 present Evidence - past Evidence > 5 Evidence condition met!! -1330.415 - -inf > 10 ~~ Proceeding with the next run ! ~~ Keplerian block added, KeplerianBlock 1 Condition applied: Parameter Period 1 attribute limits set to [3, 5] Condition applied: Parameter Period 1 attribute init_pos set to [4.1, 4.3] Condition applied: Parameter Amplitude 1 attribute limits set to [45, 60] Condition applied: Parameter Amplitude 1 attribute init_pos set to [50, 60] Condition applied: Parameter Eccentricity 1 attribute limits set to [0, 0.5] Condition applied: Parameter Eccentricity 1 attribute init_pos set to [0, 0.1] Condition applied: Parameter Jitter 1 attribute limits set to [1e-05, 39.01019232281867] ~~ Pre-Run Info ~~ Parameter Prior Limits ---------------- ------------------ ------------ Period 1 ~𝓤 (3, 5) [3, 5] Amplitude 1 ~𝓤 (45, 60) [45, 60] Phase 1 ~𝓤 (0.0, 6.283) [0, 6.283] Eccentricity 1 ~𝓝 (0.0, 0.1) [0, 0.5] Longitude 1 ~𝓤 (0.0, 6.283) [0, 6.283] ---------------- ------------------ ------------ Offset 1 ~𝓤 (-10.0, 10.0) [-10, 10] ---------------- ------------------ ------------ Jitter 1 ~𝓝 (5.0, 5.0) [0, 39.01] Math for KeplerianBlock 1: K⋅(cos(ν(t,P,𝜙,e)+𝜔 )+e⋅cos(𝜔 ))|₁ Math for OffsetBlock: γ₀|ᵢ Math for JitterBlock: 𝝈ᵢ Generating Samples
100%|██████████| 33000/33000 [03:08<00:00, 175.20it/s]
temp_script.py took 188.531 seconds
Autocorrelation tolerance=50 fails. Setting to 0.
Calculating Gaussian Mixtures
100%|██████████| 9/9 [00:07<00:00, 1.27it/s]
~~ Best Fit ~~ Parameter Value(max) Range(±σ) Prior Limits ---------------- ------------ ------------------ ------------------ ------------ Period 1 4.231 [4.231, 4.231] ~𝓤 (3, 5) [3, 5] Amplitude 1 56.038 [55.665, 56.241] ~𝓤 (45, 60) [45, 60] Phase 1 1.91 [1.026, 2.408] ~𝓤 (0.0, 6.283) [0, 6.283] Eccentricity 1 0.013 [0, 0.022] ~𝓝 (0.0, 0.1) [0, 0.5] Longitude 1 0.825 [0.341, 1.718] ~𝓤 (0.0, 6.283) [0, 6.283] ---------------- ------------ ------------------ ------------------ ------------ Offset 1 3.767 [3.53, 3.942] ~𝓤 (-10.0, 10.0) [-10, 10] ---------------- ------------ ------------------ ------------------ ------------ Jitter 1 2.933 [2.882, 3.557] ~𝓝 (5.0, 5.0) [0, 39.01] ~~ Run Info ~~ Info Value ----------------------------------- ---------------------------------------------------------------------------------------- Star Name : 51Peg The sample sizes are : [345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600, 345600] Temps, Walkers, Sweeps, Steps : [11, 128, 3000, 1] Model used is : ['KeplerianBlock 1', 'OffsetBlock', 'JitterBlock'](7) N data : 256 t0 epoch is : 2450002.665695 Number of Dimensions : 7 Degrees of Freedom : 249 ---------------------------------------- Decay Timescale, Rate, Scheme : 100, 1.5, SAR Beta Detail : [1.0, 0.458, 0.2156, 0.1012, 0.04745, 0.02049, 0.01, 0.005533, 0.002423, 0.0006743, 0.0] Mean Logl Detail : [-872.942, -876.824, -884.601, -901.001, -933.508, -1001.004, -1192.199, -1545.082, -2097.690, -3240.642, -6375.028] Mean Acceptance Fraction : [0.222, 0.208, 0.210, 0.214, 0.217, 0.207, 0.160, 0.176, 0.225, 0.255, 0.249] Autocorrelation Time : [60.152, 60.531, 106.100, 73.929, 97.969, 64.210, 65.563] Temperature Swap Rate : [0.382, 0.379, 0.376, 0.372, 0.366, 0.359, 0.354, 0.351, 0.349, 0.346] Mean Swap Distance : [1.402, 1.549, 1.786, 2.182, 2.810, 3.516, 4.045, 3.943, 3.866, 3.808] ~~ Statistical Details ~~ Statistic Value ---------------------------- ----------------- Evidence ln(Z) : -932.838 +- 0.588 Posterior ln(P) (max) : -879.972 Likelihood ln(L) (max) : -869.527 Bayes Factor : 63.311 BIC : 1777.869 AIC : 1753.053 DIC : 35315.783 HQIC : 1763.034 χ² : 269.670 Reduced χ² : 1.083 RMSE : 7.620 RMSi : [7.62] Weights (RMS) : [1.] Plotting Posterior Scatter Plot
100%|██████████| 3/3 [00:10<00:00, 3.41s/it]
Plotting Histograms Plot
36%|███▋ | 12/33 [00:07<00:09, 2.10it/s]
Failed to plot the Period 1 histogram
45%|████▌ | 15/33 [00:08<00:08, 2.15it/s]
Failed to plot the Period 1 histogram
100%|██████████| 33/33 [00:19<00:00, 1.72it/s]
Plotting Keplerian Models Plotting E[log L](beta) Plot
100%|██████████| 1/1 [00:00<00:00, 7.23it/s]
Plotting Beta Density
100%|██████████| 1/1 [00:00<00:00, 5.16it/s]
Plotting Temperature Rates
100%|██████████| 10/10 [00:01<00:00, 9.03it/s]
Time Table Time RUN : 00:03:10 Time POSTPROCESS : 00:00:10 Time CALCULATE GM : 00:00:07 Time plot_posteriors : 00:00:10 Time plot_histograms : 00:00:19 Time plot_keplerian_model : 00:00:00 Time plot_betas : 00:00:00 Time plot_beta_density : 00:00:00 Time plot_rates : 00:00:01 Time plot_trace : 00:00:00 Time plot_astrometric_model: 00:00:00 present Evidence - past Evidence > 5 Evidence condition met!! -932.838 - -1330.415 > 10 ~~ End of the Run ~~
We could also recompute the evidences under different conditions:
z_ti = np.round(sim.sampler.get_evidence_ti(discard=sim.reddemcee_discard)[:2], 3)
z_ss = np.round(sim.sampler.get_evidence_ss(discard=sim.reddemcee_discard), 3)
z_hy = np.round(sim.sampler.get_evidence_hybrid(discard=sim.reddemcee_discard), 3)
print(f'Thermodynamic Integration logZ = {z_ti[0]} ± {z_ti[1]}')
print(f'Stepping Stones logZ = {z_ss[0]} ± {z_ss[1]}')
print(f'Hybrid Method logZ = {z_hy[0]} ± {z_hy[1]}')
Thermodynamic Integration logZ = -899.996 ± 4.936 Stepping Stones logZ = -1002.487 ± 1.108 Hybrid Method logZ = -999.584 ± 3.739
Remember to inspect if the temperatures have settled, and discard the part that hasn't!
Note: In this scenario, we need longer chains for a reliable evidence estimate. As an exercise we can see by eye that the temperatures have "stabilised" around 1000 steps, we can try droping those samples from the evidence estimation, and recalculate:
loc = sim.saveplace
Image(filename=f'{loc}/plots/betas/rates.png')
z_ti = sim.sampler.get_evidence_ti(discard=1000)
z_ss = sim.sampler.get_evidence_ss(discard=1000)
z_hy = sim.sampler.get_evidence_hybrid(discard=1000)
print(f'Thermodynamic Integration logZ = {z_ti[0]:.3f} ± {z_ti[1]:.3f}')
print(f'Stepping Stones logZ = {z_ss[0]:.3f} ± {z_ss[1]:.3f}')
print(f'Hybrid Method logZ = {z_hy[0]:.3f} ± {z_hy[1]:.3f}')
Thermodynamic Integration logZ = -894.893 ± 3.226 Stepping Stones logZ = -895.510 ± 0.130 Hybrid Method logZ = -895.667 ± 1.703
Which look fairly similar between them!
Now we can explore the results in the /datalogs/51Peg/run_1/ folder:
Image(filename=f'{loc}/plots/models/KeplerianBlock_1.png')