- Compiling & Running
**./input/MCMC_inputs/input_Henriques15_mcmc_MR_W1_PLANCK.par**: MCMC sampling for the standard Henriques2015 model using a sample of trees from the Millennium simulation.

**./input/MCMC_inputs/input_Henriques15_mcmc_MR_plus_MRII_W1_PLANCK.par**: MCMC sampling for the standard Henriques2015 model using a sample of trees from the Millennium and Millennium II simulations. This allows the model to be constrained over a much broader dynamical range.

- Quick Guide to include more Parameters or Observational Constraints:
- Makefile & Input Options
**MCMCParameterPriorsAndSwitches**: this file lists the parameters sampled in the MCMC. It contains priors and a switch for each parameter.

**MCMCStartingParFile**: file containing starting values for all parameters to be sampled. Parameters are only read from this file **if the output from previous runs is not available** at ./output/senna_g*_**.txt. The number of values in this file must correspond to the number of parameters with switch=1. If previous output is available at ./output/senna_g*_**.txt, the starting values are read from those files and set to the last available step. Therefore, when re-starting in MCMC mode, the code continues from where it stopped.

**MCMCObsConstraints**: file containing the names of the observational constraints to be used. The first number in the file is the maximum number of observational constraints available (it must be the same as MCMCNConstraints in mcmc_vars.h). It is followed by the number of redshifts for which constraints are available and the values of those redshifts. For each observational constraint a type of statistical test is listed, in addition to a switch for each available redshift.

**MCMCWeightsObsConstraints**: file specifying a weight for each observational constraint at each redshift.

**ObsConstraintsDir**: folder containing the data for the observational constraints to be used.

**ChainLength**: number of steps for each chain.

**Time_Dependent_PhysPar**: switch to allow parameter values to vary between desired output redshifts.

**MCMCMode**: 0 for normal MCMC; 1 to only accept parameter sets that give a higher likelihood, which finds the maximum more quickly (but is likely to get stuck in a local maximum if a small number of chains is used).

**MCMC_LogStep_Size**: size of the log-normal step. Adjust it to obtain a final acceptance rate between 10 and 40%.

**MCMC_Initial_Par_Displacement**: initial displacement applied to all parameters in log-normal space. Ideally this is large, so that each chain starts in a very different place; in practice it cannot be too large, as that causes many chains to wander into zero-likelihood regions.

**MCMC_Minimum_Obs_Error**: forces a minimum error on the observational constraints (a value of 10% is probably a good idea).

**MCMCSampleDir**: directory for the sample file.

**MCMCSampleFilePrefix**: prefix of the sample file.

**MCMCSampleFile**: number of trees in the sample file.

**MCMCTreeSampleFile**: name of the file containing the representative sample of dark matter merger trees.

- Analysing the Output
- MCMC in depth
**mcmc.c**: file containing most of the functions that control the MCMC. These include a main function, senna(), that controls the sampling, calls the galaxy formation model, sam(), and analyses the returned likelihood. It also contains functions to read information about the observational constraints and the dark matter sample of trees, and to propose new values for the parameters at each step.

**mcmc_likelihood.c**: file containing the functions that bin the model galaxies and perform the tests comparing the model with the different observational constraints.

**mcmc_save.c**: file containing the function that passes the properties needed to compare the model with observations from the structure inside the galaxy formation model to the structure used by the MCMC, MCMC_GAL (doing the necessary conversion of units). **At the moment the MCMC works with units that include \(h\) factors**.

**mcmc_vars.h**: all variables related to the MCMC are defined here, including the galaxy structure used by the MCMC: MCMC_GAL.

- Halo Model
**mcmc_halomodel.c**: contains all the functions of the clustering estimator, including those used to calculate the HOD, satellite profile and resulting clustering functions.

**mcmc_halomodel.h**: contains definitions for all functions in mcmc_halomodel.c.

**mcmc_mpfit.c**: MPFIT routine, based on MINPACK-1, taken directly from Craig Markwardt's page.

**mcmc_mpfit.h**: the header for MPFIT, also taken from the same page.

In order to perform the MCMC sampling the code should be compiled
using My_Makefile_options_MCMC. To do so, replace
"My_Makefile_options"
with "My_Makefile_options_MCMC"
in lines 23 and 40 of the Makefile. Then simply run the executable
generated, using an mcmc input.par file as a run-time argument.
MCMC input files have "mcmc" in the name and follow the same naming
convention as other input files.

Two input files are currently available:

**There are also bash scripts available to run the code in parallel,
in ./AuxCode/Run/, e.g. cosma.bash**. After the burn-in phase it is
equivalent to have a single MCMC chain with many steps or many
chains with a few steps each. Since the galaxy formation model takes a
relatively long time to run (from 5 to 10 minutes), it is better to run
shorter chains on multiple cores. Typically, one can reach convergence
after 2 days on 100 cores for 10 parameters.

A fundamental difference between running the code in 'MCMC' or
'normal' mode is the output produced. In 'normal mode', the code
runs once in each dark matter sub-volume and one galaxy catalogue is
produced per sub-volume. In 'MCMC mode', the code runs on a
representative set of dark matter merger trees, a large number
of times, and the output is a single file (per processor) with a
list of parameter values and a corresponding likelihood. If the code
is **not** run in parallel, a file is also written to the output
directory with the comparison between the model and each
observational constraint. This can be used for debugging.

If you are just starting to use the MCMC, you will likely want to constrain different parameters in the model with different observational constraints. It is recommended that you start with what is currently available, which you can easily select via the parameter switches in ./input/MCMC_inputs/MCMCParameterPriorsAndSwitches.txt, the observational constraint switches in ./input/MCMC_inputs/MCMCObsConstraints.txt and the observational constraint weights in ./input/MCMC_inputs/MCMCWeightsObsConstraints.txt. All of these are described in the next section and allow you to select any combination of parameters to be sampled, and any combination of pre-defined constraints, with different weights, to be used at any redshift.

Makefile options for the MCMC are simple. Once the Makefile
has been changed to compile using My_Makefile_options_MCMC, which
has **OPT += -DMCMC** switched on, that option, in combination with the right input
file, switches the code into MCMC mode. In addition, there is the
option to use a sample with both Millennium and Millennium II merger
trees: **OPT += -DMR_PLUS_MRII** (which requires the appropriate
input file).

When OPT += -DMCMC is selected an additional OPT += -DHALOMODEL option is available. This will compute satellite profiles using a halo model, allowing the correlation function of galaxies to be accurately calculated from a very small subset of dark matter haloes. This makes it possible to use clustering measurements as observational constraints in the MCMC sampling. Further details are given in section Halo Model below.

A large number of additional inputs are needed in MCMC mode.

**A parameter is only used in the sampling if its switch is set to 1**. For parameters with switch=1, the values are not read from input_mcmc_***.par.

**Set MCMC_Initial_Par_Displacement=0 when re-starting**, so that the chains continue exactly from where they stopped.

The number of redshifts for which constraints are available **must be smaller than NOUT**, and it is followed in MCMCObsConstraints by the values of those redshifts.

**These redshifts must correspond to the outputs of the galaxy formation model listed in ./input/MCMC_inputs/desired_output_redshifts_mcmc.txt**. For each observational constraint a type of test is listed, in addition to a switch for each available redshift.

**There must be a file in this directory (ObsConstraintsDir) corresponding to each constraint and redshift listed in MCMCObsConstraints, e.g. StellarMassFunction_z0.10.txt**. The files contain binned values in a standard format:

**x_bin_low, x_bin_high, y_value, err_y_value**, with the first line in each file containing the number of bins.
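For illustration, a minimal Python sketch of a reader for files in this format (the two-bin values below are invented for the example):

```python
def read_constraint(lines):
    """Parse a binned observational constraint in the standard format:
    the first line holds the number of bins, then one line per bin
    with x_bin_low, x_bin_high, y_value, err_y_value."""
    it = iter(lines)
    nbins = int(next(it).split()[0])
    return [tuple(map(float, next(it).split())) for _ in range(nbins)]

# Invented two-bin snippet in the StellarMassFunction_z0.10.txt layout:
sample = ["2", "9.0 9.5 -2.1 0.1", "9.5 10.0 -2.4 0.15"]
smf_bins = read_constraint(sample)
```

Each entry is then an (x_bin_low, x_bin_high, y_value, err_y_value) tuple, ready to be fed to a likelihood test.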

In MCMC mode, in order to run the galaxy formation model in a reasonable time, a representative set of dark matter merger trees is used instead of the full volume of the N-body simulations. Therefore, there is a dark matter tree file containing the set of pre-selected trees (see Appendix B of Henriques et al., 2013, MNRAS, 431, 3373) and an MCMCSampleFile containing the IDs and weights of the FOF groups to be selected from those trees.

The name of the sample file containing the IDs and weights follows the pattern: 'MCMCSampleFilePrefix'_sample_allz_nh_'MCMCSampleFile'snapnum.txt.
For each core, an output file is written into the output directory:
**senna_gNN_ii.txt**, where NN is the last character of the root
directory of the model and ii is the core number. In each of these
files, one line is printed per MCMC step, containing:
\(\rm{weight}\ (\rm{normally}=1),\ -\log_{10}(\rm{likelihood}),\ \log_{10}(P_1),
\log_{10}(P_2), \ldots, \log_{10}(P_n)\).

The output can be analysed using the public COSMO_MC package, which contains a number of very useful tests. Alternatively, mcmc_read_chains.pro contains simple IDL scripts to produce plots of 1D marginalised regions and of some parameter efficiencies.

**"sort -r -nk2 ./output/senna*" will sort all the output steps of
the MCMC, giving the parameters for the highest likelihood value last.**
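The same extraction can be done programmatically; this Python sketch (with a made-up three-step chain) picks the step with the smallest -log10(likelihood), i.e. the line that `sort -r -nk2` leaves last:

```python
# Each chain line holds: weight, -log10(likelihood), log10(P1), ..., log10(Pn)
def best_step(lines):
    """Return the step with the highest likelihood, i.e. the smallest
    value in the second field (what `sort -r -nk2` leaves last)."""
    steps = [list(map(float, ln.split())) for ln in lines if ln.strip()]
    return min(steps, key=lambda s: s[1])

# Made-up three-step chain with two sampled parameters:
chain = [
    "1 4.20 -1.30 0.45",
    "1 3.85 -1.25 0.48",
    "1 5.10 -1.40 0.40",
]
best = best_step(chain)
```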

If you would like to include additional parameters in the sampling,
that is relatively straightforward. If the parameter was not previously present
in the semi-analytic model, you will have to include it in the
**input_***.par** file and read it in **read_parameters.c**. In addition,
you will have to pass the parameter value from the MCMC to the semi-analytic model in the function
**void read_mcmc_par (int snapnum)**. Finally, you will have to
include the new parameter
in the input file **./input/MCMC_inputs/MCMCParameterPriorsAndSwitches.txt** and increase
the value for the number of parameters at the top of that file.

Including a new observational constraint is slightly more complex
and basically involves **changes in the input files, mcmc_likelihood() and
mcmc_save.c**. The first
thing to do is to increase the number of observational constraints
defined in **mcmc_vars.h** ("#define MCMCNConstraints") and at the top of
**./input/MCMC_inputs/MCMCObsConstraints.txt** and
**./input/MCMC_inputs/MCMCWeightsObsConstraints.txt**. Then, the new observational
constraint(s) must be included in these two input files. A type of
statistical test needs to be specified, together with switches and
weights for all output redshifts. Then a file with the same name as
specified in these files must be added to the directory
./ObsConstraints/ for all the redshifts to be used.

After the inputs, mcmc_save.c and mcmc_likelihood() must be
changed. If a property not previously used by the MCMC is needed, it must
be saved in **mcmc_save.c** (see section
mcmc_save below).
Then, a method of binning the theoretical predictions and a
statistical test must be included in **mcmc_likelihood()**.
Section mcmc_likelihood
below explains how to use the functions already present in the
code and how to include new ones.

The code related to the MCMC sampling is divided into 4 main files:

In addition, mcmc_proto.h contains the headers for all the functions used in the MCMC.

mcmc.c contains most functions used by the MCMC sampling
including the 'main' function for the method: **Senna()**.
In MCMC mode, Senna() is the function controlling the entire
code (instead of main.c used in normal mode).

The first function to be called inside Senna() is
**initialize_mcmc_par_and_lhood()**, which reads the initial values
for all the parameters and a likelihood, either from a previous output (in
./output/senna_g*_**.txt) or from MCMCStartingParFile. Next,
**read_observations()** reads all the observational
constraints into memory. initialize_mcmc_par_and_lhood() and
read_observations() are called only once at the
beginning. **read_sample_info()**
reads information about the IDs and weights of the pre-selected sample
of FOF groups. In principle this could also be read only once, and that is
in fact the case if only one simulation is used. If more than one
simulation is used (currently, if OPT += -DMR_PLUS_MRII), the function
must be called twice at each MCMC step to switch between simulations.

After the initialization is done, the MCMC sampling starts. In a
for loop running from 1 to ChainLength, the semi-analytic model is
run at each step, with galaxy properties being computed for a given
set of parameters (passed from the MCMC to the semi-analytic model in
**read_mcmc_par()**), and **mcmc_likelihood()** being called at the end to
return a joint likelihood for the model at the current step.

If the likelihood for the model at the current step is higher than
at the previous step, the current set of parameters is
accepted. If the likelihood is lower than at the previous step,
the new set of parameters might still be accepted, with a probability
given by the ratio of the two likelihoods. In either case, a new set
of parameters is then proposed in **propose_new_parameters()** and
passed into the semi-analytic model to re-start the process.
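The proposal and acceptance logic just described can be sketched in a few lines of Python (a simplified illustration of the Metropolis rule and a log-normal step, not the actual propose_new_parameters() implementation; the step size and parameter values are invented):

```python
import random

def propose(params, log_step_size, rng):
    """Log-normal proposal: perturb each parameter multiplicatively
    by 10 raised to a Gaussian exponent of width log_step_size."""
    return [p * 10 ** rng.gauss(0.0, log_step_size) for p in params]

def accept(lhood_new, lhood_old, rng):
    """Metropolis rule: a higher likelihood is always accepted; a lower
    one is accepted with probability given by the likelihood ratio."""
    if lhood_new >= lhood_old:
        return True
    return rng.random() < lhood_new / lhood_old

rng = random.Random(42)
new_pars = propose([0.03, 2.5], 0.05, rng)  # invented parameter values
```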

mcmc_likelihood.c is where the theoretical predictions are binned in the same way as the observational data and then compared with them by means of a statistical test that returns a likelihood. There are switches for the different observational constraints and redshifts that determine whether a given test is performed or not.

There are currently three tests available: \(\chi^2\), maximum
likelihood and binomial **(although the binomial likelihood is not
correctly calculated yet)**. You can easily include more tests in
this function by looking at how the current ones are implemented.
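The \(\chi^2\) test amounts to something like the following Python sketch (illustrative only; the actual implementation in mcmc_likelihood.c is more involved, e.g. it also applies constraint weights and the minimum observational error):

```python
import math

def chi_square_lhood(model_y, obs_y, obs_err):
    """Chi^2 comparison of binned model values against an observation,
    returning a likelihood proportional to exp(-chi^2 / 2)."""
    chi2 = sum((m - o) ** 2 / e ** 2
               for m, o, e in zip(model_y, obs_y, obs_err))
    return math.exp(-0.5 * chi2)

# Invented two-bin comparison:
lh_perfect = chi_square_lhood([1.0, 2.0], [1.0, 2.0], [0.1, 0.1])
lh_off = chi_square_lhood([1.2, 2.0], [1.0, 2.0], [0.1, 0.1])
```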

If you want to include additional constraints using the tests
already provided, you just need to bin the property you want to
compare with observations in the correct way. If you want to compare
a luminosity or mass function of a 'simple property', then you
just need to make sure that property is saved in mcmc_save.c and
use bin_function and the chi_square test. **However, in
many cases you will need to include your own bin function**. A
detailed description of the tests can be found in
Henriques et al., 2013, MNRAS, 431, 3373. For
example, in the case already included for the stellar mass
functions of red and blue galaxies, it is necessary to divide the
galaxies into red and blue populations according to some criteria
before binning.
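A weighted binning step of the kind bin_function performs, combined with a red/blue split, might look like the following Python sketch (the colour cut and galaxy values are invented for the example):

```python
def bin_mass_function(log_masses, weights, edges):
    """Weighted histogram: each galaxy contributes its tree-sample
    weight to the bin its log10 stellar mass falls into."""
    counts = [0.0] * (len(edges) - 1)
    for m, w in zip(log_masses, weights):
        for i in range(len(edges) - 1):
            if edges[i] <= m < edges[i + 1]:
                counts[i] += w
                break
    return counts

# Invented galaxies as (log10 M*, colour, sample weight), split on an
# invented colour cut of 0.8 before binning:
gals = [(9.2, 0.6, 2.0), (9.7, 0.9, 1.5), (9.8, 0.4, 3.0)]
edges = [9.0, 9.5, 10.0]
red = bin_mass_function([m for m, c, w in gals if c > 0.8],
                        [w for m, c, w in gals if c > 0.8], edges)
blue = bin_mass_function([m for m, c, w in gals if c <= 0.8],
                         [w for m, c, w in gals if c <= 0.8], edges)
```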

This file is where the code checks if the current galaxy is in an
FOF group that was pre-selected (the tree file contains all the FOFs
in a tree, a bush). If that is the case, the galaxy properties are passed
from HaloGal to MCMC_GAL. **If you need additional properties** that
are computed by the galaxy formation model but not listed here, in order to
constrain the model with additional observations, **they must be
passed between structures here**. Additional h factors are also included here.
MCMC_GAL holds galaxy properties for all the galaxies at all the
output redshifts. The maximum number of galaxies must
be pre-defined (MCMCAllocFactor) as it varies from redshift to redshift.

This file contains the structure MCMC_GALAXY with all the galaxy
properties available to mcmc_likelihood() for comparison with the
observational constraints. **If you need additional properties** that
are computed by the galaxy formation model but not listed here, in order to
constrain the model with additional observations, **they must be
included in this structure**.

When OPT += -DHALOMODEL is switched on, HODs and satellite profiles are computed and used as input to a halo model, allowing the correlation function of all galaxies to be accurately estimated from a very small subset of dark matter haloes. This makes it possible to use clustering measurements as observational constraints in the MCMC sampling. The method is described in van Daalen et al., 2016.

Besides MCMC mode, the halo model requires an additional line
in the input file for
**MCMCHaloModelDir**, which for the public version of
the code should point to
./MCMC/HaloModel (see
e.g. ./input/MCMC_inputs/input_Guo13_mcmc_halomodel_MR_W1_W1.par).
This path contains the input power spectrum
(powrealized_rebin_corrected.dat), the FoF mass function
for M200mean masses (fofnum_m200mean.dat), and
corrections to the power spectrum to account for halo
triaxiality and alignments (ellip_corr.dat and
align_corr.dat).

The provided FoF mass function and input power spectrum are based on WMAP1, and are scaled with cosmology inside the code. If your run is not based on the WMAP1 Millennium simulation, these files should be replaced with your own. The corrections for halo shape and alignment are assumed to be independent of cosmology.

As implemented here, the clustering estimator is not intended
to be used in combination with
Millennium II (MRII). To use it with such a
higher-resolution simulation,
the FoF mass function should be updated, the number of
mass bins should be increased (up from massbins=65),
and the minimum FoF mass considered should be decreased
(down from minfofmass=9.5). The latter
two parameters can be found in **mcmc_vars.h**. The
minimum FoF mass should be chosen such that
the number of FoF groups in the first bin is zero.
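That last criterion can be checked with a quick sketch like this (Python; minfofmass=9.5 and massbins=65 are the defaults quoted above, while the upper mass edge of 16.0 is an invented assumption):

```python
def first_bin_empty(log_masses, minfofmass, maxfofmass, massbins):
    """Check that, with massbins equal-width bins in log10(M) between
    minfofmass and maxfofmass, the lowest bin contains no FoF groups."""
    width = (maxfofmass - minfofmass) / massbins
    return not any(minfofmass <= m < minfofmass + width for m in log_masses)

# Invented halo masses (log10 M); maxfofmass=16.0 is an assumption:
ok = first_bin_empty([10.2, 11.5, 13.0], 9.5, 16.0, 65)
bad = first_bin_empty([9.51, 11.5], 9.5, 16.0, 65)
```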

When using the clustering estimator, we advise also setting OPT += -DPROJLIMITS, which calculates the projected correlation function in a way that is more consistent with observations. Additional options are OPT += -DOUTPUTPOW, OPT += -DOUTPUTCORR and OPT += -DOUTPUTPROJ, to output the sample-estimated power spectra, correlation functions and projected correlation functions, respectively.

Finally, note that the clustering estimator assumes redshift zero. If the observational clustering constraints used are at higher redshift, the FoF mass function at that redshift should be provided, and a growth factor correction for the cosmological parameter Sigma8 should be included. Additional adjustments (such as changes to the halo shape correction) may also prove necessary.

Files used by HALOMODEL:

The first function called is **void
initialize_halomodel()**, which reads the input power
spectrum, the FoF mass function and the halo shape and alignment
corrections, and spline-interpolates
all of these. This function is called only once.

At each step of the MCMC chain, calls are made to the estimator's main
function,
**void halomodel(double* r, double* proj, float
masslimit_low, float masslimit_high, int snapnum)**.
After each call, *proj* contains the
estimated projected correlation function at
radii *r* (in units of Mpc) for galaxies with
stellar masses between *masslimit_low*
and *masslimit_high*
(in units of \(M_{\odot}\)).
This function first calls **void init_numgal(float
masslimit_low, float masslimit_high, int snapnum)**
to calculate both the HOD and the satellite profiles based
on the galaxies in the sample haloes. Both are
calculated in bins of halo mass. The HOD is
spline-fitted, while the satellite profile is fitted with
a function defined in van Daalen et al., 2016 (equation 12).
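Measuring the HOD in bins of halo mass, as init_numgal() does, reduces to bookkeeping of the following kind (a Python sketch over an invented three-halo sample, not the actual C implementation):

```python
def mean_occupation(halos, mass_edges):
    """HOD measurement sketch: group haloes into halo-mass bins and
    return the mean number of centrals and satellites per bin.
    `halos` is a list of (log10_halo_mass, n_centrals, n_satellites)."""
    nbin = len(mass_edges) - 1
    n_halo = [0] * nbin
    n_cen = [0.0] * nbin
    n_sat = [0.0] * nbin
    for logm, nc, ns in halos:
        for i in range(nbin):
            if mass_edges[i] <= logm < mass_edges[i + 1]:
                n_halo[i] += 1
                n_cen[i] += nc
                n_sat[i] += ns
                break
    mean_cen = [c / h if h else 0.0 for c, h in zip(n_cen, n_halo)]
    mean_sat = [s / h if h else 0.0 for s, h in zip(n_sat, n_halo)]
    return mean_cen, mean_sat

# Invented three-halo sample and two halo-mass bins:
halos = [(12.1, 1, 0), (12.3, 1, 1), (13.6, 1, 5)]
mean_cen, mean_sat = mean_occupation(halos, [12.0, 13.0, 14.0])
```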

Next, for each wavenumber *k*, the 1-halo and 2-halo terms
of the
galaxy power spectrum are calculated using calls
to **double Mcensat(double k, int haloterm, int censat)**,
which in turn uses the HOD and satellite profiles of the
galaxies (treating central and satellite
galaxies separately). The resulting power spectrum is
corrected for halo shape and alignments, and then
given an additional correction for halo exclusion.

Finally, the galaxy power spectrum is spline-fitted and transformed to produce the galaxy correlation function, which is then integrated along the line of sight to produce the projected galaxy correlation function.
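The underlying transform from power spectrum to correlation function is the standard one, \(\xi(r) = \frac{1}{2\pi^2}\int k^2 P(k)\,\frac{\sin(kr)}{kr}\,dk\). A brute-force Python sketch with a toy Gaussian \(P(k)\) (simple Riemann integration, not the spline-based scheme used in the code):

```python
import math

def xi_from_pk(pk, r, kmax=20.0, n=4000):
    """Correlation function from a power spectrum via
    xi(r) = 1/(2*pi^2) * Int k^2 P(k) sin(kr)/(kr) dk."""
    dk = kmax / n
    total = 0.0
    for i in range(1, n + 1):  # skip k=0, where the integrand vanishes
        k = i * dk
        total += k * k * pk(k) * math.sin(k * r) / (k * r) * dk
    return total / (2.0 * math.pi ** 2)

# Toy Gaussian P(k) = exp(-k^2/2), evaluated at r = 1:
xi = xi_from_pk(lambda k: math.exp(-0.5 * k * k), 1.0)
```

For this toy \(P(k)\) the transform has the analytic answer \(\sqrt{\pi/2}\,e^{-r^2/2}/(2\pi^2)\), which the numerical estimate reproduces closely.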