Configuring FIREFLY for Spectral Fitting
Before launching into full spectral fitting with FIREFLY, it's essential to prepare your input data and define your fitting configuration. This guide walks you through the settings FIREFLY makes available, how to define your desired model parameters, and how to construct the input FITS files that FIREFLY requires. If you are interested in adding your own/new models to FIREFLY, see Adding Your Own Stellar Population Models to FIREFLY.
Input Data Requirements
- Spectrum arrays: `WAVE`, `FLUX`, `IVAR`
- Redshift: required for cosmological age estimation
- Metadata: `ID`, `RA`, `DEC`, `SNR`, `VDISP`
```python
from astropy.io import fits

with fits.open(input_file) as hdul:  # input_file: path to the FIREFLY input FITS
    # Get spectrum data for galaxy index i
    wavelength = hdul[1].data['WAVE'][i]
    flux = hdul[1].data['FLUX'][i]
    ivar = hdul[1].data['IVAR'][i]          # inverse variance (not a 1-sigma error)
    redshift = hdul[1].data['REDSHIFT'][i]
    ra = hdul[1].data['RA'][i]
    dec = hdul[1].data['DEC'][i]
    vdisp = hdul[1].data['VDISP'][i]        # velocity dispersion
    snr = hdul[1].data['SNR'][i]
    spec_id = hdul[1].data['ID'][i]         # renamed so it doesn't shadow built-in id()
```
Input FITS File Structure
Most FIREFLY launch pipelines expect a multi-spectrum FITS file with the following structure:
- Primary HDU: contains global metadata (e.g. model key, redshift, SNR)
- Binary Table HDU: per-spectrum data arrays (`WAVE`, `FLUX`, `IVAR`)
Each spectrum must be self-contained with associated metadata embedded either in the header or the table structure.
Creating FITS Files for FIREFLY
The repository presents three different methods you can use to generate the input FITS files for FIREFLY (Note: these examples are for DESI):
Option 1: FIREFLY 'All-in-one' launch scripts on NERSC
The firefly(AIO)_DESI_DR1.py and firefly(AIO)_DESI_EDR.py scripts offer the quickest and most efficient way to fit DESI spectra. Because NERSC already hosts all DESI data releases, these scripts can read everything directly from the shared filesystem and fit the spectra straight away. If you have a NERSC account, this method handles all data retrieval internally, letting you skip the FITS file creation needed in other approaches.
On NERSC, launch these runs through SLURM using the provided SBATCH_Fuji.sh (DESI-EDR) and SBATCH_Iron.sh (DESI-DR1) batch scripts. The only part that usually needs editing in the sbatch files is the desired galaxy index range you want to fit and the correct path to the firefly(AIO) script:
```sh
START_INDEX=2600000
END_INDEX=2700000
SCRIPT_PATH="/global/cfs/cdirs/desi/users/helpss/FIREFLY/firefly/Launch/DESI/NERSC/run_scripts/SBATCH_Iron.sh"
```
Option 2: NERSC → FIREFLY File Builder (for SCIAMA or other HPCs)
If you want to run FIREFLY on another HPC system (e.g. SCIAMA) but still take advantage of NERSC's fast access to DESI spectra, the NERSC_fits_create.py script provides the ideal workflow. It retrieves all required DESI spectral data on NERSC, merges the B/R/Z arms, attaches the required metadata and FastSpecFit values, and outputs fully FIREFLY-ready FITS files that can be copied to any external machine for fitting.
To choose which galaxies to export, edit the index range at the top of the script:
```python
START_SPECTRUM = 5000
END_SPECTRUM = 10000
```
These indices correspond to rows in the DESI EDR zall-pix-fuji.fits catalog. (Note: this method was used to fit over a million galaxies on SCIAMA, using blocks of 5000 spectra per FITS file for speed and memory optimisation on the HPC.)
Running the script on NERSC
Once the desired galaxy index range is set, simply run the script via a terminal on NERSC:
```sh
python NERSC_fits_create.py
```
This will create an output file in your NERSC directory:
```
Data/DESI/DESI_EDR_data/DESI_EDR_5000-10000.fits
```
(Important: Ensure that all input and output file paths are corrected for the placement of the script in your NERSC directory or within the cloned FIREFLY repository uploaded to NERSC. Remember that all file paths on NERSC require an extra / before global.)
What this script does
- Reads the EDR `zall-pix-fuji.fits` and FastSpecFit catalogs
- Selects galaxies using optional filters (SPECTYPE, redshift, subsurvey, etc.)
- Retrieves each matching `coadd` file directly from the NERSC DESI data tree
- Merges B/R/Z wavelength, flux and ivar into single arrays
- Computes velocity dispersion (FastSpecFit or resolution-based)
- Computes a median SNR per spectrum
- Writes FIREFLY-compatible FITS files with columns: `ID`, `WAVE`, `FLUX`, `IVAR`, `REDSHIFT`, `RA`, `DEC`, `VDISP`, `SNR`, `SURVEY_TYPE`
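One common definition consistent with a "median SNR per spectrum" is the median of flux × sqrt(ivar) over unmasked pixels. This is an assumption about the estimator; the exact statistic computed by NERSC_fits_create.py may differ.

```python
import numpy as np

def median_snr(flux, ivar):
    """Median per-pixel signal-to-noise ratio.

    ivar is the inverse variance, so sigma = 1/sqrt(ivar) and the
    per-pixel SNR is flux * sqrt(ivar); pixels with ivar == 0 are masked.
    """
    flux = np.asarray(flux, dtype=float)
    ivar = np.asarray(ivar, dtype=float)
    good = ivar > 0  # ignore masked pixels
    return float(np.median(flux[good] * np.sqrt(ivar[good])))
```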
Option 3: Manual Download FITS Builder (DESI)
If you do not have access to NERSC, you can still manually download galaxy spectra from the public DESI data directory or run the Local_fits_create.py script on a normal machine to generate FIREFLY-compatible FITS files.
- Step 1: Edit the top of `Local_fits_create.py` to set the range of galaxies you want to process:

```python
START_SPECTRUM = 0     # Starting index in the DESI galaxy catalog
END_SPECTRUM = 5000    # Ending index (non-inclusive)
```
To process the next 5000 galaxies, simply update the index values on the next run:

```python
START_SPECTRUM = 5000
END_SPECTRUM = 10000
```

- Step 2: Run the script:

```sh
python Local_fits_create.py
```
This will create an output file named like:
```
Data/DESI/DESI_EDR_data/DESI_EDR_5000-10000.fits
```
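The output name simply encodes the chosen index range, so successive runs produce non-overlapping files. The pattern can be reproduced as follows (a sketch of the naming convention shown above, not the script's actual code):

```python
START_SPECTRUM, END_SPECTRUM = 5000, 10000
# Mirror the output naming convention: DESI_EDR_<start>-<end>.fits
output_path = f"Data/DESI/DESI_EDR_data/DESI_EDR_{START_SPECTRUM}-{END_SPECTRUM}.fits"
```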
- Downloads required DESI EDR catalogs (`zall-fuji.fits` and `fastspec-fuji.fits`) if not already cached
- Filters for galaxies and selects the index range you specify
- For each galaxy:
  - Fetches the matching `coadd` spectrum from the public DESI archive
  - Merges the B/R/Z data arms into concatenated arrays of `FLUX`, `WAVE` and `IVAR`
  - Computes velocity dispersion from FastSpec or spectral resolution
  - Calculates median signal-to-noise (SNR), redshift, RA/Dec, and target type
- Writes all this data into a single multi-extension FITS file
(Note: Although all the online files should remain the same, the output paths and directory structure set in this script may need to be altered to the desired destination on your machine. This method downloads large numbers of spectra, so it is heavily dependent on your internet connection speed and may be slow.)
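The B/R/Z arm merge mentioned above can be sketched as a simple concatenation followed by a wavelength sort. This is an illustration only, assuming per-arm `(wave, flux, ivar)` arrays are already in hand; the real scripts also handle arm-overlap regions and masking.

```python
import numpy as np

def merge_arms(arms):
    """Merge per-arm (wave, flux, ivar) arrays into single sorted arrays.

    `arms` is a dict like {'B': (wave, flux, ivar), 'R': ..., 'Z': ...}.
    """
    wave = np.concatenate([arms[k][0] for k in ('B', 'R', 'Z')])
    flux = np.concatenate([arms[k][1] for k in ('B', 'R', 'Z')])
    ivar = np.concatenate([arms[k][2] for k in ('B', 'R', 'Z')])
    order = np.argsort(wave)  # enforce a monotonic wavelength grid
    return wave[order], flux[order], ivar[order]
```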
Adding Your Own Stellar Population Models to FIREFLY
This guide explains the format FIREFLY expects for stellar population models, where to place them inside the repository, and two simple ways to plug new models into firefly_models.py. All examples and requirements below are taken from the existing firefly_models.py script in the fitting engine, as it is recommended that this is the script you alter to implement compatible models with minimal editing.
Where to put your models
The models directory used by the code is defined at the top of firefly_models.py:
```python
MODELS_DIR = join(dirname(__file__), 'stellar_population_models')
```
So upload your model files under the repository path:
```
firefly/Fitting_Engine/stellar_population_models/
```
Follow the existing folder conventions:
- `SSP_M11_*/` (e.g. `SSP_M11_MILES/`) - ASCII SSP files for the M11-type readers
- `SSP_M11_*_SG/` - special `.sg` flavour read by the `m11-sg` branch
- `MaStar_SSP_v1.1.fits.gz` - single FITS file used by the MaStar branch
- `EMILES_SSP/` - E-MILES FITS files
Format & structure FIREFLY expects (by model type)
- M11 (ASCII SSP grids):
  The reader uses `pandas.read_table(..., usecols=[0,2,3], names=['Age','wavelength_model','flux_model'], delim_whitespace=True)`, requiring each SSP file to be a whitespace-delimited table with at least these columns in that order:
  - Age - the age label used to group rows (the code does `model_table.loc[model_table.Age == a, ['wavelength_model','flux_model']]`)
  - wavelength - wavelength column (Angstroms as used in the files)
  - flux - flux values corresponding to each wavelength point

  Files are discovered via the following convention:
  `model_path = join(MODELS_DIR, 'SSP_M11_'+model_used, 'ssp_M11_'+model_used+'.'+imf_used)`
  The existing code then `glob()`s `model_path + '*'` and recognises metallicity by filename tokens. Allowed/recognised metallicity tokens in the code include (examples): `z001 z002 z004 z0001.bhb z0001.rhb z10m4 z-0.6 z-0.9 z-1.2 z-1.6 z-1.9` (for `m11-sg` the same tokens apply, but filenames end with `.sg`, e.g. `z001.sg`).
- MaStar (single FITS grid):
  The MaStar code expects a single FITS archive named `MaStar_SSP_v1.1.fits.gz` in the `stellar_population_models` folder. Internals used by the code include:
  - `hdul[1].data` - contains parameter arrays (ages `t`, metallicities `Z`, slopes `s`)
  - `hdul[2].data[0,:]` - the wavelength grid (`wavelength_int`)
  - `hdul[3].data` - the flux 4D/3D array (called `fluxgrid`), indexed by age/metal/slope

  The code selects the IMF by slope (IMF codes: `'kr'` → slope 1.3, `'ss'` → slope 2.35) and then reads the flux slice `fluxgrid[ii,jj,sidx,:]`.
- E-MILES (FITS per SSP):
  The E-MILES reading code searches for files like `join(MODELS_DIR,'EMILES_SSP','Eku1.30')` and for each matched file it:
  - Opens the file with `pyfits.open(i)` and takes `hdul[0].data` as the flux array
  - Constructs the wavelength array as `np.arange(1680, 50000, 0.9)`
  - Derives age from the filename slice used in the existing code, so keep consistent file naming (the code extracts age from a substring of the file path: it assumes a fixed token layout)
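To see what the M11 reader's parsing convention implies in practice, here is a tiny stand-in table parsed the same way. The values and the skipped second column are placeholders; `sep=r'\s+'` is used as the modern equivalent of `delim_whitespace=True`.

```python
import io
import pandas as pd

# Minimal stand-in for an M11 SSP file: columns are
# Age, <skipped>, wavelength, flux (whitespace-delimited, no header)
ascii_data = io.StringIO(
    "0.1 0.0 4000.0 1.0\n"
    "0.1 0.0 4001.0 1.1\n"
    "0.5 0.0 4000.0 0.9\n"
    "0.5 0.0 4001.0 0.8\n"
)

model_table = pd.read_table(
    ascii_data, sep=r'\s+', header=None,
    usecols=[0, 2, 3], names=['Age', 'wavelength_model', 'flux_model'],
)

# Group rows by age, as the M11 reader does:
for a in model_table.Age.unique():
    block = model_table.loc[model_table.Age == a, ['wavelength_model', 'flux_model']]
```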
Important processing behaviour (so your models are compatible)
- Wavelength medium: the code converts between air/vacuum using `airtovac()`/`vactoair()`. Ensure you know whether your model wavelengths are air or vacuum, and set `data_wave_medium`/`fit_wave_medium` appropriately when implementing new stellar population models in FIREFLY.
- Downgrading: if you want FIREFLY to match your model resolution to the instrument, the code calls `downgrade(wavelength, flux, deltal, self.specObs.vdisp, wave_instrument, r_instrument)` when `self.downgrade_models` is True. Provide the model spectral resolution as `deltal` (for MaStar the code reads an R array from the MaStar FITS; for M11/E-MILES the code sets a scalar `deltal`).
- Reddening: FIREFLY will apply a Milky Way E(B-V) correction with `unred()` if `ebv_mw` ≠ 0.
- Age and Z limits: FIREFLY filters models by `self.age_limits` and `self.Z_limits`, so ensure your SSP ages and metallicities fall in the ranges you intend to fit.
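For illustration of the air/vacuum point, here is a vacuum-to-air conversion using the widely used SDSS/Morton (1991) convention. This is an assumption about the kind of conversion `airtovac()`/`vactoair()` perform; FIREFLY's helpers may use different coefficients.

```python
import numpy as np

def vactoair(wave_vac):
    """Convert vacuum wavelengths (Angstroms) to air wavelengths using
    the SDSS/Morton (1991) refraction formula (illustrative only)."""
    wave_vac = np.asarray(wave_vac, dtype=float)
    factor = (1.0 + 2.735182e-4
              + 131.4182 / wave_vac**2
              + 2.76249e8 / wave_vac**4)
    return wave_vac / factor
```

Air wavelengths come out slightly shorter than vacuum ones (by roughly 1.4 Å at 5000 Å), which is why mixing media between models and data shifts every spectral feature.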
Two simple ways to make FIREFLY use your models
Option A - Minimal: Place files in an existing model format & choose them at runtime

If you can package your models to match one of the existing readers above (ASCII M11-style, MaStar FITS grid or E-MILES FITS), then upload your files into `firefly/Fitting_Engine/stellar_population_models/` following the folder/name patterns used above. Example for M11-style:

```
firefly/Fitting_Engine/stellar_population_models/SSP_M11_MyLib/ssp_M11_MyLib.kr
firefly/Fitting_Engine/stellar_population_models/SSP_M11_MyLib/ssp_M11_MyLib.ss
# name metallicity files using the tokens used by the parser, e.g. z001 z002 ...
```

Then when you construct the model runner, set `models` and `model_libs` appropriately:

```python
sp = StellarPopulationModel(specObs, outputFile, cosmo, models='m11', model_libs=['MyLib'], imfs=['kr'])
```

FIREFLY will follow the existing `m11` branch to locate files, read ages/wavelengths/fluxes and proceed without changing `firefly_models.py`.

Option B: Add a new loader function in `firefly_models.py` (recommended for novel formats)

If your model format is different (e.g. a set of HDF5 files, a different FITS layout, or you want special metadata reading), add a dedicated branch in `get_model()` or, better, add a helper method and call it. Below is a minimal template you can paste into `firefly_models.py` (place it just below the other `elif self.models == '...'` branches).

```python
# --- Example: custom model loader (paste into firefly_models.py) ---
def load_custom_models(self, model_used, imf_used, deltal, vdisp,
                       wave_instrument, r_instrument, ebv_mw):
    """
    Read your custom SSP/model files from:
        <repo>/Fitting_Engine/stellar_population_models/SSP_CUSTOM_<model_used>/
    and return exactly the 4 objects the rest of FIREFLY expects:
        wavelength, model_flux_list, age_list, metal_list

    Required return values:
        wavelength -> 1D numpy array (model wavelength grid)
        model_flux -> list (or array) of 1D numpy arrays, each an SSP spectrum
        age_list   -> list of ages (one per SSP)
        metal_list -> list of metallicities (one per SSP)

    Notes / placeholders:
    - Replace / extend the file-reading branches below to match your file
      format (FITS, HDF5, ASCII, numpy .npy, etc).
    - `model_used` is intended to map to a subfolder name after 'SSP_CUSTOM_'.
    - Keep variable names and behaviour consistent with the rest of
      firefly_models.py:
        * use `self.data_wave_medium` to decide air/vacuum conversion
          (airtovac / vactoair)
        * use `self.downgrade_models` and the provided `downgrade()` helper
          if needed
        * apply MW attenuation via `unred(wavelength, ebv=0.0 - ebv_mw)`
          if ebv_mw != 0
    """
    model_dir = join(MODELS_DIR, 'SSP_CUSTOM_' + model_used)  # e.g. stellar_population_models/SSP_CUSTOM_MyLib
    files = sorted(glob.glob(join(model_dir, '*')))
    if len(files) == 0:
        raise FileNotFoundError(f"No files found in {model_dir}. Create the folder and add your model files.")

    model_flux = []
    age_model = []
    metal_model = []
    wavelength = None  # will be set from the first file we successfully read

    for fpath in files:
        try:
            # ---------------------------
            # 1) Example: FITS-based file
            # ---------------------------
            if fpath.lower().endswith(('.fits', '.fits.gz', '.fz')):
                hdul = pyfits.open(fpath)
                hdr = hdul[0].header
                # Attempt to read a wavelength array, adapting to your FITS structure:
                try:
                    # many SSP FITS store 1D flux in the primary HDU and the
                    # wavelength either implicitly or in extension 1
                    flux = hdul[0].data.copy()
                    # attempt common wavelength carriers (customise to your files)
                    if 'WAVE' in hdul[1].data.names:
                        wave_int = hdul[1].data['WAVE']
                    elif 'wavelength' in hdul[1].data.names:
                        wave_int = hdul[1].data['wavelength']
                    else:
                        # fallback: no explicit wavelength, assume a uniform grid from header entries:
                        if 'CRVAL1' in hdr and 'CDELT1' in hdr and flux is not None:
                            crval = hdr['CRVAL1']
                            cdelt = hdr['CDELT1']
                            wave_int = crval + cdelt * np.arange(len(flux))
                        else:
                            raise ValueError("Cannot locate wavelength in FITS file - adapt loader to your format.")
                except Exception:
                    # If the extension structure differs, adapt here:
                    hdul.close()
                    raise
                # Example: metadata from the header (customise to your headers).
                # If your files include AGE and Z in the header use these;
                # otherwise parse the filename.
                age = hdr.get('AGE', None)    # e.g. AGE = 1.0 (Gyr) or whatever units you choose
                metal = hdr.get('Z', None)    # e.g. Z = 0.02 (linear metallicity)
                hdul.close()

            # ---------------------------
            # 2) Example: ASCII / text table
            # ---------------------------
            elif fpath.lower().endswith(('.txt', '.dat', '.asc', '.ascii')):
                # Example: whitespace-delimited with columns: wavelength flux [optional: age, z]
                # Use pandas.read_table as in the m11 branch if your ASCII matches that style.
                df = pd.read_table(fpath, delim_whitespace=True, header=None)
                # ASSUMPTION: first column = wavelength, second column = flux
                wave_int = df.iloc[:, 0].values
                flux = df.iloc[:, 1].values
                # Optionally: expect age/Z encoded in the filename like mylib_age1.0_z0.02.dat
                age = None
                metal = None

            # ---------------------------
            # 3) Example: numpy / hdf5 / other - placeholder
            # ---------------------------
            else:
                # Placeholder: add your custom reader here.
                # For instance, for .npy: arr = np.load(fpath); wave_int = arr[:,0]; flux = arr[:,1]
                raise ValueError("Unknown file extension. Add a reader for this format in the loader.")

            # If age/metal were not in the header, try to parse them from the
            # filename using a convention, e.g. MyLib_age1.0_z0.02.fits
            if age is None or metal is None:
                # naive filename parsing - edit the regexps to suit your naming
                fname = os.path.basename(fpath)
                # try to extract an age token "age<value>" and a z token "z<value>" or "Zp0.02" style
                import re
                age_match = re.search(r'age[_\-]?([0-9\.]+)', fname, flags=re.IGNORECASE)
                z_match = re.search(r'(?:z|Z|_z|_Z)([_\-]?[0-9\.]+)', fname)
                if age is None and age_match:
                    age = float(age_match.group(1))
                if metal is None and z_match:
                    try:
                        metal = float(z_match.group(1))
                    except Exception:
                        metal = None
                # final fallback defaults (replace with strict parsing if needed)
                if age is None:
                    age = 1.0     # Gyr or the unit you choose - ensure consistency with FIREFLY's age use
                if metal is None:
                    metal = 0.02

            # Convert the wavelength medium if needed (use the same helpers as the file)
            if self.data_wave_medium == 'vacuum':
                wavelength_local = airtovac(wave_int)
            else:
                wavelength_local = wave_int

            # Store the wavelength from the first file read (assumes the same
            # grid for all SSPs; if not, you must resample)
            if wavelength is None:
                wavelength = wavelength_local.copy()
            else:
                # if grids differ you may need to resample flux to the master wavelength array here
                if not np.allclose(wavelength, wavelength_local):
                    # NOTE: if your SSPs use different wavelength grids you must resample (interp1d)
                    raise ValueError("Model wavelength grids differ between files. Resample to a common grid before adding, or implement resampling here.")

            # optionally downgrade the model resolution to match the instrument
            if self.downgrade_models:
                mf = downgrade(wavelength, flux, deltal, self.specObs.vdisp, wave_instrument, r_instrument)
            else:
                mf = copy.copy(flux)

            # Apply the Milky Way reddening correction if provided
            if ebv_mw != 0:
                attenuations = unred(wavelength, ebv=0.0 - ebv_mw)
                mf = mf * attenuations

            # append outputs
            model_flux.append(mf)
            age_model.append(age)
            metal_model.append(metal)

        except Exception as e:
            print(f"[load_custom_models] Skipping file {fpath} due to error: {e}")
            continue

    # final sanity checks
    if wavelength is None or len(model_flux) == 0:
        raise RuntimeError("No valid models were loaded. Check file readers and file formats.")

    # Set the same attributes other loaders set so the rest of the pipeline
    # works unchanged (with the exception of naming of results)
    self.model_wavelength = wavelength
    self.model_flux = model_flux
    self.age_model = age_model
    self.metal_model = metal_model

    # return values exactly as expected by get_model() callers
    return wavelength, model_flux, age_model, metal_model

# --- In get_model(), add a branch that calls your loader:
# elif self.models == 'MyCustom':
#     return self.load_custom_models(model_used, imf_used, deltal, vdisp,
#                                    wave_instrument, r_instrument, ebv_mw)
#
# Usage notes:
# - Create a folder: stellar_population_models/SSP_CUSTOM_MyCustom/
# - Put your files there, e.g. SSP_CUSTOM_MyCustom/ssp_age1.0_z0.02.fits
# - Call FIREFLY with models='MyCustom' and model_libs=[' ']
# - If your SSPs do not share the same wavelength grid, implement a safe
#   resampling step before appending.
```

This method keeps the high-level flow in `get_model()` unchanged while isolating your file-format specifics inside `load_custom_models()`. Just remember FIREFLY's fitting pipelines expect the loader to return exactly:
- `wavelength` - 1D wavelength array
- `model_flux` - list/array of flux arrays (one per SSP)
- `age` - list of ages for each SSP
- `metal` - list of metallicities for each SSP
Quick checklist before running
- Your files are uploaded into `firefly/Fitting_Engine/stellar_population_models/`
- If you used an existing reader format (m11 / MaStar / E-MILES), ensure filenames and tokens follow the parser's expectations (see tokens listed above)
- Decide whether to set `self.downgrade_models=True` (recommended if your models are higher resolution than the instrument) so FIREFLY will call `downgrade()`
- Ensure `data_wave_medium` and `fit_wave_medium` are set correctly (air vs vacuum)
- If you added a new branch in `get_model()`, restart any running Python sessions so the modified module is reloaded
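Before a full run, a quick sanity check on a custom loader's four return values can catch most format mistakes early. This is a sketch; `check_loader_outputs` is a hypothetical helper, not part of FIREFLY, to be fed whatever your loader returns.

```python
import numpy as np

def check_loader_outputs(wavelength, model_flux, age_model, metal_model):
    """Verify the four loader return values have consistent shapes before fitting."""
    wavelength = np.asarray(wavelength)
    assert wavelength.ndim == 1, "wavelength must be a 1D array"
    assert np.all(np.diff(wavelength) > 0), "wavelength must be strictly increasing"
    assert len(model_flux) == len(age_model) == len(metal_model), \
        "one age and one metallicity per SSP spectrum"
    for flux in model_flux:
        assert len(flux) == len(wavelength), "each SSP must share the wavelength grid"
    return True
```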