Configuring FIREFLY for Spectral Fitting
Before launching into full spectral fitting with FIREFLY, it's essential to prepare your input data and define your fitting configuration. This guide walks you through the settings FIREFLY makes available, how to define your desired model parameters, and how to construct the input FITS files that FIREFLY requires. If you are interested in adding your own/new models to FIREFLY, see Adding Your Own Stellar Population Models to FIREFLY.
Input Data Requirements
- Spectrum arrays: `WAVE`, `FLUX`, `IVAR`
- Redshift: required for cosmological age estimation
- Metadata: `ID`, `RA`, `DEC`, `SNR`, `VDISP`
```python
from astropy.io import fits

with fits.open(input_file) as hdul:  # input_file: path to the FIREFLY input FITS
    # Get spectrum data for galaxy index i
    wavelength = hdul[1].data['WAVE'][i]
    flux = hdul[1].data['FLUX'][i]
    ivar = hdul[1].data['IVAR'][i]          # inverse variance (not a 1-sigma error)
    redshift = hdul[1].data['REDSHIFT'][i]
    ra = hdul[1].data['RA'][i]
    dec = hdul[1].data['DEC'][i]
    vdisp = hdul[1].data['VDISP'][i]        # velocity dispersion
    snr = hdul[1].data['SNR'][i]
    spec_id = hdul[1].data['ID'][i]         # renamed so it doesn't shadow built-in id()
```
Input FITS File Structure
Most FIREFLY launch pipelines expect a multi-spectrum FITS file with the following structure:
- Primary HDU: contains global metadata (e.g. model key, redshift, SNR)
- Binary Table HDU: per-spectrum data arrays (`WAVE`, `FLUX`, `IVAR`)
Each spectrum must be self-contained with associated metadata embedded either in the header or the table structure.
Creating FITS Files for FIREFLY
The repository presents three different methods you can use to generate the input FITS files for FIREFLY (Note: these examples are for DESI):
Option 1: FIREFLY 'All-in-one' launch scripts on NERSC
The firefly(AIO)_DESI_DR1.py and firefly(AIO)_DESI_EDR.py scripts offer the quickest and most efficient way to fit DESI spectra. Because NERSC already hosts all DESI data releases, these scripts can read everything directly from the shared filesystem and fit the spectra straight away. If you have a NERSC account, this method handles all data retrieval internally, letting you skip the FITS file creation needed in other approaches.
On NERSC, launch these runs through SLURM using the provided SBATCH_Fuji.sh (DESI-EDR) and SBATCH_Iron.sh (DESI-DR1) batch scripts. The only part that usually needs editing in the sbatch files is the desired galaxy index range you want to fit and the correct path to the firefly(AIO) script:
```sh
START_INDEX=2600000
END_INDEX=2700000
SCRIPT_PATH="/global/cfs/cdirs/desi/users/helpss/FIREFLY/firefly/Launch/DESI/NERSC/run_scripts/SBATCH_Iron.sh"
```
Option 2: NERSC → FIREFLY File Builder (for SCIAMA or other HPCs)
If you want to run FIREFLY on another HPC system (e.g. SCIAMA) but still take advantage of NERSC's fast access to DESI spectra, the NERSC_fits_create.py script provides the ideal workflow. It retrieves all required DESI spectral data on NERSC, merges the B/R/Z arms, attaches the required metadata and FastSpecFit values, and outputs fully FIREFLY-ready FITS files that can be copied to any external machine for fitting.
To choose which galaxies to export, edit the index range at the top of the script:
```python
START_SPECTRUM = 5000
END_SPECTRUM = 10000
```
These indices correspond to rows in the DESI EDR zall-pix-fuji.fits catalog. (Note: this method was used to fit over a million galaxies on SCIAMA, using blocks of 5000 spectra per FITS file for speed and memory optimisation on the HPC.)
Running the script on NERSC
Once the desired galaxy index range is set, simply run the script via a terminal on NERSC:
```sh
python NERSC_fits_create.py
```
This will create an output file in your NERSC directory:
```
Data/DESI/DESI_EDR_data/DESI_EDR_5000-10000.fits
```
(Important: Ensure that all input and output file paths are corrected for the placement of the script in your NERSC directory or within the cloned FIREFLY repository uploaded to NERSC. Remember that all file paths on NERSC require an extra / before global.)
What this script does
- Reads the EDR `zall-pix-fuji.fits` and FastSpecFit catalogs
- Selects galaxies using optional filters (SPECTYPE, redshift, subsurvey, etc.)
- Retrieves each matching `coadd` file directly from the NERSC DESI data tree
- Merges B/R/Z wavelength, flux and ivar into single arrays
- Computes velocity dispersion (FastSpecFit or resolution-based)
- Computes a median SNR per spectrum
- Writes FIREFLY-compatible FITS files with columns: `ID`, `WAVE`, `FLUX`, `IVAR`, `REDSHIFT`, `RA`, `DEC`, `VDISP`, `SNR`, `SURVEY_TYPE`
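One common definition consistent with a "median SNR per spectrum" is the median of flux × sqrt(ivar) over unmasked pixels. This is an assumption about the estimator; the exact statistic computed by NERSC_fits_create.py may differ.

```python
import numpy as np

def median_snr(flux, ivar):
    """Median per-pixel signal-to-noise ratio.

    ivar is the inverse variance, so sigma = 1/sqrt(ivar) and the
    per-pixel SNR is flux * sqrt(ivar); pixels with ivar == 0 are masked.
    """
    flux = np.asarray(flux, dtype=float)
    ivar = np.asarray(ivar, dtype=float)
    good = ivar > 0  # ignore masked pixels
    return float(np.median(flux[good] * np.sqrt(ivar[good])))
```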
Option 3: Manual Download FITS Builder (DESI)
If you do not have access to NERSC, you can still manually download galaxy spectra from the public DESI data directory or run the Local_fits_create.py script on a normal machine to generate FIREFLY-compatible FITS files.
- Step 1: Edit the top of `Local_fits_create.py` to set the range of galaxies you want to process:

```python
START_SPECTRUM = 0     # Starting index in the DESI galaxy catalog
END_SPECTRUM = 5000    # Ending index (non-inclusive)
```
To process the next 5000 galaxies, simply update the index values on the next run:

```python
START_SPECTRUM = 5000
END_SPECTRUM = 10000
```

- Step 2: Run the script:

```sh
python Local_fits_create.py
```
This will create an output file named like:
```
Data/DESI/DESI_EDR_data/DESI_EDR_5000-10000.fits
```
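The output name simply encodes the chosen index range, so successive runs produce non-overlapping files. The pattern can be reproduced as follows (a sketch of the naming convention shown above, not the script's actual code):

```python
START_SPECTRUM, END_SPECTRUM = 5000, 10000
# Mirror the output naming convention: DESI_EDR_<start>-<end>.fits
output_path = f"Data/DESI/DESI_EDR_data/DESI_EDR_{START_SPECTRUM}-{END_SPECTRUM}.fits"
```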
- Downloads required DESI EDR catalogs (`zall-fuji.fits` and `fastspec-fuji.fits`) if not already cached
- Filters for galaxies and selects the index range you specify
- For each galaxy:
  - Fetches the matching `coadd` spectrum from the public DESI archive
  - Merges the B/R/Z data arms into concatenated arrays of `FLUX`, `WAVE` and `IVAR`
  - Computes velocity dispersion from FastSpec or spectral resolution
  - Calculates median signal-to-noise (SNR), redshift, RA/Dec, and target type
- Writes all this data into a single multi-extension FITS file
(Note: Although all the online files should remain the same, the output paths and directory structure set in this script may need to be altered to the desired destination on your machine. This method downloads large numbers of spectra, so it is heavily dependent on your internet connection speed and may be slow.)
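The B/R/Z arm merge mentioned above can be sketched as a simple concatenation followed by a wavelength sort. This is an illustration only, assuming per-arm `(wave, flux, ivar)` arrays are already in hand; the real scripts also handle arm-overlap regions and masking.

```python
import numpy as np

def merge_arms(arms):
    """Merge per-arm (wave, flux, ivar) arrays into single sorted arrays.

    `arms` is a dict like {'B': (wave, flux, ivar), 'R': ..., 'Z': ...}.
    """
    wave = np.concatenate([arms[k][0] for k in ('B', 'R', 'Z')])
    flux = np.concatenate([arms[k][1] for k in ('B', 'R', 'Z')])
    ivar = np.concatenate([arms[k][2] for k in ('B', 'R', 'Z')])
    order = np.argsort(wave)  # enforce a monotonic wavelength grid
    return wave[order], flux[order], ivar[order]
```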
Adding Your Own Stellar Population Models to FIREFLY
This guide explains the format FIREFLY expects for stellar population models, where to place them inside the repository, and two simple ways to plug new models into firefly_models.py. All examples and requirements below are taken from the existing firefly_models.py script in the fitting engine, as it is recommended that this is the script you alter to implement compatible models with minimal editing.
Where to put your models
The models directory used by the code is defined at the top of firefly_models.py:
```python
MODELS_DIR = join(dirname(__file__), 'stellar_population_models')
```
So upload your model files under the repository path:
```
firefly/Fitting_Engine/stellar_population_models/
```
Follow the existing folder conventions:
- `SSP_M11_*/` (e.g. `SSP_M11_MILES/`) - ASCII SSP files for the M11-type readers
- `SSP_M11_*_SG/` - special `.sg` flavour read by the `m11-sg` branch
- `MaStar_SSP_v1.1.fits.gz` - single FITS file used by the MaStar branch
- `EMILES_SSP/` - E-MILES FITS files
Format & structure FIREFLY expects (by model type)
- M11 (ASCII SSP grids):
  The reader uses `pandas.read_table(..., usecols=[0,2,3], names=['Age','wavelength_model','flux_model'], delim_whitespace=True)`, requiring each SSP file to be a whitespace-delimited table with at least these columns in that order:
  - Age - the age label used to group rows (the code does `model_table.loc[model_table.Age == a, ['wavelength_model','flux_model']]`)
  - wavelength - wavelength column (Angstroms as used in the files)
  - flux - flux values corresponding to each wavelength point

  Files are discovered via the following convention:
  `model_path = join(MODELS_DIR, 'SSP_M11_'+model_used, 'ssp_M11_'+model_used+'.'+imf_used)`
  The existing code then `glob()`s `model_path + '*'` and recognises metallicity by filename tokens. Allowed/recognised metallicity tokens in the code include (examples): `z001 z002 z004 z0001.bhb z0001.rhb z10m4 z-0.6 z-0.9 z-1.2 z-1.6 z-1.9` (for `m11-sg` the same tokens apply, but filenames end with `.sg`, e.g. `z001.sg`).
- MaStar (single FITS grid):
  The MaStar code expects a single FITS archive named `MaStar_SSP_v1.1.fits.gz` in the `stellar_population_models` folder. Internals used by the code include:
  - `hdul[1].data` - contains parameter arrays (ages `t`, metallicities `Z`, slopes `s`)
  - `hdul[2].data[0,:]` - the wavelength grid (`wavelength_int`)
  - `hdul[3].data` - the flux 4D/3D array (called `fluxgrid`), indexed by age/metal/slope

  The code selects the IMF by slope (IMF codes: `'kr'` → slope 1.3, `'ss'` → slope 2.35) and then reads the flux slice `fluxgrid[ii,jj,sidx,:]`.
- E-MILES (FITS per SSP):
  The E-MILES reading code searches for files like `join(MODELS_DIR,'EMILES_SSP','Eku1.30')` and for each matched file it:
  - Opens the file with `pyfits.open(i)` and takes `hdul[0].data` as the flux array
  - Constructs the wavelength array as `np.arange(1680, 50000, 0.9)`
  - Derives age from the filename slice used in the existing code, so keep consistent file naming (the code extracts age from a substring of the file path: it assumes a fixed token layout)
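To see what the M11 reader's parsing convention implies in practice, here is a tiny stand-in table parsed the same way. The values and the skipped second column are placeholders; `sep=r'\s+'` is used as the modern equivalent of `delim_whitespace=True`.

```python
import io
import pandas as pd

# Minimal stand-in for an M11 SSP file: columns are
# Age, <skipped>, wavelength, flux (whitespace-delimited, no header)
ascii_data = io.StringIO(
    "0.1 0.0 4000.0 1.0\n"
    "0.1 0.0 4001.0 1.1\n"
    "0.5 0.0 4000.0 0.9\n"
    "0.5 0.0 4001.0 0.8\n"
)

model_table = pd.read_table(
    ascii_data, sep=r'\s+', header=None,
    usecols=[0, 2, 3], names=['Age', 'wavelength_model', 'flux_model'],
)

# Group rows by age, as the M11 reader does:
for a in model_table.Age.unique():
    block = model_table.loc[model_table.Age == a, ['wavelength_model', 'flux_model']]
```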
Important processing behaviour (so your models are compatible)
- Wavelength medium: the code converts between air/vacuum using `airtovac()`/`vactoair()`. Ensure you know whether your model wavelengths are air or vacuum, and set `data_wave_medium`/`fit_wave_medium` appropriately when implementing new stellar population models in FIREFLY.
- Downgrading: if you want FIREFLY to match your model resolution to the instrument, the code calls `downgrade(wavelength, flux, deltal, self.specObs.vdisp, wave_instrument, r_instrument)` when `self.downgrade_models` is True. Provide the model spectral resolution as `deltal` (for MaStar the code reads an R array from the MaStar FITS; for M11/E-MILES the code sets a scalar `deltal`).
- Reddening: FIREFLY will apply a Milky Way E(B-V) correction with `unred()` if `ebv_mw` ≠ 0.
- Age and Z limits: FIREFLY filters models by `self.age_limits` and `self.Z_limits`, so ensure your SSP ages and metallicities fall in the ranges you intend to fit.
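For illustration of the air/vacuum point, here is a vacuum-to-air conversion using the widely used SDSS/Morton (1991) convention. This is an assumption about the kind of conversion `airtovac()`/`vactoair()` perform; FIREFLY's helpers may use different coefficients.

```python
import numpy as np

def vactoair(wave_vac):
    """Convert vacuum wavelengths (Angstroms) to air wavelengths using
    the SDSS/Morton (1991) refraction formula (illustrative only)."""
    wave_vac = np.asarray(wave_vac, dtype=float)
    factor = (1.0 + 2.735182e-4
              + 131.4182 / wave_vac**2
              + 2.76249e8 / wave_vac**4)
    return wave_vac / factor
```

Air wavelengths come out slightly shorter than vacuum ones (by roughly 1.4 Å at 5000 Å), which is why mixing media between models and data shifts every spectral feature.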
Two simple ways to make FIREFLY use your models
Option A - Minimal: Place files in an existing model format & choose them at runtime

If you can package your models to match one of the existing readers above (ASCII M11-style, MaStar FITS grid or E-MILES FITS), then upload your files into `firefly/Fitting_Engine/stellar_population_models/` following the folder/name patterns used above. Example for M11-style:

```
firefly/Fitting_Engine/stellar_population_models/SSP_M11_MyLib/ssp_M11_MyLib.kr
firefly/Fitting_Engine/stellar_population_models/SSP_M11_MyLib/ssp_M11_MyLib.ss
# name metallicity files using the tokens used by the parser, e.g. z001 z002 ...
```

Then when you construct the model runner, set `models` and `model_libs` appropriately:

```python
sp = StellarPopulationModel(specObs, outputFile, cosmo, models='m11', model_libs=['MyLib'], imfs=['kr'])
```

FIREFLY will follow the existing `m11` branch to locate files, read ages/wavelengths/fluxes and proceed without changing `firefly_models.py`.

Option B: Add a new loader function in `firefly_models.py` (recommended for novel formats)

If your model format is different (e.g. a set of HDF5 files, a different FITS layout, or you want special metadata reading), add a dedicated branch in `get_model()` or, better, add a helper method and call it. Below is a minimal template you can paste into `firefly_models.py` (place it just below the other `elif self.models == '...'` branches).

```python
# --- Example: custom model loader (paste into firefly_models.py) ---
def load_custom_models(self, model_used, imf_used, deltal, vdisp,
                       wave_instrument, r_instrument, ebv_mw):
    """
    Read your custom SSP/model files from:
        <repo>/Fitting_Engine/stellar_population_models/SSP_CUSTOM_<model_used>/
    and return exactly the 4 objects the rest of FIREFLY expects:
        wavelength, model_flux_list, age_list, metal_list

    Required return values:
        wavelength -> 1D numpy array (model wavelength grid)
        model_flux -> list (or array) of 1D numpy arrays, each an SSP spectrum
        age_list   -> list of ages (one per SSP)
        metal_list -> list of metallicities (one per SSP)

    Notes / placeholders:
    - Replace / extend the file-reading branches below to match your file
      format (FITS, HDF5, ASCII, numpy .npy, etc).
    - `model_used` is intended to map to a subfolder name after 'SSP_CUSTOM_'.
    - Keep variable names and behaviour consistent with the rest of
      firefly_models.py:
        * use `self.data_wave_medium` to decide air/vacuum conversion
          (airtovac / vactoair)
        * use `self.downgrade_models` and the provided `downgrade()` helper
          if needed
        * apply MW attenuation via `unred(wavelength, ebv=0.0 - ebv_mw)`
          if ebv_mw != 0
    """
    model_dir = join(MODELS_DIR, 'SSP_CUSTOM_' + model_used)  # e.g. stellar_population_models/SSP_CUSTOM_MyLib
    files = sorted(glob.glob(join(model_dir, '*')))
    if len(files) == 0:
        raise FileNotFoundError(f"No files found in {model_dir}. Create the folder and add your model files.")

    model_flux = []
    age_model = []
    metal_model = []
    wavelength = None  # will be set from the first file we successfully read

    for fpath in files:
        try:
            # ---------------------------
            # 1) Example: FITS-based file
            # ---------------------------
            if fpath.lower().endswith(('.fits', '.fits.gz', '.fz')):
                hdul = pyfits.open(fpath)
                hdr = hdul[0].header
                # Attempt to read a wavelength array, adapting to your FITS structure:
                try:
                    # many SSP FITS store 1D flux in the primary HDU and the
                    # wavelength either implicitly or in extension 1
                    flux = hdul[0].data.copy()
                    # attempt common wavelength carriers (customise to your files)
                    if 'WAVE' in hdul[1].data.names:
                        wave_int = hdul[1].data['WAVE']
                    elif 'wavelength' in hdul[1].data.names:
                        wave_int = hdul[1].data['wavelength']
                    else:
                        # fallback: no explicit wavelength, assume a uniform grid from header entries:
                        if 'CRVAL1' in hdr and 'CDELT1' in hdr and flux is not None:
                            crval = hdr['CRVAL1']
                            cdelt = hdr['CDELT1']
                            wave_int = crval + cdelt * np.arange(len(flux))
                        else:
                            raise ValueError("Cannot locate wavelength in FITS file - adapt loader to your format.")
                except Exception:
                    # If the extension structure differs, adapt here:
                    hdul.close()
                    raise
                # Example: metadata from the header (customise to your headers).
                # If your files include AGE and Z in the header use these;
                # otherwise parse the filename.
                age = hdr.get('AGE', None)    # e.g. AGE = 1.0 (Gyr) or whatever units you choose
                metal = hdr.get('Z', None)    # e.g. Z = 0.02 (linear metallicity)
                hdul.close()

            # ---------------------------
            # 2) Example: ASCII / text table
            # ---------------------------
            elif fpath.lower().endswith(('.txt', '.dat', '.asc', '.ascii')):
                # Example: whitespace-delimited with columns: wavelength flux [optional: age, z]
                # Use pandas.read_table as in the m11 branch if your ASCII matches that style.
                df = pd.read_table(fpath, delim_whitespace=True, header=None)
                # ASSUMPTION: first column = wavelength, second column = flux
                wave_int = df.iloc[:, 0].values
                flux = df.iloc[:, 1].values
                # Optionally: expect age/Z encoded in the filename like mylib_age1.0_z0.02.dat
                age = None
                metal = None

            # ---------------------------
            # 3) Example: numpy / hdf5 / other - placeholder
            # ---------------------------
            else:
                # Placeholder: add your custom reader here.
                # For instance, for .npy: arr = np.load(fpath); wave_int = arr[:,0]; flux = arr[:,1]
                raise ValueError("Unknown file extension. Add a reader for this format in the loader.")

            # If age/metal were not in the header, try to parse them from the
            # filename using a convention, e.g. MyLib_age1.0_z0.02.fits
            if age is None or metal is None:
                # naive filename parsing - edit the regexps to suit your naming
                fname = os.path.basename(fpath)
                # try to extract an age token "age<value>" and a z token "z<value>" or "Zp0.02" style
                import re
                age_match = re.search(r'age[_\-]?([0-9\.]+)', fname, flags=re.IGNORECASE)
                z_match = re.search(r'(?:z|Z|_z|_Z)([_\-]?[0-9\.]+)', fname)
                if age is None and age_match:
                    age = float(age_match.group(1))
                if metal is None and z_match:
                    try:
                        metal = float(z_match.group(1))
                    except Exception:
                        metal = None
                # final fallback defaults (replace with strict parsing if needed)
                if age is None:
                    age = 1.0     # Gyr or the unit you choose - ensure consistency with FIREFLY's age use
                if metal is None:
                    metal = 0.02

            # Convert the wavelength medium if needed (use the same helpers as the file)
            if self.data_wave_medium == 'vacuum':
                wavelength_local = airtovac(wave_int)
            else:
                wavelength_local = wave_int

            # Store the wavelength from the first file read (assumes the same
            # grid for all SSPs; if not, you must resample)
            if wavelength is None:
                wavelength = wavelength_local.copy()
            else:
                # if grids differ you may need to resample flux to the master wavelength array here
                if not np.allclose(wavelength, wavelength_local):
                    # NOTE: if your SSPs use different wavelength grids you must resample (interp1d)
                    raise ValueError("Model wavelength grids differ between files. Resample to a common grid before adding, or implement resampling here.")

            # optionally downgrade the model resolution to match the instrument
            if self.downgrade_models:
                mf = downgrade(wavelength, flux, deltal, self.specObs.vdisp, wave_instrument, r_instrument)
            else:
                mf = copy.copy(flux)

            # Apply the Milky Way reddening correction if provided
            if ebv_mw != 0:
                attenuations = unred(wavelength, ebv=0.0 - ebv_mw)
                mf = mf * attenuations

            # append outputs
            model_flux.append(mf)
            age_model.append(age)
            metal_model.append(metal)

        except Exception as e:
            print(f"[load_custom_models] Skipping file {fpath} due to error: {e}")
            continue

    # final sanity checks
    if wavelength is None or len(model_flux) == 0:
        raise RuntimeError("No valid models were loaded. Check file readers and file formats.")

    # Set the same attributes other loaders set so the rest of the pipeline
    # works unchanged (with the exception of naming of results)
    self.model_wavelength = wavelength
    self.model_flux = model_flux
    self.age_model = age_model
    self.metal_model = metal_model

    # return values exactly as expected by get_model() callers
    return wavelength, model_flux, age_model, metal_model

# --- In get_model(), add a branch that calls your loader:
# elif self.models == 'MyCustom':
#     return self.load_custom_models(model_used, imf_used, deltal, vdisp,
#                                    wave_instrument, r_instrument, ebv_mw)
#
# Usage notes:
# - Create a folder: stellar_population_models/SSP_CUSTOM_MyCustom/
# - Put your files there, e.g. SSP_CUSTOM_MyCustom/ssp_age1.0_z0.02.fits
# - Call FIREFLY with models='MyCustom' and model_libs=[' ']
# - If your SSPs do not share the same wavelength grid, implement a safe
#   resampling step before appending.
```

This method keeps the high-level flow in `get_model()` unchanged while isolating your file-format specifics inside `load_custom_models()`. Just remember FIREFLY's fitting pipelines expect the loader to return exactly:
- `wavelength` - 1D wavelength array
- `model_flux` - list/array of flux arrays (one per SSP)
- `age` - list of ages for each SSP
- `metal` - list of metallicities for each SSP
Quick checklist before running
- Your files are uploaded into `firefly/Fitting_Engine/stellar_population_models/`
- If you used an existing reader format (m11 / MaStar / E-MILES), ensure filenames and tokens follow the parser's expectations (see tokens listed above)
- Decide whether to set `self.downgrade_models=True` (recommended if your models are higher resolution than the instrument) so FIREFLY will call `downgrade()`
- Ensure `data_wave_medium` and `fit_wave_medium` are set correctly (air vs vacuum)
- If you added a new branch in `get_model()`, restart any running Python sessions so the modified module is reloaded
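Before a full run, a quick sanity check on a custom loader's four return values can catch most format mistakes early. This is a sketch; `check_loader_outputs` is a hypothetical helper, not part of FIREFLY, to be fed whatever your loader returns.

```python
import numpy as np

def check_loader_outputs(wavelength, model_flux, age_model, metal_model):
    """Verify the four loader return values have consistent shapes before fitting."""
    wavelength = np.asarray(wavelength)
    assert wavelength.ndim == 1, "wavelength must be a 1D array"
    assert np.all(np.diff(wavelength) > 0), "wavelength must be strictly increasing"
    assert len(model_flux) == len(age_model) == len(metal_model), \
        "one age and one metallicity per SSP spectrum"
    for flux in model_flux:
        assert len(flux) == len(wavelength), "each SSP must share the wavelength grid"
    return True
```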