The goal of SPHGR is to provide a framework to build your cosmological data analysis on. It responsible for grouping your raw data into halo and galaxy objects, calculating some fundamental quantities of those objects, and then linking them together via simple relationships. What you do with the result is up to you.

Some of the major benefits of SPHGR:

  • Intuitive and easy to use interface
  • Portable output files that take up minimal space compared to the raw simulation.
  • Output files are cross compatible between py2<—>py3
  • Consistent attribute naming between objects
  • All grouping is done internally. This negates the need to learn additional codes and their outputs.
  • Attributes within SPHGR have units attached taking away any uncertainty between comoving/physical and if you forgot an h along the way.
  • Easily explore object relationships
  • Work in index space, it’s all the rage!
  • Currently compatible with any SPH cosmological simulation readable by yt.
  • Has been an integral tool in numerous publications [1]
  • Built in methods for selecting zoom regions and printing out MUSIC mask files.

I would quit galaxies without it. SPHGR makes doing galaixes ‘easier’ - Desika Narayanan

What it does

SPHGR is very basic in how it operates.

  1. Identify halos via a 3D friends-of-friends algorithm (FOF) applied to dark matter and baryons with a mean inter-particle separation of b=0.2.
  2. Identify galaxies via a 3D FOF applied to dense gas and stars with a mean inter-particle separation of b=0.04 (20% of the halo b).
  3. Link galaxies to halos. This is done by comparing the particle list of each galaxy to the global halo particle list and assigning membership to the halo which contains the most particles from said galaxy.
  4. Consider the most massive galaxy within each halo as a central and all others satellites.
  5. Link objects and populate SPHGR lists (see Data Structure for details).

After each individual object is identified, we run through an optional unbinding procedure and calculate various physical quantities. If you have concerns over using a simple 3D FOF routine to identify objects then see my comments in the How does it stack up? section.


SPHGR is currently maintained in my fork of the popular astrophysical analysis code yt. To get up and running you first need a working installation of python. I recommend you do not use your system’s default version and instead go with something you can more easily customize. My personal favorite is anaconda, but there are other solutions out there such as homebrew (OSX) and enthought. Once you have a functional python installation you will need a few prerequisite packages which can be installed via the following command:

> pip install mercurial numpy cython sympy scipy h5py

Now that we have our python environment in place, we can clone my fork, update to the sphgr bookmark, and install via:

> hg clone
> cd sphgr_yt
> hg update sphgr
> python install

And that’s it! You are now ready to move on and start using yt & SPHGR.


Updating is extremely simple. I do my best to keep the my fork of yt up-to-date with respect to the latest changes to the yt-dev branch, so you should expect frequent updates. To update the code all you need to do is:

> cd sphgr_yt
> hg pull
> hg update
> python install

and you should be good to go!


If your simulation snapshot can be read natively by yt without any extra work (i.e. gadget/gizmo files with standard unit definitions of 1e10Msun/h, kpc/h, km/s) then a quick and easy way to run SPHGR on a given snapshot from the command line is:

> yt sphgr snapshot.hdf5

However, if your snapshot has a different set of units and/or block structure (tipsy, gadget binaries with custom blocks, etc) then you should refer to this link in order to make sure yt is correctly reading your data before proceeding. SPHGR needs to know about your cosmology and simulation units in order to give sensible results.

The above command is a useful shortcut, but what does it actually do? It is simply a shortcut for the following code snippet, which should be used in the case of wanting to run things manually, or having to alter the yt.load() step with custom data as I mentioned above.

import yt
import yt.analysis_modules.sphgr.api as sphgr

snapshot = 'snapshot.hdf5'
outfile  = 'sphgr_snapshot.hdf5'

ds  = yt.load(snapshot)
obj = sphgr.SPHGR(ds)

First we import yt and the SPHGR API. Next we load the snapshot as a yt-dataset (ds) and create a new SPHGR() object passing in the yt-dataset as an argument. This process will generate a MD5 hash to ensure that the resulting SPHGR file can only be paired with this particular snapshot. Next we run the member_search() method which does all of the magic described in the What it does section. Finally, we save the output to a new file called a SPHGR member file. This file can be passed around and used without the corresponding simulation snapshot if you do not require access to the raw particle data on disk.

Note: Correctly loading simulation data into yt is the users’s responsibility.

Loading SPHGR files

After creating an SPHGR member file we can easily read it back in via:

import yt.analysis_modules.sphgr.api as sphgr
sphgr_file = 'sphgr_snapshot.hdf5'
obj = sphgr.load_sphgr_data(sphgr_file)

This will read the member file, recreate all objects, and finally re-link them. The end result is access to all of the SPHGR data through the obj variable, whose contents are described in the following section.

Data Structure

One of the most useful aspect of SPHGR is the intuitive data structure which considers halos and galaxies as individual objects that have a number of attributes and relationships. All objects are derived from the same ParticleGroup class; this allows for consistent calculations and naming conventions for both attributes and internal methods. To get a better idea of the internal organization of these objects refer to the graphic below.

SPHGR object structure

All of the attributes listed under the ParticleGroup class are common to all objects. Halos and galaxies differ by their relationships to other objects. Halos for instance, contain a list of galaxies that reside within it, along with a link to the central, and another sub-list of satellite galaxies (which comes from the halo’s [galaxies] list). Galaxies on the other hand link to their parent halo, have a boolean defining if it is a central, and a list of its very own satellite galaxies (only populated when central=True).

Accessing the data

The data for each object is stored in simple lists and sub-lists within the root of an SPHGR object:

  • halos
  • galaxies
    • central_galaxies
    • satellite_galaxies

This makes it extremely simple to use python’s list comprehension to extract data that you are interested in. As an example, if we wanted the sfr attribute from each galaxy we can easily retrieve that data like so:

import numpy as np
import yt.analysis_tools.sphgr.api as sphgr
sphgr_file = 'your_sphgr_file.hdf5'

obj = sphgr.load_sphgr_data(sphgr_file)
galaxy_sfr = np.array([ for gal in obj.galaxies])

The last line is basically a fancy way to write a loop [i.ATTRIBUTE for i in LIST] where i is an arbitrary variable. More explicitly, the full loop would look something like this:

galaxy_sfr = []
for gal in obj.galaxies:
galaxy_sfr = np.array(galaxy_sfr)

Python’s list comprehension allows us to write fewer lines and cleaner code. We can also begin to add conditions to further query our data. Let’s now get the sfr attribute for all galaxies above a given mass:

galaxy_sfr = np.array([ for gal in obj.galaxies if gal.masses['stellar'] > 1.0e10])

which is the same as:

galaxy_sfr = []
for gal in obj.galaxies:
    if gal.masses['stellar'] > 1.0e10:
gal_sfr = np.array(gal_sfr)

We can take things one step further for this particular example and now only retrieve the sfr attribute for galaxies above a given mass residing in a halo above a given mass:

galaxy_sfr = np.array([ for gal in obj.galaxies if gal.masses['stellar'] > 1.0e10 and gal.halo.masses['total'] > 1.0e12])

As you can hopefully see, the relationships defined by SPHGR are an integral part to how you analyze your data. Because both galaxies and halos contain the same attributes (since they both derive from the ParticleGroup parent class) they are very easy to filter through and work with.

SPHGR’s goal is to provide a reliable framework that makes it easy to access and filter though objects within your simulation.

Accessing raw particle data

Each object contains three very important lists:

  1. dmlist
  2. glist
  3. slist

whose prefixes represent dark matter (dm), gas (g), and stars (s). The contents of these lists are the indexes of the particles assigned to that particular object. This provides a very powerful avenue for selecting only the particles that belong to your specific object out of the raw disk data.

To get to the raw data, you should familiarize yourself with yt’s fields. As an example, here is how you might get the densities of all gas particles within the most massive galaxy (galaxy 0):

import yt
import yt.analysis_tools.sphgr.api as sphgr

ds = yt.load('snapshot.hdf5')
dd = ds.all_data()
gas_densities = dd['PartType0','density']

obj = sphgr.load_sphgr_data('sphgr_snapshot.hdf5')
galaxy_glist = obj.galaxies[0].glist
galaxy_gas_densities = gas_densities[galaxy_glist]

In addition to these per-object lists there are also a handful of global particle lists under the global_particle_lists object contained within the root SPHGR object. These lists include:

  1. halo_dmlist
  2. halo_glist
  3. halo_slist
  4. galaxy_glist
  5. galaxy_slist

These lists are integer values for all simulation particles indicating their member status where -1 represents not assigned to a group, -2 represents unbound from a group, and values greater than -1 represent the index of the object it belongs to. Using this data, you can easily select all gas that has been assigned to a halo, or vice versa. These global lists are read-on-demand meaning that they do not take up excess memory when your SPHGR file is loaded. To illustrate how these work let us get the indexes of all star particles that do not belong to a galaxy (<0 in the global list):

import numpy as np
global_slist = obj.global_particle_lists.galaxy_slist
indexes = np.where(global_slist < 0)[0]

We can now use indexes as a slice through full particle data arrays to select only the particles of interest.

How does it stack up?

SPHGR is currently limited in a few ways. The most notable however, is that I am using a standard friends-of-friends group finder (FOF) to group both halos and galaxies. This is known to cause issues in certain scenarios such as mergers. However, when dealing with halos it does not have a large impact on the global properties [2]. Because the currently implemented FOF within yt is a simple 3D algorithm, SPHGR does not currently classify halo substructure. It is my hope that later this year it will be extended to 6 dimensions which will allow for substructure identification similar to many 6D phase space halo finders available today.

In regards to galaxy identification, I have compared SPHGR galaxy results to those of SKID and it stacks up pretty well. Below is an image that contains numerous comparison between SKID identified galaxies and FUBAR (SPHGR’s galaxy finding routine). They are not a perfect 1-to-1 match, but overall I would say that they are quite comparable. Regardless, I am always working to improve these results with constant updates and tweaks to the code.


What is to come?

The ultimate goal of SPHGR is to extend its capabilities beyond that of just particle codes. I would love nothing more than to provide a unified framework for analyzing cosmological simulations in a simple and intuitive manner.

But in the mean time, here is my todo list:

  • Test and verify python2<—>python3 member file compatibility.
  • Identify halos with a 6D phase space metric (substructure!).
  • Optimize the unbinding procedure so that small objects are not fully unbound anymore (unbinding is currently OFF).
  • Enable grid code functionality
  • Design a framework that allows the user to define arbitrary relationships and implement them.
  • Add yt dependent methods to make it simpler to for example: projection plot a single object along an axis.
  • Come up with a better name.


Hyperion, FSPS, and FSPS-python

Reinstalling OSX has me spending too much time figuring out what I’m missing.  Hyperion isn’t tricky, but I don’t feel like scrambling for instructions next time so here we go:


First thing’s first, nab and install gfortran. Next you’ll need to make sure you have HDF5 compiled and installed with fortran support, I covered this topic in this post. Next we’ll want to nab the hyperion tarball and move into the directory. This should be a basic configure, make, and make install:
> ./configure --prefix=/Users/bob/local/hyperion
> make
> make install
It is very important that you do *not* use ‘make -jN’; make must be ran serialized! Now we can build the python part via:
> python build
> python install
Easy enough right? Ok, well now we can move on to FSPS and Python-FSPS!

FSPS & Python bindings

Download the FSPS tarball via svn:
> svn checkout fsps
You’ll need to add a new environment variable to your .bash_profile or .bashrc:
and point it to wherever the directory is you just downloaded. Then a simple ‘make’ should take care of everything as long as your gfortran is properly installed. Now nab python-fsps from github and cd into the downloaded directory. The build method is a little unconventional for most python packages:
> python build_fsps
> python develop

Testing FSPS

We should now be able to test FSPS and produce some stellar spectra. The metallicity input boils down to a table of values where you have to pass in the closest index. here’s an example script:
 import fsps
 import random
 import fsps


 ZSUN = 0.0190
 METALS = [0.1,1,1.5] ## array of metals for our stars
 AGE = 5. ## maximum age

## table from FSPS

## loop through each star and calculate the spectrum based on its age
 and metallicity
 for i in range(0,len(METALS)):
 zmetindex = (abs(ZMET-(METALS[i]*ZSUN))).argmin() + 1 ## determine
 index of ZMET
 sp = fsps.StellarPopulation(imf_type=1, zmet=zmetindex) ## calculate
 stellar population
 spectrum = sp.get_spectrum(tage=random.random()*AGE) ## get the

## plot the results
 loglog(spectrum[0],spectrum[1],label=r'%0.2fZsun' % METALS[i],

All of that produces this:

OSX Automount

When reinstalling OSX on my hackintosh I’m always troubled with the question of how do I automatically reconnect to my server’s network drive on startup? I previously had an apple script that did just that, but for whatever reason it no longer works in OSX 10.9 Mavericks. Apparently Apple’s Automater is a lot easier to work with and does the same thing. Thanks to this post I found a clever and simple solution that should continue to work regardless of the OS version:

  1. Launch Automater
  2. Choose Application
  3. Search for pause and drag it to the right pane (enter 5secs or so)
  4. Search for server and drag Get Specified Servers and Connect to Servers (in this order) to the right pane.
  5. In Get Specified Servers click Add, and add the address (in my case it was afp://
  6. Save the application, and add it to your OSX startup!

Customizing Emacs

There’s an entire world of emacs out there that I have yet to explore. Before leaving for Amhurst I took the first step into that world and oh is it wonderful. I’m going to run through a few run of the mill additions to our ~/.emacs which are fairly essential.

(setq backup-by-copying t)

allow for editing files in Transmit or other FTP programs

(add-to-list 'auto-mode-alist '("\\.h\\'" . c-mode))
(add-to-list 'auto-mode-alist '("\\.pyx\\'" . python-mode))
(add-to-list 'auto-mode-alist '("\\.sm\\'" . fortran-mode))

add additional unrecognized file types that will allow for syntax highlighting

(unless window-system
  (xterm-mouse-mode 1)
  (global-set-key [mouse-4] '(lambda ()
                               (scroll-down 1)))
  (global-set-key [mouse-5] '(lambda ()
                               (scroll-up 1))))

enable mouse interaction in terminal mode

(require 'package)
(add-to-list 'package-archives '("marmalade" . ""))

Here we add a package repository. This allows for a package management system very similar to Debian’s apt-get (only for Emacs v24+). We can access these packages via open Emacs after adding this line, then hitting M-x (M stands for meta, default key is ESC), and typing ‘package-list-packages’.

Search the list for ‘jedi’, put your cursor next to it and hit ‘i’ for install, you should see a big I come up next to it. Now hit ‘x’ to execute and it will ask if you would like to install jedi - the obvious answer is ‘yes’. Unfortunately we’re not finished, we have to add some commands to our ~/.emacs and setup a python virtual environment:

(autoload 'jedi:setup "jedi" nil t)
(add-hook 'python-mode-hook 'auto-complete-mode)
(add-hook 'python-mode-hook 'jedi:setup)
(setq jedi:setup-keys t)
(setq jedi:complete-on-dot t)

Now we can proceed to the virtual environment, which can be a bit tricky. Navigate to your jedi.el directory, which is most likely located in ~/.emacs.d/elpa/jedi-xxx. In this directory you will find a Makefile. If you are using any python distribution other than CANOPY, then you can probably just run ‘make requirements’ and be set to go. If you are using CANOPY however, a small change must be made to the makefile since virtualenv is not compatible with 64bit CANOPY on osx. Edit the makefile and change the following line:

virtualenv --python=$(PYTHON) --> venv -s python_virtual

Once this is done run ‘make requirements’ and it should download and compile everything else you need for jedi. Open up a new python document and enjoy the auto complete! ref

Next we can add a syntax highlighter called Flycheck. To install in emacs we M-x package-list-packages and search for flycheck, then follow a similar installation procedure above for the initial install of jedi. Next install pylint and pyflakes to your python install via pip (really not sure which one does the trick, but I believe it is pylint):

pip install pylint
pip install pyflakes

Lastly add this line to your ~/.emacs:

(add-hook 'after-init-hook #'global-flycheck-mode)

Finally we can install autopair/smartpar, which automatically closes parentheses, brackets, etc. Follow the routine above for adding packages, and install ‘autopair’. Then add this to your ~/.emacs and you should be set to go:

(require 'autopair)
(smartparens-global-mode t)

refs: 1 2 3 4

EDIT 11/7/13:
I forgot to mention color themes! One of the default Emacs24 themes that is pretty decent is ‘deeper-blue’ which can be enabled by modifying your custom-set-variables by:
 ;; custom-set-variables was added by Custom.
 ;; If you edit it by hand, you could mess it up, so be careful.
 ;; Your init file should contain only one such instance.
 ;; If there is more than one, they won't work right.
 '(custom-enabled-themes (quote (deeper-blue)))
 '(inhibit-startup-screen t))

the ‘(custom-enabled-themes (quote (deeper-blue))) does it. BUT, I’m starting to prefer another theme called solarized. The installation is as simple as adding it via marmalade - ‘M-x package-list-packages’, then search for ‘color-theme-solarized’ and install it. Then in your ~/.emacs add the line:

(load-theme 'solarized-[light|dark] t)

and you should be set to go.

what exactly is in a python object?

When working with other people’s packages tit can sometimes be frustrating to not know exactly what types of things are in the objects you are creating. Especially when the documentation is lacking and you have to dig through the source to find the answers you seek. Luckily in python 2.7+ (I think) there is a way to list methods and attributes of an object. Let’s take a look at one of my galaxy objects - dataSet.galaxies[0]. We can see what lies inside if we tack a __dict__ on the end like so:

In [1]: dataSet.galaxies[0].__dict__
{'ALPHA': 1.2212573740203678,
 'BETA': -0.9541605597579632,
 'FMR': 82.891999999999996,
 'FMR_cv': 1.9885869336675259,
 'HMR': 44.951360000000001,
 'HMR_cv': 1.9472879450163203,
 'L': array([ -1.81083747e+11,   1.20597990e+11,   4.39587033e+10]),
 'RS_halo_d': 0,
 'RS_halo_gmass': 0,
 'RS_halo_index': -1,
 'RS_halo_isparent': 0,
 'RS_halo_mass': 0,
 'RS_halo_smass': 0,
 'Z': 0.0,
 'central': -1,
 'cm': array([ 10345.936     ,   9542.06628571,   9012.98285714]),
 'gas_mass': 108890025.46897629,
 'glist': [29820L,....],
 'index': 0,
 'max_cv': 2.2245953691479929,
 'max_vphi': 113.64515965485172,
 'sfr': 0.0,
 'slist': [],
 'stellar_mass': 0.0}

Now if you have a ton of information/attributes attached to this object, it can get pretty messy quickly. If we tack the keys() function on the end it cleans things up quite a bit

In [2]: dataSet.galaxies[0].__dict__.keys()

Now don’t ask me why it’s ordered the way it is; the point is you now know every attribute contained in the object.

But what about methods? Well apparently there is an equivalent function called dir(object) that pretty much does the same thing as .__dirs__.keys()

In [3]: dir(dataSets[0])

but this time we can see the methods such as read_sorad() and read_rockstar(). I’m not sure if there is a way to isolate the methods yet but regardless either command is quite helpful when dealing with another’s code.

compiling OpenMPI & HDF5 with Fortran support

Fortran…why on Earth would I want something with Fortran support? Apparently it’s fast; and when something is fast people still use it. I’m currently looking into the radiative transfer code Hyperion and it requires both MPI and HDF5 to be compiled with fortran support. It’s fairly simple to do, but can be a pain in the neck to find the syntax if you don’t know exactly where to look. Let’s start with OpenMPI which is currently on version 1.6.5.
?> ./configure --prefix=/Users/bob/local/openmpi-1.6.5 F77=gfortran FC=gfortran
?> make
?> make install
For HDF5 the syntax is similar:
?> ./configure CC=mpicc --enable-fortran --enable-hl --prefix=/Users/bob/local/hdf5-1.8.11 FC=gfortran
?> make
?> make install
And that should do it. To get hyperion to see these you just add the /bin dirs to your $PATH and the lib dirs to your $LD_LIBRARY_PATH. To check if the MPI-fortran bindings worked you can type mpif90 and mpi77 at the terminal and gfortran should error out saying that there are no input files.
Lastly once you recompile OpenMPI you may have to reconfigure/recompile/reinstall mpi4py if you’re using it. This is also super simple, but on OSX it can be troublesome. Before the compilation you have to redirect it to the proper SDK directory; the install should look like so:
?> export SDKROOT=/
?> python build
?> python install
I usually end up with an error at the end, but it installs and seems to run without any problems.

Listing files in a directory via python

I’ve come across a few situations where batch processing of snapshots is necessary. Normally I manually key in the number of snaps in the for loop, but is there a better more python way? Of course there is! Enter ‘glob‘. Glob is basically a wrapper for os.listdir, which is equivalent to typing ‘ls -U’ in terminal. If you use the -U however, the ordering is completely arbitrary, so by using say:
import glob
snaps = glob.glob("snapshot\_\*")
You’re going to get an unsorted back. For some this may not be a problem, but for me? I usually need a list ordered 000-N! The solution for this comes in the form of the the python function sorted():
import glob
import os
snaps = sorted(glob.glob("snapshot\_\*"), key=os.path.relpath)
which returns a sorted list! The key (pun intended) here is the key keyword, which tells it which os path to sort by. Many options exist such as ‘os.path.getmtime’ to sort by timestamp, and ‘os.path.getsize’ to sort by size. Thanks to stackoverflow again for the assistance!

check executable library dependencies

When compiling a C code it often requires certain libraries be present. To check which libraries the executable depends on there is a handy linux and OSX command that can show you where the executable thinks these libraries reside:
 ## LINUX ##
 ldd executable\_name

## OSX ##
 otool -L executable\_name
very handy when working on remote machines where things might be a little nutty.

setup an iPython notebook

Let’s setup an iPython notebook shall we? First we need to check what version of iPython we have, this is as easy as typing this in your terminal:
$> ipython --version
as of right now, the latest version is 1.0.0, if you have anything less than that you may want to consider upgrading. The easiest way to do this is via pip.


Type pip in terminal and see if you have it, I’ll wait…ok you don’t? Well follow the link and download it, then to install you un-tar the file and cd into the directory, then in terminal:
$> python build
$> python install
that second command may require sudo depending on your python installation.


Now that you have pip installed you have something very similar to ubuntu’s apt-get or fedora’s yum, except for python! Now we can upgrade iPython via pip:
$> pip install ipython --upgrade

Create a new iPython profile

Next we create a new iPython profile:
$> ipython profile create username
where you replace “username” with a profile name of your choice. Then we edit the file ~/.ipython/profile_username/ in your editor of choice; We’re concerned with two lines in particular:
c.IPKernelApp.pylab = 'inline'
c.FileNotebookManager.notebook\_dir = u'/Users/bob/Dropbox/research/python/ipyNotebooks'
These two lines are commented out by default. The first says to put matplotlib figures in the notebook rather than open a new window, and the second dictates where the notebooks are saved by default. I like to set this second option to somewhere in my dropbox so that I can automatically share those between my multiple machines.


Finally we create an alias to launch the notebook as it makes things much easier. Edit your ~/.bash_profile (mac) or ~/.bashrc (*nix) and add:
alias notebook='ipython notebook --profile=username'
Once this is done close all of your terminals and open a fresh one. Type ‘notebook’ and whala, your browser will open to your very own iPython notebook. I typically minimize this terminal window and keep it open for extended periods of time so the notebook stays alive.

There are numerous other settings you can dink around with in ~/.ipython/profile_username/ that I didn’t cover today, so feel free mess around in there. Lastly there are some additional security options to setup a password and such for your notebook that were also not covered, more details about this and credit for the bulk of these instructions comes from here.

timing python functions

One normally would use the magic %timeit function() to time how fast a python function is. For my purposes, I need to test nested functions that aren’t simple to call on their own; to do this I usually just print out the start and end time, but I frequently forget the syntax.
import time
start_time = time.strftime("%H:%M:%S",time.localtime())
end_time = time.strftime("%H:%M:%S",time.localtime())
print(r'START: %s END: %s' % (start_time, end_time))
A little manual subtraction is in order…and I’m sure there’s a simpler way to have python do this, but it suites my needs at this point.