yt-SPHGR

SPHGR

Introduction

The goal of SPHGR is to provide a framework on which to build your cosmological data analysis. It is responsible for grouping your raw data into halo and galaxy objects, calculating some fundamental quantities for those objects, and then linking them together via simple relationships. What you do with the result is up to you.

Some of the major benefits of SPHGR:

  • Intuitive and easy to use interface
  • Portable output files that take up minimal space compared to the raw simulation.
  • Output files are cross-compatible between Python 2 and Python 3
  • Consistent attribute naming between objects
  • All grouping is done internally. This negates the need to learn additional codes and their outputs.
  • Attributes within SPHGR have units attached, removing any uncertainty about comoving vs. physical coordinates and whether you forgot an h along the way.
  • Easily explore object relationships
  • Work in index space, it’s all the rage!
  • Currently compatible with any SPH cosmological simulation readable by yt.
  • Has been an integral tool in numerous publications [1]
  • Built in methods for selecting zoom regions and printing out MUSIC mask files.

I would quit galaxies without it. SPHGR makes doing galaxies ‘easier’ - Desika Narayanan

What it does

SPHGR is very basic in how it operates.

  1. Identify halos via a 3D friends-of-friends (FOF) algorithm applied to dark matter and baryons, using a linking length of b=0.2 times the mean inter-particle separation.
  2. Identify galaxies via a 3D FOF applied to dense gas and stars, using a linking length of b=0.04 (20% of the halo value).
  3. Link galaxies to halos. This is done by comparing the particle list of each galaxy to the global halo particle list and assigning membership to the halo which contains the most particles from said galaxy (see the sketch after this list).
  4. Consider the most massive galaxy within each halo as a central and all others satellites.
  5. Link objects and populate SPHGR lists (see Data Structure for details).
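To make step 3 concrete, here is a minimal sketch of the overlap-based assignment (hypothetical names and structure; not the actual SPHGR implementation):

import numpy as np
from collections import Counter

def link_galaxies_to_halos(galaxy_plists, particle_halo_id):
    # galaxy_plists: list of integer arrays, particle indexes per galaxy
    # particle_halo_id: array mapping each particle index to a halo ID (-1 = none)
    assignments = []
    for plist in galaxy_plists:
        halo_ids = particle_halo_id[plist]
        halo_ids = halo_ids[halo_ids >= 0]   # ignore particles outside all halos
        if halo_ids.size == 0:
            assignments.append(-1)           # galaxy belongs to no halo
        else:
            # the halo contributing the most of this galaxy's particles wins
            assignments.append(Counter(halo_ids).most_common(1)[0][0])
    return np.array(assignments)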

After each individual object is identified, we run through an optional unbinding procedure and calculate various physical quantities. If you have concerns over using a simple 3D FOF routine to identify objects then see my comments in the How does it stack up? section.

Installation

SPHGR is currently maintained in my fork of the popular astrophysical analysis code yt. To get up and running you first need a working installation of python. I recommend you do not use your system’s default version and instead go with something you can more easily customize. My personal favorite is anaconda, but there are other solutions out there such as homebrew (OSX) and enthought. Once you have a functional python installation you will need a few prerequisite packages which can be installed via the following command:

> pip install mercurial numpy cython sympy scipy h5py

Now that we have our python environment in place, we can clone my fork, update to the sphgr bookmark, and install via:

> hg clone https://rthompson@bitbucket.org/rthompson/sphgr_yt
> cd sphgr_yt
> hg update sphgr
> python setup.py install

And that’s it! You are now ready to move on and start using yt & SPHGR.

Updating

Updating is extremely simple. I do my best to keep my fork of yt up-to-date with respect to the latest changes to the yt-dev branch, so you should expect frequent updates. To update the code all you need to do is:

> cd sphgr_yt
> hg pull
> hg update
> python setup.py install

and you should be good to go!

Running

If your simulation snapshot can be read natively by yt without any extra work (i.e. gadget/gizmo files with standard unit definitions of 1e10Msun/h, kpc/h, km/s) then a quick and easy way to run SPHGR on a given snapshot from the command line is:

> yt sphgr snapshot.hdf5

However, if your snapshot has a different set of units and/or block structure (tipsy, gadget binaries with custom blocks, etc) then you should refer to this link in order to make sure yt is correctly reading your data before proceeding. SPHGR needs to know about your cosmology and simulation units in order to give sensible results.
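For example, a minimal sketch of loading a Gadget binary with non-standard units might look like the following; the unit values here are purely illustrative, so match them to your own simulation and check the yt documentation for the exact keywords your format supports:

import yt

unit_base = {'UnitLength_in_cm': 3.085678e21,      # kpc
             'UnitMass_in_g': 1.989e43,            # 1e10 Msun
             'UnitVelocity_in_cm_per_s': 1.0e5}    # km/s
ds = yt.load('snapshot_004', unit_base=unit_base)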

The above command is a useful shortcut, but what does it actually do? It is simply shorthand for the following code snippet, which you should use if you want to run things manually or need to alter the yt.load() step for custom data as mentioned above.

import yt
import yt.analysis_modules.sphgr.api as sphgr

snapshot = 'snapshot.hdf5'
outfile  = 'sphgr_snapshot.hdf5'

ds  = yt.load(snapshot)          # load the snapshot as a yt-dataset
obj = sphgr.SPHGR(ds)            # create the SPHGR object from the dataset
obj.member_search()              # identify halos/galaxies and link them
obj.save_sphgr_data(outfile)     # write the portable SPHGR member file

First we import yt and the SPHGR API. Next we load the snapshot as a yt-dataset (ds) and create a new SPHGR() object, passing in the yt-dataset as an argument. This process will generate an MD5 hash to ensure that the resulting SPHGR file can only be paired with this particular snapshot. Next we run the member_search() method, which does all of the magic described in the What it does section. Finally, we save the output to a new file called a SPHGR member file. This file can be passed around and used without the corresponding simulation snapshot if you do not require access to the raw particle data on disk.

Note: Correctly loading simulation data into yt is the user’s responsibility.

Loading SPHGR files

After creating an SPHGR member file we can easily read it back in via:

import yt.analysis_modules.sphgr.api as sphgr
sphgr_file = 'sphgr_snapshot.hdf5'
obj = sphgr.load_sphgr_data(sphgr_file)

This will read the member file, recreate all objects, and finally re-link them. The end result is access to all of the SPHGR data through the obj variable, whose contents are described in the following section.

Data Structure

One of the most useful aspects of SPHGR is the intuitive data structure, which considers halos and galaxies as individual objects that have a number of attributes and relationships. All objects are derived from the same ParticleGroup class; this allows for consistent calculations and naming conventions for both attributes and internal methods. To get a better idea of the internal organization of these objects refer to the graphic below.

SPHGR object structure

All of the attributes listed under the ParticleGroup class are common to all objects. Halos and galaxies differ by their relationships to other objects. Halos, for instance, contain a list of the galaxies that reside within them, along with a link to the central galaxy and a sub-list of satellite galaxies (drawn from the halo’s galaxies list). Galaxies, on the other hand, link to their parent halo, have a boolean defining whether they are centrals, and carry a list of their very own satellite galaxies (only populated when central=True).
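Assuming attribute names that follow the description above (the exact names may differ slightly in your version), exploring these relationships might look like:

halo = obj.halos[0]
print(len(halo.galaxies))         # all galaxies residing in this halo
print(halo.central)               # link to the central galaxy
print(len(halo.satellites))       # sub-list of satellite galaxies

gal = obj.galaxies[0]
print(gal.central)                # True if this galaxy is a central
print(gal.halo)                   # link back to the parent halo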

Accessing the data

The data for each object is stored in simple lists and sub-lists within the root of an SPHGR object:

  • halos
  • galaxies
    • central_galaxies
    • satellite_galaxies

This makes it extremely simple to use python’s list comprehension to extract data that you are interested in. As an example, if we wanted the sfr attribute from each galaxy we can easily retrieve that data like so:

import numpy as np
import yt.analysis_modules.sphgr.api as sphgr
sphgr_file = 'your_sphgr_file.hdf5'

obj = sphgr.load_sphgr_data(sphgr_file)
galaxy_sfr = np.array([gal.sfr for gal in obj.galaxies])

The last line is basically a fancy way to write a loop [i.ATTRIBUTE for i in LIST] where i is an arbitrary variable. More explicitly, the full loop would look something like this:

galaxy_sfr = []
for gal in obj.galaxies:
    galaxy_sfr.append(gal.sfr)
galaxy_sfr = np.array(galaxy_sfr)

Python’s list comprehension allows us to write fewer lines and cleaner code. We can also begin to add conditions to further query our data. Let’s now get the sfr attribute for all galaxies above a given mass:

galaxy_sfr = np.array([gal.sfr for gal in obj.galaxies if gal.masses['stellar'] > 1.0e10])

which is the same as:

galaxy_sfr = []
for gal in obj.galaxies:
    if gal.masses['stellar'] > 1.0e10:
        galaxy_sfr.append(gal.sfr)
galaxy_sfr = np.array(galaxy_sfr)

We can take things one step further for this particular example and now only retrieve the sfr attribute for galaxies above a given mass residing in a halo above a given mass:

galaxy_sfr = np.array([gal.sfr for gal in obj.galaxies if gal.masses['stellar'] > 1.0e10 and gal.halo.masses['total'] > 1.0e12])

As you can hopefully see, the relationships defined by SPHGR are an integral part of how you analyze your data. Because both galaxies and halos contain the same attributes (since they both derive from the ParticleGroup parent class), they are very easy to filter through and work with.

SPHGR’s goal is to provide a reliable framework that makes it easy to access and filter through objects within your simulation.

Accessing raw particle data

Each object contains three very important lists:

  1. dmlist
  2. glist
  3. slist

whose prefixes represent dark matter (dm), gas (g), and stars (s). The contents of these lists are the indexes of the particles assigned to that particular object. This provides a very powerful avenue for selecting only the particles that belong to your specific object out of the raw disk data.

To get to the raw data, you should familiarize yourself with yt’s fields. As an example, here is how you might get the densities of all gas particles within the most massive galaxy (galaxy 0):

import yt
import yt.analysis_modules.sphgr.api as sphgr

ds = yt.load('snapshot.hdf5')
dd = ds.all_data()
gas_densities = dd['PartType0','density']

obj = sphgr.load_sphgr_data('sphgr_snapshot.hdf5')
galaxy_glist = obj.galaxies[0].glist
galaxy_gas_densities = gas_densities[galaxy_glist]

In addition to these per-object lists there are also a handful of global particle lists under the global_particle_lists object contained within the root SPHGR object. These lists include:

  1. halo_dmlist
  2. halo_glist
  3. halo_slist
  4. galaxy_glist
  5. galaxy_slist

These are integer arrays with one entry per simulation particle indicating its membership status: -1 means not assigned to a group, -2 means unbound from a group, and values of 0 or greater give the index of the object the particle belongs to. Using this data, you can easily select all gas that has been assigned to a halo, or vice versa. These global lists are read on demand, meaning they do not take up excess memory when your SPHGR file is loaded. To illustrate how these work, let us get the indexes of all star particles that do not belong to a galaxy (<0 in the global list):

import numpy as np
global_slist = obj.global_particle_lists.galaxy_slist
indexes = np.where(global_slist < 0)[0]

We can now use indexes as a slice through full particle data arrays to select only the particles of interest.
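For example, assuming a gadget-style HDF5 snapshot where PartType4 holds the star particles (and reusing the dd container from the earlier example), we could grab the masses of those unassigned stars:

star_masses = dd['PartType4', 'particle_mass']    # all star particle masses
unassigned_star_masses = star_masses[indexes]     # only stars outside any galaxy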

How does it stack up?

SPHGR is currently limited in a few ways. The most notable, however, is that I am using a standard friends-of-friends (FOF) group finder to group both halos and galaxies. This is known to cause issues in certain scenarios, such as mergers. However, when dealing with halos it does not have a large impact on the global properties [2]. Because the FOF currently implemented within yt is a simple 3D algorithm, SPHGR does not currently classify halo substructure. It is my hope that later this year it will be extended to 6 dimensions, which will allow for substructure identification similar to the many 6D phase-space halo finders available today.

In regards to galaxy identification, I have compared SPHGR galaxy results to those of SKID and it stacks up pretty well. Below is an image that contains numerous comparisons between SKID-identified galaxies and FUBAR (SPHGR’s galaxy-finding routine). They are not a perfect 1-to-1 match, but overall I would say that they are quite comparable. Regardless, I am always working to improve these results with constant updates and tweaks to the code.

FUBAR vs. SKID

What is to come?

The ultimate goal of SPHGR is to extend its capabilities beyond that of just particle codes. I would love nothing more than to provide a unified framework for analyzing cosmological simulations in a simple and intuitive manner.

But in the mean time, here is my todo list:

  • Test and verify Python 2 <-> Python 3 member file compatibility.
  • Identify halos with a 6D phase space metric (substructure!).
  • Optimize the unbinding procedure so that small objects are not fully unbound anymore (unbinding is currently OFF).
  • Enable grid code functionality
  • Design a framework that allows the user to define arbitrary relationships and implement them.
  • Add yt-dependent methods to make common tasks simpler, for example: projection plotting a single object along an axis.
  • Come up with a better name.

[1]
[2]

Hyperion, FSPS, and FSPS-python

Reinstalling OSX has me spending too much time figuring out what I’m missing.  Hyperion isn’t tricky, but I don’t feel like scrambling for instructions next time so here we go:

Hyperion

First things first: nab and install gfortran. Next you’ll need to make sure you have HDF5 compiled and installed with Fortran support; I covered this topic in this post. Next we’ll want to nab the hyperion tarball and move into the directory. This should be a basic configure, make, and make install:
> ./configure --prefix=/Users/bob/local/hyperion
> make
> make install
It is very important that you do *not* use ‘make -jN’; make must be run serially! Now we can build the python part via:
> python setup.py build
> python setup.py install
Easy enough right? Ok, well now we can move on to FSPS and Python-FSPS!

FSPS & Python bindings

Check out FSPS via svn:
> svn checkout http://fsps.googlecode.com/svn/trunk/ fsps
You’ll need to add a new environment variable to your .bash_profile or .bashrc:
export SPS_HOME=/Users/bob/research/codes/fsps
pointing it to wherever you just checked out the code. Then a simple ‘make’ should take care of everything as long as your gfortran is properly installed. Now nab python-fsps from github and cd into the downloaded directory. The build method is a little unconventional for most python packages:
> python setup.py build_fsps
> python setup.py develop

Testing FSPS

We should now be able to test FSPS and produce some stellar spectra. The metallicity input boils down to a table of values where you have to pass in the index of the closest entry. Here’s an example script:
import random
import numpy as np
import matplotlib.pyplot as plt
import fsps

random.seed('test')

ZSUN = 0.0190
METALS = [0.1, 1, 1.5]        ## metallicities of our stars, in solar units
AGE = 5.                      ## maximum age in Gyr
LINES = ['-', '--', ':']      ## line styles, one per star

## metallicity table from FSPS; zmet is a 1-based index into this table
ZMET = np.asarray([0.0002, 0.0003, 0.0004, 0.0005, 0.0006, 0.0008, 0.0010,
                   0.0012, 0.0016, 0.0020, 0.0025, 0.0031, 0.0039, 0.0049,
                   0.0061, 0.0077, 0.0096, 0.0120, 0.0150, 0.0190, 0.0240,
                   0.0300])

## loop through each star and calculate the spectrum based on its age
## and metallicity
for i in range(len(METALS)):
    ## determine the index of the closest ZMET value (FSPS counts from 1)
    zmetindex = (np.abs(ZMET - METALS[i] * ZSUN)).argmin() + 1
    ## build the stellar population
    sp = fsps.StellarPopulation(imf_type=1, zmet=zmetindex)
    ## get the spectrum at a random age
    spectrum = sp.get_spectrum(tage=random.random() * AGE)

    ## plot the results
    plt.loglog(spectrum[0], spectrum[1], label=r'%0.2fZsun' % METALS[i],
               ls=LINES[i])

plt.legend()
plt.show()
All of that produces a plot of the resulting stellar spectra.

blob & KH tests again!

Turns out I was running my BLOB tests incorrectly by not telling gadget that the box size was not a cube. The periodic box was supposed to be x,y,z=[2000,2000,6000] but was simply registering as [2000,2000,2000]. Initially I didn’t think this would pose a problem since it was a periodic box, but I forgot to take into account the shockwaves propagating back into the initial blob and causing some unwanted disturbances. Anyway, by switching on three parameters in the makefile:

-DLONG_X=1.
-DLONG_Y=1.
-DLONG_Z=3.
I was able to easily fix this problem so the box was properly structured. After a day’s worth of testing I ended up with some nice results, where we see that both DISPH cases do much better than the vanilla SPH case.

UPDATE (7/26/13)! Movie of a high-res blob test zoomed in on the blob. In other news I made a longer KH instability movie, this time in 1080p!

These movies were created with ffmpeg; the settings to create a slightly smaller and ‘more compatible’ youtube file are as follows:
ffmpeg -r 12 -i splash_%04d.png -c:v libx264 -crf 1 -pix_fmt yuv420p -r 30 output.mp4
where -crf is the constant rate factor; a value of 0 is lossless while 51 is maximum compression. The -pix_fmt option sets the pixel format: yuv420p applies the chroma downsampling that most players expect. I got all of this from a forum post.

vanilla SPH vs DISPH so far….

I ran a systematic comparison between vanilla SPH and DISPH using our latest version of the code. I included all the fancy physics such as EZW winds (momentum+energy) at 150 km/s & 75 km/s, feedback from Type Ia SNe and AGB stars, etc.

Let’s start with the phase differences and work through the rest; comparison figures, in order: phase diagrams, SFRD, GSMF, m-Z relation, fgas, sSFR, half-mass radii (HMR), timings, and density fields.

continued trouble with surface density estimations

Question: does the mass-weighted Sobolev/HSML surface density per-particle along a column make any sense? I’m not so sure it does, as the idea of a single SPH particle having its own surface density is kind of a stretch in its own right. To take a deeper look I ran through the grid calculation again, but this time I returned the grid indexes [x,y] for each particle. I then plotted the calculated (smoothed) $$\Sigma$$ value for each cell vs. the raw per-particle $$\Sigma_{\rm{sob}}$$ value. The result was a little different than I was expecting:
perpixelsobolev
At first I thought it was wrong. But then I took a deeper look at the radial values for Sobolev/Sigma from a previous post (bottom left panel):
sobolevpixels
and realized that they were indeed correct. This is basically telling us that there is significant spread in the Sobolev surface density values, with a pretty substantial under-estimation, as very few of those blue dots lie on the 1-to-1 line. HSML looks very similar, but the normalization is slightly higher. And apparently when one takes the mass-weighted average along a column (with smoothing), the mass contribution spills into neighboring cells, significantly lowering the Sobolev $$\Sigma$$.

The saga continues…still no clear direction in how to resolve this issue.



dynamic geometry!

I ran across a program in my downloads and figured it deserved a mention. I’ve only put it to use once when looking into the Toomre-Q parameter (disk stability), but I can see where it may come in handy down the road. It’s basically a geometrical toy set called C.A.R.Metal. You can define angles/lines/circles/vectors/etc., then move those points and everything dynamically adjusts! Great for messing with angles and the like. Here’s an old image from my toying with it for reference:
angle1
So as an example, you could grab one of those points and move it about. As you move it around the numbers corresponding to those angles change and update dynamically.

mass-weighted Sobolev/HSML column values

Ken suggested I revisit the particles per grid-cell comparison - i.e. Grid vs. Sob, Sob vs. HSML, and Grid vs. HSML. In order to do this so that each grid-cell corresponds to one value, I have to take the mass-weighted average of the Sobolev/HSML values within a given grid-cell:
\begin{equation*} \langle \Sigma \rangle_{\rm MW} = \frac{\sum_i \Sigma_i \, m_i}{\sum_i m_i} \end{equation*}

where \(\Sigma_i\) is the per-particle Sobolev or HSML surface density and \(m_i\) the particle mass.
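As a minimal numpy sketch of this averaging (variable names are hypothetical; the real calculation lives in gal_reduction/tools/sigmacompare.py):

import numpy as np

def mass_weighted_sigma(sigma, mass, cell, ncells):
    # sigma: per-particle Sobolev (or HSML) surface density
    # mass:  particle masses
    # cell:  flattened grid-cell index each particle falls into
    num = np.bincount(cell, weights=sigma * mass, minlength=ncells)
    den = np.bincount(cell, weights=mass, minlength=ncells)
    out = np.zeros(ncells)
    out[den > 0] = num[den > 0] / den[den > 0]
    return out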

I then have a single \(\Sigma\) value per grid-cell and can make a direct comparison:

sobolevpixels
(plot made with gal_reduction/tools/sigmacompare.py via PIXELPLOT==1)

In an ideal situation, each line would lie on the one-to-one dotted black line. Unfortunately both Sobolev and HSML values underestimate the grid values. The good news is that there isn’t much difference due to the resolution. We might have to examine more galaxies within our sims in a similar fashion to see if this under-prediction takes place at the same surface densities; if that is the case we can easily incorporate some sort of correction factor. But that leads to the question - how many galaxies do we have to look at?

In making this plot I learned how to stick the legend outside of the plot. It’s as simple as passing bbox_to_anchor to the legend() call:
legend(bbox_to_anchor=(0.1,1.05),loc=3,shadow=True,fancybox=True)
This is a tiny bit tricky in that the anchor is attached to the loc: if loc=3, the anchor point is the bottom-left corner of the legend box. Also, the x,y coordinates are given in axes-fraction coordinates by default. This is thanks to matplotlib and stack overflow.

Next up was Ken’s concern about the histograms I was sending him. Further inspection showed that simply passing ‘bins=50’ is not necessarily the best way to represent the data. By default hist() takes the min & max of the data and splits that range into N bins. The problem is that if you’re comparing two datasets that span different ranges, the bin sizes will differ. To circumvent this issue I simply specified the bin size manually by altering my histoit() function a tiny bit:
import numpy as np
import matplotlib.pyplot as plt

def histoit(values, COLOR, LABEL, HLS, BINSIZE=0.1):
    indexes = np.where(values > 0)[0]   ## prevents -inf from log10(0)
    vals = np.log10(values[indexes])
    binner = np.arange(min(vals), max(vals), BINSIZE)   ## fixed-width bins in dex
    plt.hist(vals, color=COLOR, log=True,
             bins=binner,
             label=LABEL, histtype='step', lw=1.5, ls=HLS)
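A quick comparison using the fixed bin width might look like this (the datasets here are mock values, purely illustrative):

sample_a = 10**np.random.normal(9.0, 0.5, 1000)   ## mock 'before' dataset
sample_b = 10**np.random.normal(9.3, 0.7, 1000)   ## mock 'after' dataset
histoit(sample_a, 'b', 'before', '-')
histoit(sample_b, 'r', 'after', '--')
plt.legend()
plt.show()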
With this quick fix the data looks much more comparable; here’s the before and after:

(before and after comparison figures) Better, right? hello?


And last but not least, I’ve plotted the gas particle positions for this galaxy at all three resolutions for visual inspection:
ledisks
we go from slightly compressed cloud, to fluffy pancake, to regular pancake!


Comparing Sigma

The new HI gradient stuff finished down to z=3. Turns out it does make a small difference in the calculated Sobolev values of this particular galaxy. At the same time, however, we now calculate the HI surface density via the grid calculation and drop that along with it, so the difference between grid & Sobolev remains about ~1 dex:

sigmaHIcompare (plot made via gal_reduction/tools/sigmacompare.py)

Sigma_HSML however remains a front-runner here, as its separation is only about ~0.5 dex or less depending on whether we account for the spread. 2xSigma_HSML overshoots the grid calculation by a tiny bit. In comparing these two galaxies I’ve also found that the HI-gradient run results in slightly less stellar and gas content:

[table]
label, dir, eff. epsilon [kpc/h], # of nested grids, pcount, galaxy number, RS halo number, Mgas, Mstar
superlowres, run14, 3.13, 0, 128^3, 101, 915, 2.09e9, 1.73e9
lowres, run11, 1.56, 1, 256^3, 13, 143, 8.98e8, 1.81e9
hires, run12, 0.78, 2, 512^3, 24, 995, 5.31e8, 1.85e9
hires (HIgrad), run16, 0.78, 2, 512^3, 25, 1024, 4.77e8, 1.81e9
[/table]

It’s not entirely clear where to go from this point, or which option is best for estimating the surface density within these SPH sims…

As a side note, the KS relation changed a tiny bit between these two as well.

Is this thing on?

Today I’m running run16_ken_fixedgradient so I can compare with our old method of doing things. Previously I was calculating the gradient of each particle’s total density and using that for h:

\begin{equation*} h=\frac{\rho}{|\nabla \rho|} \end{equation*}

but now it’s changing to:

\begin{equation*} h=\frac{\rho_{\rm{HI}}}{|\nabla \rho_{\rm{HI}}|} \end{equation*}

which may not make much of a difference at all, but will be nice to compare.  The run is currently at $$z\sim4.8$$.
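As a trivial sketch of what both definitions boil down to per particle (array names hypothetical):

import numpy as np

def sobolev_length(rho, grad_rho):
    ## h = rho / |grad rho|; pass rho_HI and its gradient for the new version
    return rho / np.linalg.norm(grad_rho, axis=1)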

Have a SPLOTCH picture!