[lsst-dm-stack-users] Running tasks on the LSST pipeline

Jim Bosch jbosch at astro.princeton.edu
Tue Jul 16 22:23:40 PDT 2013


On 07/15/2013 10:48 PM, James McElveen wrote:
> Hi Jim,
>
> I'm one of Scott Dodelson's students. We spoke a few weeks ago while
> Scott was talking with me on Skype. Scott told me to contact you about
> some of the questions I had regarding tasks I can run with the pipeline.
> Most of my questions are related to the various common tasks offered on
> the Trac website under the Summer2013 installation instructions
> (https://dev.lsstcorp.org/trac/wiki/Installing/Summer2013).
>
> To begin, under the "Running on Stripe 82 with extended source
> photometry turned on" header, there is the following statement regarding
> ImSim: "For ImSim, follow the other instructions for running one of the
> process*.py scripts, but add the command-line option below:
>
> --configfile=$MEAS_EXTENSIONS_MULTISHAPELET_DIR/config/multiShapelet.py"
>

First off, you'll notice that I've CC'd my reply to 
lsst-dm-stack-users at lsstcorp.org, which is currently the main avenue for 
user support with the stack.  Please include it in any replies, as it 
will greatly increase the number of people able to respond.

I'm afraid there isn't a whole lot of documentation on how to run the 
various task scripts, but they all do have fairly similar usage:

<task>.py <INPUT_DIR> --id <DATA_IDS> --output <OUTPUT_DIR>

(with many optional arguments for specifying configuration parameters).
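For instance, the Stripe 82 command you quoted below is exactly that pattern; here it is with the pieces pulled apart (the input/output paths are the ones from the Trac page and almost certainly need to be replaced with your own):

```shell
# The generic pattern, filled in with the Stripe 82 SDSS example from
# the Trac page.  These paths are specific to that cluster; substitute
# the directories where your own data live.
TASK=processCcdSdss.py
INPUT_DIR=/lsst7/stripe82/dr7/runs
DATA_ID="run=1033 camcol=2 field=111 filter=g"
OUTPUT_DIR=/nfs/lsst7/stripe82/dr7-coadds/v1/run2

# This just prints the command you would run:
echo "$TASK $INPUT_DIR --id $DATA_ID --output $OUTPUT_DIR"
```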

I'm not really sure why the above recommendation is on that page; all 
that does is turn on the galaxy model mags, which aren't on by default 
in some cases because they slow things down quite a bit.  Maybe they 
were on by default for SDSS in this version and off by default in 
lsstSim.  It's not important, just something I noticed.  In any case, 
you should be able to run any task without any custom configuration, 
because the defaults are supposed to be sensible.


> I can't seem to find the other instructions for running the scripts on
> the website. My impression is that the instructions being referenced
> here are either those offered in main.dox under pipe_base or those
> briefly mentioned under the next header, "Running a Stripe 82 SFM" with
> the following code (which I expect we modify depending on the script
> we're running):
>
> processCcdSdss.py sdss /lsst7/stripe82/dr7/runs --id run=1033 camcol=2
> field=111 filter=g --output /nfs/lsst7/stripe82/dr7-coadds/v1/run2
>
> However, when I attempt to run this code on the sdss files provided on
> the Trac website or on my own sdss/DES files, I obtain the following
> exception: RuntimeError: No mapper provided and no _mapper available.
> Unfortunately, the _mapper file isn't referenced in main.dox. What is
> this missing file supposed to be? I see that there is a _mapper file in
> the demo offered on the Trac site which contains the following:
> lsst.obs.sdss.sdssMapper.SdssMapper. Is this calling to the .paf file in
> the stack? When I run the processCcdSdss.py script, should I be
> referencing the _mapper file explicitly somehow in the command line?
>

What you want to do is put a _mapper file just like that one in any 
directory that contains your input data; this file and a registry.sqlite 
file you've probably also noticed are used to tell the system how the 
data is organized.  All the _mapper file does is tell us which Python 
class to use as the "mapper" - a camera-specific class that knows how 
data should be organized on disk, as well as some other camera-specific 
things.  So it will always just have a single line, which would be, e.g.:

lsst.obs.sdss.sdssMapper.SdssMapper  ---- for SDSS data

lsst.obs.lsstSim.LsstSimMapper ---- for LSST Sim data

lsst.obs.suprimecam.SuprimeCamMapper ---- for Subaru SuprimeCam data

...etc.
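Creating the file is just a matter of writing that one line into it; e.g. for SDSS data (the data directory here is a made-up example; use whatever directory actually holds your input data):

```shell
# Hypothetical data directory; replace with the directory containing
# your input data.  The file must be named exactly "_mapper".
mkdir -p /tmp/my_sdss_data
echo "lsst.obs.sdss.sdssMapper.SdssMapper" > /tmp/my_sdss_data/_mapper

# Verify the contents:
cat /tmp/my_sdss_data/_mapper
```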

> Additionally, is there any documentation that explains the scripts in
> the pipeline? I've looked over the main.dox file in pipe_base and am
> hoping to get a feel for the functionality of each script. Let me know
> if you know of any sources in the pipeline that I may have missed or any
> documentation online that I might use as a reference. All help is
> appreciated.

As I said before, the documentation isn't great, but you'd be better off 
looking at the bin directory of the pipe_tasks package, rather than 
what's in pipe_base, and then following those to the docstrings of the 
Task classes they invoke (in pipe_tasks/python/...).


The following pages may also be useful.

How to play with the data produced by processCcd (and the other process* 
Tasks), and a bit on how configure those tasks:

https://dev.lsstcorp.org/trac/wiki/v62_processCCD_data


A detailed account of some advanced processing recently done by Yusra 
AlSayyad.  It was done with a much more recent version of the stack than 
what you're likely using, but it may be useful as a view of how 
everything fits together:

https://dev.lsstcorp.org/trac/wiki/Summer2013/ConfigAndStackTestingPlans/Instructions


A how-to guide recently started by Debbie Bard, of the DESC WL team:

http://kipac.stanford.edu/collab/research/lensing/slac/HowTo/imsimDMpipeline


Hope that helps!

Jim
