28Feb2012 Discussion

(These notes are by @wkerzendorf, so 1st person language refers to him)

Implementation questions

Instantiator

two main suggestions: instantiation with a flux & a dispersion array OR a flux array and a WCS object suggested WCS implementation:

https://github.com/wkerzendorf/specutils/blob/specio/specutils/io/io_utils.py https://github.com/wkerzendorf/specutils/blob/specwcs/specutils/specwcs/specwcs.py

----- discussion -----

everyone agrees that we want something like a python version of WCS to handle dispersion axes, but it won't be done for a while. For the short-term we need a temporary solution.

instantiator:

_init_(dispersion,flux,dispersion_units,...ndddata...)

WCS will be an option from the NDData.

Slicing

slicing can work on either dispersion or pixels
keyword to switch between both of them
getitem overload?

.slice for now will only implement .slice as .slice_dispersion functionality unclear. .slice_index

.slice_dispersion --- important note --- NDData link

arithmetic

operator overload? if operator overload
what operands allowed? scalars, spectrum same dispersion, spectrum different dispersion
op-mode setting in spectrum1d?

--discussion---

allowed for isscalar and Spectrum1d where dispersion/units match.

metadata drop conflicting keys.

logging if that happens interpolator

for now simple np.interp? (scipy)

null and void. thinking about interpolation using wcs?

DECISION?: using np.interp coupled with scipy ….

mask interp error interp solutions to this are tied to wider solution forNDData

API questions

Copy vs inplace

meta data copy -- time intensive?
pythonic to do copy?

DECISION: copy default. a copy keyword with copy=True. metadata drop conflicting keys.

methods vs functions

when to use functions, when spectrum1d internal methods. argument plotting argument io

(NON-)DECISION deferred to later.

keywords vs functions names

group multiple similar functions together with keywords. if it doesn’t work we will implement, new function.

ongoing debate for new function.

Issues outside the Spectrum1D discussion

Error

Mask

From the pull request: #1

Instantiator
Ideas: s = Spectrum1D(flux=, dispersion=) vs. s = Spectrum1D.from_whatever()
@astrofrog I agree with this, it does indeed not make sense to initialize from anything other than two 1-D sequences (numpy array, list, tuple), because it is easy enough to specify two columns from a table, from a file, etc. and Spectrum1D should not really be implementing I/O. I think this does not mean that in future, one could not imagine having a 'read' method for well defined 1-D spectrum specific formats that don't have any ambiguities though.
@eteq I mostly agree with @astrofrog (and @adrn and @demitri) - we don't want any of the io stuff actually implemented here... The from_table is a bit less clear, to me - it seems needlessly verbose to type Spectrum1D(table['wavelength'],table['flux']) when I could type Spectrum1D.from_table(table). But then again, perhaps that also should be considered "io" and a io.from_table function that we make later could accept either a filename of a table OR a Table object.
@adrn @eteq regarding your point about from_table() -- I think the idea here is that Spectrum1D(table['wavelength'],table['flux']) vs. Spectrum1D.from_table(table, columns=[0,1]) (or similar) doesn't save you anything, because if you have non-standard column names you still end up specifying the columns. But regardless, I agree that it probably belongs somewhere in io.
@wkerzendorf I agree 100% there should only be one way to instantiate the Spectrum1D object. In my opinion this is init(self, data, error=None, mask=None, wcs=None, meta=None, units=None, copy=True, validate=True).

What I merely did was include the read-in methods, that you guys mentioned, within the Spectrum1D object. Obviously we could put them somewhere in specutils.io, but why? IMHO, they do belong to Spectrum1D as they are only useful with a Spectrum1D object. It would obviously be reflected in the doc_string of Spectrum1D, how to use these read methods.

A spectrum dispersion solution is in the core a world coordinate system. If you reduce spectra with IRAF or other software for that matter, one often fits some polynomial to get a mapping between pixel and dispersion space. I think it's not a good idea to just force every user to map out his WCS into an array, as there's definitely a loss of information that is important in some cases. In addition, the current implementation (just using flux and dispersion arrays) sees no way of specifying what units the dispersion axis is. For example, I would like to multiply my spectrum with a filter curve and subsequently integrate it to get a flux (I think Adam suggested this in one of the emails as well). I believe all the attributes in NDData are important for Spectrum1D and should therefore be used in Spectrum1D. Of course we could include the units in such a implementation, but as the wcs will have that anyways I don't see a reason to duplicate the effort.

As for my implementation it just set WCS equal to the lookup table (dispersion array), for now. But this is only a temporary measure until there's a proper WCS in place that deals with lookup tables. With a proper WCS class, when .disp is required it would return a lookup table generated from the true WCS.

It seems that the consensus is that there should be only one way to create a Spectrum1D object, and io operations don't belong in the Spectrum1D class.

Be verbose in naming

Abbreviate vs. Spell things out (e.g., .disp looks like display)

@astrofrog Also agree, I keep stumbling every time I see 'disp' too!
@eteq Agreed, except that "dispersion" is a bit long to type for something we may be using a lot (in an interactive/ipython setting). What about having dispersion be the actual name (which we will use in all examples and documentation), and add in a very-short alias like x to make interactive work quicker/easier (with a clear warning somewhere that it should not be used in a code/library setting)? Or is that too much rope to hang oneself with?
@adrn I agree that it is long, but I don't know how I feel about having both dispersion and x. What do others think?
@wkerzendorf I agree in most cases to be verbose. I think however that disp for now is fine. We spoke in great length about what we wanted to call that axis (and after 'wave' was shot down and that rightly so ;-) ), we decided between dispersion and disp. disp won and I still think that's a good name. It would be at a prominent place in the Spectrum1D docstring. We could have both I guess.

Consensus:

We should not use disp (use dispersion instead), but whether or not we implement a shortcut .x is still up in the air.

Scipy Dependency

Astropy has the requirement that if scipy is not available, it should not break the imports, so if scipy is not available, the interpolate method would have to be disable. Therefore, I am suggesting that for linear interpolation, np.interp should be used so that at least basic interpolation is available if scipy is not installed. But I'm all for allowing more complex interpolation if scipy is installed.

Spectrum1D is not an array

Ideas: Slicing on the Spectrum1D object itself e.g. spectrumObject[100:200] vs. Slicing methods e.g. spectrumObject.slice_wavelength[1001.41:2000.52]

@adrn I think the point is just what you say: we shouldn't allow spectrum[0:100] because (as a general comment) a Spectrum is not an array!
@wkerzendorf I believe that was taken a little bit out of context. My slice operations had a switch, what units to use with slicing. The essence of the spectrum1d object is the flux array. I believe it is pretty clear, what is meant by slicing (in the sense that we cut out a bit of the array, not in the what values we are slicing on). I also believe that slicing should always return a new Spectrum1D array. Many other functions in python work that way (slicing ndarrays, slicing lists, ….). I have had great success with overloading the getitem operator and using this as slicing in dispersion units. It's a simple and obvious convention that many people in my old department liked. I propose having the overload in addition to slice.
@adrn But I strongly disagree with overloading getitem -- a Spectrum is not an array, so spectrumObject[100:200] doesn't make sense! It is much less ambiguous to do spectrumObject.slice_index(100,200)

Other decision items

It was agreed that most things that require assumptions (like spec[100:2000] being on wavelength or index or whatever) should be implemented in subclasses like ``OpticalSpectrum1dorRadioSpectrum1D` or wahtever), and the base class shouldn't make those assumptions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly