Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

impl fitsio::tables::ReadsCol for OSString #246

Open
d3v-null opened this issue May 15, 2023 · 7 comments
Open

impl fitsio::tables::ReadsCol for OSString #246

d3v-null opened this issue May 15, 2023 · 7 comments

Comments

@d3v-null
Copy link

Hello,
I've had the misfortune of encountering a fits file in which random non-utf8 junk bytes are present after a null byte in a string column.
Because fitsio impls ReadsCol for String, but not OSString, this means that it can't read these columns at all.

Here's what the data looks like in Python:

>>> from astropy.io import fits
>>> hdus=fits.open('file.uvf')
>>> print(repr(hdus['AIPS AN'].data['ANNAME'][0]))
b's0000\x00\xff\xff'

The first ANNAME should just be s0000. If there were an impl OSString, I could trim these junk bytes before converting to string and read the column correctly.

Thanks

@simonrw
Copy link
Owner

simonrw commented May 15, 2023

Hi @d3v-null, thanks for raising this issue. I think you're right - the string handling needs to be better in rust-fitsio. I think you're probably right that these need to be OsStrings since they are not null terminated in cfitsio. It might make the rest of the string handling simpler as well since I think we're stripping trailing junk data from the strings as part of the reading process. If we read as OsString then we can strip in a cheaper way (I think, it's been a while... 😂).

@simonrw
Copy link
Owner

simonrw commented May 16, 2023

I don't suppose you have (or can you generate) a test file I can use for my testing?

@simonrw
Copy link
Owner

simonrw commented May 24, 2023

@d3v-null do you have an example file that you can share with me? I'd like to investigate further the use of non-unicode characters in table values.

@d3v-null
Copy link
Author

d3v-null commented May 30, 2023

Hey @simonrw , sorry it took me so long to respond. I deleted my copy of the corrupt file and had to figure out how to get you a copy of the broken hdu without sending you the entire 9GB file. Here's a file with a copy of the AIPS AN table in question.

>>> test['AIPS AN'].data
/usr/lib/python3/dist-packages/astropy/io/fits/fitsrec.py:699: UserWarning: Field 2 has a repeat count of 0 in its format code, indicating an empty field.
  warnings.warn(
FITS_rec([('s0000\x0002', [-2564976.05653377,  5085352.7292609 , -2861040.11633764]),
...
         dtype=(numpy.record, [('ANNAME', 'S8'), ('STABXYZ', '>f8', (3,)), ('ORBPARM', '>f8', (0,)), ('NOSTA', '>i4'), ('MNTSTA', '>i4'), ('STAXOF', '>f4'), ('POLTYA', 'S1'), ('POLAA', '>f4'), ('POLCALA', '>f4'), ('POLTYB', 'S1'), ('POLAB', '>f4'), ('POLCALB', '>f4'), ('DIAMETER', '>f4')]))

the ANNAME column should just say s0000, not s0000\x0002

test.fits.zip

The original file is from here if you're curious

The other files are corrupt in different ways, but the root problem is the non-utf8 characters.

Cheers!

@simonrw
Copy link
Owner

simonrw commented Jun 6, 2023

Hi, sorry I am confused here. If I read the data from that file using a String column, the column names read until the null byte ("\x00") in this case, and s0000 is returned correctly for the first row.

I'm happy to also add reading as OsString since the caller can do their own string manipulation, but I'm curious. The String implementation already reads until the null character anyway.

@simonrw
Copy link
Owner

simonrw commented Apr 5, 2024

@d3v-null have you had a chance to reflect on my last comment?

@d3v-null
Copy link
Author

d3v-null commented Apr 17, 2024

Hey Simon,
when I use the example below, I get the s0000, but instead of stopping at the null byte, skips it, and \x0002, becomes an ASCII 02 at the end. I would actually prefer it if it stopped at the null byte like you described.

Otherwise, you could be right about OsString not handling things better, maybe Vec<u8> be a better option then?

use std::error::Error;
use std::path::PathBuf;
use fitsio::FitsFile;

fn run() -> Result<(), Box<dyn Error>> {
    let mut fitsfile = FitsFile::edit(PathBuf::from("test.fits"))?;
    fitsfile.pretty_print().expect("printing fits file");
    let ant_hdu = fitsfile.hdu("AIPS AN")?;
    for name in ant_hdu.read_col::<String>(&mut fitsfile, "ANNAME")? {
        println!("{:?}", name);
    }
    Ok(())
}

fn main() {
    run().unwrap();
}
cargo run --example dev
  file: "test.fits"
  mode: READWRITE
  extnum hdutype      hduname    details
  0      IMAGE_HDU               dimensions: [], type: UnsignedByte
  1      BINARY_TBL   AIPS AN    num_cols: 13, num_rows: 512
"s000002"
"s000102"
"s000202"
"s000302"
"s000402"
"s000502"
"s000602"

compared to astropy fits, which gives me

python -c "from astropy.io import fits; names=fits.open('test.fits')['AIPS AN'].data['ANNAME'];  print('\n'.join(map(repr, names)))"
's0000\x0002'
's0001\x0002'
's0002\x0002'
's0003\x0002'
's0004\x0002'
's0005\x0002'
's0006\x0002'
's0007\x0002'
's0008\x0002'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants