-
Notifications
You must be signed in to change notification settings - Fork 243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for 0-length B arrays in SAM files to conform to 1.6 spec #1194
Conversation
@yfarjoun Could you take a look at this to make sure I haven't done anything crazy... I removed all the checks for 0 lengths that I could find and the test passes for bam/sam/cram but it's possible there's something else relying on it that I haven't though of. |
Codecov Report
@@ Coverage Diff @@
## master #1194 +/- ##
===============================================
+ Coverage 68.414% 68.715% +0.302%
- Complexity 8019 8065 +46
===============================================
Files 542 543 +1
Lines 32701 32751 +50
Branches 5530 5541 +11
===============================================
+ Hits 22372 22505 +133
+ Misses 8125 8043 -82
+ Partials 2204 2203 -1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
final int length = Array.getLength(value); | ||
if(length == 0){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing space after if
@@ -316,6 +326,16 @@ private Object covertStringArrayToObject(final String stringVal) { | |||
} | |||
} | |||
|
|||
private static Object createEmptyArray(char elementType) { | |||
switch( Character.toLowerCase(elementType) ) { | |||
case 'c': { return new byte[0]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need {
}
around case
or default
body. Also, usually case body statements are formatted across lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reformatted
@@ -316,6 +326,16 @@ private Object covertStringArrayToObject(final String stringVal) { | |||
} | |||
} | |||
|
|||
private static Object createEmptyArray(char elementType) { | |||
switch( Character.toLowerCase(elementType) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
odd whitespace here, I would expect to see switch (Character.toLowerCase(elementType)) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
.setCreateMd5File(false) | ||
.setCreateIndex(false); | ||
final Path reference = IOUtil.getPath("src/test/resources/htsjdk/samtools/one-contig.fasta"); | ||
try(final SAMFileWriter samFileWriter = writerFactory.makeWriter(samRecords.getHeader(), false, tmp, reference)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing space after try
here and on line 1237 below
|
||
@DataProvider | ||
public Object[][] getEmptyArrays(){ | ||
final String BAM = BamFileIoUtils.BAM_FILE_EXTENSION; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be replaced with streaming operations, e.g.
public Iterator<Object[]> getEmptyArrays() {
return Stream.of(BamFileIoUtils.BAM_FILE_EXTENSION, IOUtil.SAM_FILE_EXTENSION, CramIO.CRAM_FILE_EXTENSION)
.flatMap(ext -> Stream.of(new int[0], new short[0], new byte[0], new float[0])
.map(array -> new Object[]{array, array.getClass(), ext})).iterator();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
personally, I do not find that to be more clear...less line? yes...but not more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with the sentiment but not this particular stream operation. Instead I've added a new cartesianProduct method which allows you to combine dataproviders in a nice way.
if(length == 0){ | ||
return ""; | ||
} | ||
final StringBuilder ret = new StringBuilder(Array.get(value, 0).toString()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be tested with non-zero length.
case 'c': { return new byte[0]; } | ||
case 's': { return new short[0]; } | ||
case 'i': { return new int[0]; } | ||
case 'f': { return new float[0]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what to do with F
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do not accept F
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not accepting F
|
||
final int numberOfTokens = StringUtil.splitConcatenateExcessTokens(stringVal, elementTypeAndValue, ','); | ||
if (!(numberOfTokens == 1 || numberOfTokens == 2)) { | ||
throw new SAMFormatException("Tag of type B requires an element type"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a test-case that will fall into this hole?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't come up with one... removed this...
Assert.assertEquals(record.getAttribute(arrayTag), new char[0]); | ||
record.setAttribute(arrayTag, null); | ||
Assert.assertNull(record.getStringAttribute(arrayTag)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove excess nls
Assert.assertEquals(attribute.getClass(), expectedClass); | ||
Assert.assertEquals(Array.getLength(attribute), 0); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove NLs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for doing this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the suggestions are a nice addition.
* adding explicit test dependency on a recent version of guava, we already have a transitive test dependency on an older version so this shouldn't be a big deal (unless of course it breaks something unexpected) * adding new method for combining dataproviders to TestNGUtils
8147e99
to
f7c46fb
Compare
@yfarjoun Could you take a further quick look when you get a chance, I made some additional changes to things including changing our test dependencies slightly and added additional tests. The way that arrays and particularly unsigned arrays are retrofitted into the tag encoder is really gross and I want to replace it. |
} | ||
|
||
@Test(dataProvider = "getArraysAndLists") | ||
public void testObjectsToLists(Object[][] objects, List<List<Object>> lists){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many cases here where the code is missing a space before a {
, and missing a space after a comma above (e.g. line 36). It might be easier to reformat the file completely in the IDE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done... I'm not sure it helps much though, this is a gross block of cases
/** | ||
* @return the cartesian product of two or more DataProviders than can be used as a new dataprovider | ||
*/ | ||
public static Object[][] cartesianProduct(Object[][] ... dataProviders){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to write this and preserve types (e.g. avoid SuppressWarnings). I'm pretty sure it can also be written using generics (vs Object
) but didn't have a chance to finish that.
public static Object[][] cartesianProduct(Object[][]... dataProviders) {
List<List<List<Object>>> lists = Arrays.stream(dataProviders)
.map(Main::nestedArraysToNestedLists)
.collect(Collectors.toList());
final List<List<List<Object>>> product = Lists.cartesianProduct(lists);
final List<List<Object>> mergeProduct = product.stream()
.map(list -> {
List<Object> result = new ArrayList<>();
list.forEach(result::addAll);
return result;
}).collect(Collectors.toList());
return nestedListsToNestedArrays(mergeProduct);
}
/**
* @param dataProvider a nested Object array
* @return an equivalent nested List
*/
public static<T> List<List<T>> nestedArraysToNestedLists(T[][] dataProvider) {
return Arrays.stream(dataProvider)
.map(Arrays::asList)
.collect(Collectors.toList());
}
/**
* @param lists a nested List
* @return an equivalent nested array
*/
public static Object[][] nestedListsToNestedArrays(List<List<Object>> lists) {
return lists.stream().map(List::toArray).toArray(Object[][]::new);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think stream().map(... list.forEach().. return)
is really flatMap
. something like
final List<List<Object>> mergeProduct = product.stream().flatMap(list -> list.stream()).collect(Collectors.toList());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- 👍 to the first comment, thats better.
- I don't think this can be done with generics in any sane way, since the It's an Object[][] of totally unrelated types. There's no shared type other than Object. Also, it's what dataproviders expect.
- That flatMap flattens the wrong level of list, this works but I'm not sure it's any clearer
final List<List<List<Object>>> product = Lists.cartesianProduct(lists);
final List<List<Object>> mergeProduct = product.stream()
.map( l -> l.stream()
.flatMap(Collection::stream)
.collect(Collectors.toList()))
.collect(Collectors.toList());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, not sure much can be done here to make the code clearer. Regarding generics you can definitely make cartesianProduct
generic. I ran into trouble with nestedListsToNestedArrays
though. But as you say, the caller is always passing Object
so not much is gained there.
@yfarjoun Did you have any final comments on this one? I'd like to get it in soon. |
final int length = Array.getLength(value); | ||
for (int i = 1; i < length; ++i) { | ||
final StringBuilder ret = new StringBuilder(); | ||
for (int i = 0; i < length; ++i) { | ||
ret.append(','); | ||
ret.append(Array.get(value, i).toString()); | ||
} | ||
return ret.toString(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove nl
if (StringUtil.splitConcatenateExcessTokens(stringVal, elementTypeAndValue, ',') != 2) { | ||
throw new SAMFormatException("Tag of type B should have an element type followed by comma"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove NL
final char elementType = elementTypeAndValue[0].charAt(0); | ||
if (numberOfTokens == 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can this be removed and fall through with length==0?
public void testEmptyArrayAttributeHasNoCommaWhenWrittenToSAM(){ | ||
final SAMFileHeader header = new SAMFileHeader(); | ||
final SAMRecord record = new SAMRecord(header); | ||
record.setAttribute("xa", new int[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARRAY_TAG?
? textTagCodec.encode(tagName, array) | ||
: textTagCodec.encodeUnsignedArray(tagName, array); | ||
Assert.assertEquals(encodedTag, expectedTag); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extra NL
@@ -57,4 +64,6 @@ public static Path getTribbleFileInJimfs(String vcf, String index, FileSystem fi | |||
} | |||
return Files.copy(vcfPath, vcfDestination); | |||
} | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove extra NLs
@DataProvider | ||
public Object[][] getArraysAndLists() { | ||
return new Object[][]{ | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add empty set,
add actual products of more than 2 lists (i.e. with more than one element each)
add a prodcut with an empty set
add a product with a empty set object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for keeping us compliant!
List<List<List<Object>>> lists = Arrays.stream(dataProviders) | ||
.map(TestNGUtils::nestedArraysToNestedLists) | ||
.collect(Collectors.toList()); | ||
final List<List<List<Object>>> product = Lists.cartesianProduct(lists); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ugh, didn't realize this required adding a guava dependency. It seems like there should be a way to do this using jdk 8 streams directly, by using I can see how to do this for 2, 3, etc, but not sure how to do it for N.map(...flatMap())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not actually adding a guava dependency. We have a it as a transient test dependency already through jimfs. I'm just making it an explicit dependency and also bumping to the most recent version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it's still a test dependency.
@yfarjoun Any further comments? I'd like to get this merged soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. thanks!
@yfarjoun Some day I'll write a commit without a typo... but today is not that day. |
Unfortunately there was (at least) one 0-length check that was missed, in Now addressed by PR #1669. |
This function was omitted from PR samtools#1194. Zero-length 'B' arrays are valid, so remove this check too and add a test case.
* Remove setUnsignedArrayAttribute zero-length-array check This function was omitted from PR #1194. Zero-length 'B' arrays are valid, so remove this check too and add a test case. * Remove Slice setAttribute zero-length-array check Slices' optional tags follow the rules for BAM tagged fields, so zero-length 'B' arrays are valid. Slice tags are currently unused, so the question is somewhat moot and there are no test cases. * removed unused setAttribute methods --------- Co-authored-by: John Marshall <[email protected]>
Making changes to allow 0-length arrays in SAM because this is allowed in SAM 1.6
Checklist