median_correlations
now has more columns, regardless of the setting forbetween_classes
, to provide consistency.median_correlations
output gains an extra column,group
to allow easier filtering of within and between class comparisons.
- Fixed behavior of
visqc_test_pca_scores
to skip testing attributes that either have a single value, or the number of numeric values matches the number of samples.
-
Changed some functions to treat columns as samples and rows as features:
keep_non_zero_percentage
summarize_data
calculate_fratio
-
Added
keep_non_missing_percentage
, which allows using multiple values to represent missingnes. -
Made
summarize_data
handle possible missing values. -
Removed correlation calculation functions, those have been superseded by ICIKendallTau.
- Added a new argument
only_high
todetermine_outliers
to only look at the high end of the score distribution for outliers, as sometimesboxplot.stats
will pick up outliers at the low end as well.
- Updated the quality_control vignette to use ICIKendallTau instead of other correlation measures.
- Windows and Mac binaries are now available via r-universe, and installation instructions are updated to reflect that.
- Updated determine_outliers to be able to use either the output from median_correlations or outlier_fraction singly or together. If using one or the other alone, I suggest explicitly naming the arguments so that the correct entry is set to NULL and the other one used.
- Updated the README to show using ici_kendalltau instead of the it_weighted_correlation.
- Updated tests, and moved to testthat v 3.
- Updated pkgdown for rendering the help site.
- Moving all of the ICI-Kendall-tau code into it's own package, ICIKendallTau. This reduces the dependencies necessary if all you want is to run a fast Kendall-tau.
-
Making the splitup version of ICI-Kendall-tau the "implementation" (
visqc_ici_kendallt
), and using a single core if the user doesn't setup a "plan" first. A reference version still exists so we can run tests against it, but it is no longer exported for general users. -
Also inlined the C++ sign function, which gave us another 3X speedup on my 8 core machine on a larger test data set.
- Now throw an error if X and Y are not the same length in
ici_kendallt
.
- Added a function for calculating the information-content-informed Kendall-tau
correlation,
ici_kendallt
, and variants around calculating all pairwise correlations between samples;visqc_ici_kendallt
andvisqc_ici_kendallt_splitup
for parallel processing.
-
Removed requirement for
ggbiplot
, instead we added a function for calculating the variances of each of the PCs in the scores. -
updated the vignette accordingly.
-
Now using
globally_it_weighted_correlation
andlocally_it_weighted_correlation
instead ofpairwise_correlation
.
keep_non_zero_percentage
gains an argument,all
, that defaults toFALSE
to keep previous behavior. Settingall = TRUE
means that the value must be non-zero in at least X% of all of the sample classes.
median_correlations
gains a new argument,between_classes
to generate the median values to samples in other classes. This causes the appearance of two more columns when set to TRUE. The default is FALSE, so hopefully this does not cause current code to misbehave, but I've bumped the version number as a warning.
-
Augmented correlations (
weight = TRUE
) should be much more useful and interpretable. -
information_volume
andcorrespondence
calculations improved. Namely thatinformation_volume
is being scaled by the maximum. -
correspondence
by default does not consider presence of zeros in both samples to be informative, this can be changed by settingnot_both = TRUE
. The default is more useful in cases where there are lots of features and the data is sparse, and zeros are likely to happen by chance. -
In addition to returning the
cor
matrix andkeep
matrix,pairwise_correlations
now returns theraw
correlations, and the weighting matricesinfo
andcorrespondence
so that each one can be examined. -
The diagonal of
info
weighting corresponds to how many features a sample has compared to the sample with the most features.
-
Added two functions,
information_volume
andcorrespondence
to calculate weights based on the amount of things that are non-zero in both things when doing pairwise correlation. -
Added logical argument
weight
topairwise_correlation
to weight the correlations. Ifweight = TRUE
, the diagonal will not be 1 anymore, but instead will reflect how many features out of the total are in that sample.
- A bug was discovered in
median_correlations
that meant the wrong sample ids might be added to the output data, making detection of real problems difficult
-
pairwise_correlation
now usescor
internally directly, whereas previously it did afor
loop to allow pairwise comparisons. This makes the correlations 3x faster. -
count
has been removed from the list returned bypairwise_correlation
-
new function
pairwise_correlation_count
to get the counts in each pairwise comparison
- Changed correlation function to return a list instead of a matrix. This
list contains the correlations (
cor
), counts in each correlation (count
), and which points passed the criteria (keep
).