About Dimensionality #8
-
Congrats on your package. Can I run it on data with more than 300K dimensions? I have some methylation data with a very large number of features.
Replies: 1 comment 1 reply
-
Thank you very much, and thank you even more for your question. Most of the algorithms in our package project the data onto its first principal component to determine the dissimilarity of the data (because we are talking about divisive clustering). They then split the data based on that one-dimensional projection. Because each split is decided on a single projected dimension, this process sidesteps the curse of dimensionality that affects very high-dimensional data like the data you describe. So to answer your question: yes, you can run the package's PDDP, dePDDP, iPDDP, and kM-PDDP algorithms on your data.
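
To make the idea concrete, here is a minimal sketch of a single PDDP-style split written with plain NumPy. It is not the package's own code: the function name `pddp_split`, the synthetic data, and the choice to split at zero on the centered projection are assumptions made for illustration only. With far fewer samples than features, the economy-size SVD keeps the decomposition small, since it returns at most as many components as there are samples.

```python
import numpy as np


def pddp_split(X):
    """One PDDP-style split (illustrative sketch, not the package's API).

    Centers the data, projects each sample onto the first principal
    direction, and assigns samples to two groups by the sign of that
    one-dimensional projection.
    """
    Xc = X - X.mean(axis=0)
    # Economy-size SVD: for n samples and d features with n < d,
    # U is (n, n), S is (n,), Vt is (n, d).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Projection of every sample onto the first principal direction;
    # equal to Xc @ Vt[0] without touching the d-dimensional vector again.
    projection = U[:, 0] * S[0]
    labels = (projection > 0).astype(int)
    return labels, projection


# Synthetic stand-in for a "few samples, many features" data set.
rng = np.random.default_rng(0)
n_per_group, n_features = 20, 5000
group_a = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
group_b = rng.normal(0.0, 1.0, size=(n_per_group, n_features))
group_b[:, :100] += 5.0  # only 100 of the 5000 features carry signal

X = np.vstack([group_a, group_b])
labels, projection = pddp_split(X)
```

The sign of a singular vector is arbitrary, so which group receives label 0 or 1 can differ between runs or platforms; only the partition itself is meaningful.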