Model-based clustering for aCGH data using variational EM

gyom/hmmmix-soft

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%                                                       %
% UBC 2007-2009. Guillaume Alain, [email protected]    %
%                                                       %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% 1. Some notes
% 2. Installation procedure.
% 3. About k-medoids
% 4. Permission to use the code.

%%%%%%%%%%%%%%%%%%%%%%
%%% 1. Some notes  %%%
%%%%%%%%%%%%%%%%%%%%%%

% This code contains the essential pieces of the hmmmix-soft algorithm described in my
% thesis at UBC, entitled "Model-based clustering for aCGH data using variational EM".
% The scripts that orchestrated the experiments I carried out are not included here, but
% there is one file called "script_to_compare_hmmsoft_inRAM_vs_onHD.m" that should serve
% as a basis for using this code.
%
% The hmmmix-soft algorithm is my version of the hmmmix algorithm from Sohrab Shah. Refer
% to Sohrab Shah's PhD thesis if needed.
%
% This code is provided more or less "as is", but I did try to do a good job at packaging
% it so it could be built from scratch and used easily. My apologies to anyone finding the
% documentation insufficient. The final steps in the production of this "package" for
% distribution to the general public were done after I was finished with my experiments
% for my thesis. This means that, although I have tested my code, performed unit tests and
% such, I've never really used the final package myself. This limits how confident I can be
% when I say that it's working fine.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% 2. Installation procedure  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% This procedure was tested on Mac OS X 10.5.8 with Matlab R2008b
% and on Linux (openSUSE 10.3) with Matlab R2007b.
% To install this on Windows you need a C compiler because a lot of functions
% were written in C when vectorization was impossible. There is nothing
% fundamentally related to Linux/Unix in the code that would cause problems,
% but the compilation method might differ slightly depending on your compiler.
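%
% If no C compiler has been configured for Matlab yet, Matlab's own 'mex -setup' command
% lets you select one before running the compilation step further down. This is a general
% Matlab step, not something specific to this package, and the compilers it offers depend
% on your platform and Matlab version.
mex -setup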
%
% For the sake of the discussion, let's assume that you want to install the
% hmmmix-soft code into the directory
% /home/gyom/.matlab/work/hmmsoft
% and that the contents of the zip archive have already been expanded into that
% directory, keeping the internal structure of the archive.
% I recommend adding only the main directory to the Matlab path. That is, don't add
% it recursively, because you would also pick up the 'optional_kmedoids_wrapper'
% directory, which overshadows some of Sohrab Shah's functions from his CNA-HMMer package.
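%
% A minimal sketch of that path setup, assuming the example install directory above.
% The variable name 'installDir' is only illustrative. Note that addpath is called on
% the directory itself and not on genpath(installDir), so no subdirectory is added.
installDir = '/home/gyom/.matlab/work/hmmsoft';
addpath(installDir);   % top-level directory only, not recursive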


cd /home/gyom/.matlab/work/hmmsoft

% Compile the C sources for certain functions. If you get a problem, open
% the compile_hmmmix_soft.m file and run the commands line by line to diagnose the problem.
compile_hmmmix_soft

% Now let's run some unit tests to be sure that some of the basic functions work properly.
% In the 'reference' directory there are Matlab implementations for functions such as
% the forwards-backwards algorithm whose results should match those of the C implementation.
% You can always add the 'reference' directory to your path if you don't mind catching these
% functions, but they are not required for the rest of the algorithm. I would advise you to
% cd into that directory only at installation, to check that the C functions compiled, and
% then never go into that directory again.
cd reference
unit_test_normalize
unit_test_viterbi_path
unit_test_fwd_back_MatlabC
unit_test_hmmmix_frugal_hM_KTg_MatlabC


% Inside the 'reference' directory, you can run this function to make sure that both
% implementations of hmmmix-soft (in memory and on hard drive) produce the same results.
% This is more of a sanity check than anything. If this passes, it probably means that
% you're clear and there is no problem with the installation. This script can also serve as
% an example of usage for my implementation of the hmmmix-soft algorithm. There is a small issue
% where the two sets of results can differ by a negligible amount. I decided to draw the line
% there and not track down this minor issue. See inside the script for further comments.

script_to_compare_hmmsoft_inRAM_vs_onHD
cd ..

%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% 3. About k-medoids  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%

% There is a function called 'kmedoids_inRAM_wrapper_to_Sohrab_code' that uses
% Sohrab Shah's k-medoids code to initialize the patient assignments. To use it, you
% need to have his CNA-HMMer code on your Matlab path. On top of that, you need to add the
% 'hmmmix_soft/optional_kmedoids_wrapper' directory to your path when using the function
% 'kmedoids_inRAM_wrapper_to_Sohrab_code'. It was necessary for me to overshadow some of Sohrab's
% original functions to get the functionality that I wanted, but this is not something that you
% want to do permanently if you are using his CNA-HMMer code regularly.
%
% I'm personally using a more recent version of the code available on his web site. The code that I
% use dates from around January 2009 and the directory is named "CNA-HMMer-spec". I just added
% all the directories from his code recursively and it works for me, but the important thing is that
% the functions from my 'optional_kmedoids_wrapper' are found BEFORE those of CNA-HMMer-spec that they
% are intended to overshadow.
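%
% A sketch of one way to get that ordering. The location in 'cnaDir' is a placeholder
% (an assumption, adjust it to wherever you unpacked CNA-HMMer-spec), and 'wrapperDir'
% assumes the example install directory from section 2. Both variable names are only
% illustrative.
cnaDir     = '/path/to/CNA-HMMer-spec';
wrapperDir = '/home/gyom/.matlab/work/hmmsoft/optional_kmedoids_wrapper';
addpath(genpath(cnaDir));        % CNA-HMMer-spec and all its subdirectories
addpath(wrapperDir, '-begin');   % wrapper goes to the front of the path so its functions are found first
% ... call kmedoids_inRAM_wrapper_to_Sohrab_code here ...
rmpath(wrapperDir);              % remove the wrapper again so CNA-HMMer's original functions are used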
%
% You can test the k-medoids functions by uncommenting lines 67-68 in
% 'script_to_compare_hmmsoft_inRAM_vs_onHD'. This should indicate how the 
% 'kmedoids_inRAM_wrapper_to_Sohrab_code' function is meant to be used.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%% 4. Permission to use the code  %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% I don't think there is much money to be made with this code, but I'd still like to get credit
% for what I did. So, there you have it. That makes it more of a BSD license than a GPL license.
% I would be really happy if my work was actually used by others.

% I wrote everything in the root directory except mexutils.c, mexutils.h, repmatC.c and normalize.m.
% Some functions in the 'reference' directory come from Kevin Murphy and Sohrab Shah. Most of what is
% in the 'optional_kmedoids_wrapper' directory comes from Sohrab Shah more or less directly.

% Finally, I should say that, at the time I release this code, I don't plan to work on it anymore.
% It was a nice thing to be able to work on this problem for my thesis, but it's not really my "field"
% and I won't be the one maintaining this code if it's actually used for serious research. In fact,
% if you start finding bugs that you can fix and if you want to integrate this code into your own,
% you are free to do so. It'd be nice to send me an email if you do that, though.



 
