Inquiry Regarding MixMHC2pred Output and PWM Calculation #17

Open
winnieWei123456 opened this issue Sep 23, 2024 · 1 comment
Labels
question Further information is requested

Comments

@winnieWei123456

Dear developers of MixMHC2pred,

I hope this message finds you well. I am writing to seek clarification on the output of the first block of MixMHC2pred. In your article "Machine learning predictions of MHC-II specificities reveal alternative binding mode of class II epitopes," you mentioned that the output of the first block is the PPM. However, when reviewing the results, I noticed that the PPM contains values greater than 1. Shouldn't the sum of the frequencies of the 20 amino acids at each position be equal to 1?

Furthermore, I looked at your PWMdef file and noticed the presence of 'PWM_norm'. What does 'norm' mean? In the F.A.Q. section of the Motif Atlas website, you state that "A correct description of the MHC binding specificity needs to account for this bias by renormalizing the amino acid frequencies computed in the raw ligands." Has this PWM matrix undergone such frequency normalization, and if so, could you please explain how the normalization is performed?

I am interested in obtaining the original PWM. Could you provide guidance on how to calculate this?

I appreciate your time and assistance in addressing these queries. Thank you for your attention to this matter.

Warm regards,
Winnie

@jracle85
Member

Hello Winnie,

The "PWM" files that we use in our predictor correspond to

$$PWM^{a,s}_{l,i} = \frac{PPM^{a,s}_{l,i}}{f_i}$$

(as in Equation 2 of our 2023 Immunity paper), where the PPMs are the position probability matrices and $f_i$ is the frequency of amino acid $i$ in the human proteome, used to normalize these PPMs. The PPMs sum to 1; the PWMs do not. We used the following human proteome frequencies:
A=0.0693, C=0.0218, D=0.0483, E=0.0721, F=0.0355, G=0.0651, H=0.0258, I=0.0434, K=0.0577, L=0.0983, M=0.0219, N=0.036, P=0.0632, Q=0.048, R=0.0565, S=0.084, T=0.0545, V=0.0602, W=0.0122, Y=0.0261.

Best regards,

Julien
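
The relation in the reply above can be sketched in Python as follows. This is a minimal sketch, assuming each matrix is held as a list of per-position dicts mapping amino acid to value; that layout and the helper names `ppm_to_pwm` and `pwm_to_ppm` are hypothetical and are not MixMHC2pred's file format or API. Dividing by $f_i$ explains why PWM entries can exceed 1 and why PWM columns do not sum to 1; multiplying back by $f_i$ recovers the PPM.

```python
# Human proteome amino-acid frequencies f_i, as listed in the reply above.
PROTEOME_FREQ = {
    "A": 0.0693, "C": 0.0218, "D": 0.0483, "E": 0.0721, "F": 0.0355,
    "G": 0.0651, "H": 0.0258, "I": 0.0434, "K": 0.0577, "L": 0.0983,
    "M": 0.0219, "N": 0.0360, "P": 0.0632, "Q": 0.0480, "R": 0.0565,
    "S": 0.0840, "T": 0.0545, "V": 0.0602, "W": 0.0122, "Y": 0.0261,
}


def ppm_to_pwm(ppm):
    """Hypothetical helper: PWM_{l,i} = PPM_{l,i} / f_i for each position l."""
    return [{aa: p / PROTEOME_FREQ[aa] for aa, p in column.items()}
            for column in ppm]


def pwm_to_ppm(pwm, renormalize=True):
    """Hypothetical helper: invert the normalization, PPM_{l,i} = PWM_{l,i} * f_i.

    With renormalize=True each column is rescaled to sum to exactly 1,
    which (by assumption) only absorbs rounding in stored PWM values.
    """
    ppm = []
    for column in pwm:
        col = {aa: w * PROTEOME_FREQ[aa] for aa, w in column.items()}
        if renormalize:
            total = sum(col.values())
            col = {aa: v / total for aa, v in col.items()}
        ppm.append(col)
    return ppm


# Usage: a single position with a uniform amino-acid distribution.
uniform_ppm = [{aa: 0.05 for aa in PROTEOME_FREQ}]
uniform_pwm = ppm_to_pwm(uniform_ppm)
recovered_ppm = pwm_to_ppm(uniform_pwm)
```

For example, a PPM probability of 0.2 for W at some position becomes 0.2 / 0.0122 ≈ 16.4 in the PWM, because tryptophan is rare in the human proteome. The optional renormalization step in `pwm_to_ppm` is an added assumption, not part of the equation above; without it, the inverse is simply an element-wise multiplication by $f_i$.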

@jracle85 added the question (Further information is requested) label on Dec 12, 2024