genes are so important for triggering the immune system that we can use them to predict an individual’s immune response. Here I’ll show how to estimate disease rates just from immune gene frequencies. All the steps, from getting the immune gene data, to identifying high risk countries, to assessing the limitations of the model, are discussed, and the full code is available at github.com/DAWells/HLA_spondylitis_rate.
HLA genes are related to a person’s response to infection and vaccination, and are often very strongly linked to autoimmune diseases. So strongly linked, in fact, that in large groups we can predict disease rates from HLA gene frequencies. HLA frequencies are widely studied and so often available, allowing us to estimate rates of autoimmune conditions which may be missing or inaccurate due to the challenges of diagnosis. In this post we’ll combine studies to generate accurate estimates of immune gene frequencies and use these to predict national rates of ankylosing spondylitis.
allelefrequencies.net is a database of human immune gene frequency data from across the world, and it is an open access, free and public resource (Gonzalez-Galarza et al 2020). However, it can be difficult to download and combine data from multiple projects, which makes it hard to take advantage of all this data. Luckily, HLAfreq is a Python package which makes it easy to get the latest data from allelefrequencies.net and prepare them for our analysis. (Full disclosure, I’m one of the authors of HLAfreq!).
Ankylosing spondylitis is a form of arthritis, and 90% of patients have a specific version of the HLA-B gene. To get the frequency of this version in different countries, I downloaded all available frequency data for this gene and combined studies of the same country, weighting by sample size. In brief, the combination is based on the Dirichlet distribution, and we can use a Bayesian approach to estimate uncertainty too. Singapore is used as an example in the figure below (all figures in this article are generated by the author). Different HLA-B gene versions (also known as alleles) are shown on the y axis, with their frequency in Singapore on the x axis. Data from the original Singapore studies are shown in colour, and combined estimates in black. I focused on the weighted average in this analysis, which is shown by the black circles. HLAfreq also calculates a Bayesian estimate with uncertainty, which is indicated by the black bars.
The code used to download, combine, and plot the HLA-B allele frequency data for Singapore is below.
import pandas as pd
import HLAfreq
from HLAfreq import HLAfreq_pymc as HLAhdi
# Download raw data
base_url = HLAfreq.makeURL("Singapore", standard="g", locus="B")
aftab = HLAfreq.getAFdata(base_url)
# Prepare data: keep complete studies and reduce alleles to one-field resolution
aftab = HLAfreq.only_complete(aftab)
aftab = HLAfreq.decrease_resolution(aftab, 1)
# Combine data from multiple studies
caf = HLAfreq.combineAF(aftab)
# Bayesian estimate with a 95% credible interval for each allele
hdi = HLAhdi.AFhdi(aftab, credible_interval=0.95)
caf = pd.merge(caf, hdi, how="left", on="allele")
# Plot gene frequencies
HLAfreq.plotAF(caf, aftab.sort_values("allele_freq"), hdi=hdi, compound_mean=hdi)
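To make "weighting by sample size" and the Dirichlet idea more concrete, here is a minimal sketch using only the Python standard library. It is not HLAfreq's implementation: the allele names, frequencies, sample sizes, and the flat prior of 1 per allele are all made-up assumptions chosen for illustration.

```python
from math import sqrt

# Made-up frequencies of three alleles in two hypothetical studies;
# these are illustrative numbers, not real Singapore data.
studies = [
    {"n": 100, "freq": {"allele_1": 0.10, "allele_2": 0.30, "allele_3": 0.60}},
    {"n": 300, "freq": {"allele_1": 0.20, "allele_2": 0.30, "allele_3": 0.50}},
]
alleles = ["allele_1", "allele_2", "allele_3"]

# Weighted average: each study counts in proportion to its sample size.
total_n = sum(s["n"] for s in studies)
weighted_mean = {
    a: sum(s["n"] * s["freq"][a] for s in studies) / total_n for a in alleles
}

# Dirichlet view: convert frequencies to allele counts (two gene copies
# per person) and add them to an assumed flat prior of 1 per allele.
alpha = {
    a: 1 + sum(2 * s["n"] * s["freq"][a] for s in studies) for a in alleles
}
alpha_total = sum(alpha.values())
posterior_mean = {a: alpha[a] / alpha_total for a in alleles}

# Posterior standard deviation of each frequency (closed form for a Dirichlet).
posterior_sd = {
    a: sqrt(alpha[a] * (alpha_total - alpha[a])
            / (alpha_total ** 2 * (alpha_total + 1)))
    for a in alleles
}

for a in alleles:
    print(a, round(weighted_mean[a], 3), round(posterior_mean[a], 3),
          round(posterior_sd[a], 3))
```

With this much data the posterior mean sits very close to the weighted average, and the standard deviation gives a rough sense of the uncertainty that the black bars in the figure represent.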
Now that we have the national allele frequencies, we can pair them with national disease rates to study the correlation. I’ve used the disease rates reported in Dean et al 2014. I log transformed the disease rate to make it normally distributed so I could fit an ordinary least squares linear regression. As expected, there was a significant positive correlation; countries with higher frequencies of HLA-B*27 had higher rates of ankylosing spondylitis. The exception to this was Finland, which had an unusually high frequency of HLA-B*27 but a middling rate of disease. I removed Finland from the model as an outlier, a decision which was supported by “statistical leverage”. (Leverage means this one point had too large an influence on the overall model; we want the model to tell us about countries in general, not any one country in particular).
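The regression and leverage check described above can be sketched with the standard library alone. The numbers below are invented for illustration and are not the values from Dean et al 2014. The cutoff of twice the average leverage is a common rule of thumb that I am assuming here, not necessarily the criterion used in the analysis.

```python
from math import log

# Hypothetical (HLA-B*27 frequency, disease rate) pairs for six countries;
# invented for illustration only. The last point mimics a country with a
# very high allele frequency but a middling disease rate.
freq = [0.02, 0.03, 0.04, 0.05, 0.06, 0.15]
rate = [8.0, 11.0, 17.0, 22.0, 30.0, 25.0]

# Log transform the response before fitting.
log_rate = [log(r) for r in rate]
n = len(freq)
mean_x = sum(freq) / n
mean_y = sum(log_rate) / n

# Ordinary least squares for a single predictor.
sxx = sum((x - mean_x) ** 2 for x in freq)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(freq, log_rate))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Leverage of each point: grows with the distance of x from the mean of x.
leverage = [1 / n + (x - mean_x) ** 2 / sxx for x in freq]

# Assumed rule of thumb: flag points above twice the average leverage,
# where the average is (number of parameters) / n and we fit 2 parameters.
threshold = 2 * 2 / n
high_leverage = [i for i, h in enumerate(leverage) if h > threshold]

print("slope:", round(slope, 2), "intercept:", round(intercept, 2))
print("high leverage points:", high_leverage)
```

In this toy data only the last point is flagged, because its frequency is far from all the others and so it can pull the fitted line towards itself.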
We can use our linear regression model to predict rates of ankylosing spondylitis in countries where we know the HLA-B*27 frequency. This tells us that countries like Austria and Croatia have high predicted ankylosing spondylitis rates. Using these predictions increases the number of countries with disease rate estimates from 16 to 52 and can help identify countries that could benefit from more surveillance. In the world map below, countries with low known or predicted rates of ankylosing spondylitis are plotted in blue and high rates in yellow. Countries with known rates are outlined in black and those with predicted rates are outlined in cyan or orange. Cyan is used for countries within the range of our model and orange is used for countries outside our model’s range; see below for why this is important.

We should be cautious about predicting disease rates for countries with HLA-B*27 frequencies outside the range of our model. Of the 36 countries we have predicted disease rates for, 10 have HLA-B*27 frequencies higher or lower than any country we used in our model. Therefore, we can’t be sure the model will give accurate predictions for these countries. In particular, predictions may be unreliable for countries with high HLA-B*27 frequencies, as we already know that Finland didn’t fit our model. This could be because of a non-linear trend, but we do not have enough data to explore these high frequencies.
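One simple way to separate predictions inside the model’s range from extrapolations is sketched below. The function name, country labels, and frequencies are hypothetical and only illustrate the idea.

```python
def in_model_range(frequency, training_frequencies):
    """Return True if frequency lies within the range the model was fitted on."""
    return min(training_frequencies) <= frequency <= max(training_frequencies)


# Hypothetical allele frequencies of the countries used to fit the model.
training = [0.02, 0.03, 0.05, 0.08]

# Hypothetical countries we want predictions for.
candidates = {"country_a": 0.04, "country_b": 0.12, "country_c": 0.01}

# Label each prediction so extrapolations can be drawn in a different colour.
labels = {
    name: "within range" if in_model_range(f, training) else "extrapolated"
    for name, f in candidates.items()
}
print(labels)
```

Predictions labelled as extrapolated are the ones that deserve the extra caution described above.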

The countries with known disease rates are plotted with filled points. Finland, which was omitted from the model, is plotted in purple. The predicted disease rates are plotted as open circles, cyan for countries within the model’s range and orange for those outside of it. The confidence intervals of the model are shown as dashed lines, and the prediction intervals are shown as a grey ribbon. A quick reminder about the difference: we expect the true relationship to fall within the confidence intervals 95% of the time, and we expect 95% of data points to fall within the prediction intervals.
It’s worth taking a moment to remind ourselves that despite this correlation, there are many other factors influencing disease rates. Clearly an individual’s chance of developing ankylosing spondylitis is also affected by their environment and other genetic factors. So if we wanted really accurate disease rate predictions we would need to consider these other variables. But given how easy it is to get HLA frequency data, it’s a pretty impressive predictor for a disease that can take years to diagnose.
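For readers who want the difference in symbols, these are the standard textbook 95% intervals for a simple linear regression. I am assuming they correspond to the intervals drawn in the figure. Here x_0 is a country’s HLA-B*27 frequency, \hat{y}_0 the predicted log disease rate, n the number of countries in the model, \bar{x} their mean frequency, S_{xx} the sum of squared deviations of the frequencies from \bar{x}, s the residual standard error, and t the critical value of the t distribution with n - 2 degrees of freedom.

```latex
\begin{aligned}
\text{Confidence interval:}\quad
  & \hat{y}_0 \pm t_{n-2,\,0.975}\; s
    \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} \\
\text{Prediction interval:}\quad
  & \hat{y}_0 \pm t_{n-2,\,0.975}\; s
    \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
\end{aligned}
```

The extra 1 under the square root accounts for the scatter of individual countries around the line, which is why the prediction interval is always wider. Both intervals widen as x_0 moves away from \bar{x}.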
Conclusion
HLA genes have a strong impact on human health through infection, vaccination, autoimmune diseases, and organ transplants. Because of these strong relationships, we can use widely available HLA frequency data to study these health traits indirectly. Resources like allelefrequencies.net and HLAfreq make it easier to study these relationships, either by looking at these correlations directly or by using allele frequencies as a proxy when other data is missing. I hope this post has got you excited about questions to ask using HLA frequency data.
References
Gonzalez-Galarza, F. F., McCabe, A., Santos, E. J. M. D., Jones, J., Takeshita, L., Ortega-Rivera, N. D., … & Jones, A. R. (2020). Allele frequency net database (AFND) 2020 update: gold-standard data classification, open access genotype data and new query tools. Nucleic Acids Research, 48(D1), D783-D788.
Dean, L. E., Jones, G. T., MacDonald, A. G., Downham, C., Sturrock, R. D., & Macfarlane, G. J. (2014). Global prevalence of ankylosing spondylitis. Rheumatology, 53(4), 650-657.
Wells, D. A., & McAuley, M. (2023). HLAfreq: Download and combine HLA allele frequency data. bioRxiv, 2023-09. https://doi.org/10.1101/2023.09.15.557761