NGS Bioinformatics

I recently started working on a GWAS experiment involving two candidate gene chips. The genomic inflation factor on these is high and I need to find a way to bring it down. If you have no idea what I'm talking about, welcome to the club. I had no idea either. I came across a couple of papers and tutorials that did a pretty good job of explaining how to do a GWAS analysis. 3 of them are from Nature Protocols (or Nature Methods).

One of the things I need to do is normalize/control for ethnicity. Doing so from the candidate gene chips meant getting the bead IDs of each sample, determining the blinded sample ID (SID), using the SID to find the original GWAS chip, and mapping the SID to the GWAS chip ID + chip section. Then: merge the GWAS chip data with HapMap3, do a PCA on the merged data, cluster the results, and determine which ethnicity group each sample falls into relative to the HapMap data. This wasn't an easy process, as the GWAS data spanned 12 chips and not all of the data is clean. There were also some issues converting from Illumina 1/2 coding to top strand to forward strand. I will post the full pipeline I used once this is done.

In any case, I got the majority of the ethnicity data from the GWAS data and tied it back to the candidate gene (CG) chip data. I now need to do a test of association using logistic regression and include ethnicity as a covariate. The plink command I have is 'plink --file cg_data --logistic --covar ethnicity.covar --out data'. But what is ethnicity.covar? I can't find any information on how to build it.
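For what it's worth, the "PCA on the merged data, then see where each sample falls relative to HapMap" step described above might look roughly like the sketch below. This is not the pipeline I actually ran. It assumes the merged genotypes are already a samples x SNPs matrix of 0/1/2 allele counts with no missing values, and it swaps real clustering for a simple nearest-centroid rule. The function names and the population labels in the usage note are made up for illustration.

```python
import numpy as np


def pca_scores(genotypes, n_components=2):
    """Return per-sample scores on the top principal components.

    genotypes: samples x SNPs array of 0/1/2 allele counts (assumed complete).
    """
    X = np.asarray(genotypes, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP column
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def assign_to_nearest_population(study_scores, ref_scores, ref_labels):
    """Label each study sample with the reference population whose
    centroid (mean PC score) is closest in PC space."""
    ref_labels = np.asarray(ref_labels)
    pops = sorted(set(ref_labels.tolist()))
    centroids = np.array(
        [ref_scores[ref_labels == p].mean(axis=0) for p in pops]
    )
    # Distance from every study sample to every population centroid.
    dists = np.linalg.norm(
        study_scores[:, None, :] - centroids[None, :, :], axis=2
    )
    return [pops[i] for i in dists.argmin(axis=1)]
```

Usage would be something like: stack the reference (HapMap) rows on top of the study rows, call pca_scores on the stacked matrix, split the scores back into reference and study parts, and pass them to assign_to_nearest_population together with the reference labels (e.g. hypothetical "POP_A"/"POP_B"). Running the PCA on the merged matrix matters, since that is what puts both sets of samples in the same coordinate system.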

I have now come across http://www.cureffi.org/2012/10/15/population-covariates-using-1000-genomes/ which seems useful, and I am reading through it. Unfortunately it's for use with VCF files, which I don't have, so I'm not sure how much of what I need to do changes. I'm sure the methodology is the same, but the parameters are different. Maybe this (http://sites.tufts.edu/cbi/files/2013/02/GWAS_Exercise6_Stratification.pdf) will provide what I'm looking for. Unfortunately, they say to copy a file containing the covariates, but don't describe HOW to generate the covariates.

I think I'm going to copy PC1 and PC2 from the PCA analysis and try to use those as the covariates.
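Here is a minimal sketch of how I'd expect to turn the PCA output into ethnicity.covar. As far as I understand the PLINK documentation, a covariate file is plain whitespace-delimited text with the family ID (FID) and individual ID (IID) in the first two columns, followed by one column per covariate, and the IDs have to match the ones in the genotype files. The function names and the sample IDs below are placeholders, not anything from my real data.

```python
def format_plink_covar(samples, pcs, names=("PC1", "PC2")):
    """Build the text of a PLINK-style covariate file.

    samples: list of (FID, IID) pairs, matching the IDs in the .ped file.
    pcs:     list of per-sample PC scores, in the same order as samples.
    names:   covariate column names for the header row.
    """
    if len(samples) != len(pcs):
        raise ValueError("samples and pcs must have the same length")
    lines = [" ".join(("FID", "IID") + tuple(names))]
    for (fid, iid), row in zip(samples, pcs):
        # Keep only as many PCs as there are column names.
        values = " ".join(f"{v:.6f}" for v in row[: len(names)])
        lines.append(f"{fid} {iid} {values}")
    return "\n".join(lines) + "\n"


def write_plink_covar(path, samples, pcs, names=("PC1", "PC2")):
    """Write the covariate table to disk, e.g. as ethnicity.covar."""
    with open(path, "w") as fh:
        fh.write(format_plink_covar(samples, pcs, names))
```

With a file written this way, the command from earlier would be 'plink --file cg_data --logistic --covar ethnicity.covar --out data'. Using the continuous PC scores rather than a hard ethnicity label is a choice on my part: it avoids forcing borderline samples into a single group.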


Thursday, June 6, 2013
