I am trying to perform Mendelian randomisation using the R package "TwoSampleMR".
As exposure data, I use instruments from the GWAS Catalog (phenotype: sphingolipid levels).
As outcome data, I use the GISCOME ischemic stroke outcome GWAS (http://www.kp4cd.org/index.php/node/391).
I get an error when I harmonise the data with harmonise_data().
The text of the error is:
**Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 1, 0**
I have noticed that the error is caused by specific lines in the outcome file. When I make a text file that contains only one line from the original file and use it as outcome data, some lines cause the error and others don't.
For example, this one causes an error:
MarkerName CHR POS Allele1 Allele2 Freq1 Effect StdErr P-value
rs10938494 4 47563448 a g 0.2139 0.0294 0.0519 0.5706
This one doesn’t:
rs1000778 11 61655305 a g 0.2559 0.0939 0.0493 0.05705
Here are all the commands I use:
library(TwoSampleMR)
library(MRInstruments)
data(gwas_catalog)
exp <- subset(gwas_catalog, grepl("Sphingolipid levels", Phenotype))
exp_dat<-format_data(exp)
exp_dat<-clump_data(exp_dat)
exp_dat
out_dat<-read_outcome_data(
snps=exp_dat$SNP,
filename='giscome.012vs3456.age-gender-5PC.meta1.txt',
sep='\t', snp_col='MarkerName',
beta_col='Effect',
se_col='StdErr',
effect_allele_col='Allele1',
other_allele_col='Allele2',
eaf_col='Freq1',
pval_col='Р-value'
)
dat<-harmonise_data(exposure_dat=exp_dat, outcome_dat=out_dat)
What could be causing this problem?
Thank you.
It is difficult to say without seeing your input file, but this sort of error can occur when the exposure columns in your data frame are named inconsistently.
Please see this GitHub thread:
https://github.com/MRCIEU/TwoSampleMR/issues/226
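One way to act on this advice is to check, in base R, that every column name passed to read_outcome_data() matches the file header exactly. This is a sketch under assumptions: the outcome file is a hypothetical two-row stand-in built from the example lines in the question, and the pval_col value deliberately uses a Cyrillic "Р" (U+0420), which looks identical to a Latin "P" but does not match it.

```r
# Hypothetical outcome file built from the two example rows in the question
out_file <- tempfile(fileext = ".txt")
writeLines(c(
  "MarkerName\tCHR\tPOS\tAllele1\tAllele2\tFreq1\tEffect\tStdErr\tP-value",
  "rs10938494\t4\t47563448\ta\tg\t0.2139\t0.0294\t0.0519\t0.5706",
  "rs1000778\t11\t61655305\ta\tg\t0.2559\t0.0939\t0.0493\t0.05705"
), out_file)

# Column names exactly as they appear in the file header
header <- strsplit(readLines(out_file, n = 1), "\t", fixed = TRUE)[[1]]

# Column names one might pass to read_outcome_data(); "\u0420" is a Cyrillic
# capital Er, a look-alike of the Latin "P" (assumed typo for illustration)
requested <- c(snp_col = "MarkerName", beta_col = "Effect", se_col = "StdErr",
               effect_allele_col = "Allele1", other_allele_col = "Allele2",
               eaf_col = "Freq1", pval_col = "\u0420-value")

# Arguments whose value is not an exact header name
not_found <- requested[!requested %in% header]
# Arguments whose value contains any non-ASCII character
has_non_ascii <- vapply(requested, function(s) any(utf8ToInt(s) > 127L), logical(1))

print(not_found)
print(names(requested)[has_non_ascii])
```

Any argument listed by either check is worth retyping by hand before rerunning read_outcome_data() and harmonise_data().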
I am trying to perform an MR analysis using summary statistics from this GWAS: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7026164/#MOESM1
Unfortunately, the summary statistics in the supplementary material only include an A1 allele and give neither an A2 (reference) allele nor the EAF, so I am unable to harmonise them with my outcome data.
I am using the TwoSampleMR package in R with the following code:
x <- harmonise_data(
exposure_dat =exposure_dat,
outcome_dat = outcome_dat_all, action = 1)
and I get the error:
Error in A2[to_swap] <- A1[to_swap] :
  NAs are not allowed in subscripted assignments
I believe this is because an A2 allele is required in the exposure dataset. Is there any way I can perform the MR without it? Alternatively, can anybody suggest how I can quickly find all of the reference alleles? There are around 400 SNPs, so searching for them individually would not be ideal.
Thanks, I would appreciate any help.
I think you can get A2 from the uniqID column in the supplementary information. If they have not provided the EAF, you could potentially calculate it from the 1000 Genomes data.
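A minimal sketch of fetching A2 from uniqID, assuming the column follows the FUMA-style layout "chr:pos:alleleX:alleleY" (check this against the actual supplementary file). The data frame and SNP IDs below are hypothetical; A2 is taken as whichever of the two uniqID alleles is not A1.

```r
# Hypothetical exposure data: only A1 is given, plus a uniqID assumed to be
# formatted as "chr:pos:alleleX:alleleY"
exposure_dat <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  A1 = c("A", "T", "G"),
  uniqID = c("1:1000:A:G", "2:2000:C:T", "3:3000:G:T"),
  stringsAsFactors = FALSE
)

# Split each uniqID into its four fields (one row per SNP)
parts <- do.call(rbind, strsplit(exposure_dat$uniqID, ":", fixed = TRUE))
allele_x <- parts[, 3]
allele_y <- parts[, 4]

# A2 is the uniqID allele that is not A1; NA flags rows where A1 matches neither
exposure_dat$A2 <- ifelse(exposure_dat$A1 == allele_x, allele_y,
                   ifelse(exposure_dat$A1 == allele_y, allele_x, NA_character_))
exposure_dat
```

The resulting A1 and A2 columns could then be passed to format_data() through its effect_allele_col and other_allele_col arguments; rows left as NA need checking by hand before harmonise_data() is run.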
I have data frames with wide rows, either because they have many numeric columns or because they have long text columns.
I typically use RStudio and Rmd notebooks. I cannot figure out how to get the entire line printed without truncation. I get the same behavior when I run the code in the RStudio console, so I do not think it is an R Markdown issue.
I also want to use capture.output() to save the formatted output from functions like anova(). The file written via capture.output() is also truncated.
Here is an example from the RStudio console:
> fdf <- mcols(res.lfcShrink)
> fdf
DataFrame with 4 rows and 2 columns
type description
<character> <character>
baseMean intermediate mean of normalized c..
log2FoldChange results log2 fold change (MA..
lfcSE results posterior SD: diseas..
svalue results FSOS s-value (T=1): ..
> mcols(res.lfcShrink)$description
[1] "mean of normalized counts for all samples"
[2] "log2 fold change (MAP): diseaseState PDAC vs healthy"
[3] "posterior SD: diseaseState PDAC vs healthy"
[4] "FSOS s-value (T=1): diseaseState PDAC vs healthy"
I tried setting the RStudio option "Limit length of lines displayed in console to" to 500, but it did not help.
Thanks in advance,
Andy
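A possible workaround, sketched under the assumption that the ".." abbreviation comes from how the DataFrame show() method prints long strings: convert to a base data.frame with as.data.frame(), whose print method does not shorten strings, and widen options(width = ...) before printing or capturing. The data below is a hypothetical stand-in for mcols(res.lfcShrink); the output file name is also illustrative.

```r
# Hypothetical stand-in for as.data.frame(mcols(res.lfcShrink))
fdf <- data.frame(
  type = c("intermediate", "results"),
  description = c("mean of normalized counts for all samples",
                  "log2 fold change (MAP): diseaseState PDAC vs healthy"),
  row.names = c("baseMean", "log2FoldChange"),
  stringsAsFactors = FALSE
)
# With the real object you would start from:
# fdf <- as.data.frame(mcols(res.lfcShrink))

# Temporarily widen the print width, capture the printed text, then restore
old_opts <- options(width = 250)
out <- capture.output(print(fdf, right = FALSE))
options(old_opts)

# Save the formatted text to a file and also show it in the console
out_file <- file.path(tempdir(), "fdf_output.txt")
writeLines(out, out_file)
cat(out, sep = "\n")
```

The same options(width = ...) setting should also apply to capture.output(anova(fit), file = "anova.txt"), where fit is whatever model you are summarising.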
I am new to R programming and am seeking help analyzing metabolomics data: 118 metabolites measured under 4 conditions (3 replicates per condition). I would like to know, for each metabolite, which condition(s) differ significantly from which. Here is part of my data:
> head(mydata)
Conditions HMDB03331 HMDB00699 HMDB00606 HMDB00707 HMDB00725 HMDB00017 HMDB01173
1 DMSO_BASAL 0.001289121 0.001578235 0.001612297 0.0007772231 3.475837e-06 0.0001221674 0.02691318
2 DMSO_BASAL 0.001158363 0.001413287 0.001541713 0.0007278363 3.345166e-04 0.0001037669 0.03471329
3 DMSO_BASAL 0.001043537 0.002380287 0.001240891 0.0008595932 4.007387e-04 0.0002033625 0.07426482
4 DMSO_G30 0.001195253 0.002338346 0.002133992 0.0007924157 4.189224e-06 0.0002131131 0.05000778
5 DMSO_G30 0.001511538 0.002264779 0.002535853 0.0011580857 3.639661e-06 0.0001700157 0.02657079
6 DMSO_G30 0.001554804 0.001262859 0.002047611 0.0008419137 6.350990e-04 0.0000851638 0.04752020
This is what I have so far.
I learned the first line from this post
kwtest_pvl = apply(mydata[,-1], 2, function(x) kruskal.test(x,as.factor(mydata$Conditions))$p.value)
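and this is where I loop through the metabolites that passed the KW test:
library(PMCMR)  # provides posthoc.kruskal.dunn.test()
# names of the metabolites whose Kruskal-Wallis p-value is <= 0.05
tCol = colnames(mydata[,-1])[kwtest_pvl <= 0.05]
for (k in tCol){
  # Dunn's post hoc test with Benjamini-Hochberg adjustment;
  # note that `output` is overwritten on every iteration
  output = posthoc.kruskal.dunn.test(mydata[,k], as.factor(mydata$Conditions), p.adjust.method = "BH")
}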
I am not sure how to organise the output so that it is easy to work with for all the metabolites that passed the KW test. Perhaps I could append the output from each iteration to an Excel file? I also tried the dunn.test package, since it can produce table or list output, but that still leaves me at the same point. I'm a bit stuck here.
Moreover, should I also apply some kind of p-value adjustment (e.g. FWER, or FDR via BH) to the KW results before performing the post hoc test?
Any suggestion(s) would be greatly appreciated.
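A sketch of one common pattern, not the only valid one: BH-adjust the KW p-values across metabolites, run a post hoc test only on those that remain significant, and collect every result into one long table that can be written to a single CSV (which Excel opens). Assumptions: the data are hypothetical toy values with made-up conditions COND_C and COND_D, and base R's pairwise.wilcox.test() is swapped in as the post hoc test so the example runs without extra packages. I believe posthoc.kruskal.dunn.test() also returns a $p.value matrix of the same lower-triangular shape, but check with str(output) before reusing to_long() on it.

```r
# Hypothetical toy data shaped like `mydata` (made-up values)
mydata <- data.frame(
  Conditions = rep(c("DMSO_BASAL", "DMSO_G30", "COND_C", "COND_D"), each = 3),
  M1 = c(1, 2, 3, 11, 12, 13, 21, 22, 23, 31, 32, 33),
  M2 = c(5, 1, 9, 2, 8, 6, 7, 3, 4, 10, 12, 11)
)
grp <- as.factor(mydata$Conditions)

# Kruskal-Wallis p-value per metabolite, then BH adjustment across metabolites
kw_p <- sapply(mydata[, -1], function(x) kruskal.test(x, grp)$p.value)
kw_padj <- p.adjust(kw_p, method = "BH")

# Turn a lower-triangular matrix of pairwise p-values into long format
to_long <- function(pmat, metabolite) {
  idx <- which(!is.na(pmat), arr.ind = TRUE)
  data.frame(
    metabolite = metabolite,
    group1 = rownames(pmat)[idx[, "row"]],
    group2 = colnames(pmat)[idx[, "col"]],
    p_adj = pmat[idx],
    stringsAsFactors = FALSE
  )
}

# Post hoc test only for metabolites significant after adjustment;
# pairwise.wilcox.test() is a stand-in for Dunn's test here
sig <- names(kw_padj)[kw_padj <= 0.05]
results <- lapply(sig, function(k) {
  ph <- pairwise.wilcox.test(mydata[[k]], grp, p.adjust.method = "BH")
  to_long(ph$p.value, k)
})

# One table for all metabolites, saved as a single CSV file
posthoc_table <- do.call(rbind, results)
write.csv(posthoc_table, file.path(tempdir(), "posthoc_results.csv"), row.names = FALSE)
posthoc_table
```

Storing each iteration in a list and binding once at the end avoids both overwriting `output` and repeatedly appending to a spreadsheet.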
I have a CSV file with 4 columns labeled AGE, DIASTOLIC, BMI and EVER.PREGNANT, and 700 rows. The last column contains only "yes" or "no". I want to plot BMI against EVER.PREGNANT to compare the BMI of those with "yes" in the fourth column with that of those with "no". What code should I write to get the required boxplot?
I have tried the following code:
Sheet=read.csv(/Downloads/1739230_1284354330_PIMA.csv - 1739230_1284354330_PIMA.csv.csv, sep=",")
boxplot(BMI~EVER.PREGNANT,data=sheet, main="BMI vs PREG",xlab="BMI",ylab="PREGNANT")
The error that I get is:
Error in eval(expr, envir, enclos): object 'Sheet' not found
Similarly, what changes are needed to plot AGE against DIASTOLIC, where both columns are numeric? Will all ~700 values be displayed nicely?
I answer here because it tells me not to extend the discussion :-).
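I think you haven't loaded your data set correctly. The file path has to be a quoted string, and you can add header = TRUE to state explicitly that your first row contains the variable names (this is already the default for read.csv()). Also note that R is case-sensitive: the data frame is stored as Sheet, so boxplot() needs data = Sheet rather than data = sheet.
Sheet <- read.csv("/Downloads/1739230_1284354330_PIMA.csv", sep = ",", header = TRUE)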
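A sketch of both plots, assuming the CSV has been read into a data frame with the column names from the question. The six rows below are hypothetical stand-ins for the real 700, and the plots are written to a PDF in a temporary directory. In the boxplot the grouping variable goes on the x axis and BMI on the y axis, so xlab and ylab are the other way round from the question's call; for two numeric columns a scatter plot is the natural choice, and semi-transparent filled points help when hundreds of points overlap.

```r
# Hypothetical toy data with the same column names as the PIMA file
sheet <- data.frame(
  AGE = c(25, 31, 45, 52, 38, 29),
  DIASTOLIC = c(70, 72, 80, 88, 76, 64),
  BMI = c(24.1, 30.5, 27.8, 33.2, 29.0, 22.7),
  EVER.PREGNANT = c("yes", "no", "yes", "yes", "no", "no")
)
# Make the yes/no column a factor with a fixed level order
sheet$EVER.PREGNANT <- factor(sheet$EVER.PREGNANT, levels = c("no", "yes"))

pdf(file.path(tempdir(), "pima_plots.pdf"))

# One box of BMI values per EVER.PREGNANT group
bp <- boxplot(BMI ~ EVER.PREGNANT, data = sheet, main = "BMI vs PREG",
              xlab = "EVER.PREGNANT", ylab = "BMI")

# Scatter plot of two numeric columns; translucent points reduce overplotting
plot(DIASTOLIC ~ AGE, data = sheet, pch = 16, col = rgb(0, 0, 1, 0.4),
     main = "AGE vs DIASTOLIC", xlab = "AGE", ylab = "DIASTOLIC")

invisible(dev.off())
```

With the real file, drop the toy data frame and use the one returned by read.csv(), keeping the same capitalisation of its name throughout.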
I am having trouble using the functcomp() function from the FD package in R.
I have 2 datasets: one with species frequencies, and the other listing the functional traits of my species. The frequency dataset has 264 species listed in the first row and 27 sites listed in the first column; all values in the dataset are between 0 and 1. The functional trait dataset has the same 264 species (copied and pasted from the frequency dataset to make sure they are identical) listed in the first column, and 5 different functional traits listed in the first row (height, life history, life form, origin, palatability).
I am using the following code:
library(FD)  # provides functcomp()
traits.df <- read.table("species_functional_traits_6_ August.txt", header = TRUE)
frequency.df <- read.table("Spring 2014 - combined table - 6 August.txt", header = TRUE)
x <- (as.matrix(traits.df))
a <- (as.matrix(frequency.df))
functcomp(x, a, CWM.type = c("dom", "all"), bin.num = height)
But I keep getting the following error message:
Error in functcomp(x, a, CWM.type = c("dom", "all"), bin.num = height) :
Different number of species in 'x' and 'a'.
I have tried fiddling with a couple of things in the code and datasets, but cannot work out what I am doing wrong here. Any help would be greatly appreciated!
Here are links to the frequency and trait data (a subset of it, but I still get the same error message with this data) as tab-delimited text files:
frequency: https://www.dropbox.com/s/girs3nrq1ciyg1a/frequency%20-%20small.txt?dl=0
traits: https://www.dropbox.com/s/l888sallx7mu3f6/traits%20-%20small.txt?dl=0
Try specifying row.names = 1 when you read in your tables; this solved my problem.
Anna
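Why row.names = 1 helps, sketched with two hypothetical miniature tab-delimited files: without it, the site names stay in the frequency table as an ordinary data column, so the abundance matrix has one more column than there are species and cannot line up with the trait rows. Reading both files with row.names = 1 turns the first column into row names, after which the number and names of species in x and a can be checked before calling functcomp(). Reading traits with stringsAsFactors = TRUE is an assumption here, so that categorical traits arrive as factors.

```r
# Hypothetical miniature versions of the two tab-delimited files
freq_file <- tempfile(fileext = ".txt")
writeLines(c("Site\tsp_A\tsp_B\tsp_C",
             "site1\t0.2\t0.5\t0.3",
             "site2\t0.0\t0.6\t0.4"), freq_file)
trait_file <- tempfile(fileext = ".txt")
writeLines(c("Species\theight\tlife_form",
             "sp_A\t12\therb",
             "sp_B\t40\tshrub",
             "sp_C\t5\therb"), trait_file)

# Without row.names = 1 the site names remain an extra column (Site + 3 species)
bad_a <- read.table(freq_file, header = TRUE, sep = "\t")

# With row.names = 1 the first column becomes the row names instead
a <- as.matrix(read.table(freq_file, header = TRUE, sep = "\t", row.names = 1))
x <- read.table(trait_file, header = TRUE, sep = "\t", row.names = 1,
                stringsAsFactors = TRUE)

# One trait row per species column, with matching species names
stopifnot(nrow(x) == ncol(a), identical(rownames(x), colnames(a)))

# Once these checks pass, the objects can be passed on, e.g.
# FD::functcomp(x, a)
```

Keeping the traits as a data frame rather than converting them with as.matrix() also preserves the distinction between numeric and categorical traits.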