R function doesn't recognize variable

I am not very familiar with loops in R, and am having a hard time passing a variable so that it is recognized by a function, DESeqDataSetFromMatrix.
pls is a table of integers. metaData is a data frame containing sample IDs and conditions corresponding to pls. I verified that the steps below run error-free when the individual elements of cond are supplied one at a time.
I reviewed relevant posts on referencing variables in R:
How to reference variable names in a for loop in R?
How to reference a variable in a for loop?
Based on these posts, I modified i in line 3 with single brackets, double brackets, and as.name. No luck. DESeqDataSetFromMatrix reads the literal text after ~ and spits out an error.
cond = c("wt","dhx","mpp","taz")
for (i in cond) {
  dds <- DESeqDataSetFromMatrix(countData=pls, colData=metaData, design=~i, tidy = TRUE)
  "sizeFactors"(dds) <- 1
  paste0("PLS",i) <- DESeq(dds)
  pdf <- paste(i, "-PLS_MA.pdf", sep="")
  tsv <- paste(i, "-PLS.tsv", sep="")
  pdf(file=pdf, paper = "a4r", width = 0, height = 0)
  plotMA(paste0("PLS",i), ylim=c(-10,10))
  dev.off()
  write.table(results(paste0("PLS",i)), file = tsv, quote=FALSE, sep='\t', col.names = NA)
}
With brackets, I get an "unexpected symbol" error.
With i alone, DESeqDataSetFromMatrix tries to find a column named "i" in my metaData.
Is R just not capable of reading variables in some situations? Generally speaking, is it better to write loops outside of R in a more straightforward language, then push the calls as standalone commands? Thanks for the help; I hope there is an easy fix.

For anyone else who may be having trouble looping over DESeq2 functions, the comments above addressed my issue.
Correct input:
dds <- DESeqDataSetFromMatrix(countData=pls,colData=metaData,design=as.formula(paste0("~", i)), tidy = TRUE)
as.formula worked well with all DESeq functions that I tested.
reformulate(i) also worked well in most situations.
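For reference, here is a minimal sketch of how the corrected call can sit inside the loop. The object names (pls, metaData, cond) come from the question; storing each fit in a named list rather than assigning to a pasted variable name is my own suggestion, not part of the original code:
library(DESeq2)

cond <- c("wt", "dhx", "mpp", "taz")
fits <- list()  # one DESeq fit per condition, keyed by condition name

for (i in cond) {
  dds <- DESeqDataSetFromMatrix(countData = pls, colData = metaData,
                                design = as.formula(paste0("~", i)), tidy = TRUE)
  sizeFactors(dds) <- 1  # as in the original question
  fits[[i]] <- DESeq(dds)

  pdf(file = paste0(i, "-PLS_MA.pdf"), paper = "a4r", width = 0, height = 0)
  plotMA(fits[[i]], ylim = c(-10, 10))
  dev.off()

  write.table(results(fits[[i]]), file = paste0(i, "-PLS.tsv"),
              quote = FALSE, sep = "\t", col.names = NA)
}
The list keeps every fit accessible afterwards (e.g. fits[["wt"]]) without any get()/assign() bookkeeping.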
Thanks, everyone for the help!

Related

R: subset() function altered character data into strange code

I read some data into R with read.xlsx() from the openxlsx package; here's my code for reading the data:
data_all = read.xlsx(xlsxFile = paste0(path, EoLfileName), sheet = 1, detectDates = T, skipEmptyRows = F)
Now, when I access one name cell in my data, it prints the name as a character string:
> data_all[1,'name']
[1] "76-ES+ADVIP-20G"
Now, let's say I want to subset out some rows based on a condition on another column:
data_sub = subset(data_all, !is.na(data_all$amount))
However, if I then print the same entry of the subset data, I get:
> data_sub[1,'name']
[1] "A94198.10"
I've also tried subsetting with the following method:
data_sub = data_all[!is.na(data_all$amount),]
but I get the same thing: the expected output of "76-ES+ADVIP-20G" turns into "A94198.10".
I've checked data_all$name and data_sub$name many times with mode() and str(); both return character, so they are in the correct format.
Here's a link to sample data to play with:
https://drive.google.com/file/d/0BwIbultIWxeVY1VtdDU5NFp1Tkk/view?usp=sharing
Please help me! I am quite stuck, and I don't see other posts with a similar problem.
Why is this happening? Subsetting shouldn't change data formatting, correct?
Thank you in advance for your help!
Additional note (if it's helpful):
While debugging, I noticed that when viewing data_all in RStudio, if I copy and paste the name "76-ES+ADVIP-20G" into the filter bar, it cannot find it; I have to type "76-ES", and as soon as I type the next character, "+", the RStudio data view filter says "no matching records found".
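A small diagnostic sketch (my own addition, not from the thread) can show whether data_sub[1, ] simply corresponds to a different row of data_all, since subsetting drops rows and the remaining rows move up:
# hypothetical check, using the data_all / data_sub objects defined above
keep <- !is.na(data_all$amount)
which(keep)[1]                   # index of the original row that becomes data_sub[1, ]
data_all$name[which(keep)[1]]    # should match data_sub[1, "name"]
data_all[1, "amount"]            # if this is NA, original row 1 is not in the subset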

Converting R file to Stata with missing string values

I am getting an error while converting an R file into Stata format. I am able to convert the numbers into
a Stata file, but when I include strings I get the following error:
library(foreign)
write.dta(newdata, "X.dta")
Error in write.dta(newdata, "X.dta") :
empty string is not valid in Stata's documented format
I have a few string variables, like location and name, that have missing values, which is probably what is causing this problem. Is there a way to handle this?
I've had this error many times before, and it's easy to reproduce:
library(foreign)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write.dta(test, 'example.dta')
One solution is to use factor variables instead of character variables, e.g.,
for (colname in names(test)) {
  if (is.character(test[[colname]])) {
    test[[colname]] <- as.factor(test[[colname]])
  }
}
Another is to change the empty strings to something else and change them back in Stata.
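A minimal sketch of that second option; the "MISSING" placeholder is an arbitrary value of my choosing, to be converted back to missing once the data is in Stata:
library(foreign)

test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)

# replace empty strings in character columns with a placeholder value
test[] <- lapply(test, function(col) {
  if (is.character(col)) col[!nzchar(col)] <- "MISSING"
  col
})

write.dta(test, "example.dta")  # no longer trips over empty strings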
This is purely a problem with write.dta, because Stata is perfectly fine with empty strings. But since foreign is frozen, there's not much you can do about that.
Update: (2015-12-04) A better solution is to use write_dta in the haven package:
library(haven)
test <- data.frame(a = "", b = 1, stringsAsFactors = FALSE)
write_dta(test, 'example.dta')
This way, Stata reads string variables properly as strings.
You could use the great readstata13 package (which kindly imports only the Rcpp package).
readstata13::save.dta13(mtcars, 'mtcars.dta')
The function can already save in the Stata 15/16 MP file format (experimental), the next format after Stata 13:
readstata13::save.dta13(mtcars, 'mtcars15.dta', version="15mp")
Note: Of course, this also works with OP's data:
readstata13::save.dta13(data.frame(a="", b=1), 'my_data.dta')

Vectorise an imported variable in R

I have imported a CSV file into R, but now I would like to extract a variable into a vector and analyse it separately. Could you please tell me how I could do that?
I know that the summary() function gives a rough idea, but I would like to learn more.
I apologise if this is a trivial question, but I have watched a number of tutorial videos and have not seen that anywhere.
Read the data into a data frame using read.csv. Get the names of the data frame; they should be the names of the CSV columns unless you've done something wrong. Use dollar notation to get vectors by name. Try reading some tutorials instead of watching videos, then you can try things out.
d = read.csv("foo.csv")
names(d)
v = d$whatever # for example
hist(v) # for example
This is totally trivial stuff.
I assume you have used the read.csv() or read.table() function to import your data into R. (You can get help directly in R with ?, e.g. ?read.csv.)
So normally you have a data.frame. And if you check the documentation, the data.frame is described as "[...]tightly coupled collections of variables which share many of the properties of matrices and of lists[...]".
So basically you can already handle your data as vectors.
A quick search on SO turned up these two posts, among others:
Converting a dataframe to a vector (by rows) and
Extract Column from data.frame as a Vector
And I am sure there are more relevant ones. Try some good tutorials on R (videos are not so instructive in this case).
There are plenty of good ones on the Internet, e.g.:
* http://www.introductoryr.co.uk/R_Resources_for_Beginners.html (which lists some)
or
* http://tryr.codeschool.com/
Anyway, one way to deal with your CSV would be:
# import the data into R as a data.frame
mydata = read.csv(file = "SomeFile.csv", header = TRUE, sep = ",",
                  quote = "\"", dec = ".", fill = TRUE, comment.char = "")
# extract a column to a vector
firstColumn = mydata$col1       # extract the column named "col1" of mydata to a vector
# the previous line is equivalent to:
firstColumn = mydata[, "col1"]
# extract a row to a vector
firstline = mydata[1, ]         # extract the first row of mydata to a vector
Edit: In some cases[1], you might need to coerce the data into a vector by applying functions such as as.numeric or as.character:
firstline = as.numeric(mydata[1, ])  # extract the first row of mydata to a vector
# Note: the entire row *has to be* numeric or compatible with that class
[1] e.g. it happened to me when I wanted to extract a row of a data.frame inside a nested function

R code: Dynamic Variables in a loop

I'm facing a challenge in R. I'm writing code that incorporates another program, written in C++, called MHX.
MHX is used for chemical data analysis; it takes some concentrations, etc., as input. The integration between R and MHX works fine: I write my MHX code definitions in the form of cat(CODE HERE) and then call a bash command to run MHX from the terminal.
The results from MHX come back as tab-delimited data tables that I can read into R without a problem. The issue is that I use R to run a large number of MHX calculations in loops.
Hence the need for dynamic variables, and that is where I'm stuck. Let me give you more information with examples from my R code:
for (i in 1:100) {
  fin  <- file.create("input/ex1")       # MHX input file
  fout <- file.create("output/ex1.out")  # MHX output file
  FNM  <- paste0("table_data/pH", i, ".txt")  # filename used inside the MHX definition
  file.create(FNM)                       # this is used to create the FNM table in R
  fXY  <- file.create(paste0("table_data/ECOMXY", i, ".txt"))
  ifelse (HERE SOME MATHEMATICAL DEFINITIONS OF SOME VARIABLES)
  ksource(MHXCode)  # this calls my MHX code, which is inside another R script called `MHXCode`, using a custom function KSOURCE. No problem here.
Up to here I don't have major problems. Now I need to set up the dynamic variables.
First, I create variables PHL1 to PHL100:
assign(paste("PHL", i, sep=""), read.table(paste0("table_data/pH", i, ".txt") ,skip=0, sep="\t", head=TRUE, na.strings = "-Inf"))
Each PHL table contains two rows and about 20 columns. Now I am interested in creating data frames from the second row of each column. Take for example the column called EMF: ideally I need to do the following for all tables from PHL1 to PHL100, which is very tedious:
EMFT <- cbind(PHL1$EMF[2], PHL2$EMF[2], PHL3$EMF[2], PHL4$EMF[2], PHL5$EMF[2], PHL6$EMF[2],PHL7$EMF[2], PHL8$EMF[2], PHL9$EMF[2], PHL10$EMF[2], ....... etc up to PHL100! )
I tried many things to achieve the above, but I was not successful, including:
XX <- assign(paste0("PHL", i, "$EMF[2]"), cat(paste0("PHL", i, "$EMF[2]")))
I will need to do the same for other variables in order to be able to create some complicated plots. I hope someone will be able to help.
I must mention that the main problem with assign is that I get quoted names of variables and hence cannot return their values. As for cat, you cannot use it to return a value; you get NULL in the example above. Simply put, I am stuck!
Please help.
Thanks to Justin, who gave me a clue to answer my question. Here is what I have done:
files <- list.files(path = "table_data", pattern = ".dat", full.names = TRUE); files
FRM <- NULL
for (f in files) {
  dat <- read.table(f, skip = 0, header = TRUE, sep = "\t", na.strings = "",
                    quote = "", colClasses = "character")[2, ]
  FRM <- rbind(FRM, dat)
}
Note that the [2, ] argument means you skip all lines except line number 2 while keeping the header, which is exactly what I was looking for. Now I can bind it all into one table for my plots.
This is a short answer and I think it is neat; sorted!
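Building on that answer, here is a hedged sketch of how to get the EMF values from the question's EMFT line without creating PHL1 to PHL100 as separate variables. The pH*.txt file names and the EMF column come from the question; the list/sapply structure is my own suggestion:
# read row 2 of every pH table into a named list (note: list.files() sorts
# names alphabetically, so pH10 comes before pH2)
files <- list.files(path = "table_data", pattern = "^pH.*\\.txt$", full.names = TRUE)
phl <- lapply(files, function(f) {
  read.table(f, skip = 0, sep = "\t", header = TRUE, na.strings = "-Inf")[2, ]
})
names(phl) <- basename(files)

# the equivalent of cbind(PHL1$EMF[2], ..., PHL100$EMF[2]) in one call
EMFT <- sapply(phl, function(d) d$EMF)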
