I have a variable named SAL_mean created like this (I want to make a loop once I figure this out):
watersheds <- c('ANE', 'SAL', 'CER')
assign(paste0(watersheds[2], '_mean'), read.csv(paste0(watersheds[2], '_mean.csv')))
Now the next step should be something like this (which works):
cols_dont_want <- c('B1', 'B2', 'B3')
assign(paste0(watersheds[2], '_mean'), SAL_mean[, !names(SAL_mean) %in% cols_dont_want])
But I want to ask how to replace "SAL_mean" with an expression built from watersheds[2], because this line of code doesn't work:
assign(paste0(watersheds[2], '_mean'), paste0(watersheds[2], '_mean')[, !names(paste0(watersheds[2], '_mean')) %in% cols_dont_want])
I think it treats paste0(watersheds[2], '_mean') as a string and not as the name of a variable, but I haven't been able to find a solution (for example, I tried the as.name() function, but it gave me the error "object of type 'symbol' is not subsettable").
Keep the data frames in a list using lapply(); it then becomes easier to carry out the same transformations on every data frame in the list, something like:
# set vars
watersheds <- c('ANE', 'SAL', 'CER')
cols_dont_want <- c('B1', 'B2', 'B3')
# result, all dataframes in one list
myList <- lapply(watersheds, function(i){
# read the file
x <- read.csv(paste0(i, "_mean.csv"))
# exclude columns and return
x[, !colnames(x) %in% cols_dont_want]
} )
Replace
paste0(watersheds[2], '_mean')
with
eval(parse(text = paste0(watersheds[2], '_mean')))
and it should work. Your guess is correct: paste0() just gives you a string, but you need the variable that the string names, which is what eval(parse(...)) returns.
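A variant that avoids parsing code: get() returns the object whose name is given as a string, so it can stand in for eval(parse(text = ...)) here. The small data frame below is a hypothetical stand-in for the contents of SAL_mean.csv; this is a sketch, not the only way to do it.

```r
watersheds <- c('ANE', 'SAL', 'CER')
cols_dont_want <- c('B1', 'B2', 'B3')
# Hypothetical stand-in for the data read from SAL_mean.csv
SAL_mean <- data.frame(A1 = 1:2, B1 = 3:4, B2 = 5:6, C1 = 7:8)

nm <- paste0(watersheds[2], '_mean')   # "SAL_mean": only a string
df <- get(nm)                          # the object that the string names
assign(nm, df[, !names(df) %in% cols_dont_want])
names(SAL_mean)
# [1] "A1" "C1"
```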
Or you can do it in a for loop (some find the syntax more understandable). It's equivalent to zx8754's solution, except that it assigns a name to each data frame, as the OP wanted. It's trivial to modify zx8754's solution to do the same.
watersheds <- c('ANE', 'SAL', 'CER')
cols_dont_want <- c('B1', 'B2', 'B3')
ws.list <- list()
for (i in 1:length(watersheds)) {
ws.list[[i]] <- read.csv(paste0(watersheds[i], '_mean.csv'))
names(ws.list)[i] <- paste0(watersheds[i], '_mean')
ws.list[[i]] <- ws.list[[i]][!names(ws.list[[i]]) %in% cols_dont_want]
}
names(ws.list)
# "ANE_mean" "SAL_mean" "CER_mean"
# If you absolutely want to call the data.frames by their
# individual names, you can do so after you attach() the list.
attach(ws.list)
ANE_mean
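For comparison, a sketch of the lapply() version with names added, using list2env() instead of attach() to create ANE_mean, SAL_mean and CER_mean as ordinary objects. The CSV files written to tempdir() (and their columns A1, B1, C1) are hypothetical, only there to make the example runnable.

```r
watersheds <- c('ANE', 'SAL', 'CER')
cols_dont_want <- c('B1', 'B2', 'B3')

# Hypothetical setup: write a small CSV per watershed to a temporary folder
csv_dir <- tempdir()
for (w in watersheds) {
  write.csv(data.frame(A1 = 1:2, B1 = 3:4, C1 = 5:6),
            file.path(csv_dir, paste0(w, '_mean.csv')), row.names = FALSE)
}

# Read and filter every file, then name the list elements
ws.list <- lapply(watersheds, function(w) {
  x <- read.csv(file.path(csv_dir, paste0(w, '_mean.csv')))
  x[, !names(x) %in% cols_dont_want, drop = FALSE]
})
names(ws.list) <- paste0(watersheds, '_mean')

# Instead of attach(), copy each element into the workspace under its name
invisible(list2env(ws.list, envir = globalenv()))
```

Unlike attach(), list2env() creates real copies in the global environment, so later assignments to SAL_mean behave as usual.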
I have a data frame with ~9000 rows of human-coded data, two coders per item, so about 4500 unique pairs. I want to break the dataset into these pairs (~4500 data frames), run kripp.alpha() on the scores that were assigned, and then save the results into a coder sheet I have made. I cannot get the loop to work.
I can get it to work individually, using this:
example.m <- as.matrix(example.m)
s <- kripp.alpha(example.m)
example$alpha <- s$value
However, when I try a loop, I get either "Error in get(v) : object 'NA' not found" when running this:
for (i in items) {
v <- i
v <- v[c("V1","V2")]
v <- assign(v, as.matrix(get(v)))
s <- kripp.alpha(v)
i$alpha <- s$value
}
or the warning "In i$alpha <- s$value : Coercing LHS to a list" when running:
for (i in items) {
i.m <- i[c("V1","V2")]
i.m <- as.matrix(i.m)
s <- kripp.alpha(i.m)
i$alpha <- s$value
}
Here is an example set of data. items holds the names of the individual data frames.
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
items <- c("l","t")
I am sure this is a basic question, but what I want is, for each data frame i, to add a column with the alpha score at the end. Thanks!
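Your problem is with scoping and with extracting objects when they are referenced through strings. You'd need to eval() some of your objects to make your current approach work. Also, i$alpha <- s$value only modifies the loop variable i, not the data frame stored under that name.
Here's another solution: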
library("irr") # For kripp.alpha
# Produce the data
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1),nrow=2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1),nrow=2))
# Collect the data as a list right away
items <- list(l, t)
Now you can sapply() directly over the elements in the list.
sapply(items, function(v) {
kripp.alpha(as.matrix(v[c("V1","V2")]))$value
})
which produces
[1] 0.0 -0.5
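If the goal is still to add an alpha column to each data frame (and keep l and t updated), one possible sketch is below. mget() turns the vector of names into a named list, and list2env() writes the modified copies back. The helper alpha_of() is hypothetical; it calls irr::kripp.alpha() when irr is installed and returns NA otherwise, purely so the sketch runs anywhere.

```r
l <- as.data.frame(matrix(c(4,3,3,3,1,1,3,3,3,3,1,1), nrow = 2))
t <- as.data.frame(matrix(c(4,3,4,3,1,1,3,3,1,3,1,1), nrow = 2))
items <- c("l", "t")   # names only, as in the question

# Hypothetical helper: Krippendorff's alpha via irr, NA if irr is missing
alpha_of <- function(m) {
  if (requireNamespace("irr", quietly = TRUE)) irr::kripp.alpha(m)$value else NA_real_
}

# mget() turns the vector of names into a named list of data frames
item_list <- mget(items)
item_list <- lapply(item_list, function(df) {
  df$alpha <- alpha_of(as.matrix(df[c("V1", "V2")]))
  df
})

# Write the modified copies back under their original names
invisible(list2env(item_list, envir = globalenv()))
```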
I used to work in C++ and I think I am misunderstanding how for loops (or iterations) work in R. I want to change list items in a for loop, but the loop seems to make a temporary copy and change only that. How can I prevent this? This seems like a trivial beginner's question, but I was unable to find a tutorial or Stack Overflow question explaining why this happens.
Code:
myList <- list(a=1, b=1, c=1, d=1)
for(item in myList){item <- 3}
myList
# Expected output: 3,3,3,3 - Real output: 1,1,1,1
# Additionally, I now have a variable "item" with value 3.
for(item in myList) creates a new object called item that holds a copy of each element in turn, so assigning to item changes only that copy and leaves myList untouched.
If you want to modify the items of the list, refer to them either by position, e.g. myList[[1]], or by name, e.g. myList[["a"]].
You can for-loop through the list by using the index (as one of the comments suggested).
myList <- list(a=1, b=2, c=4, d=5)
for(i in 1:length(myList)){
myList[i] <- 3
}
myList
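The same fix can also be written by looping over the element names instead of positions. This short sketch also reproduces the copy behaviour from the question for contrast:

```r
myList <- list(a = 1, b = 1, c = 1, d = 1)

# The loop variable is only a copy: this leaves myList unchanged
for (item in myList) item <- 3

# Looping over the names and assigning through [[ ]] modifies the list itself
for (nm in names(myList)) {
  myList[[nm]] <- 3
}
unlist(myList)
```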
But I would recommend a vectorized approach. Check this out:
myList <- list(a=1, b=2, c=1, d=5)
myList == 1
myList[myList == 1] <- 3
myList
myList[names(myList) == 'a'] <- 9
myList
Now you do not have any redundant variables.
This is generally the preferred approach in R: vectorized operations are usually more concise, and often faster, than an explicit for loop.
As stated by @nicola, lapply() is a good option. Here is an example based on your question.
myList <- list(a = 1, b = 1, c = 1, d = 1) # output: 1,1,1,1
myList <- lapply(myList, function(x) 3) # output: 3,3,3,3
# lapply() iterates over every list item and returns a new list, so assign the result back
I have a data.frame with a couple of thousand rows. I am applying several lines of code to subsets of this data.
The column mergeorder$phylum defines 4 subsets:
[1] "ascomycota" "basidiomycota" "unidentified"
[4] "chytridiomycota"
I have to apply this set of functions to every subset separately:
ascomycota<-mergeorder[mergeorder$phylum %in% c("ascomycota"), ]
group_ascomycota <- aggregate(ascomycota[,2:62], by=list(ascomycota$order), FUN=sum)
row.names(group_ascomycota)<-group_ascomycota[,1]
group_ascomycota$sum <-apply(group_ascomycota[,-1],1,sum)
dat5 <-sweep(group_ascomycota[,2:62], 2, colSums(group_ascomycota[2:62]), '/')
dat5$sum <-apply(group_ascomycota[,-1],1,sum)
reorder_dat5 <- dat5[order(dat5$sum, decreasing=T),]
reorder_dat5$OTU_ID <- row.names(reorder_dat5)
FINITO<-reorder_dat5[1:15,]
write.table(FINITO, file="output_ITS1/ITS1_ascomycota_order_top15.csv", col.names=TRUE,row.names=FALSE, sep=",", quote=FALSE)
This code works. However, I would like to apply this code without manually replacing every "ascomycota" with "basidiomycota", "unidentified", "chytridiomycota".
What function should I use, and how should I use it? I've tried sapply() and repeat but haven't gotten far.
The end result should run the whole code for each subset and export a separate CSV file for each.
Many thanks for your answer
It's usually possible to write code that handles all subsets in one go. However, what you are doing is pretty complicated. The best thing to do might be to gather all that into a function and then just run the function for each subset. Something like this:
subset_transform <- function(phylum_name){
# keep only the rows belonging to this phylum
sub_df <- mergeorder[mergeorder$phylum %in% phylum_name, ]
group_t <- aggregate(sub_df[,2:62], by=list(sub_df$order), FUN=sum)
row.names(group_t) <- group_t[,1]
group_t$sum <- apply(group_t[,-1],1,sum)
dat5 <- sweep(group_t[,2:62], 2, colSums(group_t[2:62]), '/')
dat5$sum <- apply(group_t[,-1],1,sum)
reorder_dat5 <- dat5[order(dat5$sum, decreasing=TRUE),]
reorder_dat5$OTU_ID <- row.names(reorder_dat5)
FINITO <- reorder_dat5[1:15,]
# paste0() builds the file name without the spaces that paste() would insert
write.table(FINITO, file = paste0("output_ITS1/ITS1_", phylum_name, "_order_top15.csv"), col.names=TRUE, row.names=FALSE, sep=",", quote=FALSE)
}
subset_transform("ascomycota")
subset_transform("basidiomycota")
subset_transform("unidentified")
subset_transform("chytridiomycota")
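To avoid listing the phyla by hand, split() can produce one data frame per phylum and a loop over its names can process each one. The sketch below uses a hypothetical toy data frame (toy_order with two sample columns s1 and s2), a simplified per-phylum summary (total counts per order, largest first) and tempdir() instead of output_ITS1/; it illustrates the looping pattern, not the full transformation above.

```r
# Hypothetical toy data standing in for mergeorder
toy_order <- data.frame(
  phylum = c("ascomycota", "ascomycota", "basidiomycota", "unidentified"),
  order  = c("o1", "o2", "o3", "o4"),
  s1 = c(5, 1, 2, 7),
  s2 = c(3, 4, 0, 1)
)
out_dir <- tempdir()  # assumption: a temporary folder instead of output_ITS1/

# Simplified per-phylum summary: total counts per order, largest first
top_orders <- function(df, n = 15) {
  agg <- aggregate(df[, c("s1", "s2")], by = list(order = df$order), FUN = sum)
  agg$sum <- rowSums(agg[, c("s1", "s2")])
  head(agg[order(agg$sum, decreasing = TRUE), ], n)
}

# split() returns a named list with one data frame per phylum
by_phylum <- split(toy_order, toy_order$phylum)
for (p in names(by_phylum)) {
  res <- top_orders(by_phylum[[p]])
  write.csv(res, file.path(out_dir, paste0("ITS1_", p, "_order_top15.csv")),
            row.names = FALSE)
}
```

With the subset_transform() function from the answer, the same idea is simply for (p in unique(mergeorder$phylum)) subset_transform(p).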
I have of list of variable names.
list = c("smoke", "ptd", "ht", "ui", "racecat", "visit")
I want to make a series of plots like this.
plot(low[somevariable=="0"] ~ age[somevariable=="0"])
I'd like to replace somevariable with each of the variable name in the list. For example,
plot(low[smoke=="0"] ~ age[smoke=="0"])
plot(low[ptd=="0"] ~ age[ptd=="0"])
......
plot(low[visit=="0"] ~ age[visit=="0"])
I've tried to create a for-loop and use a variety of things to replace somevariable, such as
list[i], paste(list[i]), substitute(list[i]), parse(list[i])
But none of them works. I would really appreciate some help. Thanks in advance.
You can try:
for(i in list) {
plot(low[get(i) == 0] ~ age[get(i) == 0])
}
Here's a trivial example of what you want to do. You can use get() to find the variable with a given name:
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)
a <- rnorm(100)
b <- rnorm(100,10)
l <- c('a','b')
# histogram of one variable:
hist(get(l[1]))
# loop of plots more like what you're going for:
for(i in 1:length(l)) plot(y[get(l[i])<2] ~ x[get(l[i])<2])
Found the solution:
eval(parse(text=list[i]))
There is another post addressing the same thing:
Getting strings recognized as variable names in R
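If the variables live as columns of one data frame, the name strings can be used directly with [[ ]], which avoids get() and eval(parse()) altogether. A sketch, assuming a hypothetical data frame bw with random values in the columns from the question; the plots go to a temporary PDF only so the example runs non-interactively.

```r
# Hypothetical data frame with some of the columns used in the question
set.seed(1)
bw <- data.frame(
  low   = rbinom(50, 1, 0.3),
  age   = sample(15:40, 50, replace = TRUE),
  smoke = rbinom(50, 1, 0.4),
  ptd   = rbinom(50, 1, 0.2)
)
vars <- c("smoke", "ptd")   # a name other than 'list' avoids confusion with list()

# Rows where the named column equals 0; [[ ]] accepts the name as a string
zero_rows <- function(df, v) df[df[[v]] == 0, , drop = FALSE]

pdf(tempfile(fileext = ".pdf"))   # draw into a temporary file for this demo
for (v in vars) {
  plot(low ~ age, data = zero_rows(bw, v), main = paste(v, "== 0"))
}
invisible(dev.off())
```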
I would like to add a column to every data frame in my R environment; they all have the same format.
I can create the column I want with a simple assignment like this:
x[,8] <- x[,4]/(x[,4]+x[,5])
When I try to put this in a for loop that will iterate over every object in the environment, I get an error.
control_data <- ls()
for (i in control_data) {(i[,8] <- i[,4]/(i[,4]+i[,5]))}
Error: unexpected '[' in "for (i in control_data) {["
Here is what the input files look like:
ENSMUSG00000030088 Aldh1l1 chr6:90436420-90550197 1.5082200 3.130860 0.671814 0.0000000
ENSMUSG00000020932 Gfap chr11:102748649-102762226 7.0861500 44.182700 20.901700 0.2320750
ENSMUSG00000024411 Aqp4 chr18:15547902-15562193 3.4920400 3.474880 2.463230 0.0331238
ENSMUSG00000023913 Pla2g7 chr17:43705046-43749150 1.5105400 24.275600 11.422400 1.5111100
ENSMUSG00000035805 Mlc1 chr15:88786313-88809437 1.9010200 7.147400 5.313190 0.6358940
ENSMUSG00000007682 Dio2 chr12:91962993-91976878 1.7322900 12.094200 6.738320 1.0736900
ENSMUSG00000017390 Aldoc chr11:78136469-78141283 55.4562000 199.958000 91.328300 22.9541000
ENSMUSG00000005089 Slc1a2 chr2:102498815-102630941 63.7394000 130.729000 103.710000 10.0406000
ENSMUSG00000070880 Gad1 chr2:70391128-70440071 2.6501400 14.907500 13.730200 1.3992200
ENSMUSG00000026787 Gad2 chr2:22477724-22549394 3.9908200 11.308600 28.221500 1.4530500
Thank you for any help you could provide. Is there a better way to do this using an apply function?
As mentioned in the comment, your error happens because ls() returns the names of the objects as strings, not the objects themselves.
With a for loop, you'll need get() and assign() (or the eval(parse(...)) route) to go from the names to the objects and back. You can also do this with lapply() and a function.
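myfun <- function(x) {
df <- get(x)
df[,8] <- df[,4] / (df[,4] + df[,5])
return(df)
}
# ls() also returns "myfun" itself, so keep only the names of data frames
control_data <- Filter(function(x) is.data.frame(get(x)), ls())
# lapply() returns the modified copies; keep them in a named list
results <- lapply(control_data, myfun)
names(results) <- control_data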
As per the comment:
for(i in control_data) {
df <- get(i)
if (!is.data.frame(df)) next  # skip functions and other non-data-frame objects
df[,8] <- df[,4] / (df[,4] + df[,5])
assign(i, df)
}
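Another option is to collect all data frames at once with mget(), transform the list, and write it back with list2env(). In this sketch the objects live in a separate environment (env) and are hypothetical toy data frames, so the example is self-contained; the assumption is that replacing env with globalenv() applies the same steps to your workspace. The new column is added by name (ratio) rather than by position 8.

```r
# Self-contained demo: the objects live in their own environment;
# replace env with globalenv() to work on your workspace
env <- new.env()
env$s1 <- data.frame(id = c("g1", "g2"), v2 = c("A", "B"), v3 = c("x", "y"),
                     v4 = c(1, 3), v5 = c(1, 1), v6 = c(0, 0), v7 = c(0, 0))
env$s2 <- env$s1
env$s2$v4 <- c(2, 0)
env$helper <- function(x) x   # not a data frame, so it must be skipped

# Names of the data frames only, then the objects themselves as a named list
df_names <- Filter(function(nm) is.data.frame(get(nm, envir = env)), ls(env))
dfs <- mget(df_names, envir = env)

# Add the ratio column (column 4 / (column 4 + column 5)) to each one
dfs <- lapply(dfs, function(df) {
  df$ratio <- df[, 4] / (df[, 4] + df[, 5])
  df
})

# Overwrite the originals by name
invisible(list2env(dfs, envir = env))
```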