I have 24 variables in total, c1 to c24. I want to do something like:
b <- c(c1,c2,c3,c4,c5,c6,c7,c8,c9,
c10,c11,c12,c13,c14,c15,c16,c17,
c18,c19,c20,c21,c22,c23,c24)
How can I do this? Something like b <- c(c1:c24) doesn't work: in that case R only uses the two values c1 and c24 (as the start and end of a sequence), but I want to put all 24 values into this vector.
You can do this with lapply and get:
c1 <- c2 <- c3 <- c4 <- 1
unlist( ## convert from list to vector
lapply(
paste0("c",1:4), ## names of variables
get) ## retrieve variable by name
)
## [1] 1 1 1 1
In general, it would be a good idea to look further back in your workflow and see if it's possible to generate those variables within a list in the first place ...
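Base R also has mget(), which retrieves several variables by name in one call, so the lapply()/get() pair can be collapsed. This is a minimal sketch: the loop that creates c1 to c24 is only a stand-in for your real variables, which are assumed here to be numeric scalars.

```r
# Stand-in data: create c1, c2, ..., c24 holding the values 1..24
for (i in 1:24) assign(paste0("c", i), i)

# mget() looks up all 24 names at once and returns a named list;
# unlist() flattens it into a plain vector in c1..c24 order
b <- unlist(mget(paste0("c", 1:24)), use.names = FALSE)
b
```

If the variables hold vectors rather than single values, unlist() concatenates them, which is what c(c1, c2, ..., c24) would do as well.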
Does anyone know how to have a row in R that is calculated automatically from another row? For example,
let's say in Excel I want to make a row C, which is made up of (B2/B1),
e.g. C1 = B2/B1
C2 = B3/B2
...
Cn = B(n+1)/Bn
In Excel we only need to do one calculation and then drag it down. How do we do it in R?
In R you work with columns as vectors so the operations are vectorized. The calculations as described could be implemented by the following commands, given a data.frame df (i.e. a table) and the respective column names as mentioned:
df["C1"] <- df["B2"]/df["B1"]
df["C2"] <- df["B3"]/df["B2"]
In R you usually would name the columns according to the content they hold. With that, you refer to the columns by their name, although you can also address the first column as df[, 1], the first row as df[1, ] and so on.
EDIT 1:
There are multiple ways, and certainly some more elegant ones, to get it done, but for the sake of understanding I kept it in simple base R:
Example dataset for demonstration:
df <- data.frame("B1" = c(1, 2, 3),
"B2" = c(2, 4, 6),
"B3" = c(4, 8, 12))
Column calculation:
for (i in 1:(ncol(df) - 1)) {  # parentheses needed: 1:ncol(df)-1 would give 0, 1, 2
col_name <- paste0("C", i)
df[col_name] <- df[, i+1]/df[, i]
}
Output:
B1 B2 B3 C1 C2
1 1 2 4 2 2
2 2 4 8 2 2
3 3 6 12 2 2
So you iterate through the available columns B1/B2/B3, dynamically create a column name in every iteration based on the number of the current iteration, and then calculate the contents of that column.
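The same column ratios can also be computed without a loop, by dividing "all columns but the first" by "all columns but the last". A minimal sketch, assuming every column is numeric and the columns are already in the order B1, B2, B3:

```r
df <- data.frame(B1 = c(1, 2, 3), B2 = c(2, 4, 6), B3 = c(4, 8, 12))
n <- ncol(df)

# Divide columns 2..n by columns 1..(n-1) element-wise in one step
ratios <- as.matrix(df[, -1]) / as.matrix(df[, -n])
colnames(ratios) <- paste0("C", seq_len(n - 1))

# Append C1, C2, ... to the original data frame
df <- cbind(df, ratios)
df
```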
EDIT 2:
Row-wise, which is apparently what you actually meant, works similarly:
a <- c(10,15,20, 1)
df <- data.frame(a)
for (i in 1:nrow(df)) {
df$b[i] <- df$a[i+1]/df$a[i]
}
Output:
a b
1 10 1.500000
2 15 1.333333
3 20 0.050000
4 1 NA
You can do this just using vectors, without a for loop.
a <- c(10,15,20, 1)
df <- data.frame(a)
df$b <- c(df$a[-1], 0) / df$a
print(df)
a b
1 10 1.500000
2 15 1.333333
3 20 0.050000
4 1 0.000000
Explanation:
In the example data, df$a is the vector 10 15 20 1.
df$a[-1] is the same vector with its first element removed, 15 20 1.
Then we use c() to add a new element to the end, so that the vector has the same length as before:
c(df$a[-1],0) which is 15 20 1 0
What we want for column b is this vector divided by the original df$a.
So:
df$b <- c(df$a[-1], 0) / df$a
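Padding with 0 makes the last ratio 0, which is easy to mistake for a real value. A small variant that pads with NA instead, so the result matches the for-loop output of EDIT 2:

```r
a <- c(10, 15, 20, 1)
df <- data.frame(a)

# "Next value / current value": shift a by one position and pad the end with NA,
# so the last row (which has no successor) gets NA rather than a fake 0
df$b <- c(df$a[-1], NA) / df$a
df
```

If you already use dplyr, dplyr::lead(df$a) / df$a performs the same shift, since lead() pads with NA by default.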
How do I compute several variables into one in R? For example, I have 3 arrays of a variable A, called A1.1, A1.2 and A1.3, and I want to combine them into one variable "A". How do I do that?
A1.1 <- c(1,1,1,0,0,0)
A1.2 <- c(1,0,0,1,1,1)
A1.3 <- c(0,1,1,1,1,1)
The output should look like this. In SPSS we do this with COMPUTE.
A <- c(1,1,1,1,1,1)
In R you can use simple math on arrays, for example:
A1.1 <- c(1,0,1,0,0,0)
A1.2 <- c(1,0,0,1,1,1)
A1.3 <- c(0,0,1,1,1,1)
A1 <- 1*((A1.1 + A1.2 + A1.3)>0)
> A1
[1] 1 0 1 1 1 1
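For 0/1 indicator vectors like the ones in the question, "1 if any of them is 1" is simply the element-wise maximum, or equivalently a logical OR. A minimal sketch using the question's data, assuming the values are only ever 0 or 1:

```r
A1.1 <- c(1, 1, 1, 0, 0, 0)
A1.2 <- c(1, 0, 0, 1, 1, 1)
A1.3 <- c(0, 1, 1, 1, 1, 1)

# Element-wise maximum across the three vectors
A <- pmax(A1.1, A1.2, A1.3)

# Equivalent: OR the vectors (non-zero counts as TRUE) and convert back to 0/1
A_or <- as.integer(A1.1 | A1.2 | A1.3)
A
```

pmax() accepts any number of vectors, so adding an A1.4 only means adding another argument.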
In R you can use the any() function inside apply() to make this check. For example:
a1 <- c(1,0,0,0,1,1)
a2 <- c(0,1,0,0,0,1)
a3 <- c(0,1,1,0,1,1)
# x != 0 turns each row into logicals, so any() doesn't warn about coercing numbers
a <- apply(data.frame(a1,a2,a3), 1, function(x) ifelse(any(x != 0),1,0))
And then as output:
> a
[1] 1 1 1 0 1 1
In SPSS you can take a similar approach:
COMPUTE a = ANY(1, a1 TO a3) .
EXE .
I am trying to rename a variable across several data frames, but assign won't work. Here is the code I am trying:
assign(colnames(eval(as.name(DataFrameX)))[[3]], "<- NewName")
# The idea is, go through every dataset, and change the name of column 3 to
# "NewName" in all of them
This won't return any error (All other versions I could think of returned some kind of error), but it doesn't change the variable name either.
I am using a loop to create several data frames and different variables within each; now I need to rename some of those variables so that the data frames can be merged into one at a later stage. All of that works except the renaming. If I type the names of the data frame and variable myself in a regular call, colnames(DF)[[3]] <- "NewName" works, but when I try to use assign so that it is done in a loop, it doesn't do anything.
Here is what you can do with a loop over all data frames in your environment. Since the loop only picks up data frames, there is no risk of touching any other variable. The key point is that, inside the loop, you have to assign the modified copy back to each data frame's name.
df1 <- data.frame(q=1,w=2,e=3)
df2 <- data.frame(q=1,w=2,e=3)
df3 <- data.frame(q=1,w=2,e=3)
# > df1
# q w e
# 1 1 2 3
# > df2
# q w e
# 1 1 2 3
# > df3
# q w e
# 1 1 2 3
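# names of all data frames in the global environment
DFs <- names(which(sapply(.GlobalEnv, is.data.frame)))
for (nm in DFs) {
  df <- get(nm)                 # work on a copy of the data frame
  colnames(df)[3] <- "newName"  # rename its third column
  assign(nm, df)                # write the copy back under the original name
}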
# > df1
# q w newName
# 1 1 2 3
# > df2
# q w newName
# 1 1 2 3
# > df3
# q w newName
# 1 1 2 3
You could use eapply() to apply setnames() from the data.table package to every data frame in your global environment. setnames() renames columns by reference, so no assign() is needed:
library(data.table)
eapply(.GlobalEnv, function(x) if (is.data.frame(x)) setnames(x, 3, "NewName"))
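Since the data frames are going to be merged later anyway, another option is to keep them in a named list from the start, so that no get()/assign() is needed at all. A minimal sketch; the names df1, df2 and newName are only illustrative:

```r
# Keep the data frames in one named list instead of separate variables
dfs <- list(
  df1 = data.frame(q = 1, w = 2, e = 3),
  df2 = data.frame(q = 4, w = 5, e = 6)
)

# Rename the third column of every data frame in the list
dfs <- lapply(dfs, function(d) {
  names(d)[3] <- "newName"
  d
})

# Because the column names now agree, stacking them is a single call
combined <- do.call(rbind, dfs)
combined
```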
I am trying to reorganize my data, basically a list of data.frames.
Its elements represent subjects of interest (A and B), with observations on x and y, collected on two occasions (1 and 2).
I am trying to make this a list that contains data.frames referring to the subjects, with the information on which occasion x and y were collected being stored in the respective data.frames as new variable, as opposed to the element name:
library('rlist')
A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
list <- list(A1=A1,A2=A2,B1=B1,B2=B2)
A <- do.call(rbind,list.match(list,"A"))
B <- do.call(rbind,list.match(list,"B"))
list <- list(A=A,B=B)
list <- lapply(list,function(x) {
y <- data.frame(x)
y$class <- c(rep.int(1,2),rep.int(2,2))
return(y)
})
> list
$A
x y class
A1.1 66 96 1
A1.2 76 58 1
A2.1 50 93 2
A2.2 57 12 2
$B
x y class
B1.1 58 56 1
B1.2 69 15 1
B2.1 77 77 2
B2.2 9 9 2
In my real-world problem there are about 500 subjects, not always two occasions, and differing numbers of observations.
So my example above just illustrates where I want to get to. I am stuck at how to tell do.call(rbind, ...) to bind the subject-specific elements together, based on the element names, into new list elements, while adding a new variable.
To me this is a somewhat fuzzy task, and the closest I got was the rlist package. This question is related but uses unique to identify elements, whereas in my case it seems to be more of a regex problem.
I'd even be happy with tips on how to google this, or any keywords for further research etc.
From the data you provided:
subj <- sub("[A-Z]*", "", names(lst))
newlst <- Map(function(x, y) {x[,"class"] <- y;x}, lst, subj)
First we use a regular expression to isolate the number that will go in the class column. In this case I matched the capital letters and erased them, leaving the number, so "A1" becomes "1". Please note that the real names will probably need a different regex pattern.
Then we use Map to create a new column for each data frame and save to a new list called newlst. Map takes the first element of each argument and carries out the function then continues on with each object element. So the first data frame in lst and the first number in subj are used first. The anonymous function I used is function(x,y) {x[, "class"] <- y; x}. It takes two arguments. The first is the data frame, the second is the column value.
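Now it's much easier to move forward. We create a vector called uniq.nmes holding the subject names of the data frames we will combine, so that "A1" becomes "A". Then we rbind the elements that belong to each subject:
# "\\d+$" strips all trailing digits, so "A12" also becomes "A"
uniq.nmes <- unique(sub("\\d+$", "", names(lst)))
lapply(uniq.nmes, function(x) {
# compare whole subject names rather than grep(), so "A" does not also match "AB1"
do.call(rbind, newlst[sub("\\d+$", "", names(newlst)) == x])
})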
# [[1]]
# x y class
# A1.1 1 79 1
# A1.2 30 13 1
# A2.1 90 39 2
# A2.2 43 22 2
#
# [[2]]
# x y class
# B1.1 54 59 1
# B1.2 83 90 1
# B2.1 85 36 2
# B2.2 91 28 2
Data
A1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
A2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B1 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
B2 <- data.frame(x=sample(1:100,2),y=sample(1:100,2))
lst <- list(A1=A1,A2=A2,B1=B1,B2=B2)
It sounds like you're doing a lot of gymnastics because you have a specific form in mind. What I would suggest is first trying to make the data tidy. If you don't want to read the link, the quick summary is: put your data into a single data frame, where it can be easily processed.
The quick version of the answer (here I've used lst instead of list for the name to avoid confusion with the built-in list) is to do this:
do.call(rbind,
lapply(seq(lst), function(i) {
lst[[i]]$type <- names(lst)[i]; lst[[i]]
})
)
What this will do is create a single data frame, with a column, "type", that contains the name of the list item in which that row appeared.
Using a slightly simplified version of your initial data:
lst <- list(A1=data.frame(x=rnorm(5)), A2=data.frame(x=rnorm(3)), B=data.frame(x=rnorm(5)))
lst
$A1
x
1 1.3386071
2 1.9875317
3 0.4942179
4 -0.1803087
5 0.3094100
$A2
x
1 -0.3388195
2 1.1993115
3 1.9524970
$B
x
1 -0.1317882
2 -0.3383545
3 0.8864144
4 0.9241305
5 -0.8481927
And then applying the magic function
df <- do.call(rbind,
lapply(seq(lst), function(i) {
lst[[i]]$type <- names(lst)[i]; lst[[i]]
})
)
df
x type
1 1.3386071 A1
2 1.9875317 A1
3 0.4942179 A1
4 -0.1803087 A1
5 0.3094100 A1
6 -0.3388195 A2
7 1.1993115 A2
8 1.9524970 A2
9 -0.1317882 B
10 -0.3383545 B
11 0.8864144 B
12 0.9241305 B
13 -0.8481927 B
From here we can process to our heart's content, with operations like df$subject <- gsub("[0-9]*", "", df$type) to extract the non-numeric portion of type, and tools like split can be used to generate the sub-lists that you mention in your question.
In addition, once it is in this form, you can use functions like by and aggregate or libraries like dplyr or data.table to do more advanced split-apply-combine operations for data analysis.
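A minimal sketch of that route on small deterministic data: stack the list with a type column, derive subject and occasion from the element names, then split() back into one data frame per subject. It assumes names of the form "letters followed by the occasion number" (e.g. "A1", "B2"); the real names may need a different pattern.

```r
lst <- list(
  A1 = data.frame(x = 1:2), A2 = data.frame(x = 3:4),
  B1 = data.frame(x = 5:6)
)

# Stack all elements, recording which element each row came from
df <- do.call(rbind, lapply(names(lst), function(nm) transform(lst[[nm]], type = nm)))

# "A1" -> subject "A", occasion 1 (assumes the trailing digits are the occasion)
df$subject  <- sub("[0-9]+$", "", df$type)
df$occasion <- as.integer(sub("^[^0-9]*", "", df$type))

# One data frame per subject, the structure asked for in the question
by_subject <- split(df, df$subject)
by_subject
```

This works the same whether a subject has one, two or more occasions, and regardless of how many rows each element has.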
So, I created a list of CSV files:
tbl = list.files(pattern="\\.csv$")  # pattern is a regular expression, not a glob
Then I separated them into two different lists:
tbl1 <- tbl[c(1,3:7,10:12,14:18,20)]
tbl2 <- tbl[c(2,19,8:9,13)]
Then loaded them:
list_of_data1 = lapply(tbl1, read.csv)
list_of_data2 = lapply(tbl2, read.csv)
And now I want to create a master file. I just want to select some data from each of csv file and store it in one table. To do that I created such loop:
gdata1 = lapply(list_of_data1,function(x) x[3:nrow(x),10:13])
for( i in 1:length(list_of_data1)){
rownames(gdata1[[i]]) = list_of_data1[[i]][3:nrow(list_of_data1[[i]]),1]
}
tmp = lapply(gdata1,function(x) matrix(as.numeric(x),ncol=4))
final.table1=c()
for(i in 1:length(gnames)){
print(i)
tmp=gnames[i]
f1 = function(x) {x[tmp,]}
tmp2 = lapply(gdata1,f1)
tmp3 = c()
for(j in 1:length(tmp2)){
tmp3=rbind(tmp3,tmp2[[j]])
}
tmp4 = as.vector(t(tmp3))
final.table1 = rbind(final.table1,tmp4)
}
rownames(final.table1) = gnames
I created two different lists of data because in the first one, list_of_data1, there are four columns of interest to me (10:13), and in the other one, list_of_data2, there are only 3 columns (10:12). I want to put all of the data in one table. Is there any way to do it in one loop?
I have an idea how to solve the problem: I could create a new loop for list_of_data2 and then bind both results using cbind. But I want to do it in a more elegant way, which is why I came here!
I would suggest looking into do.call: you can rbind your first list of tables, then rbind your second list of tables, and then cbind the results as you stated. Below is a trivial use of do.call:
#creating a list of tables that we are interested in appending
#together in one master dataframe
ts<-lapply(c(1,2,3),function(x) data.frame(c1=rep(c("a","b"),2),c2=(1:4)*x,c3=rnorm(4)))
#you could of course subset each element of ts to the columns
#you find of interest: lapply(ts, function(d) d[, colsOfInterest])
master<-do.call(rbind,ts)
Having seen that each file has different rows/columns of interest, I think you could do something like this. It seems a bit hackish but should get the job done. I assume you merge the files on a column named id; you could of course generalize this to multiple columns etc.
#creating a series of data frames for which we only want a subset of row/cols
> df1<-data.frame(id=1:10,val1=rnorm(10),val2=rnorm(10))
> df2<-data.frame(id=5:10,val3=rnorm(6))
> df3<-data.frame(id=1:3,val4=rnorm(3), val5=rnorm(3), val6=rnorm(3))
#specifying which rows/cols we are interested in
#I assume you have some way of doing this programmatically, or have defined it elsewhere
> colsofinterest<-list(df1=c("id","val1"),df2=c("id","val3"),df3=c("id","val5","val6"))
> rowsofinterest<-list(df1=1:5,df2=5:8,df3=2:3)
#note: these are row positions, not id values; df2 has only 6 rows, so positions 7:8 come back as NA rows
#create a list of data frames where each has only the row/cols combination we want
> ts<-lapply(c("df1","df2","df3"),
function(x) get(x)[rowsofinterest[[x]],colsofinterest[[x]]])
> ts
[[1]]
id val1
1 1 0.24083489
2 2 -0.50140019
3 3 -0.24509033
4 4 1.41865350
5 5 -0.08123618
[[2]]
id val3
5 9 -0.1862852
6 10 0.5117775
NA NA NA
NA.1 NA NA
[[3]]
id val5 val6
2 2 0.2056010 -0.6788145
3 3 0.2057397 0.8416528
#now merge these on the key column "id", keeping all rows from every table (all=T)
> final<-Reduce(function(x,y) merge(x,y,by="id",all=T), ts)
> head(final)
id val1 val3 val5 val6
1 1 0.24083489 NA NA NA
2 2 -0.50140019 NA 0.2056010 -0.6788145
3 3 -0.24509033 NA 0.2057397 0.8416528
4 4 1.41865350 NA NA NA
5 5 -0.08123618 NA NA NA
6 9 NA -0.1862852 NA NA
Is this what you are thinking about or did I misinterpret?
Note that ldply() from plyr works in the same way as do.call() in JPC's answer. I just happen to use plyr more; if you are looking at manipulating R data structures in a vectorised way, there is lots of useful stuff in there.
library(plyr)
d1 <- ldply(list_of_data1, rbind)
d2 <- ldply(list_of_data2, rbind)
Then select the columns of interest from d1 and d2 (cbind() only works if both end up with the same number of rows):
d1 <- d1[,c(10:13)]
d2 <- d2[,c(10:12)]
final.df <- cbind(d1,d2)
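Since the question asks for a single loop, here is a minimal sketch in base R that handles both lists in one pass by pairing each list with its own column range. The make_df() helper and its toy contents are hypothetical stand-ins for the read.csv() results, and, as above, it assumes both stacked tables end up with the same number of rows.

```r
# Hypothetical stand-ins: files in list 1 have 13 columns, files in list 2 have 12
make_df <- function(ncols) as.data.frame(matrix(seq_len(2 * ncols), nrow = 2))
list_of_data1 <- list(make_df(13), make_df(13))
list_of_data2 <- list(make_df(12), make_df(12))

# Pair each list with the columns to keep from it
sources <- list(first = list_of_data1, second = list_of_data2)
cols    <- list(first = 10:13,         second = 10:12)

parts <- Map(function(dfs, keep, tag) {
  stacked <- do.call(rbind, lapply(dfs, function(d) d[, keep]))  # stack files row-wise
  names(stacked) <- paste(tag, names(stacked), sep = "_")          # avoid duplicate names
  stacked
}, sources, cols, names(sources))

# Side-by-side combination; only valid when both parts have the same number of rows
final.df <- do.call(cbind, unname(parts))
final.df
```

Adding a third group of files would then only mean adding one more entry to sources and cols.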