Add new column to data.frame through loop in R

Add new column to data.frame through loop in R - r

I have n number of data.frame i would like to add column to all data.frame
a <- data.frame(1:4,5:8)
b <- data.frame(1:4, 5:8)
test=ls()
for (j in test){
j = cbind(get(j),IssueType=j)
}
Problem that i'm running into is
j = cbind(get(j),IssueType=j)
because it assigns all the data to j instead of a, b.

As commented, it's mostly better to keep related data in a list structure. If you already have the data.frames in your global environment and you want to get them into a list, you can use:
dflist <- Filter(is.data.frame, as.list(.GlobalEnv))
This is from here and makes sure that you only get data.frame objects from your global environment.
You will notice that you now already have a named list:
> dflist
# $a
# X1.4 X5.8
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
#
# $b
# X1.4 X5.8
# 1 1 5
# 2 2 6
# 3 3 7
# 4 4 8
So you can easily select the data you want by typing for example
dflist[["a"]]
If you still want to create extra columns, you could do it like this:
dflist <- Map(function(df, x) {df$IssueType <- x; df}, dflist, names(dflist))
Now, each data.frame in dflist has a new column called IssueType:
> dflist
# $a
# X1.4 X5.8 IssueType
# 1 1 5 a
# 2 2 6 a
# 3 3 7 a
# 4 4 8 a
#
# $b
# X1.4 X5.8 IssueType
# 1 1 5 b
# 2 2 6 b
# 3 3 7 b
# 4 4 8 b
In the future, you can create the data inside a list from the beginning, i.e.
dflist <- list(
a = data.frame(1:4,5:8)
b = data.frame(1:4, 5:8)
)

To create a list of your data.frames do this:
a <- data.frame(1:4,5:8); b <- data.frame(1:4, 5:8); test <- list(a,b)
This allows you to us the lapply function to perform whatever you like to do with each of the dataframes, eg:
out <- lapply(test, function(x) cbind(j))
For most data.frame operations I recommend using the packages dplyr and tidyr.

wooo wooo
here is answer for the issue
helped by #docendo discimus
Created Dataframe
a <- data.frame(1:4,5:8)
b <- data.frame(1:4, 5:8)
Group data.frame into list
dflist <- Filter(is.data.frame, as.list(.GlobalEnv))
Add's extra column
dflist <- Map(function(df, x) {df$IssueType <- x; df}, dflist, names(dflist))
unstinting the data frame
list2env(dflist ,.GlobalEnv)

Related

Change column names in list of dataframes

I have a list of dataframes on which I want to change the name of the 2nd column in each dataframe in the list so that it matches the name of the list item that holds it. The code that I have at the moment is:
my_list <- list(one = data.frame(a <- 1:5, b <- 1:5), two = data.frame(a <- 1:5, b <- 1:5))
my_list <- lapply(seq_along(names(my_list)), function(x) names(my_list[[x]])[2] <- names(my_list)[x])
but my code just replaces the dataframes without me understanding why. Any help would be much appreciated.
I know that I can do this easily with a "for" loop, but I would like to avoid it, hence my question.

setNames can be convenient here:
my_list2 <- lapply(
names(my_list),
function(x) setNames(my_list[[x]], c(names(my_list[[x]])[1], x))
)
Or the same using Map (which I think is easier to read):
my_list2 <- Map(
function(x, n) setNames(x, c(names(x)[1], n)),
my_list, names(my_list)
)
> my_list2
$one
a one
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
$two
a two
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
A problem of names<- is that it returns the name, not the object.

Try this, loop through data.frames, update column name:
# dummy list
my_list <- list(one = data.frame(a = 1:5, b = 1:5), two = data.frame(a = 1:5, b = 1:5))
my_list_updated <-
lapply(names(my_list), function(i){
x <- my_list[[ i ]]
# set 2nd column to a new name
names(x)[2] <- i
# return
x
})
my_list_updated
# [[1]]
# a one
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5
#
# [[2]]
# a two
# 1 1 1
# 2 2 2
# 3 3 3
# 4 4 4
# 5 5 5

You could to use the super assignment operator:
my_list <- list(one = data.frame(a = 1:5, b = 1:5), two = data.frame(a = 1:5, b = 1:5))
lapply(seq_along(names(my_list)), function(x) names(my_list[[x]])[2] <<- names(my_list)[x] )
my_list

combine list of data frames in list in specific manner

I got a list which have another list of data frames.
The outside list elements represents years and inside list represent months data.
Now I want to create a final list which will contain data for all months. Each Month columns will be "cbinded" by other years column values.
Alldata <- list()
Alldata[[1]] <- list(data.frame(Jan_2015_A=c(1,2), Jan_2015_B=c(3,4)), data.frame(Feb_2015_C=c(5,6), Feb_2015_D=c(7,8)))
Alldata[[2]] <- list(data.frame(Jan_2016_A=c(1,2), Jan_2016_B=c(3,4)), data.frame(Feb_2016_C=c(5,6), Feb_2016_D=c(7,8)))
Expected output list is as following
I've tried using for loops and its little complex, I want any R function to do this task.
I have done this using for loops using following code. But this is really complex and I myself found this little complicate. Hope I will get any simpler and tidy code for this operation.
I created list with each months and years data as a list item in form of data frames
x2 <- list()
for(l1 in 1: length(Alldata[[1]])){
temp <- list()
for(l2 in 1: length(Alldata)){
temp <- append(temp, list(Alldata[[l2]][[l1]]))
}
x2 <- append(x2, list(temp))
}
# then created final List with succesive years data of each month as list items. This is primarily used for Tracking data for years For Example: how much was count was for Jan_2015 and Jan_2016 for "A"
finalList <- list()
for(l3 in 1: length(x2)){
temp <- x2[[l3]]
td2 <- as.data.frame(matrix("", nrow = nrow(temp[[1]])))
rownames(td2)[rownames(temp[[1]])!=""] <- rownames(temp[[1]])[rownames(temp[[1]])!=""]
for(l4 in 1:ncol(temp[[1]])){
for(l5 in 1: length(temp)){
# lapply(l4, function(x) do.call(cbind,
td2 <- cbind(td2, temp[[l5]][, l4, drop=F])
}
}
finalList <- append(finalList, list(td2))
}
> finalList
[[1]]
V1 Jan_2015_A Jan_2016_A Jan_2015_B Jan_2016_B
1 1 1 3 3
2 2 2 4 4
[[2]]
V1 Feb_2015_C Feb_2016_C Feb_2015_D Feb_2016_D
1 5 5 7 7
2 6 6 8 8

You could do the following below. The lapply will iterate over the outer list and the do.call will cbind the inner list of data frames.
lapply(Alldata, do.call, what = 'cbind')
[[1]]
Jan_2015_A Jan_2015_B Feb_2015_C Feb_2015_D
1 1 3 5 7
2 2 4 6 8
[[2]]
Jan_2016_A Jan_2016_B Feb_2016_C Feb_2016_D
1 1 3 5 7
2 2 4 6 8
You can also use dplyr to get the same results.
library(dplyr)
lapply(Alldata, bind_cols)
Here is a third option proposed by J.R.
lapply(Alldata, Reduce, f = cbind)
EDIT
After clarification from OP, the above solution has been modified (see below) to produce the newly specified output. The solution above has been left there since it is a building block for the solution below.
pattern.vec <- c("Jan", "Feb")
### For a given vector of months/patterns, returns a
### list of elements with only that month.
mon_data <- function(mo) {
return(bind_cols(sapply(Alldata, function(x) { x[grep(pattern = mo, x)]})))
}
### Loop through months/patterns.
finalList <- lapply(pattern.vec, mon_data)
finalList
## [[1]]
## Jan_2015_A Jan_2015_B Jan_2016_A Jan_2016_B
## 1 1 3 1 3
## 2 2 4 2 4
##
## [[2]]
## Feb_2015_C Feb_2015_D Feb_2016_C Feb_2016_D
## 1 5 7 5 7
## 2 6 8 6 8
## Ordering the columns as specified in the original question.
## sorting is by the last character in the column name (A or B)
## and then the year.
lapply(finalList, function(x) x[ order(gsub('[^_]+_([^_]+)_(.*)', '\\2_\\1', colnames(x))) ])
## [[1]]
## Jan_2015_A Jan_2016_A Jan_2015_B Jan_2016_B
## 1 1 1 3 3
## 2 2 2 4 4
##
## [[2]]
## Feb_2015_C Feb_2016_C Feb_2015_D Feb_2016_D
## 1 5 5 7 7
## 2 6 6 8 8

returning from list to data.frame after lapply

I have a very simply question about lapply. I am transitioning from STATA to R and I think there is some very basic concept that I am not getting about looping in R. But I have been reading about it all afternoon and can't figure out a reasonable way to do this very simple thing.
I have three data frames df1, df2, and df3 that all have the same column names, in the same order, etc.
I want to rename their columns all at once.
I put the data frames in a list:
dflist <- list(df1, df2, df3)
What I want the new names to be:
varlist <- c("newname1", "newname2", "newname3")
Write a function that replaces names with those in varlist, and lapply it over the data frames
ChangeNames <- function(x) {
names(x) <- varlist
return(x)
}
dflist <- lapply(dflist, ChangeNames)
So, as far as I understand, R has changed the names of the copies of the data frames that I put in the list, but not the original data frames themselves. I want the data frames themselves to be renamed, not the elements of the list (which are trapped in a list).
Now, I can go
df1 <- as.data.frame(dflist[1])
df2 <- as.data.frame(dflist[2])
df2 <- as.data.frame(dflist[3])
But that seems weird. You need a loop to get back the elements of a loop?
Basically: once you've put some data frames in a list and run your function on them via lapply, how do you get them back out of the list, without starting back at square one?

If you just want to change the names, that isn't too hard in R. Bear in mind that the assignment operator, <-, can be applied in sequence. Hence:
names(df1) <- names(df2) <- names(df3) <- c("newname1", "newname2", "newname3")

I am not sure I understand correctly, do you want to rename the columns of the data frames or the components of the list that contain the data frames?
If it is the first, please always search before asking, the question has been asked here.
So what you can easily do in case you have even more data frames in the list is:
# Creating some sample data first
> dflist <- list(df1 = data.frame(a = 1:3, b = 2:4, c = 3:5),
+ df2 = data.frame(a = 4:6, b = 5:7, c = 6:8),
+ df3 = data.frame(a = 7:9, b = 8:10, c = 9:11))
# See how it looks like
> dflist
$df1
a b c
1 1 2 3
2 2 3 4
3 3 4 5
$df2
a b c
1 4 5 6
2 5 6 7
3 6 7 8
$df3
a b c
1 7 8 9
2 8 9 10
3 9 10 11
# And do the trick
> dflist <- lapply(dflist, setNames, nm = c("newname1", "newname2", "newname3"))
# See how it looks now
> dflist
$df1
newname1 newname2 newname3
1 1 2 3
2 2 3 4
3 3 4 5
$df2
newname1 newname2 newname3
1 4 5 6
2 5 6 7
3 6 7 8
$df3
newname1 newname2 newname3
1 7 8 9
2 8 9 10
3 9 10 11
So the names were changed from a, b and c to newname1, newname2and newname3 for each data frame in the list.
If it is the second, you can do this:
> names(dflist) <- c("newname1", "newname2", "newname3")

Use object names within a list in lapply/ldply

In attempting to answer a question earlier, I ran into a problem that seemed like it should be simple, but I couldn't figure out.
If I have a list of dataframes:
df1 <- data.frame(a=1:3, x=rnorm(3))
df2 <- data.frame(a=1:3, x=rnorm(3))
df3 <- data.frame(a=1:3, x=rnorm(3))
df.list <- list(df1, df2, df3)
That I want to rbind together, I can do the following:
df.all <- ldply(df.list, rbind)
However, I want another column that identifies which data.frame each row came from. I expected to be able to use the deparse(substitute(x)) method (here and elsewhere) to get the name of the relevant data.frame and add a column. This is how I approached it:
fun <- function(x) {
name <- deparse(substitute(x))
x$id <- name
return(x)
}
df.all <- ldply(df.list, fun)
Which returns
a x id
1 1 1.1138062 X[[1L]]
2 2 -0.5742069 X[[1L]]
3 3 0.7546323 X[[1L]]
4 1 1.8358605 X[[2L]]
5 2 0.9107199 X[[2L]]
6 3 0.8313439 X[[2L]]
7 1 0.5827148 X[[3L]]
8 2 -0.9896495 X[[3L]]
9 3 -0.9451503 X[[3L]]
So obviously each element of the list does not contain the name I think it does. Can anyone suggest a way to get what I expected (shown below)?
a x id
1 1 1.1138062 df1
2 2 -0.5742069 df1
3 3 0.7546323 df1
4 1 1.8358605 df2
5 2 0.9107199 df2
6 3 0.8313439 df2
7 1 0.5827148 df3
8 2 -0.9896495 df3
9 3 -0.9451503 df3

Define your list with names and it should give you an .id column with the data.frame name
df.list <- list(df1=df1, df2=df2, df3=df3)
df.all <- ldply(df.list, rbind)
Output:
.id a x
1 df1 1 1.84658809
2 df1 2 -0.01177462
3 df1 3 0.58579469
4 df2 1 -0.64748756
5 df2 2 0.24384614
6 df2 3 0.59012676
7 df3 1 -0.63037679
8 df3 2 -1.17416295
9 df3 3 1.09349618
Then you can know the data.frame name from the column df.all$.id
Edit:
As per #Gary Weissman's comment if you want to generate the names automatically you can do
names(df.list) <- paste0('df',seq_along(df.list)

Using base only, one could try something like:
dd <- lapply(seq_along(df.list), function(x) cbind(df_name = paste0('df',x),df.list[[x]]))
do.call(rbind,dd)

In your definition, df.list does not have names, however, even then the deparse substitute idiom does not appear to work easilty (as lapply calls .Internal(lapply(X, FUN)) -- you would have to look at the source to see if the object name was available and how to get it
Something like
names(df.list) <- paste('df', 1:3, sep = '')
foo <- function(n, .list){
.list[[n]]$id <- n
.list[[n]]
}
a x id
1 1 0.8204213 a
2 2 -0.8881671 a
3 3 1.2880816 a
4 1 -2.2766111 b
5 2 0.3912521 b
6 3 -1.3963381 b
7 1 -1.8057246 c
8 2 0.5862760 c
9 3 0.5605867 c

if you want to use your function, instead of deparse(substitute(x)) use match.call(), and you want the second argument, making sure to convert it to character
name <- as.character(match.call()[[2]])

interweave two data.frames in R

I would like to interweave two data.frame in R. For example:
a = data.frame(x=1:5, y=5:1)
b = data.frame(x=2:6, y=4:0)
I would like the result to look like:
> x y
1 5
2 4
2 4
3 3
3 3
...
obtained by cbinding x[1] with y[1], x[2] with y[2], etc.
What is the cleanest way to do this? Right now my solution involves spitting everthing out to a list and merging. This is pretty ugly:
lst = lapply(1:length(x), function(i) cbind(x[i,], y[i,]))
res = do.call(rbind, lst)

There is, of course, the interleave function in the "gdata" package:
library(gdata)
interleave(a, b)
# x y
# 1 1 5
# 6 2 4
# 2 2 4
# 7 3 3
# 3 3 3
# 8 4 2
# 4 4 2
# 9 5 1
# 5 5 1
# 10 6 0

You can do this by giving x and y an index, rbind them and sort by the index.
a = data.frame(x=1:5, y=5:1)
b = data.frame(x=2:6, y=4:0)
df <- rbind(data.frame(a, index = 1:nrow(a)), data.frame(b, index = 1:nrow(b)))
df <- df[order(df$index), c("x", "y")]

This is how I'd approach:
dat <- do.call(rbind.data.frame, list(a, b))
dat[order(dat$x), ]
do.call was unnecessary in the first step but makes the solution more extendable.

Perhaps this is cheating a bit, but the (non-exported) function interleave from ggplot2 is something I've stolen for my own uses before:
as.data.frame(mapply(FUN=ggplot2:::interleave,a,b))

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

Add new column to data.frame through loop in R - r

Related

Change column names in list of dataframes

combine list of data frames in list in specific manner

returning from list to data.frame after lapply

Use object names within a list in lapply/ldply

interweave two data.frames in R

Categories

Resources