I have several data frames, all with the same column names. I want to combine two columns in each of them to create a new column.
The data frames look like this:
I want the output to look like this:
Normally I would do this very easily for one data frame:
a$XY_ID <- paste(a$X, a$Y, sep=":")
How can I do this for all the data frames in a list?
Thanks for the help!
You can use paste inside lapply:
dfList <- lapply(dfList, function(x) transform(x, XY_ID = paste(X,Y,sep=":")))
In the tidyverse, you can use map to iterate over the list and unite to combine the columns:
dfList <- purrr::map(dfList, ~tidyr::unite(.x, XY_ID, X, Y, sep = ":", remove = FALSE))
We can use do.call with paste; this also works when there are many columns to concatenate:
dfList <- lapply(dfList, function(x) {
  x$XY_ID <- do.call(paste, c(x[c('X', 'Y')], sep=":"))
  x
})
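To make the do.call approach concrete, here is a minimal self-contained sketch. The two data frames and their values are made up, and cols is a hypothetical helper vector: do.call(paste, ...) hands each selected column to paste() as a separate argument, so extending cols is all it takes to concatenate more columns.

```r
# Two small example data frames sharing the same column names (made-up data)
dfList <- list(
  a = data.frame(X = c("x1", "x2"), Y = c("y1", "y2")),
  b = data.frame(X = c("x3", "x4"), Y = c("y3", "y4"))
)

# Hypothetical vector of the columns to concatenate; add more names as needed
cols <- c("X", "Y")

dfList <- lapply(dfList, function(x) {
  # c(x[cols], sep = ":") builds the argument list: one element per column plus sep
  x$XY_ID <- do.call(paste, c(x[cols], sep = ":"))
  x
})

dfList$a$XY_ID   # "x1:y1" "x2:y2"
```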
I want to do the same thing to create several data frames. Can I use lapply to achieve this?
I tried, but did not succeed:
xx<-c("a1","b1")
lapply(xx, function(x){
x<-data.frame(c(1,2,3,4),"1")
})
I hoped to get two data frames, like:
a1<-data.frame(c(1,2,3,4),"1")
b1<-data.frame(c(1,2,3,4),"1")
An option that assigns to .GlobalEnv. As pointed out, this is less efficient, but it was provided to answer the OP's question as asked:
lapply(xx, function(x) assign(x,data.frame(A=c(1,2,3,4),
B="1"),
envir=.GlobalEnv))
You can then refer to each data frame by its name, e.g. a1 or b1.
You could try using sapply over the xx vector of names to populate a list with the data frames:
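xx <- c("a1", "b1")
# Return each data frame and keep sapply's result; simplify = FALSE keeps it a list
# and the names are taken from xx
lst <- sapply(xx, function(x) {
  data.frame(c(1,2,3,4), "1")
}, simplify = FALSE)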
Then, you may access each data frame using the list, e.g. lst$a1.
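A point worth spelling out: R functions modify local copies, so assigning into a list from inside an anonymous function does not change the list outside it. The sketch below (made-up column names A and B, hypothetical object lst_empty) shows the empty result of that pattern and the working alternative of returning each data frame and keeping lapply()'s result, named with setNames().

```r
xx <- c("a1", "b1")

# Assigning inside the function only modifies a local copy of lst_empty,
# so the list outside the function stays empty
lst_empty <- list()
invisible(lapply(xx, function(x) lst_empty[[x]] <- data.frame(A = c(1, 2, 3, 4), B = "1")))
length(lst_empty)   # 0

# Returning the data frame and keeping lapply()'s result fills the list;
# setNames() labels each element with the corresponding name from xx
lst <- lapply(setNames(xx, xx), function(x) data.frame(A = c(1, 2, 3, 4), B = "1"))
names(lst)          # "a1" "b1"
nrow(lst$a1)        # 4
```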
I'm a beginner in R, so I'm not very good at searching for relevant answers to my question. I'm sorry if similar questions have already been asked.
I have a list made of data frames and lists.
I'd like to know how to keep only the data frames so that I can bind them together to produce one huge data frame.
Here is an example:
L1 <- list(c(1, "abc", 3))
L2 <- list(c("b","d"))
L3 <- list(L1,L2)
brand <- c("A","B","C","D")
price <- c(1,1,3,7)
df <- data.frame(brand , price)
brand2 <- c("E","F","G","H")
price2 <- c(20,3,5,10)
df2 <- data.frame(brand2, price2)
L4 <- list(df, L3, df2)
library(plyr)  # rbind.fill() comes from the plyr package
finaldf <- do.call("rbind.fill", L4)
Unfortunately, I got this error: Error: All inputs to rbind.fill must be data.frames
So I know the problem is that there is a list inside the list L4. In my real data, there are even several lists in the big list. Can anyone tell me how to get rid of these lists inside the big list? Thank you very much!
You need to filter out the list entries that are not data.frames, like so:
is_df <- sapply(L4, is.data.frame)
finaldf <- do.call("rbind.fill", L4[is_df])
Alternatively:
do.call("rbind.fill", Filter(is.data.frame, L4))
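A self-contained sketch of the Filter approach, using made-up data shaped like the question's L4 but with matching column names, so base rbind() is enough and plyr is not needed. Note that Filter() only inspects the top level of the list; a data frame nested inside one of the sublists would be dropped too.

```r
# Made-up data: two data frames with matching column names plus a nested list
df  <- data.frame(brand = c("A", "B"), price = c(1, 1))
df2 <- data.frame(brand = c("E", "F"), price = c(20, 3))
L4  <- list(df, list(list(c(1, "abc", 3)), list(c("b", "d"))), df2)

# Filter() keeps only the top-level elements for which is.data.frame() is TRUE
only_dfs <- Filter(is.data.frame, L4)
length(only_dfs)   # 2

# Base rbind() works because both data frames share column names;
# plyr::rbind.fill() would instead pad mismatched columns with NA
finaldf <- do.call(rbind, only_dfs)
nrow(finaldf)      # 4
```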
You can create an index to subset your list like so:
# Subset list
index <- sapply(L4, is.data.frame)
and then use it to make your final data.frame like so:
finaldf <- do.call("rbind", L4[index])
Keep in mind that for this to work both data frames must have the same column names, so when you create df2 you should specify the column names before doing the above, like so:
df2 <- data.frame(brand = brand2, price = price2)
I would like to convert a data.frame into a list of data.frames by column, using base R functions and keeping the first column constant. For example, I would like to split DF into a list of three data.frames, each of which includes the first column. That is, I would like to end up with the list named LONG without having to type out each list element separately. Thank you.
DF <- data.frame(OBS=1:10,HEIGHT=rnorm(10),WEIGHT=rnorm(10),TEMP=rnorm(10))
DF
LONG <- list(HEIGHT = DF[c("OBS", "HEIGHT")],
WEIGHT = DF[c("OBS", "WEIGHT")],
TEMP = DF[c("OBS", "TEMP" )])
LONG
SHORT <- as.list(DF)
SHORT
SPLIT <- split(DF, col(DF))
We can loop through the names of 'DF' except the first one and cbind the first column with the subset of 'DF' for each name.
setNames(lapply(names(DF)[-1], function(x) cbind(DF[1], DF[x])), names(DF)[-1])
Another option would be:
Map(cbind, split.default(DF[-1], names(DF)[-1]), OBS=DF[1])
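A small variant of the lapply answer, sketched here under the assumption that the column order should match LONG exactly (OBS first): subset by name with DF[c("OBS", v)] instead of cbind. The seed and the helper names vars and LONG2 are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed so the random columns are reproducible
DF <- data.frame(OBS = 1:10, HEIGHT = rnorm(10), WEIGHT = rnorm(10), TEMP = rnorm(10))

vars <- names(DF)[-1]   # every column except the constant first one
# DF[c("OBS", v)] keeps OBS first, matching the layout of LONG in the question
LONG2 <- setNames(lapply(vars, function(v) DF[c("OBS", v)]), vars)

names(LONG2)            # "HEIGHT" "WEIGHT" "TEMP"
names(LONG2$WEIGHT)     # "OBS" "WEIGHT"
```

Unlike split.default(), which orders its groups alphabetically, this keeps the original column order.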
I have a function to deduplicate a data frame so that each person (indexed by PatID) is represented once by the latest record (largest RecID):
dedupit <- function(x) {
x <- x[order(x$PatID, -x$RecID),]
x <- x[ !duplicated(x$PatID), ]
return(x)
}
It can deduplicate and replace a dataframe if I do:
df <- dedupit(df)
But I have multiple data frames that need deduplication. Rather than write the above code for each individual data frame, I would like to apply the dedupit function across multiple data frames at once, so that each original data frame is replaced by its deduplicated version.
I was able to make a list of the dataframes and lapply the function across each element in the list with:
listofdfs <- list(df1, df2, ....)
listofdfs <- lapply(listofdfs, function(x) dedupit(x))
However, this only modifies the elements of the list and does not replace the original data frames. How do I apply this function to modify and replace multiple data frames?
Does it work?
Name your data frames when creating the list, so you can recover them afterwards:
list.df <- list(df1 = df1, df2 = df2, df3 = df3)
list2env(lapply(list.df, dedupit), .GlobalEnv)
As a result, your data frames df1, df2 and df3 will be replaced by their deduplicated versions.
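A runnable sketch of the list2env answer, using a compact copy of the question's dedupit and made-up data. It assumes the code runs at the top level, so .GlobalEnv is where df1 and df2 live; inside a function you would target that function's environment instead.

```r
dedupit <- function(x) {
  x <- x[order(x$PatID, -x$RecID), ]   # latest record (largest RecID) first within each PatID
  x[!duplicated(x$PatID), ]            # keep the first row per PatID
}

# Made-up example data with repeated PatID values
df1 <- data.frame(PatID = c(1, 1, 2), RecID = c(10, 11, 12))
df2 <- data.frame(PatID = c(3, 3, 3), RecID = c(5, 7, 6))

list.df <- list(df1 = df1, df2 = df2)
# list2env() writes each element back under its list name, overwriting df1 and df2
invisible(list2env(lapply(list.df, dedupit), envir = .GlobalEnv))

df1$RecID   # 11 12
df2$RecID   # 7
```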
I have several data frames df1, df2, ..., df10. Columns (variables) are the same in all of them.
I want to create a new variable within each of them. I can easily do it "manually" as follows:
df1$newvariable <- ifelse(df1$oldvariable == 999, NA, df1$oldvariable)
or, alternatively
df1 = transform(df1, newvariable = ifelse(oldvariable == 999, NA, oldvariable))
Unfortunately, I'm not able to do this in a loop. If I write:
for (i in names) { #names is the list of dataframes
i$newvariable <- ifelse(i$oldvariable == 999, NA, i$oldvariable)
}
I get the following error:
Error in i$oldvariable : $ operator is invalid for atomic vectors
What I'd do is put all the data.frames into a list and then use lapply, as follows:
df1 <- as.data.frame(matrix(runif(2*10), ncol=2))
df2 <- as.data.frame(matrix(runif(2*10), ncol=2))
df3 <- as.data.frame(matrix(runif(2*10), ncol=2))
df4 <- as.data.frame(matrix(runif(2*10), ncol=2))
# create a list and use lapply
df.list <- list(df1, df2, df3, df4)
out <- lapply(df.list, function(x) {
x$id <- 1:nrow(x)
x
})
Now all the data.frames have a new column id appended, and out is a list of data.frames. You can access each of them with out[[1]], out[[2]], etc.
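The loop in the question fails because its vector of names holds character strings, so i$oldvariable applies $ to a string rather than to a data frame. A minimal sketch of one way around this, assuming the data frames already exist as separate objects: mget() fetches them by name into a named list, lapply() applies the 999-to-NA recode, and list2env() writes them back. The data and the vector df_names are made up for illustration.

```r
# Made-up data: three data frames whose oldvariable uses 999 as a missing-value code
df1 <- data.frame(oldvariable = c(1, 999, 3))
df2 <- data.frame(oldvariable = c(999, 5, 6))
df3 <- data.frame(oldvariable = c(7, 8, 999))

df_names <- c("df1", "df2", "df3")   # hypothetical character vector of object names

# mget() looks up each name and returns a named list of the data frames themselves
dfs <- mget(df_names, envir = .GlobalEnv)
dfs <- lapply(dfs, function(x) {
  x$newvariable <- ifelse(x$oldvariable == 999, NA, x$oldvariable)
  x
})

# Write the modified data frames back under their original names
invisible(list2env(dfs, envir = .GlobalEnv))
df2$newvariable   # NA 5 6
```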
This has been asked many times. The $<- is not capable of translating that "i" index into either the first or second arguments. The [[<- is capable of doing so for the second argument but not the first. You should be learning to use lapply and you will probably need to do it with two nested lapply's, one for the list of "names" and the other for each column in the dataframes. The question is incomplete since it lacks specific examples. Make up a set of three dataframes, set some of the values to "999" and provide a list of names.