Relisting a resulting data.frame with respect to an input list - r

I feed inputList to my custom function and, after several steps (a few simple filters), I end up with a data.frame, resultDF, that needs to be relisted. I used relist to give resultDF the same structure as inputList, but I got an error. Is there a simple way to relist resultDF? Sorry for the simple question.
Here is the input list of data.frames:
inputList <- list(
  bar=data.frame(from=c(8,18,33,53),
                 to=c(14,21,39,61), val=c(48,7,10,8)),
  cat=data.frame(from=c(6,15,20,44),
                 to=c(10,17,34,51), val=c(54,21,14,12)),
  foo=data.frame(from=c(11,43), to=c(36,49), val=c(49,13)))
After several workflows, I end up with this data.frame:
resultDF <- data.frame(
from=c(53,8,6,15,11,44,43,44,43),
to=c(61,14,10,17,36,51,49,51,49),
val=c(8,48,54,21,49,12,13,12,13)
)
I need to relist resultDF with the same structure as inputList. I used the relist method, but I got an error.
This is my desired list:
desiredList <- list(
bar=data.frame(from=c(8,53), to=c(14,61), val=c(48,8)),
cat=data.frame(from=c(6,15,44,44), to=c(10,17,51,51), val=c(54,21,12,12)),
foo=data.frame(from=c(11,43,43), to=c(36,49,49), val=c(49,13,13))
)
How can I achieve desiredList ? Thanks in advance :)

We can loop through 'inputList', check whether the pasted row elements of 'resultDF' are %in% the pasted rows of each list element, and use that index to subset 'resultDF':
lapply(inputList, function(x) resultDF[do.call(paste, resultDF) %in% do.call(paste, x),])
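To make the row-matching idea concrete, here is a minimal base R sketch that reuses the question's data. The helper name row_key, the "\r" separator, and the final sort by from are assumptions added for illustration; they are not part of the original answer.

```r
inputList <- list(
  bar = data.frame(from = c(8, 18, 33, 53), to = c(14, 21, 39, 61), val = c(48, 7, 10, 8)),
  cat = data.frame(from = c(6, 15, 20, 44), to = c(10, 17, 34, 51), val = c(54, 21, 14, 12)),
  foo = data.frame(from = c(11, 43), to = c(36, 49), val = c(49, 13))
)
resultDF <- data.frame(
  from = c(53, 8, 6, 15, 11, 44, 43, 44, 43),
  to   = c(61, 14, 10, 17, 36, 51, 49, 51, 49),
  val  = c(8, 48, 54, 21, 49, 12, 13, 12, 13)
)

# Hypothetical helper: one string key per row. An explicit separator
# avoids ambiguous keys such as "1 23" versus "12 3".
row_key <- function(df) do.call(paste, c(df, sep = "\r"))

relisted <- lapply(inputList, function(x) {
  out <- resultDF[row_key(resultDF) %in% row_key(x), ]  # rows of resultDF found in x
  out <- out[order(out$from), ]                         # assumed ordering, to mirror desiredList
  rownames(out) <- NULL
  out
})
```

Duplicated rows in resultDF are kept, because the subset is taken from resultDF rather than from the list element.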
Another option is a join followed by a split. We rbind 'inputList' into a data.table with an additional column 'grp' holding the list names, join it with 'resultDF' on the column names of 'resultDF', and finally split the result by the 'grp' column:
library(data.table)
dt <- rbindlist(inputList, idcol = "grp")[resultDF, on = names(resultDF)]
split(dt[,-1, with = FALSE], dt$grp)
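The same join-then-split idea can also be sketched in base R, in case data.table is not available. This is only an illustration under the assumption that no row appears in more than one element of inputList; the names tagged and joined are made up for the example.

```r
inputList <- list(
  bar = data.frame(from = c(8, 18, 33, 53), to = c(14, 21, 39, 61), val = c(48, 7, 10, 8)),
  cat = data.frame(from = c(6, 15, 20, 44), to = c(10, 17, 34, 51), val = c(54, 21, 14, 12)),
  foo = data.frame(from = c(11, 43), to = c(36, 49), val = c(49, 13))
)
resultDF <- data.frame(
  from = c(53, 8, 6, 15, 11, 44, 43, 44, 43),
  to   = c(61, 14, 10, 17, 36, 51, 49, 51, 49),
  val  = c(8, 48, 54, 21, 49, 12, 13, 12, 13)
)

# Stack the list into one data.frame, tagging each row with its list name
tagged <- do.call(rbind, Map(function(df, nm) cbind(grp = nm, df),
                             inputList, names(inputList)))

# Inner join on all columns of resultDF, then split by the group tag
joined   <- merge(resultDF, tagged, by = names(resultDF))
relisted <- split(joined[names(resultDF)], joined$grp)
```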

Related

Converting list of Characters to Named num in R

I want to create a dataframe with 3 columns.
#First column
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
"ABC_E1", "ABC_E2", "ABC_E3",
"ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
The names in column 1 are named results of the cor.test function. The second column should contain the correlation coefficients I get by writing ABC_D1$estimate, ABC_D2$estimate, and so on.
My problem is that I don't want to add $estimate manually to every single name in the first column. I tried this:
df1$C2 = paste0(df1$C1, '$estimate')
But this doesn't work; it only gives me this back:
"ABC_D1$estimate", "ABC_D2$estimate", "ABC_D3$estimate",
"ABC_E1$estimate", "ABC_E2$estimate", "ABC_E3$estimate",
"ABC_F1$estimate", "ABC_F2$estimate", "ABC_F3$estimate")
class(df1$C2)
[1] "character"
How can I get the numeric result for ABC_D1$estimate in my dataframe? How can I convert these characters into Named num? The 3rd column should consist of the results of $p.value.
As pointed out by @DSGym, there are several problems, including that it is not very convenient to have a list of character names; it would be better to have a list of objects instead.
Anyway, I think you can get where you want using:
estimates <- lapply(name_list, function(dat) {
  dat_l <- get(dat)
  dat_l[["estimate"]]
})
cbind(name_list, estimates)
This is not really advisable but given those premises...
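If the test results can be gathered into a named list, both columns can be filled without building code strings. The sketch below is a rough illustration: the two cor.test objects are simulated stand-ins for the asker's ABC_* objects, and mget with envir = environment() is used to look the names up in the current environment.

```r
# Simulated stand-ins for the asker's objects (assumption: real ones already exist)
set.seed(1)
x <- rnorm(20)
ABC_D1 <- cor.test(x, x + rnorm(20))
ABC_D2 <- cor.test(x, rnorm(20))
name_list <- c("ABC_D1", "ABC_D2")

# Fetch all objects by name in one call; the result is a named list of htest objects
tests <- mget(name_list, envir = environment())

df1 <- data.frame(
  C1 = name_list,
  C2 = vapply(tests, function(t) unname(t$estimate), numeric(1)),  # correlation coefficients
  C3 = vapply(tests, function(t) t$p.value, numeric(1)),           # p-values
  row.names = NULL,
  stringsAsFactors = FALSE
)
```

vapply is used instead of sapply so that each extracted value is checked to be a single number.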
Ok I think now i know what you need.
eval(parse(text = paste0("ABC_D1", '$estimate')))
You concatenate the two strings and use the functions parse and eval to get your results.
This is how to do it for your whole data.frame:
library(purrr)  # provides map_dbl
name_list = c("ABC_D1", "ABC_D2", "ABC_D3",
              "ABC_E1", "ABC_E2", "ABC_E3",
              "ABC_F1", "ABC_F2", "ABC_F3")
df1 = data.frame(C1 = name_list)
df1$C2 <- map_dbl(paste0(df1$C1, '$estimate'), function(x) eval(parse(text = x)))

Convert nested data.frame to a hierarchical list

Is there a neat way to convert a nested data.frame to a hierarchical list?
I do it below with a for loop, but ideally there is a neater solution that generalizes to an arbitrary number of nested columns.
library(dplyr)
nested_df <- expand.grid(V1 = c('a','b','c'),
                         V2 = c('z','y')) %>%
  group_by_all() %>%
  do(x = runif(10)) %>%
  ungroup
nested_ls <- list()
for(v1 in unique(nested_df$V1)){
  for(v2 in unique(nested_df$V2)){
    nested_ls[[v1]][[v2]] <- nested_df %>%
      filter(V1 == v1 & V2 == v2) %>%
      pull(x) %>%
      unlist
  }
}
str(nested_ls)
If you are not very strict with the names z and y, and can also work with [[1]] and [[2]], then you can directly do,
split(nested_df$x, nested_df$V1)
If you need the names, then
lapply(split(nested_df, nested_df$V1), function(i)split(i$x, i$V2))
#Or as @Frank mentions in comments, we can use setNames
lapply(split(nested_df, nested_df$V1), function(i) setNames(i$x, i$V2))
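For an arbitrary number of nesting columns, the nested split can be written recursively. The following base R sketch assumes the values live in a list column; the function name nest_split and its arguments are hypothetical, and the data is rebuilt without dplyr so the example runs on its own.

```r
# Build a small data.frame with a list column, analogous to nested_df
set.seed(42)
df <- expand.grid(V1 = c("a", "b", "c"), V2 = c("z", "y"),
                  stringsAsFactors = FALSE)
df$x <- lapply(seq_len(nrow(df)), function(i) runif(10))

# Hypothetical helper: split on the first grouping column, then recurse on
# the remaining ones; at the bottom, return the flattened values.
nest_split <- function(df, cols, value) {
  if (length(cols) == 0) return(unlist(df[[value]]))
  lapply(split(df, df[[cols[1]]], drop = TRUE),
         nest_split, cols = cols[-1], value = value)
}

nested_ls <- nest_split(df, cols = c("V1", "V2"), value = "x")
str(nested_ls)
```

Adding a third grouping column only requires extending cols, for example c("V1", "V2", "V3").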

cbind dataframe in R with placeholders

Imagine I have three dataframes:
data.frame1 <- data.frame(x=c(1:10))
data.frame2 <- data.frame(x=c(11:20))
data.frame3 <- data.frame(x=c(21:30))
I could bind them together by explicitly naming each of them:
res.data.frame <- cbind(data.frame1, data.frame2, data.frame3)
However, I am looking for more dynamic ways to do so, e.g. with placeholders.
This saves somehow the three dataframes in a new dataframe, but not in a usable format:
res.data.frame1 <- as.data.frame(mapply(get, grep("^data.frame.$", ls(), value=T)))
This command would only save the three names:
res.data.frame2 <- grep(pattern = "^data.frame.$", ls(), value=T)
This one only gives an error message:
res.data.frame3 <- do.call(cbind, lapply(ls(pattern = "^data.frame.$")), get)
Does anyone know the right way to do this?
Something like this maybe?
Assuming ls()
# [1] "data.frame1" "data.frame2" "data.frame3"
as.data.frame(Reduce("cbind", sapply(ls(), function(i) get(i))))
Based on @akrun's comment, this can be simplified to
as.data.frame(Reduce("cbind", mget(ls())))
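Because a bare ls() picks up every object in the workspace, it may be safer to restrict the lookup with a pattern. A minimal sketch, assuming the objects follow the data.frame<number> naming used in the question (the regular expression and the final renaming step are assumptions for illustration):

```r
data.frame1 <- data.frame(x = 1:10)
data.frame2 <- data.frame(x = 11:20)
data.frame3 <- data.frame(x = 21:30)

# Only names of the form data.frame<digits>; the dot is escaped so it is literal
nms <- ls(pattern = "^data\\.frame[0-9]+$", envir = environment())

# mget returns a named list of the objects; cbind them column-wise
res <- do.call(cbind, mget(nms, envir = environment()))

# All three columns are called "x", so label them by their source object
names(res) <- nms
```

Note that ls() returns names in alphabetical order, so data.frame10 would sort before data.frame2.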

consecutive setdiff in datatable list

Using data organised as
library(data.table)
dtl <- replicate(10,data.table(id=sample(letters,10),val=sample(10)), simplify=F)
lapply(dtl, function(x){setkey(x,'id')})
I need to extract a list of datatables that contain the rows in dtl[[n+1]] with id not present in dtl[[n]]. I assume it would be something like
dtl2 <- list(setdiff(dtl[[1]][['id']],dtl[[2]][['id']]),setdiff(dtl[[2]][['id']],dtl[[3]][['id']]...)
Please notice that, while the setdiff should only take the id column into account, I expect the result to contain all columns from each datatable.
I think this will do it for you:
mapply(setdiff, head(dtl, -1), tail(dtl, -1), SIMPLIFY = FALSE)
Edit: with your new expected output, I would still use mapply as above, but with one of the following two changes:
replace setdiff with function(x,y)setdiff(x$id, y$id)
replace dtl with ids <- lapply(dtl, "[[", "id")
Edit 2: you've changed your expected output again by adding a plain English description that does not match the code you had provided... I think you are now looking for this:
mapply(function(x,y)y[setdiff(y$id, x$id), ],
head(dtl, -1), tail(dtl, -1), SIMPLIFY = FALSE)
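The pairing of each table with its predecessor is easier to follow on a small deterministic example. The sketch below is a base R analogue using plain data.frames and Map instead of keyed data.tables; the sample data and the name new_rows are made up for illustration.

```r
dfl <- list(
  data.frame(id = c("a", "b", "c"),      val = 1:3,  stringsAsFactors = FALSE),
  data.frame(id = c("b", "c", "d"),      val = 4:6,  stringsAsFactors = FALSE),
  data.frame(id = c("c", "d", "e", "f"), val = 7:10, stringsAsFactors = FALSE)
)

# head(dfl, -1) is tables 1..n-1 and tail(dfl, -1) is tables 2..n, so each call
# sees (previous, current) and keeps the current rows whose id is new.
new_rows <- Map(function(prev, cur) cur[!cur$id %in% prev$id, , drop = FALSE],
                head(dfl, -1), tail(dfl, -1))
```

Here new_rows[[1]] holds the row with id "d" and new_rows[[2]] holds the rows with ids "e" and "f", with all columns preserved.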

Rename columns of a data frame by searching column name

I am writing a wrapper to ggplot to produce multiple graphs based on various datasets. As I am passing the column names to the function, I need to rename the column names so that ggplot can understand the reference.
However, I am struggling with renaming of the columns of a data frame
here's a data frame:
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
here are my column names for search:
col1_search <- "col1"
col2_search <- "col2"
col3_search <- "col3"
and here are column names to replace:
col1_replace <- "new_col1"
col2_replace <- "new_col2"
col3_replace <- "new_col3"
when I search for column names, R sorts the column indexes and disregards the search location.
for example, when I run the following code, I expected the new headers to be new_col1, new_col2, and new_col3, instead the new column names are: new_col3, new_col2, and new_col1
colnames(df)[names(df) %in% c(col3_search,col2_search,col1_search)] <- c(col3_replace,col2_replace,col1_replace)
Does anyone have a solution where I can search for column names and replace them in that order?
require(plyr)
df <- data.frame(col2=1:3,col1=3:5,col3=6:8)
df <- rename(df, c("col1"="new_col1", "col2"="new_col2", "col3"="new_col3"))
df
And you can be creative in making that second argument to rename so that it is not so manual.
> names(df)[grep("^col", names(df))] <-
paste("new", names(df)[grep("^col", names(df))], sep="_")
> names(df)
[1] "new_col1" "new_col2" "new_col3"
If you want to replace an ordered set of column names with an arbitrary character vector, then this should work:
names(df)[sapply(oldNames, grep, names(df) )] <- newNames
The sapply()-ed grep will give you the proper locations for the 'newNames' vector. I suppose you might want to make sure there are a complete set of matches if you were building this into a function.
Hmm, this might be way too complicated, but it is the first thing that comes to mind:
lookup <- data.frame(search = c(col3_search,col2_search,col1_search),
                     replace = c(col3_replace,col2_replace,col1_replace),
                     stringsAsFactors = FALSE)
# look up each existing column name in 'search' and take the matching 'replace'
colnames(df) <- lookup$replace[match(colnames(df), lookup$search)]
I second @justin's aes_string suggestion. But for future renaming you can try.
require(stringr)
df <- data.frame(col1=1:3,col2=3:5,col3=6:8)
oldNames <- c("col1", "col2", "col3")
newNames <- c("new_col1", "new_col2", "new_col3")
names(df) <- str_replace(string=names(df), pattern=oldNames, replacement=newNames)
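An order-independent alternative is to index a named character vector by the existing column names, which works regardless of how the columns or the search terms are ordered. A minimal base R sketch, in which the lookup vector and the extra column other are assumptions for illustration:

```r
df <- data.frame(col2 = 1:3, col1 = 3:5, other = 6:8)

# Names are the old column names, values are the replacements;
# the order of this vector does not matter.
lookup <- c(col3 = "new_col3", col2 = "new_col2", col1 = "new_col1")

# Rename only the columns that have an entry in the lookup
hit <- names(df) %in% names(lookup)
names(df)[hit] <- unname(lookup[names(df)[hit]])

names(df)
```

Columns without an entry in lookup keep their original names, and lookup entries with no matching column are ignored.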
