Starting from a list of data frames like outcome:
id <- c(1,2,3,4,5,1,2,3,4,5)
month <- c(3,4,2,1,5,7,3,1,8,9)
preds <- c(0.5,0.1,0.15,0.23,0.75,0.6,0.49,0.81,0.37,0.14)
l_1 <- data.frame(id, preds, month)
preds <- c(0.45,0.18,0.35,0.63,0.25,0.63,0.29,0.11,0.17,0.24)
l_2 <- data.frame(id, preds, month)
preds <- c(0.58,0.13,0.55,0.13,0.76,0.3,0.29,0.81,0.27,0.04)
l_3 <- data.frame(id, preds, month)
preds <- c(0.3,0.61,0.18,0.29,0.85,0.76,0.56,0.91,0.48,0.91)
l_4 <- data.frame(id, preds, month)
outcome <- list(l_1, l_2, l_3, l_4)
I want to take the row names R assigned and store them in a new variable, as in:
sample <- outcome[[1]]
sample$unique_id <- rownames(sample)
However, I don't want to do this manually because my list contains 100 data frames.
Moreover, I don't want to assign values to each row by hand, because I want to preserve the row names generated by R.
Any clue?
We may also use rownames_to_column
library(dplyr)
library(purrr)
library(tibble)
map(outcome, ~ .x %>%
rownames_to_column('unique_id'))
With lapply and cbind:
lapply(outcome, function(x) {
cbind(unique_id=rownames(x), x)
})
Another base R option is to use Map
Map(function(x){
x$unique_id <- rownames(x)
x
}, outcome)
Try using lapply
lapply(outcome, function(x) {
x$unique_id <- rownames(x)
x
})
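The answers above all copy rownames(x) into a column. A minimal base R sketch of why that preserves R's own row names is given here; outcome_small and add_row_id are hypothetical names used only for illustration, and the data are a shortened stand-in for the list in the question.

```r
# Small stand-in for the list of data frames in the question (hypothetical data).
outcome_small <- list(
  data.frame(id = 1:3, preds = c(0.50, 0.10, 0.15)),
  data.frame(id = 1:3, preds = c(0.45, 0.18, 0.35))
)

# Hypothetical helper: copy the row names into a regular column.
add_row_id <- function(x) {
  x$unique_id <- rownames(x)  # row names are returned as character strings
  x
}

# Apply the helper to every data frame in the list.
with_ids <- lapply(outcome_small, add_row_id)

# Row names survive subsetting, so the IDs still point at the original rows
# even when rows have been filtered out beforehand.
filtered <- outcome_small[[1]][outcome_small[[1]]$preds > 0.12, ]
filtered_ids <- add_row_id(filtered)$unique_id
```

Because the column is taken from rownames(x) rather than from seq_len(nrow(x)), filtered rows keep the labels they had in the unfiltered data frame.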
Related
I am trying to write a loop over the IDs in a data.frame df. Right now I build another object, dm, which contains the unique IDs from df$ID:
dm<-df %>% select(ID) %>% unique()
for (i in seq_along(dm$ID)){
df_new<-df %>% filter(ID %in% dm$ID[i])
...
The current code does what I need, but I wonder whether there is a way to do it without building dm. I want to build a subset for each ID in df. Any suggestions?
Instead of looping over the unique 'ID' values and subsetting, a faster option is split, which splits the data.frame into a list of data.frames based on the unique values of 'ID'
df_list <- split(df, df$ID)
From here, we can either use lapply or a for loop
pdf(paste0(out_dir, output_date,'.pdf'))
for(i in seq_along(df_list)) {
  # inside a for loop a ggplot object is not auto-printed, so print it explicitly
  p <- ggplot(data = df_list[[i]]) +
    ...
  print(p)
}
dev.off()
Or with lapply
pdf(paste0(out_dir, output_date,'.pdf'))
lapply(df_list, function(dat)
ggplot(data = dat) +
...
)
dev.off()
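If the goal is only to avoid creating a separate object of unique 'ID' values, loop over unique(df$ID) directly:
for(un in unique(df$ID)) {
  df_new <- df %>%
    filter(ID == un)
  # print explicitly; plots are not auto-printed inside a for loop
  p <- ggplot(df_new) +
    ...
  print(p)
}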
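To make the split step concrete, here is a small base R sketch on made-up data. The data frame, its columns, and the totals object are assumptions for illustration and are not taken from the question.

```r
# Hypothetical data: one row per observation, grouped by ID.
df <- data.frame(
  ID    = c("a", "b", "a", "c", "b"),
  value = c(1, 2, 3, 4, 5)
)

# split() returns a named list with one data frame per unique ID.
df_list <- split(df, df$ID)
group_names <- names(df_list)

# Any per-group computation can then run over the list; here, a sum per ID.
totals <- vapply(df_list, function(d) sum(d$value), numeric(1))
```

The list names come from the grouping values, so names(df_list) can be used for plot titles or file names without a separate object of unique IDs.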
I have the following large dataframes:
Jan_Feb2019
Mar_Apr2019
May_Jun2019
Jul_Aug2019
Sep_Oct2019
Nov_Dec2019
Jan_Feb2020
Mar_2020
And I use the following code to generate other data frames and fill the columns with the data I want.
#Jan_Feb2019
Jan_Feb2019_df <- as.data.frame(Jan_Feb2019$reactions$summary$total_count)
colnames(Jan_Feb2019_df)[1] <- "Reactions"
Jan_Feb2019_df$Shares <- Jan_Feb2019$shares$count
Jan_Feb2019_df$Comments <- Jan_Feb2019$comments$summary$total_count
Jan_Feb2019_df$Message <- Jan_Feb2019$message
Jan_Feb2019_df$Likes <- Jan_Feb2019$likes$summary$total_count
Jan_Feb2019_df$CreatedDate <- Jan_Feb2019$created_time
Jan_Feb2019_df$PostID <- Jan_Feb2019$id
Jan_Feb2019_df$Love <- Jan_Feb2019$reacts_love$summary$total_count
Jan_Feb2019_df$Angry <- Jan_Feb2019$reacts_angry$summary$total_count
Jan_Feb2019_df$Sad <- Jan_Feb2019$reacts_sad$summary$total_count
Jan_Feb2019_df$HAHA <- Jan_Feb2019$reacts_haha$summary$total_count
Jan_Feb2019_df$WOW <- Jan_Feb2019$reacts_wow$summary$total_count
Jan_Feb2019_df$CreatedDate <- anytime(Jan_Feb2019_df[,6])
Jan_Feb2019_df$insights.data <- Jan_Feb2019$insights$data
Jan_Feb2019_df <- Jan_Feb2019_df %>%
unnest(insights.data) %>%
unnest(values) %>%
select(Message,Shares,Comments,Reactions,Likes,CreatedDate,PostID,Love,Angry,Sad,HAHA,WOW,name,value) %>%
pivot_wider(names_from = name, values_from = value)
Is there a way to iterate over all of the above data frames, so I won't have to repeat the process 8 times?
Thanks
The code below is untested. I have tried to follow the code in the question, making it general. There are 2 functions.
fillNewDf takes the old object as only argument and creates and fills the new data frame.
makeNewDf takes the old object name as an argument and calls fillNewDf returning its value.
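If the objects live in the global environment, the default value of makeNewDf's envir argument can be used.
library(dplyr)    # %>%, select
library(tidyr)    # unnest, pivot_wider
library(anytime)  # anytime

fillNewDf <- function(X){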
vec <- X[['reactions']][['summary']][['total_count']]
Y <- data.frame(Reactions = vec)
Y[['Shares']] <- X[['shares']][['count']]
Y[['Comments']] <- X[['comments']][['summary']][['total_count']]
Y[['Message']] <- X[['message']]
Y[['Likes']] <- X[['likes']][['summary']][['total_count']]
Y[['CreatedDate']] <- X[['created_time']]
Y[['PostID']] <- X[['id']]
Y[['Love']] <- X[['reacts_love']][['summary']][['total_count']]
Y[['Angry']] <- X[['reacts_angry']][['summary']][['total_count']]
Y[['Sad']] <- X[['reacts_sad']][['summary']][['total_count']]
Y[['HAHA']] <- X[['reacts_haha']][['summary']][['total_count']]
Y[['WOW']] <- X[['reacts_wow']][['summary']][['total_count']]
Y[['CreatedDate']] <- anytime(Y[, 6])
Y[['insights.data']] <- X[['insights']][['data']]
Y %>%
unnest(insights.data) %>%
unnest(values) %>%
select(Message, Shares, Comments, Reactions, Likes, CreatedDate, PostID, Love, Angry, Sad, HAHA, WOW, name, value) %>%
pivot_wider(names_from = name, values_from = value)
}
makeNewDf <- function(X, envir = .GlobalEnv){
DF <- get(X, envir = envir)
fillNewDf(DF)
}
Now get the names of the objects to be processed with ls() and create a list with the new data frames.
old_names <- ls(pattern = '\\d{4}$')
new_list <- lapply(old_names, makeNewDf)
names(new_list) <- paste(old_names, "df", sep = "_")
If these new data frames are to become objects in the global environment, list2env(new_list, envir = .GlobalEnv) will create them with the same names as the names attribute of new_list. (With the default envir = NULL, list2env creates a new environment instead.)
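The name-lookup pattern can be tried out on toy objects. The sketch below is an assumption-laden illustration: it uses a scratch environment instead of the global one, mget() as an alternative to calling get() once per name, and a hypothetical count_posts function standing in for fillNewDf.

```r
# Scratch environment holding toy stand-ins for the large objects (hypothetical data).
env <- new.env()
assign("Jan_Feb2019", list(id = c("p1", "p2")), envir = env)
assign("Mar_2020",    list(id = "p3"),          envir = env)
assign("helper",      1,                        envir = env)  # should not be matched

# Names ending in four digits, as in the answer above.
old_names <- ls(envir = env, pattern = "\\d{4}$")

# mget() fetches all objects at once and returns a list named after them.
old_list <- mget(old_names, envir = env)

# Hypothetical stand-in for fillNewDf: builds a one-row data frame per object.
count_posts <- function(X) data.frame(n_posts = length(X[["id"]]))

new_list <- lapply(old_list, count_posts)
names(new_list) <- paste(names(old_list), "df", sep = "_")
```

Keeping the results in new_list, rather than writing them back with list2env, makes later steps such as binding all periods together a single call over the list.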
I've got a rather large list that contains many dataframes of the same length. I'd like to rename all the column names in the list. I've tried to use purrr::map, but have hit various issues. Is there a better way to do this?
Here is a reprex of the approach and issues I'm having with it. Thanks.
library(tidyverse)
org_names <- names(
starwars %>%
select_if(
Negate(is.list))
)
df <- starwars %>%
select_if(Negate(is.list))
names(df) <- sample(LETTERS, length(df), replace = FALSE)
df_ls <- list(df, list(df, df), list(df, df, df), df, list(df, df))
map(df_ls, function(x){
x %>%
set_names(org_names)
})
#> `nm` must be `NULL` or a character vector the same length as `x`
Because some of the elements are nested lists, we can use a condition to check whether an element is a data frame; if it is not, we apply set_names by looping inside the nested list
library(tidyverse)
map(df_ls, ~ if(is.data.frame(.x)) .x %>%
set_names(org_names) else
map(.x, ~ .x %>%
set_names(org_names)))
Or it can be made more compact with map_if
out <- map_if(df_ls, is.data.frame, set_names, org_names,
.else = ~ map(.x, set_names, org_names))
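Both versions above handle one level of nesting. If the nesting depth is not known in advance, a recursive helper is one option. The following is a base R sketch under that assumption; rename_nested and the toy data are hypothetical and not part of the answer.

```r
# Hypothetical helper: rename every data frame found at any depth of a nested list.
rename_nested <- function(x, new_names) {
  if (is.data.frame(x)) {
    # A data frame is a leaf: set its column names and stop recursing.
    stats::setNames(x, new_names)
  } else {
    # Anything else is treated as a list of further elements.
    lapply(x, rename_nested, new_names = new_names)
  }
}

# Toy data with two levels of nesting.
df <- data.frame(A = 1:2, B = c("x", "y"))
df_ls <- list(df, list(df, list(df)))

out <- rename_nested(df_ls, c("num", "chr"))
```

The is.data.frame check has to come first, because a data frame is itself a list and would otherwise be recursed into column by column.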
I want to turn a table into a data frame with three columns: (1) the zip code, (2) the count of outcome "0", and (3) the count of outcome "1". But as.data.frame.matrix turns the zip codes into row names, which makes them unusable as a column.
I tried to add a fourth column with imaginary IDs (1:100) so that R would use those as row names instead, but R tells me that "all arguments must be the same length", which they are!
id <- 1:5000
zip <- sample(100:200, 5000, replace = TRUE)
outcome <- rbinom(5000, 1, 0.23)
df <- data.frame(id, outcome, zip)
abs <- table(df$zip, df$outcome)
abs <- as.data.frame.matrix(abs)
Does anyone have a nice and slick idea? Thanks in advance!
Edit:
When I run:
abs <- as.matrix(as.data.frame(abs))
I get something close to what I want, but the outcomes end up together in one column. How can I separate them so that the result looks like the table again?
You can get to your desired result easier with dplyr and tidyr:
library(dplyr)
library(tidyr)
id <- 1:5000
zip <- sample(100:200, 5000, replace = TRUE)
outcome <- rbinom(5000, 1, 0.23)
df <- data.frame(id, outcome, zip)
df <- df %>% group_by(zip, outcome) %>%
summarise(freq = n()) %>%
ungroup() %>%
spread(outcome, freq)
You are supplying only 100 values to a data.frame that has 101 rows (100:200 contains 101 distinct zip codes).
> nrow(abs)
[1] 101
so this would work
abs$new_col <- 1:101
I think you want this:
abs2 <- as.data.frame(abs) %>% select(2,3,1)
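For comparison, the same wide layout can be reached in base R by keeping as.data.frame.matrix and moving the row names into a column afterwards. This is a sketch on a tiny made-up sample; the object names tab and wide are assumptions for illustration.

```r
# Tiny hypothetical sample in place of the 5000 simulated rows.
zip     <- c(100, 100, 101, 102, 102, 102)
outcome <- c(0, 1, 0, 1, 1, 0)

# Cross-tabulate: one row per zip code, one column per outcome.
tab <- table(zip, outcome)

# as.data.frame.matrix keeps the wide layout but stores zip codes as row names.
wide <- as.data.frame.matrix(tab)

# Move the row names into a proper column and reset the row names.
wide <- data.frame(
  zip = as.integer(rownames(wide)),
  wide,
  check.names = FALSE,  # keep the column names "0" and "1" as they are
  row.names = NULL
)
```

Columns named "0" and "1" are not syntactic names, so they have to be accessed as wide[["0"]] or with backticks.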
I want to impute a variable x3 with its cluster mean, where the clusters are computed from the other two variables, x1 and x2.
I know that you can pass a function such as "mean" to impute from the Hmisc package and it does the work. So I would like to pass a function that does all of the following.
I use this code to create the data:
df1 <- data.frame(x1=runif(1000,0,100),
x2=runif(1000,0,100),
x3=c(runif(900,0,100),rep(NA,100)))
I want to pass a function that does all of this:
clust<-kmeans(df1[,-grep('x3', colnames(df1))], 3)
df1$clust<-clust$cluster
library(plyr)
cc<-ddply(df1, 'clust',summarise, mean=mean(x3, na.rm=TRUE))
df2<-merge(df1,cc, by='clust')
df2$x3imputed2<-ifelse(is.na(df2$x3),df2$mean, df2$x3)
Is there a way to pass all this code as a function and use it in Hmisc? (I had a problem with ddply introducing x3 as a variable).
Something like the following:
ff<-function(i) {
clust<-kmeans(df1[,-grep(i, colnames(df1))], 3)
df1$clust<-clust$cluster
cc<-aggregate(df1[,i], by=list(clust=df1$clust), "mean", na.rm=TRUE)
df2<-merge(df1,cc, by='clust')
df2$x3imputed2<-ifelse(is.na(df2[, i]),df2$x, df2[,i])
}
df1$imputedx3<-with(df1, impute(x3,ff))
But I get an error:
empty cluster: try a better set of initial centers
And when I replace it by x3 I don't get the same error.
Try
library(lazyeval)
library(dplyr)
f1 <- function(dat, cname){
#get the third argument i.e, 'cname'
nm1 <- match.call()[[3]]
#paste 'imputed' for renaming the new column later
nm2 <- paste0(nm1, 'imputed')
#create a numeric column index that will be removed in kmeans calc
indx <- grep(cname, colnames(dat))
#get the 'kmeans' of the columns other than the 'cname'
clust <- kmeans(dat[,-indx],3)$cluster
#group by 'clust' and create new column with 'mutate'
dat %>%
group_by(clust=clust) %>%
mutate_(interp(~ifelse(is.na(v), mean(v, na.rm=TRUE), v),
v=as.name(cname))) %>%
#rename the column
setNames(., c(head(names(.),-1), nm2))
}
f1(df1, 'x3')
Or you could pass it without quotes by using v= lazy(cname)
f2 <- function(dat, cname){
nm1 <- match.call()[[3]]
nm2 <- paste0(nm1, 'imputed')
indx <- grep(nm1, colnames(dat))
clust <- kmeans(dat[,-indx],3)$cluster
dat %>%
group_by(clust=clust) %>%
mutate_(interp(~ifelse(is.na(v), mean(v, na.rm=TRUE), v),
v= lazy(cname))) %>%
setNames(., c(head(names(.),-1), nm2))
}
f2(df1, x3)
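The underscore verbs such as mutate_ are deprecated in current dplyr releases, so the functions above may emit warnings. As an alternative, here is a base R sketch of the same idea that swaps in ave() for the grouped mutate. The function name impute_by_cluster, its centers argument, and the smaller simulated data set are assumptions for illustration.

```r
# Hypothetical helper: impute `cname` by the mean of its k-means cluster,
# where clusters are computed from all other columns of `dat`.
impute_by_cluster <- function(dat, cname, centers = 3) {
  # Cluster on every column except the one being imputed.
  predictors <- dat[, setdiff(names(dat), cname), drop = FALSE]
  clust <- kmeans(predictors, centers)$cluster

  # Per-row cluster mean of the target column, ignoring missing values.
  x <- dat[[cname]]
  cluster_mean <- ave(x, clust, FUN = function(v) mean(v, na.rm = TRUE))

  # Fill only the missing entries; observed values are left untouched.
  dat[[paste0(cname, "imputed")]] <- ifelse(is.na(x), cluster_mean, x)
  dat$clust <- clust
  dat
}

# Simulated data in the spirit of the question, with fewer rows.
set.seed(1)
df1 <- data.frame(
  x1 = runif(200, 0, 100),
  x2 = runif(200, 0, 100),
  x3 = c(runif(180, 0, 100), rep(NA, 20))
)

res <- impute_by_cluster(df1, "x3")
```

If a cluster happens to contain no observed values of the target column, its mean is NaN and those rows stay unfilled, so the result is worth checking with anyNA().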