I need to append data.tables to an empty list in a way that by calling an index I get the entire data table.
What I'm getting now is a list of the columns of the appended data tables, so at a particular index I get a column from one of those tables instead of a data table.
empty_list <- list()
dt<-data.table(c(1,2,3,4,5,6), c(4,5,6,7,8,9))
empty_list <- append(empty_list, dt)
empty_list[1]
You want to append another list(...) object to empty_list, so wrap the data.table in list() and it is added as a single element:
empty_list <- list()
dt<-data.table(c(1,2,3,4,5,6), c(4,5,6,7,8,9))
empty_list <- append(empty_list, list(dt))
empty_list[1]
#[[1]]
# V1 V2
#1: 1 4
#2: 2 5
#3: 3 6
#4: 4 7
#5: 5 8
#6: 6 9
As a simple representative example, consider that a data.table/data.frame is really just a fancy list.
is.list(data.table(3,4))
#[1] TRUE
str(append(list(1,2), list(3,4)))
#List of 4
# $ : num 1
# $ : num 2
# $ : num 3
# $ : num 4
str(append(list(1,2), list(list(3,4))))
#List of 3
# $ : num 1
# $ : num 2
# $ :List of 2
# ..$ : num 3
# ..$ : num 4
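To make the difference concrete, here is a minimal sketch that compares the two append calls and shows `[[` versus `[` when retrieving the table. It uses a base-R data.frame instead of a data.table only to avoid a package dependency (an assumption for illustration; both are lists and behave the same way here). The names flat, nested and tables are made up for this example.

```r
# Base-R data.frame used here so the sketch needs no packages
df <- data.frame(V1 = 1:6, V2 = 4:9)

flat   <- append(list(), df)        # columns are spliced in one by one
nested <- append(list(), list(df))  # the whole table becomes one element

length(flat)    # one element per column
length(nested)  # one element holding the table

# `[[` returns the stored table itself; `[` returns a list of length one
is.data.frame(nested[[1]])
is.data.frame(nested[1])

# Alternative: assign to the next free index with `[[<-`
tables <- list()
tables[[length(tables) + 1]] <- df
```

Using empty_list[[1]] rather than empty_list[1] is what hands back the table itself instead of a one-element list wrapping it.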
If we want to make a reproducible question on a complex/large dataset for SO, we can use dput(head(df)) to reduce the size.
Is there a similar approach to reduce the size of complex nested lists with varying list lengths? I'm thinking an approach could be to take the first few elements from each list (say first 3) irrespective of individual list type (numeric, character etc.) and nested structure but I'm not sure how to do this.
#sample nested list
L <- list(
list(1:10),
list( list(1:10), list(1:10,1:10) ),
list(list(list(list(1:10))))
)
Running dput(L) will naturally produce the structure for the whole list. Is there a simple way to reduce the overall length of this list (something like dput(head(L)))?
I don't want to edit the structure of the list, e.g. I don't want to flatten first or anything - I just want to reduce the size of it and keep all attributes etc.
Thanks
Edit
@thelatemail's solution works well:
rapply(L, f = head, n = 3, how = "list")
What if the list contains a data.frame, though? This approach splits the df into separate lists (which I assume is to be expected, as how = "list" is specified in the rapply call). Is there a way to modify this so that it returns head(df) as a data.frame? Example with a df included:
L_with_df <- list(
list(1:10),
list( list(1:10), list(1:10,1:10), df = data.frame(a = 1:20, b = 21:40) ),
list(list(list(list(1:10))))
)
rapply(L_with_df, f = head, n = 3, how = "list")
Edit 2
It seems rapply won't work on data.frames.
However, rrapply, an extension of rapply from the rrapply package, seems to do what I want:
library(rrapply)
rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE)
# [[1]]
# [[1]][[1]]
# [1] 1 2 3
# [[2]]
# [[2]][[1]]
# [[2]][[1]][[1]]
# [1] 1 2 3
# [[2]][[2]]
# [[2]][[2]][[1]]
# [1] 1 2 3
# [[2]][[2]][[2]]
# [1] 1 2 3
# [[2]]$df
# a b
# 1 1 21
# 2 2 22
# 3 3 23
# [[3]]
# [[3]][[1]]
# [[3]][[1]][[1]]
# [[3]][[1]][[1]][[1]]
# [[3]][[1]][[1]][[1]][[1]]
# [1] 1 2 3
# Warning message:
# In rrapply(L_with_df, f = head, n = 3, dfaslist = FALSE) :
# 'dfaslist' is deprecated, use classes = 'data.frame' instead
#this produces different output?:
#rrapply(L_with_df, f = head, n = 3, classes = "data.frame")
Let's create a nested list to serve as an example.
L <- list(
list(1:10),
list( list(1:10), list(1:10,1:10) ),
list(list(list(list(1:10))))
)
Which has a structure of this:
str(L)
#List of 3
# $ :List of 1
# ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 2
# ..$ :List of 1
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# ..$ :List of 2
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# $ :List of 1
# ..$ :List of 1
# .. ..$ :List of 1
# .. .. ..$ :List of 1
# .. .. .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
I think the recursive apply, rapply, with f = head can handle this without breaking the structure.
rapply(L, f=head, n=3, how="list")
Checks out:
str(rapply(L, f=head, n=3, how="list"))
#List of 3
# $ :List of 1
# ..$ : int [1:3] 1 2 3
# $ :List of 2
# ..$ :List of 1
# .. ..$ : int [1:3] 1 2 3
# ..$ :List of 2
# .. ..$ : int [1:3] 1 2 3
# .. ..$ : int [1:3] 1 2 3
# $ :List of 1
# ..$ :List of 1
# .. ..$ :List of 1
# .. .. ..$ :List of 1
# .. .. .. ..$ : int [1:3] 1 2 3
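If the nested list also holds data frames, as in the question's follow-up, one base-R option is a small recursive helper that treats data frames as leaves. This is only a sketch: head_nested is a hypothetical name, not part of the original posts or of any package, and it keeps element names but not other attributes set on the intermediate lists.

```r
# Hypothetical helper: recurse into plain lists, apply head() to everything else
head_nested <- function(x, n = 3) {
  if (is.list(x) && !is.data.frame(x)) {
    # A plain list: descend into each element (lapply keeps element names)
    lapply(x, head_nested, n = n)
  } else {
    # A leaf (atomic vector or data frame): keep the first n items/rows
    head(x, n)
  }
}

L_with_df <- list(
  list(1:10),
  list(list(1:10), list(1:10, 1:10), df = data.frame(a = 1:20, b = 21:40)),
  list(list(list(list(1:10))))
)

small <- head_nested(L_with_df, n = 3)
str(small)
```

The result can then be passed to dput(small) to get a compact, reproducible version of the structure.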
I just need to replicate my data.frame n times (e.g. 100) and save all the outputs into a list.
It should be quite easy and straightforward but I could not find any solution yet.
Fake data.frame:
df = read.table(text = 'a b
1 2
5 6
4 4
11 78
23 99', header = TRUE)
With lapply:
df_list <- lapply(1:100, function(x) df)
We can use replicate
n <- 100
lst <- replicate(n, df, simplify = FALSE)
You can use rep if you wrap it in list, as rep tries to return the same type of object you pass it:
df_list <- rep(list(df), 100)
str(df_list[1:2])
#> List of 2
#> $ :'data.frame': 5 obs. of 2 variables:
#> ..$ a: int [1:5] 1 5 4 11 23
#> ..$ b: int [1:5] 2 6 4 78 99
#> $ :'data.frame': 5 obs. of 2 variables:
#> ..$ a: int [1:5] 1 5 4 11 23
#> ..$ b: int [1:5] 2 6 4 78 99
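As a quick check, the three approaches above should give the same list, and omitting the list() wrapper in rep repeats the columns instead of the data frame. A minimal sketch, with the data typed in directly rather than read with read.table:

```r
df <- data.frame(a = c(1L, 5L, 4L, 11L, 23L), b = c(2L, 6L, 4L, 78L, 99L))

via_lapply    <- lapply(1:100, function(x) df)
via_replicate <- replicate(100, df, simplify = FALSE)
via_rep       <- rep(list(df), 100)

# All three give a list of 100 identical data frames
identical(via_lapply, via_replicate)
identical(via_lapply, via_rep)

# Without list(), rep() repeats the columns: a plain list of 200 vectors
cols <- rep(df, 100)
length(cols)
is.data.frame(cols)

# The copies are independent: changing one leaves the others untouched
via_rep[[1]]$a[1] <- 0L
via_rep[[2]]$a[1]
```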
I have a list of dataframes.
Each dataframe is named by person, and each row of a dataframe is an event. The columns for each event are called 'Indication for event' and 'Number of biopsies'. I would like to create a summary dataframe (or matrix?) that tells me how many biopsies were taken for each indication by each person.
List of 3
$ :'data.frame': 3 obs. of 2 variables:
..$ Indication: Factor w/ 2 levels "AbdoPain","Vomiting": 1 2 1
..$ NumOfBx : num [1:3] 2 3 1
$ :'data.frame': 4 obs. of 2 variables:
..$ Indication: Factor w/ 3 levels "AbdoPain","Anaemia",..: 2 2 1 3
..$ NumOfBx : num [1:4] 12 23 1 5
$ :'data.frame': 4 obs. of 2 variables:
..$ Indication: Factor w/ 3 levels "AbdoPain","Anaemia",..: 2 1 3 3
..$ NumOfBx : num [1:4] 1 2 3 7
The intended result:
          dfMrBen dfJohn dfStuart
Abdo pain
Vomiting
Anaemia
I thought this was likely to be a split-apply-combine problem but I don't know how to combine to get the summary as above. At the moment I have:
library(dplyr)
ReportOp <- function(x) {
  # Keep the dataframe name
  theName <- x
  # Extract the dataframe data
  x <- data.frame(Dxlst[[x]])
  # Mean number of biopsies per indication (column is NumOfBx in the data)
  grp <- x %>% group_by(Indication) %>% summarise(mean = mean(NumOfBx))
}
lapply(names(Dxlst), ReportOp)
but this just gives me the summary for each dataframe. How do I combine the dataframes to get the intended result?
First combine the data into one big dataframe (or do this after the summary) with
do.call(rbind, Dxlst)
or first add ids to each list element and then rbind them together like so:
Dxlst <- lapply(1:length(Dxlst),
function(x) cbind(Dxlst[[x]],
id = rep(x,nrow(Dxlst[[x]]))))
do.call(rbind, Dxlst)
This is not exactly what you are looking for, but it is close. You could also combine the data frames first and then do the summary, which would be simpler.
Create the data:
df1=data.frame(Indication=as.factor(sample(c(0,1), 10, replace = T)), Bx=sample(1:10, 10, replace = T))
df2=data.frame(Indication=as.factor(sample(c(0,1,2), 10, replace = T)), Bx=sample(1:10, 10, replace = T))
l=list(df1,df2)
then
l=lapply(l, function(x) aggregate( Bx ~ Indication, x, sum))
m=max(sapply(l, nrow))
n=lapply(l, function(x){ x <- x[seq_len(m),]; row.names(x) <- NULL; x})
do.call('cbind',n)
I get output like:
Indication Bx Indication Bx
1 0 18 0 9
2 1 28 1 35
3 <NA> NA 2 18
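Putting the "combine first, then summarise" idea together, a base-R sketch of the indication-by-person table could look like the following. The data are rebuilt from the str() output in the question; the third factor level is elided there, so "Vomiting" is an assumption, as is the mapping of the names dfMrBen, dfJohn and dfStuart to the three list elements.

```r
# Data reconstructed from the question's str() output (see assumptions above)
Dxlst <- list(
  dfMrBen  = data.frame(Indication = c("AbdoPain", "Vomiting", "AbdoPain"),
                        NumOfBx = c(2, 3, 1)),
  dfJohn   = data.frame(Indication = c("Anaemia", "Anaemia", "AbdoPain", "Vomiting"),
                        NumOfBx = c(12, 23, 1, 5)),
  dfStuart = data.frame(Indication = c("Anaemia", "AbdoPain", "Vomiting", "Vomiting"),
                        NumOfBx = c(1, 2, 3, 7))
)

# Tag every row with the name of the data frame it came from, then stack
tagged   <- Map(function(d, nm) cbind(d, Person = nm), Dxlst, names(Dxlst))
combined <- do.call(rbind, tagged)
combined$Person <- factor(combined$Person, levels = names(Dxlst))

# Sum the biopsies for every Indication x Person combination
summary_tab <- xtabs(NumOfBx ~ Indication + Person, data = combined)
summary_tab
```

Combinations that never occur (for example Anaemia for dfMrBen) come out as 0 rather than NA; as.data.frame.matrix(summary_tab) converts the table to a data frame if that is preferred.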
I want to convert a list of integer matrices to numeric. I know that lapply is not friendly to internal structures, but is there a lapply-solution?
mtList = list(matrix(sample(1:10),nrow=5),
matrix(sample(1:21),nrow=7))
str(mtList)
# This works, and I could wrap it in a for loop
mtList[[1]][] = as.numeric(mtList[[1]])
mtList[[2]][] = as.numeric(mtList[[2]])
str(mtList)
# But how to use lapply here? Note that the internal
# matrix structure is flattened
mtList1 = lapply(mtList,function(x)x[] = as.numeric(x))
str(mtList1)
You have to return the value in your lapply workhorse function:
mtList1 = lapply(mtList,function(x) {
x[] = as.numeric(x)
x
})
str(mtList1)
# List of 2
# $ : num [1:5, 1:2] 1 7 6 3 9 10 5 2 8 4
# $ : num [1:7, 1:3] 21 3 15 14 6 4 18 17 9 8 ...
A marginally simpler alternative could be to coerce by multiplying by 1.0 (or arguably, more robustly, raising to the power of 1.0)...
mtList1 <- lapply( mtList , "*" , 1.0 )
str(mtList1)
#List of 2
# $ x: num [1:5, 1:2] 2 8 1 5 7 9 3 10 4 6
# $ x: num [1:7, 1:3] 17 20 9 11 2 18 1 4 12 10 ...
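The reason the original lapply call flattens the matrices is that an assignment used as a function's last expression evaluates to its right-hand side, so the function returns as.numeric(x) instead of the modified x. The sketch below shows this and one more alternative, storage.mode<-, which changes the type in place while keeping dim; it uses fixed matrices rather than sample() so the result is reproducible.

```r
mtList <- list(matrix(1:10, nrow = 5), matrix(1:21, nrow = 7))

# The assignment's value is the plain numeric vector, so dim is lost
flattened <- lapply(mtList, function(x) x[] <- as.numeric(x))
dim(flattened[[1]])  # NULL

# Changing the storage mode keeps the matrix structure
kept <- lapply(mtList, function(x) {
  storage.mode(x) <- "double"
  x  # return the modified matrix explicitly
})
str(kept)
```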
My dataset is pretty big. I have about 2,000 variables and 1,000 observations.
I want to run a model for each variable using other variables.
To do so, for each dependent variable I need to drop the variables that have missing values in rows where the dependent variable does not.
For instance, for variable "A" I need to drop variables C and D because they have missing values where variable A has none. For variable "C" I can keep variable "D".
data <- read.table(text="
A B C D
1 3 9 4
2 1 3 4
NA NA 3 5
4 2 NA NA
2 5 4 3
1 1 1 2",header=T,sep="")
I think I need to make a loop to go through each variable.
I think this gets what you need:
for (i in 1:ncol(data)) {
# filter out rows with NAs in column 'i',
# which is the column we currently care about
tmp <- data[!is.na(data[,i]),]
# now column 'i' has no NA values, so remove other columns
# that have NAs in them from the data frame
tmp <- tmp[sapply(tmp, function(x) !any(is.na(x)))]
#run your model on 'tmp'
}
For each iteration of i, the tmp data frame looks like:
'data.frame': 5 obs. of 2 variables:
$ A: int 1 2 4 2 1
$ B: int 3 1 2 5 1
'data.frame': 5 obs. of 2 variables:
$ A: int 1 2 4 2 1
$ B: int 3 1 2 5 1
'data.frame': 4 obs. of 2 variables:
$ C: int 3 3 4 1
$ D: int 4 5 3 2
'data.frame': 5 obs. of 1 variable:
$ D: int 4 4 5 3 2
I'll provide a way to get the usable variables for each column you choose:
getVars <- function(data, col){
tmp<-!sapply(data[!is.na(data[[col]]),], function(x) { any(is.na(x)) })
names(data)[tmp & names(data) != col]
}
PS: I'm on my phone, so I didn't test the above or have the chance to style the code properly.
EDIT: Styling fixed!
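A short usage sketch for getVars on the question's data, turning the usable variables into a model formula with reformulate. The function body is the one above; the formula step and the names predictors and fml are assumptions added for illustration, since the question does not say which model is being fitted.

```r
data <- read.table(text = "A B C D
1 3 9 4
2 1 3 4
NA NA 3 5
4 2 NA NA
2 5 4 3
1 1 1 2", header = TRUE)

getVars <- function(data, col) {
  # TRUE for columns with no NA in the rows where `col` is observed
  tmp <- !sapply(data[!is.na(data[[col]]), ], function(x) any(is.na(x)))
  # Drop the dependent variable itself from the result
  names(data)[tmp & names(data) != col]
}

predictors <- getVars(data, "A")
predictors

# Build a formula for the model; only valid when at least one predictor remains
fml <- reformulate(predictors, response = "A")
fml
```

Looping over names(data) with lapply and calling getVars for each name gives the usable predictors for every dependent variable in one pass.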