Column names go away when using lapply - r

I'm trying to keep the column names in a list of data frames when using the lapply function.
I have a list of data frames. Let's say:
lst:
[[1]] [[2]]
A ind C ind
1 0 4 2
2 1 8 0
I'm trying to get the elements of the first column of each data frame ([[1]] and [[2]]) for which the second column of that data frame is 0.
I'm using the code
aux <- lapply(lst, function(x) x[,1][x[,2]==0])
And it is working. The only problem is that I'd like to keep the first column names. That is, I'd like to get
aux:
[[1]] [[2]]
A C
1 8
but I'm getting
aux:
[[1]] [[2]]
V1 V1
1 8
How can I keep the column names intact?
data
lst <- list(
data.frame(A=1:2, ind = 0:1),
data.frame(C=c(4,8), ind = c(2,0))
)

We can subset the first column with drop = FALSE, so that the result stays a one-column data frame and keeps its name
lapply(lst, function(x) x[x[,2] == 0, 1, drop = FALSE])
Or with tidyverse, this can be made more compact
library(purrr)
map(lst, ~ .x[!.x[,2],1, drop = FALSE])

Here is another way that might be a bit more readable, using subset,
lapply(lst, function(i) subset(i, i[[2]] == 0)[1])
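To make the difference concrete, here is a small sketch that runs the question's original call next to the drop = FALSE version on the same lst. The names as_vector and as_frame are only illustrative and do not come from the original posts.

```r
lst <- list(
  data.frame(A = 1:2, ind = 0:1),
  data.frame(C = c(4, 8), ind = c(2, 0))
)

# x[, 1] simplifies the single column to a bare vector, so the name is lost
as_vector <- lapply(lst, function(x) x[, 1][x[, 2] == 0])

# drop = FALSE keeps a one-column data frame, and with it the column name
as_frame <- lapply(lst, function(x) x[x[, 2] == 0, 1, drop = FALSE])

# Column names of each result: "A" for the first element, "C" for the second
kept_names <- vapply(as_frame, names, character(1))
kept_names
```

The values are the same in both versions; only the container (vector versus data frame) changes.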

Related

How to modify a list of data.frame and then output the data.frame

I want to create a second column in each of a list of data.frames that is just a duplicate of the first column, and then output those data.frames:
store the data frames:
> FileList <- list(DF1, DF2)
Add another column to each data frame:
> ModifiedDataFrames <- lapply(1:length(FileList), function (x) {FileList[[x]]$Column2 == FileList[[x]]$Column1})
but ModifiedDataFrames[[1]] just returns a list which contains what I assume is the content from DF1$Column1
What am I missing here?
There are a few problems with your code. First, you are using the equality operator == where you need the assignment operator <-. Second, your function does not return the modified data frame. Here is a possible solution:
df1 <- data.frame(Column1 = c(1:3))
df2 <- data.frame(Column1 = c(4:6))
FileList <- list(df1, df2)
ModifiedDataFrames <- lapply(FileList, function(x) {
  x$Column2 <- x$Column1
  return(x)
})
> ModifiedDataFrames
[[1]]
Column1 Column2
1 1 1
2 2 2
3 3 3
[[2]]
Column1 Column2
1 4 4
2 5 5
3 6 6
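As an alternative sketch (not part of the original answer), base R's transform can add the duplicate column and return the modified data frame in one expression, which avoids forgetting the final return value. The name Modified is only illustrative.

```r
FileList <- list(
  data.frame(Column1 = 1:3),
  data.frame(Column1 = 4:6)
)

# transform() evaluates Column1 inside each data frame and returns
# a new data frame with the extra column appended
Modified <- lapply(FileList, function(x) transform(x, Column2 = Column1))

Modified[[2]]
```

Iterating over FileList directly, rather than over 1:length(FileList), also removes the need for FileList[[x]] inside the function.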

Extract and append data to new datasets in a for loop

I have what I think is a really simple question, but I can't figure out how to do it. I'm fairly new to lists, loops, etc.
I have a small dataset:
df <- c("one","two","three","four")
df <- as.data.frame(df)
df
I need to loop through this dataset and create a list of datasets, such that this is the outcome:
[[1]]
one
[[2]]
one
two
[[3]]
one
two
three
This is more or less as far as I've gotten:
blah <- list()
for (i in 1:3) {
  blah[[i]] <- i
}
The length will be variable when I use this in the future, so I need to automate it in a loop. Otherwise, I would just do
one <- df[1,]
two <- df[2,]
list(one, rbind(one, two))
Any ideas?
You can try using lapply :
result <- lapply(seq(nrow(df)), function(x) df[seq_len(x), , drop = FALSE])
result
#[[1]]
# df
#1 one
#[[2]]
# df
#1 one
#2 two
#[[3]]
# df
#1 one
#2 two
#3 three
#[[4]]
# df
#1 one
#2 two
#3 three
#4 four
seq(nrow(df)) creates a sequence from 1 to the number of rows in your data (4 in this case). The function(x) part is an anonymous function; lapply passes each value from 1 to 4 to it, one at a time. seq_len(x) creates a sequence from 1 to x, i.e. 1 to 1 in the first iteration, 1 to 2 in the second, and so on. We use this sequence to subset the rows of the data frame (df[seq_len(x), ]). Because the data frame has only one column, subsetting it this way would simplify the result to a vector; adding drop = FALSE prevents that.
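The effect of drop = FALSE described above can be checked directly. This is a minimal sketch using the same one-column df; the variable names without_drop and with_drop are made up for the illustration.

```r
df <- data.frame(df = c("one", "two", "three", "four"))

# Without drop = FALSE the single column is simplified to a plain vector
without_drop <- df[seq_len(2), ]

# With drop = FALSE the result is still a data frame with a column named "df"
with_drop <- df[seq_len(2), , drop = FALSE]

is.data.frame(without_drop)
is.data.frame(with_drop)
```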
Base R solution:
# Convert the df column to character and store the result as a new data.frame
str_df <- transform(df, df = as.character(df))
# Build a list of character vectors: element i holds the first i values
df_list <- lapply(seq_len(nrow(str_df)), function(i) {
  str_df[seq_len(i), "df"]
})
Data:
df <- c("one","two","three","four")
df <- as.data.frame(df)
df
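Since the question started from a for loop, here is a sketch of how that loop could be completed instead of switching to lapply. It assumes the same one-column df and preallocates the list rather than growing it; blah is the name used in the question.

```r
df <- data.frame(df = c("one", "two", "three", "four"))

# Preallocate one slot per row
blah <- vector("list", nrow(df))

# seq_len() yields an empty sequence for a zero-row data frame,
# whereas 1:nrow(df) would count down from 1 to 0
for (i in seq_len(nrow(df))) {
  blah[[i]] <- df[seq_len(i), , drop = FALSE]
}

length(blah)
```

Both approaches produce the same list; the loop is simply more verbose.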

Renaming columns in list of data frames using match from another data frame

I realise I asked two separate questions in my last post and had one of them answered by very clever peeps super quickly.
Obviously I still can't wrap my head around data frame lists or lapply!
I have a CSV list of original question names and renamed questions. In this example, I am trying to write the code to update Q.1a to Q.1 as per the Qs data frame.
df1 <- data.frame("ID" = 1, "Q.1" = 2, Q1.1 = 3)
df2 <- data.frame("ID." = 2, "Q.1a" = 3, Q1.1 = 4)
dflist <- lapply(ls(), function(x) if (class(get(x)) == "data.frame") get(x))
dflist <- Filter(length, dflist)
Qs <- data.frame("Original.Name" = "Q.1a", "New.Name" = "Q.1")
The tables look like this; I want to update Q.1a as per the Qs table:
ID Q.1 Q1.1
1 1 2 3
ID. Q.1a Q1.1
1 2 3 4
Original.Name New.Name
1 Q.1a Q.1
The code I am trying to write to rename the questions is currently full of errors; I am sure the piping is not supposed to be there!
lapply(dflist, function(x) {
names(x) <- names (x) %in%
Qs$Original.name = Qs$New.name[match(names(x)[names(x) %in% Qs$Original.name],
Qs$Original.name)]
})
Can anyone point me in the right direction? Thanks so much.
Edited to show expected output, where Q.1a from the original example above has been updated to Q.1.
ID Q.1 Q1.1
1 1 2 3
ID. Q.1 Q1.1
1 2 3 4
Ideally I want to be able to match and replace the column names from the Qs table, so that each original column name is replaced with its new column name.
You can use ifelse with match to get new names of the columns.
dflist <- lapply(dflist, function(x) {
  names(x) <- ifelse(names(x) %in% Qs$Original.Name,
                     Qs$New.Name[match(names(x), Qs$Original.Name)],
                     names(x))
  x
})
dflist
#[[1]]
# ID Q.1 Q1.1
#1 1 2 3
#[[2]]
# ID. Q.1 Q1.1
#1 2 3 4
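To show what match contributes here, the following sketch applies the same lookup to a plain character vector of names. The variables old_names, idx and new_names are illustrative only. stringsAsFactors = FALSE is set explicitly on the assumption that the code might run on R versions before 4.0.0, where the name columns would otherwise be factors.

```r
Qs <- data.frame(Original.Name = "Q.1a", New.Name = "Q.1",
                 stringsAsFactors = FALSE)

old_names <- c("ID.", "Q.1a", "Q1.1")

# Position of each old name in the lookup table, NA where there is no match
idx <- match(old_names, Qs$Original.Name)

# Keep the old name where nothing matched, otherwise take the new name
new_names <- ifelse(is.na(idx), old_names, Qs$New.Name[idx])
new_names
```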

Replace NA by 0 by 'x[is.na(x)]=0' replaces whole data frame, not just NAs? [duplicate]

I am a little bit confused. My data frames in a list contain NA values, which I would like to replace by 0.
On a single data frame, I can easily do this with df[is.na(df)]=0, and it works well.
However, when applied on a list ( lapply(l, function(x) x[is.na(x)]=0)), this generates dataframes containing only 0.
Dummy data:
df1<-data.frame(class = rep("BO", 3),
a = c(NA,2,3))
df2<-data.frame(class = rep("BS", 3),
a = c(5,NA,7))
l<-list(df1, df2)
# Convert NA to 0
l2<-lapply(l, function(x) x[is.na(x)]=0)
Results in:
[[1]]
[1] 0
[[2]]
[1] 0
But how can I get this?
[[1]]
class a
1 BO 0
2 BO 2
3 BO 3
[[2]]
class a
1 BS 5
2 BS 0
3 BS 7
The function needs to return x. As written, the last expression in the anonymous function is the assignment, so its value (0) is what gets returned instead of the modified data frame.
lapply(l, function(x) {
  x[is.na(x)] <- 0
  x
})
This can be done in a single statement with the wrapper replace (which does the assignment internally and returns x)
lapply(l, function(x) replace(x, is.na(x), 0))
where replace is
function (x, list, values) {
  x[list] <- values
  x
}
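The difference between the two function bodies can be seen without lapply. In this sketch, f_bad and f_good are hypothetical names for the question's version and the corrected version.

```r
df1 <- data.frame(class = rep("BO", 3), a = c(NA, 2, 3))

# The last expression is an assignment, so the function's value is
# the right-hand side of that assignment
f_bad <- function(x) x[is.na(x)] <- 0

# The last expression is x, so the modified data frame is returned
f_good <- function(x) {
  x[is.na(x)] <- 0
  x
}

bad <- f_bad(df1)
good <- f_good(df1)
good
```

Here bad is the single number 0, while good is the data frame with the NA in column a replaced.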
In addition to the base R option, we can do this with tidyverse as well
library(tidyverse)
map(l, ~ .x %>%
  mutate_all(replace_na, 0))
As only the missing values in the numeric column need to be replaced with 0, we can use mutate_if
map(l, ~ .x %>%
  mutate_if(is.numeric, replace_na, 0))

Select column name based on data frame content R

I want to build a matrix or data frame by choosing the names of the columns where the element in the data frame is not NA. For example, suppose I have:
zz <- data.frame(a = c(1, NA, 3, 5),
b = c(NA, 5, 4, NA),
c = c(5, 6, NA, 8))
which gives:
a b c
1 1 NA 5
2 NA 5 6
3 3 4 NA
4 5 NA 8
I want to recognize each NA and build a new matrix or df that looks like:
a c
b c
a b
a c
There will be the same number of NAs in each row of the input matrix/df. I can't seem to get the right code to do this. Suggestions appreciated!
library(dplyr)
library(tidyr)
zz %>%
  mutate(k = row_number()) %>%
  gather(column, value, a, b, c) %>%
  filter(!is.na(value)) %>%
  group_by(k) %>%
  summarise(temp_var = paste(column, collapse = " ")) %>%
  separate(temp_var, into = c("var1", "var2"))
# A tibble: 4 × 3
k var1 var2
* <int> <chr> <chr>
1 1 a c
2 2 b c
3 3 a b
4 4 a c
Here's a possible vectorized base R approach
indx <- which(!is.na(zz), arr.ind = TRUE)
matrix(names(zz)[indx[order(indx[, "row"]), "col"]], ncol = 2, byrow = TRUE)
# [,1] [,2]
#[1,] "a" "c"
#[2,] "b" "c"
#[3,] "a" "b"
#[4,] "a" "c"
This finds non-NA indices, sorts by rows order and then subsets the names of your zz data set according to the sorted index. You can wrap it into as.data.frame if you prefer it over a matrix.
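The same steps can be split apart to inspect the intermediate index matrix. This is only a sketch of the approach above; sorted and res are illustrative names.

```r
zz <- data.frame(a = c(1, NA, 3, 5),
                 b = c(NA, 5, 4, NA),
                 c = c(5, 6, NA, 8))

# One (row, col) pair per non-NA cell, listed column by column
indx <- which(!is.na(zz), arr.ind = TRUE)

# Reorder the pairs so that all cells of row 1 come first, then row 2, ...
sorted <- indx[order(indx[, "row"]), ]

# Translate column positions to names and fill a two-column matrix row by row
res <- matrix(names(zz)[sorted[, "col"]], ncol = 2, byrow = TRUE)
res
```

Hard-coding ncol = 2 relies on the question's guarantee that every row has the same number of NAs.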
EDIT: transpose the data frame once before processing, so there is no need to transpose twice inside the loop as in the first version.
cols <- names(zz)
for (column in cols) {
  zz[[column]] <- ifelse(is.na(zz[[column]]), NA, column)
}
t_zz <- t(zz)
cols <- vector("list", length = ncol(t_zz))
for (i in seq_len(ncol(t_zz))) {
  cols[[i]] <- na.omit(t_zz[, i])
}
new_dt <- as.data.frame(t(do.call("cbind", cols)))
The tricky part here is that your goal actually changes the data frame structure. The result of "remove the NAs in each row" has to be built row by row as a new data frame, since the values in each column of the result can come from different columns of the original data frame.
zz[1, ] is a one-row data frame; use t to convert it into a vector so that na.omit can be applied, then transpose back to a row.
I used two for loops, but for loops are not necessarily bad in R. The first one is vectorized within each column. The second one needs to be done row by row anyway.
EDIT: growing objects performs very badly in R. I knew I could use rbindlist from data.table, which takes a list of data frames, but the OP doesn't want new packages. My first attempt just used rbind, which could not take a list as input. Later I found that an alternative is to use do.call. It's still slower than rbindlist though.
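As a small illustration of the do.call point, the sketch below combines a list of data frames in one call rather than growing a result inside a loop. The list pieces is made-up example data, not taken from the answer.

```r
pieces <- list(data.frame(x = 1), data.frame(x = 2), data.frame(x = 3))

# do.call() passes the list elements to rbind as separate arguments,
# equivalent to rbind(pieces[[1]], pieces[[2]], pieces[[3]])
combined <- do.call(rbind, pieces)
combined
```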
