How to collect outputs of vector-valued function into a dataframe? - r

I have a function f1 that takes a number k as input and returns the 3 numbers k, k+1, k+2. How can I collect these results into a data frame for k from 1 to 10, so that row k corresponds to the output of f1(k)?
f1 <- function(k){
return (c(k, k+1, k+2))
}
f1(1)
f1(2)

An option is to Vectorize the function 'f1' and pass it the values 1 to 10; this returns a matrix, which we then convert to a data.frame with as.data.frame:
as.data.frame(Vectorize(f1)(1:10))
If each value of k should correspond to a row rather than a column, transpose the output before applying as.data.frame:
as.data.frame(t(Vectorize(f1)(1:10)))
Output:
# V1 V2 V3
#1 1 2 3
#2 2 3 4
#3 3 4 5
#4 4 5 6
#5 5 6 7
#6 6 7 8
#7 7 8 9
#8 8 9 10
#9 9 10 11
#10 10 11 12
Or we can use outer
as.data.frame(outer(1:10, 0:2, `+`))
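A closely related base-R variant (a sketch of my own, not part of the answer above): vapply is a type-safe cousin of sapply and also returns a 3 x 10 matrix here, so we transpose before coercing.
# numeric(3) declares that each call to f1 must return 3 numbers
as.data.frame(t(vapply(1:10, f1, numeric(3))))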

You can also use:
as.data.frame(do.call(rbind,lapply(1:10,f1)))
Output:
V1 V2 V3
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6
5 5 6 7
6 6 7 8
7 7 8 9
8 8 9 10
9 9 10 11
10 10 11 12
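If purrr is available, a similar one-liner is possible (a sketch under that assumption): each call is turned into a 1-row data frame and map_dfr row-binds them.
# t(f1(k)) is a 1 x 3 matrix, so each element becomes a 1-row data frame
purrr::map_dfr(1:10, ~ as.data.frame(t(f1(.x))))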

Related

selecting common columns from different elements of a list

I have a data set in list format. The list is divided into 20 elements, and each element contains 12 rows and some columns. Now I want to extract the columns common to every element of the list and make a new data set. I have tried to make a reproducible example; please see the code:
a<-data.frame(x=(1:10),y=(1:10),z=(1:10))
b<-data.frame(x=(1:10),y=(1:10),n=(1:10))
c<-data.frame(x=(1:10),y=(1:10),q=(1:10))
data<-list(a,b,c)
library(plyr)
data1<-ldply(data)
required_data<-data1[,-3:-5]
Find the common columns using Reduce, subset them from each list element, and bind the results together:
cols <- Reduce(intersect, lapply(data, colnames))
do.call(rbind, lapply(data, `[`, cols))
# x y
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#6 6 6
#7 7 7
#8 8 8
#9 9 9
#10 10 10
#11 1 1
#...
The last step can also be performed using
purrr::map_df(data, `[`, cols)
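As a small usage sketch, map_df can also record which list element each row came from via its .id argument (the column name "source" is my own choice):
# adds a "source" column containing "1", "2", "3" for the three list elements
purrr::map_df(data, `[`, cols, .id = "source")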
With base R, you can first find the names in common, i.e. names that appear in more than one list element (for a strict intersection across all elements, compare against length(data) instead):
commonName <- names((r<-table(unlist(Map(names,data))))[r>1])
then retrieve those columns from the list and combine them (similar to the second step in the solution by @Ronak Shah):
res <- Reduce(rbind,lapply(data, '[',commonName))
which gives:
> res
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 1 1
12 2 2
13 3 3
14 4 4
15 5 5
16 6 6
17 7 7
18 8 8
19 9 9
20 10 10
21 1 1
22 2 2
23 3 3
24 4 4
25 5 5
26 6 6
27 7 7
28 8 8
29 9 9
30 10 10
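An alternative sketch with data.table, assuming that package is installed: rbindlist stacks the column-subset data frames and returns a data.table.
library(data.table)
cols <- Reduce(intersect, lapply(data, colnames))
# rbindlist() row-binds the list of subset data frames
rbindlist(lapply(data, `[`, cols))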

Function to remove columns with max value less than a given value

I'm doing initial data clean-up on a data frame with 34,000 columns, and in order to do that I have to remove columns whose max value is less than 2.
I'm clueless as to how to remove columns whose max value is less than 2, but just for getting the max values I tried creating a function as below (at first without converting the data with is.numeric):
protein <- is.numeric(protein)
#a:
colMax <- function(data) sapply(data, max, na.rm = TRUE)
colMax(protein)
I got the 'max not meaningful for factors' error, which is why I used the is.numeric function, expecting it to convert all the data to numeric form. Despite doing that, I am still not getting the desired result: when running the function I got 0 as a result rather than a list of max values for each column.
Why am I getting 0 from my max function? How do I set up a function that can generate the max values for each column and remove any columns whose max values are less than 2? Would I need 2 separate functions?
Here is another way, using dplyr, to select columns whose max value is greater than or equal to 2, assuming we want to test all the columns and that all of them are of class factor. Using @Maurits' data:
library(dplyr)
df %>%
  # Convert each column from factor to numeric
  mutate_all(~as.numeric(as.character(.))) %>%
  # Keep columns whose max value is greater than or equal to 2
  select_if(~max(., na.rm = TRUE) >= 2)
# V2 V3 V4 V5 V6 V7 V8 V9 V10
#1 2 3 4 5 6 7 8 9 10
#2 2 3 4 5 6 7 8 9 10
#3 2 3 4 5 6 7 8 9 10
#4 2 3 4 5 6 7 8 9 10
#5 2 3 4 5 6 7 8 9 10
#6 2 3 4 5 6 7 8 9 10
#7 2 3 4 5 6 7 8 9 10
#8 2 3 4 5 6 7 8 9 10
#9 2 3 4 5 6 7 8 9 10
#10 2 3 4 5 6 7 8 9 10
Instead of max, we can also use any
df %>%
  mutate_all(~as.numeric(as.character(.))) %>%
  select_if(~any(. >= 2))
You say that you have 34,000 columns. Do you want to check the greater-than-or-equal-to-2 condition for all the columns? Are all the columns factors? The above code checks all the columns and keeps the ones that satisfy the condition. If you want to do this only on selected columns (not all of them), you might need to subset the data, select those columns, and then apply the code.
In base R, we can also use colSums after converting the data from factor to numeric
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
df[, colSums(df >= 2) > 0]
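Another base-R option, shown only as a sketch: Filter treats the data frame as a list of columns and keeps the ones for which the predicate returns TRUE, and the result is still a data frame.
# keep columns whose maximum (ignoring NAs) is at least 2
Filter(function(x) max(x, na.rm = TRUE) >= 2, df)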
You were nearly there.
Since you don't provide reproducible sample data, let's first create some minimal sample data:
df <- as.data.frame(matrix(rep(1:10, each = 10), ncol = 10))
df
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#1 1 2 3 4 5 6 7 8 9 10
#2 1 2 3 4 5 6 7 8 9 10
#3 1 2 3 4 5 6 7 8 9 10
#4 1 2 3 4 5 6 7 8 9 10
#5 1 2 3 4 5 6 7 8 9 10
#6 1 2 3 4 5 6 7 8 9 10
#7 1 2 3 4 5 6 7 8 9 10
#8 1 2 3 4 5 6 7 8 9 10
#9 1 2 3 4 5 6 7 8 9 10
#10 1 2 3 4 5 6 7 8 9 10
We now would like to keep only those columns where the max value is >2; we can do this using sapply
df[sapply(df, function(x) max(x, na.rm = T) > 2)]
# V3 V4 V5 V6 V7 V8 V9 V10
#1 3 4 5 6 7 8 9 10
#2 3 4 5 6 7 8 9 10
#3 3 4 5 6 7 8 9 10
#4 3 4 5 6 7 8 9 10
#5 3 4 5 6 7 8 9 10
#6 3 4 5 6 7 8 9 10
#7 3 4 5 6 7 8 9 10
#8 3 4 5 6 7 8 9 10
#9 3 4 5 6 7 8 9 10
#10 3 4 5 6 7 8 9 10
Explanation: sapply loops over the columns of the data.frame df and returns a logical vector (with as many entries as there are columns in df).
Or we can use apply on the comparison directly
df[apply(df > 2, 2, any)]
giving the same result. The difference to the first method is that df > 2 produces a logical matrix on which we operate column-wise with apply(..., MARGIN = 2, ...); a column has any value above 2 exactly when its max is above 2.
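On the 'Would I need 2 separate functions?' part: not necessarily. A single helper can compute the column maxima and do the dropping in one go; the sketch below (the name drop_low_max is my own) just wraps the sapply approach above.
# Drop columns whose maximum value (ignoring NAs) is below `cutoff`
drop_low_max <- function(data, cutoff = 2) {
  data[sapply(data, function(x) max(x, na.rm = TRUE) >= cutoff)]
}
drop_low_max(df)  # keeps V2:V10 here; only V1 (max 1) falls below the cutoff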

Merge 2 rows with duplicated pair of values into a single row

I have the data frame below, in which there are 2 pairs of rows with the same values in columns A and B: the 3rd and 4th rows (with A = 2, B = 3) and the 7th and 8th rows (with A = 4, B = 6).
master <- data.frame(A=c(1,1,2,2,3,3,4,4,5,5), B=c(1,2,3,3,4,5,6,6,7,8),C=c(5,2,5,7,7,5,7,9,7,8),D=c(1,2,5,3,7,5,9,6,7,0))
A B C D
1 1 1 5 1
2 1 2 2 2
3 2 3 5 5
4 2 3 7 3
5 3 4 7 7
6 3 5 5 5
7 4 6 7 9
8 4 6 9 6
9 5 7 7 7
10 5 8 8 0
I would like to merge each such pair of rows into one by joining the values of C and D with the pipe | separator. The 2nd and 3rd rows, for example, would become something like:
A B C D
2 3 2|5 2|5
I think your combined pairs are off by a row in your example; assuming that's the case, this is what you're looking for. We group by the columns whose duplicates we want to collapse, and then use summarize_all with paste0 to combine the values with a separator.
library(tidyverse)
master %>%
  group_by(A, B) %>%
  summarize_all(funs(paste0(., collapse = "|")))
A B C D
<dbl> <dbl> <chr> <chr>
1 1 1 5 1
2 1 2 2 2
3 2 3 5|7 5|3
4 3 4 7 7
5 3 5 5 5
6 4 6 7|9 9|6
7 5 7 7 7
8 5 8 8 0
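As a side note, funs() is deprecated in recent dplyr releases; an equivalent sketch with the newer across() interface (behaviour should match the call above) is:
master %>%
  group_by(A, B) %>%
  # collapse every non-grouping column with "|"
  summarise(across(everything(), ~ paste0(.x, collapse = "|")), .groups = "drop")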
We can do this in base R with aggregate
aggregate(.~ A + B, master, FUN = paste, collapse= '|')
# A B C D
#1 1 1 5 1
#2 1 2 2 2
#3 2 3 5|7 5|3
#4 3 4 7 7
#5 3 5 5 5
#6 4 6 7|9 9|6
#7 5 7 7 7
#8 5 8 8 0
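A data.table sketch as well, assuming that package is installed; .SD holds the non-grouping columns, which are collapsed per (A, B) group.
library(data.table)
as.data.table(master)[, lapply(.SD, paste, collapse = "|"), by = .(A, B)]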

Removing rows from each dataframe in list with condition in R

I have a list like this:
df1 <- data.frame(a=c(NA, NA, 1:10), b=c(NA, 1:11))
df2 <- data.frame(a=1:10, b=c(NA,1:9))
mylist <- list(df1, df2)
> mylist
[[1]]
a b
1 NA NA
2 NA 1
3 1 2
4 2 3
5 3 4
6 4 5
7 5 6
8 6 7
9 7 8
10 8 9
11 9 10
12 10 11
[[2]]
a b
1 1 NA
2 2 1
3 3 2
4 4 3
5 5 4
6 6 5
7 7 6
8 8 7
9 9 8
10 10 9
I'd like to remove all rows with more than 1 NA in a row in each data frame. How can I do that?
I found out how to delete rows
lapply(mylist, `[`, -1,)
and how to calculate the sum of NAs
NAsums <- function(x) {rowSums(is.na(x))}
lapply(mylist, NAsums)
But I can't figure out how to combine the two steps.
We loop through the list (lapply), use rowSums to get the number of NA elements in each row, convert to a logical vector (<2), and use that to subset the rows.
lapply(mylist, function(x) x[rowSums(is.na(x))<2,])
#[[1]]
# a b
#2 NA 1
#3 1 2
#4 2 3
#5 3 4
#6 4 5
#7 5 6
#8 6 7
#9 7 8
#10 8 9
#11 9 10
#12 10 11
#[[2]]
# a b
#1 1 NA
#2 2 1
#3 3 2
#4 4 3
#5 5 4
#6 6 5
#7 7 6
#8 8 7
#9 9 8
#10 10 9
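For completeness, a sketch that reuses the NAsums helper from the question, so the two steps the asker already had are combined directly:
# NAsums(x) counts the NAs per row; keep rows with fewer than 2
lapply(mylist, function(x) x[NAsums(x) < 2, , drop = FALSE])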

Eliminate rows in a data frame in an increasing order

x<-c(4,5,6,23,5,6,7,8,0,3)
y<-c(2,4,5,6,23,5,6,7,8,0)
z<-c(1,2,4,5,6,23,5,6,7,8)
df<-data.frame(x,y,z)
df
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 23 6 5
5 5 23 6
6 6 5 23
7 7 6 5
8 8 7 6
9 0 8 7
10 3 0 8
I would like to eliminate the number 23 from all columns of df, not by matching the value 23 itself, but by its position: remove one row per column at sequentially increasing locations (row 4 from x, row 5 from y, row 6 from z). The desired result:
x y z
1 4 2 1
2 5 4 2
3 6 5 4
4 5 6 5
5 6 5 6
6 7 6 5
7 8 7 6
8 0 8 7
9 3 0 8
Thank you
You can iterate through the columns and remove the element from each, then reassemble as a data frame:
result <- as.data.frame(lapply(1:ncol(df), function(x) df[-(x+3),x]))
names(result) <- names(df)
result
## x y z
## 1 4 2 1
## 2 5 4 2
## 3 6 5 4
## 4 5 6 5
## 5 6 5 6
## 6 7 6 5
## 7 8 7 6
## 8 0 8 7
## 9 3 0 8
df[-(x+3),x] is column x with the row removed by position (for column x, row x+3 is dropped). To start the removals at row N instead of row 4, you would use df[-(x+N-1),x].
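For example, to start the removals at row 2 instead of row 4 (N = 2), the same pattern becomes (a small sketch):
result2 <- as.data.frame(lapply(seq_along(df), function(x) df[-(x + 1), x]))
names(result2) <- names(df)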
You could also try:
n <- 4
df1 <- df[-n,]
df1[] <- unlist(df,use.names=FALSE)[-seq(n, prod(dim(df)), by=nrow(df)+1)]
df1
# x y z
#1 4 2 1
#2 5 4 2
#3 6 5 4
#5 5 6 5
#6 6 5 6
#7 7 6 5
#8 8 7 6
#9 0 8 7
#10 3 0 8
