Selecting common columns from different elements of a list in R

I have a data set in list format. The list is further divided into 20 elements, and each element contains 12 rows and some columns. Now I want to extract the common columns from each element of the list and make a new data set. I have tried to make a reproducible example; please see the code below:
library(plyr)  # needed for ldply()
a <- data.frame(x = 1:10, y = 1:10, z = 1:10)
b <- data.frame(x = 1:10, y = 1:10, n = 1:10)
c <- data.frame(x = 1:10, y = 1:10, q = 1:10)
data <- list(a, b, c)
data1 <- ldply(data)
required_data <- data1[, -3:-5]

Find the common columns using Reduce(), subset them from each element of the list, and bind the results together:
cols <- Reduce(intersect, lapply(data, colnames))
do.call(rbind, lapply(data, `[`, cols))
# x y
#1 1 1
#2 2 2
#3 3 3
#4 4 4
#5 5 5
#6 6 6
#7 7 7
#8 8 8
#9 9 9
#10 10 10
#11 1 1
#...
The last step can also be performed using
purrr::map_df(data, `[`, cols)
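If dplyr is available, its bind_rows() is another drop-in for the binding step (a minor variation on the same idea):
# same approach, using dplyr's row-binding instead of rbind/map_df
dplyr::bind_rows(lapply(data, `[`, cols))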

With base R, you can first find the names in common:
commonName <- names((r <- table(unlist(Map(names, data))))[r > 1])
Then retrieve those columns from the list and combine them (similar to the second step in the solution by @Ronak Shah):
res <- Reduce(rbind, lapply(data, '[', commonName))
which gives:
> res
x y
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 1 1
12 2 2
13 3 3
14 4 4
15 5 5
16 6 6
17 7 7
18 8 8
19 9 9
20 10 10
21 1 1
22 2 2
23 3 3
24 4 4
25 5 5
26 6 6
27 7 7
28 8 8
29 9 9
30 10 10
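As a side note, r > 1 keeps any column name that occurs in at least two list elements. If you want only the columns present in every element, you can tighten the condition:
# require the name to occur in all list elements, not just more than one
commonName <- names(which(table(unlist(Map(names, data))) == length(data)))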

Match values based on multiple conditions from dataframes of different sizes in R

I have two dataframes of different sizes. Example:
t1 <- data.frame("id"=c(1,1,1,2,2,2,4,5,5,5,6,7,8),"condition"=c(3,3,1,5,5,5,10,10,5,5,2,3,1) )
t2 <- data.frame("ind"=c(1,2,4,5,6,7,8),"test_c"=c(3,5,10,10,2,3,1), "time"=c(32,55,21,34,55,22,19))
I would like to match the cases based on two criteria:
t1$id == t2$ind and t1$condition == t2$test_c, and create an additional column in t1 containing the value of t2$time wherever both conditions hold.
Expected outcome:
t3 <- data.frame("id"=c(1,1,1,2,2,2,4,5,5,5,6,7,8),"condition"=c(3,3,1,5,5,5,10,10,5,5,2,3,1), "time"=c(32,32,NA,55,55,55,21,34,NA,NA,55,22,19))
I suspect I should use merge or match functions but I am not sure which would be the right approach.
Base R
> out <- merge(t1, t2, by.x=c("id","condition"), by.y=c("ind","test_c"), all.x=TRUE)
> out
id condition time
1 1 1 NA
2 1 3 32
3 1 3 32
4 2 5 55
5 2 5 55
6 2 5 55
7 4 10 21
8 5 5 NA
9 5 5 NA
10 5 10 34
11 6 2 55
12 7 3 22
13 8 1 19
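Note that merge() sorts the result by the join columns by default, which is why the rows above no longer follow t1's original order. A small sketch (using a hypothetical helper column row_id) shows one way to restore it:
# carry an index through the merge, then restore t1's row order
t1b <- transform(t1, row_id = seq_len(nrow(t1)))
out2 <- merge(t1b, t2, by.x = c("id", "condition"), by.y = c("ind", "test_c"), all.x = TRUE)
out2 <- out2[order(out2$row_id), setdiff(names(out2), "row_id")]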
dplyr
library(dplyr)
left_join(t1, t2, by = c("id" = "ind", "condition" = "test_c"))
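If data.table is an option (assuming the package is installed), an update join does the same lookup in place:
library(data.table)
t1_dt <- as.data.table(t1)
t2_dt <- as.data.table(t2)
# update join: pull time from t2_dt for matching (id, condition) pairs
t1_dt[t2_dt, time := i.time, on = c(id = "ind", condition = "test_c")]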
Differences with your t3
There are some differences between them. For display, I'll show them side by side, sorted so that the comparison is easier.
cbind(out[with(out,order(id,condition)),], t3[with(t3,order(id,condition)),])
# id condition time id condition time
# 1 1 1 NA 1 1 NA
# 2 1 3 32 1 3 32
# 3 1 3 32 1 3 32
# 4 2 5 55 2 5 55
# 5 2 5 55 2 5 NA
# 6 2 5 55 2 5 NA
# 7 4 10 21 4 10 21
# 8 5 5 NA 5 5 NA
# 9 5 5 NA 5 5 NA
# 10 5 10 34 5 10 34
# 11 6 2 55 6 2 55
# 12 7 3 22 7 3 22
# 13 8 1 19 8 1 19
The only differences are at id=2, condition=5, where the merge assigns all of those rows the same time=55, while your t3 fills in only the first of them. I don't think this is "first only" logic, as there are other repeated id/condition pairs that do not get the same treatment. I suspect this is just a mistake in the sample data, or perhaps there is post-merge processing you haven't told us about yet :-)
If you want to use match, you can combine it with interaction (or paste) so that the match is made on multiple columns:
t1$time <- t2[match(interaction(t1), interaction(t2[-3])), 3]
t1
# id condition time
#1 1 3 32
#2 1 3 32
#3 1 1 NA
#4 2 5 55
#5 2 5 55
#6 2 5 55
#7 4 10 21
#8 5 10 34
#9 5 5 NA
#10 5 5 NA
#11 6 2 55
#12 7 3 22
#13 8 1 19
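For completeness, here is a sketch of the paste() variant mentioned above; it builds the same composite key with paste() instead of interaction():
t1$time <- t2$time[match(paste(t1$id, t1$condition), paste(t2$ind, t2$test_c))]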

How to collect outputs of vector-valued function into a dataframe?

I have a function f1 that takes a number k as input and returns the 3 numbers k, k+1, k+2. How can I concatenate these results into a data frame for k from 1 to 10, so that row k corresponds to the output of f1(k)?
f1 <- function(k){
  return(c(k, k + 1, k + 2))
}
f1(1)
f1(2)
An option is to Vectorize the function f1 and pass it the values 1 to 10; this returns a matrix, which can then be converted to a data.frame with as.data.frame:
as.data.frame(Vectorize(f1)(1:10))
If you want one row per k, transpose the output and then apply as.data.frame:
as.data.frame(t(Vectorize(f1)(1:10)))
Output:
# V1 V2 V3
#1 1 2 3
#2 2 3 4
#3 3 4 5
#4 4 5 6
#5 5 6 7
#6 6 7 8
#7 7 8 9
#8 8 9 10
#9 9 10 11
#10 10 11 12
Or we can use outer
as.data.frame(outer(1:10, 0:2, `+`))
You can also use:
as.data.frame(do.call(rbind,lapply(1:10,f1)))
Output:
V1 V2 V3
1 1 2 3
2 2 3 4
3 3 4 5
4 4 5 6
5 5 6 7
6 6 7 8
7 7 8 9
8 8 9 10
9 9 10 11
10 10 11 12
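A closely related base R variant (just a sketch) uses vapply(), which also checks that every call really returns 3 numbers:
# vapply enforces that f1(k) returns a numeric vector of length 3
as.data.frame(t(vapply(1:10, f1, numeric(3))))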

From one vector delete all elements of another vector in R [duplicate]

This question already has answers here:
R: Remove the number of occurrences of values in one vector from another vector, but not all
(2 answers)
Closed 6 years ago.
I have 2 vectors
vec_1
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14 2 3 4 5 6 7 8 9
[35] 10 11 12 13 14 2 3 4 5 6 7 8 9 10 11 12 13 14
vec_2
[1] 12 3 13 3 14 4 10 8 9 5 7 5 13 11 6 10 8 8 14 12 6 11 8 5 3 6
I want to delete all elements of vec_2 from vec_1.
And I'm sure that setdiff is not what I need because, for example, vec_2 contains two 10s, and I want to delete only two 10s from vec_1 (not all four occurrences of 10).
EDITED: expected output:
vec_1
[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
How can I do this in R?
Here is one idea via union:
unlist(sapply(union(vec_1, vec_2), function(i)
  rep(i, each = length(vec_1[vec_1 == i]) - length(vec_2[vec_2 == i]))))
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
Definitely not the best solution, but here is one way.
I created a simplified example.
vec1 <- c(1, 2, 3, 1, 1, 5)
vec2 <- c(1, 3, 5)
#Converting the frequency table to a data frame
x1 <- data.frame(table(vec1))
x2 <- data.frame(table(vec2))
#Assuming your vec1 has all the elements present in vec2
new_df <- merge(x1, x2, by.x = "vec1", by.y = "vec2", all.x = TRUE)
new_df
# vec1 Freq.x Freq.y
#1 1 3 1
#2 2 1 NA
#3 3 1 1
#4 5 1 1
#Replacing NA's by 0
new_df[is.na(new_df)] <- 0
#Subtracting the frequencies of common elements in two vectors
final <- cbind(new_df[1], new_df[2] - new_df[3])
final
# vec1 Freq.x
#1 1 2
#2 2 1
#3 3 0
#4 5 0
#Recreating a new vector based on the final dataframe
rep(final$vec1, times = final$Freq.x)
# [1] 1 1 2
You can do this using a simple for loop:
for (i in seq_along(vec2)) {
  pos <- which(vec1 %in% vec2[i])[1]  # first remaining occurrence of vec2[i]
  vec1 <- vec1[-pos]
}
You just identify the position of the first match and remove it from the original vector.
You can try this too:
for (el in vec2[vec2 %in% intersect(vec1, vec2)])
  vec1 <- vec1[-which(vec1 == el)[1]]
sort(vec1)
#[1] 2 2 2 2 3 4 4 4 5 6 7 7 7 9 9 9 10 10 11 11 12 12 13 13 14 14
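If a packaged solution is acceptable, the vecsets package (assuming it is installed from CRAN) provides multiset versions of the set operations; its vsetdiff() removes each element of vec_2 from vec_1 only as many times as it occurs:
library(vecsets)
# like setdiff(), but respecting multiplicities
sort(vsetdiff(vec_1, vec_2))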

How to generate an uneven sequence of numbers in R

Here's an example data frame:
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
I want to generate a sequence of numbers according to the number of observations of y per x group (e.g. there are 2 observations of y for x = 1). I want the sequence to be continuously increasing and to jump by 2 after each x group.
The desired output for this example would be:
1,2,5,6,7,10,11,14,17,20,21,22,25,26
How can I do this simply in R?
To expand on my comment: the groupings can be arbitrary, you simply need to recast them into the correct ordering. There are a few ways to do this; @akrun has shown that this can be accomplished with the match function, or you can make use of the as.numeric function if that is easier for you to follow.
df <- data.frame(x=c(1,1,2,2,2,3,3,4,5,6,6,6,9,9),y=c(1,2,3,4,6,3,7,8,6,4,3,7,3,2))
# these are equivalent
df$newx <- as.numeric(factor(df$x, levels=unique(df$x)))
df$newx <- match(df$x, unique(df$x))
Since you now have a "new" releveling which is sequential, we can use the logic that was discussed in the comments.
df$newNumber <- 1:nrow(df) + (df$newx-1)*2
For this example, this will result in the following dataframe:
x y newx newNumber
1 1 1 1
1 2 1 2
2 3 2 5
2 4 2 6
2 6 2 7
3 3 3 10
3 7 3 11
4 8 4 14
5 6 5 17
6 4 6 20
6 3 6 21
6 7 6 22
9 3 7 25
9 2 7 26
where df$newNumber is the output you wanted.
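The two steps can also be collapsed into a single base R line (same logic, no helper column):
# row index plus 2 for every completed x group before the current one
df$newNumber <- seq_len(nrow(df)) + (match(df$x, unique(df$x)) - 1) * 2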
To create the sequence 0, 0, 4, 4, 4, 9, ..., you are essentially taking the minimum newNumber of each group and subtracting 1. The easiest way to do this is with the dplyr package.
library(dplyr)
df %>%
  group_by(x) %>%
  mutate(newNumber2 = min(newNumber) - 1)
Which will have the output:
Source: local data frame [14 x 5]
Groups: x
x y newx newNumber newNumber2
1 1 1 1 1 0
2 1 2 1 2 0
3 2 3 2 5 4
4 2 4 2 6 4
5 2 6 2 7 4
6 3 3 3 10 9
7 3 7 3 11 9
8 4 8 4 14 13
9 5 6 5 17 16
10 6 4 6 20 19
11 6 3 6 21 19
12 6 7 6 22 19
13 9 3 7 25 24
14 9 2 7 26 24
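If you prefer to stay in base R for this last step, ave() computes the same grouped minimum (a sketch, reusing the newNumber column created above):
# per-group minimum of newNumber, repeated for every row of the group
df$newNumber2 <- ave(df$newNumber, df$x, FUN = min) - 1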

Remove rows from a single-column data frame

When I try to remove the last row from a single column data frame, I get a vector back instead of a data frame:
> df = data.frame(a=1:10)
> df
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
> df[-(length(df[,1])),]
[1] 1 2 3 4 5 6 7 8 9
The behavior I'm looking for is what happens when I use this command on a two-column data frame:
> df = data.frame(a=1:10,b=11:20)
> df
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
> df[-(length(df[,1])),]
a b
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
My code is general, and I don't know a priori whether the data frame will contain one or many columns. Is there an easy workaround for this problem that will let me remove the last row no matter how many columns exist?
Try adding the drop = FALSE option:
R> df[-(length(df[,1])), , drop = FALSE]
a
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
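Two other idioms give the same result and also keep the data.frame class (minor variations on the same theme):
df[-nrow(df), , drop = FALSE]  # nrow() is a clearer way to index the last row
head(df, -1)                   # a negative n drops the last row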
