Compare two data frames by parts in R

I want to compare two data frames by parts. Here is an example of my data frames:
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = c(1,6,3,4), b=letters[1:4])
I would like to write a function that finds two sequential rows in a1 that also exist in data frame a2 (both columns have to match) and saves them in a new data frame.
Any help would be appreciated.

dual.matches <- match(a1$a, a2$a) == match(a1$b, a2$b)
sequential.dual.matches <- with(rle(dual.matches), rep(replace(values, lengths==1, FALSE), lengths))
a1[sequential.dual.matches, ]
# a b
# 3 3 c
# 4 4 d
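The answer above compares first-match positions from match(), which works when the values in each column of a2 are unique. As a hedged alternative, here is a minimal sketch that builds one key per row and then keeps only runs of at least two consecutive matching rows. The names key1, key2, in_both, runs, keep and result are illustrative, the sketch assumes the separator "\r" never occurs in the data, and, like the answer above, it does not require the rows to be adjacent in a2.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = c(1, 6, 3, 4), b = letters[1:4])

# One key per row, so both columns must match together
# (assumption: "\r" never appears in the data)
key1 <- paste(a1$a, a1$b, sep = "\r")
key2 <- paste(a2$a, a2$b, sep = "\r")
in_both <- key1 %in% key2

# Keep only runs of two or more consecutive matching rows of a1
runs <- rle(in_both)
keep <- rep(runs$values & runs$lengths >= 2, runs$lengths)
result <- a1[keep, ]
result
```

With the example data this should select rows 3 and 4 of a1, the same rows as the answer above.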

Related

Taking a subset of a main dataset based on the values of another data frame that is a subset of the main data frame

I have these two datasets: df as the main data frame and g as a second data frame that I created.
df = data.frame(x = seq(1,20,2),y = letters[1:10] )
df
g = data.frame(xx = c(2,3,4,5,7,8,9) )
I want to take a subset of the data frame df based on the values xx of the data frame g, as follows:
m = df[df$x==g$xx,]
But the result depends on the positions at which the values happen to line up in the two data frames, not on whether the values themselves match.
Output:
> m
x y
2 3 b
I don't know what error I am making.
Maybe you need to use %in% instead of ==
> df[df$x %in% g$xx,]
x y
2 3 b
3 5 c
4 7 d
5 9 e
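To see why == returns a single row here, the following sketch compares the two operators side by side. == works position by position and recycles the shorter vector (with a warning, because 10 is not a multiple of 7), whereas %in% tests membership. The names pos and mem are illustrative only.

```r
df <- data.frame(x = seq(1, 20, 2), y = letters[1:10])
g <- data.frame(xx = c(2, 3, 4, 5, 7, 8, 9))

# Element-wise comparison: df$x[i] is compared with the recycled g$xx[i]
pos <- suppressWarnings(df$x == g$xx)
which(pos)   # only position 2 lines up (3 == 3)

# Membership test: is each df$x value present anywhere in g$xx?
mem <- df$x %in% g$xx
which(mem)   # positions 2, 3, 4, 5 (values 3, 5, 7, 9)
```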
You can also use inner_join from dplyr:
library(dplyr)
df %>%
  inner_join(g, by = c("x" = "xx"))
intersect can be useful too, as long as its result is used to filter on the values rather than as row positions:
df[df$x %in% intersect(df$x, g$xx), ]
Using merge:
merge(df, g, by.x = "x", by.y = "xx")
x y
1 3 b
2 5 c
3 7 d
4 9 e

Using lapply to a list of data frames so that column names are alphabetical for binding together?

I have a list of about 20 data frames that I would like to combine into one big data frame. The problem is that the column order differs between some of the data frames (the names match; only the order does not).
I am trying to sort the column names of all the data frames alphabetically so that I can rbind them all together.
I am fairly new and may be going about it the wrong way. Any guidance would be appreciated
A fairly easy way of doing this is by using dplyr::bind_rows, which matches columns by name rather than by position:
lst <- list(
  data.frame(a = 'A1', b = 'B1', c = 'C1'),
  data.frame(a = 'A2', c = 'C2', b = 'B2'),
  data.frame(c = 'C3', b = 'B3', a = 'A3')
)
dplyr::bind_rows(lst)
which yields
a b c
1 A1 B1 C1
2 A2 B2 C2
3 A3 B3 C3
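For the lapply approach the question asks about, here is a minimal base R sketch that sorts the columns of each data frame alphabetically before binding. The names sorted and combined are illustrative, and the sketch assumes every data frame has the same set of column names. Note that rbind on data frames also matches columns by name, so the sorting mainly makes the final column order predictable.

```r
lst <- list(
  data.frame(a = "A1", b = "B1", c = "C1"),
  data.frame(a = "A2", c = "C2", b = "B2"),
  data.frame(c = "C3", b = "B3", a = "A3")
)

# Reorder the columns of every data frame alphabetically
sorted <- lapply(lst, function(d) d[, sort(names(d)), drop = FALSE])

# Stack all data frames into one
combined <- do.call(rbind, sorted)
combined
```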

Nested for loop leading to: Error in `[<-.data.frame`(`*tmp*`, ...): replacement has x rows, data has y

I have 6 data frames (dfs) with a lot of data of different biological groups and another 6 data frames (tax.dfs) with taxonomical information about those groups. I want to replace a column of each of the 6 dfs with a column with the scientific name of each species present in the 6 tax.dfs.
To do that I created two lists of the data frames and I'm trying to apply a nested for loop:
dfs <- list(df.birds, df.mammals, df.crocs, df.snakes, df.turtles, df.lizards)
tax.dfs <- list(tax.birds,tax.mammals, tax.crocs, tax.snakes, tax.turtles, tax.lizards )
for(i in dfs){
for(y in tax.dfs){
i[,1] <- y[,2]
}}
And this is the output I'm getting:
Error in `[<-.data.frame`(`*tmp*`, , 1, value = c("Aotus trivirgatus", :
replacement has 64 rows, data has 43
But both data frames have the same number of rows; I actually used dfs to create tax.dfs by applying the tnrs_match_names function from the rotl package.
Any suggestions on how to fix this error, or on another way to do what I need, would be greatly appreciated.
Thank You!
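The nested loop pairs every data frame in dfs with every data frame in tax.dfs, not only with its counterpart, which is most likely what raises this error as soon as two groups with different row counts meet. For what it is worth, to iterate over two lists simultaneously, the following works: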
Example Data:
df1 <- data.frame(a=1, b=2)
df2 <- data.frame(c=3, d=4)
df3 <- data.frame(e=5, f=6)
df_1 <- data.frame(a='A', b='B')
df_2 <- data.frame(c='C', d='D')
df_3 <- data.frame(e='E', f='F')
dfs <- list(df1, df2, df3)
df_s <- list(df_1, df_2, df_3)
Using mapply:
out <- mapply(function(one, two) {
  one[,1] <- two[,2]
  return(one)
}, dfs, df_s, SIMPLIFY = FALSE)
out
[[1]]
a b
1 B 2
[[2]]
c d
1 D 4
[[3]]
e f
1 F 6
Here, one and two in mapply correspond to the different elements in dfs and df_s. Having said that, let's make it a bit more interesting. Let's change my third example to the following:
df_3 <- data.frame(e=c('E', 'e'), f=c('F', 'f'))
df_s <- list(df_1, df_2, df_3) # needs to be executed again
Now, let's adjust the function:
out <- mapply(function(one, two) {
  if(nrow(one) != nrow(two)){return('Wrong dimensions')}
  one[,1] <- two[,2]
  return(one)
}, dfs, df_s, SIMPLIFY = FALSE)
out
[[1]]
a b
1 B 2
[[2]]
c d
1 D 4
[[3]]
[1] "Wrong dimensions"
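If a for loop is preferred, a hedged sketch of the same idea is to loop over positions with seq_along and write the result back into the list. Assigning to the loop variable, as in the question, only changes a copy. The toy data below stands in for the real dfs and tax.dfs, which are not shown in the question.

```r
# Toy stand-ins for the real lists (hypothetical data)
dfs <- list(data.frame(a = 1, b = 2), data.frame(c = 3, d = 4))
tax.dfs <- list(data.frame(a = "A", b = "B"), data.frame(c = "C", d = "D"))

# Both lists must line up element by element
stopifnot(length(dfs) == length(tax.dfs))

for (k in seq_along(dfs)) {
  # Fail early if a pair does not have matching row counts
  stopifnot(nrow(dfs[[k]]) == nrow(tax.dfs[[k]]))
  # Write back into the list so the change persists after the loop
  dfs[[k]][, 1] <- tax.dfs[[k]][, 2]
}
dfs
```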

Change all cases of elements and headers in a list of data frames to lowercase

I have a list of data frames:
mylist<-list(df1=data.frame(Var1=c("A","b","c"), var2=
c("a","b","c")), df2= data.frame(var1 = c("a","B","c"),
VAr2=c("A","B","C")))
I would like to change the column headings and every string element to lowercase. (This way, when I merge the data frames, all variables merge correctly and 'cat' vs 'Cat' are not read as different entries.)
The output would look like:
mylist<-list(df1=data.frame(var1=c("a","b","c"), var2=
c("a","b","c")), df2= data.frame(var1 = c("a","b","c"),
var2=c("a","b","c")))
I have tried the following:
cleandf <- lapply(mylist, function(x) tolower(mylist[x])
Here is a similar approach to Masoud's answer
lapply(mylist, function(x) {
  names(x) <- tolower(names(x))
  x[] <- lapply(x, tolower)
  x
})
#$df1
# var1 var2
#1 a a
#2 b b
#3 c c
#$df2
# var1 var2
#1 a a
#2 b b
#3 c c
The first lapply iterates over your list. For each data frame - represented by the x - we first change its column names. The second lapply then applies tolower to each column of your data frames.
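One caveat: tolower returns character vectors, so applying it to every column also turns numeric columns into text. A minimal sketch that lowercases only character and factor columns could look like this. The helper lower_df and the numeric column N are hypothetical additions for illustration and are not part of the original data.

```r
mylist <- list(
  df1 = data.frame(Var1 = c("A", "b", "c"), N = 1:3),
  df2 = data.frame(var1 = c("a", "B", "c"), VAr2 = c("A", "B", "C"))
)

# Hypothetical helper: lowercase the names and the text columns only
lower_df <- function(d) {
  names(d) <- tolower(names(d))
  is_text <- vapply(d, function(col) is.character(col) || is.factor(col),
                    logical(1))
  d[is_text] <- lapply(d[is_text], function(col) tolower(as.character(col)))
  d
}

cleaned <- lapply(mylist, lower_df)
cleaned
```

Factor columns come back as character vectors in this sketch; wrap the result in factor() if factors are needed.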

Compare 2 tables and extract rows only unique to table 2

I have two matrices with differing row numbers and column numbers. Can I compare row names and extract only the rows in table 2 that are not in table 1?
i.e.
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3], c=letters[4:6])
a3 <- as.matrix(a1)
a4 <- as.matrix (a2)
row.names(a3) <- c("chr1:981994", "chr1:1025751", "chr1:1026919", "chr1:1118414", "chr1:1119410" )
row.names(a4) <- c("chr1:1118414", "chr1:1119410", "chr1:1216877")
So I want to compare the two and create a new matrix containing the last row of table 2, as it is unique to table 2.
We can use %in% to compare between the row names
a4[!row.names(a4) %in% row.names(a3), , drop=FALSE]
# a b c
#chr1:1216877 "3" "c" "f"
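An equivalent way to express this, sketched below, is to compute the row names that occur only in the second matrix with setdiff and then index by name. The names only_in_a4 and unique_rows are illustrative. Note that setdiff drops duplicates and that indexing by name picks the first match, so this assumes row names are unique.

```r
a3 <- as.matrix(data.frame(a = 1:5, b = letters[1:5]))
a4 <- as.matrix(data.frame(a = 1:3, b = letters[1:3], c = letters[4:6]))
rownames(a3) <- c("chr1:981994", "chr1:1025751", "chr1:1026919",
                  "chr1:1118414", "chr1:1119410")
rownames(a4) <- c("chr1:1118414", "chr1:1119410", "chr1:1216877")

# Row names present in a4 but not in a3
only_in_a4 <- setdiff(rownames(a4), rownames(a3))

# drop = FALSE keeps the result a matrix even when one row remains
unique_rows <- a4[only_in_a4, , drop = FALSE]
unique_rows
```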
