Compare 2 tables and extract rows only unique to table 2 - r

I have two matrices with different numbers of rows and columns. Can I compare their row names and extract only the rows of table 2 that are not in table 1?
For example:
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = 1:3, b=letters[1:3], c=letters[4:6])
a3 <- as.matrix(a1)
a4 <- as.matrix(a2)
row.names(a3) <- c("chr1:981994", "chr1:1025751", "chr1:1026919", "chr1:1118414", "chr1:1119410" )
row.names(a4) <- c("chr1:1118414", "chr1:1119410", "chr1:1216877")
I then want to compare the two and create a new matrix containing only the last row of table 2, since it is unique to table 2.

We can use %in% to compare the row names:
a4[!row.names(a4) %in% row.names(a3), , drop=FALSE]
# a b c
#chr1:1216877 "3" "c" "f"
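The same idea can be wrapped in a small reusable function. The sketch below uses a hypothetical helper, rows_only_in (not from the original answer), that takes setdiff() of the row names instead of %in%. It assumes the row names are unique: indexing a matrix by a character name returns only the first row with that name, so with duplicated row names the logical %in% version above is the safer choice.

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = 1:3, b = letters[1:3], c = letters[4:6])
a3 <- as.matrix(a1)
a4 <- as.matrix(a2)
rownames(a3) <- c("chr1:981994", "chr1:1025751", "chr1:1026919",
                  "chr1:1118414", "chr1:1119410")
rownames(a4) <- c("chr1:1118414", "chr1:1119410", "chr1:1216877")

# Hypothetical helper: rows of m2 whose row names do not occur in m1.
# Assumes unique row names; drop = FALSE keeps a one-row result as a matrix.
rows_only_in <- function(m2, m1) {
  new_names <- setdiff(rownames(m2), rownames(m1))
  m2[new_names, , drop = FALSE]
}

rows_only_in(a4, a3)
```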


Find the column that contains the value of a specific variable

I have this dataframe:
a <- c(2,5,90,77,56,65,85,75,12,24,52,32)
b <- c(45,78,98,55,63,12,23,38,75,68,99,73)
c <- c(77,85,3,22,4,69,86,39,78,36,96,11)
d <- c(52,68,4,25,79,120,97,20,7,19,37,67)
e <- c(14,73,91,87,94,38,1,685,47,102,666,74)
df <- data.frame(a,b,c,d,e)
and this variable:
bb <- 120
I need to know the number of the column of df that contains the value of the variable bb. How can I do this?
Thx everyone!
We could compare df with bb to get a logical matrix, then use which with arr.ind = TRUE to get the row/column indices of the TRUE cells. The second column of that result is the column index:
which(df == bb, arr.ind = TRUE)[,2]
col
4
If the value occurs more than once in the same column, wrap the result in unique so that each column index is returned only once:
unique(which(df == bb, arr.ind = TRUE)[,2])
[1] 4
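We could also use grep, with a caveat: it turns each column into a single string (the deparsed vector) and does a regular-expression match on it, so it also matches substrings. With bb <- 12, for example, it would also report column d because that column contains 120.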
grep(bb, df)
[1] 4
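For an exact comparison that does not depend on how each column is printed, a minimal sketch tests every column separately with vapply(). The helper name cols_with_value is hypothetical. It uses exact ==, so non-integer doubles may need a tolerance-based comparison instead, and na.rm = TRUE stops a column with missing values from producing NA.

```r
df <- data.frame(
  a = c(2, 5, 90, 77, 56, 65, 85, 75, 12, 24, 52, 32),
  b = c(45, 78, 98, 55, 63, 12, 23, 38, 75, 68, 99, 73),
  c = c(77, 85, 3, 22, 4, 69, 86, 39, 78, 36, 96, 11),
  d = c(52, 68, 4, 25, 79, 120, 97, 20, 7, 19, 37, 67),
  e = c(14, 73, 91, 87, 94, 38, 1, 685, 47, 102, 666, 74)
)
bb <- 120

# Hypothetical helper: named indices of the columns that contain `value` exactly
cols_with_value <- function(df, value) {
  hits <- vapply(df, function(col) any(col == value, na.rm = TRUE), logical(1))
  which(hits)
}

cols_with_value(df, bb)  # d = 4
cols_with_value(df, 12)  # a = 1, b = 2 (unlike grep, column d is not reported)
```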

R: Check if multiple variables with the same pattern have the same values

I have some variables in my data frame whose names follow the same pattern and which should also have the same content. Now I want to check whether these variables have the same values in every row. In this example, I want to compare all variables whose names start with "a" and get TRUE if they are indeed all the same. How do I do that?
df = data.frame(
a1 = c(1,2,3),
nn22 = c(8,9,3),
a2 = c(1,2,3),
nn = c(8,9,3),
u6 = c(8,4,3),
o8 = c(3,9,1),
a3 = c(1,2,3),
a4 = c(1,2,3),
a5 = c(1,2,3),
a6 = c(1,2,3),
b= c(2,2,2))
We could split the columns into a list of data.frames according to their name prefix (the name with any trailing digits removed), loop over the list with sapply, and in each group compare the first column with all columns using ==. Wrapping the comparison in all checks whether every value is TRUE:
sapply(split.default(df, sub("\\d+$", "", names(df))), function(x) all(x[,1] == x))
# a b nn o u
#TRUE TRUE TRUE TRUE TRUE
If we only need to compare the columns whose names start with 'a':
dfa <- df[startsWith(names(df), 'a')]
all(dfa == dfa[,1])
#[1] TRUE
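A stricter variant is sketched below as a hypothetical helper, cols_identical (the name and its regex argument are illustrative, not from the answers above). It selects columns with a regular expression, so "^a\\d+$" matches a1 to a6 but would not pick up a column named, say, "ab", and compares each selected column with the first using identical(). Unlike ==, identical() returns TRUE when two columns have NA in the same positions, but it is also strict about type: an integer column 1:3 is not identical to the double column c(1, 2, 3).

```r
df <- data.frame(
  a1 = c(1, 2, 3), nn22 = c(8, 9, 3), a2 = c(1, 2, 3), nn = c(8, 9, 3),
  u6 = c(8, 4, 3), o8 = c(3, 9, 1), a3 = c(1, 2, 3), a4 = c(1, 2, 3),
  a5 = c(1, 2, 3), a6 = c(1, 2, 3), b = c(2, 2, 2)
)

# Hypothetical helper: TRUE if every column whose name matches `pattern`
# is identical to the first matching column (also TRUE for 0 or 1 matches)
cols_identical <- function(df, pattern) {
  cols <- df[grepl(pattern, names(df))]
  if (ncol(cols) < 2) return(TRUE)
  all(vapply(cols[-1], identical, logical(1), cols[[1]]))
}

cols_identical(df, "^a\\d+$")      # a1 to a6
cols_identical(df, "^(u|o)\\d+$")  # u6 and o8 differ
```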

Convert/combine a 2-column data frame into a 1-column data frame - R

The script below creates X, Y data stored in two columns (note that rbind on character vectors returns a character matrix, not a data.frame):
a1 <- as.character(c(3456,2569))
a2 <- as.character(c(956,569))
a3 <- as.character(c(156,269))
mydf <- rbind(a1, a2, a3)
How can I store it in a data.frame with a single column in the format "X, Y", appending three decimal zeros to each X and Y (as characters)?
The output would then be:
"3456.000, 2569.000"
"956.000, 569.000"
"156.000, 269.000"
Something like this could work:
data.frame(col1 = apply(mydf, 1, function(x) paste(paste0(x, '.000'), collapse = ', ')))
# col1
#a1 3456.000, 2569.000
#a2 956.000, 569.000
#a3 156.000, 269.000
apply iterates over the rows of the matrix. For each row, paste0 first appends ".000" to every number, and paste(..., collapse = ', ') then joins them into one comma-separated string.
Are all the numbers integers, or do some of them already have a decimal point? If it's the latter, you might want to do something like
sprintf("%.3f, %.3f", as.numeric(mydf[,1]), as.numeric(mydf[,2]))
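To get the one-column data.frame the question asks for while keeping sprintf's fixed three decimals, the two answers can be combined. This is a minimal sketch, assuming every entry of mydf can be converted with as.numeric(); the column name xy is an arbitrary choice. Formatting each row with apply() means the same code works for any number of columns and for values that already contain a decimal point.

```r
a1 <- as.character(c(3456, 2569))
a2 <- as.character(c(956, 569))
a3 <- as.character(c(156, 269))
mydf <- rbind(a1, a2, a3)  # character matrix with row names a1, a2, a3

# Convert to numbers while keeping the matrix shape and dimnames
num <- matrix(as.numeric(mydf), nrow = nrow(mydf), dimnames = dimnames(mydf))

# Format every value with three decimals, then join each row with ", "
xy <- apply(num, 1, function(row) paste(sprintf("%.3f", row), collapse = ", "))

out <- data.frame(xy = xy, row.names = rownames(num), stringsAsFactors = FALSE)
out
```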

Get only the structure (row names and column names) of a data set in R

Consider a data frame with row names and column names:
> data <- data.frame(a=1:3,b=2:4,c=3:5,row.names=c("x","y","z"))
> data
a b c
x 1 2 3
y 2 3 4
z 3 4 5
I just want to display the row names and column names of data like:
a b c
x
y
z
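Perhaps you need the following (note that it overwrites every value of data, so work on a copy if you still need the values):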
data[] <- ''
data
# a b c
#x
#y
#z
If we only need the names, dimnames is an option: it returns the row names and column names as a list.
dimnames(data)
#[[1]]
#[1] "x" "y" "z"
#[[2]]
#[1] "a" "b" "c"
Or maybe create an empty character matrix with the same row and column names:
m1 <- matrix("", ncol = ncol(data), nrow = nrow(data),
dimnames = list(rownames(data), colnames(data)) )
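Because data[] <- '' replaces the values of data itself, a hypothetical helper, show_skeleton (not part of the answers here), can instead build the empty matrix from dimnames() and print it without quotes, leaving the data frame untouched. This is a sketch; it returns the skeleton matrix invisibly in case you want to reuse it.

```r
data <- data.frame(a = 1:3, b = 2:4, c = 3:5, row.names = c("x", "y", "z"))

# Hypothetical helper: print only the row and column names of a data frame
show_skeleton <- function(x) {
  skeleton <- matrix("", nrow = nrow(x), ncol = ncol(x), dimnames = dimnames(x))
  print(skeleton, quote = FALSE)  # quote = FALSE hides the "" in every cell
  invisible(skeleton)
}

show_skeleton(data)
```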
If you only want to see the column names of your data set, use:
print(names(dataset_name))
To see its structure, use:
str(dataset_name)

Compare two data frames by parts - R

I want to compare two data frames by parts. Here is an example of my data frames:
a1 <- data.frame(a = 1:5, b=letters[1:5])
a2 <- data.frame(a = c(1,6,3,4), b=letters[1:4])
I would like to write a function that finds two consecutive rows in a1 that also exist in data frame a2 (both columns have to match) and saves them in a new data frame.
Any help would be appreciated.
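# TRUE where a1$a and a1$b are first found in the same row of a2 (NA if either is absent)
dual.matches <- match(a1$a, a2$a) == match(a1$b, a2$b)
# rle() splits the vector into runs; runs of length 1 are set to FALSE, so only
# runs of two or more consecutive matching rows stay TRUE
sequential.dual.matches <- with(rle(dual.matches), rep(replace(values, lengths==1, FALSE), lengths))
a1[sequential.dual.matches, ]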
# a b
# 3 3 c
# 4 4 d
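The match() comparison only looks at the first row of a2 where each value occurs, so a row of a1 can be missed when a2 repeats values in column a or b. A minimal sketch of an alternative, assuming both data frames share the listed column names: build one text key per row by pasting the columns together, test membership with %in%, and keep runs of at least min_run consecutive hits. The helper name sequential_common_rows, the "\r" separator and the min_run argument are illustrative choices, not part of the original answer. Since keys are compared as text, numeric columns must print identically in both data frames (3 and 3.0 both become "3").

```r
a1 <- data.frame(a = 1:5, b = letters[1:5])
a2 <- data.frame(a = c(1, 6, 3, 4), b = letters[1:4])

# Hypothetical helper: rows of x that also occur in y (all `cols` must match)
# and that belong to a run of at least `min_run` consecutive such rows
sequential_common_rows <- function(x, y, cols = names(x), min_run = 2) {
  key_x <- do.call(paste, c(x[cols], sep = "\r"))
  key_y <- do.call(paste, c(y[cols], sep = "\r"))
  in_y <- key_x %in% key_y  # never NA, unlike the match() comparison
  runs <- rle(in_y)
  keep <- rep(runs$values & runs$lengths >= min_run, runs$lengths)
  x[keep, , drop = FALSE]
}

sequential_common_rows(a1, a2)
```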
