Sort a data.frame using row names of another data.frame

I need to sort the following data.frame (table 1):
X Y
A 1
B 5
C 0
D 3
based on the row order of another data.frame (table 2):
X Y
C 10
B 9
A 8
D 7
so that data.frame #1 ends up like this:
X Y
C 0
B 5
A 1
D 3
How do I do this? I've tried to use:
table1[order(rownames(table1)),]
But I get the following error:
subscript out of bounds

This gives the desired result for this example:
table1[order(table2$X),]
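order(table2$X) happens to work here because table1 is already in alphabetical order and the required reordering (swapping the first and third rows) is its own inverse. A more general sketch, assuming every label in table2$X appears exactly once in table1$X, looks each label up with match(). The table2b data frame below is a hypothetical example made up only to show a case where the two approaches disagree.

```r
# Tables from the question (stringsAsFactors = FALSE keeps X as character in R < 4.0)
table1 <- data.frame(X = c("A", "B", "C", "D"), Y = c(1, 5, 0, 3),
                     stringsAsFactors = FALSE)
table2 <- data.frame(X = c("C", "B", "A", "D"), Y = c(10, 9, 8, 7),
                     stringsAsFactors = FALSE)

# match() returns, for each label in table2$X, its row position in table1
sorted <- table1[match(table2$X, table1$X), ]
sorted

# Hypothetical order where order() and match() give different answers
table2b <- data.frame(X = c("B", "C", "A", "D"), stringsAsFactors = FALSE)
table1[order(table2b$X), ]              # rows C, A, B, D (not table2b's order)
table1[match(table2b$X, table1$X), ]    # rows B, C, A, D (follows table2b)
```

match() maps labels to positions directly, so it does not depend on how table1 happens to be sorted.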

Related

Count of unique values across all columns in a data frame

We have the following data frame:
raw<-data.frame(v1=c("A","B","C","D"),v2=c(NA,"B","C","A"),v3=c(NA,"A",NA,"D"),v4=c(NA,"D",NA,NA))
I need a result data frame in the following format:
result<-data.frame(v1=c("A","B","C","D"), v2=c(3,2,2,3))
I used the following code to get the count for one particular column:
count_raw<-sqldf("SELECT DISTINCT(v1) AS V1, COUNT(v1) AS count FROM raw GROUP BY v1")
This returns the count of each unique value in a single column only.
Any help would be highly appreciated.
Use this
table(unlist(raw))
Output
A B C D
3 2 2 3
For data-frame output, wrap this with as.data.frame.table:
as.data.frame.table(table(unlist(raw)))
Output
Var1 Freq
1 A 3
2 B 2
3 C 2
4 D 3
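To get a data frame with exactly the column names of the question's result (v1 for the value, v2 for its count), one base-R sketch is to build it from the table directly. table() drops NA by default, so the empty cells are not counted.

```r
raw <- data.frame(v1 = c("A", "B", "C", "D"), v2 = c(NA, "B", "C", "A"),
                  v3 = c(NA, "A", NA, "D"), v4 = c(NA, "D", NA, NA),
                  stringsAsFactors = FALSE)

# Frequency of every non-NA value across all columns
counts <- table(unlist(raw))

# Rearrange into the layout the question asks for
result <- data.frame(v1 = names(counts), v2 = as.vector(counts),
                     stringsAsFactors = FALSE)
result
```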
If you want the total count of each value across all columns:
sapply(unique(raw[!is.na(raw)]), function(i) length(which(raw == i)))
#A B C D
#3 2 2 3
We can use apply with MARGIN = 1 to count the distinct non-NA values in each row:
cbind(raw[1], v2=apply(raw, 1, function(x) length(unique(x[!is.na(x)]))))
If you need the number of distinct non-NA values in each column:
sapply(raw, function(x) length(unique(x[!is.na(x)])))
Or, if we need the count of each value across all the columns, convert to a matrix and use table:
table(as.matrix(raw))
# A B C D
# 3 2 2 3
If your data frame contains only character values, as in the example you provided, you can unlist it and use unique(), or use count() from plyr to count the frequencies:
> library(plyr)
> raw<-data.frame(v1=c("A","B","C","D"),v2=c(NA,"B","C","A"),v3=c(NA,"A",NA,"D"),v4=c(NA,"D",NA,NA))
> unique(unlist(raw))
[1] A B C D <NA>
Levels: A B C D
> count(unlist(raw))
x freq
1 A 3
2 B 2
3 C 2
4 D 3
5 <NA> 6

Matching and merging headers in R

In R, I want to match and merge two matrices.
For example,
> A
ID a b c d e f g
1 ex 3 8 7 6 9 8 4
2 am 7 5 3 0 1 8 3
3 ple 8 5 7 9 2 3 1
> B
col1
1 a
2 c
3 e
4 f
Then I want to match the column headers of matrix A against the first column of matrix B.
The final result should be a matrix like this:
> C
ID a c e f
1 ex 3 7 9 8
2 am 7 3 1 8
3 ple 8 7 2 3
*(My original data has more than 500 columns and more than 20,000 rows.)
Are there any tips for that? I would really appreciate your help.
*Also, if matrix B looks like this instead,
> B
col1 col2 col3 col4
1 a c e f
how do I make matrix C in that case?
You want:
A[, c('ID', B[, 1])]
For the second case, you want to use row number 1 of the second matrix, instead of its first column.
A[, c('ID', B[1, ])]
If B is a data.frame instead of a matrix, the syntax changes somewhat: you can use B$col1 instead of B[, 1], and to select by row you need to convert the result to a vector, because selecting a row of a data.frame returns another data.frame. That is, you need unlist(B[1, ]).
If A is a data.frame, you can use a subset (this keeps the ID column under its own name):
A[names(A) %in% c("ID", B$col1)]
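A self-contained sketch of the indexing answer, assuming A is a data.frame (a matrix holding both the ID strings and the numbers would be all character). The intersect() call is an addition not present in the answers above: it silently skips any name in B that is not a column of A, which avoids an "undefined columns selected" error on a wide table with 500+ columns. Drop it if you would rather get an error for missing names.

```r
A <- data.frame(ID = c("ex", "am", "ple"),
                a = c(3, 7, 8), b = c(8, 5, 5), c = c(7, 3, 7), d = c(6, 0, 9),
                e = c(9, 1, 2), f = c(8, 8, 3), g = c(4, 3, 1),
                stringsAsFactors = FALSE)

# Case 1: the wanted headers are in the first column of B
B1 <- data.frame(col1 = c("a", "c", "e", "f"), stringsAsFactors = FALSE)
C1 <- A[, c("ID", intersect(B1$col1, names(A)))]

# Case 2: the wanted headers are spread across the first row of B;
# unlist() turns the one-row data.frame into a character vector
B2 <- data.frame(col1 = "a", col2 = "c", col3 = "e", col4 = "f",
                 stringsAsFactors = FALSE)
C2 <- A[, c("ID", intersect(unlist(B2[1, ]), names(A)))]

C1
```

Both cases only index columns by name, so the cost is the same whether A has 8 columns or 500.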

Converting a dataframe of label/values to a named numeric vector

I am trying to convert a data frame with labels/values to a named numeric vector.
For example, I have the following data frame:
>df=data.frame(lab=c("A","B","C","D"),values=c(1,2,3,4))
> df
lab values
1 A 1
2 B 2
3 C 3
4 D 4
What I am trying to do is iterate over, or apply a function to, this data frame to get the following:
>v_needed=c("A"=1,"B"=2,"C"=3,"D"=4)
> v_needed
A B C D
1 2 3 4
I tried to convert this to a factor, but it didn't give the desired output:
>v_failure=factor(df$values,labels=df$lab)
You can use the setNames function:
v <- with(df, setNames(values, lab))
v
# A B C D
# 1 2 3 4
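setNames() assigns names() to an object and returns it, so the same result can be written out in two steps. A sketch, assuming lab holds the labels; the as.character() call is a precaution for the case where lab is a factor (the default for string columns before R 4.0). A factor, as in the failed attempt, is a categorical vector rather than a named numeric one, which is why that approach did not work.

```r
df <- data.frame(lab = c("A", "B", "C", "D"), values = c(1, 2, 3, 4),
                 stringsAsFactors = FALSE)

# Take the numeric column, then label each element with its lab value
v <- df$values
names(v) <- as.character(df$lab)
v

v_needed <- c("A" = 1, "B" = 2, "C" = 3, "D" = 4)
identical(v, v_needed)  # TRUE
```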

Subsetting a data frame by another data frame

The data is as follows:
> x
a b
1 1 a
2 2 a
3 3 a
4 1 b
5 2 b
6 3 b
> y
a b
1 2 a
2 3 a
3 3 b
My goal is to compare both data frames and, for each row in x, indicate whether an equivalent row exists in y. All of the y rows are actually contained in x, so I would like to end up with something like this:
> x
a b intersect.x.y
1 1 a F
2 2 a T
3 3 a T
4 1 b F
5 2 b F
6 3 b T
How can I do that?
How about this?
x$rn <- 1:nrow(x)
xyrows <- merge(x,y)$rn # maybe you just want to look at the merge ...?
x$iny <- FALSE
x$iny[xyrows] <- TRUE
I suspect there is a more standard approach, but this way is easy to understand.
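One common alternative builds a single key per row by pasting the columns together, then tests membership with %in%. This sketch assumes the separator (a tab here) never occurs in the data and that numeric values print identically in both data frames (e.g. no floating-point noise).

```r
x <- data.frame(a = c(1, 2, 3, 1, 2, 3), b = c("a", "a", "a", "b", "b", "b"),
                stringsAsFactors = FALSE)
y <- data.frame(a = c(2, 3, 3), b = c("a", "a", "b"),
                stringsAsFactors = FALSE)

# One string key per row; assumes "\t" does not appear inside any value
key_x <- paste(x$a, x$b, sep = "\t")
key_y <- paste(y$a, y$b, sep = "\t")

# TRUE where the row of x also occurs in y; keeps x's original row order
x$intersect.x.y <- key_x %in% key_y
x
```

Unlike merge(), this never reorders x, so no helper row-number column is needed.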

Generating random numbers by block length in an R data frame

I am trying to simulate the measuring order n times and see how the measuring order affects my study subject. To do this, I am trying to generate random integers in a new column of a data frame. I have a big data frame, and I would like to add a column that contains a random number according to the number of observations in each block.
Example data (each row is an observation):
df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
B=c("x","b","c","g","h","g","g","u","l"),
C=c(1,2,4,1,5,7,1,2,5))
A B C
1 1 x 1
2 1 b 2
3 1 c 4
4 2 g 1
5 2 h 5
6 3 g 7
7 3 g 1
8 3 u 2
9 3 l 5
What I'd like to do is add a D column and generate random integer numbers according to the length of each block. Blocks are defined in column A.
Result should look something like this:
df <- data.frame(A=c(1,1,1,2,2,3,3,3,3),
B=c("x","b","c","g","h","g","g","u","l"),
C=c(1,2,4,1,5,7,1,2,5),
D=c(2,1,3,2,1,4,3,1,2))
> df
A B C D
1 1 x 1 2
2 1 b 2 1
3 1 c 4 3
4 2 g 1 2
5 2 h 5 1
6 3 g 7 4
7 3 g 1 3
8 3 u 2 1
9 3 l 5 2
I have tried to use R's sample() function to generate random numbers, but my problem is splitting the data according to block length and adding the new column. Any help is greatly appreciated.
It can be done easily with ave
df$D <- ave( df$A, df$A, FUN = function(x) sample(length(x)) )
(length() is the right choice here because it works whatever the labels in A are. Replacing it with max() would only work if each label in A equalled the length of its block, which is not the case in this example.)
This is really easy with ddply from plyr:
library(plyr)
ddply(df, .(A), transform, D = sample(length(A)))
The longer manual version is:
Use split to split the data frame by the first column.
split_df <- split(df, df$A)
Then call sample on each member of the list.
split_df <- lapply(split_df, function(df)
{
df$D <- sample(nrow(df))
df
})
Then recombine with
df <- do.call(rbind, split_df)
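do.call(rbind, ...) returns the rows grouped by block, with row names prefixed by the block label. Here the blocks of A are already contiguous, so the row order does not change, but if they were not, base R's unsplit() would put each row back in its original position and restore the original row names. A sketch of the same split/lapply steps followed by unsplit():

```r
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5),
                 stringsAsFactors = FALSE)

# One data frame per block of A, each given a random order 1..n
split_df <- split(df, df$A)
split_df <- lapply(split_df, function(block) {
  block$D <- sample(nrow(block))
  block
})

# Reassemble in the original row order (works even for non-contiguous blocks)
res <- unsplit(split_df, df$A)
res
```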
One simple way:
df$D = 0
counts = table(df$A)
for (i in seq_along(counts)){
df$D[df$A == names(counts)[i]] = sample(counts[i])
}
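The question mentions simulating the measuring order n times. One way to sketch that, assuming one column of random orders per simulation is a convenient layout, is to wrap the ave() call in a helper (random_block_order() is a hypothetical name) and call it with replicate(). Passing seq_along(A) to ave() keeps the result integer and independent of the values stored in A.

```r
df <- data.frame(A = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 B = c("x", "b", "c", "g", "h", "g", "g", "u", "l"),
                 C = c(1, 2, 4, 1, 5, 7, 1, 2, 5),
                 stringsAsFactors = FALSE)

# Hypothetical helper: one random permutation 1..n within each block of A
random_block_order <- function(data) {
  ave(seq_along(data$A), data$A, FUN = function(idx) sample(length(idx)))
}

set.seed(1)   # reproducible simulations
n_sim <- 5
orders <- replicate(n_sim, random_block_order(df))  # one column per simulation

# Attach the first simulated order as column D
df$D <- orders[, 1]
```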
