r - How to subset a dataframe based on another dataframe


I have data dat like this:
s A chan
10 0.1 1
20 0.2 1
30 0.3 1
40 0.5 1
50 0.7 1
60 0.5 1
10 0.1 2
20 0.3 2
30 0.4 2
40 0.5 2
50 0.6 2
60 0.6 2
10 0.2 3
20 0.2 3
30 0.3 3
40 0.4 3
50 0.5 3
40 0.7 3
10 0.2 4
20 0.2 4
30 0.3 4
40 0.3 4
50 0.6 4
60 0.8 4
and I want to subset my data frame dat based on s (time) for each chan (channel) with a data frame df like this
s chan
10 1
20 2
30 3
40 4
If I use dat %>% filter(s %in% df$s) I get each value for every channel like this:
s A chan
10 0.1 1
20 0.2 1
30 0.3 1
40 0.5 1
10 0.1 2
20 0.3 2
30 0.4 2
40 0.5 2
10 0.2 3
20 0.2 3
30 0.3 3
40 0.4 3
10 0.2 4
20 0.2 4
30 0.3 4
40 0.3 4
but what I actually want is this:
s A chan
10 0.1 1
20 0.3 2
30 0.3 3
40 0.3 4
How can I achieve this result?

What you are looking for is semi_join from dplyr; it keeps the rows of the left data frame that have a match in the right data frame (anti_join does the opposite and keeps the rows without a match):
semi_join(dat, df, by = c("s", "chan"))

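I think this should do it in base R, by pasting both columns into a single key and matching on that:
dat[paste(dat$s, dat$chan) %in% paste(df$s, df$chan), ]
Comparing the columns directly with == would recycle the four rows of df along dat and compare position by position, which does not test every (s, chan) pair.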
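As a reproducible check, here is a minimal base R sketch that rebuilds the data from the question and does the same keyed match with merge(). The name res is only illustrative, and the column reordering at the end is there purely to mirror the layout shown in the question.

```r
# Rebuild the example data exactly as printed in the question
dat <- data.frame(
  s    = c(10, 20, 30, 40, 50, 60,  10, 20, 30, 40, 50, 60,
           10, 20, 30, 40, 50, 40,  10, 20, 30, 40, 50, 60),
  A    = c(0.1, 0.2, 0.3, 0.5, 0.7, 0.5,  0.1, 0.3, 0.4, 0.5, 0.6, 0.6,
           0.2, 0.2, 0.3, 0.4, 0.5, 0.7,  0.2, 0.2, 0.3, 0.3, 0.6, 0.8),
  chan = rep(1:4, each = 6)
)
df <- data.frame(s = c(10, 20, 30, 40), chan = 1:4)

# Inner join on both key columns: only rows whose (s, chan) pair
# appears in df survive
res <- merge(dat, df, by = c("s", "chan"))

# merge() puts the key columns first; restore the original column order
res <- res[, c("s", "A", "chan")]
print(res)
```

Unlike semi_join, merge() would also bring along any extra columns of df and would duplicate rows of dat if df contained repeated keys, so it is equivalent here only because df holds just the two key columns with unique pairs.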

Related

Multiply values depending on values of certains columns

I have two tables, df and cf. I want to multiply each value of A in df by a coefficient from cf, chosen by the values of B and C in the same row of df.
For example,
in the first data row of df, A = 20, B = 4 and C = 2, so the correct coefficient is 0.3
and the result is 20*0.3 = 6.
Is there a simple way to do that in R?
Thanks in advance!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf
C
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6

One solution with apply:
#iterate over df's rows
apply(df, 1, function(x) {
x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3

Try this vectorized:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
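The vectorized line relies on matrix indexing: when a matrix is indexed by a two-column matrix, each row of the index is read as one (row, column) pair. A minimal self-contained sketch, assuming cf is stored as a plain numeric matrix (the names idx, coef and result are illustrative):

```r
df <- data.frame(A = c(20, 30, 35, 24, 43),
                 B = c(4, 4, 2, 3, 2),
                 C = c(2, 5, 2, 3, 1))

# Coefficients: rows are indexed by B, columns by C
cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
               0.1, 0.5, 0.3, 0.3, 0.4,
               0.9, 0.1, 0.6, 0.6, 0.8,
               0.7, 0.3, 0.7, 0.4, 0.6),
             nrow = 4, byrow = TRUE)

# One (row, column) pair per row of df
idx <- cbind(df$B, df$C)

# Picks cf[4, 2], cf[4, 5], cf[2, 2], cf[3, 3], cf[2, 1]
coef <- cf[idx]

result <- df$A * coef
print(result)
```

This avoids the per-row function call of apply, which tends to matter once df has many rows.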

A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header=T, stringsAsFactors=F)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
mutate(val = f(B,C),
result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.

Using grep to get the rows of a dataframe, instead of the row number

I am trying to make a sub dataframe based on the already existing dataframe. My sub dataframe is being filled with the number of the row instead of the row itself.
rates = read.csv("file.txt")
genes = unique(gsub('_[0-9]+', '', rates[,1]))
for (k in unique(gsub('_[0-9]+', '', rates[,1])) ){
sub = print(grep(k, rates[,1]), value=T)
sub
}
file.txt
clothing,freq,temp
coat_1,0.3,10
coat_1,0.9,0
coat_1,0.1,20
coat_2,0.5,20
coat_2,0.3,15
coat_2,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10
This is what is currently output
[1] 1 2 3 4 5 6
[1] 7 8 9
I would like something like this instead
clothing freq temp
1 coat_1 0.3 10
2 coat_1 0.9 0
3 coat_1 0.1 20
4 coat_2 0.5 20
5 coat_2 0.3 15
6 coat_2 0.1 5
clothing freq temp
1 scarf 0.4 30
2 scarf 0.2 20
3 scarf 0.1 10

rates <- read.csv("file.txt", stringsAsFactors = FALSE)
rates
# clothing freq temp
# 1 coat_1 0.3 10
# 2 coat_1 0.9 0
# 3 coat_1 0.1 20
# 4 coat_2 0.5 20
# 5 coat_2 0.3 15
# 6 coat_2 0.1 5
# 7 scarf 0.4 30
# 8 scarf 0.2 20
# 9 scarf 0.1 10
rates[rates$clothing != "scarf",]
# clothing freq temp
# 1 coat_1 0.3 10
# 2 coat_1 0.9 0
# 3 coat_1 0.1 20
# 4 coat_2 0.5 20
# 5 coat_2 0.3 15
# 6 coat_2 0.1 5
rates[rates$clothing == "scarf",]
# clothing freq temp
#7 scarf 0.4 30
#8 scarf 0.2 20
#9 scarf 0.1 10
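The loop in the question prints row numbers because grep() returns the positions of the matches by default, and value = T was passed to print() rather than to grep(). To get one sub data frame per item without hard-coding the names, a possible sketch is to strip the numeric suffix and split on the result. The names group and pieces are illustrative, and the file is inlined with text = so the example runs on its own.

```r
rates <- read.csv(text = "clothing,freq,temp
coat_1,0.3,10
coat_1,0.9,0
coat_1,0.1,20
coat_2,0.5,20
coat_2,0.3,15
coat_2,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10", stringsAsFactors = FALSE)

# Drop a trailing "_<digits>" so coat_1 and coat_2 fall into one group
group <- sub("_[0-9]+$", "", rates$clothing)

# One data frame per group, returned as a named list
pieces <- split(rates, group)

pieces$coat   # the six coat rows
pieces$scarf  # the three scarf rows
```

Note that split() keeps the original row names, so the scarf rows are labelled 7 to 9 rather than 1 to 3.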

How to get correct ticklabs in a 3d-scatterplot in R?

Please see this example. Look at the y axis. The data there has only two levels: 1 and 2. But the plot draws 6 tick marks on that axis. How can I fix that? The x axis has the same problem.
The data
extra group ID
1 0.7 1 1
2 -1.6 1 2
3 -0.2 1 3
4 -1.2 1 4
5 -0.1 1 5
6 3.4 1 6
7 3.7 1 7
8 0.8 1 8
9 0.0 1 9
10 2.0 1 10
11 1.9 2 1
12 0.8 2 2
13 1.1 2 3
14 0.1 2 4
15 -0.1 2 5
16 4.4 2 6
17 5.5 2 7
18 1.6 2 8
19 4.6 2 9
20 3.4 2 10
The script
require('mise')
require('scatterplot3d')
mise() # clear the workspace
# example data
print(sleep)
# plot it
scatterplot3d(x=sleep$ID,
x.ticklabs=levels(sleep$ID),
y=sleep$group,
y.ticklabs=levels(sleep$group),
z=sleep$extra)
The result: [image not preserved: 3D scatterplot with too many tick marks on the x and y axes]

How about this:
scatterplot3d(x=sleep$ID, y=sleep$extra, z=sleep$group, lab.z = c(1, 2))

Reshape matrix to data frame

I have an association matrix file that looks like this (4 rows and 3 columns).
test=read.table("test.csv", sep=",", header=T)
head(test)
LosAngeles SanDiego Seattle
1 2 3
A 1 0.1 0.2 0.2
B 2 0.2 0.4 0.2
C 3 0.3 0.5 0.3
D 4 0.2 0.5 0.1
What I want to do is reshape this matrix into a data frame. The result should look something like this (12 (= 4 * 3) rows and 3 columns):
RowNum ColumnNum Value
1 1 0.1
2 1 0.2
3 1 0.3
4 1 0.2
1 2 0.2
2 2 0.4
3 2 0.5
4 2 0.5
1 3 0.2
2 3 0.2
3 3 0.3
4 3 0.1
That is, if my matrix file has 100 rows and 90 columns, I want to make a new data frame that contains 9000 (= 100 * 90) rows and 3 columns. I've tried to use the reshape package but I do not seem to be able to get it right. Any suggestions on how to solve this problem?

Use as.data.frame.table. It's the boss:
m <- matrix(data = c(0.1, 0.2, 0.2,
0.2, 0.4, 0.2,
0.3, 0.5, 0.3,
0.2, 0.5, 0.1),
nrow = 4, byrow = TRUE,
dimnames = list(row = 1:4, col = 1:3))
m
# col
# row 1 2 3
# 1 0.1 0.2 0.2
# 2 0.2 0.4 0.2
# 3 0.3 0.5 0.3
# 4 0.2 0.5 0.1
as.data.frame.table(m)
# row col Freq
# 1 1 1 0.1
# 2 2 1 0.2
# 3 3 1 0.3
# 4 4 1 0.2
# 5 1 2 0.2
# 6 2 2 0.4
# 7 3 2 0.5
# 8 4 2 0.5
# 9 1 3 0.2
# 10 2 3 0.2
# 11 3 3 0.3
# 12 4 3 0.1

This should do the trick:
test <- as.matrix(read.table(text="
1 2 3
1 0.1 0.2 0.2
2 0.2 0.4 0.2
3 0.3 0.5 0.3
4 0.2 0.5 0.1", header=TRUE))
data.frame(which(test==test, arr.ind=TRUE),
Value=test[which(test==test)],
row.names=NULL)
# row col Value
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.2
#5 1 2 0.2
#6 2 2 0.4
#7 3 2 0.5
#8 4 2 0.5
#9 1 3 0.2
#10 2 3 0.2
#11 3 3 0.3
#12 4 3 0.1
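Both answers rest on the fact that R stores a matrix column by column. A minimal sketch that makes this explicit with row() and col(), using the column names requested in the question (the name long is illustrative):

```r
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE)

# row(m) and col(m) hold the row and column index of every cell;
# as.vector() flattens all three in the same column-major order
long <- data.frame(RowNum    = as.vector(row(m)),
                   ColumnNum = as.vector(col(m)),
                   Value     = as.vector(m))
print(long)
```

For a 100 x 90 matrix the same three lines would give 9000 rows, since nothing in the sketch depends on the dimensions.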

Selecting Rows which contain daily max value in R

So I want to subset my data frame to select rows with a daily maximum value.
Site Year Day Time Cover Size TempChange
ST1 2011 97 0.0 Closed small 0.97
ST1 2011 97 0.5 Closed small 1.02
ST1 2011 97 1.0 Closed small 1.10
A section of the data frame is shown above. I would like to select only the rows which have the maximum value of the variable TempChange for each value of Day. I want to do this because I am interested in specific variables (not shown) for these particular times.
AMENDED EXAMPLE AND REQUIRED OUTPUT
Site Day Temp Row
a 10 0.2 1
a 10 0.3 2
a 11 0.5 3
a 11 0.4 4
b 10 0.1 5
b 10 0.8 6
b 11 0.7 7
b 11 0.6 8
c 10 0.2 9
c 10 0.3 10
c 11 0.5 11
c 11 0.8 12
REQUIRED OUTPUT
Site Day Temp Row
a 10 0.3 2
a 11 0.5 3
b 10 0.8 6
b 11 0.7 7
c 10 0.3 10
c 11 0.8 12
Hope that makes it clearer.

After faffing with raw data frame code, I realised plyr could do this in one:
> df
Day V Z
1 97 0.26575207 1
2 97 0.09443351 2
3 97 0.88097858 3
4 98 0.62241515 4
5 98 0.61985937 5
6 99 0.06956219 6
7 100 0.86638108 7
8 100 0.08382254 8
> ddply(df,~Day,function(x){x[which.max(x$V),]})
Day V Z
1 97 0.88097858 3
2 98 0.62241515 4
3 99 0.06956219 6
4 100 0.86638108 7
To get the rows for max values for unique combinations of more than one column, just add the variable to the formula. For your modified example, it's then:
> df
Site Day Temp Row
1 a 10 0.2 1
2 a 10 0.3 2
3 a 11 0.5 3
4 a 11 0.4 4
5 b 10 0.1 5
6 b 10 0.8 6
7 b 11 0.7 7
8 b 11 0.6 8
9 c 10 0.2 9
10 c 10 0.3 10
11 c 11 0.5 11
12 c 11 0.8 12
> ddply(df,~Day+Site,function(x){x[which.max(x$Temp),]})
Site Day Temp Row
1 a 10 0.3 2
2 b 10 0.8 6
3 c 10 0.3 10
4 a 11 0.5 3
5 b 11 0.7 7
6 c 11 0.8 12
Note this isn't in the same order as your original dataframe, but you can fix that.
> dmax = ddply(df,~Day+Site,function(x){x[which.max(x$Temp),]})
> dmax[order(dmax$Row),]
Site Day Temp Row
1 a 10 0.3 2
4 a 11 0.5 3
2 b 10 0.8 6
5 b 11 0.7 7
3 c 10 0.3 10
6 c 11 0.8 12
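If plyr is not available, roughly the same selection can be sketched in base R with ave(), which returns the group maximum aligned with every row. The names is_max and dmax are illustrative. One difference to be aware of: which.max keeps only the first maximum per group, whereas this comparison keeps every row tied for the maximum.

```r
df <- data.frame(Site = rep(c("a", "b", "c"), each = 4),
                 Day  = rep(c(10, 10, 11, 11), times = 3),
                 Temp = c(0.2, 0.3, 0.5, 0.4,
                          0.1, 0.8, 0.7, 0.6,
                          0.2, 0.3, 0.5, 0.8),
                 Row  = 1:12)

# Maximum Temp within each Site/Day group, repeated for every row of the group
group_max <- ave(df$Temp, df$Site, df$Day, FUN = max)

# Keep the rows that attain their group maximum; original order is preserved
is_max <- df$Temp == group_max
dmax <- df[is_max, ]
print(dmax)
```

Because the rows are filtered in place rather than reassembled group by group, no reordering step is needed afterwards.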
