I have an association matrix file that looks like this (4 rows and 3 columns):
test=read.table("test.csv", sep=",", header=T)
head(test)
LosAngeles SanDiego Seattle
1 2 3
A 1 0.1 0.2 0.2
B 2 0.2 0.4 0.2
C 3 0.3 0.5 0.3
D 4 0.2 0.5 0.1
What I want to do is reshape this matrix into a data frame. The result should look something like this (12 (= 4 * 3) rows and 3 columns):
RowNum ColumnNum Value
1 1 0.1
2 1 0.2
3 1 0.3
4 1 0.2
1 2 0.2
2 2 0.4
3 2 0.5
4 2 0.5
1 3 0.2
2 3 0.2
3 3 0.3
4 3 0.1
That is, if my matrix file has 100 rows and 90 columns, I want to make a new data frame that contains 9000 (= 100 * 90) rows and 3 columns. I've tried to use the reshape package, but I can't seem to get it right. Any suggestions on how to solve this problem?
Use as.data.frame.table. It's the boss:
m <- matrix(data = c(0.1, 0.2, 0.2,
0.2, 0.4, 0.2,
0.3, 0.5, 0.3,
0.2, 0.5, 0.1),
nrow = 4, byrow = TRUE,
dimnames = list(row = 1:4, col = 1:3))
m
# col
# row 1 2 3
# 1 0.1 0.2 0.2
# 2 0.2 0.4 0.2
# 3 0.3 0.5 0.3
# 4 0.2 0.5 0.1
as.data.frame.table(m)
# row col Freq
# 1 1 1 0.1
# 2 2 1 0.2
# 3 3 1 0.3
# 4 4 1 0.2
# 5 1 2 0.2
# 6 2 2 0.4
# 7 3 2 0.5
# 8 4 2 0.5
# 9 1 3 0.2
# 10 2 3 0.2
# 11 3 3 0.3
# 12 4 3 0.1
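If the column names and types from the question matter, the same call can be adapted. The sketch below assumes the dimnames are named RowNum and ColumnNum (names borrowed from the question's desired output) and uses the responseName argument to rename Freq. The two index columns come back as factors, so they are converted to integers afterwards.

```r
# Same 4 x 3 matrix, with dimnames named after the desired output columns
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE,
            dimnames = list(RowNum = 1:4, ColumnNum = 1:3))

# responseName replaces the default "Freq" column name
long <- as.data.frame.table(m, responseName = "Value")

# The index columns are factors; go through character to recover the numbers
long$RowNum    <- as.integer(as.character(long$RowNum))
long$ColumnNum <- as.integer(as.character(long$ColumnNum))

str(long)
```

Going through as.character first matters: calling as.integer directly on a factor returns the internal level codes, which only happen to equal the labels when the labels are 1, 2, 3, ...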
This should do the trick:
test <- as.matrix(read.table(text="
1 2 3
1 0.1 0.2 0.2
2 0.2 0.4 0.2
3 0.3 0.5 0.3
4 0.2 0.5 0.1", header=TRUE))
data.frame(which(test==test, arr.ind=TRUE),
Value=test[which(test==test)],
row.names=NULL)
# row col Value
#1 1 1 0.1
#2 2 1 0.2
#3 3 1 0.3
#4 4 1 0.2
#5 1 2 0.2
#6 2 2 0.4
#7 3 2 0.5
#8 4 2 0.5
#9 1 3 0.2
#10 2 3 0.2
#11 3 3 0.3
#12 4 3 0.1
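A variant of the same idea that does not rely on the comparison test == test (which would silently drop NA cells, because NA == NA is NA) is to take the indices from row() and col(). This is only a sketch on the same 4 x 3 values; the names m and long are illustrative.

```r
m <- matrix(c(0.1, 0.2, 0.2,
              0.2, 0.4, 0.2,
              0.3, 0.5, 0.3,
              0.2, 0.5, 0.1),
            nrow = 4, byrow = TRUE)

# row(m) and col(m) are matrices of the same shape as m holding each
# cell's row and column index; as.vector() flattens all three column by column
long <- data.frame(RowNum    = as.vector(row(m)),
                   ColumnNum = as.vector(col(m)),
                   Value     = as.vector(m))

head(long)
```

Because every cell is emitted regardless of its value, a 100 x 90 matrix always yields 9000 rows, even when some entries are missing.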
I have data dat like this:
s A chan
10 0.1 1
20 0.2 1
30 0.3 1
40 0.5 1
50 0.7 1
60 0.5 1
10 0.1 2
20 0.3 2
30 0.4 2
40 0.5 2
50 0.6 2
60 0.6 2
10 0.2 3
20 0.2 3
30 0.3 3
40 0.4 3
50 0.5 3
60 0.7 3
10 0.2 4
20 0.2 4
30 0.3 4
40 0.3 4
50 0.6 4
60 0.8 4
and I want to subset my data frame dat based on s (time) for each chan (channel) with a data frame df like this
s chan
10 1
20 2
30 3
40 4
If I use dat %>% filter(s %in% df$s) I get each value for every channel like this:
s A chan
10 0.1 1
20 0.2 1
30 0.3 1
40 0.5 1
10 0.1 2
20 0.3 2
30 0.4 2
40 0.5 2
10 0.2 3
20 0.2 3
30 0.3 3
40 0.4 3
10 0.2 4
20 0.2 4
30 0.3 4
40 0.3 4
but what I actually want is this:
s A chan
10 0.1 1
20 0.3 2
30 0.3 3
40 0.3 4
How can I achieve this result?
What you are looking for is semi_join: it keeps the rows of the left data frame that have a match in the right data frame, without adding any columns from the right one:
semi_join(dat, df, by = c("s", "chan"))
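For a self-contained check, the sketch below rebuilds the two data frames from the question and runs the join. It assumes each channel has the six time points 10 to 60, and the name picked is only illustrative.

```r
library(dplyr)

# dat: four channels, six time points each (assumed to be 10, 20, ..., 60)
dat <- data.frame(
  s    = rep(seq(10, 60, by = 10), times = 4),
  A    = c(0.1, 0.2, 0.3, 0.5, 0.7, 0.5,
           0.1, 0.3, 0.4, 0.5, 0.6, 0.6,
           0.2, 0.2, 0.3, 0.4, 0.5, 0.7,
           0.2, 0.2, 0.3, 0.3, 0.6, 0.8),
  chan = rep(1:4, each = 6)
)

# df: the (s, chan) pairs to keep
df <- data.frame(s = c(10, 20, 30, 40), chan = 1:4)

# Keep only rows of dat whose (s, chan) pair appears in df
picked <- semi_join(dat, df, by = c("s", "chan"))
picked
```

Matching on both columns at once is what distinguishes this from filter(s %in% df$s), which tests s alone and therefore keeps all four time points in every channel.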
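I think this should do it in base R:
dat[paste(dat$s, dat$chan) %in% paste(df$s, df$chan), ]
Pasting s and chan together builds one key per row, so a row of dat is kept only when its (s, chan) pair occurs in df. Comparing the columns directly with == would recycle the 4 rows of df against the 24 rows of dat and miss most of the matches.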
I have got the following dataset:
ID s1 s2 s3
A 0.6 1 0.3
B 3 0.4 0.4
C 3 2 1
D 0 0.3 0.2
E 3 2 0.1
I would like to retain the rows that have a value >= 0.5 in at least two of the 3 samples.
So, the new data frame would be:
ID s1 s2 s3
A 0.6 1 0.3
C 3 2 1
E 3 2 0.1
Thanks in advance
You can do
df[rowSums(df[-1] >= 0.5) >= 2, ]
# ID s1 s2 s3
#1 A 0.6 1 0.3
#3 C 3.0 2 1.0
#5 E 3.0 2 0.1
We create a logical matrix df[-1] >= 0.5 (dropping the ID column) and keep the rows in which at least two values are TRUE.
data
df <- read.table(text="ID s1 s2 s3
A 0.6 1 0.3
B 3 0.4 0.4
C 3 2 1
D 0 0.3 0.2
E 3 2 0.1", header = TRUE, stringsAsFactors = FALSE)
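To make the intermediate steps visible, here is a small sketch that splits the one-liner into its parts, using >= 0.5 as stated in the question. The names above, hits and kept are only illustrative.

```r
df <- data.frame(ID = c("A", "B", "C", "D", "E"),
                 s1 = c(0.6, 3, 3, 0, 3),
                 s2 = c(1, 0.4, 2, 0.3, 2),
                 s3 = c(0.3, 0.4, 1, 0.2, 0.1))

# Logical matrix: TRUE where a sample reaches the threshold (ID column dropped)
above <- df[-1] >= 0.5

# TRUE counts as 1, so rowSums gives the number of qualifying samples per row
hits <- rowSums(above)

# Keep rows with at least two qualifying samples
kept <- df[hits >= 2, ]
kept
```

Changing the 2 in the last step adjusts how many samples must pass, and the same pattern works for any number of sample columns.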
I have two tables, df and cf. I want to multiply each value of A in df by a coefficient from cf, chosen by the values of B and C in df.
For example,
in row 1 of df, A = 20, B = 4 and C = 2, so the correct coefficient is 0.3 and
the result is 20 * 0.3 = 6.
Is there a simple way to do that in R?
Thanks in advance!!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf
C
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6
One solution with apply:
#iterate over df's rows
apply(df, 1, function(x) {
x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3
Try this vectorized:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
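The vectorized line relies on matrix indexing: when a two-column matrix is used as the index, each of its rows is read as one (row, column) pair. A minimal sketch, assuming cf is stored as a plain numeric matrix whose rows are B = 1..4 and whose columns are C = 1..5; idx and coef are illustrative names.

```r
df <- data.frame(A = c(20, 30, 35, 24, 43),
                 B = c(4, 4, 2, 3, 2),
                 C = c(2, 5, 2, 3, 1))

cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
               0.1, 0.5, 0.3, 0.3, 0.4,
               0.9, 0.1, 0.6, 0.6, 0.8,
               0.7, 0.3, 0.7, 0.4, 0.6),
             nrow = 4, byrow = TRUE)

# Each row of idx is one (B, C) pair, i.e. one cell of cf
idx  <- cbind(df$B, df$C)
coef <- cf[idx]          # one coefficient per row of df

df$result <- df$A * coef
df
```

This avoids the per-row function call of apply, which tends to matter once df has many rows.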
A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header=T, stringsAsFactors=F)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
mutate(val = f(B,C),
result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.
I'm desperately trying to lag a variable by group. I found this post that deals with essentially the same problem I'm facing, but the solution does not work for me, no idea why.
This is my problem:
library(dplyr)
df <- data.frame(monthvec = c(rep(1:2, 2), rep(3:5, 3)))
df <- df %>%
arrange(monthvec) %>%
mutate(growth=ifelse(monthvec==1, 0.3,
ifelse(monthvec==2, 0.5,
ifelse(monthvec==3, 0.7,
ifelse(monthvec==4, 0.1,
ifelse(monthvec==5, 0.6,NA))))))
df%>%
group_by(monthvec) %>%
mutate(lag.growth = lag(growth, order_by=monthvec))
Source: local data frame [13 x 3]
Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 0.3
3 2 0.5 NA
4 2 0.5 0.5
5 3 0.7 NA
6 3 0.7 0.7
7 3 0.7 0.7
8 4 0.1 NA
9 4 0.1 0.1
10 4 0.1 0.1
11 5 0.6 NA
12 5 0.6 0.6
13 5 0.6 0.6
This is what I'd like it to be in the end:
df$lag.growth <- c(NA, NA, 0.3, 0.3, 0.5, 0.5, 0.5, 0.7,0.7,0.7, 0.1,0.1,0.1)
monthvec growth lag.growth
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
I believe that one problem is that my groups are not of equal length...
Thanks for helping out.
Here is an idea. We group by monthvec in order to get the number of rows (cnt) of each group. We ungroup and use the first value of cnt as the size of the lag. We regroup on monthvec and replace the values in each group with the first value of each group.
library(dplyr)
df %>%
group_by(monthvec) %>%
mutate(cnt = n()) %>%
ungroup() %>%
mutate(lag.growth = lag(growth, first(cnt))) %>%
group_by(monthvec) %>%
mutate(lag.growth = first(lag.growth)) %>%
select(-cnt)
which gives,
# A tibble: 13 x 3
# Groups: monthvec [5]
monthvec growth lag.growth
<int> <dbl> <dbl>
1 1 0.3 NA
2 1 0.3 NA
3 2 0.5 0.3
4 2 0.5 0.3
5 3 0.7 0.5
6 3 0.7 0.5
7 3 0.7 0.5
8 4 0.1 0.7
9 4 0.1 0.7
10 4 0.1 0.7
11 5 0.6 0.1
12 5 0.6 0.1
13 5 0.6 0.1
You may join your original data with a dataframe with a shifted "monthvec".
left_join(df, df %>% mutate(monthvec = monthvec + 1) %>% unique(), by = "monthvec")
# monthvec growth.x growth.y
# 1 1 0.3 NA
# 2 1 0.3 NA
# 3 2 0.5 0.3
# 4 2 0.5 0.3
# 5 3 0.7 0.5
# 6 3 0.7 0.5
# 7 3 0.7 0.5
# 8 4 0.1 0.7
# 9 4 0.1 0.7
# 10 4 0.1 0.7
# 11 5 0.6 0.1
# 12 5 0.6 0.1
# 13 5 0.6 0.1
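The attempt in the question returns the group's own value because lag() inside group_by(monthvec) only ever sees rows of one month. One way around that is to lag a table with a single row per month and then look the result up for every original row. The base R sketch below assumes growth is constant within a month, as in the example; monthly is an illustrative name.

```r
# Rebuild the example data, sorted by month
df <- data.frame(monthvec = sort(c(rep(1:2, 2), rep(3:5, 3))))
df$growth <- c(0.3, 0.5, 0.7, 0.1, 0.6)[df$monthvec]

# One row per month, ordered by month
monthly <- unique(df)
monthly <- monthly[order(monthly$monthvec), ]

# Shift growth down by one month; the first month has no predecessor
monthly$lag.growth <- c(NA, head(monthly$growth, -1))

# Look up each row's month in the per-month table
df$lag.growth <- monthly$lag.growth[match(df$monthvec, monthly$monthvec)]
df
```

Group sizes do not matter here, since the shift happens on the per-month table rather than on the original rows. Note that the lag refers to the previous month present in the data, which is only the calendar-previous month when no months are missing.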
I am trying to make a sub dataframe based on the already existing dataframe. My sub dataframe is being filled with the number of the row instead of the row itself.
rates = read.csv("file.txt")
genes = unique(gsub('_[0-9]+', '', rates[,1]))
for (k in unique(gsub('_[0-9]+', '', rates[,1])) ){
sub = print(grep(k, rates[,1]), value=T)
sub
}
file.txt
clothing,freq,temp
coat_1,0.3,10
coat_1,0.9,0
coat_1,0.1,20
coat_2,0.5,20
coat_2,0.3,15
coat_2,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10
This is what is currently output
[1] 1 2 3 4 5 6
[1] 7 8 9
I would like something like this instead
clothing freq temp
1 coat_1 0.3 10
2 coat_1 0.9 0
3 coat_1 0.1 20
4 coat_2 0.5 20
5 coat_2 0.3 15
6 coat_2 0.1 5
clothing freq temp
1 scarf 0.4 30
2 scarf 0.2 20
3 scarf 0.1 10
rates <- read.csv("file.txt", stringsAsFactors = FALSE)
rates
# clothing freq temp
# 1 coat_1 0.3 10
# 2 coat_1 0.9 0
# 3 coat_1 0.1 20
# 4 coat_2 0.5 20
# 5 coat_2 0.3 15
# 6 coat_2 0.1 5
# 7 scarf 0.4 30
# 8 scarf 0.2 20
# 9 scarf 0.1 10
rates[rates$clothing != "scarf",]
# clothing freq temp
# 1 coat_1 0.3 10
# 2 coat_1 0.9 0
# 3 coat_1 0.1 20
# 4 coat_2 0.5 20
# 5 coat_2 0.3 15
# 6 coat_2 0.1 5
rates[rates$clothing == "scarf",]
# clothing freq temp
#7 scarf 0.4 30
#8 scarf 0.2 20
#9 scarf 0.1 10
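The subsets above hard-code "scarf". To get one sub data frame per item without naming each one, the prefix computed in the question's loop can be passed to split(). This is a sketch that reads the same data inline instead of from file.txt; gene and subs are illustrative names.

```r
rates <- read.csv(text = "clothing,freq,temp
coat_1,0.3,10
coat_1,0.9,0
coat_1,0.1,20
coat_2,0.5,20
coat_2,0.3,15
coat_2,0.1,5
scarf,0.4,30
scarf,0.2,20
scarf,0.1,10", stringsAsFactors = FALSE)

# Strip a trailing _<number> so coat_1 and coat_2 share the key "coat"
gene <- sub("_[0-9]+$", "", rates$clothing)

# split() returns a named list with one data frame per key
subs <- split(rates, gene)

subs$coat
subs$scarf
```

The question's loop printed row numbers because grep() returns the positions of the matches; indexing with them, as in rates[grep(k, rates[, 1]), ], would return the rows themselves.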