A small sample of my data is as follows:
A=c(0.1, 0.3, 0.6, 0.1)
dat<-read.table (text=" D1 D2 D3 D4
10 11 13 14
9 8 8 0
70 100 2 3
4 3 3 200
1 2 3 4
", header=TRUE)
The logic is to multiply D1 by 0.1, D2 by 0.3, D3 by 0.6 and D4 by 0.1, i.e. each column by the matching element of A.
Here is the desired outcome:
1 3.3 7.8 1.4
0.9 2.4 4.8 0
7 30 1.2 0.3
0.4 0.9 1.8 20
0.1 0.6 1.8 0.4
Please assume I have more than four D columns.
A possible solution, using dplyr:
library(dplyr)
dat %>%
mutate(across(everything(), ~ .x * A[which(names(dat) == cur_column())]))
#> D1 D2 D3 D4
#> 1 1.0 3.3 7.8 1.4
#> 2 0.9 2.4 4.8 0.0
#> 3 7.0 30.0 1.2 0.3
#> 4 0.4 0.9 1.8 20.0
#> 5 0.1 0.6 1.8 0.4
Another possible solution, in base R:
as.data.frame(t(apply(dat, 1, \(x) x * A)))
Yet another possible solution, using purrr::map2_df:
purrr::map2_df(dat, A, `*`)
Or, returning a matrix instead of a data frame:
mapply(`*`, dat, A)
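For completeness, a base R sketch using sweep, which applies an operation across one margin of a matrix; with MARGIN = 2 the j-th element of A is paired with the j-th column. It assumes, as in the question, that A holds exactly one weight per column, in column order.

```r
# Data from the question
A <- c(0.1, 0.3, 0.6, 0.1)
dat <- read.table(text = "
D1 D2 D3 D4
10 11 13 14
9 8 8 0
70 100 2 3
4 3 3 200
1 2 3 4
", header = TRUE)

# MARGIN = 2: multiply every value in column j by A[j]
res <- as.data.frame(sweep(as.matrix(dat), 2, A, `*`))
res
```

Converting to a matrix first keeps the arithmetic on a single numeric type; as.data.frame() then restores the data frame with the original D1 to D4 names.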
I have a dataframe which looks like this example (just much larger):
var <- c('Peter','Ben','Mary','Peter.1','Ben.1','Mary.1','Peter.2','Ben.2','Mary.2')
v1 <- c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7)
v2 <- c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2)
df <- data.frame(var, v1, v2)
var v1 v2
1 Peter 0.4 0.5
2 Ben 0.6 0.4
3 Mary 0.7 0.2
4 Peter.1 0.3 0.5
5 Ben.1 0.9 0.4
6 Mary.1 0.2 0.2
7 Peter.2 0.4 0.1
8 Ben.2 0.6 0.4
9 Mary.2 0.7 0.2
I want to group the strings in 'var' according to the names without the suffixes, and keep the original order of first appearance. Desired output:
var v1 v2
1 Peter 0.4 0.5 # Peter appears first in the original data
2 Peter.1 0.3 0.5
3 Peter.2 0.4 0.1
4 Ben 0.6 0.4 # Ben appears second in the original data
5 Ben.1 0.9 0.4
6 Ben.2 0.6 0.4
7 Mary 0.7 0.2 # Mary appears third in the original data
8 Mary.1 0.2 0.2
9 Mary.2 0.7 0.2
How can I achieve that?
Thank you!
An option is to create a temporary column that drops the trailing . and digits (\\d+) with str_remove, then arrange the data by a factor whose levels are the unique values in order of first appearance (match(var1, unique(var1)) would work for the ordering as well):
library(dplyr)
library(stringr)
df <- df %>%
mutate(var1 = str_remove(var, "\\.\\d+$")) %>%
arrange(factor(var1, levels = unique(var1))) %>%
select(-var1)
Or use fct_inorder from forcats, which converts to a factor with levels in the order of first appearance:
library(forcats)
df %>%
arrange(fct_inorder(str_remove(var, "\\.\\d+$")))
Output:
var v1 v2
1 Peter 0.4 0.5
2 Peter.1 0.3 0.5
3 Peter.2 0.4 0.1
4 Ben 0.6 0.4
5 Ben.1 0.9 0.4
6 Ben.2 0.6 0.4
7 Mary 0.7 0.2
8 Mary.1 0.2 0.2
9 Mary.2 0.7 0.2
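The same first-appearance ordering can be sketched in base R, without stringr or forcats: strip the suffix with sub, then order by the position of each name in unique(). This is an alternative, not one of the answers above; it relies on order() leaving ties in their original order, so rows within a group keep their input order (Peter, Peter.1, Peter.2 here).

```r
var <- c('Peter','Ben','Mary','Peter.1','Ben.1','Mary.1','Peter.2','Ben.2','Mary.2')
v1 <- c(0.4, 0.6, 0.7, 0.3, 0.9, 0.2, 0.4, 0.6, 0.7)
v2 <- c(0.5, 0.4, 0.2, 0.5, 0.4, 0.2, 0.1, 0.4, 0.2)
df <- data.frame(var, v1, v2, stringsAsFactors = FALSE)

# Name without a trailing ".<digits>" suffix
stem <- sub("\\.\\d+$", "", df$var)
# Group number in order of first appearance; ties keep their input order
res <- df[order(match(stem, unique(stem))), ]
res
```

Note that this keeps the input order inside each group rather than sorting by suffix; if the suffixes can be out of order in the data, add the numeric suffix as a second key to order().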
A compact option with sub and data.table::chgroup:
library(data.table)
df[chgroup(sub("\\.\\d+$", "", df$var)), ]
var v1 v2
1 Peter 0.4 0.5
4 Peter.1 0.3 0.5
7 Peter.2 0.4 0.1
2 Ben 0.6 0.4
5 Ben.1 0.9 0.4
8 Ben.2 0.6 0.4
3 Mary 0.7 0.2
6 Mary.1 0.2 0.2
9 Mary.2 0.7 0.2
chgroup efficiently groups duplicated values together while retaining the group order (the order in which each group first appears).
If you don't mind that the values in var are ordered alphabetically, then the simplest solution is this:
df %>%
arrange(var)
var v1 v2
1 Ben 0.6 0.4
2 Ben.1 0.9 0.4
3 Ben.2 0.6 0.4
4 Mary 0.7 0.2
5 Mary.1 0.2 0.2
6 Mary.2 0.7 0.2
7 Peter 0.4 0.5
8 Peter.1 0.3 0.5
9 Peter.2 0.4 0.1
Separate the var column into two columns, replace the NAs that this generates with 0, sort, and remove the extra columns.
This sorts on the numeric value of the suffix rather than its character representation, so that, for example, 10 won't come before 2. Also, the match in arrange ensures that the groups follow their order of first occurrence.
library(dplyr)
library(tidyr)
df %>%
separate(var, c("alpha", "no"), convert = TRUE, remove = FALSE, fill = "right") %>%
mutate(no = replace_na(no, 0)) %>%
arrange(match(alpha, alpha), no) %>%
select(-alpha, -no)
giving
var v1 v2
1 Peter 0.4 0.5
2 Peter.1 0.3 0.5
3 Peter.2 0.4 0.1
4 Ben 0.6 0.4
5 Ben.1 0.9 0.4
6 Ben.2 0.6 0.4
7 Mary 0.7 0.2
8 Mary.1 0.2 0.2
9 Mary.2 0.7 0.2
Update
I have removed what was previously the first solution after reading the update to the question.
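To illustrate the point about numeric versus character suffixes, here is a base R sketch (not the tidyr answer itself) on hypothetical data that adds a Peter.10 row, which is not in the question. Compared as text, "10" sorts before "2"; converting the suffix to a number, with 0 for no suffix as replace_na(no, 0) does above, puts Peter.10 after Peter.2.

```r
# Hypothetical data: Peter.10 is added only to show the effect of numeric sorting
var <- c("Peter", "Ben", "Peter.10", "Peter.2", "Ben.1")
v1 <- c(0.4, 0.6, 0.1, 0.4, 0.9)
df2 <- data.frame(var, v1, stringsAsFactors = FALSE)

stem <- sub("\\.\\d+$", "", df2$var)          # name without the suffix
has_suffix <- grepl("\\.\\d+$", df2$var)
suffix <- numeric(nrow(df2))                   # 0 when there is no suffix
suffix[has_suffix] <- as.numeric(sub("^.*\\.", "", df2$var[has_suffix]))

# First-appearance order of the names, then numeric order of the suffix
res2 <- df2[order(match(stem, unique(stem)), suffix), ]
res2$var   # "Peter" "Peter.2" "Peter.10" "Ben" "Ben.1"
```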
I have a dataframe that looks like this:
data <- as.data.frame(cbind('01-01-2018' = c(1.2,3.1,0.7,-0.3,2.0), '02-01-2018' = c(-0.1, 2.4, 4.9,-3.3,-2.7), '03-01-2018' = c(3.4, -2.6, -1.8, 0.1, 0.3)))
01-01-2018 02-01-2018 03-01-2018
1 1.2 -0.1 3.4
2 3.1 2.4 -2.6
3 0.7 4.9 -1.8
4 -0.3 -3.3 0.1
5 2.0 -2.7 0.3
For each row, I want to count how many values are bigger than the average of that row.
data$mn <- apply(data, 1, mean)
01-01-2018 02-01-2018 03-01-2018 mn
1 1.2 -0.1 3.4 1.5000000
2 3.1 2.4 -2.6 0.9666667
3 0.7 4.9 -1.8 1.2666667
4 -0.3 -3.3 0.1 -1.1666667
5 2.0 -2.7 0.3 -0.1333333
My last attempt was the following:
df$events <- apply(data, 1, function(x) sum(x > data$mn))
uhi_events <- numeric(nrow(data))
for (i in 1:nrow(data)) {
uhi <- data[[6]][[i]][["values"]]
uhi_events[i] <- sum(uhi)
}
data$uhi_events <- uhi_events
Is there a more efficient option?
EDIT:
What if the condition is on another column, let's say data$c1, that is not obtained through a simple formula?
data$md <- apply(data, 1, median)
01-01-2018 02-01-2018 03-01-2018 md
1 1.2 -0.1 3.4 1.2
2 3.1 2.4 -2.6 2.4
3 0.7 4.9 -1.8 0.7
4 -0.3 -3.3 0.1 -0.3
5 2.0 -2.7 0.3 0.3
Using rowMeans and rowSums:
data$cnt <- rowSums(data > rowMeans(data))
data
# 01-01-2018 02-01-2018 03-01-2018 cnt
# 1 1.2 -0.1 3.4 1
# 2 3.1 2.4 -2.6 2
# 3 0.7 4.9 -1.8 1
# 4 -0.3 -3.3 0.1 2
# 5 2.0 -2.7 0.3 2
If the comparison column has already been computed, use that column (data$c1) in place of rowMeans, and exclude it from the columns being compared:
# get the indices of all columns except "c1" (anchored so that names merely containing "c1" are kept)
ix <- grep("^c1$", colnames(data), invert = TRUE)
data$cnt <- rowSums(data[, ix] > data$c1)
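A base R sketch of that case, assuming the precomputed column is the row median from the question's EDIT, stored as md (the name date_cols is illustrative). Restricting the comparison to the three date columns keeps md itself out of the count.

```r
data <- as.data.frame(cbind('01-01-2018' = c(1.2, 3.1, 0.7, -0.3, 2.0),
                            '02-01-2018' = c(-0.1, 2.4, 4.9, -3.3, -2.7),
                            '03-01-2018' = c(3.4, -2.6, -1.8, 0.1, 0.3)))
date_cols <- 1:3                                # columns holding the daily values
data$md <- apply(data[, date_cols], 1, median)  # precomputed per-row threshold
# Each value is compared with its own row's md; rowSums counts the TRUEs
data$cnt <- rowSums(data[, date_cols] > data$md)
data
```

With three distinct values per row, exactly one value is above the median, so cnt is 1 in every row, matching the dplyr output further down.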
Using an anonymous function that sums the result of a logical comparison (sum() coerces the logical vector to integer, so TRUE = 1 and FALSE = 0):
data$uhi_events <-
apply(data, 1, function(i){
sum(i>mean(i))
})
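As a quick check of that coercion on the first row of the example data: the row mean is 1.5, and only 3.4 exceeds it.

```r
row1 <- c(1.2, -0.1, 3.4)
above <- row1 > mean(row1)   # logical vector: FALSE FALSE TRUE
n_above <- sum(above)        # TRUE counts as 1, FALSE as 0
n_above                      # 1
```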
library(data.table)
setDT(data)
data[, above_mean := rowSums(.SD > rowMeans(.SD))]
# 01-01-2018 02-01-2018 03-01-2018 above_mean
# 1: 1.2 -0.1 3.4 1
# 2: 3.1 2.4 -2.6 2
# 3: 0.7 4.9 -1.8 1
# 4: -0.3 -3.3 0.1 2
# 5: 2.0 -2.7 0.3 2
Edit for a question in the comments:
compare each value with the value in the first column
# .SDcols = 1:3 keeps the above_mean column created earlier out of .SD
data[, above_col1 := rowSums(.SD > `01-01-2018`), .SDcols = 1:3]
# 01-01-2018 02-01-2018 03-01-2018 above_col1
# 1: 1.2 -0.1 3.4 1
# 2: 3.1 2.4 -2.6 0
# 3: 0.7 4.9 -1.8 1
# 4: -0.3 -3.3 0.1 1
# 5: 2.0 -2.7 0.3 0
Using a dplyr approach:
library(dplyr)
data <- as.data.frame(cbind('01-01-2018' = c(1.2,3.1,0.7,-0.3,2.0), '02-01-2018' = c(-0.1, 2.4, 4.9,-3.3,-2.7), '03-01-2018' = c(3.4, -2.6, -1.8, 0.1, 0.3)))
data$mm <- apply(data,1,median)
data %>%
rowwise %>%
mutate(count = sum(c_across(1:3) > mm))
#> # A tibble: 5 × 5
#> # Rowwise:
#> `01-01-2018` `02-01-2018` `03-01-2018` mm count
#> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 1.2 -0.1 3.4 1.2 1
#> 2 3.1 2.4 -2.6 2.4 1
#> 3 0.7 4.9 -1.8 0.7 1
#> 4 -0.3 -3.3 0.1 -0.3 1
#> 5 2 -2.7 0.3 0.3 1
I have a distance matrix like this:
1 2 3 4 5
A 0.1 0.2 0.3 0.5 0.6
B 0.7 0.8 0.9 1 1.1
C 1.2 1.3 1.4 1.5 1.6
D 1.7 1.8 1.9 2 2.1
E 2.2 2.3 2.4 2.5 2.6
Now I want to create a lower-triangular matrix like this:
1 2 3 4 5 A B C D E
1 0
2 0.1 0
3 0.2 0.1 0
4 0.4 0.3 0.2 0
5 0.5 0.4 0.3 0.1 0
A 0.1 0.2 0.3 0.5 0.6 0
B 0.7 0.8 0.9 1 1.1 0.6 0
C 1.2 1.3 1.4 1.5 1.6 1.1 0.5 0
D 1.7 1.8 1.9 2 2.1 1.6 1 0.5 0
E 2.2 2.3 2.4 2.5 2.6 2.1 1.5 1 0.5 0
I subtracted the distances in the first table from each other to get, for example, the genetic distance between 1 and 2 (0.2 - 0.1 = 0.1), and did the same for the rest of the entries. I am not sure whether this is correct. After these calculations I built the lower-triangular matrix. I tried this in R:
x <- read.csv("AD2.csv", head = FALSE, sep = ",")
b<-lower.tri(b, diag = FALSE)
but I only get TRUE and FALSE as output, not a distance matrix.
Can anyone help me solve this problem? Here is a link to my example data.
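You can use dist to calculate the two square sub-matrices: the pairwise distances between columns 1 to 5 (from row A) and between rows A to E (from column 1). Use cbind to build the top and bottom halves, rbind the two halves, and then set the upper triangle to NA to get the desired output.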
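mat <- rbind(
cbind(as.matrix(dist(tbl[1,])), tbl),
cbind(tbl, as.matrix(dist(tbl[,1])))
)
# rbind keeps the top half's column names (1..5 twice); relabel so the last five read A..E
colnames(mat) <- rownames(mat)
mat[upper.tri(mat, diag=FALSE)] <- NA
mat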
Hope it helps.
data:
tbl <- as.matrix(read.table(text="1 2 3 4 5
A 0.1 0.2 0.3 0.5 0.6
B 0.7 0.8 0.9 1 1.1
C 1.2 1.3 1.4 1.5 1.6
D 1.7 1.8 1.9 2 2.1
E 2.2 2.3 2.4 2.5 2.6", header=TRUE, check.names=FALSE, row.names=1))
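The subtraction described in the question (0.2 - 0.1 = 0.1) is the absolute difference between two values. For a single row or column, abs(outer(v, v, "-")) computes every pairwise difference at once, which for one-dimensional data matches what dist returns. The sketch below uses the row A and column 1 values from the table; it also shows why the question's attempt printed TRUE and FALSE: lower.tri and upper.tri return a logical mask, which is meant to be used for indexing a matrix rather than as the result itself.

```r
# Row A (values for columns 1..5) and column 1 (values for rows A..E)
rowA <- c(`1` = 0.1, `2` = 0.2, `3` = 0.3, `4` = 0.5, `5` = 0.6)
col1 <- c(A = 0.1, B = 0.7, C = 1.2, D = 1.7, E = 2.2)

d_cols <- abs(outer(rowA, rowA, "-"))   # d_cols["2", "1"] is |0.2 - 0.1|
d_rows <- abs(outer(col1, col1, "-"))   # same values as as.matrix(dist(col1))

# Use the logical mask to blank the upper triangle, leaving the lower one
d_cols[upper.tri(d_cols)] <- NA
d_cols
```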
I have two data frames, df and cf. I want to multiply each value of A in df by a coefficient from cf, chosen by the values of B (row of cf) and C (column of cf) in the same row of df.
For example,
row 1 in df has A = 20, B = 4 and C = 2, so the correct coefficient is 0.3,
and the result is 20 * 0.3 = 6.
Is there a simple way to do that in R?
Thanks in advance!
df
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
cf
C
B/C 1 2 3 4 5
1 0.2 0.3 0.5 0.6 0.7
2 0.1 0.5 0.3 0.3 0.4
3 0.9 0.1 0.6 0.6 0.8
4 0.7 0.3 0.7 0.4 0.6
One solution with apply:
#iterate over df's rows
apply(df, 1, function(x) {
x[1] * cf[x[2], x[3]]
})
#[1] 6.0 18.0 17.5 14.4 4.3
Try this vectorized:
df[,1] * cf[as.matrix(df[,2:3])]
#[1] 6.0 18.0 17.5 14.4 4.3
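The vectorized version relies on matrix indexing: indexing with a two-column matrix returns one element per row of that matrix, using the first column as the row index and the second as the column index. A minimal sketch, with cf built as a plain numeric matrix from the question's values:

```r
df <- data.frame(A = c(20, 30, 35, 24, 43),
                 B = c(4, 4, 2, 3, 2),
                 C = c(2, 5, 2, 3, 1))
cf <- matrix(c(0.2, 0.3, 0.5, 0.6, 0.7,
               0.1, 0.5, 0.3, 0.3, 0.4,
               0.9, 0.1, 0.6, 0.6, 0.8,
               0.7, 0.3, 0.7, 0.4, 0.6),
             nrow = 4, byrow = TRUE)

idx <- cbind(df$B, df$C)   # one (row, column) pair per row of df
coefs <- cf[idx]           # cf[4, 2], cf[4, 5], cf[2, 2], cf[3, 3], cf[2, 1]
result <- df$A * coefs
result                     # 6.0 18.0 17.5 14.4 4.3
```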
A solution using dplyr and a vectorised function:
df = read.table(text = "
A B C
20 4 2
30 4 5
35 2 2
24 3 3
43 2 1
", header=T, stringsAsFactors=F)
cf = read.table(text = "
0.2 0.3 0.5 0.6 0.7
0.1 0.5 0.3 0.3 0.4
0.9 0.1 0.6 0.6 0.8
0.7 0.3 0.7 0.4 0.6
")
library(dplyr)
# function to get the correct element of cf
# vectorised version
f = function(x,y) cf[x,y]
f = Vectorize(f)
df %>%
mutate(val = f(B,C),
result = val * A)
# A B C val result
# 1 20 4 2 0.3 6.0
# 2 30 4 5 0.6 18.0
# 3 35 2 2 0.5 17.5
# 4 24 3 3 0.6 14.4
# 5 43 2 1 0.1 4.3
The final dataset has both result and val in order to check which value from cf was used each time.
I have a data frame like this:
A B value
1 1 0.123
2 1 0.213
3 1 0.543
1 2 0.313
2 2 0.123
3 2 0.412
What I want to do is create a function that shifts this data frame by a given number of rows. For example:
if the shift value is 1, the data frame becomes:
A B value
3 2 0.412
1 1 0.123
2 1 0.213
3 1 0.543
1 2 0.313
2 2 0.123
etc...
The function should look like this:
shift<-function(dataframe,shiftvalue)
Is there a simple way to do this in R without writing a lot of loops?
You can do it many ways, but one way is to use head and tail:
df <- data.frame(a=1:10, b = 11:20)
shift <- function(d, k) rbind( tail(d,k), head(d,-k), deparse.level = 0 )
> shift(df,3)
a b
8 8 18
9 9 19
10 10 20
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
I prefer plain old modulo ;-)
shift<-function(df,offset) df[((1:nrow(df))-1-offset)%%nrow(df)+1,]
It is straightforward; the only quirk is R's one-based indexing. It also works for offsets like 0, -7 or 7*nrow(df).
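A usage sketch of the modulo version on a small two-column data frame (with a single column, df[i, ] would drop to a plain vector unless drop = FALSE is added). It shows that a positive offset moves the last rows to the top, a negative one moves rows up, and offsets differing by a multiple of nrow(df) give the same result.

```r
shift <- function(df, offset) df[((1:nrow(df)) - 1 - offset) %% nrow(df) + 1, ]

d <- data.frame(a = 1:6, b = letters[1:6])
shift(d, 1)$a    # 6 1 2 3 4 5: the last row moves to the top
shift(d, -7)$a   # 2 3 4 5 6 1: same as shift(d, -1)
shift(d, 0)$a    # 1 2 3 4 5 6: unchanged
```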
Here is my implementation (this first version moves the first sv rows to the bottom):
> shift <- function(df, sv = 1) df[c((sv+1):nrow(df), 1:sv),]
> head(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
> tail(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
Updated (moves the last sv rows to the top, as in the question):
> shift <- function(df, sv = 1) df[c((nrow(df)-sv+1):nrow(df), 1:(nrow(df)-sv)),]
> head(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
> tail(shift(iris, 3))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
142 6.9 3.1 5.1 2.3 virginica
143 5.8 2.7 5.1 1.9 virginica
144 6.8 3.2 5.9 2.3 virginica
145 6.7 3.3 5.7 2.5 virginica
146 6.7 3.0 5.2 2.3 virginica
147 6.3 2.5 5.0 1.9 virginica
There's a shift function in taRifx that works on vectors. Applying it to a whole data frame coerces all columns to character if any column is character, so we'll use a trick from plyr (colwise) to apply it column by column. I'll likely write a data.frame method for it soon:
dd <- data.frame(b = seq(4),
x = c("A", "D", "A", "C"), y = c('a','b','c','d'),
z = c(1, 1, 1, 2),stringsAsFactors=FALSE)
> dd
b x y z
1 1 A a 1
2 2 D b 1
3 3 A c 1
4 4 C d 2
library(taRifx)
library(plyr)
shift.data.frame <- colwise(shift)
> shift.data.frame(dd)
b x y z
1 2 D b 1
2 3 A c 1
3 4 C d 2
4 1 A a 1
> shift(dd,n=-1)
b x y z
1 4 C d 2
2 1 A a 1
3 2 D b 1
4 3 A c 1
> shift(dd,n=-1,wrap=FALSE)
b x y z
1 1 A a 1
2 2 D b 1
3 3 A c 1
> shift(dd,n=-1,wrap=FALSE,pad=TRUE)
b x y z
1 NA <NA> <NA> NA
2 1 A a 1
3 2 D b 1
4 3 A c 1
The advantage of shift is that it takes several options:
n can be positive or negative, to shift in either direction
wrap can be turned on or off
If wrap is turned off, pad can be turned on to pad with NAs so the vector keeps the same length
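A base R sketch of the same three options, without taRifx or plyr. The function name shift_df and its arguments are hypothetical, chosen to reproduce the behaviour shown in the outputs above (a positive n moves rows up). Because it indexes whole rows with d[src, ], each column keeps its own type, so no column-wise trick is needed.

```r
# Hypothetical helper mirroring the n / wrap / pad options shown above
shift_df <- function(d, n = 1, wrap = TRUE, pad = FALSE) {
  k <- nrow(d)
  src <- seq_len(k) + n                # source row for each output position
  if (wrap) {
    src <- (src - 1) %% k + 1          # wrap indices back into 1..k
  } else {
    src[src < 1 | src > k] <- NA       # positions shifted past either end
    if (!pad) src <- src[!is.na(src)]  # drop them unless padding with NA rows
  }
  out <- d[src, , drop = FALSE]        # NA indices yield all-NA rows
  rownames(out) <- NULL
  out
}

dd <- data.frame(b = seq(4),
                 x = c("A", "D", "A", "C"), y = c('a','b','c','d'),
                 z = c(1, 1, 1, 2), stringsAsFactors = FALSE)
shift_df(dd)                                    # rows 2, 3, 4, 1
shift_df(dd, n = -1, wrap = FALSE, pad = TRUE)  # an NA row, then rows 1, 2, 3
```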
dplyr's lag() and lead() also shift a vector, padding with NA rather than wrapping: https://dplyr.tidyverse.org/reference/lead-lag.html
lag(1:5, n = 1)
#> [1] NA 1 2 3 4
lag(1:5, n = 2)
#> [1] NA NA 1 2 3