multiply a dataframe element-wise by a row - r

I have a dataframe df and a row vector named mult of the same size as a row in df.
I need to multiply each row of df by mult as element-wise multiplication. But I do not want to write a loop, as there are probably faster ways to do this.
Here are my failed attempts:
df = data.frame(matrix(nrow = 5, ncol = 5))
df[,] = 5
mult = as.data.frame(c(1,2,3,4,5))
df * t(mult[1:5,])
Whether or not I transpose mult[1:5,], I get the same result.
The correct answer should be a dataframe of five rows of 5 10 15 20 25.
However, I am getting the result as if I am doing element-wise multiplying by mult as a column vector.
5 5 5 5 5
10 10 10 10 10
15 15 15 15 15
20 20 20 20 20
25 25 25 25 25
Multiplying a row at a time works, but that will involve a loop.
I have searched SO and found sweep(), but it does not seem to work in my case.

df * matrix(rep(mult[,1], NROW(df)), nrow = NROW(df), byrow = TRUE)
# X1 X2 X3 X4 X5
#1 5 10 15 20 25
#2 5 10 15 20 25
#3 5 10 15 20 25
#4 5 10 15 20 25
#5 5 10 15 20 25

Alternatively, we could replicate 'mult' by indexing it with the column number of each cell:
df * mult[,1][col(df)]
# X1 X2 X3 X4 X5
#1 5 10 15 20 25
#2 5 10 15 20 25
#3 5 10 15 20 25
#4 5 10 15 20 25
#5 5 10 15 20 25
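The attempt in the question multiplies down the columns because R recycles a vector in column-major order. Since the question mentions sweep(), here is a minimal sketch of how it could be applied, assuming the multiplier is held as a plain numeric vector rather than a one-column data frame. The names v, res_sweep and res_t are introduced here for illustration only.

```r
# Rebuild the example data: a 5 x 5 data frame filled with 5
df <- data.frame(matrix(5, nrow = 5, ncol = 5))
v <- c(1, 2, 3, 4, 5)  # multiplier as a plain numeric vector (assumption)

# MARGIN = 2 pairs v[j] with column j, so every row is multiplied by v
res_sweep <- sweep(df, 2, v, `*`)

# Alternative: transpose so that column-major recycling lines up with v,
# then transpose back. Note that this returns a matrix, not a data frame.
res_t <- t(t(df) * v)
```

Both results have 5, 10, 15, 20, 25 in every row. sweep() keeps the data frame class, which may be preferable if the result is used with other data frame code.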

How to create a column which use its own lag value using dplyr

Suppose I have the following data frame
c1<- c(1:10)
c2<- c(11:20)
df<- data.frame(c1,c2)
c1 c2
1 11
2 12
3 13
4 14
5 15
6 16
7 17
8 18
9 19
10 20
I would like to add a column c3 which is the sum of c3(-1)+c2-c1. For instance,
in the example above the expected result will be:
c1 c2 c3
1 11 0
2 12 10
3 13 20
4 14 30
5 15 40
6 16 50
7 17 60
8 18 70
9 19 80
10 20 90
Is it possible to perform this operation using dplyr ? I have tried several approaches without success. Any suggestion will be much appreciated.
This is a good use for cumsum - cumulative summation.
c3 = lag(cumsum(c2 - c1), default = 0)
Don't think of c3 as c3(-1) + c2 - c1, think of it as c3(n) = sum (from 1 to n - 1) c2(i) - c1(i)
This creates column c3, assuming the first entry is always 0 since there is no preceding element.
df$c3 <- df$c2 - df$c1
df[1,"c3"] <- 0
df$c3 <- cumsum(df$c3)
output
> df
c1 c2 c3
1 1 11 0
2 2 12 10
3 3 13 20
4 4 14 30
5 5 15 40
6 6 16 50
7 7 17 60
8 8 18 70
9 9 19 80
10 10 20 90
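The two answers agree on the example data because c2 - c1 is constant there. As a hedged sketch, the hypothetical data below (not from the question) shows that they differ once the differences vary, so it is worth checking which reading of the recurrence is intended. The expression c(0, head(cumsum(d), -1)) is a base R stand-in for lag(cumsum(d), default = 0); the names d, c3_lagged and c3_recur are illustrative.

```r
# Hypothetical data where c2 - c1 is not constant
c1 <- c(1, 2, 3)
c2 <- c(11, 14, 18)
d <- c2 - c1  # 10, 12, 15

# Lagged cumulative sum: c3(n) = d(1) + ... + d(n - 1)
c3_lagged <- c(0, head(cumsum(d), -1))

# Zero the first difference, then accumulate:
# c3(n) = c3(n - 1) + d(n), with c3(1) = 0
c3_recur <- cumsum(replace(d, 1, 0))

c3_lagged  # 0 10 22
c3_recur   # 0 12 27
```

If c3(-1) + c2 - c1 uses the current row's c2 and c1, the second form matches the stated recurrence; if it uses the previous row's values, the lagged form does.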

Assigning Values based on row value

I have a large vector (a column of a data frame) containing integers from 1 to 30. I want to replace the numbers 1 to 5 with 1, 6 to 10 with 5, 11 to 15 with 9...
> x3 <- sample(1:30, 100, rep=TRUE)
> x3
[1] 13 24 16 30 10 6 15 10 3 17 18 22 11 13 29 7 25 28 17 27 1 5 6 20 15 15 8 10 13 26 27 24 3 24 5 7 10 6 28 27 1 4 22 25 14 13 2 10 4 29 23 24 30 24 29 11 2 28 23 1 1 2
[63] 3 23 13 26 21 22 11 4 8 26 17 11 20 23 6 14 24 5 15 21 11 13 6 14 20 11 22 9 6 29 4 30 20 30 4 24 23 29
As I mentioned this is a column in a data frame and with above assignment I want to create a different column. If I do the following I have to do this 30 times.
myFrame$NewColumn[myFrame$oldColumn==1] <- 1
myFrame$NewColumn[myFrame$oldColumn==2] <- 1
myFrame$NewColumn[myFrame$oldColumn==3] <- 1
...
What's a better way to do this?
We can do this with cut (assuming the '...' continues the same pattern, so 16 to 20 becomes 13, and so on):
x4 <- cut(x3,
          breaks = c(seq(1, 30, 5), 30),         # boundaries 1, 6, 11, 16, 21, 26, 30
          right = FALSE, include.lowest = TRUE,  # intervals [1,6), [6,11), ..., [26,30]
          labels = 4 * (0:5) + 1)                # replacement values 1, 5, 9, 13, 17, 21
# x4 is a factor, so convert it to character first and then to numeric
x4 <- as.numeric(as.character(x4))
Did you try:
myFrame$NewColumn[myFrame$oldColumn > 0 & myFrame$oldColumn< 6] <- 1
myFrame$NewColumn[myFrame$oldColumn > 5 & myFrame$oldColumn < 11] <- 5
...
Or even better:
myFrame$NewColumn <- as.integer((myFrame$oldColumn - 1) / 5) * 4 + 1
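As a quick check, the arithmetic approach and the cut approach can be compared over the whole range 1 to 30. This is a minimal sketch; it uses integer division (%/%) in place of as.integer(), which gives the same result for positive integers, and the names x, by_division and by_cut are illustrative.

```r
x <- 1:30

# Integer division maps 1..5 to 0, 6..10 to 1, and so on; scale by 4 and shift by 1
by_division <- (x - 1) %/% 5 * 4 + 1

# Same mapping via cut with left-closed intervals
by_cut <- as.numeric(as.character(
  cut(x,
      breaks = c(seq(1, 30, 5), 30),
      right = FALSE, include.lowest = TRUE,
      labels = 4 * (0:5) + 1)
))

all(by_division == by_cut)  # TRUE
```

The arithmetic form is shorter, but cut is easier to adapt if the intervals or replacement values stop following a regular pattern.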

How to reference the values of different columns in a dataframe depending upon the data in them

I have a dataframe of records of varying lengths, with NAs at the end. If there are more than three x-values in a record, I want to make the value of the third x-value equal to the value of the last x-value. Each record already tells me how many x-values it has.
I can make x3 be equal to the name of the last x-value (x4 or x5 etc) but what I really need is to make x3 take the value of that last x-value.
I'm sure there is some simple answer. Any help would be greatly appreciated! Thank you.
Here is a simple case:
ii <- "n x1 x2 x3 x4 x5 x6
1 3 30 40 20 NA NA NA
2 4 10 50 16 25 NA NA
3 6 20 15 26 16 18 28
4 5 10 10 18 17 19 NA
5 2 65 41 NA NA NA NA
6 5 10 11 23 16 23 NA
7 1 99 NA NA NA NA NA"
df <- read.table(text=ii, header = TRUE, na.strings="NA", colClasses="character")
oo <- "n x1 x2 x3
1 3 30 40 20
2 4 10 50 25
3 6 20 15 28
4 5 10 10 19
5 2 65 41 NA
6 5 10 11 23
7 1 99 NA NA"
desireddf <- read.table(text=oo, header = TRUE, na.strings="NA", colClasses="character")
df$lastx <- as.character(paste("x", df$n, sep=""))
#df$lastx <- df[[get(df$lastx)]] #How can I make lastx equal to the _value_ of lastx???
df[df$n>3, c('x3')] <- df[df$n>3, 'lastx']
df <- df[,1:4]
print(df)
yields the following, not the desireddf above.
n x1 x2 x3
1 3 30 40 20
2 4 10 50 x4
3 6 20 15 x6
4 5 10 10 x5
5 2 65 41 <NA>
6 5 10 11 x5
7 1 99 <NA> <NA>
This seems like a pretty arbitrary task, but here goes:
desireddf <- data.frame(n=df$n, x1=df$x1, x2=df$x2, x3=df[cbind(1:nrow(df), paste("x", pmax(3,as.numeric(df$n)), sep=""))])
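The one-liner above relies on matrix indexing: a two-column matrix of (row, column) pairs picks one cell per row. Here is a minimal sketch of the same idea, assuming the data is read as numbers rather than with colClasses="character" as in the question. The names xcols, last and result are introduced for illustration.

```r
ii <- "n x1 x2 x3 x4 x5 x6
1 3 30 40 20 NA NA NA
2 4 10 50 16 25 NA NA
3 6 20 15 26 16 18 28
4 5 10 10 18 17 19 NA
5 2 65 41 NA NA NA NA
6 5 10 11 23 16 23 NA
7 1 99 NA NA NA NA NA"
df <- read.table(text = ii, header = TRUE)  # numeric columns (assumption)

# Matrix of the x-values only, so that column k corresponds to xk
xcols <- as.matrix(df[, paste0("x", 1:6)])

# Column to read for each row: the last x-value, but never before x3
last <- pmax(3, df$n)

# One (row, column) pair per record selects exactly one cell per row
df$x3 <- xcols[cbind(seq_len(nrow(df)), last)]
result <- df[, c("n", "x1", "x2", "x3")]
```

Records with fewer than three x-values keep their NA in x3, because pmax() points them at the original x3 column.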

Computing values for R dataFrame cells without using for loops

I have an R data frame like the following:
Serial N year current Average
B 10 14 15
B 10 16 15
C 12 13 12
D 40 20 20
B 11 15 15
C 12 11 12
I would like to have a new column based on the average for a unique serial number. I would like to have something like :
Serial N year current Average temp
B 10 14 15 (15+12+20)/15
B 10 16 15 (15+12+20)/15
C 12 13 12 (15+12+20)/12
D 40 20 20 (15+12+20)/20
B 11 15 15 (15+12+20)/15
C 12 11 12 (15+12+20)/12
The temp column is the sum of the Average values, taken once per Serial N (for B, C and D), divided by the Average of that row. How can I compute it without using for loops, given that rows 1, 2 and 5 (Serial N: B) share the same Average and temp? I started with this:
for (i in unique(df$Serial_N))
{
.........
}
but I got stuck as I also need the average for other Serial N. How can I do this?
For example, you can try something like the following (assuming your computation matches):
df$temp <- sum(tapply(df$Average, df$SerialN, mean)) / df$Average
Resulting output:
SerialN year current Average temp
1 B 10 14 15 3.133333
2 B 10 16 15 3.133333
3 C 12 13 12 3.916667
4 D 40 20 20 2.350000
5 B 11 15 15 3.133333
6 C 12 11 12 3.916667
Using unique.data.frame() avoids counting the same Average more than once within a group:
df$temp <- sum((unique.data.frame(df[c("Serial_N","Average")]))$Average) / df$Average
In base R, you can use either
df <- transform(df, temp = sum(tapply(df$Average, df$Serial_N, unique))/df$Average)
or
df$temp <- sum(tapply(df$Average, df$Serial_N, unique))/df$Average
both of which will give you
df
# Serial_N year current Average temp
# 1 B 10 14 15 3.133333
# 2 B 10 16 15 3.133333
# 3 C 12 13 12 3.916667
# 4 D 40 20 20 2.350000
# 5 B 11 15 15 3.133333
# 6 C 12 11 12 3.916667
tapply splits df$Average by the levels of df$Serial_N, and then calls unique on them, which gives you a single average for each group, which you can then sum and divide. transform adds a column (equivalent to dplyr::mutate).
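To make the intermediate step visible, here is a minimal sketch that rebuilds the example data and inspects the per-group values before summing. It assumes, as the question does, that Average is constant within each Serial_N; the name group_avg is illustrative.

```r
df <- data.frame(
  Serial_N = c("B", "B", "C", "D", "B", "C"),
  year     = c(10, 10, 12, 40, 11, 12),
  current  = c(14, 16, 13, 20, 15, 11),
  Average  = c(15, 15, 12, 20, 15, 12)
)

# One Average per serial number: B = 15, C = 12, D = 20
group_avg <- tapply(df$Average, df$Serial_N, unique)

# The numerator is the same for every row (15 + 12 + 20 = 47)
df$temp <- sum(group_avg) / df$Average
```

If Average could vary within a group, unique would return more than one value per group; using mean instead, as in the first answer, would be the safer choice in that case.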

R : data frame Randomize columns by row

I have a dataframe in R that I want to randomize, keeping the first column like it is but randomizing the last two columns together, so that values that appear in the same rows in these columns will appear in the same row both after randomizing. So if I started with this:
1 a b c
2 d e f
3 g h i
when randomized it might look like:
1 a e f
2 d h i
3 g b c
I know that sample works fine, but does it preserve the row correspondence between the columns?
> t <- data.frame(matrix(nrow=4,ncol=10,data=1:40))
> t
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 5 9 13 17 21 25 29 33 37
2 2 6 10 14 18 22 26 30 34 38
3 3 7 11 15 19 23 27 31 35 39
4 4 8 12 16 20 24 28 32 36 40
> columns_to_random <- c(8,9,10)
> t[,columns_to_random] <- t[sample(1:nrow(t),size=nrow(t)), columns_to_random]
> t
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 1 5 9 13 17 21 25 32 36 40
2 2 6 10 14 18 22 26 29 33 37
3 3 7 11 15 19 23 27 30 34 38
4 4 8 12 16 20 24 28 31 35 39
Just sample one column at a time and you'll be fine. For example:
data[,2] = sample(data[,2])
data[,3] = sample(data[,3])
...
If you have many columns, you can extend this like:
data[,-1] = apply(data[,-1], 2, sample)
EDIT: With your clarification about row equivalence, this is just:
data[,-1] = data[sample(nrow(data)),-1]
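A minimal sketch of the row-preserving shuffle, using the letters from the question as hypothetical data: one permutation of the row indices is drawn and applied to the last two columns together, so values that shared a row still share a row afterwards. The names perm and shuffled, and the seed, are illustrative.

```r
set.seed(42)  # only for reproducibility of the example

data <- data.frame(col1 = c("a", "d", "g"),
                   col2 = c("b", "e", "h"),
                   col3 = c("c", "f", "i"))

# Draw one permutation of the rows and reuse it for both columns
perm <- sample(nrow(data))
shuffled <- data
shuffled[, 2:3] <- data[perm, 2:3]

# col1 is untouched, and every (col2, col3) pair from the original still exists
identical(shuffled$col1, data$col1)
setequal(paste(shuffled$col2, shuffled$col3), paste(data$col2, data$col3))
```

Sampling each column separately would break these pairs, which is the difference between this version and the per-column versions.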
What do you mean by "values equivalence"?
Honestly, I do not quite follow the question, but here is my guess. As you said, you could use sample, but apply it separately to each of your columns, e.g. with apply:
# create a reproducible example
test <- data.frame(indx=c(1,2,3),col1=c("a","d","g"),
col2=c("b","e","h"),col3=c("c","f","i"))
xyz <- apply(test[,-1],MARGIN=2,sample)
as.data.frame(xyz)
Approach using colwise in plyr for elegant column wise permutation:
test <- data.frame(matrix(nrow=4,ncol=10,data=1:40))
Load plyr
require(plyr)
Create a column-wise "sample" function
colwise.sample <- colwise(sample)
Apply to the desired columns
permutation.test <- test
permutation.test[,c(1,3,4)] <- colwise.sample(test[,c(1,3,4)])
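If plyr is not available, the same independent per-column permutation can be sketched in base R with lapply. This is an alternative to the colwise approach, not part of the original answer; the names cols and permuted are illustrative.

```r
set.seed(1)  # only for reproducibility of the example

test <- data.frame(matrix(1:40, nrow = 4, ncol = 10))
cols <- c(1, 3, 4)  # columns to permute

# Shuffle each selected column on its own; other columns stay as they are
permuted <- test
permuted[cols] <- lapply(test[cols], sample)
```

Each selected column keeps its own set of values, but the rows no longer line up across columns, so this is not suitable when row correspondence must be preserved.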
