R: how to change values in a data.frame

> dummy <- data.frame(X = c(1, 2, 3, 4, 5, 5, 2, 6, 7, 2), Y = c(3, 2, 1, 4, 5, 6, 7, 3, 4, 2))
> dummy
X Y
1 1 3
2 2 2
3 3 1
4 4 4
5 5 5
6 5 6
7 2 7
8 6 3
9 7 4
10 2 2
I have a data.frame that consists of values from 1 to 7. I want to change the 1's to 7's (and vice versa), the 2's to 6's (and vice versa), and the 3's to 5's (and vice versa), while the 4's stay as 4's. In other words, I essentially want to 'reverse' the numbers. I thought about writing a for loop to iterate over each value in each column and use ifelse statements, but how can I change, say, the 7's to 1's and the 1's to 7's simultaneously?

Since every pair of numbers you want to swap sums to 8, you can subtract your original data frame from 8 and all the values will be reversed as you want, so you can just do 8 - dummy:
dummy = 8 - dummy
dummy
# X Y
#1 7 5
#2 6 6
#3 5 7
#4 4 4
#5 3 3
#6 3 2
#7 6 1
#8 2 5
#9 1 4
#10 6 6

match is the right generic way to do this - it works even when you can't find a nice simple mathematical operation.
First set up key and value vectors, so that the ith entry of key is replaced by the ith entry of value:
key = 1:7 # key to look up (current value)
value = 7:1 # desired value corresponding to key
dummy$newX = value[match(dummy$X, key)]
dummy$newY = value[match(dummy$Y, key)]
# X Y newX newY
# 1 1 3 7 5
# 2 2 2 6 6
# 3 3 1 5 7
# 4 4 4 4 4
# 5 5 5 3 3
# 6 5 6 3 2
# 7 2 7 6 1
# 8 6 3 2 5
# 9 7 4 1 4
# 10 2 2 6 6
You could, of course, directly overwrite X and Y - I keep them both here to demonstrate that it worked.
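As a minimal sketch of that overwrite, the same lookup can be applied to every column in one step with lapply. This assumes all columns share the same 1 to 7 coding; the name recoded is only illustrative. Any value that is not present in key would come back as NA.

```r
# Sketch: recode every column with the same key/value lookup.
# Assumption: all columns use the 1-7 coding from the question.
dummy <- data.frame(X = c(1, 2, 3, 4, 5, 5, 2, 6, 7, 2),
                    Y = c(3, 2, 1, 4, 5, 6, 7, 3, 4, 2))
key   <- 1:7   # current values
value <- 7:1   # replacement for each key

recoded <- dummy                       # 'recoded' is an illustrative name
recoded[] <- lapply(dummy, function(col) value[match(col, key)])
recoded
```

Assigning into recoded[] keeps the data.frame structure and column names while replacing the contents.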

Making it a little more generic:
max(dummy) + min(dummy) - dummy
X Y
1 7 5
2 6 6
3 5 7
4 4 4
5 3 3
6 3 2
7 6 1
8 2 5
9 1 4
10 6 6
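One caveat worth sketching: max(dummy) + min(dummy) only reverses the scale correctly if both endpoints of the scale actually occur in the data. The small data frame below is made up for illustration; it is coded on the same 1 to 7 scale but happens to contain no 6 or 7.

```r
# Hypothetical data on a 1-7 scale where the top of the scale never occurs.
d <- data.frame(X = c(1, 3, 5))

by_range <- max(d) + min(d) - d   # uses observed range: 6 - d
by_scale <- 8 - d                 # uses the known scale: 1 + 7 - d

by_range$X   # 5 3 1
by_scale$X   # 7 5 3
```

When the scale endpoints are known in advance, writing them explicitly is the safer choice.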

Related

Extract cumulative unique values in a rolling basis (reset and resume) using data.table R

Given a data.table, I would like to extract cumulative unique elements until there are three unique values, then reset and resume:
y <- data.table(a=c(1, 2, 2, 3, 3, 4, 3, 2, 2, 5, 6, 7, 9, 8))
The desired output unique_acc_roll_3 is:
a unique_acc_roll_3
1 1
2 1 2
2 1 2
3 1 2 3
3 1 2 3
4 4 #4 is the fourth element, so it resets and starts again
3 3 4
2 2 3 4
2 2 3 4
5 5 #5 is the fourth element, so it resets and starts again
6 5 6
7 5 6 7
9 9 #9 is the fourth element, so it resets and starts again
8 8 9
Because it refers back recursively, I got really stuck... The real data is large, so data.table solutions would be great.
I can't think of any way to avoid what is essentially a for loop, except to hide it behind a Reduce call. My logic is to keep union-ing each new value at each row until the set grows to length == n, at which point the new value is used as the starting point for the next iteration of the loop.
unionlim <- function(x, y, n=4) {
u <- union(x,y)
if(length(u) == n) y else u
}
y[, out := sapply(Reduce(unionlim, a, accumulate=TRUE), paste, collapse=" ")]
# a out
# 1: 1 1
# 2: 2 1 2
# 3: 2 1 2
# 4: 3 1 2 3
# 5: 3 1 2 3
# 6: 4 4
# 7: 3 4 3
# 8: 2 4 3 2
# 9: 2 4 3 2
#10: 5 5
#11: 6 5 6
#12: 7 5 6 7
#13: 9 9
#14: 8 9 8
This is far from the fastest code on the planet, but a quick test suggests it will chew through about 1M cases in ~15 seconds on my decent machine.
bigy <- y[rep(1:nrow(y), 75e3)]
system.time({
bigy[, out := sapply(Reduce(unionlim, a, accumulate=TRUE), paste, collapse=" ")]
})
# user system elapsed
# 14.27 0.09 15.06
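For comparison, here is a sketch of the same union-and-reset logic written as an explicit for loop in base R. The function name roll_unique is hypothetical, and this is not benchmarked against the Reduce version; it is only meant to make the reset step visible.

```r
# Explicit-loop sketch of the union-and-reset logic.
# 'roll_unique' is an illustrative name, not part of any package.
roll_unique <- function(a, n = 4) {
  out <- character(length(a))
  current <- c()                       # running set of unique values
  for (i in seq_along(a)) {
    u <- union(current, a[i])
    # once the set would reach n values, restart from the new value
    current <- if (length(u) == n) a[i] else u
    out[i] <- paste(current, collapse = " ")
  }
  out
}

a <- c(1, 2, 2, 3, 3, 4, 3, 2, 2, 5, 6, 7, 9, 8)
res <- roll_unique(a)
res
```

On a data.table this could be used as y[, out := roll_unique(a)].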
purrr::accumulate also does the job here:
library(purrr)
y$b <- accumulate(y$a, ~if(length(union(.x, .y)) == 4) .y else union(.x, .y))
y
a b
1 1 1
2 2 1, 2
3 2 1, 2
4 3 1, 2, 3
5 3 1, 2, 3
6 4 4
7 3 4, 3
8 2 4, 3, 2
9 2 4, 3, 2
10 5 5
11 6 5, 6
12 7 5, 6, 7
13 9 9
14 8 9, 8

Transforming a looping factor variable into a sequence of numerics

I have a factor variable with 6 levels, which simplified looks like:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 1 1 1 2 2 2 2... 1 1 1 2 2... (with n = 78)
Note that each number is usually, but not always, repeated three times.
I need to transform this variable into the following pattern:
1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8...
where each repetition of the 6 levels continues the count upward.
Is there any way / any function that lets me do that?
Sorry for my bad description!
Assuming you have a numeric vector that represents the simplified version you posted, e.g. x = c(1,1,1,2,2,3,3,3,1,1,2,2), you can use this:
library(dplyr)
cumsum(x != lag(x, default = 0))
# [1] 1 1 1 2 2 3 3 3 4 4 5 5
This compares each value with the previous one and adds 1 whenever they differ (starting from 1).
Maybe you can try rle, i.e.,
r <- rle(x)
v <- rep(seq_along(r$values), r$lengths)
Example with dummy data
x = c(1,1,1,2,2,3,3,3,4,4,5,6,1,1,2,2,3,3,3,4,4)
then we can get
> v
[1] 1 1 1 2 2 3 3 3 4 4 5 6 7 7 8 8 9 9
[19] 9 10 10
In base R you can use diff and cumsum.
c(1, cumsum(diff(x)!=0)+1)
# [1] 1 1 2 2 2 3 3 3 4 4 4 4 5 5 5 6 6 6 7 7 7 8 8 8 8
Data:
x <- c(1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,1,1,1,2,2,2,2)
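As a quick sanity check, the comparison-based and rle-based approaches can be run side by side on this data. The sketch below uses base R only; the names changed, run_id and run_id_rle are illustrative.

```r
x <- c(1,1,2,2,2,3,3,3,4,4,4,4,5,5,5,6,6,6,1,1,1,2,2,2,2)

# TRUE wherever a value differs from the one before it (first element counts as a change)
changed <- c(TRUE, x[-1] != x[-length(x)])
run_id <- cumsum(changed)

# Same run ids via rle: number the runs, then repeat each number by run length
r <- rle(x)
run_id_rle <- rep(seq_along(r$values), r$lengths)

identical(run_id, run_id_rle)
```

If the variable is stored as a factor, as in the question, one option is to convert it first with as.integer(as.character(x)).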

Create new column and carry forward value from previous group to next

I am trying to carry forward the value from the previous group to the next group. I tried to solve it using rleid but could not get the desired result.
df <- data.frame(signal = c(1,1,5,5,5,2,3,3,3,4,4,5,5,5,5,6,7,7,8,9,9,9,10),
desired_outcome = c(NA, NA, 1, 1, 1, 5, 2, 2, 2, 3, 3, 4, 4,4,4,5,6,6,7,8,8,8,9))
# outcome column has the expected result -
signal desired_outcome
1 1 NA
2 1 NA
3 5 1
4 5 1
5 5 1
6 2 5
7 3 2
8 3 2
9 3 2
10 4 3
11 4 3
12 5 4
13 5 4
14 5 4
15 5 4
16 6 5
17 7 6
18 7 6
19 8 7
20 9 8
21 9 8
22 9 8
23 10 9
rle gives the lengths and values of the runs in which the same value occurs. Then: drop the last value, shift the remaining values one position over by adding an NA at the beginning (which also makes up for the dropped value), and repeat each value as given by lengths (i.e. the lengths of the runs of identical values in the original vector).
with(rle(df$signal), rep(c(NA, head(values, -1)), lengths))
# [1] NA NA 1 1 1 5 2 2 2 3 3 4 4 4 4 5 6 6 7 8 8 8 9
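The one-liner can be unpacked into its individual steps, which may make the shift easier to follow. The intermediate names runs and prev_values are only illustrative.

```r
signal <- c(1,1,5,5,5,2,3,3,3,4,4,5,5,5,5,6,7,7,8,9,9,9,10)

runs <- rle(signal)                           # one entry per run of identical values
runs$values                                   # 1 5 2 3 4 5 6 7 8 9 10
runs$lengths                                  # 2 3 1 3 2 4 1 2 1 3 1

prev_values <- c(NA, head(runs$values, -1))   # value of the run before each run
out <- rep(prev_values, runs$lengths)         # expand back to the original length
out
```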
Another way could be to first lag signal, then use rleid to create groups, and use mutate to broadcast the first value of each group to all the rows in that group.
library(dplyr)
df %>%
mutate(out = lag(signal)) %>%
group_by(group = data.table::rleid(signal)) %>%
mutate(out = first(out)) %>%
ungroup() %>%
select(-group)
# A tibble: 23 x 2
# signal out
# <dbl> <dbl>
# 1 1 NA
# 2 1 NA
# 3 5 1
# 4 5 1
# 5 5 1
# 6 2 5
# 7 3 2
# 8 3 2
# 9 3 2
#10 4 3
# … with 13 more rows

Compare 2 values of the same row of a matrix with the row and column index of another matrix in R

I have a matrix1 with 11217 rows and 2 columns, and a second matrix2 with 10 rows and 10 columns. I want to treat the two values in each row of matrix1 as a row index and a column index into matrix2, and increase the value at that position of matrix2 (currently 0) by 1.
c1 <- x[2:11218] #these values go from 1 to 10
#second column from index 3 to N
c2 <- x[3:11219] #these values also go from 1 to 10
#matrix with column c1 and c2
m1 <- as.matrix(cbind(c1 = c1, c2 = c2))
#empty matrix which will count the frequencies
m2 <- matrix(0, nrow = 10, ncol = 10)
#change row and column names of m2 to the numbers of 1 to 10
dimnames(m2) <-list(c(1:10), c(1:10))
#go through every row of the matrix m1 and look which rotation appears, add 1 to m2 if the rotation
#equals the corresponding index
r <- c(1:10)
c <- c(1:10)
for (i in 1:nrow(m1)) {
if(m1[i,1] == r & m1[i,2] == c)
m2[r,c]+1
}
No frequencies were calculated, and I don't understand why.
It appears that you are trying to replicate the behavior of table. I'd recommend just using it instead.
Simpler data (it appears you did not include variable x):
m1 <-
matrix(round(runif(20, 1,10))
, ncol = 2)
Then, use table. Here, I am setting the values of each column to be a factor to ensure that the right columns are generated:
table(factor(m1[,1], 1:10)
, factor(m1[,2], 1:10))
gives:
1 2 3 4 5 6 7 8 9 10
1 3 4 0 4 2 0 5 3 2 0
2 3 7 9 7 4 5 3 4 5 2
3 4 6 3 10 8 9 4 2 7 3
4 5 2 14 3 7 13 8 11 3 3
5 2 13 2 5 8 5 7 7 8 6
6 1 10 7 4 5 6 8 5 8 5
7 3 3 6 5 4 5 4 8 7 7
8 5 5 8 7 6 10 5 4 3 4
9 2 5 8 4 7 4 4 6 4 2
10 3 1 2 3 3 5 3 5 1 0
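As for why the original loop counted nothing: m2[r,c]+1 computes a value but never assigns it back to m2, and the if condition compares a single value against the length-10 vectors r and c rather than against one index. A minimal sketch of a corrected loop, using a small made-up m1 since x was not provided, could look like this; it gives the same counts as table.

```r
# Made-up stand-in for m1 (the original x was not included in the question)
m1 <- cbind(c1 = c(1, 2, 2, 10), c2 = c(2, 3, 3, 1))

m2 <- matrix(0, nrow = 10, ncol = 10,
             dimnames = list(as.character(1:10), as.character(1:10)))

for (i in seq_len(nrow(m1))) {
  from <- m1[i, 1]                        # row index into m2
  to   <- m1[i, 2]                        # column index into m2
  m2[from, to] <- m2[from, to] + 1        # assign the incremented count back
}

# Same counts via table, for comparison
tab <- table(factor(m1[, 1], 1:10), factor(m1[, 2], 1:10))
all(m2 == unclass(tab))
```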

R Vectorization: How to return the index of the first element of each row in a matrix that meets a condition and sum all elements until that index?

I am looking for a vectorized solution. Say I generate 100 samples of 10 draws with replacement. Next, for each row of a matrix of cumulative sums, I want to find the index of the first element that meets some condition, say, >=10. Then, I want to sum all elements of each row up to that index. MWE:
set <- c(1, 5, 7, 13, 15, 17)
samp <- matrix(sample(set, size = 100*10, replace = TRUE), nrow = 100) # generate 100 samples of 10 draws
b <- matrix(apply(samp, 1, cumsum),
nrow = 100, byrow=TRUE) >= 10 # compare each element with 10, return boolean
I'm not sure how to use apply with which(x)=="TRUE". I tried a few variations but I'm not sure how to code it correctly.
After I get that, I'll be able to use apply(b, 1, min) to return the first element (minimum index) for each row that is >=10.
Please set a seed for "random" examples:
set.seed(111)
samp <- matrix(sample(1:5, size = 1000, replace = TRUE), nrow = 100)
(answer1 <- samp[which(apply(samp,1,function(x)sum(x)>30)),1])
# [1] 4 3 3 3 1 1 3 5 2 4 2 5 4 2 4 1 3 2 4 4 5 4 2 4 5 5 4 5 3 3 1 1 2 1 4 3 4 5
#[39] 1 5 1 4 4 3 3 2 5 5
Explanation:
apply(samp, 1, function(x) sum(x) > 30) applies this function to each row of samp.
(If you add 10 positive integers, >=10 will ALWAYS be true, hence the > 30 threshold here.)
which(x) returns the indices of all TRUE values of x (the rows of interest).
samp[(rows returned by which), 1] is basic indexing: those rows, first column.
Unwrap the expression step by step from the outside in for a better understanding.
For the first index in each row where the cumulative sum is >=10:
b <- matrix(apply(samp, 1, cumsum), nrow=100, byrow=T)>=10
apply(b,1,function(x)which(x)[1])
# [1] 4 5 4 3 3 5 3 4 3 4 3 3 5 4 5 4 2 4 3 6 3 3 5 4 3 3 2 4 4 6 3 4 3 4 5 4 4
# [38] 4 3 5 3 6 3 3 5 5 3 3 4 6 4 5 4 4 3 4 4 4 2 5 3 4 3 4 4 3 4 6 3 5 4 4 4 4
# [75] 3 3 5 4 4 3 3 4 4 5 4 4 4 3 4 3 5 4 3 5 3 6 4 5 5 3
We could use rowCumsums from library(matrixStats)
library(matrixStats)
apply(rowCumsums(samp)>=10, 1, which.max)
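The second half of the question, summing each row up to that first index, is not covered above. One possible sketch in base R: since the sum of a row up to index k is simply the cumulative sum at k, it can be read straight out of the cumulative-sum matrix with matrix indexing. The names cs, first_idx and sum_to_first are illustrative, and the code assumes every row reaches the threshold (which.max returns 1 for a row with no TRUE).

```r
set.seed(111)
set <- c(1, 5, 7, 13, 15, 17)
samp <- matrix(sample(set, size = 100 * 10, replace = TRUE), nrow = 100)

cs <- t(apply(samp, 1, cumsum))               # 100 x 10 matrix of row-wise cumulative sums
first_idx <- apply(cs >= 10, 1, which.max)    # first column per row where cumsum >= 10

# Pick cs[i, first_idx[i]] for every row i via a two-column index matrix
sum_to_first <- cs[cbind(seq_len(nrow(cs)), first_idx)]
head(cbind(first_idx, sum_to_first))
```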
