How is it possible to transform the following vector:
x <- c(0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 5, 0, 0, 0, 8)
into the desired form:
y <- c(1, 1, 1, 1, 3, 3, 2, 5, 5, 5, 5, 8, 8, 8, 8)
Any idea would be highly appreciated.
Here's another approach using only base R:
idx <- x != 0
split(x, cumsum(idx) - idx) <- x[idx]
The x-vector is now:
x
#[1] 1 1 1 1 3 3 2 5 5 5 5 8 8 8 8
You can use zoo's na.locf() function: replace the zeros with NA, then carry the next observation backward with fromLast = TRUE:
zoo::na.locf(replace(x, x==0, NA), fromLast = TRUE)
#[1] 1 1 1 1 3 3 2 5 5 5 5 8 8 8 8
Using rle, you can do the following in base R.
tmp <- rle(x)
tmp$values[which(tmp$values == 0)] <- tmp$values[which(tmp$values == 0) + 1L]
inverse.rle(tmp)
#[1] 1 1 1 1 3 3 2 5 5 5 5 8 8 8 8
Note that this assumes the final value of x is not 0. If x does end in 0, use head(which(tmp$values == 0), -1) in place of which(tmp$values == 0) in both places, so that the final run of zeros is skipped and left unchanged.
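The rle() approach above needs special handling when x ends in 0. As a hedged base R sketch that leaves trailing zeros untouched, the helper below (fill_next_nonzero is a hypothetical name, not from any package) uses findInterval() to look up the next non-zero position for every element. It assumes x contains no NA values.

```r
# Hypothetical helper: replace each 0 with the next non-zero value.
# Zeros after the last non-zero value are left as they are.
fill_next_nonzero <- function(x) {
  idx <- which(x != 0)                 # sorted positions of non-zero values
  if (length(idx) == 0L) return(x)     # nothing to fill from
  # The count of non-zero positions strictly before i, plus 1, is the rank
  # of the first non-zero position at or after i (NA if there is none).
  nxt <- idx[findInterval(seq_along(x) - 1L, idx) + 1L]
  out <- x[nxt]
  trailing <- is.na(nxt)               # no non-zero value follows
  out[trailing] <- x[trailing]
  out
}

x <- c(0, 0, 0, 1, 0, 3, 2, 0, 0, 0, 5, 0, 0, 0, 8)
fill_next_nonzero(x)
#[1] 1 1 1 1 3 3 2 5 5 5 5 8 8 8 8
fill_next_nonzero(c(0, 2, 0))
#[1] 2 2 0
```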
I was asked to create a table with three columns, A, B and C and eight rows. Column A must go 1, 1, 1, 1, 2, 2, 2, 2. Column B must alternate 1, 2, 1, 2, 1, 2, 1, 2. And column C must go 1, 1, 2, 2, 1, 1, 2, 2. I am able to produce the A column data fine, but don't know how to get B or C. This is the code I have so far:
dataSheet <- matrix(nrow = 0, ncol = 3)
colnames(dataSheet) <- c('A', 'B', 'C')
A <- 1
B <- 1
C <- 1
for (A in 1:4){
A=1
dataSheet <- rbind(dataSheet, c(A, B, C))
}
for (A in 5:8){
A=2
dataSheet <- rbind(dataSheet, c(A, B, C))
}
This is a good excuse to get familiar with the rep() function: it handles this question easily, and many more complicated ones too:
dt <- data.frame(A = rep(1:2, each = 4),
B = rep(1:2, times = 4),
C = rep(1:2, each = 2))
dt
#> A B C
#> 1 1 1 1
#> 2 1 2 1
#> 3 1 1 2
#> 4 1 2 2
#> 5 2 1 1
#> 6 2 2 1
#> 7 2 1 2
#> 8 2 2 2
Created on 2019-01-26 by the reprex package (v0.2.1)
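Since the three columns together list every combination of two levels, another option is expand.grid() from base R. This is only a sketch of an alternative, not part of the answer above. expand.grid() varies its first argument fastest, so the columns are given from fastest-changing (B) to slowest-changing (A) and then reordered.

```r
# Sketch: all combinations of three two-level columns.
# expand.grid() varies the first argument fastest, the last one slowest.
grid <- expand.grid(B = 1:2, C = 1:2, A = 1:2)
grid <- grid[, c("A", "B", "C")]   # put the columns back in A, B, C order
grid
```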
Simply use R's vectorization for this task, i.e.
A <- c(1, 1, 1, 1, 2, 2, 2, 2)
B <- c(1, 2, 1, 2, 1, 2, 1, 2) # or rep(1:2, 4)
C <- c(1, 1, 2, 2, 1, 1, 2, 2)
cbind(A,B,C)
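Maybe something along the lines of the following will be acceptable to your professor.
# start again from an empty three-column matrix
dataSheet <- matrix(nrow = 0, ncol = 3, dimnames = list(NULL, c('A', 'B', 'C')))
for (i in 1:8){
  A <- if(i <= 4) 1 else 2
  B <- if(i %% 2) 1 else 2
  # C is 1 for rows 1, 2, 5, 6 (i %% 4 is 1 or 2) and 2 otherwise
  C <- if((i %% 4) %in% c(1, 2)) 1 else 2
  dataSheet <- rbind(dataSheet, c(A, B, C))
}
dataSheet
#     A B C
#[1,] 1 1 1
#[2,] 1 2 1
#[3,] 1 1 2
#[4,] 1 2 2
#[5,] 2 1 1
#[6,] 2 2 1
#[7,] 2 1 2
#[8,] 2 2 2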
How can I remove only one element from a vector in R? For example,
x = c(1, 2, 0, 3, 1, 4, 2, 0)
I want to delete only one of the zeros, randomly. Then
x = c(1, 2, 0, 3, 1, 4, 2)
or
x = c(1, 2, 3, 1, 4, 2, 0)
To randomly choose which zero gets removed, you can use
x[-sample(which(x == 0), 1)]
Obviously the above will only work if there is at least one zero in x. As a safeguard, you can use an if() statement.
if(length(w <- which(x == 0))) x[-sample(w, 1)] else x
# [1] 1 2 0 3 1 4 2
if(length(w <- which(x == 0))) x[-sample(w, 1)] else x
# [1] 1 2 3 1 4 2 0
Searching for 11, where there are none, we get the entire vector x back.
if(length(w <- which(x == 11))) x[-sample(w, 1)] else x
# [1] 1 2 0 3 1 4 2 0
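One caveat with sample(w, 1): when w has length one and its value is greater than 1, sample() treats it as the upper bound of 1:w rather than as the only candidate, so a different element may be removed. A minimal sketch that avoids this by sampling an index into w instead (remove_one is a hypothetical helper name, not an existing function):

```r
# Hypothetical helper: drop one randomly chosen occurrence of `value` from x.
remove_one <- function(x, value) {
  w <- which(x == value)                 # positions of all matches
  if (length(w) == 0L) return(x)         # no match: return x unchanged
  drop <- w[sample.int(length(w), 1L)]   # pick one position; safe when length(w) == 1
  x[-drop]
}

x <- c(1, 2, 0, 3, 1, 4, 2, 0)
remove_one(x, 0)                 # one of the two zeros is removed
remove_one(x, 11)                # x comes back unchanged
remove_one(c(1, 2, 3, 0, 5), 0)  # the single zero is removed
```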
Given n, generate a sequence like this:
0, 0, 1, 0, 1, 2, ........, 0, 1, 2, 3, 4, 5, 6, ....n
Let's say n=3, then the sequence should be:
0, 0, 1, 0, 1, 2, 0, 1, 2, 3
I've tried using rep, but it only generates a fixed length, whereas I need the sequence length to increase each time.
You can use a simple Map with an unlist to get the result you want
n <- 3
unlist(Map(seq, from=0, to=0:n))
# [1] 0 0 1 0 1 2 0 1 2 3
From this answer
n <- 3
sequence(0:(n+1))-1
# [1] 0 0 1 0 1 2 0 1 2 3
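The same idea can be spelled out with seq_len(), building block k as 0, 1, ..., k for k = 0, ..., n. This is only a sketch, and zero_based_runs is a hypothetical name:

```r
# Hypothetical helper: concatenate the blocks 0..k for k = 0, 1, ..., n.
zero_based_runs <- function(n) {
  unlist(lapply(0:n, function(k) seq_len(k + 1L) - 1L))
}

zero_based_runs(3)
# [1] 0 0 1 0 1 2 0 1 2 3
```

The result has (n + 1) * (n + 2) / 2 elements, which is why a fixed-length rep() call cannot produce it directly.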
Thank you for your time. I have the following data (snippet). It comes from longitudinal data on work status, reshaped into a wide-format file: each column represents one month, each row an individual.
Code:
j1992_12 = c(1, 10, 1, 7, 1, 1)
j1993_01 = c( 1, 1, 1, NA, 3, 1)
j1993_02 = c( 1, 1, 1, NA, 3, 1)
j1993_03 = c( 1, 8, 1, NA, 3, 1)
j1993_04 = c( 1, 8, 1, NA, 3, 1)
j1993_05 = c( 1, 8, 1, NA, 3, 1)
j1993_06 = c( 1, 8, 1, NA, 3, 1)
j1993_07 = c( 1, 8, 1, NA, 3, 1)
j1993_08 = c( 1, 8, 1, NA, 3, 1)
j1993_09 = c( 1, 8, 1, NA, 3, 1)
j1993_10 = c( 1, 8, 1, NA, 3, 1)
j1993_11 = c( 1, 8, 1, NA, 3, 1)
j1993_12 = c( 1, 8, 1, NA, 3, 1)
j1994_01 = c( 1, 8, 1, 7, 3, 1)
DF93= data.frame(j1992_12, j1993_01, j1993_02, j1993_03, j1993_04, j1993_05, j1993_06, j1993_07, j1993_08, j1993_09, j1993_10, j1993_11, j1993_12, j1994_01)
Output:
   j1992_12 j1993_01 j1993_02 j1993_03 j1993_04 j1993_05 j1993_06 j1993_07 j1993_08 j1993_09 j1993_10 j1993_11 j1993_12 j1994_01
R1        1        1        1        1        1        1        1        1        1        1        1        1        1        1
R2       10        1        1        8        8        8        8        8        8        8        8        8        8        8
R3        1        1        1        1        1        1        1        1        1        1        1        1        1        1
R4        7       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       NA        7
R5        1        3        3        3        3        3        3        3        3        3        3        3        3        3
R6        1        1        1        1        1        1        1        1        1        1        1        1        1        1
I want to check for occurrences of 12 straight months of NA, as in row R4. I would then like to check whether the last value of the year before (j1992_12) is the same as the first value of the year that follows (j1994_01). If yes, I assume there was no change in work status, and all 12 months should get the value given in the last month of the year before. If not, everything should stay untouched.
Method so far:
DF93_2 = DF93
DF93_2[,2:13] <- ifelse (is.na( DF93[,2:13]) && (DF93[,1]==DF93[,14]), DF93[,1] , DF93[,2:13])
I now see that if I try it with just a single column, as in the code below, it replaces the whole column. How can I get R to replace row-wise?
DF93_2[,2] <- ifelse (is.na( DF93[,2:13]) && (DF93[,1]==DF93[,14]), DF93[,1] , DF93[,2])
If someone could please give me a hint where the flaw in my understanding of R is, I would be very grateful.
EDIT: Only the original file is longitudinal; the format is now WIDE, which is what I need for a time series analysis. It has already been cross-checked with survey data of all years (18 years, from 1992 to 2010), so I would rather not transform it back into long format. I am looking for a solution with conditions as described above that I can adjust as the condition changes.
After further testing, I think the problem lies within the search for 12 subsequent NA in a row. I just cannot find a solution to that. If you have any idea, please share. Thank you!
EWAZ99_2[,15:26] <- ifelse ( is.na( EWAZ99[,15:26]) & (EWAZ99[,14]==EWAZ99[,27]), EWAZ99[,14] , EWAZ99[,15:26])
I think this is what you are looking for.
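Not sure if I understood you right, but does something like this help?
naAction <- function(x) {
  if (any(is.na(x))) {
    # isTRUE() guards against an NA in the first or last month
    if (isTRUE(x[[1]] == x[[length(x)]])) {
      x[is.na(x)] <- x[1]
    }
  }
  x
}
# MARGIN = 1 applies the function to each row (individual); t() restores the layout
t(apply(DF93, 1, naAction))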
Here's one way:
as.data.frame(t(apply(DF93, 1, function(x)
if(x[1] == tail(x, 1) && all(is.na(head(x, -1)[-1])))
replace(x, is.na(x), x[1]) else x)))
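To make the row-wise condition easier to adjust, the same logic can be pulled into a named function. This is a sketch under assumptions: fill_gap_year and its gap argument are hypothetical names, the toy data below are made up, and the gap columns are assumed to sit directly between the "before" and "after" columns.

```r
# Hypothetical helper: if every month in `gap` is NA and the months just
# before and just after the gap hold the same value, fill the gap with it.
fill_gap_year <- function(row, gap = 2:13) {
  before <- row[[min(gap) - 1L]]
  after  <- row[[max(gap) + 1L]]
  # isTRUE() keeps the row untouched when either neighbour is NA
  if (all(is.na(row[gap])) && isTRUE(before == after)) {
    row[gap] <- before
  }
  row
}

# Toy data (made up): one row per individual, 14 monthly columns
months <- c("j1992_12", sprintf("j1993_%02d", 1:12), "j1994_01")
toy <- rbind(
  c(7, rep(NA, 12), 7),   # matching neighbours: gap is filled with 7
  c(7, rep(NA, 12), 3),   # differing neighbours: gap stays NA
  c(1, rep(3, 13))        # no gap: untouched
)
colnames(toy) <- months
toy <- as.data.frame(toy)

# apply() over rows returns the rows as columns, hence t()
filled <- as.data.frame(t(apply(toy, 1, fill_gap_year)))
filled
```

For other years, only the gap argument would need to change, for example gap = 15:26.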
I would like to create a new variable, Number, which sequentially generates numbers within each groupID, starting when a particular condition is met (in this case, when Percent >= 5).
groupID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c( 3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)
Number <- ifelse (Percent < 5, 0, 1:4)
I get:
> Number
[1] 0 0 3 4 0 0 3 4 0 2 3 4
But I'd like:
0 0 1 2 0 0 1 2 0 1 2 3
I did not include groupID variable within the ifelse statement and used 1:4 instead, as there are always 4 rows within each groupID.
Any suggestions or clues? Thank you!
ave(Percent, groupID, FUN=function(x) cumsum(x>=5))
[1] 0 0 1 2 0 0 1 2 0 1 2 3
To the example in the comments below, this is my alternate logical test to be cumsum()-ed:
ave(Percent, groupID, FUN=function(x) cumsum(seq_along(x)>= which(x >=5)[1]) )
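Note that which(x >= 5)[1] is NA for a group in which no value reaches 5, so the expression above returns NA for that whole group. A hedged alternative sketch with the same intent (count every row from the first Percent >= 5 onward) applies cummax() to a 0/1 indicator. The data below are made up for illustration.

```r
groupID <- rep(1:3, each = 4)
Percent <- c(3, 5, 4, 10,  2, 1, 6, 8,  1, 2, 3, 4)   # group 3 never reaches 5

hit <- as.numeric(Percent >= 5)   # 1 where the condition holds, else 0
# cummax() switches to 1 at the first hit and stays there;
# cumsum() then numbers the rows from that point on
Number <- ave(hit, groupID, FUN = function(h) cumsum(cummax(h)))
Number
# [1] 0 1 2 3 0 0 1 2 0 0 0 0
```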
It's ugly and throws warnings, but it gets you what you want:
ave(Percent,groupID,FUN=function(x) {x[x<5] <- 0; x[x>=5] <- 1:4; x} )
#[1] 0 0 1 2 0 0 1 2 0 1 2 3
@BondedDust's answer below using cumsum is almost certainly more appropriate though.
If your data was not always in ascending order in each group, you could also replace all the >=5 values like:
Percent <- c( 3, 5, 4, 10, 2, 1, 6, 8, 4, 8, 10, 11)
ave(Percent, list(groupID,Percent>=5), FUN=function(x) cumsum(x>=5))
#[1] 0 1 0 2 0 0 1 2 0 1 2 3
Try this:
ID <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
Percent <- c( 3, 4, 5, 10, 2, 1, 6, 8, 4, 8, 10, 11)
Number <- Percent >= 5
result = lapply(seq_along(Number), function(i){
if( length(which(! Number[1:i]) ) == 0){start = 1}
else {start =max(which(! Number[1:i]) )}
sum( Number[start : i])
})
> unlist(result)
[1] 0 0 1 2 0 0 1 2 0 1 2 3