R: Replace values between two numbers with the number - r

Here is the dataframe
sampledf = data.frame(timeinterval = c(1:120), hour = c(rep(NA, times = 85), 1, rep(NA, times = 5), 1, rep(NA, times = 4),1, rep(NA, times = 4), 1, rep(NA, times = 18)))
I want to replace the NAs in column hour such that values between 86th row and 92 (inclusive) and then between 97 and 102 (inclusive) should all be 1.
Here is what I've tried so far:
1. Getting the list of rownames with value 1 in hour column
2. Looping through (This is what is not working!)
ones = which(sampledf$hour == 1)
n = (length(ones)+1)/2
chunk <- function(ones,n) split(ones, cut(seq_along(ones), n, labels = FALSE))
y = chunk(ones,n)
for (i in y) {
sampledf$Hour[c(y$i[1]:y$i[2])] == 1
}
Help me out, I'm new to R.
In Python, pandas has the ffill method for this; what is the equivalent here?
Thanks!

library(dplyr) # between() is provided by dplyr (data.table has one too)
sampledf$hour[between(sampledf$timeinterval, 86, 92) | between(sampledf$timeinterval, 97, 102)] <- 1
Basically, you subset sampledf's hour column to the cases where timeinterval is between 86 and 92 or (|) between 97 and 102, inclusive, and assign 1 to all of those cases.

If you want to assign 1 to all timeintervals in the given ranges:
sampledf$hour[sampledf$timeinterval %in% c(86:92,97:102)] <- 1
If you want to assign 1 to cases based on the rownumbers of your data:
sampledf$hour[c(86:92,97:102)] <- 1
If you want to replace the 1s with a cumulative sum, as in your comment, you can just use the cumsum() function:
sampledf$hour[which(sampledf$hour == 1)] <- cumsum(sampledf$hour[which(sampledf$hour == 1)])
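The snippets above hard-code the row ranges. If the ranges should instead be derived from the existing 1s, one possible sketch pairs the markers consecutively (1st with 2nd, 3rd with 4th) and fills each pair. It assumes the number of 1s is even; the names starts and ends are made up for illustration.

```r
sampledf <- data.frame(
  timeinterval = 1:120,
  hour = c(rep(NA, 85), 1, rep(NA, 5), 1, rep(NA, 4), 1, rep(NA, 4), 1, rep(NA, 18))
)

ones <- which(sampledf$hour == 1)             # positions of the existing 1s; which() drops NAs
starts <- ones[seq(1, length(ones), by = 2)]  # 1st, 3rd, ... marker opens a range
ends <- ones[seq(2, length(ones), by = 2)]    # 2nd, 4th, ... marker closes it

for (k in seq_along(starts)) {
  # assignment uses <-, not ==, and the column name is lower-case hour
  sampledf$hour[starts[k]:ends[k]] <- 1
}
```

For a plain forward fill, tidyr::fill() and zoo::na.locf() are the usual counterparts of pandas' ffill, but a forward fill alone would also fill rows 93 to 96 and everything after row 102 here.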

Related

Loop only running for the last iteration in R - Looping over participants

I am very new to R and I am trying to run a loop, so any help is greatly appreciated.
I have longitudinal data with multiple timepoints for each participant, which looks like the attached picture.
I need to replace the NA values with the values from when the Years variable is equal to 0, and I want to write a loop to do this for each participant. I have written some code which seems to work, however it only gives output for the last iteration of the loop (the last participant). This is the code I am using:
x <- c(1:4)
n = length(x)
for(i in 1:n)
{
data <- subset(df, ID %in% c(x[i]))
data$outcome <- ifelse(is.na(data$outcome),
data[1,3],
data$outcome)
}
Using this code, the output gives only the last iteration (i.e. in this case, ID 4). I need to complete this for all IDs.
Any help is much appreciated! Thank you.
I'm not 100% clear on your intent, but this will, within each ID, fill all missing outcome values with the (first) outcome value from a row where Years == 0.
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(outcome = coalesce(outcome, first(outcome[Years == 0])))
Obviously untested, but if you provide some sample data I'll happily help debug.
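Since no sample data was posted, here is a sketch of the same idea on a small made-up data frame. The column names ID, Years and outcome follow the question; the values are invented for illustration.

```r
library(dplyr)

# Made-up data: 4 participants, 3 timepoints each
df <- data.frame(
  ID = rep(1:4, each = 3),
  Years = rep(c(0, 1.5, 3), times = 4),
  outcome = c(10, NA, NA, 20, NA, 21, 30, NA, NA, 40, 41, NA)
)

filled <- df %>%
  group_by(ID) %>%
  # within each ID, replace NA with the outcome recorded at Years == 0
  mutate(outcome = coalesce(outcome, first(outcome[Years == 0]))) %>%
  ungroup()
```

If a participant has no row with Years == 0, first() returns NA and the missing values stay missing.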
Your loop overwrites data (and with it data$outcome) on each iteration. That is why you only get the last result.
Here's my inelegant solution:
Making sample data to match yours (not including unused column)
my_dat <- data.frame("years" = sample(c(0, 1.5, 3), 30, replace = T),
"outcome" = as.numeric(sample(c("", 1, 2), 30, replace = T)))
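Find which rows have both years equal to 0 and a missing outcome (combine the two conditions with &; writing == 0 * is.na(...) multiplies first and ends up testing only years == 0):
my_index <- my_dat$years == 0 & is.na(my_dat$outcome)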
Assign 0 to replace NA:
my_dat$outcome[my_index] <- 0
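To keep the loop from the question but stop it from overwriting its own result, one option is to store each participant's piece in a list and bind the pieces at the end. This is a sketch on made-up data; pieces and baseline are illustrative names, and the column names are taken from the question.

```r
# Made-up data: 4 participants, 3 timepoints each
df <- data.frame(
  ID = rep(1:4, each = 3),
  Years = rep(c(0, 1.5, 3), times = 4),
  outcome = c(10, NA, NA, 20, NA, 21, 30, NA, NA, 40, 41, NA)
)

ids <- unique(df$ID)
pieces <- vector("list", length(ids))    # one slot per participant

for (i in seq_along(ids)) {
  d <- df[df$ID == ids[i], ]
  baseline <- d$outcome[d$Years == 0][1] # value at Years == 0 for this participant
  d$outcome[is.na(d$outcome)] <- baseline
  pieces[[i]] <- d                       # keep this iteration's result
}

result <- do.call(rbind, pieces)         # all participants, not just the last one
```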
A simpler tidyverse method:
library(tidyverse)
df %>%
filter(ID %in% x) %>%
mutate(outcome = ifelse(is.na(outcome), Years, outcome))
Your question could do with some clarification and a reproducible example. As I understand it from "I need to replace the NA values with the values from when the Years variable is equal to 0": if outcome is NA and Years equals 0, you want outcome to equal 0?
set.seed(1984) # set the seed so that my_dat is the same each time
# using a modified df from markhogue's answer...
my_dat <- data.frame(
ID = 1:30,
years = sample(c(0, 1.5, 3), 30, replace = T),
outcome = as.numeric(sample(c("", 1, 2), 30, replace = T))
)
my_dat # have a look at rows 9 and 22
# ifelse with two conditions: years == 0 and is.na(outcome)
my_dat$outcome <- ifelse(my_dat$years == 0 & is.na(my_dat$outcome), my_dat$years, my_dat$outcome)
my_dat # have a look at rows 9 and 22
Let me know if this is what you need :)

Create a function that replace values where n <5 with a random number between 1 and 4 (integer)

I am quite new to R and have run into a problem I apparently can't solve by myself. It should be fairly easy, though.
I aim to write a generic function that manipulates column n in dataframe df. I want it to perform a simple task: for each row where n < 5, it should replace that value with a random integer between 1 and 4.
df <- data.frame(n= 1:10, y = letters[1:10],
stringsAsFactors = FALSE)
What is the most elegant solution?
One way is to create a logical index based on the column, subset the column with that index, and assign the sampled values:
f1 <- function(dat, col) {
i1 <- dat[[col]] < 5
dat[[col]][i1] <- sample(1:4, sum(i1), replace = TRUE)
dat
}
f1(df, "n")
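The same indexing idea can be generalized so that the threshold and the replacement pool are arguments. This is only a sketch: replace_below, threshold and pool are made-up names, and set.seed() is added solely to make the random draw reproducible.

```r
# Hypothetical generalization of f1: replace values below `threshold`
# in column `col` with random draws from `pool`
replace_below <- function(dat, col, threshold = 5, pool = 1:4) {
  idx <- dat[[col]] < threshold                  # logical index of rows to replace
  # index into pool rather than calling sample(pool, ...), which would
  # sample from 1:pool if pool had length 1
  dat[[col]][idx] <- pool[sample.int(length(pool), sum(idx), replace = TRUE)]
  dat
}

df <- data.frame(n = 1:10, y = letters[1:10], stringsAsFactors = FALSE)

set.seed(42)                                     # reproducible draw
out <- replace_below(df, "n")
```

Rows with n >= 5 are left untouched, and the function returns a modified copy rather than changing df in place.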

generate variable based on first occurrence of a value

I have 5 repeat measures called pub1:pub5 each taking a value of 1 to 4. Each was measured at a different age age1:age5. That is, pub1 was measured at age1....pub5 at age5 etc.
I would like to create a new variable age_pb2 that shows the age at which a value of 2 first occurred in pub. For example, for individual x, age_pb2 will equal age3 if the first time a value of 2 is scored is in pub3.
I have tried modifying previous code but have not had much luck.
library(tidyverse)
#Example data
N <- 2000
data <- data.frame(id = 1:2000,age1 = rnorm(N,6:8),age2 = rnorm(N,7:9),age3 = rnorm(N,8:10),
age4 = rnorm(N,9:11),age5 = rnorm(N,10:12),pub1 = rnorm(N,1:2),pub2 = rnorm(N,1:2),
pub3 = rnorm(N,1:2),pub4 = rnorm(N,1:2),pub5 = rnorm(N,1:2))
data <- data %>% mutate_at(vars(starts_with("pub")), funs(round(replace(., .< 0, NA), 0)))
#New variable showing first age at getting a score of 2 (doesn't work)
i1 <- grepl('^pub', names(data)) # index for pub columns
i2 <- grepl('^age', names(data)) # index for age columns
data[paste0("age_pb2")] <- lapply(2, function(i) {
j1 <- max.col(data[i1] == i, 'first')
j2 <- rowSums(data[i1] == i) == 0
data[i2][cbind(seq_len(nrow(data)), j1 *(NA^j2))]
})
set.seed(1)
N <- 2000
data <- data.frame(id = 1:2000,age1 = rnorm(N,6:8),age2 = rnorm(N,7:9),age3 = rnorm(N,8:10),
age4 = rnorm(N,9:11),age5 = rnorm(N,10:12),pub1 = rnorm(N,1:2),pub2 = rnorm(N,1:2),
pub3 = rnorm(N,1:2),pub4 = rnorm(N,1:2),pub5 = rnorm(N,1:2)) %>%
mutate_at(vars(starts_with("pub")), funs(round(replace(., .< 0, NA), 0))) %>%
mutate(age_pb2 = eval(parse(text = paste0("age", which.min(apply(select(., starts_with("pub")), 2, function(x) which(x == 2)[1]))))))
How it works: apply() runs over the pub columns and, with which(x == 2)[1], returns the first row in each column that contains a 2. which.min then picks the column whose first match comes earliest, and that column index is pasted onto "age" and evaluated (using eval(parse(text = variable name))) to select the respective age column.
E.g. here after apply you get
[pub1 = 2, pub2 = 1, pub3 = 2, pub4 = 4, pub5 = 2]
which is the row of the first occurrence of 2 in each column. The earliest occurrence (which.min) is in the second pub column, so the index is 2. This index is pasted onto "age" and the result is evaluated with eval(parse()) inside mutate.
EDIT
It is probably more convenient to do it in a for loop for all age_pbi, or there is an easy solution in dplyr that I am not aware of.
for (i in 1:5) {
index <- which.min(apply(select(data, starts_with("pub")), 2, function(x) which(x == i)[1]))
data[ ,paste0("age_pb", i)] <- data[ ,paste0("age", index)]
}
Note however, that which.min takes the first minimum. E.g. pub1 and pub2 both have a 1 in the first row, so the above approach assigns age1 to age_pb1 whereas it could be age2 as well. I don't know what you want to do with this, so can't say what is a better option.
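Note also that the code above selects one age column for the whole data frame. If the goal is a separate age for each individual, as the question describes, a row-wise lookup is needed. The following is a sketch on a small made-up data frame, using max.col() in the same spirit as the attempt in the question; hit and first_col are illustrative names.

```r
# Small made-up example: 4 individuals, 3 measurement occasions
data <- data.frame(
  id = 1:4,
  age1 = c(6, 7, 8, 6), age2 = c(7, 8, 9, 7), age3 = c(8, 9, 10, 8),
  pub1 = c(1, 2, 1, NA), pub2 = c(2, 2, 1, 1), pub3 = c(2, 3, 1, 2)
)

pub <- as.matrix(data[grepl("^pub", names(data))])
age <- as.matrix(data[grepl("^age", names(data))])

hit <- !is.na(pub) & pub == 2                           # TRUE where a 2 was scored; NA is not a hit
first_col <- max.col(hit * 1, ties.method = "first")    # first TRUE column in each row
first_col[rowSums(hit) == 0] <- NA                      # individuals who never scored 2

# matrix indexing by (row, column) pairs; an NA column yields NA
data$age_pb2 <- age[cbind(seq_len(nrow(data)), first_col)]
```

Here individual 1 first scores 2 at pub2 and gets age2, individual 3 never scores 2 and gets NA.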

How to get counter incremented in apply loop

I'm trying to make a counter that counts each row of a data frame where column 1 equals "vsrv11" and column 3, which holds a date, falls in the year 2017.
So I did this code and the counter increments inside the if statement but for every iteration of the loop the counter becomes 0 again.
count <- 0
funcao.teste <- function (x) {
if (x[1] == "vsrv11" && substring(x[3],0,4) == "2017") {
count <<- count + 1
}
}
apply(vpnsessions, 1, funcao.teste, count)
Generally, I'd advise against using global variables and also, you could check this with simple filtering.
df <- data.frame(x = sample(c("vsrv11", rnorm(10)), 100, replace = TRUE),
y = rnorm(100),
z = as.character(sample(c(2017, 2018), 100, replace = TRUE)))
nrow(df[df[, 1] == "vsrv11" & grepl("2017", df[, 3]), ])
or just
sum(df[, 1] == "vsrv11" & grepl("2017", df[, 3]))
In the tidyverse you can perform such an operation using dplyr::count:
# Sample data
vpnsessions <- data.frame(
srv = "vsrv11",
id = c(rep("2017_abc", 10), rep("2018_def", 8)),
stringsAsFactors = F)
library(dplyr);
count(vpnsessions, year = substr(id, 1, 4))
## A tibble: 2 x 2
# year n
# <chr> <int>
#1 2017 10
#2 2018 8
Note how count counts the number of occurrences of ids. It's easy to extract relevant rows from the resulting data.frame/tibble.
To nitpick, in R indexing starts with 1 not with 0, so substring(..., 0, 4) from your code should be substring(..., 1, 4).
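Putting both conditions from the question together, the count can be computed without a loop or a global counter. This is a sketch on made-up data; the column names server, user and start are assumptions, and only the column positions (1 and 3) come from the question.

```r
# Made-up session data; column 1 is the server, column 3 a date string
vpnsessions <- data.frame(
  server = c("vsrv11", "vsrv11", "vsrv12", "vsrv11"),
  user = c("a", "b", "c", "d"),
  start = c("2017-03-01", "2018-01-15", "2017-07-09", "2017-12-31"),
  stringsAsFactors = FALSE
)

# TRUE counts as 1, so sum() of the combined condition is the row count
n_matches <- sum(vpnsessions[[1]] == "vsrv11" &
                   substr(vpnsessions[[3]], 1, 4) == "2017")
n_matches
```

Only the first and last rows satisfy both conditions, so n_matches is 2.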

How to find values less than -1 in each row for every 12 columns in R?

I have a matrix(100*120) and I am trying to find values <=-1 in each row for every 12 columns. I have tried several times but failed. It is easy to find values which are <= -1, but I do not know how to consider for every 12 columns and store the results for each row. Thanks for any help.
set.seed(100)
Mydata <- sample(x=-3:3,size = 100*120,replace = T)
Mydata <- matrix(data = Mydata,nrow = 100,ncol = 120)
results <- which(Mydata<=-1,arr.ind = T)
You can use the apply function to apply the which function across each column for each row at a time. If I misinterpreted what you wanted, you can adjust the MARGIN argument accordingly.
# MARGIN=1 to apply across rows
dd <- apply(Mydata,MARGIN=1,function(x) which(x <= -1))
dd[1] # which columns in row 1 have a value <= -1
You can do this using a combination of apply functions and seq()
#Example Data
set.seed(100)
Mydata <- sample(x=-3:3,size = 100*120,replace = T)
Mydata <- matrix(data = Mydata,nrow = 100,ncol = 120)
#Solution:
Myseq <- sapply(0:9,function(x) seq(1,12,1) + 12*x)
sapply(1:dim(Myseq)[2], function(x) which(Mydata[,Myseq[,x]] <= -1))
This results in a list with:
each subset of the list representing one of your 10 groups of 12 columns
each value under each subset representing the position, within that 100 x 12 sub-matrix, of any value in those 12 columns that is less than or equal to -1.
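If what is needed is one result per row for each group of 12 columns, a possible variant is to count the values <= -1 per row within each block. This is a sketch under that assumption; block and counts are illustrative names.

```r
set.seed(100)
Mydata <- matrix(sample(-3:3, size = 100 * 120, replace = TRUE),
                 nrow = 100, ncol = 120)

block <- rep(1:10, each = 12)   # block id (1..10) for each of the 120 columns

# For every block, count per row how many of its 12 values are <= -1.
# The result is a 100 x 10 matrix: rows are the original rows, columns the blocks.
counts <- sapply(1:10, function(b) rowSums(Mydata[, block == b] <= -1))

dim(counts)
```

Replacing rowSums(...) with apply(..., 1, function(r) which(r <= -1)) would return the matching column positions instead of counts.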
