How to remove sequences from longitudinal data based on a given condition?


I have a vector of consecutive states (you can only go from 3 to 4, from 4 to 5 etc., and there's no way back):
cons_states <- c(3,4,5,6)
I also have the following data:
from to status id
2 3 1 1
2 4 0 2
2 5 0 3
2 6 0 4
2 8 0 5
2 16 0 6
3 4 0 7
3 8 0 8
3 16 1 9
16 3 0 10
16 4 0 11
16 5 0 12
16 6 0 13
16 8 1 14
8 3 0 15
8 4 1 16
8 5 0 17
8 6 0 18
I would like my data to satisfy two conditions.
First, once a state has been visited there is no way back. For example, once state 3 has been visited (to = 3 and status = 1), it should no longer be possible to move into state 3 from any later state (there should be no further rows with to = 3):
from to status id
2 3 1 1
2 4 0 2
2 5 0 3
2 6 0 4
2 8 0 5
2 16 0 6
3 4 0 7
3 8 0 8
3 16 1 9
16 4 0 11
16 5 0 12
16 6 0 13
16 8 1 14
8 4 1 16
8 5 0 17
8 6 0 18
I managed to do it with the following (it's ugly, I realize, but it works):
ind <- data[data$status == 1, ]   # rows where a state was actually entered
res <- NULL
for (j in seq_len(nrow(ind))) {
  ind_to <- ind$to[j]
  ind_id <- ind$id[j]
  # ids of later rows that move back into the state entered in row j
  id_remove <- data$id[data$to == ind_to & data$id > ind_id]
  if (length(id_remove) == 0) next
  res <- c(res, id_remove)
}
This gives me a vector of the IDs that have to be removed for the data to satisfy the first condition.
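A vectorized sketch of the same first filter, assuming each state is entered at most once (at most one status == 1 row per to value, which is what "no way back" implies). match() looks up, for every row, the id of the row where its to state was actually entered, and rows coming after that entry are dropped. The data frame is rebuilt from the table above; the names visits, first_visit and data1 are only illustrative.

```r
# Data from the question
data <- data.frame(
  from   = c(2, 2, 2, 2, 2, 2, 3, 3, 3, 16, 16, 16, 16, 16, 8, 8, 8, 8),
  to     = c(3, 4, 5, 6, 8, 16, 4, 8, 16, 3, 4, 5, 6, 8, 3, 4, 5, 6),
  status = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0),
  id     = 1:18
)

# Rows where a state was actually entered
visits <- data[data$status == 1, c("to", "id")]

# For every row: id of the row where its "to" state was entered (NA if never entered).
# ASSUMPTION: each state is entered at most once, so match() finds the only entry.
first_visit <- visits$id[match(data$to, visits$to)]

# Keep rows whose target was never entered, or that come no later than that entry
data1 <- data[is.na(first_visit) | data$id <= first_visit, ]
data1$id  # ids 1-9, 11-14 and 16-18, matching the table above
```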
Second, when moving into a state that belongs to cons_states, only the next consecutive state that has not yet been visited should be allowed. If the state in "from" belongs to cons_states, this already holds in the data. Otherwise, the data also contain transitions into non-consecutive states, and those rows should be removed.
My desired output would be:
from to status id
2 3 1 1
2 8 0 5
2 16 0 6
3 4 0 7
3 8 0 8
3 16 1 9
16 4 0 11
16 8 1 14
8 4 1 16
I have spent a lot of time trying to figure this out, but I keep getting stuck writing complicated loops that don't work. Is there a reasonably simple way to do it?
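A minimal sketch of one way to apply both conditions in a single pass. It rests on an assumption the question does not spell out: consecutive rows (in id order) with the same from value are treated as the competing transitions out of one stay in that state, and a state counts as visited only once it has been entered (status == 1) in an earlier block. Under that reading it reproduces the desired output on the example data; filter_transitions() is a hypothetical helper, not an existing function.

```r
# Data from the question
cons_states <- c(3, 4, 5, 6)
data <- data.frame(
  from   = c(2, 2, 2, 2, 2, 2, 3, 3, 3, 16, 16, 16, 16, 16, 8, 8, 8, 8),
  to     = c(3, 4, 5, 6, 8, 16, 4, 8, 16, 3, 4, 5, 6, 8, 3, 4, 5, 6),
  status = c(1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0),
  id     = 1:18
)

# Hypothetical helper. ASSUMPTION: a run of rows sharing the same "from" value
# holds the alternative transitions out of one stay; states entered
# (status == 1) in earlier runs count as visited.
filter_transitions <- function(data, cons_states) {
  data <- data[order(data$id), ]
  block <- cumsum(c(TRUE, diff(data$from) != 0))  # one block per run of "from"
  visited <- numeric(0)
  keep <- logical(nrow(data))
  for (b in unique(block)) {
    rows <- which(block == b)
    targets <- data$to[rows]
    # Next consecutive state not visited yet (NA once all have been visited)
    remaining <- setdiff(cons_states, visited)
    next_cons <- if (length(remaining) > 0) remaining[1] else NA
    keep[rows] <- !(targets %in% visited) &            # condition 1: no way back
      (!(targets %in% cons_states) |                   # condition 2: other states are fine,
         (!is.na(next_cons) & targets == next_cons))   # but only the next cons state
    # States entered from this block are visited from the next block on
    visited <- union(visited, targets[data$status[rows] == 1])
  }
  data[keep, ]
}

filtered <- filter_transitions(data, cons_states)
filtered$id  # 1 5 6 7 8 9 11 14 16
```

The loop runs once per block rather than once per row, and both conditions are vectorized inside it, so the only state carried between iterations is the set of visited states.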

Related

Creating new column names using dplyr across and .names

I have the following data frame:
df <- data.frame(A_TR1=sample(10:20, 8, replace = TRUE),A_TR2=seq(2, 16, by=2), A_TR3=seq(1, 16, by=2),
B_TR1=seq(1, 16, by=2),B_TR2=seq(2, 16, by=2), B_TR3=seq(1, 16, by=2))
> df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3
1 11 2 1 1 2 1
2 12 4 3 3 4 3
3 18 6 5 5 6 5
4 11 8 7 7 8 7
5 17 10 9 9 10 9
6 17 12 11 11 12 11
7 14 14 13 13 14 13
8 11 16 15 15 16 15
What I would like to do is subtract B_TR1 from A_TR1, B_TR2 from A_TR2, and so on, and create new columns from these, similar to below:
df$x_TR1 <- (df$A_TR1 - df$B_TR1)
df$x_TR2 <- (df$A_TR2 - df$B_TR2)
df$x_TR3 <- (df$A_TR3 - df$B_TR3)
> df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1 12 2 1 1 2 1 11 0 0
2 11 4 3 3 4 3 8 0 0
3 19 6 5 5 6 5 14 0 0
4 13 8 7 7 8 7 6 0 0
5 12 10 9 9 10 9 3 0 0
6 16 12 11 11 12 11 5 0 0
7 16 14 13 13 14 13 3 0 0
8 18 16 15 15 16 15 3 0 0
I would like to name these columns "x TR1", "x TR2", etc. I tried to do the following:
xdf <- df%>%mutate(across(starts_with("A_TR"), -across(starts_with("B_TR")), .names="x TR{.col}"))
However, I get an error in mutate():
attempt to select less than one element in integerOneIndex
I also don't know how to create the proper column names, in terms of getting the numbers right -- I am not even sure the glue() syntax allows for it. Any help appreciated here.

We could use .names in the first across() to replace the substring 'A' with 'x' in the column names (.col), while subtracting the second set of columns:
library(dplyr)
library(stringr)
df <- df %>%
mutate(across(starts_with("A_TR"),
.names = "{str_replace(.col, 'A', 'x')}") -
across(starts_with("B_TR")))
Output:
df
A_TR1 A_TR2 A_TR3 B_TR1 B_TR2 B_TR3 x_TR1 x_TR2 x_TR3
1 10 2 1 1 2 1 9 0 0
2 10 4 3 3 4 3 7 0 0
3 16 6 5 5 6 5 11 0 0
4 12 8 7 7 8 7 5 0 0
5 20 10 9 9 10 9 11 0 0
6 19 12 11 11 12 11 8 0 0
7 17 14 13 13 14 13 4 0 0
8 14 16 15 15 16 15 -1 0 0
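Subtracting one across() result from another pairs the columns by position, so it relies on the A_ and B_ columns appearing in the same order. A base R sketch that pairs them by name instead (the TR suffix) could look like this; the data frame is rebuilt with the A_TR1 values printed in the question instead of sample(), so the result is reproducible, and a_cols, b_cols and x_cols are illustrative names.

```r
# Reproducible version of the question's data (A_TR1 taken from the printed df)
df <- data.frame(A_TR1 = c(11, 12, 18, 11, 17, 17, 14, 11),
                 A_TR2 = seq(2, 16, by = 2), A_TR3 = seq(1, 16, by = 2),
                 B_TR1 = seq(1, 16, by = 2), B_TR2 = seq(2, 16, by = 2),
                 B_TR3 = seq(1, 16, by = 2))

a_cols <- grep("^A_TR", names(df), value = TRUE)  # "A_TR1" "A_TR2" "A_TR3"
b_cols <- sub("^A", "B", a_cols)                  # the matching B_ column for each A_ column
x_cols <- sub("^A", "x", a_cols)                  # new names: "x_TR1" "x_TR2" "x_TR3"

# Column-wise differences, assigned to the new columns
df[x_cols] <- df[a_cols] - df[b_cols]
df
```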

accumulation in R by row

Hi, I'm trying to build a simple accumulation model in R. It's very simple to do in Excel, but I need to do it for about 1000 data sets, so I would like to code it in R.
Put simply, the model accumulates and melts snow. The result should go in the 'pack' column, which should just be the previous day's pack + snow - melt. Any thoughts on the best way to refer to the previous day's pack? (It should start with a snowpack of 0 on day 1.)
The second problem is that pack cannot be negative, so on days when it melts but there is no accumulated snow, the pack should stay at 0.
df <- read.csv(file = "ddf_mod.csv", header = TRUE)
> df
day snow melt pack
1 1 0 6 0
2 2 0 2 0
3 3 0 8 0
4 4 0 2 0
5 5 2 0 2
6 6 3 0 5
7 7 4 0 9
8 8 5 0 14
9 9 0 5 9
10 10 0 6 3
11 11 0 3 0
12 12 5 0 5
13 13 8 0 13
14 14 1 0 14
15 15 3 0 17
16 16 0 0 17

The part where it can't be below 0 makes this a bit trickier than normal, but you can accomplish this stepwise calculation with Reduce(). For example
new_melt <- Reduce(function(prev, change) {
max(prev + change$snow - change$melt, 0)
},
split(df[c("snow","melt")], seq.int(nrow(df))),
init=0,
accumulate = TRUE)[-1]
Here we use split() to turn the snow/melt values into a list with one pair of observations per element, and then we iterate over it, each time taking the previous value, adding snow, subtracting melt, and using max() to make sure the result never goes below 0. (We then remove the initial value with [-1].) You can bind this new column to the original data to check that it gives what you want:
cbind(df, new_melt)
# day snow melt pack new_melt
# 1 1 0 6 0 0
# 2 2 0 2 0 0
# 3 3 0 8 0 0
# 4 4 0 2 0 0
# 5 5 2 0 2 2
# 6 6 3 0 5 5
# 7 7 4 0 9 9
# 8 8 5 0 14 14
# 9 9 0 5 9 9
# 10 10 0 6 3 3
# 11 11 0 3 0 0
# 12 12 5 0 5 5
# 13 13 8 0 13 13
# 14 14 1 0 14 14
# 15 15 3 0 17 17
# 16 16 0 0 17 17
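Because each step only needs the previous pack value and that day's net change, the same Reduce() recurrence can also run over the plain numeric vector snow - melt, which avoids splitting the data frame into one-row pieces. A sketch wrapped in a small helper so it can be reused across many data sets; snowpack() is a hypothetical name.

```r
# Example data from the question
df <- data.frame(
  day  = 1:16,
  snow = c(0, 0, 0, 0, 2, 3, 4, 5, 0, 0, 0, 5, 8, 1, 3, 0),
  melt = c(6, 2, 8, 2, 0, 0, 0, 0, 5, 6, 3, 0, 0, 0, 0, 0)
)

# Hypothetical helper: pack[t] = max(pack[t - 1] + snow[t] - melt[t], 0),
# starting from `start` before day 1
snowpack <- function(snow, melt, start = 0) {
  Reduce(function(prev, change) max(prev + change, 0),
         snow - melt, accumulate = TRUE, init = start)[-1]  # drop the initial value
}

df$pack <- snowpack(df$snow, df$melt)
df$pack  # 0 0 0 0 2 5 9 14 9 3 0 5 13 14 17 17
```

For many data sets kept in a list of data frames (called, say, ddf_list), lapply(ddf_list, function(d) transform(d, pack = snowpack(snow, melt))) would add the column to each one.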

Delete single occurrences in longitudinal data

I am working with longitudinal data. I want to remove the observations of people that were only measured once (ids 5,7,9 below). How do I do this? Assume id is the unique identifier for people in the data set. Therefore, I would want to remove observations associated with ids 5,7, and 9. I've played with duplicated, unique, the table function, and the count function in plyr but haven't been successful. Example data below.
y<-sample(1:10, 20, replace=TRUE)
x<-sample(c(0,1),20, replace=TRUE)
id<-c(1,1,1,2,2,2,3,3,3,4,4,4,5,6,6,7,8,8,8,9)
data<-data.frame(cbind(y,x,id))

You would have received immediate assistance had you tagged the post with r and data.frame.
Here, the ! ("not") operator is used to drop the rows whose id is one of the values c(5,7,9):
> data[!data$id %in% c(5,7,9),]
y x id
1 3 0 1
2 2 1 1
3 3 0 1
4 9 0 2
5 9 0 2
6 1 0 2
7 9 0 3
8 7 0 3
9 4 0 3
10 9 1 4
11 7 0 4
12 8 1 4
14 4 1 6
15 1 0 6
17 2 0 8
18 8 0 8
19 2 0 8
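Hard-coding c(5,7,9) works for the example, but with real data the ids measured only once can be found automatically. A sketch using duplicated() from both ends: a row is kept when its id occurs anywhere else in the column. The set.seed() call is an addition so the random y and x columns are reproducible; repeated and data_multi are illustrative names.

```r
set.seed(1)  # only so the random y/x columns are reproducible
y  <- sample(1:10, 20, replace = TRUE)
x  <- sample(c(0, 1), 20, replace = TRUE)
id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 6, 7, 8, 8, 8, 9)
data <- data.frame(y, x, id)

# TRUE for every row whose id also appears in some other row
repeated <- duplicated(data$id) | duplicated(data$id, fromLast = TRUE)
data_multi <- data[repeated, ]
unique(data_multi$id)  # 1 2 3 4 6 8
```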

Replacing values in one column with another based on a 3rd column matching a 4th

I'm working with the following example:
Original Modified New_Orig New
1 2 1 0
2 4 1 0
3 6 4 0
4 8 5 0
5 10 5 0
6 12 5 0
7 14 5 0
8 16 5 0
9 18 9 0
10 20 10 0
I want to replace values in New with values from Modified if New_Orig matches with any value in Original.
Ideally New will look like this:
New
2
2
8
10
10
10
10
10
18
20
Any help much appreciated.
Kind regards,

Here, New is recreated so that it takes the value of Modified where New_Orig equals Original in the same row, and 0 elsewhere:
within(dat, New <- Modified*(New_Orig == Original))
Original Modified New_Orig New
1 1 2 1 2
2 2 4 1 0
3 3 6 4 0
4 4 8 5 0
5 5 10 5 10
6 6 12 5 0
7 7 14 5 0
8 8 16 5 0
9 9 18 9 18
10 10 20 10 20
Update
To get the desired output, match each New_Orig value against Original and take the corresponding value from Modified:
within(dat, New <- Modified[match(New_Orig, Original)])
Original Modified New_Orig New
1 1 2 1 2
2 2 4 1 2
3 3 6 4 8
4 4 8 5 10
5 5 10 5 10
6 6 12 5 10
7 7 14 5 10
8 8 16 5 10
9 9 18 9 18
10 10 20 10 20
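The answers use a data frame called dat without constructing it. A self-contained sketch of the match() approach, rebuilt from the example, that also leaves New unchanged (instead of setting it to NA) for any New_Orig value that does not occur in Original; the example data happen to contain no such values. idx and found are illustrative names.

```r
dat <- data.frame(Original = 1:10,
                  Modified = seq(2, 20, 2),
                  New_Orig = c(1, 1, 4, 5, 5, 5, 5, 5, 9, 10),
                  New      = 0)

# Position in Original of each New_Orig value (NA when there is no match)
idx <- match(dat$New_Orig, dat$Original)
found <- !is.na(idx)

# Copy the corresponding Modified value; rows without a match keep their old New
dat$New[found] <- dat$Modified[idx[found]]
dat$New  # 2 2 8 10 10 10 10 10 18 20
```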

Since @rcs gave exactly the answer I would give, I thought I would show you an alternative approach to creating this "New" column rather than initializing it as all zeroes.
data <- data.frame(Original = 1:10,
Modified = seq(2, 20, 2),
New_Orig = c(1, 1, 4, 5, 5,
5, 5, 5, 9, 10))
within(data, {
New <- ifelse(Original == New_Orig, Modified, 0)
})
# Original Modified New_Orig New
# 1 1 2 1 2
# 2 2 4 1 0
# 3 3 6 4 0
# 4 4 8 5 0
# 5 5 10 5 10
# 6 6 12 5 0
# 7 7 14 5 0
# 8 8 16 5 0
# 9 9 18 9 18
# 10 10 20 10 20

Try the following:
v <- dat$New_Orig==dat$Original # this gives a logical vector,
# you could also use which(dat$New_Orig==dat$Original)
# to obtain the indices
dat[v, "New"] <- dat[v, "Modified"]

Getting table() to return zeroes in R [duplicate]

This question already has an answer here:
Include levels of zero count in result of table()
(1 answer)
Closed 4 years ago.
This is my first question. I'm quite new to R and haven't been able to find out how to do this.
I have some numbers that I want to put into a frequency table:
12 0 12 2 1 14 1 1 0 3 1 2 8 0 1 3 11 1 1 8 8 8 0 4 4
2 6 1 1 4 1 1 7 0 6 4 6 1 1 1 1 2 4 2 3 7 3 1 1 7
0 0 11 8 1 5 0 5 6 0 0 0 13 1 1 5 2 7 2 1 7 3 4 4 2
4 0 4 4 0 4 0 2 1 1 1 0 5 6 1 4 1 5 3 3 4 0 3 1 0
When I use the function table(X), I get something that looks like
X
0 1 2 3 4 5 6 7 8 11 12 13 14
17 27 9 8 13 5 5 5 5 2 2 1 1
This leaves out the values 9 and 10! I would like 9 and 10 to appear with a count of zero.
I have tried using xtabs and tapply, but I'm just not sure how to do this.

I haven't tested this, but I believe you want
table(factor(X, levels = 0:14))
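A small check of that answer on the first ten values from the question, plus a variant that builds the levels from the observed range instead of hard-coding 0:14 (note that this only fills gaps between the smallest and largest observed values). counts and counts_auto are illustrative names.

```r
X <- c(12, 0, 12, 2, 1, 14, 1, 1, 0, 3)  # first ten values from the question

# Declaring all possible values as factor levels makes table() report zero counts
counts <- table(factor(X, levels = 0:14))
counts[c("9", "10")]  # both 0

# Same idea, with the levels taken from the observed range
counts_auto <- table(factor(X, levels = seq(min(X), max(X))))
```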
