How to consider following and previous rows of each observation in R

I need to create two columns, PRETARGET and TARGET, based on several conditions.
To create PRETARGET, for each row of my data (for each participant PPT and trial TRIAL) I need to check that the CURRENT_ID is associated with a value of 0 in the column CanBePretarget, and that the following row holds the value CURRENT_ID + 1. If these conditions are fulfilled, I would like a value of 0; if they are not fulfilled, a value of 1.
To create TARGET, for each row of my data (for each participant PPT and trial TRIAL) I need to check that the CURRENT_ID is associated with a value of 0 in the column CanBeTarget, and that the previous row holds the value CURRENT_ID - 1. If these conditions are fulfilled, I would like a value of 0; if they are not fulfilled, a value of 1.
In addition, if the result in PRETARGET is 1, then the value of the next row in TARGET should also be 1.
I have added the desired output in the following example.
I was thinking of using for loops and ifelse statements, but I am not sure how to refer to the following/previous row of each observation.
PPT TRIAL PREVIOUS_ID CURRENT_ID NEXT_ID CURRENT_INDEX CanBePretarget CanBeTarget PRETARGET TARGET
ppt01 11 2 3 4 3 0 0 0 1
ppt01 11 3 4 3 4 1 0 1 0
ppt01 11 4 5 6 8 0 0 1 1
ppt01 11 6 7 8 10 0 0 1 1
ppt01 11 7 10 11 18 0 1 0 1
ppt01 11 10 11 12 19 0 0 0 0
ppt01 11 11 12 14 20 1 0 1 0
ppt01 12 1 2 1 2 1 0 1 1
ppt01 12 2 3 4 5 0 0 1 1
ppt01 12 5 6 6 8 0 0 0 1
ppt01 12 6 7 7 10 0 0 0 0
ppt01 12 7 8 9 12 0 0 0 0
ppt01 12 8 9 9 13 0 0 0 0
ppt01 12 9 10 11 16 0 0 0 0
ppt01 12 10 11 11 17 0 0 0 0
ppt01 13 1 2 2 2 1 0 1 1
ppt01 13 3 3 3 10 0 0 1 1
ppt01 13 4 5 6 13 0 0 0 1
ppt01 13 5 6 7 14 0 0 1 0
ppt01 13 9 9 10 19 0 0 0 1
ppt01 13 9 10 10 20 0 0 0 0
ppt01 13 10 11 12 22 0 0 0 0
ppt01 13 11 12 12 23 0 0 1 0
ppt01 14 10 11 11 15 0 0 0 1
ppt01 14 11 12 12 17 0 0 1 0

This can be achieved with dplyr:
library(dplyr)

df.new <- df %>%
  # 0 when CanBePretarget is 0 and the next row holds CURRENT_ID + 1, else 1
  mutate(PRETARGET1 = abs(as.numeric(CanBePretarget == 0 &
                                       lead(CURRENT_ID, default = 0) == (CURRENT_ID + 1)) - 1)) %>%
  group_by(PPT, TRIAL) %>%
  # 0 when CanBeTarget is 0 and the previous row holds CURRENT_ID - 1, else 1;
  # then force TARGET1 to 1 wherever the previous row's PRETARGET1 is 1
  mutate(TARGET1 = abs(as.numeric(CanBeTarget == 0 &
                                    lag(CURRENT_ID, default = 0) == (CURRENT_ID - 1)) - 1),
         TARGET1 = ifelse(lag(PRETARGET1, default = 0) == 1, 1, TARGET1))
To compare with your results, I created PRETARGET1 and TARGET1 rather than overwriting your columns.
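For reference, lead() and lag() are what give you access to the following and previous row inside mutate(). A minimal sketch of their behaviour (the vector x below is made up for illustration):
library(dplyr)

x <- c(3, 4, 5, 7)
lead(x, default = 0)   # next value for each element: 4 5 7 0
lag(x, default = 0)    # previous value for each element: 0 3 4 5
With default = 0, the last position (for lead) and the first position (for lag) get 0 instead of NA, which keeps the comparisons above well defined at the boundaries.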

Related

Shortest path function returns a wrong path in R igraph

I use the get.shortest.paths method to find the shortest path between two vertices. However, something odd is happening. After the comment that I received, I am changing the entire question body. I produced my graph with g <- sample_smallworld(1, 20, 5, 0.1) and here is the edge list.
*Vertices 20
*Edges
1 2 0
2 3 0
3 4 0
4 5 0
5 6 0
6 7 0
7 8 0
8 9 0
9 10 0
10 11 0
11 12 0
12 13 0
13 14 0
14 15 0
6 15 0
16 17 0
17 18 0
18 19 0
19 20 0
1 20 0
1 11 0
1 19 0
1 4 0
1 18 0
1 5 0
1 17 0
6 17 0
15 16 0
2 20 0
2 4 0
2 19 0
2 5 0
2 18 0
2 9 0
2 17 0
2 13 0
3 5 0
3 20 0
3 6 0
3 19 0
3 7 0
3 18 0
3 8 0
4 6 0
4 7 0
4 20 0
4 8 0
5 19 0
4 9 0
5 7 0
5 8 0
5 9 0
5 20 0
5 10 0
6 8 0
6 9 0
6 10 0
6 11 0
7 9 0
7 10 0
7 11 0
7 12 0
1 10 0
8 11 0
1 12 0
8 13 0
9 11 0
9 12 0
9 13 0
7 14 0
12 19 0
10 13 0
10 14 0
10 15 0
11 13 0
11 14 0
11 15 0
4 16 0
12 14 0
9 15 0
12 16 0
12 17 0
13 15 0
13 16 0
13 17 0
13 18 0
14 16 0
14 17 0
14 18 0
14 19 0
15 17 0
15 18 0
15 19 0
1 15 0
16 18 0
16 19 0
9 20 0
17 19 0
17 20 0
10 18 0
The shortest path reported between 7 and 2 is:
> get.shortest.paths(g,7,2)
$vpath
$vpath[[1]]
+ 4/20 vertices, from c915453:
[1] 7 14 19 2
Here are the nodes adjacent to node 7 and to node 2:
> unlist(neighborhood(g, 1, 7, mode="out"))
[1] 7 3 4 5 6 8 9 10 11 12 14
> unlist(neighborhood(g, 1, 2, mode="out"))
[1] 2 1 3 4 5 9 13 17 18 19 20
As you can see, I can go from 7 to 3 and from 3 to 2. It looks like there is a shorter path. What could I be missing?
Yes, the problem is your edge weights of zero. Looking at the help page ?shortest_paths:
weights
Possibly a numeric vector giving edge weights. If this is
NULL and the graph has a weight edge attribute, then the attribute is
used. If this is NA then no weights are used (even if the graph has a
weight attribute).
Note that weights = NULL is the default, so the zero edge weights are used. The weight of the path that was returned is therefore zero, the same as the weight of the path you expected: in terms of weighted distance they are equally short. If you want the path with the smallest number of hops, turn off the use of weights like this:
get.shortest.paths(g,7,2, weights=NA)$vpath
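To check this end to end, here is a self-contained sketch (the graph is regenerated, so it will not match the c915453 graph exactly; the zero weights are set explicitly to reproduce the situation):
library(igraph)

set.seed(1)
g <- sample_smallworld(1, 20, 5, 0.1)   # a small-world graph like the one in the question
E(g)$weight <- 0                        # all edge weights are zero, as in the edge list above

# Default weights = NULL: the zero weights are used, so every path has total
# weight 0 and any of them may be returned.
get.shortest.paths(g, 7, 2)$vpath

# weights = NA: weights are ignored and the path with the fewest hops is returned.
get.shortest.paths(g, 7, 2, weights = NA)$vpath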

create a new variable within a for loop in R

I have a dataframe Fix with many variables. Among these there is CURRENT_ID, which is numeric and ranges from 1 to a number that varies (e.g., in certain cases 12, in others 15, etc.), and also a variable called nitem, which represents the number of the item in my experiment. For each trial and each subject, I would like to identify the minimum and the maximum CURRENT_ID. Then I would like to create a new variable called Remove. In Remove I would like to have a value of 1 if the CURRENT_ID is the minimum or the maximum for that trial and participant, and a value of 0 for all the other rows. Following is an example of the data I have and the output I would like to obtain:
SESSION_LABEL TRIAL_INDEX CURRENT_ID nitem OUTPUT
ppt01 1 1 4 1
ppt01 1 1 4 1
ppt01 1 4 4 0
ppt01 1 2 4 0
ppt01 1 2 4 0
ppt01 1 2 4 0
ppt01 1 4 4 0
ppt01 1 5 4 0
ppt01 1 6 4 0
ppt01 1 7 4 0
ppt01 1 8 4 0
ppt01 1 10 4 0
ppt01 1 11 4 0
ppt01 1 11 4 0
ppt01 1 12 4 0
ppt01 1 13 4 0
ppt01 1 13 4 0
ppt01 1 14 4 1
ppt01 1 1 4 1
ppt01 1 1 4 1
ppt01 2 2 2 0
ppt01 2 1 2 1
ppt01 2 5 2 0
ppt01 2 3 2 0
ppt01 2 4 2 0
ppt01 2 5 2 0
ppt01 2 5 2 0
ppt01 2 5 2 0
ppt01 2 6 2 0
ppt01 2 7 2 0
ppt01 2 8 2 0
ppt01 2 10 2 0
ppt01 2 10 2 0
ppt01 2 11 2 0
ppt01 2 13 2 0
ppt01 2 13 2 0
ppt01 2 13 2 0
ppt01 2 14 2 1
ppt01 2 3 2 0
ppt01 2 2 2 0
ppt01 2 1 2 1
ppt01 2 1 2 1
ppt01 2 1 2 1
ppt01 2 5 2 0
ppt01 2 4 2 0
ppt01 2 4 2 0
ppt01 2 5 2 0
ppt01 2 7 2 0
ppt01 2 9 2 0
ppt01 2 10 2 0
ppt01 2 12 2 0
ppt01 2 10 2 0
ppt01 2 10 2 0
ppt01 2 4 2 0
ppt01 2 5 2 0
ppt01 2 4 2 0
ppt01 2 6 2 0
ppt04 2 1 8 1
ppt04 2 1 8 1
ppt04 2 2 8 0
ppt04 2 3 8 0
ppt04 2 4 8 0
ppt04 2 5 8 0
ppt04 2 6 8 0
ppt04 2 7 8 0
ppt04 2 8 8 0
ppt04 2 7 8 0
ppt04 2 6 8 0
ppt04 2 8 8 0
ppt04 2 8 8 0
ppt04 2 10 8 0
ppt04 2 9 8 0
ppt04 2 10 8 0
ppt04 2 13 8 0
ppt04 2 14 8 1
ppt04 2 14 8 1
ppt04 2 1 8 1
ppt04 3 2 10 0
ppt04 3 2 10 0
ppt04 3 2 10 0
ppt04 3 3 10 0
ppt04 3 2 10 0
ppt04 3 4 10 0
ppt04 3 5 10 0
ppt04 3 6 10 0
ppt04 3 7 10 0
ppt04 3 9 10 0
ppt04 3 11 10 0
ppt04 3 12 10 0
ppt04 3 12 10 0
ppt04 3 13 10 0
ppt04 3 14 10 1
ppt04 3 14 10 1
Here is my attempt.
for (j in 1:nrow(Fix)) {
  Fix$Remove[j] <- ifelse(by(Fix$CURRENT_ID, list(Fix$SESSION_LABEL, Fix$nitem), max), 1,
                          ifelse(by(Fix$CURRENT_ID, list(Fix$SESSION_LABEL, Fix$nitem), min), 1, 0))
}
Also, I am not sure if a for loop is the best way to do it.
Using dplyr:
library(dplyr)

your_data %>%
  group_by(SESSION_LABEL, nitem) %>%
  # 1 when CURRENT_ID is the group minimum or maximum, else 0
  mutate(Remove = ifelse(
    CURRENT_ID == min(CURRENT_ID) | CURRENT_ID == max(CURRENT_ID),
    1, 0
  ))
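A minimal usage sketch with toy data (not your Fix data frame, just to show the shape of the result):
library(dplyr)

toy <- data.frame(
  SESSION_LABEL = rep("ppt01", 5),
  nitem         = rep(4, 5),
  CURRENT_ID    = c(1, 5, 14, 7, 14)
)
toy %>%
  group_by(SESSION_LABEL, nitem) %>%
  mutate(Remove = ifelse(CURRENT_ID == min(CURRENT_ID) |
                           CURRENT_ID == max(CURRENT_ID), 1, 0))
# Rows with CURRENT_ID 1 and 14 get Remove = 1, the others get 0.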
You can do it with base R:
Fix <- within(Fix, {
  mx <- ave(CURRENT_ID, SESSION_LABEL, nitem, FUN = max)   # group maximum
  mn <- ave(CURRENT_ID, SESSION_LABEL, nitem, FUN = min)   # group minimum
  Remove <- ifelse(CURRENT_ID == mx | CURRENT_ID == mn, 1, 0)
})
But testing the result with your data gives:
which(Fix$Remove!=Fix$OUTPUT)
# [1] 78 79 80 82

Make results of a table into variables (increasing dimensions of data) in R (for visualization)

I'm not sure how to phrase this question. I have some data which I'm trying to get into a different format (maybe even an array) so that I can vectorize it. This isn't very concrete, so here's a simplified example:
I have a data frame dt, say:
set.seed(1)
time <- 1:10
size <- round(runif(10), digits = 1)
count <- round(runif(10) * 20)
dt <- data.frame(time, size, count)
dt
time size count
1 1 0.3 4
2 2 0.4 4
3 3 0.6 14
4 4 0.9 8
5 5 0.2 15
6 6 0.9 10
7 7 0.9 14
8 8 0.7 20
9 9 0.6 8
10 10 0.1 16
and I want to end up with...
time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
1 1 0 0 4 0 0 0 0
2 2 0 0 0 4 0 0 0
3 3 0 0 0 0 14 0 0
4 4 0 0 0 0 0 0 8
5 5 0 15 0 0 0 0 0
6 6 0 0 0 0 0 0 10
7 7 0 0 0 0 0 0 14
8 8 0 0 0 0 0 20 0
9 9 0 0 0 0 8 0 0
10 10 16 0 0 0 0 0 0
which introduces all the possible values of the size variable as new variables.
Then I want to do a cumulative sum to get something like this, but really that previous step is the trickiest:
time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
1 1 0 0 4 0 0 0 0
2 2 0 0 4 4 0 0 0
3 3 0 0 4 4 14 0 0
4 4 0 0 4 4 14 0 8
5 5 0 15 4 4 14 0 8
6 6 0 15 4 4 14 0 18
7 7 0 15 4 4 14 0 32
8 8 0 15 4 4 14 20 32
9 9 0 15 4 4 22 20 32
10 10 16 15 4 4 22 20 32
We can use dcast to create the 'size' columns, and then loop over those columns (lapply(...)) and do the cumsum.
library(reshape2)
dt1 <- dcast(dt, time ~ paste0('size_', size), value.var = 'count', fill = 0)  # one column per size value
dt1[-1] <- lapply(dt1[-1], cumsum)  # cumulative sum down every size_* column
dt1
# time size_0.1 size_0.2 size_0.3 size_0.4 size_0.6 size_0.7 size_0.9
#1 1 0 0 4 0 0 0 0
#2 2 0 0 4 4 0 0 0
#3 3 0 0 4 4 14 0 0
#4 4 0 0 4 4 14 0 8
#5 5 0 15 4 4 14 0 8
#6 6 0 15 4 4 14 0 18
#7 7 0 15 4 4 14 0 32
#8 8 0 15 4 4 14 20 32
#9 9 0 15 4 4 22 20 32
#10 10 16 15 4 4 22 20 32
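The same result can be obtained in base R (a sketch of the same idea, using xtabs() instead of dcast()):
tab <- xtabs(count ~ time + size, data = dt)        # wide table of counts per size
colnames(tab) <- paste0("size_", colnames(tab))     # match the size_* naming above
dt2 <- data.frame(time = as.numeric(rownames(tab)),
                  apply(tab, 2, cumsum),            # cumulative sum down each size column
                  check.names = FALSE)
dt2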

tagging windows around events within data.frame

I have a data.frame with a factor identifying events
year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0
And I need a counter-type variable identifying a given window around the events. The result should look like this (for a window of, for example, 3 periods around the event):
year event window
1 0
2 0
3 0
4 0
5 0
6 0 -3
7 0 -2
8 0 -1
9 1 0
10 0 1
11 0 2
12 0 3
13 0
14 0 -3
15 0 -2
16 0 -1
17 1 0
18 0 1
19 0 2
20 0 3
Any guidance on how to implement this within a function would be appreciated. You can copy the data frame by pasting the block above into the "..." here:
dt <- read.table(text = "...", header = TRUE)
Assuming there is no overlapping, you can use one of my favourite base functions, filter:
DF <- read.table(text="year event
1 0
2 0
3 0
4 0
5 0
6 0
7 0
8 0
9 1
10 0
11 0
12 0
13 0
14 0
15 0
16 0
17 1
18 0
19 0
20 0", header=TRUE)
# zero-pad both ends, convolve with the kernel -3:3 (stats::filter), then trim the padding
DF$window <- head(filter(c(rep(0, 3), DF$event, rep(0, 3)),
                         filter = -3:3)[-(1:3)], -3)
DF$window[DF$window == 0 & DF$event == 0] <- NA  # blank out rows outside any window
# year event window
# 1 1 0 NA
# 2 2 0 NA
# 3 3 0 NA
# 4 4 0 NA
# 5 5 0 NA
# 6 6 0 -3
# 7 7 0 -2
# 8 8 0 -1
# 9 9 1 0
# 10 10 0 1
# 11 11 0 2
# 12 12 0 3
# 13 13 0 NA
# 14 14 0 -3
# 15 15 0 -2
# 16 16 0 -1
# 17 17 1 0
# 18 18 0 1
# 19 19 0 2
# 20 20 0 3
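If you need this as a reusable function with an adjustable window size, here is a sketch building on the same stats::filter idea (it assumes windows never overlap; tag_window is a made-up name):
tag_window <- function(event, k = 3) {
  padded <- c(rep(0, k), event, rep(0, k))                  # pad so the kernel fits at the edges
  w <- head(stats::filter(padded, filter = -k:k)[-(1:k)], -k)
  w[w == 0 & event == 0] <- NA                              # keep 0 only at the events themselves
  as.numeric(w)
}

DF$window <- tag_window(DF$event, k = 3)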

Finding the count of Interactions between Members located in the Dataset

I have pass traffic data which shows the passes between Members; here's the sample dataset.
It shows the interactions between Members in consecutive rows. I want to count those interactions and obtain a new dataset which shows how many interactions occurred between each pair of Members; the direction doesn't matter.
For example:
between 26 and 11 = X
between 26 and 27 = Y
I just can't figure out which function I can use or how to write code for this calculation. Thanks
You could use the rollapply function from the zoo package to find all interactions. The frequency of these interactions can be calculated using table. (I assume your object is called dat.)
library(zoo)
table(as.data.frame(rollapply(dat[[1]], 2, sort)))  # sort each consecutive pair so direction doesn't matter, then tabulate
The result:
V2
V1 4 8 10 11 13 17 19 25 26 27 53
4 2 13 17 1 2 5 6 3 1 9 4
8 0 2 14 11 10 4 5 0 13 13 11
10 0 0 3 9 7 2 4 2 8 11 8
11 0 0 0 1 6 5 4 4 5 4 25
13 0 0 0 0 0 1 3 5 7 9 8
17 0 0 0 0 0 0 1 1 1 5 5
19 0 0 0 0 0 0 1 1 1 5 4
25 0 0 0 0 0 0 0 0 5 8 5
26 0 0 0 0 0 0 0 0 1 5 3
27 0 0 0 0 0 0 0 0 0 0 1
53 0 0 0 0 0 0 0 0 0 0 1
The lower triangular part of the matrix contains zeros only since the direction does not matter.
If you are not interested in interactions between the same values, use the following command (rle collapses runs of repeated values first):
table(as.data.frame(rollapply(rle(dat[[1]])$values, 2, sort)))
V2
V1 8 10 11 13 17 19 25 26 27 53
4 13 17 1 2 5 6 3 1 9 4
8 0 14 11 10 4 5 0 13 13 11
10 0 0 9 7 2 4 2 8 11 8
11 0 0 0 6 5 4 4 5 4 25
13 0 0 0 0 1 3 5 7 9 8
17 0 0 0 0 0 1 1 1 5 5
19 0 0 0 0 0 0 1 1 5 4
25 0 0 0 0 0 0 0 5 8 5
26 0 0 0 0 0 0 0 0 5 3
27 0 0 0 0 0 0 0 0 0 1
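If you would rather have the counts as a regular data set of member pairs, a possible follow-up (a sketch; tab and pairs_df are made-up names, and dat[[1]] is assumed to be the column with the member sequence):
library(zoo)

tab <- table(as.data.frame(rollapply(dat[[1]], 2, sort)))
pairs_df <- subset(as.data.frame(tab), Freq > 0)            # drop pairs that never interact
names(pairs_df) <- c("member_a", "member_b", "interactions")
pairs_df[order(-pairs_df$interactions), ]                   # most frequent pairs first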
