Calculating change from baseline with data in long format - r

Here is a small reproducible example of my data:
> mydata <- structure(list(subject = c(1, 1, 1, 2, 2, 2), time = c(0, 1, 2, 0, 1, 2), measure = c(10, 12, 8, 7, 0, 0)), .Names = c("subject", "time", "measure"), row.names = c(NA, -6L), class = "data.frame")
> mydata
subject time measure
1 0 10
1 1 12
1 2 8
2 0 7
2 1 0
2 2 0
I would like to generate a new variable that is the "change from baseline". That is, I would like
subject time measure change
1 0 10 0
1 1 12 2
1 2 8 -2
2 0 7 0
2 1 0 -7
2 2 0 -7
Is there an easy way to do this, other than looping through all the records programmatically or reshaping to wide format first?

There are many possibilities. My favorites:
library(plyr)
ddply(mydata,.(subject),transform,change=measure-measure[1])
subject time measure change
1 1 0 10 0
2 1 1 12 2
3 1 2 8 -2
4 2 0 7 0
5 2 1 0 -7
6 2 2 0 -7
library(data.table)
myDT <- as.data.table(mydata)
myDT[,change:=measure-measure[1],by=subject]
print(myDT)
subject time measure change
1: 1 0 10 0
2: 1 1 12 2
3: 1 2 8 -2
4: 2 0 7 0
5: 2 1 0 -7
6: 2 2 0 -7
data.table is preferable if your dataset is large.

What about:
mydata$change <- do.call("c", with(mydata, lapply(split(measure, subject), function(x) x - x[1])))
Alternatively, you could use the ave function:
with(mydata, ave(measure, subject, FUN=function(x) x - x[1]))
# [1] 0 2 -2 0 -7 -7
or
within(mydata, change <- ave(measure, subject, FUN=function(x) x - x[1]))
# subject time measure change
# 1 1 0 10 0
# 2 1 1 12 2
# 3 1 2 8 -2
# 4 2 0 7 0
# 5 2 1 0 -7
# 6 2 2 0 -7

You can use tapply (this assumes the rows are already sorted by subject):
mydata$change <- unlist(tapply(mydata$measure, mydata$subject, function(x) x - x[1]), use.names = FALSE)
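All of the answers above treat the first row of each subject as the baseline, which works when the data are sorted by time within subject. Below is a minimal base R sketch that looks up the row with time == 0 explicitly instead. It assumes every subject has exactly one such row, and `baseline` is a helper name introduced here for illustration.

```r
mydata <- data.frame(subject = c(1, 1, 1, 2, 2, 2),
                     time    = c(0, 1, 2, 0, 1, 2),
                     measure = c(10, 12, 8, 7, 0, 0))

# One row per subject holding the baseline (time == 0) measurement
baseline <- mydata[mydata$time == 0, c("subject", "measure")]

# Match each row to its subject's baseline and subtract
mydata$change <- mydata$measure -
  baseline$measure[match(mydata$subject, baseline$subject)]

mydata$change
# [1]  0  2 -2  0 -7 -7
```

Because the baseline is found by value rather than by position, this variant should not depend on row order.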

Related

Recoding by an order in r

I have a data recoding puzzle. Here is what my sample data looks like:
df <- data.frame(
id = c(1,1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
scores = c(0,1,1,0,0,-1,-1, 0,0,1,-1,-1,-1, 0,1,0,1,1,0,1),
position = c(1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5,6,7),
cat = c(1,1,1,1,1,0,0, 1,1,1,0,0,0, 1,1,1,1,1,1,1))
id scores position cat
1 1 0 1 1
2 1 1 2 1
3 1 1 3 1
4 1 0 4 1
5 1 0 5 1
6 1 -1 6 0
7 1 -1 7 0
8 2 0 1 1
9 2 0 2 1
10 2 1 3 1
11 2 -1 4 0
12 2 -1 5 0
13 2 -1 6 0
14 3 0 1 1
15 3 1 2 1
16 3 0 3 1
17 3 1 4 1
18 3 1 5 1
19 3 0 6 1
20 3 1 7 1
There are three ids in the dataset, and rows are ordered by a position variable. For each id, the first row where scores equals -1 needs to be recoded: scores should become 0 and cat should become 1. For example, for id=1 that row is at position 6, so in that row scores should be 0 and cat should be 1. Ids that do not have any scores=-1 are kept as they are.
The desired output should look like below:
id scores position cat
1 1 0 1 1
2 1 1 2 1
3 1 1 3 1
4 1 0 4 1
5 1 0 5 1
6 1 0 6 1
7 1 -1 7 0
8 2 0 1 1
9 2 0 2 1
10 2 1 3 1
11 2 0 4 1
12 2 -1 5 0
13 2 -1 6 0
14 3 0 1 1
15 3 1 2 1
16 3 0 3 1
17 3 1 4 1
18 3 1 5 1
19 3 0 6 1
20 3 1 7 1
Any recommendations??
Thanks
This may be what you are after:
library(dplyr)
df %>%
  group_by(id) %>%
  mutate(i = which(scores == -1)[1]) %>% # index of the first row with scores == -1 (NA if none)
  mutate(scores = case_when(position == i & scores != 0 ~ 0, TRUE ~ scores), # set that row's score to 0 (relies on position running 1..n within id)
         cat = ifelse(scores == -1, 0, 1)) %>% # then recompute cat from the updated scores
  select(-i) # remove i
After trying a few things and getting ideas from @Ricky and @e.matt, I came up with a solution.
df %>%
filter(scores == -1) %>% # keep rows where scores == -1
distinct(id, .keep_all = T) %>% # keep the first such row for each id
mutate(first = 1) %>% # create first column
right_join(df, by=c("id","scores","position","cat")) %>% # join back original dataset
mutate(first = coalesce(first, 0)) %>% # replace NAs with 0
mutate(scores = case_when(
first == 1 ~ 0,
TRUE~scores)) %>%
mutate(cat = case_when(
first == 1 ~ 1,
TRUE~cat))
This provides my desired output.
id scores position cat first
1 1 0 1 1 0
2 1 1 2 1 0
3 1 1 3 1 0
4 1 0 4 1 0
5 1 0 5 1 0
6 1 0 6 1 1
7 1 -1 7 0 0
8 2 0 1 1 0
9 2 0 2 1 0
10 2 1 3 1 0
11 2 0 4 1 1
12 2 -1 5 0 0
13 2 -1 6 0 0
14 3 0 1 1 0
15 3 1 2 1 0
16 3 0 3 1 0
17 3 1 4 1 0
18 3 1 5 1 0
19 3 0 6 1 0
20 3 1 7 1 0
Here is a data.table one-liner:
library( data.table )
setDT(df)
df[ df[, .(cumsum( scores == -1 ) == 1), by = .(id)]$V1, `:=`( scores = 0, cat = 1) ]
# id scores position cat
# 1: 1 0 1 1
# 2: 1 1 2 1
# 3: 1 1 3 1
# 4: 1 0 4 1
# 5: 1 0 5 1
# 6: 1 0 6 1
# 7: 1 -1 7 0
# 8: 2 0 1 1
# 9: 2 0 2 1
# 10: 2 1 3 1
# 11: 2 0 4 1
# 12: 2 -1 5 0
# 13: 2 -1 6 0
# 14: 3 0 1 1
# 15: 3 1 2 1
# 16: 3 0 3 1
# 17: 3 1 4 1
# 18: 3 1 5 1
# 19: 3 0 6 1
# 20: 3 1 7 1
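Note that the condition cumsum(scores == -1) == 1 also stays TRUE for any later row that is not -1, until a second -1 appears. That case does not occur in the sample data, where every -1 is followed only by more -1s. A minimal base R sketch that flags only the first -1 row per id, under that reading of the question (`is_neg` and `first_neg` are names introduced here for illustration):

```r
df <- data.frame(
  id       = c(1,1,1,1,1,1,1, 2,2,2,2,2,2, 3,3,3,3,3,3,3),
  scores   = c(0,1,1,0,0,-1,-1, 0,0,1,-1,-1,-1, 0,1,0,1,1,0,1),
  position = c(1,2,3,4,5,6,7, 1,2,3,4,5,6, 1,2,3,4,5,6,7),
  cat      = c(1,1,1,1,1,0,0, 1,1,1,0,0,0, 1,1,1,1,1,1,1))

# 1 where the score is -1, 0 elsewhere
is_neg <- as.integer(df$scores == -1)

# Within each id, keep only the -1 row whose running count of -1s is exactly 1
first_neg <- ave(is_neg, df$id, FUN = function(z) z * (cumsum(z) == 1)) == 1

# Recode just those rows
df$scores[first_neg] <- 0
df$cat[first_neg] <- 1

which(first_neg)
# [1]  6 11
```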
You could do something along these lines using the dplyr package:
library(dplyr)
df = mutate(df, cat = ifelse(scores == -1, 1, cat),
scores = ifelse(scores == -1, 0, scores))
Using the mutate() function, I am re-assigning the values for the scores and cat fields according to ifelse() conditional statements. For scores, if the score is -1, the value is replaced by 0, otherwise it keeps the score as is. For cat, it also checks if scores is equal to -1, but would assign a value of 1 when the condition is met, or the already existing value of cat when the condition is not met.
EDIT
After our discussion in the comments, I think something along these lines should be helpful (you may have to modify the logic since I don't exactly follow what the desired output is here):
for(i in 1:nrow(df)){
# Check if score is -1
if(df[i, 'scores'] == -1){
# Update values for the next row
df[i+1, 'scores'] <- 0
df[i+1, 'cat'] <- 1
}
}
Sorry that I don't really follow the desired output, hopefully this is helpful in getting you to your answer!

How can I create a new variable which identifies rows where another variable changes sign?

I have a question regarding data preparation. I have the following data set (in long format; one row per measurement point, therefore several rows per person):
dd <- read.table(text=
"ID time
1 -4
1 -3
1 -2
1 -1
1 0
1 1
2 -3
2 -1
2 2
2 3
2 4
3 -3
3 -2
3 -1
4 -1
4 1
4 2
4 3
5 0
5 1
5 2
5 3
5 4", header=TRUE)
Now I would like to create a new variable that has a 1 in the row in which a sign change on the time variable happens for the first time for this person, and a 0 in all other rows. If a person has only negative values on time, there should not be any 1 on the new variable. For a person that has only positive values on time, the first row should have a 1 on the new variable and all other rows should be coded with 0. For my example above the new data frame should look like this:
dd <- read.table(text=
"ID time new.var
1 -4 0
1 -3 0
1 -2 0
1 -1 0
1 0 1
1 1 0
2 -3 0
2 -1 0
2 2 1
2 3 0
2 4 0
3 -3 0
3 -2 0
3 -1 0
4 -1 0
4 1 1
4 2 0
4 3 0
5 0 1
5 1 0
5 2 0
5 3 0
5 4 0", header=TRUE)
Does anyone know how to do this? I thought about using dplyr and group_by, however I am pretty new to R and did not make it. Any help is much appreciated!
There are 2 different operations you want done to create new.var, so you need to do them in 2 steps. I'll break this into 2 separate mutate calls for simplicity, but you can put both of them into the same mutate.
First, we group by ID and then find the rows where the sign changes. We need to use time >= 0 instead of sign as recommended in this answer: R identifying a row prior to a change in sign because you want a sign change to be counted only when going from -1 <-> 0, not from 0 <-> 1:
library(tidyverse)
dd2 <- dd %>%
group_by(ID) %>%
mutate(new.var = as.numeric((time >= 0) != (lag(time) >= 0)))
dd2
# A tibble: 23 x 3
# Groups: ID [5]
ID time new.var
<int> <int> <dbl>
1 1 -4 NA
2 1 -3 0
3 1 -2 0
4 1 -1 0
5 1 0 1
6 1 1 0
7 2 -3 NA
8 2 -1 0
9 2 2 1
10 2 3 0
# … with 13 more rows
Then we use case_when to modify the first row based on your desired rules. Due to the way lag works, the first row will always have NA (since there is no previous row to look at) which makes it a good way to pick out that first row to change it based on the time values in that group:
dd3 <- dd2 %>%
mutate(new.var = case_when(
!is.na(new.var) ~ new.var,
all(time >= 0) ~ 1,
TRUE ~ 0)
)
print(dd3, n = 100) #n=100 because tibbles are truncated to 10 rows by print
# A tibble: 23 x 3
# Groups: ID [5]
ID time new.var
<int> <int> <dbl>
1 1 -4 0
2 1 -3 0
3 1 -2 0
4 1 -1 0
5 1 0 1
6 1 1 0
7 2 -3 0
8 2 -1 0
9 2 2 1
10 2 3 0
11 2 4 0
12 3 -3 0
13 3 -2 0
14 3 -1 0
15 4 -1 0
16 4 1 1
17 4 2 0
18 4 3 0
19 5 0 1
20 5 1 0
21 5 2 0
22 5 3 0
23 5 4 0
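To see why this answer compares time >= 0 rather than using sign(), here is a small base R check on a made-up vector (the vector `time` below is illustrative and is not taken from dd):

```r
time <- c(-1, 0, 1)

# sign() returns -1, 0, 1, so it reports a change at both steps
by_sign <- diff(sign(time)) != 0

# time >= 0 returns FALSE, TRUE, TRUE, so only the step -1 -> 0 counts
by_nonneg <- diff(time >= 0) != 0

by_sign
# [1] TRUE TRUE
by_nonneg
# [1]  TRUE FALSE
```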
You can try this:
library(dplyr)
dd %>% left_join(dd %>% group_by(ID) %>% summarise(index=min(which(time>=0)))) %>%
group_by(ID) %>% mutate(new.var=ifelse(row_number(ID)==index,1,0)) %>% select(-index)-> DF
# A tibble: 23 x 3
# Groups: ID [5]
ID time new.var
<int> <int> <dbl>
1 1 -4 0
2 1 -3 0
3 1 -2 0
4 1 -1 0
5 1 0 1
6 1 1 0
7 2 -3 0
8 2 -1 0
9 2 2 1
10 2 3 0
The following ave instruction does what the question asks for.
dd$new.var <- with(dd, ave(time, ID, FUN = function(x){
y <- integer(length(x))
if(any(x >= 0)) y[which.max(x[1]*x <= 0)] <- 1L
y
}))
dd
# ID time new.var
#1 1 -4 0
#2 1 -3 0
#3 1 -2 0
#4 1 -1 0
#5 1 0 1
#6 1 1 0
#7 2 -3 0
#8 2 -1 0
#9 2 2 1
#10 2 3 0
#11 2 4 0
#12 3 -3 0
#13 3 -2 0
#14 3 -1 0
#15 4 -1 0
#16 4 1 1
#17 4 2 0
#18 4 3 0
#19 5 0 1
#20 5 1 0
#21 5 2 0
#22 5 3 0
#23 5 4 0
If the expected output is renamed dd2 then
identical(dd, dd2)
#[1] TRUE
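If time is sorted in ascending order within each ID, as it is in the example, the rule in the question appears to reduce to "mark the first row with time >= 0". A minimal base R sketch under that assumption (it would not give the same result if a person started positive and later went negative):

```r
dd <- data.frame(
  ID   = rep(1:5, times = c(6, 5, 3, 4, 5)),
  time = c(-4, -3, -2, -1, 0, 1,
           -3, -1, 2, 3, 4,
           -3, -2, -1,
           -1, 1, 2, 3,
           0, 1, 2, 3, 4))

# Within each ID, flag the row that is non-negative and is the first such row
dd$new.var <- ave(dd$time, dd$ID,
                  FUN = function(x) as.integer(x >= 0 & cumsum(x >= 0) == 1))

which(dd$new.var == 1)
# [1]  5  9 16 19
```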

Identify and label repeated data in a series

I'm trying to identify cases in a dataset where a value occurs multiple times in a row, and once this is picked up, a row to the side of the nth occurrence confirms this with '1'.
df<-data.frame(user=c(1,1,1,1,2,3,3,3,4,4,4,4,4,4,4,4),
week=c(1,2,3,4,1,1,2,3,1,2,3,4,5,6,7,8),
updated=c(1,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1))
In this case, users are performing a task. If the task is performed, '1' appears for that week, if not '0' appears.
Is it possible, in the event that four or more 0s are encountered in a row, that an indicator is mutated into a new column identifying that this sequence has occurred? Something like this:
user week updated warning
1 1 1 1 0
2 1 2 0 0
3 1 3 1 0
4 1 4 1 0
5 2 1 1 0
6 3 1 1 0
7 3 2 1 0
8 3 3 1 0
9 4 1 1 0
10 4 2 1 0
11 4 3 0 0
12 4 4 0 0
13 4 5 0 0
14 4 6 0 1
15 4 7 1 0
16 4 8 1 0
Thanks!
Edit:
Apologies and thanks to @akrun for helping with this.
An additional example is below: on the 4th consecutive missed entry (missed equal to 1), the warning column is updated to flag the event, and a trigger will run off that data.
df<-data.frame(user=c(1,1,1,1,2,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,7,7,7,7,7,7,7,7),
week=c(1,2,3,4,1,1,2,3,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,8,1,2,3,4,5,6,7,1,2,3,4,5,6,7,8),
missed=c(0,1,0,0,0,0,0,0,0,0,1,1,1,1,0,0,0,1,0,1,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,1,1,0,1))
user week missed warning
1 1 1 0 0
2 1 2 1 0
3 1 3 0 0
4 1 4 0 0
5 2 1 0 0
6 3 1 0 0
7 3 2 0 0
8 3 3 0 0
9 4 1 0 0
10 4 2 0 0
11 4 3 1 0
12 4 4 1 0
13 4 5 1 0
14 4 6 1 1
15 4 7 0 0
16 4 8 0 0
17 5 1 0 0
18 5 2 1 0
19 5 3 0 0
20 5 4 1 0
21 5 5 0 0
22 5 6 0 0
23 5 7 0 0
24 5 8 0 0
25 6 1 0 0
26 6 2 1 0
27 6 3 1 0
28 6 4 1 0
29 6 5 1 1
30 6 6 1 0
31 6 7 0 0
32 7 1 0 0
33 7 2 0 0
34 7 3 0 0
35 7 4 0 0
36 7 5 1 0
37 7 6 1 0
38 7 7 0 0
39 7 8 1 0
An option would be to use rle to create the warning. Grouped by 'user', take the run-length encoding (rle) of 'updated', which returns the 'values' and 'lengths' of adjacent identical runs as a list, and create a logical condition where values is 0 and lengths is greater than or equal to 4.
library(dplyr)
library(data.table)
df %>%
group_by(user) %>%
mutate(warning = with(rle(updated), rep(!values & lengths >= 4, lengths))) %>%
group_by(grp = rleid(warning), .add = TRUE) %>% # dplyr >= 1.0.0; older versions used add = TRUE
mutate(warning = if(all(warning)) rep(c(0, 1), c(n()-1, 1)) else 0) %>%
ungroup %>%
select(-grp)
# A tibble: 16 x 4
# user week updated warning
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 1 0
# 2 1 2 0 0
# 3 1 3 1 0
# 4 1 4 1 0
# 5 2 1 1 0
# 6 3 1 1 0
# 7 3 2 1 0
# 8 3 3 1 0
# 9 4 1 1 0
#10 4 2 1 0
#11 4 3 0 0
#12 4 4 0 0
#13 4 5 0 0
#14 4 6 0 1
#15 4 7 1 0
#16 4 8 1 0
If we need to flag every row of a group in which any run of four or more 0s occurs, then
df %>%
group_by(user) %>%
mutate(warning = with(rle(updated), rep(!values & lengths >= 4, lengths)),
warning = as.integer(any(warning)))
# A tibble: 16 x 4
# Groups: user [4]
# user week updated warning
# <dbl> <dbl> <dbl> <int>
# 1 1 1 1 0
# 2 1 2 0 0
# 3 1 3 1 0
# 4 1 4 1 0
# 5 2 1 1 0
# 6 3 1 1 0
# 7 3 2 1 0
# 8 3 3 1 0
# 9 4 1 1 1
#10 4 2 1 1
#11 4 3 0 1
#12 4 4 0 1
#13 4 5 0 1
#14 4 6 0 1
#15 4 7 1 1
#16 4 8 1 1
I followed a different approach. I numbered the rows sequentially within each combination of user and rleid(updated). If a run of zeros reaches number 4, that means there are 4 consecutive tasks not done. The warning is thus created where the new vector is equal to 4 and updated is 0.
library(data.table)
df[, warning := {id <- 1:.N;
                 warning <- as.numeric(id == 4)},
   by = .(user, rleid(updated))][
   , warning := ifelse(warning == 1 & updated == 0, 1, 0)][
   is.na(warning), warning := 0]
What has been done there:
warning := assigns the result of the expression between the {} to warning.
Now, inside the braces:
id <- 1:.N creates a temporary variable id with consecutive numbers for each user and run-length group of updated values.
warning <- as.numeric(id == 4) creates a temporary variable with 1 where id is equal to 4 and 0 otherwise.
The by = .(user, rleid(updated)) grouped by both user and run-length values of updated. Of course there were run-length values for updated == 1, so we get rid of them by the ifelse clause. The final [is.na(warning), warning := 0] (notice the chaining) just gets rid of the NA values in the resulting variable.
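The same idea can be sketched in base R with rle and sequence, which number the positions inside each run. This is an illustrative variant rather than the code benchmarked in this answer, and `run_pos` is a helper name introduced here:

```r
df <- data.frame(user    = c(1,1,1,1,2,3,3,3,4,4,4,4,4,4,4,4),
                 week    = c(1,2,3,4,1,1,2,3,1,2,3,4,5,6,7,8),
                 updated = c(1,0,1,1,1,1,1,1,1,1,0,0,0,0,1,1))

# Position of each element inside its run of identical values,
# e.g. c(1, 1, 0, 0, 0) gives 1 2 1 2 3
run_pos <- function(x) sequence(rle(x)$lengths)

# Per user, warn on the row that is the 4th consecutive 0
df$warning <- ave(df$updated, df$user,
                  FUN = function(x) as.numeric(x == 0 & run_pos(x) == 4))

which(df$warning == 1)
# [1] 14
```

Testing run_pos(x) == 4 rather than >= 4 puts the warning on the fourth row of a longer run only, which is what the question's second example shows for user 6.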
Data used
> dput(df2)
structure(list(user = c(1, 1, 1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 4,
4, 4, 4, 5, 5, 5, 5, 5), week = c(1, 2, 3, 4, 1, 1, 2, 3, 1,
2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5), updated = c(1, 0, 1, 1,
1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0)), row.names = c(NA,
-21L), class = c("data.table", "data.frame"))
Speed comparison
I just compared with @akrun's answer:
set.seed(1)
df <- data.table(user = sample(1:10, 100, TRUE), updated = sample(c(1, 0), 100, TRUE), key = "user")
df[, week := 1:.N, by = user]
akrun <- function(df4){
df4 %>%
group_by(user) %>%
mutate(warning = with(rle(updated), rep(!values & lengths >= 4, lengths))) %>%
group_by(grp = rleid(warning), .add = TRUE) %>% # dplyr >= 1.0.0; older versions used add = TRUE
mutate(warning = if(all(warning)) rep(c(0, 1), c(n()-1, 1)) else 0) %>%
ungroup %>%
select(-grp)
}
pavo <- function(df4){
df4[, warning := {id <- 1:.N; warning <- as.numeric(id == 4)}, by = .(user, rleid(updated))][, warning := ifelse(warning == 1 & updated == 0, 1, 0)][is.na(warning), warning := 0]
}
library(microbenchmark)
microbenchmark(akrun(df), pavo(df), times = 100)
Unit: microseconds
expr min lq mean median uq max neval
akrun(df) 1920.278 2144.049 2405.0332 2245.1735 2308.0145 6901.939 100
pavo(df) 823.193 877.061 978.7166 928.0695 991.5365 4905.450 100

Creating a new variable while using subsequent values in r

I have the following data frame:
df1 <- data.frame(id = rep(1:3, each = 5),
time = rep(1:5),
y = c(rep(1, 4), 0, 1, 0, 1, 1, 0, 0, 1, rep(0,3)))
df1
## id time y
## 1 1 1 1
## 2 1 2 1
## 3 1 3 1
## 4 1 4 1
## 5 1 5 0
## 6 2 1 1
## 7 2 2 0
## 8 2 3 1
## 9 2 4 1
## 10 2 5 0
## 11 3 1 0
## 12 3 2 1
## 13 3 3 0
## 14 3 4 0
## 15 3 5 0
I'd like to create a new indicator variable that tells me, for each of the three ids, at what point y = 0 for all subsequent responses. In the example above, for ids 1 and 2 this occurs at the 5th time point, and for id 3 this occurs at the 3rd time point.
I'm getting tripped up on id 2, where y = 0 at time point 2, but then goes back to 1 -- I'd like the indicator variable to take subsequent time points into account.
Essentially, I'm looking for the following output:
df1
## id time y new_col
## 1 1 1 1 0
## 2 1 2 1 0
## 3 1 3 1 0
## 4 1 4 1 0
## 5 1 5 0 1
## 6 2 1 1 0
## 7 2 2 0 0
## 8 2 3 1 0
## 9 2 4 1 0
## 10 2 5 0 1
## 11 3 1 0 0
## 12 3 2 1 0
## 13 3 3 0 1
## 14 3 4 0 1
## 15 3 5 0 1
The new_col variable is indicating whether or not y = 0 at that time point and for all subsequent time points.
I would use a little helper function for that.
foo <- function(x, val) {
pos <- max(which(x != val)) +1
as.integer(seq_along(x) >= pos)
}
library(dplyr)
df1 %>%
group_by(id) %>%
mutate(indicator = foo(y, 0))
# # A tibble: 15 x 4
# # Groups: id [3]
# id time y indicator
# <int> <int> <dbl> <int>
# 1 1 1 1 0
# 2 1 2 1 0
# 3 1 3 1 0
# 4 1 4 1 0
# 5 1 5 0 1
# 6 2 1 1 0
# 7 2 2 0 0
# 8 2 3 1 0
# 9 2 4 1 0
# 10 2 5 0 1
# 11 3 1 0 0
# 12 3 2 1 0
# 13 3 3 0 1
# 14 3 4 0 1
# 15 3 5 0 1
In case you want to consider NA-values in y, you can adjust foo to:
foo <- function(x, val) {
pos <- max(which(x != val | is.na(x))) +1
as.integer(seq_along(x) >= pos)
}
That way, if there's an NA after the last y = 0, the indicator will remain 0.
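A base R sketch of the same indicator, written as a reverse cumulative check: a row gets 1 when no non-zero value occurs from that row to the end of its group. The name `trailing_zero` is introduced here for illustration, and the sketch assumes y has no NA values.

```r
df1 <- data.frame(id   = rep(1:3, each = 5),
                  time = rep(1:5, times = 3),
                  y    = c(rep(1, 4), 0, 1, 0, 1, 1, 0, 0, 1, rep(0, 3)))

# Reverse the vector, count non-zero values seen so far, and keep the
# positions where that count is still 0; then reverse back
trailing_zero <- function(x) as.integer(rev(cumsum(rev(x != 0)) == 0))

df1$new_col <- ave(df1$y, df1$id, FUN = trailing_zero)

df1$new_col
# [1] 0 0 0 0 1 0 0 0 0 1 0 0 1 1 1
```

Since it never calls max(which(...)), this version should not raise a warning for a group whose y is 0 throughout, and it returns all 0 for a group that ends on a non-zero value.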
Here is an option using data.table
library(data.table)
setDT(df1)[, indicator := cumsum(.I %in% .I[which.max(rleid(y)*!y)]), id]
df1
# id time y indicator
# 1: 1 1 1 0
# 2: 1 2 1 0
# 3: 1 3 1 0
# 4: 1 4 1 0
# 5: 1 5 0 1
# 6: 2 1 1 0
# 7: 2 2 0 0
# 8: 2 3 1 0
# 9: 2 4 1 0
#10: 2 5 0 1
#11: 3 1 0 0
#12: 3 2 1 0
#13: 3 3 0 1
#14: 3 4 0 1
#15: 3 5 0 1
Based on the comments from @docendodiscimus, if the values are not 0 for 'y' at the end of each 'id', then we can do
setDT(df1)[, indicator := {
i1 <- rleid(y) * !y
if(i1[.N]!= max(i1) & !is.na(i1[.N])) 0L else cumsum(.I %in% .I[which.max(i1)]) }, id]

Replace a column data with another column of data in a data frame while replacing prior instances <0 by 0

I have a data frame
x<-c(1,3,0,2,4,5,0,-2,-5,1,0)
y<-c(-1,-2,0,3,4,5,1,8,1,0,2)
data.frame(x,y)
x y
1 1 -1
2 3 -2
3 0 0
4 2 3
5 4 4
6 5 5
7 0 1
8 -2 8
9 -5 1
10 1 0
11 0 2
I would like to replace the data in column y with the data from column x, except that wherever y was < 0 the new value should be 0. This will result in the following data frame
data.frame(x,y)
x y
1 1 0
2 3 0
3 0 0
4 2 2
5 4 4
6 5 5
7 0 0
8 -2 -2
9 -5 -5
10 1 0
11 0 0
Thanks
x<-c(1,3,0,2,4,5,0,-2,-5,1,0)
y<-c(-1,-2,0,3,4,5,1,8,1,0,2)
df <- data.frame(x, y)
df$y <- ifelse(y<0,0,x)
df
# x y
# 1 1 0
# 2 3 0
# 3 0 0
# 4 2 2
# 5 4 4
# 6 5 5
# 7 0 0
# 8 -2 -2
# 9 -5 -5
# 10 1 1
# 11 0 0
In one line:
> df <- transform(data.frame(x,y), y = ifelse(y<0,0,x))
> df
x y
1 1 0
2 3 0
3 0 0
4 2 2
5 4 4
6 5 5
7 0 0
8 -2 -2
9 -5 -5
10 1 1
11 0 0
Note that the resulting data differs from the reference result you provide on record 10. I suspect that this might be because you applied the condition <= 0 rather than < 0? Otherwise the 1 would be carried across from the x field for this record.
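A quick check of that suspicion, comparing the two conditions on the question's own vectors (`strict` and `loose` are names introduced here for illustration):

```r
x <- c(1, 3, 0, 2, 4, 5, 0, -2, -5, 1, 0)
y <- c(-1, -2, 0, 3, 4, 5, 1, 8, 1, 0, 2)

strict <- ifelse(y < 0, 0, x)   # condition as stated in the question
loose  <- ifelse(y <= 0, 0, x)  # condition that would also zero out y == 0

# Rows where the two conditions give different results
which(strict != loose)
# [1] 10

loose
# [1]  0  0  0  2  4  5  0 -2 -5  0  0
```

The `loose` result matches the reference output in the question, which is consistent with <= 0 having been applied there.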
Given your x and y vectors, create the data.frame in one swift move:
> data.frame(x, y=ifelse(y < 0, 0, x))
x y
1 1 0
2 3 0
3 0 0
4 2 2
5 4 4
6 5 5
7 0 0
8 -2 -2
9 -5 -5
10 1 1
11 0 0
