Compute the difference between rows in R, setting the first difference to zero

Hi everybody, I am trying to solve a small problem in R. I want to compute the difference between consecutive rows of a data frame. My data frame looks like this:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
I want to create a new column named diff.var that stores the differences between consecutive rows of the variable x4. One possible solution is the diff() function. When I used this function I got this:
diff(df$x4)
[1] 2 6 -10 1 0 8 3
That works fine, but when I try to add it to my data frame using df$diff.var=diff(df$x4) I get this:
Error in `$<-.data.frame`(`*tmp*`, "diff.var", value = c(2, 6, -10, 1, :
replacement has 7 rows, data has 8
Because the first row has no previous row to compute a difference from, I want to set its value to zero. I would like to get something like this:
ID x2 x3 x4 diff.var
1 8 11 2 0
2 7 12 4 2
3 6 13 10 6
4 5 14 0 -10
5 4 15 1 1
6 3 16 1 0
7 2 17 9 8
8 1 18 12 3
Here the first element of diff.var is zero because it has no previous element. I would like to build a function that sets the first element of diff.var to zero and computes the differences for the remaining rows. I want a new data frame with all the variables plus diff.var, because ID is used together with diff.var in a later analysis. diff() on its own cannot create this new variable, since its result is one element too short. Thanks for your help.

This question has been asked before on this forum, and the answer can be found elsewhere. Anyway, do what Frank suggests:
df <- data.frame(ID=1:8, x2=8:1, x3=11:18, x4=c(2,4,10,0,1,1,9,12))
df$vardiff <- c(0, diff(df$x4))
df
ID x2 x3 x4 vardiff
1 1 8 11 2 0
2 2 7 12 4 2
3 3 6 13 10 6
4 4 5 14 0 -10
5 5 4 15 1 1
6 6 3 16 1 0
7 7 2 17 9 8
8 8 1 18 12 3
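The question also asks for a reusable function. The following is a minimal sketch, assuming a hypothetical helper name diff_from_first (it is not part of base R), that pads the output of diff() with a leading fill value so the result has the same length as the input:

```r
# Hypothetical helper (not a base R function): lag-1 differences padded
# with `fill` in the first position, so the result is as long as `x`.
diff_from_first <- function(x, fill = 0) {
  if (length(x) == 0) return(x)   # nothing to difference
  c(fill, diff(x))
}

df <- data.frame(ID = 1:8, x2 = 8:1, x3 = 11:18, x4 = c(2, 4, 10, 0, 1, 1, 9, 12))
df$diff.var <- diff_from_first(df$x4)
df$diff.var
# [1]   0   2   6 -10   1   0   8   3
```

Passing fill = NA instead of 0 may be preferable when a missing first difference should not be mistaken for "no change".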

Related

Arrange a data set in a repeating manner from a reshaped data

I have reshaped my data to long format. It is sorted in ascending order on one column (x2 in the reproducible example below), and I want the values of that column to cycle repeatedly (1, 2, 3, 1, 2, 3, ...) rather than appear in blocks. Here is a sample:
set.seed(234)
data<-data.frame(x1=c(1:12),x2=rep(1:3,each=4),x3=runif(12,min=0,max=12))
And I want the format something like this:
x1 x2 x3
1 1 1 6.115445
2 2 2 5.157014
3 3 3 4.793458
4 4 1 9.998710
5 5 2 2.620250
6 6 3 1.825839
7 7 1 5.842854
8 8 2 5.616670
9 9 3 6.511315
10 10 1 9.164444
11 11 2 8.401418
Can you please help me with either what to include in the melt function when converting the data to long format, or any other function I should use to rearrange the data?
Note:
The result above only shows the desired format; it is not the exact solution for my data.
EDIT:
Here is head() of my real data:
Date stn Elev Amount
1 2010-01-01 11 0 268.945
2 2010-01-01 11 0 268.396
3 2010-01-01 11 0 267.512
4 2010-01-01 11 0 266.488
5 2010-01-01 11 0 265.558
6 2010-01-01 11 0 265.178
In the actual data, the column Elev contains values like c("0","100","250","500"...). So you can assume that 0 corresponds to 1 in x2 of the sample above, 100 to 2, 250 to 3, and so on.
One method is to use ave as follows:
data[order(ave(data$x3, data$x2, FUN=seq_along), data$x2),]
x1 x2 x3
1 1 1 8.9474400
5 5 2 0.8029211
9 9 3 11.1328381
2 2 1 9.3805491
6 6 2 7.7375415
10 10 3 3.4107614
3 3 1 0.2404454
7 7 2 11.1526315
11 11 3 6.6686992
4 4 1 9.3130246
8 8 2 8.6117063
12 12 3 6.5724198
In this instance, ave calculates a running count within each group of data$x2. The order function then sorts the rows by that count first and by data$x2 second, which produces the repeating pattern.
You can also renumber x1 if desired with data$x1 <- 1:nrow(data) after sorting, which gives the desired format.
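To make the two steps easier to follow, here is a small sketch on made-up values (the x3 numbers below are illustrative, not taken from the question) that shows the intermediate counter before it is passed to order:

```r
# Illustrative data: three groups in x2, two rows each
data <- data.frame(x1 = 1:6, x2 = rep(1:3, each = 2), x3 = c(10, 20, 30, 40, 50, 60))

# Running count within each group of x2: 1, 2 for every group
counter <- ave(seq_along(data$x2), data$x2, FUN = seq_along)
counter
# [1] 1 2 1 2 1 2

# Sort by the counter first and by group second, so x2 cycles 1, 2, 3, 1, 2, 3
result <- data[order(counter, data$x2), ]
result$x2
# [1] 1 2 3 1 2 3
```

The same call should work when the grouping column holds values such as 0, 100, 250 (like Elev), since ave only needs the group labels, not consecutive integers.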

Split data when time intervals exceed a defined value

I have a data frame of GPS locations with a column of seconds. How can I create a new column based on time gaps? For example, for this data.frame:
df <- data.frame(secs=c(1,2,3,4,5,6,7,10,11,12,13,14,20,21,22,23,24,28,29,31))
I would like to cut the data frame wherever there is a time gap of 3 or more seconds between locations, and create a new column entitled 'bouts' that gives a running tally of the sections, so that the data frame looks like this:
id secs bouts
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 10 2
9 11 2
10 12 2
11 13 2
12 14 2
13 20 3
14 21 3
15 22 3
16 23 3
17 24 3
18 28 4
19 29 4
20 31 4
Use cumsum and diff:
df$bouts <- cumsum(c(1, diff(df$secs) >= 3))
Remember that logical values are coerced to the numeric values 0/1 automatically, and that with its default arguments diff returns one element fewer than its input, which is why a leading 1 is prepended.
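The one-liner can be unpacked into its intermediate steps. This is a sketch on a shortened, made-up vector of seconds; the variable names gaps and new_bout are introduced here only for illustration:

```r
secs <- c(1, 2, 3, 10, 11, 20)

gaps <- diff(secs)        # 1 1 7 1 9: time between consecutive fixes
new_bout <- gaps >= 3     # FALSE FALSE TRUE FALSE TRUE: a gap starts a new bout

# The leading 1 labels the first row; each TRUE then adds 1 to the tally
bouts <- cumsum(c(1, new_bout))
bouts
# [1] 1 1 1 2 2 3
```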

Subsetting a data frame by a condition in R

I have the following data with the ID of subjects.
V1
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
9 2
10 2
11 2
12 2
13 2
14 2
15 2
16 4
17 4
18 4
19 4
20 4
21 4
22 4
23 4
24 4
I want to subset all the rows of the data where V1 == 4. This way I can see which observations relate to subject 4.
For example, the correct output would be
16 4
17 4
18 4
19 4
20 4
21 4
22 4
23 4
24 4
However, the output I get after subsetting does not show the correct row numbers. It simply gives me:
V1
1 4
2 4
3 4
4 4
5 4
6 4
7 4
8 4
I'm unable to tell which observations relate to subject 4, as observations 1:8 are for subject 2.
I've tried the usual methods, such as
condition<- df == 4
df[condition]
How can I subset the data so that I get back a dataset that shows the correct row numbers for subject 4?
You can also use the subset function:
subset(df, V1 == 4)
I've managed to find a solution since posting.
newdf <- subset(df, V1 == 4)
However, I'm still very interested in other solutions to this problem, so please post if you're aware of another method.
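As one other method, bracket indexing with a logical row index also keeps the original row names. The sketch below rebuilds a data frame shaped like the one in the question (15 rows for subject 2 followed by 9 rows for subject 4); the name subject4 is just an illustrative choice:

```r
df <- data.frame(V1 = rep(c(2, 4), times = c(15, 9)))

# Logical row index; drop = FALSE keeps the result a data frame
# even though df has only one column.
subject4 <- df[df$V1 == 4, , drop = FALSE]
rownames(subject4)
# [1] "16" "17" "18" "19" "20" "21" "22" "23" "24"

# which() returns the matching row positions as integers
which(df$V1 == 4)
# [1] 16 17 18 19 20 21 22 23 24
```

Note the comma in df[df$V1 == 4, , drop = FALSE]: without it, as in df[condition], the matching values come back as a plain vector and the row names are lost.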

R - Conditional replacement of column values in a data frame

I have a data frame with 2 columns, A and B. I want to replace the values of column B so that values >= 5 become 1 and all other values become 0.
Note - There are 2 conditions to be checked.
X=read.csv("Y:/impdat.csv")
A B
3 16
12 3
1 2
12 9
4 4
5 6
21 1
4 14
3 10
12 1
So after replacing, the data should be
A B
3 1
12 0
1 0
12 1
4 0
5 1
21 0
4 1
3 1
12 0
Sounds simple. But I am unable to implement it.
I tried
ifelse(X$B>=5,1,0)
This only prints the new values, but the original data remains the same.
X$B <- as.integer(X$B >= 5)
will do the trick.
X <- transform(X, B=ifelse(B>=5,1,0))
Got it.
Just had to assign the object.
X$B=ifelse(X$B>=5,1,0)
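The answers above differ only in how the 0/1 values are produced. A short sketch on the first five rows of the example data, with illustrative variable names, shows that ifelse and as.integer give the same flags and that the result has to be assigned back to the column:

```r
X <- data.frame(A = c(3, 12, 1, 12, 4), B = c(16, 3, 2, 9, 4))

flag_ifelse <- ifelse(X$B >= 5, 1, 0)   # numeric 1/0
flag_int <- as.integer(X$B >= 5)        # integer 1/0, via TRUE/FALSE coercion

# Nothing changes in X until the result is assigned back
X$B <- flag_int
X$B
# [1] 1 0 0 1 0
```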

How can I produce a table into a data.frame?

I printed out the summary of a column variable. Please see below the summary table printed out from R:
I would like to turn it into a data.frame. However, there are too many subject names to list them all. Also, the term "OTHER" with number 31 means that there are 319 subjects which appear only 1 time in the original data.frame.
So, the new data.frame I hope to produce would look like below:
Here is one possible solution.
Table<-table(rpois(100,5))
as.data.frame(Table)
Var1 Freq
1 1 2
2 2 11
3 3 9
4 4 18
5 5 13
6 6 20
7 7 14
8 8 8
9 9 3
10 10 1
11 11 1
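The question also mentions pooling subjects that appear only once under "OTHER". Since the original summary is not shown, the following is only a sketch on hypothetical subject names; it assumes the "OTHER" row should hold the number of subjects that occur exactly once:

```r
# Hypothetical data standing in for the subject column
subjects <- c("MATH", "MATH", "MATH", "ART", "ART", "MUSIC", "LATIN", "DRAMA")

counts <- table(subjects)
rare <- as.vector(counts == 1)   # TRUE for subjects that occur only once

# Keep the frequent subjects as they are and pool the rare ones in one row
result <- data.frame(
  subject = c(names(counts)[!rare], "OTHER"),
  freq = c(as.vector(counts[!rare]), sum(rare))
)
result
#   subject freq
# 1     ART    2
# 2    MATH    3
# 3   OTHER    3
```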
