I have time series data. I would like to group and number rows based on when the column "soak" exceeds 3600. The first row where soak > 3600 is numbered 1, and the following rows are also numbered 1 until another row meets the condition soak > 3600; that row and the subsequent rows are numbered 2, until the third occurrence of soak > 3600, and so on.
A small sample of my data and the code I tried are provided below.
My code does the counting, but using ave() seems to give me decimal numbers. Is there a way to output integers?
starts <- structure(list(datetime = structure(c(1440578907, 1440579205,
1440579832, 1440579885, 1440579926, 1440579977, 1440580044, 1440580106,
1440580195, 1440580256, 1440580366, 1440580410, 1440580476, 1440580529,
1440580931, 1440580966, 1440587753, 1440587913, 1440587933, 1440587954
), class = c("POSIXct", "POSIXt"), tzone = ""), soak = c(NA,
70L, 578L, 21L, 2L, 41L, 14L, 16L, 32L, 9L, 45L, 20L, 51L, 25L,
364L, 4L, 6764L, 20L, 4L, 5L)), row.names = c(NA, -20L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x000000000a4d1ef0>)
starts$trip <- with(starts, ave(soak, cumsum(replace(soak, is.na(soak), 10000) > 3600)))
Using dplyr:
library(dplyr)
starts %>% mutate(trip = cumsum(replace(soak, is.na(soak), 1) > 3600))
And with base R:
starts$trip = with(starts, ave(soak, FUN=function(x) cumsum(replace(x, is.na(x), 1) > 3600)))
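As a quick check that the cumsum() approach yields whole numbers (a minimal sketch on a toy soak vector, not the full sample data):

```r
# Toy soak vector: the 4th and 7th values exceed the 3600 threshold
soak <- c(NA, 70, 578, 4000, 2, 41, 6764, 20)

# Same idea as above: treat NA as below-threshold, then let cumsum
# number the groups; cumsum of a logical vector returns integers
trip <- cumsum(replace(soak, is.na(soak), 1) > 3600)
trip
# [1] 0 0 0 1 1 1 2 2
```

Because cumsum() over a logical vector returns an integer vector, no decimals appear, unlike ave(), whose result takes the (numeric) type of its first argument.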
structure(list(`total primary - yes RS` = c(0L, 138L, 101L, 86L,
118L), `total primary - no RS` = c(0L, 29L, 39L, 35L, 38L), `total secondary- yes rs` = c(0L,
6L, 15L, 3L, 15L), `total secondary- no rs` = c(0L, 0L, 7L, 1L,
2L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
I previously asked for a line of code that could run a chi-square test for each of the four rows included:
https://stackoverflow.com/questions/66750999/with-r-i-would-like-to-loop-through-each-row-and-create-corresponding-chisquare/66751018#66751018
The script worked, but only because all four rows could run through it without error.
library(broom)
library(dplyr)
apply(df, 1, function(x) tidy(chisq.test(matrix(x, ncol = 2)))) %>%
bind_rows
I now have a row that is all zeros, and when I run the same script I get:
Error in stats::chisq.test(x, y, ...) :
at least one entry of 'x' must be positive
I tried using tryCatch(), like this:
tryCatch(apply(df, 1, function(x) tidy(chisq.test(matrix(x, ncol = 2))))) %>%
bind_rows
but it did not work. Ultimately the dataset has many rows like this; I would like the script to recognize that the problem isn't only in row 1 but can occur in multiple rows, like 5, 23, 67, and so on.
I am not sure I am following your code/data exactly, but what if you move your tryCatch statement inside the apply statement, like so: apply(df, 1, function(x) tryCatch(tidy(chisq.test(matrix(x, ncol = 2))), error = function(e) NULL)) %>% bind_rows? Does that help at all?
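To make that concrete (a sketch on placeholder data with the same four-count-per-row shape as the question; note that tryCatch needs an explicit error handler, otherwise the error still propagates):

```r
library(broom)
library(dplyr)

# Placeholder data: row 1 is all zeros, so its chi-square test errors out
df <- data.frame(a = c(0L, 138L), b = c(0L, 29L),
                 c = c(0L, 6L),   d = c(0L, 0L))

# Rows whose test fails return NULL and are silently dropped by bind_rows()
results <- apply(df, 1, function(x) {
  tryCatch(
    tidy(chisq.test(matrix(x, ncol = 2))),
    error = function(e) NULL
  )
}) %>%
  bind_rows()
```

The key point is the error = function(e) NULL handler: a bare tryCatch() with no handler catches nothing, which is why the original attempt still failed.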
A simple example in R on my data:
l=structure(list(dat = structure(1:9, .Label = c("01.01.2016",
"02.01.2016", "03.01.2016", "04.01.2016", "05.01.2016", "06.01.2016",
"07.01.2016", "08.01.2016", "09.01.2016"), class = "factor"),
lpt = c(94L, 3L, 30L, 92L, 20L, 80L, 20L, 190L, 52L)), .Names = c("dat",
"lpt"), class = "data.frame", row.names = c(NA, -9L))
l=ts(l)
spectrum(l)
R returned a plot with the periodogram.
On this periodogram we can see bursts of values (two bursts, at roughly 0.23 and 0.45).
How can the x-axis and y-axis values be extracted into a data frame, but only for the x-axis values that are bursts?
Second question:
Can these values be displayed not as frequencies but in the absolute, original units (dat, lpt)?
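A hedged sketch of one way to do both, using spectrum() with plot = FALSE and picking out bursts as local maxima of the spectrum (the peak-detection rule here is an assumption; you might prefer a height threshold tuned by eye):

```r
l <- data.frame(
  dat = c("01.01.2016", "02.01.2016", "03.01.2016", "04.01.2016",
          "05.01.2016", "06.01.2016", "07.01.2016", "08.01.2016",
          "09.01.2016"),
  lpt = c(94L, 3L, 30L, 92L, 20L, 80L, 20L, 190L, 52L)
)

# Compute the periodogram of the numeric series without plotting
sp <- spectrum(ts(l$lpt), plot = FALSE)

# All (frequency, spectrum) pairs as a data frame
pg <- data.frame(freq = sp$freq, spec = sp$spec)

# Keep only the "bursts": here, local maxima of the spectrum
is_peak <- with(pg, spec > c(-Inf, head(spec, -1)) &
                    spec > c(tail(spec, -1), -Inf))
peaks <- pg[is_peak, ]

# Second question: 1/freq converts a frequency back to a period
# in the original sampling units (here, days between observations)
peaks$period <- 1 / peaks$freq
```

So a burst near frequency 0.23 corresponds to a cycle of about 1/0.23 ≈ 4.3 observations (days) in the original units.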
structure(list(PROD_DATE = structure(c(1465876800, 1465963200,
1466049600, 1466136000, 1466222400, 1466308800, 1466395200, 1466481600,
1466568000, 1466654400), class = c("POSIXct", "POSIXt"), tzone = ""),
FILENUM = c(51922L, 51922L, 51922L, 51922L, 51922L, 51922L,
51922L, 51922L, 51922L, 51922L), CHOKE_SETTING = c(16L, 18L,
50L, 40L, 30L, 23L, 29L, 32L, 35L, 30L)), .Names = c("PROD_DATE",
"FILENUM", "CHOKE_SETTING"), row.names = c(NA, -10L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "FILENUM", drop = TRUE, indices = list(
0:9), group_sizes = 10L, biggest_group_size = 10L, labels = structure(list(
FILENUM = 51922L), row.names = c(NA, -1L), class = "data.frame", vars = "FILENUM", drop = TRUE, .Names = "FILENUM"))
df <- df %>% group_by(FILENUM) %>% arrange(PROD_DATE) %>%
mutate(DAYS_ON = row_number())
I'm using the code above to number the rows of the dataset to count days since the start, rather than using the date-time variable PROD_DATE.
I am unsure how to add another column that counts days since the occurrence of the maximum value in a different column. Counting should start at the first row with the value of 50; the previous rows would be either NA or 0.
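One hedged sketch, assuming the maximum is taken over CHOKE_SETTING within each FILENUM group, that counting starts at 1 on the row holding the max, and that earlier rows become NA (placeholder dates stand in for the PROD_DATE values above):

```r
library(dplyr)

# Placeholder data mirroring the structure of the dput above
df <- data.frame(
  PROD_DATE = as.POSIXct("2016-06-14", tz = "UTC") + (0:9) * 86400,
  FILENUM = 51922L,
  CHOKE_SETTING = c(16L, 18L, 50L, 40L, 30L, 23L, 29L, 32L, 35L, 30L)
)

df <- df %>%
  group_by(FILENUM) %>%
  arrange(PROD_DATE, .by_group = TRUE) %>%
  mutate(
    DAYS_ON = row_number(),
    # which.max() gives the position of the group maximum (the 50);
    # rows before it get NA, rows from it onward count 1, 2, 3, ...
    DAYS_SINCE_MAX = ifelse(row_number() >= which.max(CHOKE_SETTING),
                            row_number() - which.max(CHOKE_SETTING) + 1L,
                            NA_integer_)
  ) %>%
  ungroup()
```

If the counter should instead be 0 on the max row (or 0 rather than NA before it), drop the + 1L or swap NA_integer_ for 0L.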
I have a data.frame x with date and Value
x = structure(list(date = structure(c(1376534700, 1411930800, 1461707400,
1478814300, 1467522000, 1451088000, 1449956100, 1414214400, 1472585400,
1418103000, 1466176500, 1434035100, 1442466300, 1410632100, 1448571900,
1439276400, 1468382700, 1476137400, 1413177300, 1438881300), class = c("POSIXct",
"POSIXt"), tzone = ""), Value = c(44L, 49L, 31L, 99L, 79L, 92L,
10L, 72L, 60L, 41L, 28L, 21L, 67L, 61L, 8L, 65L, 40L, 48L, 53L,
90L)), .Names = c("date", "Value"), row.names = c(NA, -20L), class = "data.frame")
and a vector y with only dates:
y = structure(c(1470356820, 1440168960, 1379245020, 1441582800, 1381753740
), class = c("POSIXct", "POSIXt"), tzone = "")
Before I try to do it with a loop, I wanted to find out if there is a quick way (or a package) to look up Value from the closest date in x for each date in y. The goal is to find the date in x that is closest to each date in y and obtain the corresponding Value.
The desired output (obtained with Excel VLOOKUP, so it may not be perfect) would be something like:
output = structure(list(y = structure(c(1470356820, 1440168960, 1379245020,
1441582800, 1381753740), class = c("POSIXct", "POSIXt"), tzone = ""),
Value = c(40, 65, 44, 65, 44)), .Names = c("y", "Value"), row.names = c(NA,
-5L), class = "data.frame")
sapply(y, function(z) x$Value[which.min(abs(x$date - z))])
# [1] 40 65 44 67 44
Using data.table, you can join to the nearest date:
library(data.table)
x <- as.data.table(x)
y <- data.table(date=y)
res <- x[y, on='date', roll='nearest']
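To show the shape of the result, a minimal sketch of the rolling join on small placeholder data (three dated values in x, two lookup dates in y), rather than the 20-row sample above:

```r
library(data.table)

# Placeholder data: x has dated values, y has lookup dates
x <- data.table(
  date = as.POSIXct(c("2016-01-01", "2016-01-10", "2016-01-20"), tz = "UTC"),
  Value = c(10L, 20L, 30L)
)
y <- data.table(date = as.POSIXct(c("2016-01-02", "2016-01-18"), tz = "UTC"))

# roll = 'nearest' matches each y date to the closest x date;
# the result keeps y's dates alongside the looked-up Value
res <- x[y, on = "date", roll = "nearest"]
res$Value
# [1] 10 30
```

2016-01-02 is closest to 2016-01-01 (Value 10) and 2016-01-18 is closest to 2016-01-20 (Value 30), which is exactly the VLOOKUP-style nearest match the question asks for.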
This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed 2 years ago.
I am using dplyr to do a sum-if on my data frame. However, it does not give me the desired output:
> dput(sys)
structure(list(NUMERIC = c(244L, 24L, 1L, 2L, 4L, 111L, 23L,
2L, 3L, 4L, 24L), VAL = c("FALSE", "FALSE", "TES", "TEST", "TRUE",
"TRUE", "TRUE", "asdfs", "asdfs", "safd", "sd"), IDENTIFIER = c(99L,
99L, 98L, 98L, 99L, 99L, 99L, 13L, 13L, 99L, 12L)), .Names = c("NUMERIC",
"VAL", "IDENTIFIER"), row.names = c(NA, 11L), class = c("grouped_dt",
"tbl_dt", "tbl", "grouped_dt", "tbl_dt", "tbl", "data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000000000100788>, sorted = c("VAL",
"IDENTIFIER"), vars = list(VAL, IDENTIFIER))
>
>
> sys <- group_by(sys, VAL, IDENTIFIER)
> df.summary <- summarise(sys,
+ numeric = sum(NUMERIC)
+ )
>
> (df.summary)
numeric
1 442
My desired result should look like this:
Any recommendation as to what I am doing wrong?
This can occur when you have plyr loaded along with dplyr, because plyr's summarise masks dplyr's grouped version. You can either run this in a new R session or call the namespaced function:
dplyr::summarise(sys,
numeric = sum(NUMERIC)
)
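For instance, on the data above (a sketch; the result assumes a session where dplyr's summarise is reachable, which the dplyr:: prefix guarantees even with plyr attached):

```r
library(dplyr)

sys <- data.frame(
  NUMERIC = c(244L, 24L, 1L, 2L, 4L, 111L, 23L, 2L, 3L, 4L, 24L),
  VAL = c("FALSE", "FALSE", "TES", "TEST", "TRUE",
          "TRUE", "TRUE", "asdfs", "asdfs", "safd", "sd"),
  IDENTIFIER = c(99L, 99L, 98L, 98L, 99L, 99L, 99L, 13L, 13L, 99L, 12L)
)

# Explicit namespacing sidesteps plyr's summarise masking dplyr's,
# so the sum is computed per (VAL, IDENTIFIER) group, not overall
df.summary <- sys %>%
  group_by(VAL, IDENTIFIER) %>%
  dplyr::summarise(numeric = sum(NUMERIC), .groups = "drop")
```

This yields one row per (VAL, IDENTIFIER) combination (seven rows here, e.g. 268 for VAL = "FALSE"), instead of the single collapsed total of 442.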