R chron %in% comparison only recognizes every second date

I am using the zoo and chron packages in R to read and transform data. At one point I need to select the part of a chron-indexed zoo object that corresponds to another chron object. Unfortunately, using the %in% operator I only get some of the corresponding dates. Here is an MWE that reproduces the problem:
library(chron)
library(zoo)
chron1 <- seq(chron("2013-01-01", "00:00:00", format = c(dates = "y-m-d", times = "h:m:s")),
              chron("2013-01-01", "03:10:00", format = c(dates = "y-m-d", times = "h:m:s")),
              by = 1/1440)                  # one-minute steps
x1 <- runif(length(chron1))                 # one random value per timestamp
z1 <- zoo(x1, chron1)
chron10 <- trunc(chron1, "00:10:00")
x10 <- aggregate(z1,chron10,FUN=sum)
which(index(x10) %in% chron1)
The (unexpected) output is:
[1] 1 3 5 7 9 10 12 14 16 18 19

chron objects are stored as floating point numbers, so there can be slight differences in what appears to be the same datetime, depending on how the values were calculated. Format them and compare the formatted strings instead:
which(format(index(x10)) %in% format(chron1))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
This also works, because trunc uses an eps value to ensure that inputs falling slightly short of a minute boundary are not truncated down a further minute; see ?trunc.times:
which(trunc(index(x10), "minutes") %in% trunc(chron1, "minutes"))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Also see R FAQ 7.31 (Why doesn't R think these numbers are equal?).
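If the formatted-string comparison feels indirect, a numeric tolerance works too. This is a sketch, not part of the original answer; tol is an arbitrary small value expressed in days:
# Tolerance-based matching (sketch): an index matches if some element of chron1 is within tol of it.
tol <- 1e-6
which(sapply(as.numeric(index(x10)),
             function(t) any(abs(as.numeric(chron1) - t) < tol)))
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20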

Related

Get sequence of numbers by cumsum

How can I get the output 1 3 6 10 15 20 20 20 using only cumulative sums, rather than writing the vector out with c()?
I know that
> cumsum(1:5)
[1] 1 3 6 10 15
I am not sure if I completely understand, but maybe:
pmin(cumsum(1:8), 20)
#[1] 1 3 6 10 15 20 20 20
Besides the great answer by @Ronak Shah, you can also use
> replace(u<-cumsum(1:8),u>=20,20)
[1] 1 3 6 10 15 20 20 20
Using base R
v1 <- cumsum(1:8)
v1[v1 >=20] <- 20
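A quick sanity check (just a sketch) that the approaches above produce the same capped sequence:
v1 <- cumsum(1:8)
v1[v1 >= 20] <- 20
v1
#[1] 1 3 6 10 15 20 20 20
all(v1 == pmin(cumsum(1:8), 20))
#[1] TRUE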

Show only even numbers from a data set

I am trying to extract only the even numbers from the "cars" data set.
I know I need to create a new function.
I have come this far:
Is.even = function(x) x %% 2 == 0
When I enter in:
Is.even(cars[1])
It gives me back a logical response. I want to only display the actual even numbers in integer form and hide the odd numbers.
What am I doing wrong?
Apart from @neilfws' suggestion, if you pass your values as a vector you can also use Filter:
Filter(Is.even, cars[, 1])
#[1] 4 4 8 10 10 10 12 12 12 12 14 14 14 14 16 16 18 18 18 18 20 20 20 20 20 22 24 24 24 24
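An equivalent base-R alternative is plain logical subsetting (a sketch; speeds is just a temporary name for the first column pulled out as a vector):
speeds <- cars[, 1]          # cars[1] is a one-column data frame; cars[, 1] is a plain vector
speeds[Is.even(speeds)]
#[1] 4 4 8 10 10 10 12 12 12 12 14 14 14 14 16 16 18 18 18 18 20 20 20 20 20 22 24 24 24 24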

How to use an apply function instead of a for loop if you have multiple if conditions to be executed

1st DF:
t.d
V1 V2 V3 V4
1 1 6 11 16
2 2 7 12 17
3 3 8 13 18
4 4 9 14 19
5 5 10 15 20
names(t.d) <- c("ID","A","B","C")
t.d$FinalTime <- c("7/30/2009 08:18:35","9/30/2009 19:18:35","11/30/2009 21:18:35","13/30/2009 20:18:35","15/30/2009 04:18:35")
t.d$InitTime <- c("6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35","6/30/2009 9:18:35")
> t.d
ID A B C FinalTime InitTime
1 1 6 11 16 7/30/2009 08:18:35 6/30/2009 9:18:35
2 2 7 12 17 9/30/2009 19:18:35 6/30/2009 9:18:35
3 3 8 13 18 11/30/2009 21:18:35 6/30/2009 9:18:35
4 4 9 14 19 13/30/2009 20:18:35 6/30/2009 9:18:35
5 5 10 15 20 15/30/2009 04:18:35 6/30/2009 9:18:35
2nd DF:
> s.d
F D E Time
1 10 19 28 6/30/2009 08:18:35
2 11 20 29 8/30/2009 19:18:35
3 12 21 30 9/30/2009 21:18:35
4 13 22 31 01/30/2009 20:18:35
5 14 23 32 10/30/2009 04:18:35
6 15 24 33 11/30/2009 04:18:35
7 16 25 34 12/30/2009 04:18:35
8 17 26 35 13/30/2009 04:18:35
9 18 27 36 15/30/2009 04:18:35
Desired output:
From DF "t.d" I have to calculate, for each row, the time interval between "FinalTime" and "InitTime" (InitTime will always be earlier than FinalTime).
Another DF "temp" has to be formed from "s.d", containing only the data within that time interval; then the most recent values of "F", "D" and "E" have to be taken and attached to the ith row of "t.d" from which the time interval was calculated.
We also have to check whether the newly formed DF "temp" satisfies the following condition, where 'j' indexes each row:
if((temp$F[j] < 35.5) + (temp$D[j] >= 100) >= 1){
  temp$Flag <- 1
} else {
  temp$Flag <- 0
}
Originally I have 3 million rows in the data frame and 20 columns in each DF.
I have solved the above problem using a for loop, but it takes 2 to 3 days to run because there are so many rows.
(Also, what if I have to add new columns to the resulting DF when multiple conditions are satisfied on a row?)
Can anybody suggest a different technique, like using the apply functions?
My suggestion is:
use lapply over the row indices,
handle your if branches inside the function call,
return either your data frame or NULL,
and combine everything with rbind.
By replacing lapply with mclapply from the 'parallel' package, your code gets executed in parallel.
resultList <- lapply(1:nrow(t.d), function(i){
  # do stuff for row i here
  if(condition){
    return(df)
  } else {
    return(NULL)
  }
})
resultDF <- do.call(rbind, resultList)
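Applied to the question's data, a fleshed-out version of the skeleton might look like the sketch below. The "%m/%d/%Y %H:%M:%S" format string and the choice to evaluate the Flag rule on the most recent row of temp are assumptions; note that some of the sample dates (months 13 and 15) will parse to NA with this format.
# Parse the timestamps (format is an assumption based on the sample data):
t.d$FinalTime <- as.POSIXct(t.d$FinalTime, format = "%m/%d/%Y %H:%M:%S")
t.d$InitTime  <- as.POSIXct(t.d$InitTime,  format = "%m/%d/%Y %H:%M:%S")
s.d$Time      <- as.POSIXct(s.d$Time,      format = "%m/%d/%Y %H:%M:%S")

resultList <- lapply(1:nrow(t.d), function(i){
  # rows of s.d whose Time falls inside row i's interval (which() also drops NAs)
  inside <- which(s.d$Time >= t.d$InitTime[i] & s.d$Time <= t.d$FinalTime[i])
  if(length(inside) == 0) return(NULL)
  temp   <- s.d[inside, ]
  latest <- temp[which.max(temp$Time), ]              # most recent row in the window
  Flag   <- as.integer(((latest$F < 35.5) + (latest$D >= 100)) >= 1)
  cbind(t.d[i, ], latest[, c("F", "D", "E")], Flag = Flag)
})
resultDF <- do.call(rbind, resultList)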

Merge values of a factor column

Column data$form contains 170 unique values (numbers from 1 to ~800).
I would like to merge some of these values (e.g. with a radius/step of 10).
I need to do this in order to use
colors = rainbow(length(unique(data$form)))
in a plot and get a better visual result.
Thank you in advance for your help.
You can use %/% to group the values, mean to combine each group, and a 0-1 rescaling to spread the result over the desired range. (normalize below is not a base R function; a simple min-max helper is assumed.)
normalize <- function(x, ...) (x - min(x)) / (max(x) - min(x))
# if you want specifically 20 groups (form stands for data$form):
groups <- sort(form) %/% (800/20)
x <- c(by(sort(form), groups, mean))
x <- normalize(x, TRUE) * 19 + 1
0 1 2 3 4
1.000000 1.971781 2.957476 4.103704 4.948560
5 6 7 8 9
5.950617 7.175309 7.996914 8.953086 9.952263
10 11 12 13 14
10.800705 11.901235 12.888889 13.772291 14.888889
15 16 17 18 19
15.927984 16.864198 17.918519 18.860082 20.000000
You could also use cut. If you use the argument labels=FALSE, you get an integer value:
form <- runif(170, min=1,max=800)
> cut(form, breaks=20)
[1] (518,558] (280,320] (240,280] (121,160] (757,797]
[6] (160,200] (320,359] (598,638] (80.8,121] (359,399]
[11] (121,160] (200,240] ...
20 Levels: (1.18,41] (41,80.8] (80.8,121] (121,160] (160,200] (200,240] (240,280] (280,320] (320,359] (359,399] (399,439] ... (757,797]
> cut(form, breaks=20, labels=FALSE)
[1] 14 8 7 4 20 5 9 16 3 10 4 6 5 18 18 6 2 12
[19] 2 19 13 11 13 11 14 12 17 5 ...
On a side note, please reconsider plotting with rainbow colours, as they distort how the data are read; cf. Rainbow Color Map (Still) Considered Harmful.
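To tie this back to the colouring problem: the integer bin from cut can index directly into a palette. A minimal sketch, assuming data$form as in the question and using hcl.colors (R >= 3.6, default "viridis" palette) instead of rainbow:
bins <- cut(data$form, breaks = 20, labels = FALSE)   # integer bin 1..20 per value
cols <- hcl.colors(20)[bins]                          # one colour per bin
plot(data$form, col = cols, pch = 19)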

Count of elements in a data.frame

I have data that illustrates hurricane tracks crossing through a series of "gates". How would I code it to output the GateID, and the count of times that each GateID occurs in the total data frame?
track_id day hour month year rate gate_id pres_inter vmax_inter
9 10 0 7 1 9.6451E-06 2 97809 23.545
9 10 0 7 1 9.6451E-06 17 100170 13.843
10 3 6 7 1 9.6451E-06 2 96662 31.568
13 22 12 8 1 9.6451E-06 1 94449 48.466
13 22 12 8 1 9.6451E-06 17 96749 30.55
16 13 0 8 1 9.6451E-06 4 98702 19.205
16 13 0 8 1 9.6451E-06 16 98585 18.143
19 27 6 9 1 9.6451E-06 9 98838 20.053
header <- read.table(fname_in, nrows=1)
track <- read.table(fname_in, sep=',', skip=1)
colnames(track) <- c("ID", "day", "month", "year", "hour", "rate", "gate_id", "pres_inter", "vmax_inter")
I think I would like to count the occurrence of each gate_id, and also perhaps output the maximum wind per gate (vmax_inter), etc....
Totally reading your mind, since you provide nothing concrete to go on. But if GateID is one of your data frame columns, you can get the count for each unique GateID along with other parameters using count from package plyr.
install.packages("plyr")
library("plyr")
count(mydf, vars = "GateID")
See ?count after installing for further details.
For the 2nd part of your question, see ?aggregate and consider the formula interface. For example,
aggregate(vmax_inter ~ gate_id, data = mydf, FUN = max)
or something similar. By the way, you can combine your two read.table steps with read.csv.
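If you would rather stay in base R, table and tapply cover both parts (a sketch, assuming the track data frame built above):
# Count of rows per gate_id:
table(track$gate_id)
# Maximum wind per gate_id (same idea as the aggregate() call above, via tapply):
tapply(track$vmax_inter, track$gate_id, max)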
