Two (supposedly) identical date objects in R are not equal?

I have a simple question. I have two Date objects in R that are supposed to be identical (they have the same value and class), but R is saying they are not equal. I am running on Linux, though I get the same result on a Windows machine. Why is this happening?
code:
start=as.Date("2014-12-31")
finish=as.Date("2014-11-28")
dates = seq(start,finish,length=6)
christmasEve = as.Date("2014-12-24")
print(dates[2])
print(christmasEve)
print(class(dates[2]))
print(class(christmasEve))
(christmasEve==dates[2])
output:
[1] "2014-12-24"
[1] "2014-12-24"
[1] "Date"
[1] "Date"
[1] FALSE
Any help would be greatly appreciated!
-Paul

The problem is that seq() with length=6 splits the range into five equal steps, and the number of days between your dates (33) is not a multiple of five, so each step is 6.6 days. Check out:
as.numeric(dates)
# [1] 16435.0 16428.4 16421.8 16415.2 16408.6 16402.0
start - finish
# Time difference of 33 days

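Since you are creating the dates as an evenly spaced sequence, the intermediate dates are not whole numbers of days. They print the same because the fractional part is dropped when a Date is formatted.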
> as.numeric(dates)
[1] 16435.0 16428.4 16421.8 16415.2 16408.6 16402.0
> as.numeric(christmasEve)
[1] 16428
> as.character(christmasEve) == as.character(dates[2])
[1] TRUE
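Another way to compare by calendar day, without going through character strings, is to drop the fractional part before comparing. This is only a minimal sketch, assuming the fractional values come from seq() as in the question; the name whole_dates is made up for illustration.

```r
start <- as.Date("2014-12-31")
finish <- as.Date("2014-11-28")
dates <- seq(start, finish, length.out = 6)
christmasEve <- as.Date("2014-12-24")

# Keep only the whole-day part of each value, then rebuild Date objects.
whole_dates <- as.Date(floor(as.numeric(dates)), origin = "1970-01-01")

# Both sides are now whole day counts, so this comparison is TRUE.
christmasEve == whole_dates[2]
```

If the fractional steps are not needed at all, building the sequence with a whole-day by= argument avoids the issue in the first place.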

It is not possible to test your code as there is no sampleRate. I assumed that sampleRate is 6. You could compare your dates with the code below:
all(as.character(christmasEve) == as.character(dates[2]))
The whole thing should work like this:
> sampleRate <- 6
>
> start=as.Date("2014-12-31")
> finish=as.Date("2014-11-28")
> dates = seq(start,finish,length=sampleRate)
> christmasEve = as.Date("2014-12-24")
> print(dates[2])
[1] "2014-12-24"
> print(christmasEve)
[1] "2014-12-24"
> print(class(dates[2]))
[1] "Date"
> print(class(christmasEve))
[1] "Date"
> (christmasEve==dates[2])
[1] FALSE
>
> all(christmasEve == dates[2])
[1] FALSE
> all(as.character(christmasEve) == as.character(dates[2])
+ )
[1] TRUE

Related

R calculates wrong?

How can that be?
> mode(daten[1,16])
[1] "numeric"
> mode(weku)
[1] "numeric"
>
> weku
[1] 10.47855
> daten[1,16]
[1] 814995955
> daten[1,16]/weku
[1] 77777557
>
> 814995955/10.47855
[1] 77777551
>
I don't understand this. How can I get the correct calculation?
daten[1,16]/weku is correct.
R does not display all of the decimal values it stores internally. What is printed on the console is controlled by options("digits").
For example, compare print(pi), print(pi, digits=10), and print(pi, digits=22).
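A small sketch of the same effect. The stored value of weku is not shown in the question, so the number below is a hypothetical stand-in chosen to display as 10.47855:

```r
# Hypothetical stored value; the question only shows the rounded display.
weku <- 10.4785492

print(weku, digits = 7)    # rounded display at 7 significant digits (R's default)
print(weku, digits = 10)   # shows more of the stored value
weku == 10.47855           # FALSE: the typed literal is not the stored number

# Dividing by the stored value and by the displayed value gives different results.
814995955 / weku
814995955 / 10.47855
```

So retyping a number as printed is not the same calculation as using the variable itself.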

Is there a way to find the index of a particular value from a data frame in R?

This is the actual problem:
Q. Write code to print the price of 3-bedroom houses (each house separately, not combined).
new_data = read.csv("LR.csv")
count = 0
index = 1
for(x in new_data$bedrooms){
if(x == 3){
count = count+1
print(new_data$price[index])
index = index +1
}
}
print(count)
Result of this code is:
[1] 275000
[1] 565000
[1] 460000
[1] 603500
[1] 490600
[1] 1010000
[1] 5e+05
[1] 249000
[1] 235000
[1] 410000
[1] 370000
[1] 360000
[1] 1410000
[1] 298000
[1] 485000
[1] 4e+05
[1] 580000
[1] 355000
[1] 650000
[1] 261490
[1] 347000
[1] 485000
[1] 601000
And many more like this. But I want to find exactly which property it is, by either its index number or its id.
For the file please reply "file" and give your contact method like mail id or SNS id. I will send you the file.
Please help. Thanks in advance.
You could use which() to get the row indices of the rows fulfilling your condition:
idx <- which(new_data$bedrooms == 3)
print(new_data$price[idx]) # just printing the price
print(new_data[idx, ]) # printing the whole row
Or you could get the results directly by subsetting on the condition:
print(new_data$price[new_data$bedrooms == 3]) # just printing the price
print(new_data[new_data$bedrooms == 3, ]) # printing the whole row
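Here is a self-contained sketch of the which() approach on a tiny made-up data frame, since LR.csv is not available. The id column and all values are assumptions for illustration. Note also that the loop in the question only advances index when a match is found, so it pairs prices with the wrong rows; which() sidesteps that bookkeeping.

```r
# Made-up stand-in for LR.csv; column names follow the question, values are invented.
new_data <- data.frame(
  id       = c(101, 102, 103, 104),
  bedrooms = c(3, 2, 3, 4),
  price    = c(275000, 180000, 565000, 720000)
)

# Row numbers of the 3-bedroom houses.
idx <- which(new_data$bedrooms == 3)
print(idx)

# Show which property each price belongs to.
print(new_data[idx, c("id", "price")])

# Number of matching houses (the "count" from the question).
print(length(idx))
```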

Is there a way to equalise two different datasets in R?

I have the first dataset called exprs:
> class(exprs)
[1] "matrix"
> dim(exprs)
[1] 191812 89
My second dataset is called pData:
> class(pData)
[1] "data.frame"
> dim(pData)
[1] 89 3
However when I run:
all(rownames(pData)==colnames(exprs))
[1] FALSE
It results in FALSE. I need the final output to be TRUE.
Is this because one class = data.frame while the other class=matrix?
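The class difference should not matter here, because rownames() and colnames() both return plain character vectors. A more likely cause (an assumption, since the actual names are not shown) is that the sample names are in a different order or spelled differently. A minimal sketch on made-up data of how to check for that and reorder pData to match:

```r
# Made-up stand-ins for the real objects; sample names are invented.
exprs <- matrix(1:6, nrow = 2,
                dimnames = list(c("g1", "g2"), c("s1", "s2", "s3")))
pData <- data.frame(group = c("b", "a", "c"),
                    row.names = c("s2", "s1", "s3"))

all(rownames(pData) == colnames(exprs))      # FALSE: same names, different order

# Names present in one object but not the other (empty here).
setdiff(rownames(pData), colnames(exprs))
setdiff(colnames(exprs), rownames(pData))

# Reorder the rows of pData to follow the columns of exprs.
pData <- pData[match(colnames(exprs), rownames(pData)), , drop = FALSE]

all(rownames(pData) == colnames(exprs))      # TRUE after reordering
```

If either setdiff() call returns names, reordering alone will not be enough and the names themselves need fixing first.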

Looping over dates with R

I need to write some code in R that builds a string by looping over dates and I can't seem to find an example of this in my books or by Googling. Basically:
for theDate = 1Jan14 to 31Dec14{
"http://website.com/api/" + theDate
}
I thought about creating an input file that held the dates, but that seems inelegant. Does anybody know of a better solution?
This doesn't consume that much memory and doesn't need the julian function:
start <- as.Date("01-08-14",format="%d-%m-%y")
end <- as.Date("08-09-14",format="%d-%m-%y")
theDate <- start
while (theDate <= end)
{
print(paste0("http://website.com/api/",format(theDate,"%d%b%y")))
theDate <- theDate + 1
}
Output:
[1] "http://website.com/api/01Aug14"
[1] "http://website.com/api/02Aug14"
[1] "http://website.com/api/03Aug14"
[1] "http://website.com/api/04Aug14"
[1] "http://website.com/api/05Aug14"
[1] "http://website.com/api/06Aug14"
[1] "http://website.com/api/07Aug14"
[1] "http://website.com/api/08Aug14"
[1] "http://website.com/api/09Aug14"
[1] "http://website.com/api/10Aug14"
[1] "http://website.com/api/11Aug14"
[1] "http://website.com/api/12Aug14"
[1] "http://website.com/api/13Aug14"
[1] "http://website.com/api/14Aug14"
[1] "http://website.com/api/15Aug14"
[1] "http://website.com/api/16Aug14"
[1] "http://website.com/api/17Aug14"
[1] "http://website.com/api/18Aug14"
[1] "http://website.com/api/19Aug14"
[1] "http://website.com/api/20Aug14"
[1] "http://website.com/api/21Aug14"
[1] "http://website.com/api/22Aug14"
[1] "http://website.com/api/23Aug14"
[1] "http://website.com/api/24Aug14"
[1] "http://website.com/api/25Aug14"
[1] "http://website.com/api/26Aug14"
[1] "http://website.com/api/27Aug14"
[1] "http://website.com/api/28Aug14"
[1] "http://website.com/api/29Aug14"
[1] "http://website.com/api/30Aug14"
[1] "http://website.com/api/31Aug14"
[1] "http://website.com/api/01Sep14"
[1] "http://website.com/api/02Sep14"
[1] "http://website.com/api/03Sep14"
[1] "http://website.com/api/04Sep14"
[1] "http://website.com/api/05Sep14"
[1] "http://website.com/api/06Sep14"
[1] "http://website.com/api/07Sep14"
[1] "http://website.com/api/08Sep14"
>
You can use
> dates <- seq(as.Date("2014-01-01"), as.Date("2014-12-31"), by=1)
to generate a vector of consecutive days. What you want to do with this is not entirely clear from your pseudo-code, but you can iterate directly over the vector (which is generally not what you want in R)
> for (d in dates) {
# Code goes here.
}
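One reason looping directly over the vector is usually not what you want: for() strips the Date class, so each d is a bare number of days. A small sketch with a three-day range (the variable direct_classes is just for illustration), followed by a loop over indices that keeps the class:

```r
dates <- seq(as.Date("2014-01-01"), as.Date("2014-01-03"), by = 1)

# Direct iteration hands you plain numbers (days since 1970-01-01).
direct_classes <- character(0)
for (d in dates) {
  direct_classes <- c(direct_classes, class(d))
}
print(direct_classes)

# Indexing keeps the Date class, so format() works as expected.
for (i in seq_along(dates)) {
  print(paste0("http://website.com/api/", format(dates[i], "%Y-%m-%d")))
}
```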
The comment-solution by @Roland will give you a vector of the form:
> paste0("http://website.com/api/", dates)
[1] "http://website.com/api/2014-01-01" "http://website.com/api/2014-01-02"
[3] "http://website.com/api/2014-01-03" "http://website.com/api/2014-01-04"
[5] "http://website.com/api/2014-01-05" "http://website.com/api/2014-01-06"
...
Of course after I ask the question I happen to find this.
days <- seq(from=as.Date('2011-02-01'), to=as.Date("2011-03-02"),by='days' )
for ( i in seq_along(days) )
{
print(paste(days[i],"T12:00:00", sep=""))
}
You could translate your date into Julian days and then write a loop based on the Julian days.
To convert to Julian days you can use the code described here
And then you could write code using the Julian days like:
tmp <- as.POSIXlt("1Jan14", format = "%d%b%y")
strdate <- julian(tmp)
tmp <- as.POSIXlt("31Dec14", format = "%d%b%y")
enddate <- julian(tmp)
for (theDate in strdate:enddate){
print(paste("http://website.com/api/", toString(theDate), sep = ""))
}
You have to figure out how to convert back. I am not too sure about the julian function. Maybe you should also have a look at "yday" in the lubridate package.
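Converting back is a matter of adding the day count to the origin. A sketch, assuming the default origin that julian() uses for Date objects (1970-01-01) and ISO-formatted input rather than the "1Jan14" strings above; start_day, end_day and urls are illustrative names:

```r
# Day counts since 1970-01-01 for the first and last date.
start_day <- as.numeric(julian(as.Date("2014-01-01")))
end_day   <- as.numeric(julian(as.Date("2014-01-03")))

urls <- character(0)
for (day in start_day:end_day) {
  # Turn the day count back into a Date before formatting it.
  the_date <- as.Date(day, origin = "1970-01-01")
  urls <- c(urls, paste0("http://website.com/api/", format(the_date, "%Y-%m-%d")))
}
print(urls)
```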

Why do I get "Error in rbind.zoo(...) : indexes overlap" when merging two zoo objects?

I have two seemingly identical zoo objects created by the same commands from csv files for different time periods. I try to combine them into one long zoo but it fails with an "indexes overlap" error. ('merge', 'c' or 'rbind' all produce variants of the same error text.) As far as I can see there are no duplicates and the time periods do not overlap. What am I doing wrong? I am using R version 3.0.1 on Windows 7 64bit if that makes a difference.
> colnames(z2)
[1] "Amb" "HWS" "Diff"
> colnames(t.tmp)
[1] "Amb" "HWS" "Diff"
> max(index(z2))
[1] "2012-12-06 02:17:45 GMT"
> min(index(t.tmp))
[1] "2012-12-06 03:43:45 GMT"
> anyDuplicated(c(index(z2),index(t.tmp)))
[1] 0
> c(z2,t.tmp)
Error in rbind.zoo(...) : indexes overlap
>
UPDATE: In trying to make a reproducible case I've concluded this is an implementation error due to the large number of rows I'm dealing with: it fails if the final result is more than 311434 rows long.
> nrow(c(z2,head(t.tmp,n=101958)))
Error in rbind.zoo(...) : indexes overlap
> nrow(c(z2,head(t.tmp,n=101957)))
[1] 311434
# but row 101958 inserts fine on its own so it's not a data problem.
> nrow(c(z2,tail(head(t.tmp,n=101958),n=2)))
[1] 209479
I'm sorry but I don't have the R scripting skills to produce a zoo of the critical length; hopefully someone might be able to help me out.
UPDATE 2 - Responding to Jason's suggestion: The problem is in the MATCH but my R skills aren't sufficient to know how to interpret it. Does it mean MATCH finds a duplicate value in x.t whereas anyDuplicated does not?
> x.t <- c(index(z2),index(t.tmp));
> length(x.t)
[1] 520713
> ix <- ORDER (x.t)
> length(ix)
[1] 520713
> x.t <- x.t[ix]
> length(ix)
[1] 520713
> length(x.t)
[1] 520713
> tx <- table(MATCH(x.t,x.t))
> max(tx)
[1] 2
> tx[which(tx==2)]
311371 311373 311378 311383 311384 311386 311389 311392 311400 311401
2 2 2 2 2 2 2 2 2 2
> anyDuplicated(x.t)
[1] 0
After all the testing and head scratching it seems that the problem I'm having is timezone related. Setting the environment to the same time zone as the original data makes it work just fine.
Sys.setenv(TZ="GMT")
> z3<-rbind(z2,t.tmp)
> nrow(z3)
[1] 520713
Thanks to "How to guard against accidental time zone conversion" for the inspiration to look in that direction.
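For anyone unsure what "time zone related" means in practice: a POSIXct value is a single instant, but the text it prints as depends on the time zone in effect. The sketch below only illustrates that display dependence; it is not a reproduction of the zoo error, and "America/New_York" is just an example zone assumed to exist in the local time zone database.

```r
# One instant, taken from the last index value shown in the question.
x <- as.POSIXct("2012-12-06 02:17:45", tz = "GMT")

# The same instant rendered in two different time zones.
gmt_text <- format(x, tz = "GMT", usetz = TRUE)
ny_text  <- format(x, tz = "America/New_York", usetz = TRUE)

print(gmt_text)
print(ny_text)
```

Setting TZ to the zone of the original data, as done above with Sys.setenv(TZ="GMT"), keeps the printed and stored times in agreement.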
