This is my first question, so I apologize if I missed something.
I imported an Excel file into R using XLConnect. The str() output is as follows:
'data.frame': 931 obs. of 5 variables:
$ Media : chr "EEM" "EEM" "EEM" "EEM" ...
$ Month : POSIXct, format: "2014-08-01" "2014-08-01" "2014-08-01" "2014-08-01" ...
$ Request_Row : num 8 25 26 37 38 44 53 62 69 83 ...
$ Total_Click : num 12 9 9 8 8 8 7 7 7 7 ...
$ Match_Type : chr "S" "S" "S" "S" ...
When I run the following sqldf query I get no rows selected. Any idea what could be wrong?
sqldf(" select Media, sum(Total_Click) , avg(Request_Row), min(Request_Row) , max(Request_Row), count(distinct(Media)) from All_Data
where Request_Row < 100
and month='2014-09-01'
group by 1,2 order by 2,6 desc ")
<0 rows> (or 0-length row.names)
Thanks for the help
Vj
It's not clear what is intended, but the code shown has these problems:
Month is used in the data but month is used in the SQL statement
SQLite has no date or time types, so a POSIXct value sent to SQLite is stored as the number of seconds since the UNIX epoch (in the GMT time zone). Comparing that number to a character string such as '2014-09-01' therefore won't match. You can convert the number of seconds to yyyy-mm-dd using the SQLite strftime or date functions. Alternatively, use a database that has datetime types: sqldf supports the H2 database, which has date and time types.
The statement groups by 1,2, that is, by Media and sum(Total_Click). Grouping by an aggregated value is not legal, although something similar could perhaps be done with nested selects, depending on what you intended.
Since the statement is grouping by Media, the expression count(distinct(Media)) will always be 1, since there can only be one Media in such a group.
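To make the second point concrete, here is a minimal base-R sketch of what a POSIXct value looks like once it is reduced to a number. The UTC time zone is an assumption for illustration; the data in the question may have been created in a different zone, in which case the number differs by the zone offset.

```r
# A POSIXct value like the ones in the Month column (time zone assumed to be UTC)
m <- as.POSIXct("2014-08-01", tz = "UTC")

# The underlying number of seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(m)
secs

# Converting the number back to a yyyy-mm-dd string, which is what
# SQLite's date(..., 'unixepoch') does on the database side
back <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%Y-%m-%d")
back
```

A where clause that compares the column to '2014-08-01' is therefore comparing a large number to a string, which is why no rows come back.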
You will need to clarify what your intent is but if we drop or fix up the various points we can get this:
sqldf("select
Media,
sum(Total_Click) sum_Total_Click,
avg(Request_Row) avg_Request_Row,
min(Request_Row) min_Request_Row,
max(Request_Row) max_Request_Row
from All_Data
where Request_Row < 100
and date(Month, 'unixepoch', 'localtime') = '2014-08-01'
group by 1 order by 2 desc")
which gives:
Media sum_Total_Click avg_Request_Row min_Request_Row max_Request_Row
1 EEM 38 24 8 37
RH2: To use the RH2 package and the H2 database instead, be sure you have Java and RH2 installed (RH2 includes the H2 database, so that does not need to be installed separately) and then:
library(RH2)
library(sqldf)
sqldf("...")
where the ... is replaced with the same SQL statement except the date comparison simplifies to this line:
and month = '2014-08-01'
Data: When posting to the SO R tag please show your data using dput. In this case this was used:
All_Data <-
structure(list(Media = c("EEM", "EEM", "EEM", "EEM"), Month = structure(c(1406865600,
1406865600, 1406865600, 1406865600), class = c("POSIXct", "POSIXt"
), tzone = ""), Request_Row = c(8, 25, 26, 37), Total_Click = c(12,
9, 9, 8), Match_Type = c("S", "S", "S", "S")), .Names = c("Media",
"Month", "Request_Row", "Total_Click", "Match_Type"), row.names = c(NA,
-4L), class = "data.frame")
I'm trying to understand why lubridate's mdy() returns an error mentioning lapply() when I use it to convert dates in a dplyr pipeline. I have used mdy() on other data in a similar way but have not seen this issue before. I am relatively new to R but have been able to troubleshoot other issues until now. I am not very familiar with how to use lapply().
My data is a large .csv of water quality data, which I'm subsetting to simply show the data in question.
library(dplyr)
library(lubridate)
library(stringr) # str_sub() comes from stringr
wq.all<-as.data.frame(read.csv('C:/WQdata.csv',header=TRUE,stringsAsFactors = FALSE))
test.wq<-wq.all[1:5,12:13]
class(test.wq)
[1] "data.frame"
mode(test.wq)
[1] "list"
str(test.wq)
'data.frame': 5 obs. of 2 variables:
$ YearMonth : chr "2019-07" "2019-06" "2019-05" "2019-04" ...
$ SampleTime: chr "07/09/2019 14:44" "06/10/2019 14:17" "05/22/2019 14:31" "04/08/2019 14:15" ...
In str(test.wq), SampleTime is the data in question which I am trying to coerce from chr to date, or at least num.
First, I don't need the time values, so I used dplyr mutate() to create SampleDate with only the 10-character dates, and then was attempting to coerce using mdy():
wq.date<-test.wq%>%
mutate(SampleDate=str_sub(test.wq[[2]],start=0,end=10))%>%
mdy(SampleDate)
But this returns an error:
Error in lapply(list(...), .num_to_date) : object 'SampleDate' not found
If I only use mutate() it all seems to work fine, and gives me the new SampleDate column I was looking for:
wq.date<-test.wq%>%
mutate(SampleDate=str_sub(test.wq[[2]],start=0,end=10))
head(wq.date)
YearMonth SampleTime SampleDate
1 2019-07 07/09/2019 14:44 07/09/2019
2 2019-06 06/10/2019 14:17 06/10/2019
3 2019-05 05/22/2019 14:31 05/22/2019
4 2019-04 04/08/2019 14:15 04/08/2019
5 2019-03 03/13/2019 14:19 03/13/2019
str(wq.date)
'data.frame': 5 obs. of 3 variables:
$ YearMonth : chr "2019-07" "2019-06" "2019-05" "2019-04" ...
$ SampleTime: chr "07/09/2019 14:44" "06/10/2019 14:17" "05/22/2019 14:31" "04/08/2019 14:15" ...
$ SampleDate: chr "07/09/2019" "06/10/2019" "05/22/2019" "04/08/2019" ...
So it only seems to result in error once I attempt to coerce using mdy(), even though SampleDate clearly exists and I believe I was referencing it correctly.
I have looked at other posts here, but none seems to address quite this issue.
Thoughts? Many thanks!
mdy() needs to be called inside mutate(), or on the extracted column; otherwise the pipe passes the entire data.frame to it as the first argument. According to ?mdy:
Transforms dates stored in character and numeric vectors to Date or POSIXct objects
So if the input is not a vector, it won't work:
library(dplyr)
library(lubridate)
library(stringr)
test.wq%>%
mutate(SampleDate=str_sub(SampleTime,start=0,end=10))%>%
mutate(date = mdy(SampleDate))
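The other option mentioned above, extracting the column and converting it as a vector, can be sketched without any packages. Note that this swaps in base R's as.Date() for lubridate's mdy(), and the two-row data frame is a cut-down stand-in for test.wq built from the values shown in the question.

```r
# Small stand-in for test.wq, using values from the question
test.wq <- data.frame(
  YearMonth  = c("2019-07", "2019-06"),
  SampleTime = c("07/09/2019 14:44", "06/10/2019 14:17"),
  stringsAsFactors = FALSE
)

# Pass the column (a character vector), not the whole data frame.
# as.Date() reads only as much of each string as the format needs,
# so the trailing time part is ignored and no substring step is required.
test.wq$SampleDate <- as.Date(test.wq$SampleTime, format = "%m/%d/%Y")

str(test.wq)
```

The point is the same in both versions: the conversion function has to receive a vector, either via mutate() or via `$`.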
I have a data frame df that contains two fields (Number and date), as follows:
Number date
1496 Apr-08
3067 May-08
3049 Jun-08
3077 Jul-08
3237 Aug-08
3020 Sep-08
4990 Oct-08
4776 Nov-08
5140 Dec-08
5582 Jan-09
5743 Feb-09
5561 Mar-09
5974 Apr-09
I want to use the plot() function in R to plot Number vs. date.
I've tried using the axis.Date() function but it didn't work. Nothing was displayed in the plotting area and I don't know why. My code was:
plot(df$Number)
axis.Date(1, at=seq(min(df$date), max(df$date), by="months"), format="%m-%Y")
Any help, please?
It seems that your biggest problem is creating an appropriate date structure for your data. It would be good to acquaint yourself with the different ways that R represents dates. ?strptime has a rather good list of the commonly used format codes.
In your case, to convert your date to a form that axis.Date can work with, you need to add an arbitrary day to your date field and then convert it with as.Date:
df$date <- as.Date(paste0("01-", df$date), format="%d-%b-%y")
This way, your axis.Date plot would work:
plot(df$date, df$Number, xaxt="n")
axis.Date(1, at=seq(min(df$date), max(df$date), by="months"), format="%m-%Y")
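As a quick check on the conversion above, the following sketch parses three of the values from the question and formats them back the way the axis labels will appear. One caveat that is not in the original answer: %b matches month abbreviations in the current locale, so this assumes an English (or C) locale.

```r
# Three of the date strings from the question
raw <- c("Apr-08", "Dec-08", "Jan-09")

# Prepend a dummy day so the string describes a complete date
d <- as.Date(paste0("01-", raw), format = "%d-%b-%y")
d

# Labels in the format passed to axis.Date()
format(d, "%m-%Y")

# Monthly tick positions between the first and last date
ticks <- seq(min(d), max(d), by = "month")
length(ticks)
```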
Input data:
df <- structure(list(Number = c(1496, 3067, 3049, 3077, 3237, 3020,
4990, 4776, 5140, 5582, 5743, 5561, 5974), date = c("Apr-08",
"May-08", "Jun-08", "Jul-08", "Aug-08", "Sep-08", "Oct-08", "Nov-08",
"Dec-08", "Jan-09", "Feb-09", "Mar-09", "Apr-09")), .Names = c("Number",
"date"), row.names = c(NA, -13L), class = "data.frame")
The likely source of your problem is that df$date is not a date, but e.g. a character.
Using
str(df)
'data.frame': 13 obs. of 2 variables:
$ Number: int 1496 3067 3049 3077 3237 3020 4990 4776 5140 5582 ...
$ date : chr "Apr-08" "May-08" "Jun-08" "Jul-08" ...
# note that date is a character vector here, as opposed to Date.
and
plot(df$Number)
axis(1, at=1:nrow(df), labels=df$date)
I get a plot of Number against row index, with the month strings drawn as tick labels on the x-axis.
I have a data frame, data, which contains integer columns as well as date and time columns, as shown:
>head(data,2)
PRESSURE AMBIENT_TEMP OUTLET_PRESSURE COMP_STATUS DATE TIME predict
1 14 65 21 0 2014-01-09 12:45:00 0.6025863
2 17 65 22 0 2014-01-10 06:00:00 0.6657910
Now I want to write this back to a SQL database with the following code:
sqlSave(channel,data,tablename = "ANL_ASSET_CO",append = T)
where channel is the connection name. But this gives an error:
[RODBC] Failed exec in Update
22018 1722 [Oracle][ODBC][Ora]ORA-01722: invalid number
But when I exclude the date column, it writes back without any error:
> sqlSave(channel,data[,c(1:4,7)],tablename = "ANL_ASSET_CO",append = T)
> sqlSave(channel,data[,c(1:4,6:7)],tablename = "ANL_ASSET_CO",append = T)
So the date column is what prevents the data from being written to Oracle; it could be a problem with the hyphen.
How can I write it? Any help is appreciated!
>class(data$DATE)
[1] "POSIXct" "POSIXt"
So I had to change the data type to character:
>data$DATE <- as.character(data$DATE)
>sqlSave(channel,data,tablename = "ANL_ASSET_CO",append=T)
This one worked!!
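A slightly more explicit variant of the same fix is to control the string layout with format() rather than relying on as.character(). This is only a sketch on a two-row stand-in for data; whether the database accepts a given layout depends on the target column type and session date settings, which are not shown in the question, so the format string here is an assumption.

```r
# Two-row stand-in for the data frame in the question (time zone assumed UTC)
data <- data.frame(
  PRESSURE = c(14, 17),
  DATE     = as.POSIXct(c("2014-01-09", "2014-01-10"), tz = "UTC")
)

# Turn the POSIXct column into plain text with an explicit layout
data$DATE <- format(data$DATE, "%Y-%m-%d")

str(data)
```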
I have a CSV file of 1000 daily prices
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index runs from 1 to 1700.
But I need to specify a begin date and an end date this way:
the start of the period is, let's say, 25 January 2009,
and the last (1700th) value corresponds to 14 May 2013.
So far this is as close as I've gotten:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? thanks
UPDATE:
I managed to create a separate object with dates as suggested in the answers and plotted it, but the y axis is weird, as shown in the screenshot.
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq_len(1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)
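Before attaching dates this way, it may be worth checking that the number of observations matches the number of calendar days in the stated range. The sketch below uses only the two dates quoted in the question; the reading in the follow-up note (that the series may skip some days) is a guess, not something the question confirms.

```r
start <- as.Date("2009-01-25")
end   <- as.Date("2013-05-14")

# Number of calendar days in the stated range, both ends included
n_days <- length(seq(start, end, by = "day"))
n_days

# Date the 1700th value would get if every calendar day had a price
last_if_daily <- start + 1699
last_if_daily
```

If n_days is not 1700, the values are not one per calendar day between those two dates, and a plain daily sequence will drift away from the true dates. In that case the dates would need to come from the source data rather than from seq().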
R stores Date-classed objects as the number of days since "1970-01-01", but the as.Date.numeric function takes an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You don't need to add the .numeric suffix, since R automatically dispatches to that method when you call the generic, as.Date, with a numeric argument. I only wrote it out because as.Date.numeric has different arguments than as.Date.character.
I have this csv file (fm.file):
Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246
And so on.
I run these commands:
fm.data <- as.xts(read.zoo(file=fm.file,format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
And I get the following:
[1] TRUE
How do I get fm.data to be numeric without losing its date index? I want to perform some statistical operations that require the data to be numeric.
I was puzzled by two things: it didn't seem that read.zoo should give you a character matrix, and it didn't seem that changing its class would affect the index values, since the data type should be separate from the indices. So I tried to replicate the problem and got a different result:
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.572203,11.457512
02/03/2011,14.574798,11.487183
03/03/2011,14.575558,11.487802
04/03/2011,14.576863,11.490246"
require(xts)
fm.data <- as.xts(read.zoo(file=textConnection(txt),format='%d/%m/%Y',tz='',header=TRUE,sep=','))
is.character(fm.data)
#[1] FALSE
str(fm.data)
#-------------
An ‘xts’ object from 2011-02-28 to 2011-03-04 containing:
Data: num [1:5, 1:2] 14.6 14.6 14.6 14.6 14.6 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "FM1" "FM2"
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
List of 2
$ tclass: chr [1:2] "POSIXct" "POSIXt"
$ tzone : chr ""
zoo- and xts-objects have their data in a matrix accessed with coredata and their indices are a separate set of attributes.
I think the problem is that you have some dirty data in your csv file. In other words, the FM1 or FM2 column contains a character, somewhere, that stops it from being interpreted as a numeric column. When that happens, xts (which is a matrix underneath) will force the whole thing to character type.
Here is one way to use R to find suspicious data:
s <- readLines(fm.file)
# s is now a vector of character strings, one entry per line
s <- s[-1] # chop off the header row
ok <- grepl('^[-0-9,./]*$', s) # "/" is allowed because the dates contain slashes
all(ok) # TRUE means all your data is clean
s[!ok]
which(!ok) + 1
The second-to-last line prints out all the csv rows that contain characters you did not expect. The last line tells you which rows in the file they are (+1 because we removed the header row).
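An alternative check, sketched here under the assumption that the file has the Date,FM1,FM2 layout shown in the question, is to read every column as text and see which values fail to convert to numeric. The inline text and the bad value "14.57x203" are made up for illustration.

```r
# Inline stand-in for the csv file; the second data row is deliberately corrupted
txt <- "Date,FM1,FM2
28/02/2011,14.571611,11.469457
01/03/2011,14.57x203,11.457512
02/03/2011,14.574798,11.487183"

# Read everything as character so nothing is converted silently
x <- read.csv(text = txt, colClasses = "character")

# TRUE wherever a value cannot be converted to a number
bad <- sapply(x[c("FM1", "FM2")],
              function(col) is.na(suppressWarnings(as.numeric(col))))

# Data rows (not counting the header) that contain at least one bad value
bad_rows <- which(rowSums(bad) > 0)
bad_rows
x[bad_rows, ]
```

Unlike the regular-expression approach, this also flags values that are made only of allowed characters but are still not valid numbers, such as "1.2.3".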
Why not simply use read.csv and then convert the first column to a Date object using as.Date?
> x <- read.csv(fm.file, header=TRUE)
> x$Date <- as.Date(x$Date, format="%d/%m/%Y")
> x
Date FM1 FM2
1 2011-02-28 14.57161 11.46946
2 2011-03-01 14.57220 11.45751
3 2011-03-02 14.57480 11.48718
4 2011-03-03 14.57556 11.48780
5 2011-03-04 14.57686 11.49025