I have a data frame df that contains two fields (Number and date), as follows:
Number date
1496 Apr-08
3067 May-08
3049 Jun-08
3077 Jul-08
3237 Aug-08
3020 Sep-08
4990 Oct-08
4776 Nov-08
5140 Dec-08
5582 Jan-09
5743 Feb-09
5561 Mar-09
5974 Apr-09
I want to use the plot() function in R to plot Number vs. date.
I've tried using the axis.Date() function, but it didn't work. Nothing was displayed in the plotting area and I don't know why. My code was:
plot(df$Number)
axis.Date(1, at=seq(min(df$date), max(df$date), by="months"), format="%m-%Y")
Any help, please?
It seems that your biggest problem is creating an appropriate date structure for your data. It would be good to acquaint yourself with the different ways that R stores dates. ?strptime has a rather good list of the commonly used format codes.
For your question, to convert your date to a form that axis.Date can work with, you need to add an arbitrary day to your date field and then convert it with as.Date (note that %b matches abbreviated month names in the current locale):
df$date <- as.Date(paste0("01-", df$date), format="%d-%b-%y")
This way, your axis.Date plot would work:
plot(df$date, df$Number, xaxt="n")
axis.Date(1, at=seq(min(df$date), max(df$date), by="months"), format="%m-%Y")
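As a quick check that the conversion did what you expect, here is a minimal sketch. The input vector x is made up for illustration and uses numeric months (%m), so it does not depend on the locale's month names; the plot itself is left out.

```r
# Hypothetical input: month-year strings with numeric months
x <- c("04-2008", "05-2008", "06-2008")

# Prepend an arbitrary day so that as.Date() has a complete date to parse
d <- as.Date(paste0("01-", x), format = "%d-%m-%Y")
class(d)                       # "Date"

# The same tick positions and labels that axis.Date() would be given
ticks <- seq(min(d), max(d), by = "month")
format(ticks, "%m-%Y")         # "04-2008" "05-2008" "06-2008"
```

If class(d) is not "Date", or d contains NA, the format string does not match the input.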
Input data:
df <- structure(list(Number = c(1496, 3067, 3049, 3077, 3237, 3020,
4990, 4776, 5140, 5582, 5743, 5561, 5974), date = c("Apr-08",
"May-08", "Jun-08", "Jul-08", "Aug-08", "Sep-08", "Oct-08", "Nov-08",
"Dec-08", "Jan-09", "Feb-09", "Mar-09", "Apr-09")), .Names = c("Number",
"date"), row.names = c(NA, -13L), class = "data.frame")
The likely source of your problem is that df$date is not a date, but e.g. a character.
Using
str(df)
'data.frame': 13 obs. of 2 variables:
$ Number: int 1496 3067 3049 3077 3237 3020 4990 4776 5140 5582 ...
$ date : chr "Apr-08" "May-08" "Jun-08" "Jul-08" ...
# note that date is a character vector here, as opposed to Date.
and
plot(df$Number, xaxt="n")  # suppress the default x-axis so the custom labels do not overlap it
axis(1, at=1:nrow(df), labels=df$date)
I get a plot with the month labels along the x-axis.
I have a data frame with a column that contains dates and times. Let's call this data frame date_time. Since the column is a factor, I would like to convert the whole column to numeric without changing the digits, e.g. 2020-01-20 14:02:50 to 20200120140250.
I have about 1000 rows of data. Does anyone know how to produce this output? I have tried as.numeric and gsub but they don't work. I think using POSIXct might work, but I do not understand the reasoning behind it.
example of my data:
2020-07-08 21:40:26
2020-07-08 16:48:57
2020-07-01 15:54:10
2020-07-13 20:27:06
2020-07-27 16:08:12
and the list goes on.
You can try:
gsub("[[:punct:] ]", "", as.character(as.POSIXct("2020-01-20 14:02:50")))
The as.character call keeps the printed representation instead of working with the underlying number of seconds.
UPDATE:
date_time <- data.frame(time = as.POSIXct(
c("2020-07-08 21:40:26", "2020-07-08 16:48:57", "2020-07-01 15:54:10",
"2020-07-13 20:27:06", "2020-07-27 16:08:12", "2020-01-20 14:02:50")))
date_time$num_time <- gsub("[[:punct:] ]", "", as.character(date_time$time))
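An alternative sketch, in case you would rather spell out the layout than strip punctuation: format() with an explicit format string. The factor f below is made-up sample data, and tz = "UTC" is an assumption to keep the result independent of the local time zone.

```r
# Hypothetical factor column, as in the question
f <- factor(c("2020-01-20 14:02:50", "2020-07-08 21:40:26"))

# Go through character first: as.numeric() on a factor would return level codes
t <- as.POSIXct(as.character(f), tz = "UTC")

# Build the digit string directly from the date-time components
stamp <- format(t, "%Y%m%d%H%M%S")
stamp                    # "20200120140250" "20200708214026"

# Convert to numeric only if you really need a number
num <- as.numeric(stamp)
```

Note that num prints in scientific notation by default; the digits are still there, and format(num, scientific = FALSE) shows them.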
Solution with lubridate
library(lubridate)
dt1 <- as.factor(c("2020-07-08 21:40:26", "2020-07-08 16:48:57", "2020-07-01 15:54:10",
"2020-07-13 20:27:06", "2020-07-27 16:08:12"))
dt <- data.frame(date=ymd_hms(dt1))
dt
class(dt$date)
Result
date
1 2020-07-08 21:40:26
2 2020-07-08 16:48:57
3 2020-07-01 15:54:10
4 2020-07-13 20:27:06
5 2020-07-27 16:08:12
> class(dt$date)
[1] "POSIXct" "POSIXt"
I'm trying to understand why my lubridate mdy() function is returning an error in lapply() to convert dates in a dplyr pipeline. I have used mdy() on other data in a similar method but have yet to see this issue. I am relatively new to R but had been able to troubleshoot other issues until now. I am not very familiar with how to use lapply().
My data is a large .csv of water quality data, which I'm subsetting to simply show the data in question.
library(dplyr)
library(lubridate)
library(stringr)  # provides str_sub(), used below
wq.all<-as.data.frame(read.csv('C:/WQdata.csv',header=TRUE,stringsAsFactors = FALSE))
test.wq<-wq.all[1:5,12:13]
class(test.wq)
[1] "data.frame"
mode(test.wq)
[1] "list"
str(test.wq)
'data.frame': 5 obs. of 2 variables:
$ YearMonth : chr "2019-07" "2019-06" "2019-05" "2019-04" ...
$ SampleTime: chr "07/09/2019 14:44" "06/10/2019 14:17" "05/22/2019 14:31" "04/08/2019 14:15" ...
In str(test.wq), SampleTime is the data in question which I am trying to coerce from chr to date, or at least num.
First, I don't need the time values, so I used dplyr mutate() to create SampleDate with only the 10-character dates, and then was attempting to coerce using mdy():
wq.date<-test.wq%>%
mutate(SampleDate=str_sub(test.wq[[2]],start=0,end=10))%>%
mdy(SampleDate)
But this returns an error:
Error in lapply(list(...), .num_to_date) : object 'SampleDate' not found
If I only use mutate() it all seems to work fine, and gives me the new SampleDate column I was looking for:
wq.date<-test.wq%>%
mutate(SampleDate=str_sub(test.wq[[2]],start=0,end=10))
head(wq.date)
YearMonth SampleTime SampleDate
1 2019-07 07/09/2019 14:44 07/09/2019
2 2019-06 06/10/2019 14:17 06/10/2019
3 2019-05 05/22/2019 14:31 05/22/2019
4 2019-04 04/08/2019 14:15 04/08/2019
5 2019-03 03/13/2019 14:19 03/13/2019
str(wq.date)
'data.frame': 5 obs. of 3 variables:
$ YearMonth : chr "2019-07" "2019-06" "2019-05" "2019-04" ...
$ SampleTime: chr "07/09/2019 14:44" "06/10/2019 14:17" "05/22/2019 14:31" "04/08/2019 14:15" ...
$ SampleDate: chr "07/09/2019" "06/10/2019" "05/22/2019" "04/08/2019" ...
So it only seems to result in error once I attempt to coerce using mdy(), even though SampleDate clearly exists and I believe I was referencing it correctly.
I have researched other posts here and here, but neither seem to get to quite this issue.
Thoughts? Many thanks!
mdy() needs to be called inside mutate(), or on the extracted column; otherwise the pipe passes the entire data.frame to it as its first argument. According to ?mdy:
Transforms dates stored in character and numeric vectors to Date or POSIXct objects
So if the input is not a vector, it won't work:
library(dplyr)
library(lubridate)
library(stringr)
test.wq%>%
mutate(SampleDate=str_sub(SampleTime,start=0,end=10))%>%
mutate(date = mdy(SampleDate))
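For comparison, the same idea can be sketched in base R, assuming the column always starts with a 10-character MM/DD/YYYY date. The two-row data frame is made up to mirror the question's SampleTime column. The point is the same: the conversion function is handed a character vector, not the whole data frame.

```r
# Hypothetical two-row stand-in for test.wq
test.wq <- data.frame(
  SampleTime = c("07/09/2019 14:44", "06/10/2019 14:17"),
  stringsAsFactors = FALSE
)

# Take the first 10 characters of the column (a vector) and parse them as dates
test.wq$SampleDate <- as.Date(substr(test.wq$SampleTime, 1, 10),
                              format = "%m/%d/%Y")

str(test.wq$SampleDate)   # a Date vector
```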
I'm having trouble with a data conversion. I have this data that I get from a .csv file, for instance:
comisiones[2850,28:30]
Periodo.Pago Monto.Pago.Credito Disposicion.En.Efectivo
2850 Mensual 11,503.68 102,713.20
The field Monto.Pago.Credito has a Factor data class and I need it to be numeric but the double precision kind. I need the decimals.
str(comisiones$Monto.Pago.Credito)
Factor w/ 3205 levels "1,000.00","1,000.01",..: 2476 2197 1373 1905 1348 3002 1252 95 2648 667 ...
So I use the generic data conversion function as.numeric():
comisiones$Monto.Pago.Credito <- as.numeric(comisiones$Monto.Pago.Credito)
But then the observation changes to this:
comisiones[2850,28:30]
Periodo.Pago Monto.Pago.Credito Disposicion.En.Efectivo
2850 Mensual 796 102,713.20
str(comisiones$Monto.Pago.Credito)
num [1:5021] 2476 2197 1373 1905 1348 ...
The max of comisiones$Monto.Pago.Credito should be 11,504.68 but now it is 3205.
I don't know if there is a specific data class or type for decimals in R. I've looked for one, but it didn't work.
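You need to clean up your column first: remove the commas (gsub() converts the factor to character as it does so), then convert the result to numeric:
comisiones$Monto.Pago.Credito <- as.numeric(gsub(",", "", comisiones$Monto.Pago.Credito))
The problem shows up when you convert a factor variable directly to numeric: as.numeric() returns the internal integer level codes rather than the values shown in the labels.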
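A small sketch of the difference, using a made-up three-element factor in place of the real column:

```r
# Hypothetical factor with comma-formatted amounts
f <- factor(c("11,503.68", "1,000.00", "1,000.01"))

# Direct conversion returns the level codes (small integers), not the amounts
codes <- as.numeric(f)

# Strip the commas first; gsub() coerces the factor to character
amounts <- as.numeric(gsub(",", "", f))
amounts            # 11503.68  1000.00  1000.01

# The result is an ordinary double-precision numeric vector
is.double(amounts) # TRUE
```

R has no separate "decimal" type: a numeric vector is double precision, so the decimals are kept.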
You can use extract_numeric from the tidyr package. It handles factor inputs and removes commas, dollar signs, etc. (In more recent tidyr releases extract_numeric is deprecated in favor of readr::parse_number.)
library(tidyr)
comisiones$Monto.Pago.Credito <- extract_numeric(comisiones$Monto.Pago.Credito)
If the resulting numbers are large, they may not print with decimal places when you view them, whether you used as.numeric or extract_numeric (which itself calls as.numeric). But the precision is still being stored. For instance:
> x <- extract_numeric("1,200,000.3444")
> x
[1] 1200000
Verify that precision is still stored:
> format(x, nsmall = 4)
[1] "1200000.3444"
> x > 1200000.3
[1] TRUE
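The same check works in base R without tidyr. This is a sketch with the value from the example above; only the number of printed digits changes, not the stored value.

```r
x <- as.numeric("1200000.3444")

print(x)                 # default of 7 significant digits hides the decimals
print(x, digits = 11)    # ask for more significant digits
sprintf("%.4f", x)       # "1200000.3444"
```

If you want more digits everywhere, options(digits = ...) changes the default for the session.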
First-time question, so I apologize if I missed something.
I imported an Excel file into R using XLConnect. The str() output is as follows:
'data.frame': 931 obs. of 5 variables:
$ Media : chr "EEM" "EEM" "EEM" "EEM" ...
$ Month : POSIXct, format: "2014-08-01" "2014-08-01" "2014-08-01" "2014-08-01" ...
$ Request_Row : num 8 25 26 37 38 44 53 62 69 83 ...
$ Total_Click : num 12 9 9 8 8 8 7 7 7 7 ...
$ Match_Type : chr "S" "S" "S" "S" ...
When I use the following sqldf statement I get no rows selected. Any idea what could be wrong?
sqldf(" select Media, sum(Total_Click) , avg(Request_Row), min(Request_Row) , max(Request_Row), count(distinct(Media)) from All_Data
where Request_Row < 100
and month='2014-09-01'
group by 1,2 order by 2,6 desc ")
<0 rows> (or 0-length row.names)
Thanks for the help
Vj
It's not clear what is intended, but the code shown has these problems:
Month is used in the data but month is used in the SQL statement.
SQLite has no date or time types, so if you send a POSIXct value to SQLite it will be interpreted as the number of seconds since the UNIX epoch (in the GMT time zone). Thus comparing month to a character string won't work. You can convert the number of seconds to yyyy-mm-dd using the SQLite strftime or date functions. Alternatively, use a database that has datetime types. sqldf supports the H2 database, which does have date and time types.
The statement is trying to group by Media and sum(Total_Click). Grouping by an aggregated value is not legal, although perhaps it could be done by nesting selects, depending on what you intended.
Since the statement is grouping by Media, the expression count(distinct(Media)) from All_Data will always be 1, since there can only be one Media in such a group.
You will need to clarify what your intent is but if we drop or fix up the various points we can get this:
sqldf("select
Media,
sum(Total_Click) sum_Total_Click,
avg(Request_Row) avg_Request_Row,
min(Request_Row) min_Request_Row,
max(Request_Row) max_Request_Row
from All_Data
where Request_Row < 100
and date(month, 'unixepoch', 'localtime') = '2014-08-01'
group by 1 order by 2 desc")
which gives:
Media sum_Total_Click avg_Request_Row min_Request_Row max_Request_Row
1 EEM 38 24 8 37
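To see on the R side why the original comparison returned no rows, here is a minimal sketch. It assumes UTC for illustration; the data shown further down uses the local time zone, hence 'localtime' in the query.

```r
# A POSIXct value is stored as seconds since 1970-01-01 00:00:00 UTC
m <- as.POSIXct("2014-08-01", tz = "UTC")
secs <- as.numeric(m)
secs                                   # 1406851200

# This number is what SQLite receives, so comparing it with a date string fails
secs == "2014-08-01"                   # FALSE

# Converting the seconds back to a yyyy-mm-dd string makes the comparison work,
# which is what date(month, 'unixepoch', ...) does inside SQLite
day <- format(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"), "%Y-%m-%d")
day == "2014-08-01"                    # TRUE
```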
RH2: To use the RH2 package and H2 database instead, be sure you have Java and RH2 installed (RH2 includes the H2 database, so that does not need to be installed separately) and then:
library(RH2)
library(sqldf)
sqldf("...")
where the ... is replaced with the same SQL statement except the date comparison simplifies to this line:
and month = '2014-08-01'
Data: When posting to the SO R tag please show your data using dput. In this case this was used:
All_Data <-
structure(list(Media = c("EEM", "EEM", "EEM", "EEM"), Month = structure(c(1406865600,
1406865600, 1406865600, 1406865600), class = c("POSIXct", "POSIXt"
), tzone = ""), Request_Row = c(8, 25, 26, 37), Total_Click = c(12,
9, 9, 8), Match_Type = c("S", "S", "S", "S")), .Names = c("Media",
"Month", "Request_Row", "Total_Click", "Match_Type"), row.names = c(NA,
-4L), class = "data.frame")
I have a CSV file of 1000 daily prices
They are of this format:
1 1.6
2 2.5
3 0.2
4 ..
5 ..
6
7 ..
.
.
1700 1.3
The index is from 1:1700
But I need to specify a begin date and end date this way:
The start of the period is, let's say, 25 January 2009,
and the last (1700th) value corresponds to 14 May 2013.
So far I've gotten this close to solving the problem:
> dseries <- ts(dseries[,1], start = ??time??, freq = 30)
How do I go about this? Thanks.
UPDATE:
I managed to create a separate object with dates as suggested in the answers and plotted it, but the y axis is weird, as shown in the screenshot.
Something like this?
as.Date("25-01-2009",format="%d-%m-%Y") + (seq(1:1700)-1)
A better way, thanks to @AnandaMahto:
seq(as.Date("2009-01-25"), by="1 day", length.out=1700)
Plotting:
df <- data.frame(
myDate=seq(as.Date("2009-01-25"), by="1 day", length.out=1700),
myPrice=runif(1700)
)
plot(df)
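It is worth checking where such a sequence ends before relying on it. This is only a sketch: it assumes one value per calendar day with no gaps, which may not hold for price data that skips weekends or holidays.

```r
# One date per calendar day, starting at the stated begin date
d <- seq(as.Date("2009-01-25"), by = "1 day", length.out = 1700)

# First and last dates of the generated sequence
range(d)

# Number of calendar days between the two dates given in the question
as.numeric(as.Date("2013-05-14") - as.Date("2009-01-25"))
```

If the last date does not match the end date you expect, the series is not strictly one observation per calendar day, and the dates should be read from the data rather than generated.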
R stores Date-classed objects as the number of days since "1970-01-01", but the as.Date.numeric function takes an offset ('origin'), which can be any starting date:
rDate <- as.Date.numeric(dseries[,1], origin="2009-01-24")
Testing:
> rDate <- as.Date.numeric(1:10, origin="2009-01-24")
> rDate
[1] "2009-01-25" "2009-01-26" "2009-01-27" "2009-01-28" "2009-01-29"
[6] "2009-01-30" "2009-01-31" "2009-02-01" "2009-02-02" "2009-02-03"
You don't need to add the extension .numeric, since R automatically dispatches to that method if you call the generic, as.Date, with an integer argument. I just put it in because as.Date.numeric has different arguments than as.Date.character.
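A short sketch of the same call through the generic, with a made-up index vector idx:

```r
# Integer day offsets, as in a 1..n row index
idx <- 1:3

# as.Date() dispatches to the numeric method; origin is "day zero"
d <- as.Date(idx, origin = "2009-01-24")
format(d)    # "2009-01-25" "2009-01-26" "2009-01-27"
```

The origin is set one day before the first observation so that index 1 maps to the start date.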