I have an Excel spreadsheet with dates in the header. I want to import the spreadsheet into R, presumably using the read.xlsx() function. However, the dates are converted to a string of Excel's internal value with an "X" in front. I am hoping to keep the dates as Date class, or to convert the strings to Date. I understand I could use as.Date() if the date were in a recognizable format, or given as the number of days from a specified origin, but the leading "X" gets in the way.
Thank you very much for the help.
E.g., the Excel spreadsheet "Practice":
Sample 09-Jul 10-Jul 11-Jul
1 3 10 2
2 5 0
3 1 0 0
then in R:
practice<-read.xlsx("Practice.xlsx")
Sample X42925 X42926 X42927
1 1 3 10 2
2 2 5 0 NA
3 3 1 0 0
practice2=gather(practice,Date,value,-Sample,na.rm=TRUE)
Sample Date value
1 1 X42925 3
2 2 X42925 5
3 3 X42925 1
4 1 X42926 10
5 2 X42926 0
6 3 X42926 0
7 1 X42927 2
9 3 X42927 0
practice2$Date=as.Date(practice2$Date)
Error in charToDate(x) :
character string is not in a standard unambiguous format
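The value X42925 is an Excel serial date; the "X" is most likely added when the header is turned into a syntactically valid R column name. In Excel's 1900 date system, serial number 1 is January 1, 1900, so 42925 falls on July 9, 2017. Because Excel also counts a nonexistent February 29, 1900, the origin to use with as.Date in R is 1899-12-30. We can convert these serial dates to R dates using as.Date with that origin.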
You should be able to convert your Date column using something like the following. This assumes the dates, prefixed by X, were read in as text.
dates <- as.numeric(substr(practice2$Date, 2, nchar(practice2$Date)))
practice2$Date <- as.Date(dates, origin = '1899-12-30')
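For illustration, here is a minimal, self-contained sketch of the same conversion. The small data frame below is retyped from the example in the question (an assumption about how the column was read in, namely as character), and `sub()` is used to drop the leading "X" instead of `substr()`.

```r
# Toy version of the gathered data from the question (Date read in as text)
practice2 <- data.frame(
  Sample = c(1, 2, 3),
  Date   = c("X42925", "X42926", "X42927"),
  value  = c(3, 10, 2),
  stringsAsFactors = FALSE
)

# Strip the leading "X" and turn the remainder into a number
serial <- as.numeric(sub("^X", "", practice2$Date))

# Excel's 1900 date system maps onto R dates with this origin
practice2$Date <- as.Date(serial, origin = "1899-12-30")

print(practice2)
```

If the column was read in as a factor rather than as character, wrapping it in `as.character()` first should give the same result.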
Try saving the Excel file as a .csv instead. That converts your data and dates into plain text, which should import into R with no problem.
Alternatively, try one of the methods here:
http://www.milanor.net/blog/read-excel-files-from-r/
Good luck!
Related
I am using a model to create predictions. The model gives me a factor output which ranges from 0 to 6.
I am trying to report this value as is, but when I try to convert it to a number or put it into a data frame, it converts the 0 value to a 1 and shifts all the other values up by one... sometimes.
out = as.factor(c(0,1,2,3,4,5))
out
[1] 0 1 2 3 4 5
Levels: 0 1 2 3 4 5
as.numeric(out)
[1] 1 2 3 4 5 6
I would simply subtract 1 if this increased the value by 1 every time, but if my model returns only non-zero values, it will not increase the value:
out = as.factor(c(1,2,3,4,5,6))
as.numeric(out)
[1] 1 2 3 4 5 6
Is there a simple way to get the raw values out of the factor rather than R converting the 0 to a 1 and adjusting the rest of the values?
Thank you,
RStudio 1.3.1093
r 4.0.3
From my own comments, I found the solution here: How to convert a factor to integer\numeric without loss of information?
"In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f))." - Joshua Ulrich
This solved the issue and I was able to put into a data frame without a problem.
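As a quick illustration of the quoted advice, here is a minimal sketch using the same factor as in the question. `as.numeric()` on a factor returns the internal integer codes (always starting at 1), which is why the result only looks shifted when the levels start at 0.

```r
out <- as.factor(c(0, 1, 2, 3, 4, 5))

# Internal integer codes: positions in levels(out), not the labels themselves
codes <- as.numeric(out)

# Recommended: convert the level labels, then index them by the factor
values <- as.numeric(levels(out))[out]

# Equivalent, slightly less efficient alternative
values2 <- as.numeric(as.character(out))

print(codes)
print(values)
```

The same two conversions also give the original values for a factor built from 1 to 6, so no conditional subtraction is needed.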
I have a column within a data frame containing hours, created by dividing another column of minutes by 60. I would like the values formatted as hh:mm.
I found this post to be helpful:
Convert a decimal number to HH:MM:SS in R
but the output is not correct. For example:
Duration.h.mm
1 1.93
2 0.62
3 2.24
library(chron)
times(rd$Duration.h.mm / (24 *60))
## output :
Duration.h.mm
1 00:01:56
2 00:00:37
3 00:02:14
I'm after
1 1:56
2 0:37
3 2:14
thanks in advance.
Try this:
> format(strptime(times(rd$Duration.h.mm / (24*60)),"%H:%M:%S"),'%M:%S')
[1] "01:56" "00:37" "02:14"
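If the leading zero on the hours is not wanted (the question asks for 1:56, 0:37, 2:14), a base R alternative that avoids chron may be simpler. This is only a sketch: it assumes the column holds decimal hours, as described in the question, and rounds to the nearest whole minute. The vector `hours` stands in for `rd$Duration.h.mm`.

```r
# Decimal hours from the question's example
hours <- c(1.93, 0.62, 2.24)

# Convert to whole minutes first, so rounding happens only once
total_min <- as.integer(round(hours * 60))

# Integer-divide for the hours part, remainder for the minutes part
hhmm <- sprintf("%d:%02d", total_min %/% 60L, total_min %% 60L)

print(hhmm)
```

The result is a character vector, so it is suitable for display but not for further arithmetic.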
I have a data frame that contains a time variable of class POSIXct in the following format: yyyy-mm-dd HH:MM:SS.
I've been able to split it into time intervals using: split(dfrm, cut((dfrm$time), "30 mins"))
However, this method starts splitting from the minimum time value in my data, whereas I'm interested in splitting the data into fixed intervals within each day and summarising the results within each interval.
Here is some reproducible example (sorry for the "poor" example; I can't copy and paste right now):
dfrm <- data.frame(time=as.POSIXct(c("2017-02-01 00:58:53", "2017-02-01 00:53:02","2017-02-01 01:15:09","2017-02-01 02:28:08","2017-02-01 02:15:20","2017-02-01 02:37:25")))
So, using the splitting method mentioned above, I get 4 groups starting at 00:53:00 with intervals of 30 minutes (the data is split across 3 of those 4 groups, with 3, 1 and 2 observations). What I'm looking for instead is something like this:
Interval Total rows
1 0
2 2
3 1
4 0
5 2
6 1
7 0
8 0
.
.
where 1 is for 00:00:00-00:30:00, 2 is for 00:30:00-01:00:00,..., 48 is for 23:30:00-00:00:00
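One possible approach, shown here only as a sketch, is to compute the half-hour slot directly from the time of day instead of using `cut()`. The explicit `tz = "UTC"` is an assumption added to make the example reproducible, and the column name `Total.rows` is just an illustrative choice.

```r
dfrm <- data.frame(time = as.POSIXct(
  c("2017-02-01 00:58:53", "2017-02-01 00:53:02", "2017-02-01 01:15:09",
    "2017-02-01 02:28:08", "2017-02-01 02:15:20", "2017-02-01 02:37:25"),
  tz = "UTC"  # assumed time zone
))

# Break each timestamp into components to get hour and minute of the day
lt <- as.POSIXlt(dfrm$time)

# Slot 1 is 00:00-00:30, slot 2 is 00:30-01:00, ..., slot 48 is 23:30-00:00
slot <- (lt$hour * 60 + lt$min) %/% 30 + 1

# A factor with all 48 levels keeps the empty slots in the count
counts <- table(factor(slot, levels = 1:48))
result <- data.frame(Interval = 1:48, Total.rows = as.vector(counts))

print(head(result, 8))
```

This pools observations by time of day. If the data spans several days and each day should be counted separately, adding `as.Date(dfrm$time)` as a second grouping variable in `table()` would be one way to do it.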
I have a data frame where the columns represent monthly data and the rows represent different simulations. The data I am working with accumulates over time, so I want to take the difference between consecutive months to get the true value for each month. There are no headers in my data frame.
For example:
View(df)=
1 3 4 6 19 23 24 25 26 ...
1 2 3 4 5 6 7 8 9 ...
0 0 2 3 5 7 14 14 14 ...
My plan was to use the diff() function or something like it, but I am having trouble using it on a dataframe.
I have tried:
df1<-diff(df, lag = 1, differences = 1)
but only get zeros.
I am grateful for any advice.
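See ?apply. If it's a data frame,
apply(df, 2, diff)
should work. Also, since a data frame is a list of vectors, sapply(df, diff) should work. Note that both of these take differences down each column (between rows); if the months run across the columns, as in the question, apply diff over the rows instead: t(apply(df, 1, diff)).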
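Because the months in the question run across the columns, differencing along each row may be what is wanted. Here is a minimal sketch on the values shown in the question (only the nine visible columns are used; the elided ones are left out). The names `monthly` and `monthly_full` are illustrative.

```r
# The three simulations from the question, one per row
df <- data.frame(rbind(
  c(1, 3, 4, 6, 19, 23, 24, 25, 26),
  c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  c(0, 0, 2, 3, 5, 7, 14, 14, 14)
))

# apply() over rows returns one column per row, so transpose back
monthly <- t(apply(df, 1, diff))

# Optionally keep the first month, which has nothing to be differenced against
monthly_full <- cbind(df[[1]], monthly)

print(monthly_full)
```

The result is a numeric matrix; wrapping it in `as.data.frame()` turns it back into a data frame if needed.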
It must be a very easy task, but I can't find the right line of code for this:
The data frame (df) has several columns (Date is the first one, containing strings) and around 200 rows.
Date V1
1 01/01/2011 5
2 02/01/2011 4
3 03/01/2011 2
...
200 05/09/2011
needs to become this (current year):
Date V1
1 01/01/2013 5
2 02/01/2013 4
3 03/01/2013 2
...
200 05/09/2013
Thanks!
df$Date <- sub('11$','13',df$Date)
should work.
But beware: naming a variable Date is a bad idea because R already has an internal data type with that name.
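Since the question mentions the current year, a slightly more general variant is sketched below. It assumes every string ends in a four-digit year, as in the example, and the vector `dates` stands in for `df$Date`.

```r
# Example strings from the question
dates <- c("01/01/2011", "02/01/2011", "03/01/2011", "05/09/2011")

# Current year as a four-character string
current_year <- format(Sys.Date(), "%Y")

# Replace the trailing four digits, whatever year they hold
updated <- sub("[0-9]{4}$", current_year, dates)

print(updated)
```

Matching all four digits at the end of the string avoids accidentally touching a day or month that happens to be "11".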