Convert numeric values of a data table to dates - r

I want to convert a data table containing numeric values for 305 variables and 361 observations into a data table of the same size containing dates. The data table contains NAs.
The numeric date values have an Excel origin. This is what I tried so far:
Rep_Day_monthly <- as.data.table(sapply(Rep_Day_monthly,as.numeric))
Rep_Day_monthly <- sapply(Rep_Day_monthly,as.Date)
The problem is that the data table still contains numeric values, e.g. 5963 instead of 1986-04-30.
Looking very much forward to your help!
Cheers

as.Date needs an origin (i.e. a date corresponding to 0). For your values the Unix epoch works (5963 days after 1970-01-01 is 1986-04-30), so you could use Rep_Day_monthly <- as.data.table(lapply(Rep_Day_monthly, as.Date, origin = "1970-01-01")). Note that true Excel serial numbers count from 1899-12-30 on Windows, in which case origin = "1899-12-30" would be needed instead.
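For context, sapply() simplifies its result to a plain numeric matrix and strips the Date class, which is why the numbers came back unchanged; an lapply() call wrapped in as.data.table() keeps each column as a Date vector. A minimal sketch with made-up values (not from the post):
library(data.table)
dt <- data.table(a = c(5963, NA), b = c(5964, 5965))            # toy day counts with an NA
dt <- as.data.table(lapply(dt, as.Date, origin = "1970-01-01"))
dt$a   # "1986-04-30" NA -- Date class is kept and NAs stay NA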

Related

Can a column of a data.table have more than one class? I would like to mark the dates of some observations in a date column as 'out-of-bounds'

Can a column of a data.table have more than one class? For example, I have a date column (class = "Date") in my data.table and I would like to change some values of this date column: I would like to mark the dates of some observations as 'out-of-bounds'. When I do this, the dates that should have been changed to 'out-of-bounds' simply become NA. I think this may be because 'out-of-bounds' is a character string while the values in the column are dates, and the column cannot contain a mix of classes. Do you know how I can do this?
It's not really possible: a data.table column is a vector, and a vector can hold only one type.
What is possible is to set specific values to NULL or NA.
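A short sketch of the workaround (the column name and cutoff below are made up): convert the Date column to character first, and then the text label can coexist with the (now character) dates.
library(data.table)
dt <- data.table(d = as.Date(c("2015-08-01", "2200-01-01")))
dt[, d := as.character(d)]                   # a column holds one class only, so convert first
dt[d > "2100-01-01", d := "out-of-bounds"]   # now the label can be stored
# assigning "out-of-bounds" straight into a Date column would have produced NA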

Convert column with dates (as strings) into date type with only the year

I have a dataset (call it df) that has several columns. One of those columns is the column date, which has strings of the form "d-MON-yy" or "dd-MON-yy" depending on if the day number is less than 10 (e.g. 9-Jan-04, 15-Oct-98) or NA.
I am trying to change this to date-type values, but I only need the year. Specifically, all the dates whose yy digits are less than 20 are from this century, and all the dates whose yy digits are greater than or equal to 20 are from the 1900s. I want to end up with the four digits of the year.
Since I am only interested in the year, I don't mind a solution that returns numeric values.
In the end, I'd also like to filter out the rows that have NA in the date variable only.
I am pretty new to R, and I have tried to make it work with several answers I found here to no avail.
Thank you.
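No answer is quoted here, but a base-R sketch under the asker's own century rule (yy below 20 is the 2000s, 20 and above is the 1900s) could look like the following; the column name date comes from the question, everything else is illustrative:
yy <- as.numeric(sub(".*-(\\d{2})$", "\\1", df$date))   # "9-Jan-04" -> 4, NA stays NA
df$year <- ifelse(yy < 20, 2000 + yy, 1900 + yy)        # four-digit numeric year
df <- df[!is.na(df$date), ]                             # drop rows whose date is NA (other columns may still hold NAs)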

Fastest way to assign values in data frame to matrix in R?

I have a very large data frame with Timestamp, StationId and Value as column names.
I would like to create a new matrix where the rows are Timestamps, columns are StationIds and the matrix elements are Values.
I have tried doing so using a loop but it is taking very long:
for (row in 1:nrow(res)) {
  rmatrix[toString(res[row, "Timestamp"]), toString(res[row, "StationId"])] <- res[row, "Value"]
}
The 'res' data frame looks like this. The timestamps span a year at 5-minute intervals. There are 62 unique station ids. The elements in the Value column are rainfall values.
The rmatrix I'm trying to rearrange the data into looks like this. Each row is a unique timestamp at a 5-minute interval. Each column is the id of a station. The elements of the matrix are supposed to be the rainfall value for that station at that time.
Is there a faster way to do this?
library(tidyverse)
df <- res %>% spread(StationId, Value)
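As a hedged follow-up (not part of the original answer): spread() has since been superseded by tidyr::pivot_wider(), and the wide result can be turned into the desired matrix with timestamps as row names. The column names Timestamp, StationId and Value are taken from the question:
library(tidyr)
wide <- pivot_wider(res, names_from = StationId, values_from = Value)
rmatrix <- as.matrix(wide[, -1])                    # drop the Timestamp column
rownames(rmatrix) <- as.character(wide$Timestamp)   # timestamps become row names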

Reading non-rectangular data in R

I have a fairly large data set in csv format that I'd like to read into R. The data is annoyingly structured (my own fault) as follows:
,US912828LJ77,,US912810ED64,,US912828D804,...
17/08/2009,101.328125,15/08/1989,99.6171875,02/09/2014,99.7265625,...
And with the second line style repeated for a few thousand times. The structure is that each pair of columns represents a timeseries of differing lengths (so that the data is not rectangular).
If I use something like
>rawdata <- read.csv("filename.csv")
I get a dataframe with all the blank entries padded with NA, and the odd columns forced to a factor datatype.
What I'd like to ultimately get to is either a set of timeseries objects (for each pair of columns) named after every even entry in the first row (the "US912828LJ77" fields) or a single dataframe with row labels as dates running from the minimum of (min of each odd column) to max of (max of each odd column).
I can't imagine I'm the only mook to put together a dataset in such an unhelpful structure but I can't see any suggestions out there for how to deal with this. Any help would be greatly appreciated!
First, you need to parse every odd column as a date:
odd.cols <- names(rawdata)[seq(1, ncol(rawdata) - 1, 2)]
for (dateCol in odd.cols) {
  rawdata[[dateCol]] <- as.Date(rawdata[[dateCol]], "%d/%m/%Y")
}
Now the problem is straightforward: find the min and max value per date column, create a vector running from the min date to the max date, join it with rawdata, and handle the missing values for your US* columns.
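A rough sketch of that joining step, assuming the date/value columns come in adjacent pairs as in the sample file; the Date column name and the helper objects below are illustrative, not from the answer:
date.range <- range(do.call(c, lapply(rawdata[odd.cols], na.omit)))     # overall span across all date columns
full.dates <- data.frame(Date = seq(date.range[1], date.range[2], by = "day"))
pairs <- lapply(seq_along(odd.cols), function(i) {
  pair <- rawdata[, c(2 * i - 1, 2 * i)]              # one date column and its value column
  names(pair) <- c("Date", names(rawdata)[2 * i])
  merge(full.dates, pair, by = "Date", all.x = TRUE)  # pad each series out to the full date index
})
result <- Reduce(function(a, b) merge(a, b, by = "Date"), pairs)        # one row per date, one column per series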

Create a stack of n subset data frames from a single data frame based on date column

I need to create a bunch of subset data frames out of a single big df, based on a date column (e.g. "Aug 2015" in month-year format). It should work much like the subset function, except that the number of subset dfs formed should change dynamically depending on the values available in the date column.
All the subset data frames need to have the same structure, and the date column should hold one and the same value within each subset df.
Suppose my big df currently has the last 10 months of data; I need 10 subset data frames now, and 11 dfs if I run the same command next month (with 11 months of base data).
I have tried something like below, but after each iteration the subset subdf_i gets overwritten. As a result I end up with only one subset df, which contains only the last value of the month column.
I thought 45 subset dfs like subdf_1, subdf_2, ... and subdf_45 would be created, one for each of the 45 unique values of the month column.
uniqmnth <- unique(df$mnth)
for (i in 1:length(uniqmnth)) {
  subdf_i <- subset(df, mnth == uniqmnth[i])
  i == i + 1
}
I hope there is some option in the subset function, or some kind of loop might do it. I am a beginner in R and not sure how to arrive at this.
The solution is to use assign() so that the iterating variable i gets appended to the name of each of the 45 subsets (thanks to a friend for the note). Here is the loop, which avoids the subset data frame being overwritten on each run:
uniqmnth <- unique(df$mnth)
for (i in 1:length(uniqmnth)) {
  assign(paste("subdf_", i, sep = ""), subset(df, mnth == uniqmnth[i]))
}
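As an aside (not from the original answer), split() returns the same subsets as a named list in a single call, which is often easier to manage than 45 separate objects:
subdf_list <- split(df, df$mnth)   # one data frame per unique month
subdf_list[["Aug 2015"]]           # the subset for a given month, assuming that value exists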
