Adding dates generated by as.POSIXct to an empty data frame - r

I have created a sequence of dates with this script:
dates <- seq(
  from = as.POSIXct("2015-1-1 0", format = "%Y-%m-%d %H", tz = "UTC"),
  to = as.POSIXct("2015-12-31 24", format = "%Y-%m-%d %H", tz = "UTC"),
  by = "hour"
)
Now I want to store the result in the first column of an empty data frame:
df <- data.frame(Date = as.POSIXct(character()), Area = character(), Application = character(), Type = character(), Reading = double())
using this code:
df$Date <- dates
but it gives me this error:
Error in `$<-.data.frame`(`*tmp*`, "Date", value = c(1420070400, 1420074000, :
replacement has 8761 rows, data has 0
Can anyone help me sort out this issue, please?

A data.frame needs columns of equal length: it cannot have one column with 8761 observations while the others have 0. A workaround is to initialize a data.frame with the right dimensions for your data, filled with NA, and then assign the columns.
# Initialize df
df <- data.frame(matrix(NA, nrow = length(dates), ncol = 5))
# Define names of cols and add column
names(df) <- c("Date", "Area", "Application", "Type", "Reading")
df$Date <- dates
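Note that the matrix-based workaround leaves every untouched column as logical NA. As an alternative sketch (not the method of the answer above), the data frame can be built directly from dates, letting data.frame() recycle typed NA placeholders so the other columns keep the types from the question. The date strings below are rewritten on the assumption that the same UTC range is intended.

```r
# Hourly sequence covering 2015 in UTC (same range as in the question)
dates <- seq(
  from = as.POSIXct("2015-01-01 00:00", tz = "UTC"),
  to   = as.POSIXct("2016-01-01 00:00", tz = "UTC"),
  by   = "hour"
)

# Length-1 typed NAs are recycled to the length of `dates`
df <- data.frame(
  Date        = dates,
  Area        = NA_character_,
  Application = NA_character_,
  Type        = NA_character_,
  Reading     = NA_real_,
  stringsAsFactors = FALSE
)

str(df)
```

With this layout, later assignments such as df$Reading[1] <- 3.5 should not need any type conversion.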

Related

How to create a data frame that contains dates and NAs

I have a df with
columns = c("Tim", "Tom", "Peter")
and the rows represent product IDs.
When Tim buys product 1, df[1,1] should be something like 2019-08-01, and the rest of the df is filled with NAs.
I created the df with some values in it. The date values are displayed in numeric form, like 18109, and I tried to convert that to the date "2019-08-01".
df[1,1:5] <- as.Date("18109", format = "%Y-%m-%d")
#Error in as.Date.numeric(value) : 'origin' must be supplied
as.Date(df$Tim, format = "%Y-%m-%d")
#results in a list of NAs and the date is deleted when I am looking for head()
#the code to reproduce is:
#create customer vector
names <- c("Tim", "Tom", "Peter")
ID <- c(1:6)
names(ID) <- names
#matrix
matrix <- matrix(data = NA, nrow = 255, ncol = 6)
colnames(matrix) <- names
df <- data.frame(matrix)
class(df) #class is now data.frame
df[1,1:5] <- as.Date(as.integer("18109"), format = "%Y-%m-%d", origin = "1970-01-01")
#the class is actually numeric and now I cannot transform it to a date
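
One possible way around this, sketched under the assumption that there are three customers and 255 product rows: the NA matrix yields logical columns, and assigning a Date into them appears to drop the class, which matches the numeric result described above. Creating every column as a Date from the start avoids the problem. The names `customers` and `n_products` are illustrative only.

```r
customers  <- c("Tim", "Tom", "Peter")   # assumed customer names
n_products <- 255                        # assumed number of product rows

# One all-NA Date column per customer
df <- as.data.frame(
  setNames(
    replicate(length(customers), rep(as.Date(NA), n_products), simplify = FALSE),
    customers
  )
)

# A day count such as 18109 needs an origin to become a Date
df[1, "Tim"] <- as.Date(18109, origin = "1970-01-01")

class(df$Tim)
head(df, 3)
```

Since the columns are already of class Date, no later conversion with as.Date should be needed.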

How to convert multiple dataframes in a list to xts objects

I have a list consisting of 1000 data frames, each with Date as its first column. I want to convert all of these data frames into xts objects.
I have converted the dates into Date objects using lapply.
I want to convert every data frame to xts in one command rather than one by one, as that would take too much time.
An option is to loop over the list, drop the first column (the 'Date'), and call xts on the remaining columns with order.by set to that first column (assuming the class of the 'Date' column is Date)
library(xts)
lst2 <- lapply(lst1, function(x) xts(x[-1], order.by = x[,1]))
data
set.seed(24)
lst1 <- list(data.frame(Date = seq(as.Date('2015-01-01'),
length.out = 10, by = '1 day'), Col2 = rnorm(10)),
data.frame(Date = seq(as.Date('2017-01-01'),
length.out = 10, by = '1 day'), Col2 = rnorm(10)))

Leading NA is causing "POSIXct" "POSIXt" to be coerced to type integer

I have a column of dates and a grouping column. I am creating next.date and previous.date columns based on the groups, with the following code.
library(data.table)
library(lubridate)
dates <- ymd(c("2014-01-23","2014-01-28","2014-02-22","2014-05-30","2014-07-09","2014-09-08","2014-10-19","2014-11-25","2014-12-05","2014-12-13"))
group <- c("A","B","A","A","B","C","C","A","B","C")
data <- data.table(dates,group)
data <- data[order(dates)]
#Create previous and next dates
#Previous is coerced to type integer
data <- data[, next.date := c(dates[-1], NA),by = .(group)]
data <- data[, previous.date := c(NA,dates[-.N]),by = .(group)]
With a leading NA, as in previous.date, the type is coerced to integer. Question: is it possible to prevent this? I can work around it like this:
data <- data[, previous.date := ymd(c(NA,dates[-.N])),by = .(group)]
This is not optimal on large data.
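
A likely explanation, shown here in base R as a minimal sketch: c() dispatches on its first argument, so a leading plain NA (a logical) selects the default method and the date class is dropped, whereas a leading NA that already carries the class keeps it. The example uses Date values, which is what ymd() returns for these strings; the variable names are illustrative.

```r
dates <- as.Date(c("2014-01-23", "2014-02-22", "2014-05-30"))

# Leading logical NA: default c() is used and the Date class is lost
untyped <- c(NA, dates[-length(dates)])

# Leading Date NA: c.Date() is used and the class is kept
typed <- c(as.Date(NA), dates[-length(dates)])

class(untyped)
class(typed)
```

Inside the grouped call from the question, the same idea would read previous.date := c(as.Date(NA), dates[-.N]), which avoids re-parsing with ymd(). data.table's shift() function may also be worth a look for lag and lead columns.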

Column name in data frame goes back to numeric

Hi, I'm trying to use some dates as column names of my data frame, but they only appear as numbers (11595, for example) even if I force them with as.Date.
Is there another way to do this? Thanks!
dates <- seq(as.Date("2000-1-1"), as.Date("2018-10-1"), by="3 months") -1
d.test <- data.frame(matrix(0, ncol = 8, nrow = 8))
for (i in 1:8) {
colnames(d.test)[i] <- as.Date(dates[i], "yyyy-mm-dd")
}
Convert the dates with as.character, and avoid for loops when they are not necessary. See ?colnames or ?rownames: the value assigned should be a character vector.
dates <- seq(as.Date("2000-1-1"), as.Date("2018-10-1"), by="3 months") -1
d.test <- data.frame(matrix(0, ncol = 8, nrow = 8))
colnames(d.test) <- as.character(dates)[1:ncol(d.test)]
See ?colnames. Values for column names should be a character vector and will be coerced using as.character(). Column names are just labels, not variable types.
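
Because column names are stored as text, the dates can be recovered later by parsing the names again. A small sketch, using format() to make the label format explicit (the `recovered` name is only for illustration):

```r
# Quarter-end dates, eight of them to match the eight columns
dates <- seq(as.Date("2000-01-01"), by = "3 months", length.out = 8) - 1
d.test <- data.frame(matrix(0, ncol = 8, nrow = 8))

# Store the dates as text labels in an explicit format
colnames(d.test) <- format(dates, "%Y-%m-%d")

# Parse the labels back into Date objects when they are needed as dates
recovered <- as.Date(colnames(d.test))

colnames(d.test)
class(recovered)
```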

Problems with casting a dataframe with text columns

I have this text dataframe with all columns being character vectors.
Gene.ID barcodes value
A2M TCGA-BA-5149-01A-01D-1512-08 Missense_Mutation
ABCC10 TCGA-BA-5559-01A-01D-1512-08 Missense_Mutation
ABCC11 TCGA-BA-5557-01A-01D-1512-08 Silent
ABCC8 TCGA-BA-5555-01A-01D-1512-08 Missense_Mutation
ABHD5 TCGA-BA-5149-01A-01D-1512-08 Missense_Mutation
ACCN1 TCGA-BA-5149-01A-01D-1512-08 Missense_Mutation
How do I build a dataframe from this using reshape/reshape2, such that I get a dataframe of the format Gene.ID~barcodes, where each cell holds the text from the value column for that combination, with "NA" or "WT" as a filler where there is none?
The default aggregation function keeps defaulting to length, which I want to avoid if possible.
I think this will work for your problem. First, I'm generating some data similar to yours. I'm making gene and barcode factors for simplicity; this should behave the same as your data.
geneNames <- c(paste("gene", 1:10, sep = ""))
data <- data.frame(gene = as.factor(c(1:10, 1:4, 6:10)),
express = sample(c("Silent", "Missense_Mutation"), 19, TRUE),
barcode = as.factor(c(rep(1, 10), rep(2, 9))))
The vector geneNames holds the gene names (e.g., A2M). To get NA values where a barcode has no entry for a given gene, you need to merge the data so that you have number_of_genes by number_of_barcodes rows.
geneID <- unique(data$gene)
data2 <- data.frame(barcode = rep(unique(data$barcode), each = length(geneID)),
gene = geneID)
data3 <- merge(data, data2, by = c("barcode", "gene"), all.y = TRUE)
Now melt and cast the data:
library(reshape)
mdata3 <- melt(data3, id.vars = c("barcode", "gene"))
cdata <- cast(mdata3, barcode ~ variable + gene, identity)
names(cdata) <- c("barcode", geneNames)
You should then have a data frame with number_of_barcodes rows and with (number_of_unique_genes + 1) columns. Each column should contain the expression information for that particular gene in that particular sample barcode.
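
As an alternative that avoids aggregation functions altogether, here is a base R sketch (not the reshape approach above) using the rows from the question: pre-fill a character matrix with the filler and write the values in by name. It assumes each Gene.ID/barcode pair occurs at most once; with duplicates, the last value would win. The names `mut`, `genes`, `bars` and `wide` are illustrative.

```r
mut <- data.frame(
  Gene.ID  = c("A2M", "ABCC10", "ABCC11", "ABCC8", "ABHD5", "ACCN1"),
  barcodes = c("TCGA-BA-5149-01A-01D-1512-08", "TCGA-BA-5559-01A-01D-1512-08",
               "TCGA-BA-5557-01A-01D-1512-08", "TCGA-BA-5555-01A-01D-1512-08",
               "TCGA-BA-5149-01A-01D-1512-08", "TCGA-BA-5149-01A-01D-1512-08"),
  value    = c("Missense_Mutation", "Missense_Mutation", "Silent",
               "Missense_Mutation", "Missense_Mutation", "Missense_Mutation"),
  stringsAsFactors = FALSE
)

genes <- sort(unique(mut$Gene.ID))
bars  <- sort(unique(mut$barcodes))

# Gene x barcode matrix pre-filled with the "WT" filler
wide <- matrix("WT", nrow = length(genes), ncol = length(bars),
               dimnames = list(genes, bars))

# A two-column character matrix indexes cells by row and column name
wide[cbind(mut$Gene.ID, mut$barcodes)] <- mut$value

wide_df <- as.data.frame(wide, stringsAsFactors = FALSE)
```

Swapping "WT" for NA_character_ in the matrix() call would give NA as the filler instead.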
