How to create a data frame that contains dates and NAs - r

I have a data frame (df) with
columns = c("Tim", "Tom", "Peter")
and each row represents a productID.
When Tim buys product 1, df[1,1] should be something like 2019-08-01, and the rest of the df should be filled with NAs.
I created the df with some values in it. The date values are displayed in numeric form, like 18109, and I tried to transform them into dates such as "2019-08-01".
df[1,1:5] <- as.Date("18109", format = "%Y-%m-%d")
#Error in as.Date.numeric(value) : 'origin' must be supplied
as.Date(df$Tim, format = "%Y-%m-%d")
#results in a list of NAs and the date is deleted when I am looking for head()
#the code to reproduce is:
#create customer vector
names <- c("Tim", "Tom", "Peter")
ID <- c(1:6)
names(ID) <- names
#matrix
matrix <- matrix(data = NA, nrow = 255, ncol = 6)
colnames(matrix) <- names
df <- data.frame(matrix)
class(df) #class is now data.frame
df[1,1:5] <- as.Date(as.integer("18109"), format = "%Y-%m-%d", origin = "1970-01-01")
#the class is actually numeric and now I can not transform it to date
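A minimal sketch of one possible approach, not a definitive answer: a matrix of NA produces logical columns, so this sketch builds each column as a Date vector of NAs from the start. The row count `n_products` is a hypothetical value chosen for illustration.

```r
# Customer names become the columns; rows stand for product IDs.
customers <- c("Tim", "Tom", "Peter")
n_products <- 5  # hypothetical number of products, for illustration only

# Build every column as a Date vector of NAs so the class is Date from the start.
df <- data.frame(
  lapply(setNames(customers, customers),
         function(x) rep(as.Date(NA), n_products))
)

# A day count such as 18109 needs an origin to become a date.
df[1, "Tim"] <- as.Date(18109, origin = "1970-01-01")

class(df$Tim)  # "Date"
head(df)
```

If a column already holds day counts as plain numbers, `as.Date(df$Tim, origin = "1970-01-01")` should convert the whole column in one step; the `format` argument only applies when the input is a character string.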

Related

Passing dataframe as argument to function

I am writing a function to process data from a huge dataframe (row by row) which always has the same column names. So I want to pass the dataframe itself as an argument to a function that reads out the information I need from the individual rows. However, when I try to use it as an argument, I can't read the information from it for some reason.
Dataframe:
DF <- data.frame("Name" = c("A","B"), "SN" = 1:2, "Age" = c("21,34,456,567,23,123,34", "15,345,567,3,23,45,67,76,34,34,55,67,78,3"))
My code:
List <- do.call(list, Map(function(DT) {
  DT <- as.data.frame(DT)
  aa <- as.numeric(strsplit(DT$Age, ","))
  mean.aa <- mean(aa)
},
DF))
Trying this, I get a list with the column names, but all values are NULL.
Expected output:
My expected output is a list with length equal to the number of rows in the data frame. Under each list index there should be another list with the age of the corresponding row (and also other stuff from the same row of the data frame, later).
DF <- apply(data.frame("Name" = c("A","B"), "SN" = 1:2, "Age" = c("21,34,456,567,23,123,34", "15,345,567,3,23,45,67,76,34,34,55,67,78,3"), "mean.aa" = c(179.7143, 100.8571)), 1, as.list)
What am I doing wrong?
Here is one way:
DF <- data.frame("Name" = c("A","B"), "SN" = 1:2, "Age" = c("21,34,456,567,23,123,34", "15,345,567,3,23,45,67,76,34,34,55,67,78,3"))
apply(DF, 1, function(row) {
  aa <- as.numeric(strsplit(row["Age"], ",")[[1]])
  row["mean.aa"] <- mean(aa)
  as.list(row)
})
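As a hedged alternative sketch: `Map()` over a data frame walks its columns rather than its rows, and `apply()` first converts the data frame to a character matrix, so every value (including `SN` and the mean) comes back as a string. Iterating over row indices with `lapply()` keeps the original column types. The variable name `rows` is just an illustrative choice.

```r
DF <- data.frame(
  Name = c("A", "B"),
  SN = 1:2,
  Age = c("21,34,456,567,23,123,34",
          "15,345,567,3,23,45,67,76,34,34,55,67,78,3"),
  stringsAsFactors = FALSE  # keep Age as character on R versions before 4.0
)

# One list per row; each element keeps the type of its source column.
rows <- lapply(seq_len(nrow(DF)), function(i) {
  row <- as.list(DF[i, ])
  aa <- as.numeric(strsplit(row$Age, ",")[[1]])
  row$mean.aa <- mean(aa)
  row
})

str(rows[[1]])
```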

How to insert dates from one data frame into another?

I have a data frame with two columns, one containing dates, the other numbers. My goal is to insert dates from another data frame into the date column. Here is an example:
df <- data.frame(rep(as.Date("2001-01-01", origin = "1970-01-01"), 3),
c(1, 2, 3),
stringsAsFactors = F)
ins <- data.frame(rep(as.Date("1999-01-01", origin = "1970-01-01"), 3),
c(1, 2, 3),
stringsAsFactors = F)
The data frame I want to obtain is:
> df_goal
dates numbers
1 1999-01-01 1
2 2001-01-01 2
3 2001-01-01 3
I tried df[1, ] <- c(ins[1, 1], ins[1, 2]), but I got the following error:
Error in as.Date.numeric(value) : 'origin' must be supplied
However, if I omit the numeric column in df, it works:
df <- data.frame(rep(as.Date("2001-01-01"), 3),
stringsAsFactors = F)
ins <- data.frame(rep(as.Date("1999-01-01"), 3),
c(1, 2, 3),
stringsAsFactors = F)
df[1, ] <- ins[1, 1]
How to get the first case (df with two columns) working?
Don't use c -- it transforms its arguments so they have the same class.
In this case, c(ins[1, 1], ins[1, 2]) makes a date vector; and when this is assigned onto the second column of df, R tries to coerce that column to date to make sense of the assignment, like as.Date(c(1, 2, 3)).
You can instead do df[1,] <- ins[1, c(1,2)].
Side note: Don't do this sort of insertion based on row numbers; there must be a better way to achieve what you're after, like a join/merge.
Alternatively:
df2 <- rbind(ins[1,], df[2:3,])
or
df2 <- df
df2[1,] <- ins[1,]
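A small self-contained sketch of the row assignment suggested above. The column names `dates` and `numbers` are taken from the `df_goal` printout in the question; the rest mirrors the question's data.

```r
df <- data.frame(dates = rep(as.Date("2001-01-01"), 3),
                 numbers = c(1, 2, 3))
ins <- data.frame(dates = rep(as.Date("1999-01-01"), 3),
                  numbers = c(1, 2, 3))

# Assign a one-row data frame, not a c() vector, so each column keeps its class.
df[1, ] <- ins[1, ]

df
sapply(df, class)
```

Because the right-hand side is itself a data frame, the Date column and the numeric column are matched column by column instead of being flattened into a single vector first.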

Create a new row in a dataframe, one element is a factor, the other numeric

I am working on doing some fairly basic descriptive statistics for a large group of data. I have written a function to try and get the statistics that I need.
I want to create a new row at the bottom of a dataframe, one element of which is a factor ("total"), and the other element of which is numeric (sum of the other rows).
Here is an example of this code:
Create the dataframe
df <- data.frame(
pop = c(201:250),
age = factor(rep(c("20-29", "30-39", "40-49", "50-59", "60-69"), 10)),
year = factor(rep(c(2012, 2013, 2014, 2015, 2016), 10)) )
Write a function to do the aggregation
DiabMort_fun <- function(VDRpop, VDRage, nyrs, nrows) {
  Aggregate_fun <- function(pop, ag1, nyrs, nrows, names_list) {
    popbylist <- data.frame(aggregate(pop, by = list(Category = ag1), FUN=sum))
    popbylist$mean <- (popbylist$x / nyrs)
    colnames(popbylist) = names_list
    popbylist[nrows,] <- c("total", sum(popbylist[2]), sum(popbylist[3]))
    return(popbylist)
  }
  VDRbyage <- Aggregate_fun(pop = VDRpop, ag1 = VDRage, nyrs = nyrs, nrows = nrows,
                            names_list = c("Age", "Num_pop_VDR", "Mean_pop_VDR"))
  return(VDRbyage)
}
Run this function
test <- DiabMort_fun(VDRpop = df$pop, df$age,
nyrs = 5, nrows = 5)
When I run this, I get the following warning message:
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "total") :
invalid factor level, NA generated
The "total" row is now c(NA, 11275, 2255)
What I would like it to be is c("total", 11275, 2255)
Does anyone know how to create a new row in this function which will expand the factor levels to include "total"? The relevant code within the function is:
popbylist[nrows,] <- c("total", sum(popbylist[2]), sum(popbylist[3]))
Thanks
You shouldn't need to make the age and year columns factors; if you skip that step, and set stringsAsFactors = FALSE in the first data.frame() call, your function should work.
If you really want to keep the present order and data types, you can turn the summary row into a 1-row dataframe, then bind that to the other frame. Just make sure the column names match:
temp <- data.frame("total", sum(popbylist[2]), sum(popbylist[3]))
colnames(temp) = names_list
popbylist <- rbind(popbylist, temp)
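To see the rbind approach in isolation, here is a minimal sketch outside the function. The two age groups and their counts are made-up illustration values, not results from the question's data.

```r
# Hypothetical aggregated table with a factor column.
popbylist <- data.frame(
  Age = factor(c("20-29", "30-39")),
  Num_pop_VDR = c(100, 200),
  Mean_pop_VDR = c(20, 40)
)

# Build the summary as its own one-row data frame with matching column names.
total_row <- data.frame(
  Age = "total",
  Num_pop_VDR = sum(popbylist$Num_pop_VDR),
  Mean_pop_VDR = sum(popbylist$Mean_pop_VDR)
)

# rbind() on data frames extends the factor levels instead of producing NA.
result <- rbind(popbylist, total_row)
result
```

Unlike assigning `c("total", ...)` into a row, this keeps the two numeric columns numeric, since `c()` would coerce every value to character.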

Keep getting a column of <NA> in my dataframe where characters should be in R?

So I'm trying to paste together two values to create a character ID for each value in a dataframe in R. This data frame has a letter for the group the value is in, and then the ID number within that group. I want the format of ID 1 in Group A to be "1:A", so I created a new column in my df as follows:
df = data.frame(FirstID = numeric(), SecondID = numeric(), Distance = numeric(), CharID = character())
And then I loop through my data and fill the data frame.
df[counter, 1] <- #value
df[counter, 2] <- #value
df[counter, 3] <- #value
numChar <- as.character(#Data$ID[i])
df[counter, 4] <- paste(numChar, "A", sep = ":", collapse = NULL)
But apparently, when I try to run this function, I get 50 warnings saying
Warning messages:
1: In `[<-.factor`(`*tmp*`, iseq, value = "1:A") :
invalid factor level, NA generated
...
Any ideas on what I may be doing wrong here?
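One likely cause, assuming an R version before 4.0.0 (where data.frame() defaulted to stringsAsFactors = TRUE): the empty `CharID = character()` column is stored as a factor with no levels, so every new string is an invalid level and becomes NA. A minimal sketch of a workaround follows; the `ids` and `groups` vectors are hypothetical stand-ins for the question's data.

```r
ids <- c(1, 2, 3)            # hypothetical ID numbers
groups <- c("A", "A", "B")   # hypothetical group letters

# paste() is vectorized, so the whole column can be built without a loop.
df <- data.frame(
  FirstID = ids,
  CharID = paste(ids, groups, sep = ":"),
  stringsAsFactors = FALSE   # keep CharID as character, not factor
)

# Later assignments of new strings now work without warnings.
df[2, "CharID"] <- "7:C"

class(df$CharID)
df
```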

How to add dates generated with as.POSIXct to an empty dataframe

I have created a sequence of dates with this script:
dates <- seq(
  from = as.POSIXct("2015-1-1 0", format = "%Y-%m-%d %H", tz = "UTC"),
  to = as.POSIXct("2015-12-31 24", format = "%Y-%m-%d %H", tz = "UTC"),
  by = "hour"
)
Now I want to store the result to the first column of empty dataframe:
df<-data.frame(Date=as.POSIXct(character()),Area=character(), Application=character(), Type= character(),
Reading=double())
using this code
df$Date<-dates
but it gives me an error:
Error in `$<-.data.frame`(`*tmp*`, "Date", value = c(1420070400, 1420074000, :
replacement has 8761 rows, data has 0
Can anyone help me to sort out this issue please?
A data.frame needs columns of equal length: it cannot have one column containing 8761 observations while the rest contain 0. A workaround is to initialize a data.frame with the correct dimensions for your data, filled with NA, and then assign columns.
# Initialize df
df <- data.frame(matrix(NA, nrow = length(dates), ncol = 5))
# Define names of cols and add column
names(df) <- c("Date", "Area", "Application", "Type", "Reading")
df$Date <- dates
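As a hedged variation on the same idea, the data frame can be built directly from the date vector, with typed NA placeholders that are recycled to the right length. This sketch uses a shortened six-hour range purely to keep the example small.

```r
# Shortened range for illustration; the question's full year works the same way.
dates <- seq(
  from = as.POSIXct("2015-01-01 00:00", tz = "UTC"),
  to   = as.POSIXct("2015-01-01 05:00", tz = "UTC"),
  by   = "hour"
)

# Length-one NA values are recycled to match the length of dates.
df <- data.frame(
  Date = dates,
  Area = NA_character_,
  Application = NA_character_,
  Type = NA_character_,
  Reading = NA_real_,
  stringsAsFactors = FALSE
)

str(df)
```

Using `NA_character_` and `NA_real_` rather than plain `NA` gives each column its intended type up front, whereas a matrix of NA yields logical columns.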
