I have created a list of intervals with the code below.
library(lubridate)
date1 <- ymd_hms("2000-01-01 05:30:00",tz = "US/Eastern")
shifts <- lapply(0:14, function(x){
mapply(function(y,z){
interval((date1+days(x)+minutes(y)), (date1+days(x)+minutes(y+z)))
}, y = c(0,150,390,570,690,810,1050), z = c(600,570,600,600,600,600,600), SIMPLIFY = FALSE)
})
I have another data set, df, with 105 columns.
I am trying to use the shifts intervals as its column names.
But the format changes. I want the column names to look the same as the shifts intervals. This is what I tried:
list <- unlist(shifts, recursive = FALSE)
colnames(df)<-as.date(list)
This fails because the elements of list are still Interval objects. If you want to use the contents of these intervals as column names, you need to convert them to a list of character strings like this:
list <- unlist(shifts, recursive = FALSE) # flatten to one list of intervals
dmy <- list()
for (i in seq_along(list)) {
  foo <- as.character(list[[i]]) # character form of one interval
  dmy <- append(dmy, foo)
}
colnames(df) <- dmy # list of characters
Output:
> class(list[[1]])
[1] "Interval"
attr(,"package")
[1] "lubridate"
> class(dmy[[1]])
[1] "character"
Now you should be able to rename columns of df :)
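If you want more control over how the names look, here is a minimal sketch that builds each label from the interval's start and end. The helper name interval_label, the label format, and the small two-column df are assumptions for illustration, not part of the original code:

```r
library(lubridate)

date1 <- ymd_hms("2000-01-01 05:30:00", tz = "US/Eastern")

# Two example intervals, a shortened stand-in for the full shifts list
ivs <- list(
  interval(date1, date1 + minutes(600)),
  interval(date1 + minutes(150), date1 + minutes(720))
)

# Hypothetical helper: one character label per interval
interval_label <- function(iv) {
  paste(format(int_start(iv), "%Y-%m-%d %H:%M"),
        format(int_end(iv), "%Y-%m-%d %H:%M"),
        sep = " -- ")
}

# vapply guarantees a plain character vector with one entry per interval
labels <- vapply(ivs, interval_label, character(1))

# Stand-in data frame with as many columns as there are intervals
df <- data.frame(a = 1:3, b = 4:6)
colnames(df) <- labels
```

Because labels is an ordinary character vector, it can be assigned to colnames(df) directly as long as its length matches the number of columns.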
Related
Coming from Python + Pandas, I tried to convert a column in an R data.frame.
In Python/Pandas I'd do it like so: df[['weight']] = df[['weight']] / 1000
In R I came up with this:
convertWeight <- function(w) {
return(w/1000)
}
df$weight <- lapply(df$weight, convertWeight)
I know of the library dplyr which has the function mutate. That would allow me to transform columns as well.
Is there a different way to mutate a column without using the dplyr library? Something that comes close to Pandas way of doing it?
EDIT
Inspecting the values in df$weight I see this:
> df[1,]
date weight
1 1.552954e+12 84500.01
> typeof(df[1,1])
[1] "list"
> df$weight[1]
[[1]]
[1] 84500.01
> typeof(df$weight[1])
[1] "list"
This is neither a number, nor a char. Why list?
Btw: I have the data from a json import like so:
library(rjson)
json_data <- fromJSON(file = "~/Desktop/weight.json")
# convert json data to data.frame
df <- as.data.frame(do.call("rbind", json_data))
# select only two columns
df <- df[c("date", "weight")]
# now I converted the `date` value from posix to date
# and the weight value from milli grams to kg
# ...
Obviously I have a lot to learn about R.
df$weight = as.numeric(df$weight)
df$weight = df$weight / 1000
# If you would like to eliminate the display of scientific notation:
options(scipen = 999)
# If having difficulty still because of the list, try
df = as.data.frame(df)
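As for why the column is a list: a minimal sketch, assuming the parsed JSON is a list of records that are themselves lists (the sample values below are made up). rbind-ing such records yields a matrix of list cells, so every column of the resulting data frame is a list. Flattening the columns with unlist gives ordinary vectors, after which the arithmetic works much like in Pandas:

```r
# Stand-in for the parsed JSON: a list of records, each itself a list
json_data <- list(
  list(date = 1552954000000, weight = 84500.01),
  list(date = 1553040400000, weight = 84100.00)
)

df <- as.data.frame(do.call("rbind", json_data))
typeof(df$weight)  # a list column at this point

# Flatten every list column into an ordinary atomic vector
df[] <- lapply(df, unlist)
typeof(df$weight)  # now a plain numeric (double) vector

# Vectorised arithmetic on the whole column, no lapply needed
df$weight <- df$weight / 1000
```

This assumes every record has exactly one value per field; records with missing or nested fields would need extra handling before unlist.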
I am using RStudio for my R sessions and I have the following R code:
library(dplyr)     # for %>% and mutate()
library(lubridate) # for ymd()
d1 <- read.csv("mydata.csv", stringsAsFactors = FALSE, header = TRUE)
d2 <- d1 %>%
  mutate(PickUpDate = ymd(PickUpDate))
str(d2$PickUpDate)
The output of the last line of code above is as follows:
Date[1:14258], format: "2016-10-21" "2016-07-15" "2016-07-01" "2016-07-01" "2016-07-01" "2016-07-01" ...
I need to add a column (let's call it MthDD) to the data frame d2 that holds the month and day of the PickUpDate column. So column MthDD needs to be in the format mm-dd but, most importantly, it should still be of the Date type.
How can I achieve this?
UPDATE:
I have tried the following but it outputs the new column as a character type. I need the column to be of the date type so that I can use it as the x-axis component of a plot.
d2$MthDD <- format(as.Date(d2$PickUpDate), "%m-%d")
Date objects do not display as mm-dd. You can create a character string with that representation but it will no longer be of Date class -- it will be of character class.
If you want an object that displays as mm-dd and still acts like a Date object what you can do is create a new S3 subclass of Date that displays in the way you want and use that. Here we create a subclass of Date called mmdd with an as.mmdd generic, an as.mmdd.Date method, an as.Date.mmdd method and a format.mmdd method. The last one will be used when displaying it. mmdd will inherit methods from Date class but you may still need to define additional methods depending on what else you want to do -- you may need to experiment a bit.
as.mmdd <- function(x, ...) UseMethod("as.mmdd")
as.mmdd.Date <- function(x, ...) structure(x, class = c("mmdd", "Date"))
as.Date.mmdd <- function(x, ...) structure(x, class = "Date")
format.mmdd <- function(x, format = "%m-%d", ...) format(as.Date(x), format = format, ...)
DF <- data.frame(x = as.Date("2018-03-26") + 0:2) # test data
DF2 <- transform(DF, y = as.mmdd(x))
giving:
> DF2
x y
1 2018-03-26 03-26
2 2018-03-27 03-27
3 2018-03-28 03-28
> class(DF2$y)
[1] "mmdd" "Date"
> as.Date(DF2$y)
[1] "2018-03-26" "2018-03-27" "2018-03-28"
Try using this:
PickUpDate2 <- format(PickUpDate,"%m-%d")
PickUpDate2 <- as.Date(PickUpDate2, "%m-%d")
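This should work, and you should be able to bind_cols afterwards, or just add it to the data frame right away, as in the code you provided. Note that as.Date fills in the missing year with the current one, so the result is a Date that still prints as yyyy-mm-dd. Applied to your data frame, the code becomes: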
d2$PickUpDate2 <- format(d2$PickUpDate,"%m-%d")
d2$PickUpDate2 <- as.Date(d2$PickUpDate2, "%m-%d")
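To see what the round trip actually produces, here is a small sketch on two made-up dates (the vector d is an assumption standing in for d2$PickUpDate):

```r
d <- as.Date(c("2016-10-21", "2016-07-15"))

md <- format(d, "%m-%d")       # character strings such as "10-21"
back <- as.Date(md, "%m-%d")   # Date again, year taken from today's date

class(md)
class(back)

# The month and day survive, but the original year (2016) is gone
format(back, "%m-%d")
format(back, "%Y") == format(Sys.Date(), "%Y")
```

So the new column is a Date, which is enough for a date axis, but all values are moved into the current year. A value of "02-29" would become NA whenever the current year is not a leap year.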
The following dataframe:
df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)
...is split by row names according to the common indices found after "_", and I release the resulting data frames from the list to the global environment:
list_all <- split(df,sub(".+_","",rownames(df)))
list2env(list_all,envir=.GlobalEnv)
Many of my data frames now have numeric names and cannot be addressed easily, so I want to change their names. I'd like to add "df_" to every name, but since I don't know how to do it, I was told make.names could be useful. I create a vector of all unique indices and convert it to a factor, which I think maintains the original order of the indices:
indx <- gsub(".*_", "", names(df))
indx1 <- factor(indx, levels=unique(indx))
new.names <- make.names(unique(indx1))
new.names
[1] "X1" "p" "o1"
new.names is in the order I want it to be. I apply the new names to the list and release it to the environment:
list_all <- setNames(list_all, new.names)
list2env(list_all,envir=.GlobalEnv)
Now the numeric names have a leading X added (nice!), but the sequence of the data frames has changed and the names have been assigned wrongly (data frame p now contains all rows with "o1" and vice versa).
Questions:
Is there an easy way to add strings to object names of the same class in a workspace?
If I go the make.names route, how can I make absolutely sure that the data frames in list_all are named in the same order as in new.names?
Thank you!
Why not simply use this, right after creating list_all:
names(list_all) = paste0("df_", names(list_all))
list2env(list_all,envir=.GlobalEnv)
#> df_1
# c_1 d_1 e_1 a_p b_p c_p 1_o1 2_o1 3_o1
#c_1 1.10388982 -0.2329471 -0.3330288 -2.0477186 -1.4576052 1.5411154 -0.9529714 0.289516457 -0.01017546
#d_1 -1.02420662 -0.1002591 -0.7884373 1.5021531 0.3551084 0.7755127 0.7679464 -0.002950944 -0.69849456
#e_1 -0.02004774 -0.1873947 -0.3674220 0.7321503 0.9076226 -0.4997974 -0.2915408 -1.376529597 -1.43563284
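On the second question: split() orders its groups by the sorted factor levels of the grouping vector, not by order of appearance, which is why a separately built new.names can end up misaligned. A minimal sketch of the make.names route that avoids this by deriving the new names from the list's own names (set.seed is only there to make the dummy data reproducible):

```r
set.seed(1)
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
row.names(df) <- names(df)

list_all <- split(df, sub(".+_", "", rownames(df)))
names(list_all)  # groups come back sorted, not in order of appearance

# Build the new names from the list's current names, so the order always matches
names(list_all) <- make.names(names(list_all))
names(list_all)

rownames(list_all$p)  # still the rows ending in "_p"
```

The same idea is what makes the paste0("df_", names(list_all)) version safe: the names are transformed in place instead of being matched against an independently ordered vector.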
Here's a function that I think does what you want:
# dummy data:
x <- numeric(0)
y <- numeric(0)
z <- numeric(0)
df1 <- data.frame()
df2 <- data.frame()
df3 <- data.frame()
df4 <- data.frame()
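renameObjects <- function(env = .GlobalEnv, class, pfx) {
  objs <- ls(envir = env) # names of all objects in env
  # TRUE for every object that inherits from the requested class
  matches <- vapply(objs, function(x) inherits(get(x, envir = env), class), logical(1))
  for (obj in objs[matches]) {
    assign(paste0(pfx, obj), get(obj, envir = env), envir = env)
  }
  rm(list = objs[matches], envir = env) # drop the objects under their old names
}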
# run the function
renameObjects(class='data.frame', pfx = 'my_prefix_')
Results
> ls()
[1] "df1" "df2" "df3" "df4"
[5] "renameObjects" "x" "y" "z"
> renameObjects(class='data.frame', pfx = 'my_prefix_')
> ls()
[1] "my_prefix_df1" "my_prefix_df2" "my_prefix_df3" "my_prefix_df4"
[5] "renameObjects" "x" "y" "z"
This seems like a simple enough function to write, but I think I'm misunderstanding the requirements for formal arguments / how R parses and evaluates a function.
I'm trying to write a function that converts any character vector of the form "%m/%d/%Y" (and belonging to data.frame df) to a date vector, and formats it as "%m/%d/%Y", as follows:
dateformat <- function(x) {
df$x <- (format(as.Date(df$x, format = "%m/%d/%Y"), "%m/%d/%Y"))
}
I was thinking that...
dateformat(a)
... would just take the "a" as the actual argument for x and plug it into the function, thus resolving as:
df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))
However, I get the following error when running dateformat(a):
Error in as.Date.default(df$x, format = "%m/%d/%Y") :
do not know how to convert 'df$x' to class “Date”
Can someone please explain why my understanding of formal/actual arguments and/or R function parsing/evaluation is incorrect? Thank you.
Update
Of course, for all the variables I want to convert to dates (e.g., df$a, df$b, df$c), I could just write
df$a <- (format(as.Date(df$a, format = "%m/%d/%Y"), "%m/%d/%Y"))
df$b <- (format(as.Date(df$b, format = "%m/%d/%Y"), "%m/%d/%Y"))
df$c <- (format(as.Date(df$c, format = "%m/%d/%Y"), "%m/%d/%Y"))
But I'm looking to improve my coding skills by making a more general function to which I could feed a vector of variables. For instance, what if I had df$a to df$z, all character variables that I wanted to convert to date variables? After I write a proper function, I'd like to then perhaps run it like so:
for (n in letters) {
dateformat(n)
}
First, the format(...) function returns a character vector, not a date, so if x is a string,
format(as.Date(x, format = "%m/%d/%Y"), "%m/%d/%Y")
converts x to date and then back to character, as in:
result <- format(as.Date("01/03/2014", format = "%m/%d/%Y"), "%m/%d/%Y")
result
# [1] "01/03/2014"
class(result)
# [1] "character"
Second, assigning to an object such as df inside a function (that is, using it on the left-hand side of <-) makes R create a local copy of that object in the scope of the function; the object in the global environment is left unchanged.
a <- 2
f <- function(x) a <- x
f(3)
a
# [1] 2
Here, we set a variable, a, to 2. Then in the function we create a new variable, a in the scope of the function, set it to x (3), and destroy it when the function returns. So in the global environment a is still 2.
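There is a third point worth spelling out: df$x always looks for a column literally named "x"; it does not substitute the value of the argument x. To select a column by a name held in a variable, use [[ ]]. A minimal sketch, with a made-up data frame and a hypothetical helper to_date that returns the modified copy instead of relying on a global df:

```r
df <- data.frame(a = c("01/03/2014", "02/14/2014"))
col <- "a"

df$col     # NULL: there is no column literally named "col"
df[[col]]  # the contents of column "a"

# Hypothetical helper: take the data frame and a column name, return the changed copy
to_date <- function(data, col) {
  data[[col]] <- as.Date(data[[col]], format = "%m/%d/%Y")
  data
}

df <- to_date(df, "a")
class(df$a)
```

This also explains the error message in the question: df$x is NULL, and as.Date does not know how to convert NULL.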
If you insist on using a dateformat(...) function, this should work:
df <- data.frame(a=paste("01",1:10,"2014",sep="/"),
b=paste("02",11:20,"2014",sep="/"),
c=paste("03",21:30,"2014",sep="/"))
dateformat <- function(x) as.Date(df[[x]], format = "%m/%d/%Y")
for (n in letters[1:3]) df[[n]] <- dateformat(n)
sapply(df,class)
# a b c
# "Date" "Date" "Date"
This will be more efficient though:
df <- as.data.frame(lapply(df,as.Date,format="%m/%d/%Y"))
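If only some of the columns hold dates, the same idea can be restricted to them. A sketch, assuming a made-up data frame and a vector date_cols that names the columns to convert:

```r
df <- data.frame(a = c("01/03/2014", "02/14/2014"),
                 b = c("03/21/2014", "03/22/2014"),
                 id = c(1, 2))

date_cols <- c("a", "b")  # assumed: names of the columns to convert

# Replace just those columns; the others are left untouched
df[date_cols] <- lapply(df[date_cols], as.Date, format = "%m/%d/%Y")

sapply(df, class)
```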
I'm retrieving one-minute quotes from Google. After processing the data I try to create an xts object with one-minute intervals, but I get the same datetime repeated several times and don't understand why. Note that if I use the same data to build a vector of timestamps called my.dat2, it does work.
library(xts)
url <- 'https://www.google.com/finance/getprices?q=IBM&i=60&p=15d&f=d,o,h,l,c,v'
x <- read.table(url,stringsAsFactors = F)
mynam <- unlist(strsplit(unlist(strsplit(x[5,], split='=', fixed=TRUE))[2] , split=','))
interv <- as.numeric(unlist(strsplit(x[4,], split='=', fixed=TRUE))[2])
x2 <- do.call(rbind,strsplit(x[-(1:7),1],split=','))
rownames(x2) <- NULL
colnames(x2) <- mynam
ind <- which(nchar(x2[,1])>5)
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
# Convert the character matrix to numeric
class(x2) <- 'numeric'
my.dat <- rep(0,nrow(x2))
#Convert all to same format
for (i in 1:nrow(x2)) {
if (nchar(x2[i,1])>5) {
ini.dat <- x2[i,1]
my.dat[i] <- ini.dat
} else {
my.dat[i] <- ini.dat+interv*x2[i,1]
}
}
df <- xts(x2[,-1],as.POSIXlt(my.dat, origin = '1970-01-01'))
head(df,20)
my.dat2 <- as.POSIXlt(my.dat, origin = '1970-01-01')
head(my.dat2,20)
I tried a simpler example simulating the data and creating a sequence of dates by minute to create the xts object and it worked so it must be something that I'm missing when passing the dates to the xts function.
Your my.dat object contains duplicated values. Because xts and zoo objects must be ordered by their index, all the duplicated timestamps end up grouped together.
The issue is this line, which takes only the second element of the whole unlisted result rather than the second element of each split:
x2[ind,1] <- unlist(strsplit(x2[ind,1], split='a', fixed=TRUE))[2]
# this should be
x2[ind,1] <- sapply(strsplit(x2[ind,1], split='a', fixed=TRUE), "[[", 2)
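The difference is easy to see on a small example (the two stamps below are made-up values in the same "a" + epoch-seconds shape):

```r
stamps <- c("a1500000000", "a1500086400")
parts <- strsplit(stamps, split = "a", fixed = TRUE)

# Flattening first and then indexing returns one value for the whole vector
first_only <- unlist(parts)[2]

# Indexing inside each split returns one value per input element
per_element <- sapply(parts, "[[", 2)

first_only
per_element
```

Assigning the single value to x2[ind, 1] recycles it across all the selected rows, so every day starts from the same base timestamp, which is where the duplicates come from.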