I must be missing something obvious here but for this:
> range(data$timestamp)
[1] "2015-06-29 09:32:43.000 UTC" "2015-07-03 15:50:35.986 UTC"
I want to do something like:
df <- data.frame(as.Date(range(data$timestamp)))
names(df) <- c('from', 'to')
and get a data frame with columns 'from' and 'to', without needing an extra variable just for indexing. Written as above, data.frame converts the vector into two rows of a single-column data frame. I've tried various combinations of cbind, matrix, t, list and attempts at destructuring. What is the best way to do this?
df <- as.data.frame(as.list(as.Date(range(data$timestamp))))
names(df) <- c('from', 'to')
This will work. data.frames are really just special lists after all.
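For example (a minimal sketch, with the dates hard-coded to match the range above), a named list of length-one elements becomes a one-row data frame directly:
as.data.frame(list(from = as.Date("2015-06-29"), to = as.Date("2015-07-03")))
#        from         to
#1 2015-06-29 2015-07-03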
If you wanted a one-liner, you could use setNames. I've also found this type of thing much more readable now using magrittr:
data$timestamp %>% range %>% as.Date %>% as.list %>% as.data.frame %>% setNames(c("from", "to"))
Alternatively, you could cast via a matrix:
df <- as.data.frame(matrix(as.Date(range(data$timestamp)), ncol = 2))
names(df) <- c('from', 'to')
This will, however, strip the class (and other attributes) from the dates. If you instead set the dimensions of the vector using dim<-, then neither print nor as.data.frame will treat it as a matrix (because it still has the class Date).
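For instance (a quick sketch, using the same data$timestamp as above):
d <- as.Date(range(data$timestamp))
dim(d) <- c(1, 2)
class(d)
# [1] "Date"
d   # printed as a Date vector, not a 1 x 2 matrix, so this route does not give two columns either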
To get round this, convert to Date after creating the data.frame:
df <- as.data.frame(matrix(range(data$timestamp), ncol = 2))
df[] <- lapply(df, as.Date)
names(df) <- c('from', 'to')
You can try :
range_timestamp <- c("2015-06-29 09:32:43.000 UTC", "2015-07-03 15:50:35.986 UTC")
df <- data.frame(from=as.Date(range_timestamp[1]), to=as.Date(range_timestamp[2]))
df
#        from         to
#1 2015-06-29 2015-07-03
Another option, using data.table and avoiding indexing:
require(data.table)
df <- `colnames<-`(data.frame(rbind(range_timestamp)), c("from","to"))
df <- setDT(df)[, lapply(.SD, as.Date)]
df
         from         to
1: 2015-06-29 2015-07-03
Or, as mentioned by @akrun in the comments:
require(data.table)
df <- setnames(setDT(as.list(as.Date(range_timestamp))), c('from', 'to'))[]
I was a few seconds too late with my suggestion. As I see, others have already answered. Anyway: here is an alternative that is similar to what you have attempted:
timestamp <-c("2015-06-29 09:32:43.000 UTC","2015-07-03 15:50:35.986 UTC")
df <- t(data.frame(as.Date(range(timestamp))))
colnames(df) <- c('from', 'to')
rownames(df) <- NULL
#> df
#     from         to
#[1,] "2015-06-29" "2015-07-03"
Related
Leaving the date column aside, I would like to convert the rest of the columns in the data frame from chr to numeric. How could I achieve this? There are many columns in the data frame and below is only an extract. Thanks.
Date                RECORD Battery_V Data_logger_Temp_C VWC_CS7
2021-06-25 12:34:00      0     12.47              14.14   0.127
Suppose we have the data frame shown reproducibly in the Note at the end. Then convert all columns except the first as shown. No packages are used.
DF2 <- replace(DF, -1, lapply(DF[-1], as.numeric))
or
DF2 <- DF
DF2[-1] <- lapply(DF2[-1], as.numeric)
or we can convert all character columns using:
ok <- sapply(DF, is.character)
DF2 <- replace(DF, ok, lapply(DF[ok], as.numeric))
or
DF2 <- DF
ok <- sapply(DF2, is.character)
DF2[ok] <- lapply(DF2[ok], as.numeric)
Note
Lines <- " Date RECORD Battery_V Data_logger_Temp_C VWC_CS7
2021-06-25T12:34:00 0 12.47 14.14 0.127"
DF <- read.table(text = Lines, header = TRUE, colClasses = "character",
strip.white = TRUE)
DF$Date <- as.POSIXct(DF$Date, format = "%Y-%m-%dT%H:%M:%S")
library(dplyr)
df <- df %>%
mutate(across(.cols = -Date, as.numeric))
I would like to do something more efficient than
dataframe$col <- as.character(dataframe$col)
since I have many numeric columns.
In base R, we may use one of the following approaches. Either loop over all the columns with an if/else condition so that only the numeric columns are changed:
dataframe[] <- lapply(dataframe, function(x) if(is.numeric(x))
as.character(x) else x)
Or create an index of the numeric columns, loop only over those columns, and assign:
i1 <- sapply(dataframe, is.numeric)
dataframe[i1] <- lapply(dataframe[i1], as.character)
It may be more flexible in dplyr
library(dplyr)
dataframe <- dataframe %>%
mutate(across(where(is.numeric), as.character))
All said by master akrun! Here is a data.table alternative. Note it converts all columns to character class:
library(data.table)
data.table::setDT(df)
df[, (colnames(df)) := lapply(.SD, as.character), .SDcols = colnames(df)]
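If you only want to touch the numeric columns, so that the character ones keep their class, a restricted variant along the same lines might look like this (a sketch):
num_cols <- names(df)[sapply(df, is.numeric)]
df[, (num_cols) := lapply(.SD, as.character), .SDcols = num_cols]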
I have multiple character columns (around 20) that I would like to change to date format, dropping the time, using R. I've tried loops, mutate and apply.
Here is some sample data using just two columns
col1 = c("2017-04-01 23:00:00", "2017-03-03 00:00:01", "2017-04-02
00:00:01")
col2 = c("2017-04-10 08:41:49", "2017-04-10 08:39:48", "2017-04-10
08:41:51")
df <- cbind(col1, col2)
I've tried:
df <- df %>% mutate(df, funs(ymd))
and
df <- df %>% mutate(df, funs(mdy))
Both gave me an error. I've also tried putting all the column names in a list and running a loop:
for(i in namedlist) {
as_date(df[i])
glimpse(df)
}
That didn't work either.
I've tried to use the answer from Convert multiple columns to dates with lubridate and dplyr and that did not work either. That post wanted only certain variables to be converted; I want all of my variables converted, so the vars() selection doesn't apply.
Any suggestions to do this efficiently? Thank you.
If you're applying over all columns, you can do a very short call with lapply. I'll show it here using data.table:
library( data.table )
setDT( df )
df <- df[ , lapply( .SD, as.Date ) ]
On your test data, this gives:
> df
         col1       col2
1: 2017-04-01 2017-04-10
2: 2017-03-03 2017-04-10
3: 2017-04-02 2017-04-10
NOTE: your test data actually results in a matrix, so you need to convert it to a data.frame first (or directly to a data.table).
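For example (one possible conversion, reusing col1 and col2 from the question):
df <- data.frame(col1, col2, stringsAsFactors = FALSE)   # instead of cbind()
# or turn the matrix into a data.table directly:
df <- data.table::as.data.table(cbind(col1, col2))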
You can do the same thing with just base R, but I personally like the above solution better:
df <- as.data.frame( lapply( df, as.Date ) )
> df
        col1       col2
1 2017-04-01 2017-04-10
2 2017-03-03 2017-04-10
3 2017-04-02 2017-04-10
EDIT: This time with the right format string for the as.Date function. I also added a reproducible example:
library(dplyr)
df <- data.frame(date_1 = c("2019-01-01", "2019-01-02", "2019-01-03"),
date_2 = c("2019-01-04", "2019-01-05", "2019-01-06"),
value = c(1,2,3),
stringsAsFactors = F)
str(df)
date_cols <- c("date_1", "date_2")
df_2 <- df %>%
mutate_at(vars(date_cols), funs(as.Date(., "%Y-%m-%d")))
str(df_2)
I am trying to set some variables as character and others as numeric, what I currently have is;
colschar <- c(1:2, 68:72)
colsnum <- c(3:67)
subset <- as.data.frame(lapply(data[, colschar], as.character), (data[, colsnum], as.numeric))
which returns an error.
I am trying to set columns 1:2 and 68:72 as a character and columns 3:67 all as numeric.
I suggest:
data[colschar] <- lapply(data[colschar], as.character)
data[colsnum] <- lapply(data[colsnum], as.numeric)
It would be better if you shared an extract of your data. In any case, you may try the tidyverse approach:
library(dplyr)
mydf_molt <- mydf %>%
mutate_at(.vars=c(1:2, 68:72),.funs=funs(as.character(.))) %>%
mutate_at(.vars=c(3:67),.funs=funs(as.numeric(.)))
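As a side note, funs() has since been deprecated in dplyr; with a newer version (assuming dplyr >= 1.0) the same idea could be written with across(), for example:
mydf_molt <- mydf %>%
  mutate(across(c(1:2, 68:72), as.character),
         across(3:67, as.numeric))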
I have a data frame with 300 columns which has a string variable somewhere that I am trying to remove. I have found this solution on Stack Overflow using lapply (see below), which does what I want, but I'd like to do it with the dplyr package. I have tried using the mutate_each function but can't seem to make it work.
"If your data frame (df) is really all integers except for NAs and garbage then then the following converts it.
df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))
You'll have a warning about NAs introduced by coercion but that's just all those non-numeric character strings turning into NAs."
dplyr 0.5 now includes a select_if() function.
For example:
person <- c("jim", "john", "harry")
df <- data.frame(matrix(c(1:9,NA,11,12), nrow=3), person)
library(dplyr)
df %>% select_if(is.numeric)
#  X1 X2 X3 X4
#1  1  4  7 NA
#2  2  5  8 11
#3  3  6  9 12
Of course you could add further conditions if necessary.
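For example (a hypothetical extra condition), you could also require that a numeric column contains no NAs:
df %>% select_if(function(x) is.numeric(x) && !anyNA(x))
#  X1 X2 X3
#1  1  4  7
#2  2  5  8
#3  3  6  9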
If you want to use this line of code:
df2 <- data.frame(lapply(df, function(x) as.numeric(as.character(x))))
with dplyr (by which I assume you mean "using pipes") the easiest would be
df2 = df %>% lapply(function(x) as.numeric(as.character(x))) %>%
as.data.frame
To "translate" this into the mutate_each idiom:
mutate_each(df, funs(as.numeric(as.character(.))))
This function will, of course, convert all columns to character, then to numeric. To improve efficiency, don't bother doing two conversions on columns that are already numeric:
mutate_each(df, funs({
if (is.numeric(.)) return(.)
as.numeric(as.character(.))
}))
Data for testing:
df = data.frame(v1 = 1:10, v2 = factor(11:20))
mutate_all works here; simply wrap the gsub in a function. (I also assume you aren't necessarily string hunting, so much as trawling for non-integers.)
StrScrub <- function(x) {
as.integer(gsub("^\\D+$",NA, x))
}
ScrubbedDF <- mutate_all(data, funs(StrScrub))
Example dataframe:
library(dplyr)
options(stringsAsFactors = F)
data = data.frame("A" = c(2:5),"B" = c(5,"gr",3:2), "C" = c("h", 9, "j", "1"))
with reference/help from Tony Ladson