I am trying to convert the timestamp column of my dataset from chr to a time format.
Here is my code:
time<-f1%>%
mutate(`ymd_hms(Timestamp)`=strftime(f1$`ymd_hms(Timestamp)`, tz="GMT", format = "%H:%M:%S"))%>%
transform(as.numeric(f1$SpotPrice))%>%
chron(times=f1$`ymd_hms(Timestamp)`)
It shows this error:
Error in chron(., times = f1$`ymd_hms(Timestamp)`) :
. and f1$`ymd_hms(Timestamp)` must have equal lengths
If I comment out the chron() call:
time<-f1%>%
mutate(`ymd_hms(Timestamp)`=strftime(f1$`ymd_hms(Timestamp)`, tz="GMT", format = "%H:%M:%S"))%>%
transform(as.numeric(f1$SpotPrice))
#chron(times=f1$`ymd_hms(Timestamp)`)
the output is:
'data.frame': 10078 obs. of 4 variables:
$ InstanceType : chr " a1.2xlarge" " a1.2xlarge" " a1.2xlarge" " a1.4xlarge" ...
$ ProductDescription: chr " Linux/UNIX" " Red Hat Enterprise Linux" " SUSE Linux" " Linux/UNIX" ...
$ SpotPrice : num 0.0671 0.1971 0.2171 0.1343 0.2643 ...
$ ymd_hms(Timestamp): chr "06:17:23" "06:17:23" "06:17:23" "12:15:54" ...
But if I run the same command on its own, on a single column, as follows:
f2<-chron(times=DfUse$`ymd_hms(Timestamp)`)
it works and shows the output:
'times' num [1:10078] 06:17:23 06:17:23 06:17:23 12:15:54 12:15:54 ...
- attr(*, "format")= chr "h:m:s"
But I want to do this in the pipe on my whole dataset, not separately on a single variable.
Thanks in advance.
There are a few problems and some room for improvement in the code.
Having ymd_hms(Timestamp) = on the left-hand side works, but you don't need it, since it only creates a column with that literal name. Using Timestamp alone is enough.
Don't use $ inside a dplyr pipe; refer to columns by their bare names.
It is usually better not to mix base R functions and dplyr functions. transform is base R, and you can convert SpotPrice to numeric in the same mutate call.
library(dplyr)
library(lubridate)  # provides ymd_hms()
time <- f1 %>%
  mutate(Timestamp = strftime(ymd_hms(Timestamp), tz = "GMT", format = "%H:%M:%S"),
         SpotPrice = as.numeric(SpotPrice),
         Timestamp = chron::chron(times = Timestamp))
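The original error is consistent with how the pipe works: x %>% f(times = y) is evaluated as f(x, times = y), so the whole data frame was handed to chron() as its first argument and compared against the 10078-element times vector. Below is a minimal sketch of the corrected pipeline on a made-up two-row data frame. The sample values are assumptions for illustration, not the asker's data, and format() is used in place of strftime() only for brevity.

```r
library(dplyr)
library(lubridate)

# Hypothetical two-row stand-in for f1 (the real data has 10078 rows).
f1 <- data.frame(
  Timestamp = c("2021-05-16 06:17:23", "2021-05-16 12:15:54"),
  SpotPrice = c("0.0671", "0.1343"),
  stringsAsFactors = FALSE
)

time <- f1 %>%
  mutate(
    # Parse the string and keep only the clock time as "HH:MM:SS" text ...
    Timestamp = format(ymd_hms(Timestamp), "%H:%M:%S"),
    # ... then convert inside mutate(), so chron() receives the column,
    # not the piped data frame.
    Timestamp = chron::chron(times = Timestamp),
    SpotPrice = as.numeric(SpotPrice)
  )

str(time)
```

Doing the conversion inside mutate() keeps everything in one pipe and avoids a separate chron() step on the whole data frame.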
I have a file with irregular quotes like the following:
"INDICATOR,""CTY_CODE"",""MGN_CODE"",""EVENT_NR"",""EVENT_NR_CR"",""START_DATE"",""PEAK_DATE"",""END_DATE"",""MAX_EXT_ON"",""DURATION"",""SEVERITY"",""INTENSITY"",""AVERAGE_AREA"",""WIDEST_AREA_PERC"",""SCORE"",""GRP_ID"""
"Spi-3,""AFG"","""",1,1,""1952-10-01"",""1952-11-01"",""1953-06-01"",""1952-11-01"",9,6.98,0.78,19.75,44.09,5,1"
It seems irregular because the first column is wrapped in only one set of double quotes, whereas every subsequent column is wrapped in two. I'd like to read it so that every column is imported without quotes (neither in the header, nor the data).
What I've tried is the following:
# All sorts of tidyverse imports
tib <- readr::read_csv("file.csv")
And I also tried the suggestions offered here:
# Base R import
DF0 <- read.table("file.csv", as.is = TRUE)
DF <- read.csv(text = DF0[[1]])
# Data table import
DT0 <- fread("file.csv", header = FALSE)
DT <- fread(paste(DT0[[1]], collapse = "\n"))
But even when it imports the file in the latter two cases, the variable names and some of the elements are wrapped in quotation marks.
I used data.table::fread with the quote="" option, which disables quote handling so the fields are read "as is".
Then I cleaned the names and data by eliminating all the quotes.
The dates could be converted too, but I didn't do that.
library(data.table)
library(magrittr)
DT0 <- fread('file.csv', quote = "")
DT0 %>% setnames(names(.), gsub('"', '', names(.)))
string_cols <- which(sapply(DT0, class) == 'character')
DT0[, (string_cols) := lapply(.SD, function(x) gsub('\\"', '', x)),
.SDcols = string_cols]
str(DT0)
Classes ‘data.table’ and 'data.frame': 1 obs. of 16 variables:
$ INDICATOR : chr "Spi-3"
$ CTY_CODE : chr "AFG"
$ MGN_CODE : chr ""
$ EVENT_NR : int 1
$ EVENT_NR_CR : int 1
$ START_DATE : chr "1952-10-01"
$ PEAK_DATE : chr "1952-11-01"
$ END_DATE : chr "1953-06-01"
$ MAX_EXT_ON : chr "1952-11-01"
$ DURATION : int 9
$ SEVERITY : num 6.98
$ INTENSITY : num 0.78
$ AVERAGE_AREA : num 19.8
$ WIDEST_AREA_PERC: num 44.1
$ SCORE : int 5
$ GRP_ID : chr "1"
- attr(*, ".internal.selfref")=<externalptr>
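The date conversion that was left out could look roughly like the sketch below. It is self-contained, so it rebuilds a shortened, made-up version of the file from two text lines (five columns instead of sixteen), and it assumes data.table's as.IDate() is an acceptable date class. The GRP_ID step is an extra: that column comes in as text only because of the trailing quote.

```r
library(data.table)

# Two made-up lines in the same shape as the file in the question,
# shortened to five columns for illustration.
raw <- c(
  '"INDICATOR,""CTY_CODE"",""START_DATE"",""DURATION"",""GRP_ID"""',
  '"Spi-3,""AFG"",""1952-10-01"",9,1"'
)

# quote = "" reads the fields as-is; header = TRUE is set explicitly here.
DT0 <- fread(text = raw, quote = "", header = TRUE)

# Strip the stray quotes from the names and from the character columns.
setnames(DT0, gsub('"', '', names(DT0)))
string_cols <- names(DT0)[sapply(DT0, is.character)]
DT0[, (string_cols) := lapply(.SD, function(x) gsub('"', '', x)),
    .SDcols = string_cols]

# Convert the date column(s) from character to IDate.
date_cols <- "START_DATE"
DT0[, (date_cols) := lapply(.SD, as.IDate), .SDcols = date_cols]

# GRP_ID was read as text because of the trailing quote; turn it back into an integer.
DT0[, GRP_ID := as.integer(GRP_ID)]

str(DT0)
```

For the full file, date_cols would list START_DATE, PEAK_DATE, END_DATE and MAX_EXT_ON.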
I have an issue in converting data into the numeric format.
str(DfFilter)
output
'data.frame': 32 obs. of 5 variables:
$ InstanceType : chr " c1.xlarge" " c1.xlarge" " c1.xlarge" " c1.xlarge" ...
$ ProductDescription: chr " Linux/UNIX" " Linux/UNIX" " Linux/UNIX" " Linux/UNIX" ...
$ SpotPrice : num 0.052 0.0739 0.0747 0.0751 0.0755 ...
$ ymd_hms(Timestamp): POSIXct, format: "2021-05-16 06:26:40" "2021-05-16 00:58:55" "2021-05-16 06:46:50" ...
$ Timestamp : 'times' num 06:26:40 00:58:55 06:46:50 14:17:55 19:07:09 ...
..- attr(*, "format")= chr "h:m:s"
But when I check whether it is numeric, as follows:
is.numeric(DfFilter)
[1] FALSE
Why is that? Please help me understand this issue. Thanks in advance.
With purrr package and based on the comments:
DfModel <- DfFilter %>%
purrr::keep(.p = function(x) is.numeric(x))
This keeps only the numeric variables.
Filter with is.numeric could be used to get only numeric columns.
Filter(is.numeric, DfFilter)
# a c
#1 1 2.2
Another way to keep only the numeric columns of a data.frame: the result of is.numeric applied with sapply can be used for subsetting with [:
DfFilter[sapply(DfFilter, is.numeric)]
# a c
#1 1 2.2
Example dataset:
DfFilter <- data.frame(a=1, b="b", c=2.2)
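As for why is.numeric(DfFilter) returns FALSE: is.numeric() tests the object as a whole, and a data frame is a list of columns, not a numeric vector. The check has to be applied to each column. A small sketch using the example dataset above:

```r
DfFilter <- data.frame(a = 1, b = "b", c = 2.2)

# The data frame itself is a list, so the whole-object test fails.
whole <- is.numeric(DfFilter)

# Applying the test column by column gives one logical value per column.
per_column <- sapply(DfFilter, is.numeric)

print(whole)
print(per_column)
```

The per-column result is the logical vector that the [ subsetting above relies on.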
I am trying to create a table (in Snowflake db) with exactly the same column names as I keep in the R data.frame object:
'data.frame': 1 obs. of 26 variables:
$ Ship_To : chr "0002061948"
$ Del_Coll_Indicator : chr "D"
$ Currency : chr "GBP"
$ Total_Volume : num 0
$ Total_Quantity : num 0
...
There is no problem with the table creation:
dbWriteTable(con = my_db$con, name = "test5", value = df)
but all column names in the database are converted to upper case:
'data.frame': 1 obs. of 26 variables:
$ SHIP_TO : chr "0002061948"
$ DEL_COLL_INDICATOR : chr "D"
$ CURRENCY : chr "GBP"
...
Is there any way to keep the original names from R's data frame in the table?
As covered by Snowflake's SQL reference docs, when identifiers (such as column names) are unquoted at creation, Snowflake will upper case them, and treat them as case-insensitive. Any quoted identifiers will be kept as-is and treated as a case-sensitive identifier.
Alter the data frame column names (colnames(df)) to use a quoted identifier format via the dbQuoteIdentifier(my_db$con, each_column_name) DBI function. This should help preserve the casing.
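A minimal sketch of what the quoting step produces. It uses DBI::ANSI() as a stand-in connection, because a live Snowflake connection (my_db$con) is not available here. The three-column data frame is made up. Whether dbWriteTable() on a particular driver adds its own quoting on top is driver-dependent and not shown.

```r
library(DBI)

# Hypothetical cut-down version of the data frame from the question.
df <- data.frame(Ship_To = "0002061948", Currency = "GBP", Total_Volume = 0)

# ANSI() is a dummy connection that follows ANSI SQL quoting rules;
# with a real connection you would pass my_db$con instead.
con <- ANSI()

# Wrap every column name in double quotes so its case is preserved.
quoted <- dbQuoteIdentifier(con, colnames(df))

print(as.character(quoted))
```

Identifiers quoted this way must be referenced with the same quotes and the same case in later queries.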
I have the following dataframe:
str(dat2)
data.frame: 29081 obs. of 105 variables:
$ id: int 20 34 46 109 158....
$ reddit_id: chr "t1_cnas90f" "t1_cnas90t" "t1_cnas90g"....
$ subreddit_id: chr "t5_cnas90f" "t5_cnas90t" "t5_cnas90g"....
$ link_id: chr "t3_c2qy171" "t3_c2qy172" "t3_c2qy17f"....
$ created_utc: chr "2015-01-01" "2015-01-01" "2015-01-01"....
$ ups: int 3 1 0 1 2....
...
How can I change the data type of reddit_id, subreddit_id and link_id from character to factor? I know how to do it one column at a time, but that is tedious, so I am looking for a faster way.
I have tried the following, without success:
dat2[2:4] <- data.frame(lapply(dat2[2:4], factor))
This is taken from this approach. It ends up giving me an error message: invalid "length" argument
Another approach was to do it this way:
dat2 <- as.factor(data.frame(dat2$reddit_id, dat2$subreddit_id, dat2$link_id))
Result: Error in sort.list(y): "x" must be atomic for "sort.
After reading the error I also tried it the other way around:
dat2 <- data.frame(as.factor(dat2$reddit_id, dat2$subreddit_id, dat2$link_id))
Also without success.
I am sorry if some information is missing. I am new to R and Stack Overflow. Thank you for your help!
Try with:
library("tidyverse")
dat2 %>%
  mutate_at(.vars = vars(reddit_id, subreddit_id, link_id),
            .funs = factor)
To select columns by a pattern in their names, use a selection helper such as contains():
dat2 %>%
  mutate_at(.vars = vars(contains("reddit"), link_id),
            .funs = factor)
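Two equivalent ways to do the same conversion, sketched on a made-up two-row version of dat2. The first assumes dplyr 1.0.0 or later, where across() is available. The second is plain base R and is close to the first attempt in the question, minus the data.frame() wrapper.

```r
library(dplyr)

# Hypothetical two-row stand-in for dat2.
dat2 <- data.frame(
  id           = c(20L, 34L),
  reddit_id    = c("t1_cnas90f", "t1_cnas90t"),
  subreddit_id = c("t5_cnas90f", "t5_cnas90t"),
  link_id      = c("t3_c2qy171", "t3_c2qy172"),
  stringsAsFactors = FALSE
)

# dplyr: apply factor() to the three selected columns.
dat2_dplyr <- dat2 %>%
  mutate(across(c(reddit_id, subreddit_id, link_id), factor))

# Base R: replace the columns with the list returned by lapply().
cols <- c("reddit_id", "subreddit_id", "link_id")
dat2_base <- dat2
dat2_base[cols] <- lapply(dat2_base[cols], factor)

str(dat2_dplyr)
str(dat2_base)
```

Either result has to be assigned back (for example to dat2) for the change to persist.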
I've a csv file of daily bars, with just two lines:
"datestamp","Open","High","Low","Close","Volume"
"2012-07-02",79.862,79.9795,79.313,79.509,48455
(That file was an xts that was converted to a data.frame then passed on to write.csv)
I load it with this:
z=read.zoo(file='tmp.csv',sep=',',header=T,format = "%Y-%m-%d")
And it is fine as print(z) shows:
Open High Low Close Volume
2012-07-02 79.862 79.9795 79.313 79.509 48455
But then as.xts(z) gives: Error in coredata.xts(x) : currently unsupported data type
Here is the str(z) output:
‘zoo’ series from 2012-07-02 to 2012-07-02
Data:List of 5
$ : num 79.9
$ : num 80
$ : num 79.3
$ : num 79.5
$ : int 48455
- attr(*, "dim")= int [1:2] 1 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Open" "High" "Low" "Close" ...
Index: Date[1:1], format: "2012-07-02"
I've so far confirmed it is not that 4 columns are num and one column is int, as I still get the error even after removing the Volume column. But, then, what could that error message be talking about?
As Sebastian pointed out in the comments, the problem is in the single row. Specifically the coredata is a list when read.zoo reads a single row, but something else (a matrix?) when there are 2+ rows.
I replaced the call to read.zoo with the following, and it works fine whether 1 or 2+ rows:
library(xts)
fname <- 'tmp.csv'
d <- read.table(fname, sep = ',', header = TRUE)
x <- as.xts(subset(d, select = -datestamp), order.by = as.Date(d$datestamp))