Plotting time (HMS) with ggplot2 - r

I'm trying to plot running sessions. I want to make a ggplot with:
x=distance (2.2KM, 5KM, 10KM , 12.8KM, Ziel)
Y= time (HMS)
I have the following data:
'data.frame': 16333 obs. of 6 variables:
$ Numéro : chr "6526" "5427" "6528" "6529" ...
$ X2.2km : chr "00:10:47.4" "00:08:58.2" "00:11:10.4" "00:09:27.3" ...
$ X5km : chr "00:26:05.0" "00:21:46.1" "00:27:13.5" "00:22:35.3" ...
$ X10km : chr "00:56:30.1" "00:45:59.3" "00:58:53.1" "00:47:51.7" ...
$ X12.8km : chr "01:14:24.7" "00:59:50.7" "01:17:35.0" "01:01:42.6" ...
$ Zielzeit: chr "01:37:40.0" "01:16:38.1" "01:41:53.0" "01:19:02.5" ...
The next step is to use the melt function from the reshape2 package, and then lubridate:
xx<-melt(xx,id="Numéro")
####Using lubridate ####
xx$value<-hms(xx$value)
My problem is that when I try to plot a simple graphic, I receive the following message:
> ggplot(xx,aes(variable,value))+geom_point()
Error in x < range[1] : cannot compare Period to Duration:
coerce with 'as.numeric' first.
> ggplot(xx,aes(variable,value))+geom_line()
Error in x < range[1] : cannot compare Period to Duration:
coerce with 'as.numeric' first.
DATASET
xx <- read.table(header=TRUE, text="
Numéro variable value
1 6526 X2.2km 10M 47.4S
2 5427 X2.2km 8M 58.2S
3 6528 X2.2km 11M 10.4S
4 6529 X2.2km 9M 27.3S
5 6530 X2.2km 8M 29.3S")
Thanks for any kind of contribution.
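One possible way around this error, sketched under the assumption that the times are still the original "HH:MM:SS.s" strings (that is, before hms() is applied): convert them to plain numeric seconds, which ggplot2 can place on a continuous axis. The helper name hms_to_seconds is hypothetical and not part of lubridate or ggplot2. The four sample times come from the str() output above.

```r
# Hypothetical helper (not from lubridate): "HH:MM:SS.s" strings -> numeric seconds.
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Small long-format sample: two runners, two split points.
xx <- data.frame(
  Numero   = rep(c("6526", "5427"), times = 2),
  variable = factor(rep(c("X2.2km", "X5km"), each = 2),
                    levels = c("X2.2km", "X5km")),
  value    = c("00:10:47.4", "00:08:58.2", "00:26:05.0", "00:21:46.1"),
  stringsAsFactors = FALSE
)
xx$seconds <- hms_to_seconds(xx$value)

# Plot minutes on a numeric axis; skipped when ggplot2 is not installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(xx, aes(variable, seconds / 60, group = Numero)) +
    geom_point() +
    geom_line() +
    labs(x = "Split", y = "Time (minutes)")
  # print(p) draws the plot
}
```

If the column has already been converted with hms(), lubridate's period_to_seconds() should give the same numeric seconds.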

Related

After converting to numeric still not in numeric format in R [duplicate]

I have an issue in converting data into the numeric format.
str(DfFilter)
output
'data.frame': 32 obs. of 5 variables:
$ InstanceType : chr " c1.xlarge" " c1.xlarge" " c1.xlarge" " c1.xlarge" ...
$ ProductDescription: chr " Linux/UNIX" " Linux/UNIX" " Linux/UNIX" " Linux/UNIX" ...
$ SpotPrice : num 0.052 0.0739 0.0747 0.0751 0.0755 ...
$ ymd_hms(Timestamp): POSIXct, format: "2021-05-16 06:26:40" "2021-05-16 00:58:55" "2021-05-16 06:46:50" ...
$ Timestamp : 'times' num 06:26:40 00:58:55 06:46:50 14:17:55 19:07:09 ...
..- attr(*, "format")= chr "h:m:s"
But when I check for numeric values as follows:
is.numeric(DfFilter)
[1] FALSE
Why is that so? Kindly help me understand this issue. Thanks in advance.
With the purrr package, and based on the comments:
library(purrr)  # also makes the %>% pipe available
DfModel <- DfFilter %>%
purrr::keep(.p = function(x) is.numeric(x))
This keeps only the numeric variables.
Filter with is.numeric could be used to get only numeric columns.
Filter(is.numeric, DfFilter)
# a c
#1 1 2.2
Another way to keep only the numeric columns of a data.frame is to subset with [ using the result of is.numeric applied through sapply:
DfFilter[sapply(DfFilter, is.numeric)]
# a c
#1 1 2.2
Example dataset:
DfFilter <- data.frame(a=1, b="b", c=2.2)
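As for why the original check returns FALSE: a data frame is a list of columns, so is.numeric() called on the whole object reports on the container and not on the individual columns. A minimal sketch using the example dataset above:

```r
DfFilter <- data.frame(a = 1, b = "b", c = 2.2)

is.numeric(DfFilter)   # FALSE: the data frame itself is a list
is.list(DfFilter)      # TRUE

# Ask each column separately instead.
col_is_num <- sapply(DfFilter, is.numeric)
col_is_num

# Use the logical vector to keep only the numeric columns.
numeric_only <- DfFilter[col_is_num]
names(numeric_only)
```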

Error in changing data into Date in [R]

I have a problem converting a character vector into dates using as.Date.
Data is as below.
> new3<-read.csv("Total Load - Day Ahead _ Actual.csv",stringsAsFactors=F)
> colnames(new3)<- c("Date","Hour","Dayahead","Actual")
> str(new3)
'data.frame': 35044 obs. of 4 variables:
$ Date : chr "01-01-2015" "01-01-2015" "01-01-2015" "01-01-2015" ...
$ Hour : chr "0:00" "0:15" "0:30" "0:45" ...
$ Dayahead: chr "42955" "42412" "41901" "41355" ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958
...
Here, I tried as.Date:
new3$Date<-as.Date(new3$Date,"%d/%m/%Y")
The order of d, m, Y is right. But when I do this, it shows NA in the Date column, as below:
> str(new3)
'data.frame': 35044 obs. of 4 variables:
$ Date : Date, format: NA NA NA NA ...
$ Hour : chr "0:00" "0:15" "0:30" "0:45" ...
$ Dayahead: chr "42955" "42412" "41901" "41355" ...
$ Actual : int 42425 42021 42068 41874 41230 40810 40461 40160 39958
...
I don't know what to do to fix it.
Can anyone help me out here? Thank you
This step doesn't seem right:
new3$Date<-as.Date(new3$Date,"%d/%m/%Y")
You should try using
new3$Date<-as.Date(new3$Date,"%d-%m-%Y")
The separator in your dates is - and not /.
I'd also suggest looking into the lubridate package. It offers easy ways to convert dates from character to Date format.
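A short sketch of the difference, using base R only. The first sample string comes from the question; the second one is made up for illustration.

```r
dates_chr <- c("01-01-2015", "31-12-2015")

# The format string must match the separators in the data.
wrong <- as.Date(dates_chr, format = "%d/%m/%Y")  # separator mismatch -> NA
right <- as.Date(dates_chr, format = "%d-%m-%Y")

# A cheap guard after any conversion: fail loudly if something did not parse.
stopifnot(!anyNA(right))
format(right, "%Y-%m-%d")
```

as.Date() returns NA silently when the format does not match, so a check like the stopifnot() line can catch the problem early.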

Create column in R in a large database

My apologies if this question has already been answered, but I haven't found it. I'll post all my ideas to solve it. The problem is that the database is large and my PC cannot perform this calculation (core i7 and 8 GB RAM). I'm using Microsoft R Open 3.3.2 and RStudio 1.0.136.
I've been trying to create a new column in a large data set in R called tcm.RData (471 MB). I need a column that divides Shape_Area by the sum of Shape_Area per COD (which I called ShapeSum). I first tried to do it in a single formula but, as it failed, I tried again in two steps: 1) sum Shape_Area by COD and, if that succeeds, 2) divide Shape_Area by ShapeSum.
> str(tcm)
Classes ‘data.table’ and 'data.frame': 26835293 obs. of 15 variables:
$ OBJECTID : int 1 2 3 4 5 6 7 8 9 10 ...
$ LAT : num -15.7 -15.7 -15.7 -15.7 -15.7 ...
$ LONG : num -58.1 -58.1 -58.1 -58.1 -58.1 ...
$ UF : chr "MT" "MT" "MT" "MT" ...
$ COD : num 510562 510562 510562 510562 510562 ...
$ AREA_97 : num 1130 1130 1130 1130 1130 ...
$ Shape_Area: num 255266.7 14875 25182.2 5503.9 95.5 ...
$ TYPE : chr "2" "2" "2" "2" ...
$ Nomes : chr NA NA NA NA ...
$ NEAR_DIST : num 376104 371332 371410 371592 371330 ...
$ tc_2004 : chr "AREA_URBANA" "DESFLORESTAMENTO_2004" "DESFLORESTAMENTO_2004" "DESFLORESTAMENTO_2004" ...
$ tc_2008 : chr "AREA_URBANA" "AREA_NAO_OBSERVADA" "AREA_NAO_OBSERVADA" "AREA_NAO_OBSERVADA" ...
$ tc_2010 : chr "AREA_URBANA" "PASTO_LIMPO" "PASTO_LIMPO" "PASTO_LIMPO" ...
$ tc_2012 : chr "AREA_URBANA" "PASTO_SUJO" "PASTO_SUJO" "PASTO_SUJO" ...
$ tc_2014 : chr "AREA_URBANA" "PASTO_LIMPO" "PASTO_LIMPO" "PASTO_SUJO" ...
- attr(*, ".internal.selfref")=<externalptr>
> tcm$ShapeSum <- tcm[, Shape_Area := sum(tcm$Shape_Area), by="COD"]
Error: cannot allocate vector of size 204.7 Mb
Error during wrapup: cannot allocate vector of size 542.3 Mb
I also tried the following codes, but all of them failed:
> tcm$ShapeSum <- apply(tcm[, c(Shape_Area)], 1, function(x) sum(x), by="COD")
Error in apply(tcm[, c(Shape_Area)], 1, function(x) sum(x), by = "COD") :
dim(X) must have a positive length
> tcm$ShapeSum <- mutate(tcm, ShapeSum = sum(Shape_Area), by="COD", package = "dplyr")
Error: cannot allocate vector of size 204.7 Mb
Error during wrapup: cannot allocate vector of size 542.3 Mb
> tcm$ShapeSum <- tcm[, transform(tcm, ShapeSum = sum(Shape_Area)), by="COD"]
> tcm$ShapeSum <- transform(tcm, aggregate(tcm$AreaShape, by=list(Category=tcm$COD), FUN=sum))
Error in aggregate.data.frame(as.data.frame(x), ...): no rows to aggregate
Thank you very much for your attention and for any suggestions to solve this problem.
We can use the data.table methods to create the column. Assignment with := is more efficient because it happens in place:
library(data.table)
tcm[, ShapeSum := sum(Shape_Area), by = COD]
Or, as @user20650 suggested, it could be (based on the OP's description):
tcm[, ShapeSum := Shape_Area/sum(Shape_Area), by = COD]
library(data.table)
tcm <- fread("your_tcm_file.txt")
tcm[, newColumn:=oldColumnPlusOne+1]
More:
https://cran.r-project.org/web/packages/data.table/vignettes/datatable-intro.html
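To make the grouped calculation concrete, here is a sketch on a small made-up table (tcm_toy and its values are invented for illustration, and ShapeShare is a hypothetical column name). It computes the per-COD sum and the share with base R's ave(), then cross-checks the data.table expression from the answer when that package is installed.

```r
# Toy stand-in for tcm: two COD groups.
tcm_toy <- data.frame(
  COD        = c(1, 1, 2, 2, 2),
  Shape_Area = c(10, 30, 5, 5, 10)
)

# Base R: ave() returns the group sum aligned to every row.
tcm_toy$ShapeSum   <- ave(tcm_toy$Shape_Area, tcm_toy$COD, FUN = sum)
tcm_toy$ShapeShare <- tcm_toy$Shape_Area / tcm_toy$ShapeSum

# data.table: the same share, assigned by reference with :=.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt <- as.data.table(tcm_toy[c("COD", "Shape_Area")])
  dt[, ShapeShare := Shape_Area / sum(Shape_Area), by = COD]
  stopifnot(isTRUE(all.equal(dt$ShapeShare, tcm_toy$ShapeShare)))
}
```

On a table the size of the one in the question, the := form is likely the better fit because it avoids copying the whole table. The base R version is shown mainly to spell out what the grouped expression computes.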

R dataframe define column names at creation

I get monthly price value for the two assets below from Yahoo:
if(!require("tseries") | !require(its) ) { install.packages(c("tseries", 'its')); require("tseries"); require(its) }
startDate <- as.Date("2000-01-01", format="%Y-%m-%d")
MSFT.prices = get.hist.quote(instrument="msft", start= startDate,
quote="AdjClose", provider="yahoo", origin="1970-01-01",
compression="m", retclass="its")
SP500.prices = get.hist.quote(instrument="^gspc", start=startDate,
quote="AdjClose", provider="yahoo", origin="1970-01-01",
compression="m", retclass="its")
I want to put these two into a single data frame with specified column names (Pandas allows this now, which is a bit ironic since they took the data.frame concept from R). As below, I assign names to the two time series:
MSFTSP500.prices <- data.frame(msft = MSFT.prices, sp500= SP500.prices )
However, this does not preserve the column names [msft, sp500] I have assigned. I need to define the column names in a separate line of code:
colnames(MSFTSP500.prices) <- c("msft", "sp500")
I tried to put colnames and col.names inside the data.frame() call but it doesn't work. How can I define column names while creating the data frame?
I found ?data.frame very unhelpful...
The code fails with an error message indicating no availability of as.its. So I added the missing code (which appears to have been successful after two failed attempts.) Once you issue the missing require() call you can use str to see what sort of object get.hist.quote actually returns. It is neither a dataframe nor a zoo object, although it resembles a zoo-object in many ways:
> str(SP500.prices)
Formal class 'its' [package "its"] with 2 slots
..@ .Data: num [1:180, 1] 1394 1366 1499 1452 1421 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:180] "2000-01-02" "2000-01-31" "2000-02-29" "2000-04-02" ...
.. .. ..$ : chr "AdjClose"
..@ dates: POSIXct[1:180], format: "2000-01-02 16:00:00" "2000-01-31 16:00:00" ...
If you run cbind on those two objects you get a regular matrix with dimnames:
> str(cbind(SP500.prices, MSFT.prices) )
num [1:180, 1:2] 1394 1366 1499 1452 1421 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:180] "2000-01-02" "2000-01-31" "2000-02-29" "2000-04-02" ...
..$ : chr [1:2] "AdjClose" "AdjClose"
You will still need to change the column names, since there does not seem to be a cbind.its that lets you assign column names. I would caution against using the data.frame method, since the resulting object's behavior might get confusing:
> str( MSFTSP500.prices )
'data.frame': 180 obs. of 2 variables:
$ AdjClose :Formal class 'AsIs', 'its' [package ""] with 1 slot
.. ..@ .S3Class: chr "AsIs" "its"
$ AdjClose.1:Formal class 'AsIs', 'its' [package ""] with 1 slot
.. ..@ .S3Class: chr "AsIs" "its"
The columns are still S4 objects. I suppose that might be useful if you were going to pass them to other its-methods but could be confusing otherwise. This might be what you were shooting for:
> MSFTSP500.prices <- data.frame(msft = as.vector(MSFT.prices),
sp500= as.vector(SP500.prices) ,
row.names= as.character(MSFT.prices@dates) )
> str( MSFTSP500.prices )
'data.frame': 180 obs. of 2 variables:
$ msft : num 35.1 32 38.1 25 22.4 ...
$ sp500: num 1394 1366 1499 1452 1421 ...
> head(rownames(MSFTSP500.prices))
[1] "2000-01-02 16:00:00" "2000-01-31 16:00:00" "2000-02-29 16:00:00"
[4] "2000-04-02 17:00:00" "2000-04-30 17:00:00" "2000-05-31 17:00:00"
MSFT.prices is a zoo object, which seems to be a data-frame-alike, with its own column name which gets transferred to the new object. Compare:
tmp <- data.frame(a=1:10)
b <- data.frame(lost=tmp)
which loses the name supplied in the second call (lost) and keeps the inner one (a).
If you do
MSFTSP500.prices <- data.frame(msft = as.vector(MSFT.prices),
sp500=as.vector(SP500.prices))
then you will get the colnames you want (though you won't get zoo-specific behaviours). Not sure why you object to renaming columns in a second command, though.
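The naming behaviour can be reproduced with base R alone. This sketch uses plain data frames in place of the price series, so it only illustrates how data.frame() picks names. The setNames() line is one possible way to name the columns in the same statement that creates the data frame, and is not taken from the answers above.

```r
tmp <- data.frame(a = 1:10)

# A single-column data frame passed as a named argument keeps its own name.
b <- data.frame(lost = tmp)
names(b)       # "a", not "lost"

# A plain vector has no name of its own, so the supplied name is used.
kept <- data.frame(kept = tmp$a)
names(kept)    # "kept"

# Naming at creation time by wrapping the call in setNames().
renamed <- setNames(data.frame(tmp, tmp), c("msft", "sp500"))
names(renamed)
```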

read.zoo works but then as.xts fails with "currently unsupported data type"

I've a csv file of daily bars, with just two lines:
"datestamp","Open","High","Low","Close","Volume"
"2012-07-02",79.862,79.9795,79.313,79.509,48455
(That file was an xts that was converted to a data.frame then passed on to write.csv)
I load it with this:
z=read.zoo(file='tmp.csv',sep=',',header=T,format = "%Y-%m-%d")
And it is fine as print(z) shows:
Open High Low Close Volume
2012-07-02 79.862 79.9795 79.313 79.509 48455
But then as.xts(z) gives: Error in coredata.xts(x) : currently unsupported data type
Here is the str(z) output:
‘zoo’ series from 2012-07-02 to 2012-07-02
Data:List of 5
$ : num 79.9
$ : num 80
$ : num 79.3
$ : num 79.5
$ : int 48455
- attr(*, "dim")= int [1:2] 1 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "Open" "High" "Low" "Close" ...
Index: Date[1:1], format: "2012-07-02"
I've so far confirmed it is not that 4 columns are num and one column is int, as I still get the error even after removing the Volume column. But, then, what could that error message be talking about?
As Sebastian pointed out in the comments, the problem is in the single row. Specifically the coredata is a list when read.zoo reads a single row, but something else (a matrix?) when there are 2+ rows.
I replaced the call to read.zoo with the following, and it works fine whether there are 1 or 2+ rows:
d=read.table(fname,sep=',',header=T)
x=as.xts(subset(d,select=-datestamp),order.by=as.Date(d$datestamp))
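A self-contained variant of this workaround, reading the two-line file from a string so it can be run without tmp.csv. The names csv_text, vals and idx are chosen here for illustration. The xts step uses xts::xts() on a numeric matrix in place of the as.xts() call above, and is skipped when the xts package is not installed.

```r
# The two-line CSV from the question, as an in-memory string.
csv_text <- paste(
  '"datestamp","Open","High","Low","Close","Volume"',
  '"2012-07-02",79.862,79.9795,79.313,79.509,48455',
  sep = "\n"
)

d <- read.table(text = csv_text, sep = ",", header = TRUE)

# Split into a numeric matrix of values and a Date index.
vals <- as.matrix(d[setdiff(names(d), "datestamp")])
idx  <- as.Date(d$datestamp)

# Build the xts object from the matrix, even with a single row.
if (requireNamespace("xts", quietly = TRUE)) {
  x <- xts::xts(vals, order.by = idx)
  stopifnot(nrow(x) == 1L)
}
```

Converting to a matrix first means the core data is a numeric matrix, not a list, whether the file has one row or many.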

Resources