I have this huge data frame with server names, Date, CPU, and memory as the headers. There are multiple server names. I would like to select a certain server name, order by the Date column, and create time series graphs.
this is a small subset of the data frame:
Hostname Date 5 60 61 CPUAVG CPUAVG+Sev CPUMaximum MemoryAVG
1 server1 2012-01-29 01:00:00 23.79 NA NA 2.33 0.72 2.33 23.76
2 server1 2012-01-29 02:00:00 23.91 NA NA 2.86 2.38 2.86 23.82
3 server1 2012-01-29 03:00:00 25.65 NA NA 6.25 9.59 6.25 24.85
4 server2 2012-01-29 04:00:00 26.30 NA NA 18.41 31.09 18.41 25.87
5 server3 2012-01-29 05:00:00 24.33 NA NA 1.92 0.42 1.92 24.24
6 server3 2012-01-29 06:00:00 24.40 NA NA 2.65 1.79 2.65 24.31
Check out the `subset` function (note that comparison needs `==`, not `=`):
thisServer <- subset(servers, Hostname == "server1")
Then to order the rows
thisServerSorted <- thisServer[order(thisServer$Date),]
Then you can plot from there.
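Putting those steps together, here is a minimal base-R sketch, using a tiny stand-in for the `servers` data frame (the values are illustrative, taken loosely from the sample above):

```r
# A tiny stand-in for the full 'servers' data frame (illustrative values)
servers <- data.frame(
  Hostname = c("server1", "server1", "server2"),
  Date     = as.POSIXct(c("2012-01-29 03:00:00", "2012-01-29 01:00:00",
                          "2012-01-29 04:00:00"), tz = "UTC"),
  CPUAVG   = c(6.25, 2.33, 18.41)
)

# Select one server's rows (note '==' for comparison, not '=')
thisServer <- subset(servers, Hostname == "server1")

# Order the rows by Date
thisServerSorted <- thisServer[order(thisServer$Date), ]

# Plot CPUAVG over time as a line chart
plot(thisServerSorted$Date, thisServerSorted$CPUAVG, type = "l",
     xlab = "Date", ylab = "CPUAVG")
```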
#convert Date to a date field (if needed)
library(lubridate)
servers$Date <- ymd_hms(servers$Date)
#select the servers you need
SelectedServers <- subset(servers, Hostname %in% c("server1", "server3"))
library(ggplot2)
#no need for sorting with ggplot2
ggplot(SelectedServers, aes(x = Date, y = CPUAVG, colour = Hostname)) + geom_line()
ggplot(SelectedServers, aes(x = Date, y = CPUAVG)) + geom_line() + facet_wrap(~Hostname)
I have a datatable for a time period of 21 days with data measured every 10 seconds which looks like
TimeStamp ActivePower CurrentL1 GeneratorRPM RotorRPM WindSpeed
2017-03-05 00:00:10 2183.650 1201.0 1673.90 NA 10.60
2017-03-05 00:00:20 2216.200 1224.0 1679.70 NA 11.00
2017-03-05 00:00:30 2176.500 1203.5 NA 16.05 11.90
---
2017-03-25 23:59:40 2024.20 1150.0 1687.00 16.15 10.35
2017-03-25 23:59:50 1959.05 1106.0 1661.15 15.90 8.65
2017-03-26 00:00:00 1820.55 1038.0 1665.70 15.80 9.20
I want to divide it into 30-minute blocks. My colleague said I shouldn't use the split function, since the data can have gaps with no timestamps at all, and that I should instead build a 30-minute interval duration manually.
I have done this so far:
library(data.table)
library(dplyr)
library(tidyr)
datei <- file.choose()
data_csv <- fread(datei)
datatable1 <- as.data.table(data_csv)
datatable1 <- datatable1[turbine=="UTHA02",]
datatable1[, TimeStamp:=as.POSIXct(get("_time"), tz="UTC")]
setkey(datatable1, TimeStamp)
startdate <- datatable1[1,TimeStamp]
enddate <- datatable1[nrow(datatable1), TimeStamp]
durationForInterval <- 30*60 #in seconds
curr <- startdate
datatable1[TimeStamp >= curr & TimeStamp < curr + durationForInterval]
So I manually made a 30 minute interval duration and got the first interval
time ActivePower CurrentL1 GeneratorRPM RotorRPM WindSpeed
1: 2017-03-05 00:00:10 2183.65 1201.0 1673.90 NA 10.60
2: 2017-03-05 00:00:20 2216.20 1224.0 1679.70 NA 11.00
3: 2017-03-05 00:00:30 2176.50 1203.5 NA 16.05 11.90
4: 2017-03-05 00:00:40 2267.95 1256.5 1685.85 NA 10.60
5: 2017-03-05 00:00:50 2533.15 1408.0 1693.30 16.20 12.40
---
176: 2017-03-05 00:29:20 2750.35 1531.0 1694.40 16.20 11.45
177: 2017-03-05 00:29:30 2930.40 1630.5 1668.25 NA 12.65
178: 2017-03-05 00:29:40 2459.55 1367.0 1680.25 15.90 12.15
179: 2017-03-05 00:29:50 2713.80 1508.5 1681.15 16.20 12.25
180: 2017-03-05 00:30:00 2395.20 1333.0 1667.75 16.00 11.75
But I could only do it for the first interval, and I don't know how to do it for the rest. Is there something I am missing, or am I overthinking it? Any help is appreciated!
This will create a column `interval` with a unique value for every 30 minutes. A POSIXct is stored as seconds since the epoch, so integer division by 1800 assigns each timestamp to its half-hour bucket:
datatable1[, interval := as.integer(TimeStamp) %/% (60L*30L)]
You could split on that column or use it for grouping operations.
split(datatable1, datatable1$interval) # or split(datatable1, by = "interval")
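For example, the interval id can drive grouped summaries directly in data.table, with no need to materialize the individual blocks. A sketch with made-up 10-second data (the `ActivePower` values are random placeholders):

```r
library(data.table)

# Illustrative 10-second data spanning just over 30 minutes
datatable1 <- data.table(
  TimeStamp   = as.POSIXct("2017-03-05 00:00:10", tz = "UTC") + seq(0, 1800, by = 10),
  ActivePower = runif(181, min = 1800, max = 2600)
)

# Integer id that changes every 30 minutes
# (a POSIXct is seconds since the epoch, so %/% 1800 buckets it)
datatable1[, interval := as.integer(TimeStamp) %/% (60L * 30L)]

# One row per 30-minute block: mean power and number of observations
summary30 <- datatable1[, .(meanPower = mean(ActivePower), n = .N), by = interval]
```

Blocks with no data simply produce no rows, which is why this is robust to gaps in the timestamps.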
I'm trying to reshape a data frame but I'm totally lost in how to proceed:
> test
# Time Entry Order Size Price S / L T / P Profit Balance
1 1 2017-01-11 00:00:00 buy 1 0.16 1.05403 1.0449 1.07838 NA NA
2 3 2017-01-24 16:00:00 s/l 1 0.16 1.04490 1.0449 1.07838 -97.28 9902.72
As you can see, we have 2 (or more) rows for one order ID. What I want to do is combine those rows into one by adding several new columns: Exit (where the "s/l" value of the second observation should go), Exit Price (the Price column of the second row), and replace the NAs of the first row with the second row's Profit and Balance values.
By the way, the original name of the Entry column is "Type", but I already renamed it; that's why the exit reason of the trade ends up in a column called "Entry". So far I've only thought of extracting the data into several vectors, mutating the first row, and dropping the second one, but I'm quite sure there's a better way. That manual approach would also be useless when applied to the whole data frame.
If possible, I'd like to stick to the tidyverse library to do this just for ease of replication. Thank you in advance for your suggestions!
I ended up sorting it out! My solution was to split the data frame in 2, reshape each half as needed, and then full joining them. Here's the initial data frame:
> head(backtest_table, n = 10)
# Time Type Order Size Price S / L T / P Profit Balance
1 1 2017.01.11 00:00 buy 1 0.16 1.05403 1.04490 1.07838 NA NA
2 2 2017.01.19 00:00 buy 2 0.16 1.05376 1.04480 1.07764 NA NA
3 3 2017.01.24 16:00 s/l 1 0.16 1.04490 1.04490 1.07838 -97.28 9902.72
4 4 2017.01.24 16:00 s/l 2 0.16 1.04480 1.04480 1.07764 -95.48 9807.24
5 5 2017.02.09 00:00 buy 3 0.15 1.05218 1.04265 1.07758 NA NA
6 6 2017.03.03 16:00 t/p 3 0.15 1.07758 1.04265 1.07758 251.75 10058.99
7 7 2017.03.29 00:00 buy 4 0.15 1.08826 1.07859 1.11405 NA NA
8 8 2017.04.04 00:00 close 4 0.15 1.08416 1.07859 1.11405 -41.24 10017.75
9 9 2017.04.04 00:00 sell 5 0.15 1.08416 1.09421 1.05737 NA NA
10 10 2017.04.07 00:00 sell 6 0.15 1.08250 1.09199 1.05718 NA NA
Here's the code I used to modify everything:
# Re-format data
library(lubridate)
library(dplyr)   # for %>%, filter(), and full_join()
# Separate entries and exits
entries <- backtest_table %>% filter(Type %in% c("buy", "sell"))
exits <- backtest_table %>% filter(!Type %in% c("buy", "sell"))
# Reshape entries and exits
# Entries
entries <- entries[-c(1, 9, 10)]
colnames(entries) <- c("Entry time", "Entry type", "Order", "Entry volume",
"Entry price", "Entry SL", "Entry TP")
entries$`Entry time` <- entries$`Entry time` %>% ymd_hm()
entries$`Entry type` <- as.factor(entries$`Entry type`)
# Exits
exits <- exits[-1]
colnames(exits) <- c("Exit time", "Exit type", "Order", "Exit volume",
"Exit price", "Exit SL", "Exit TP", "Profit", "Balance")
exits$`Exit time` <- exits$`Exit time` %>% ymd_hm()
exits$`Exit type` <- as.factor(exits$`Exit type`)
# Join re-shaped data
test <- full_join(entries, exits, by = c("Order"))
And here's the output of that:
> head(test, n = 10)
Entry time Entry type Order Entry volume Entry price Entry SL Entry TP Exit time
1 2017-01-11 buy 1 0.16 1.05403 1.04490 1.07838 2017-01-24 16:00:00
2 2017-01-19 buy 2 0.16 1.05376 1.04480 1.07764 2017-01-24 16:00:00
3 2017-02-09 buy 3 0.15 1.05218 1.04265 1.07758 2017-03-03 16:00:00
4 2017-03-29 buy 4 0.15 1.08826 1.07859 1.11405 2017-04-04 00:00:00
5 2017-04-04 sell 5 0.15 1.08416 1.09421 1.05737 2017-05-26 10:00:00
6 2017-04-07 sell 6 0.15 1.08250 1.09199 1.05718 2017-05-01 09:20:00
7 2017-04-19 sell 7 0.15 1.07334 1.08309 1.04733 2017-04-25 10:00:00
8 2017-05-05 sell 8 0.14 1.07769 1.08773 1.05093 2017-05-29 14:00:00
9 2017-05-24 sell 9 0.14 1.06673 1.07749 1.03803 2017-06-22 18:00:00
10 2017-06-14 sell 10 0.14 1.04362 1.05439 1.01489 2017-06-15 06:40:00
Exit type Exit volume Exit price Exit SL Exit TP Profit Balance
1 s/l 0.16 1.04490 1.04490 1.07838 -97.28 9902.72
2 s/l 0.16 1.04480 1.04480 1.07764 -95.48 9807.24
3 t/p 0.15 1.07758 1.04265 1.07758 251.75 10058.99
4 close 0.15 1.08416 1.07859 1.11405 -41.24 10017.75
5 t/p 0.15 1.05737 1.09421 1.05737 265.58 10091.18
6 s/l 0.15 1.09199 1.09199 1.05718 -94.79 9825.60
7 s/l 0.15 1.08309 1.08309 1.04733 -97.36 9920.39
8 t/p 0.14 1.05093 1.08773 1.05093 247.61 10338.79
9 t/p 0.14 1.03803 1.07749 1.03803 265.59 10504.05
10 s/l 0.14 1.05439 1.05439 1.01489 -100.33 10238.46
That combined each trade's opening observation (which had NAs in the last columns) with its closing observation, populating those columns with the actual result and the new account balance!
If someone has suggestions on how to improve the system please let me know!
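One possible tidyverse refinement (a sketch on a two-row stand-in, not tested on the full backtest): since each Order has exactly one entry row and one exit row, grouping by Order and indexing with an entry/exit flag collapses the pairs without the split-and-join:

```r
library(dplyr)

# Two-row stand-in for backtest_table: one entry and one exit for Order 1
backtest_table <- data.frame(
  Time    = c("2017.01.11 00:00", "2017.01.24 16:00"),
  Type    = c("buy", "s/l"),
  Order   = c(1, 1),
  Price   = c(1.05403, 1.04490),
  Profit  = c(NA, -97.28),
  Balance = c(NA, 9902.72)
)

combined <- backtest_table %>%
  mutate(is_entry = Type %in% c("buy", "sell")) %>%
  group_by(Order) %>%
  summarise(
    `Entry time`  = Time[is_entry],
    `Entry type`  = Type[is_entry],
    `Entry price` = Price[is_entry],
    `Exit time`   = Time[!is_entry],
    `Exit type`   = Type[!is_entry],
    `Exit price`  = Price[!is_entry],
    Profit        = Profit[!is_entry],
    Balance       = Balance[!is_entry]
  )
```

This assumes every Order really does have exactly one entry and one exit row; orders still open at the end of the backtest would need handling first.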
I have a data frame named "Historical_Stock_Prices_R" like this:
Date1 MSFT AAPL GOOGL
25-01-05 21.03 4.87 88.56
26-01-05 21.02 4.89 94.62
27-01-05 21.10 4.91 94.04
28-01-05 21.16 5.00 95.17
I use the following code to get a list of monthly max and monthly mean log returns from the daily price data:
return <- cbind.data.frame(
  date = Historical_Stock_Prices_R$Date1[2:nrow(Historical_Stock_Prices_R)],
  apply(Historical_Stock_Prices_R[, 2:4], 2,
        function(x) log(x[-1] / x[-length(x)]) * 100))
return$Date <- as.Date(return$date,format="%d-%m-%y")
RMax <- aggregate(return[,-1],
by=list(Month=format(return$Date,"%y-%m")),
FUN=max)
RMean <- aggregate(return[,-1],
by=list(Month=format(return$Date,"%y-%m")),
FUN=mean)
But now I have a matrix (not a dataframe) named "df" like this
AAPL.High ABT.High ABBV.High ACN.High ADBE.High
07-01-02 NA NA NA NA NA
03-01-07 12.37 24.74 NA 37 41.32
04-01-07 12.28 25.12 NA 37.23 41
05-01-07 12.31 25 NA 36.99 40.9
Now how can I calculate the same monthly mean and monthly max using similar code?
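One way (a sketch on a tiny stand-in matrix; the real data will need NA handling, e.g. `na.rm = TRUE` passed through `aggregate`) is to recover the dates from the matrix row names and then reuse the same `apply`/`aggregate` pattern:

```r
# Tiny stand-in for the 'df' matrix: dates as row names, tickers as columns
df <- matrix(c(12.37, 12.28, 12.31,
               24.74, 25.12, 25.00),
             nrow = 3,
             dimnames = list(c("03-01-07", "04-01-07", "05-01-07"),
                             c("AAPL.High", "ABT.High")))

# Daily log returns in percent, one column per ticker
ret <- as.data.frame(apply(df, 2, function(x) log(x[-1] / x[-length(x)]) * 100))
ret$Date <- as.Date(rownames(df)[-1], format = "%d-%m-%y")

# Monthly max and mean, as in the data-frame version
RMax  <- aggregate(ret[, -ncol(ret)],
                   by = list(Month = format(ret$Date, "%y-%m")), FUN = max)
RMean <- aggregate(ret[, -ncol(ret)],
                   by = list(Month = format(ret$Date, "%y-%m")), FUN = mean)
```

The key difference from the data-frame case is only where the dates live: `rownames(df)` instead of a `Date1` column.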
I have a dataframe that is essentially a time series data.
Timestamp <- c("1/27/2015 18:28:16","1/27/2015 18:28:17","1/27/2015 18:28:19","1/27/2015 18:28:20","1/27/2015 18:28:23","1/28/2015 22:43:08","1/28/2015 22:43:09","1/28/2015 22:43:13","1/28/2015 22:43:15","1/28/2015 22:43:16"
)
ID <- c("A","A","A","A","A","B","B","B","B","B")
v1<- c(1.70,1.71,1.77,1.79,1.63,7.20,7.26,7.16,7.18,7.18)
df <- data.frame(Timestamp ,ID,v1)
Timestamp ID v1
1/27/2015 18:28:16 A 1.70
1/27/2015 18:28:17 A 1.71
1/27/2015 18:28:19 A 1.77
1/27/2015 18:28:20 A 1.79
1/27/2015 18:28:23 A 1.63
1/28/2015 22:43:08 B 7.20
1/28/2015 22:43:09 B 7.26
1/28/2015 22:43:13 B 7.16
1/28/2015 22:43:15 B 7.18
1/28/2015 22:43:16 B 7.18
Since I don't really care about the timestamp itself, I was thinking of creating a column called interval so I can plot this data in one plot.
I am creating the interval column incorrectly by doing this:
df$interval <- cut(df$Timestamp, breaks="sec")
I want the interval column to hold the elapsed seconds within each ID group: every time a new ID starts, the interval resets to 1 and then increases with the timestamp (in seconds).
My desired output
Timestamp ID v1 Interval
1/27/2015 18:28:16 A 1.70 1
1/27/2015 18:28:17 A 1.71 2
1/27/2015 18:28:19 A 1.77 4
1/27/2015 18:28:20 A 1.79 5
1/27/2015 18:28:23 A 1.63 8
1/28/2015 22:43:08 B 7.20 1
1/28/2015 22:43:09 B 7.26 2
1/28/2015 22:43:13 B 7.16 6
1/28/2015 22:43:15 B 7.18 8
1/28/2015 22:43:16 B 7.18 9
I would also like to plot interval vs. v1 by ID with ggplot, so we get the 2 time series in the same plot. I will then extract features from them.
Please help me work around this problem so that I can apply it to a larger dataset.
One solution with data.table:
For the data:
library(data.table)
df <- as.data.table(df)
df$Timestamp <- as.POSIXct(df$Timestamp, format='%m/%d/%Y %H:%M:%S')
df[, Interval := as.numeric(difftime(Timestamp, .SD[1, Timestamp], units='secs') + 1) , by=ID]
which outputs:
> df
Timestamp ID v1 Interval
1: 2015-01-27 18:28:16 A 1.70 1
2: 2015-01-27 18:28:17 A 1.71 2
3: 2015-01-27 18:28:19 A 1.77 4
4: 2015-01-27 18:28:20 A 1.79 5
5: 2015-01-27 18:28:23 A 1.63 8
6: 2015-01-28 22:43:08 B 7.20 1
7: 2015-01-28 22:43:09 B 7.26 2
8: 2015-01-28 22:43:13 B 7.16 6
9: 2015-01-28 22:43:15 B 7.18 8
10: 2015-01-28 22:43:16 B 7.18 9
Then for ggplot:
library(ggplot2)
ggplot(df, aes(x=Interval, y=v1, color=ID)) + geom_line()
I want to get some data from a list of Chinese stocks using quantmod.
The list is like below:
002705.SZ -- 002730.SZ (within this range, some tickers have no matching stock; for example, there is no stock called 002720.SZ)
300357.SZ -- 300402.SZ
603188.SS
603609.SS
603288.SS
603306.SS
603369.SS
I want to write a loop to run all these stocks to get the data from each of them and save them into one data frame.
This should get you started.
library(quantmod)
library(stringr) # for str_pad
stocks <- paste(str_pad(2705:2730,width=6,side="left",pad="0"),"SZ",sep=".")
get.stock <- function(s) {
  s <- try(Cl(getSymbols(s, auto.assign = FALSE)), silent = TRUE)
  # xts objects carry class c("xts", "zoo"), so test with inherits()
  if (inherits(s, "xts")) return(s)
  return(NULL)
}
result <- do.call(cbind,lapply(stocks,get.stock))
head(result)
# X002705.SZ.Close X002706.SZ.Close X002707.SZ.Close X002708.SZ.Close X002709.SZ.Close X002711.SZ.Close X002712.SZ.Close X002713.SZ.Close
# 2014-01-21 15.25 27.79 NA 17.26 NA NA NA NA
# 2014-01-22 14.28 28.41 NA 16.56 NA NA NA NA
# 2014-01-23 13.65 27.78 33.62 15.95 19.83 NA 36.58 NA
# 2014-01-24 15.02 30.56 36.98 17.55 21.81 NA 40.24 NA
# 2014-01-27 14.43 31.26 40.68 18.70 23.99 26.34 44.26 NA
# 2014-01-28 14.18 30.01 44.75 17.66 25.57 28.97 48.69 NA
This takes advantage of the fact that getSymbols(...) either returns an xts object or throws an error when the fetch fails; try(...) captures that error as a "try-error" object instead of stopping the loop, and get.stock turns it into NULL.
Note that cbind(...) for xts objects aligns according to the index, so it acts like merge(...).
This produces an xts object, not a data frame. To convert this to a data.frame, use:
result.df <- data.frame(date=index(result),result)
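To cover the whole list in the question, the same pattern extends by building each ticker set and concatenating them before the `lapply` (a sketch; `sprintf` zero-pads the codes without needing stringr):

```r
# Build all the ticker sets from the question and combine them
sz1 <- sprintf("%06d.SZ", 2705:2730)        # 002705.SZ -- 002730.SZ
sz2 <- sprintf("%06d.SZ", 300357:300402)    # 300357.SZ -- 300402.SZ
ss  <- c("603188.SS", "603609.SS", "603288.SS", "603306.SS", "603369.SS")

stocks <- c(sz1, sz2, ss)
```

Passing this `stocks` vector to the same `do.call(cbind, lapply(stocks, get.stock))` call handles the missing tickers automatically, since get.stock returns NULL for failed fetches and cbind drops the NULLs.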