How do I plot asset stock prices in R? - r

I'm trying to plot asset stock prices in R. I'm downloading the data in csv format from Yahoo Finance and then importing it to R so I can run some statistical tests on it and draw a few plots.
I'm currently trying to plot the closing price vs the date, and I'm not having a lot of success. R is just plotting it as a series of distinct points and won't join these points up with lines, despite me trying to use the argument type = "l".
price <- read.csv("~/Downloads/AAPL.csv")
plot(price$Date,price$Close,type="l")
I'm just grabbing the data from here: https://finance.yahoo.com/quote/AAPL/history?p=AAPL
I get an output like this every time, regardless of what kind of extra arguments I try.
For example, I tried to make the line red, but nothing changed at all.
Thanks!

The problem is that price$Date is a factor (categorical variable) and not a number. You can convert the date string to a POSIX timestamp with as.POSIXlt, and then compute a floating-point representation from it, e.g. year + yday/366.
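A minimal sketch of that conversion, assuming ISO-formatted date strings like the ones in the Yahoo CSV. The names dates, lt and decimal_year are illustrative only. Note that as.POSIXlt stores the year as years since 1900 and yday as a zero-based day of the year.

```r
# Example date strings (illustrative, same format as the Date column)
dates <- c("2019-01-01", "2019-07-10")

# Parse into POSIXlt, which exposes the individual date components
lt <- as.POSIXlt(dates, tz = "UTC")

# Calendar year plus the fraction of the year that has elapsed
decimal_year <- (lt$year + 1900) + lt$yday / 366
```

The resulting numeric vector can be used as the x argument of plot() together with type = "l".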

Try this
price$Date = as.Date(price$Date)
plot(price$Date,price$Close,type="l",col=4) # the CSV column is Close; AAPL.Close is the quantmod name
or better
library(quantmod)
fro = '2014-07-31'
Apple = getSymbols('AAPL', auto.assign = FALSE, from = fro)
chartSeries(Apple,subset = "last 3 years")

You don't need to use a package unless you want to create candlestick charts.
df <- read.csv("AAPL.csv")
> str(df)
'data.frame': 254 obs. of 7 variables:
$ Date : Factor w/ 254 levels "2019-07-10","2019-07-11",..: 1 2 3 4 5 6 7 8 9 10 ...
$ Open : num 202 203 202 204 205 ...
$ High : num 204 204 204 206 206 ...
$ Low : num 202 202 202 204 204 ...
$ Close : num 203 202 203 205 204 ...
$ Adj.Close: num 201 199 201 203 202 ...
$ Volume : int 17897100 20191800 17595200 16947400 16866800 14107500 18582200 20929300 22277900 18355200 ...
df$Date <- as.Date(df$Date) # Otherwise it is treated as a factor variable
> str(df)
'data.frame': 254 obs. of 7 variables:
$ Date : Date, format: "2019-07-10" "2019-07-11" "2019-07-12" "2019-07-15" ...
$ Open : num 202 203 202 204 205 ...
$ High : num 204 204 204 206 206 ...
$ Low : num 202 202 202 204 204 ...
$ Close : num 203 202 203 205 204 ...
$ Adj.Close: num 201 199 201 203 202 ...
$ Volume : int 17897100 20191800 17595200 16947400 16866800 14107500 18582200 20929300 22277900 18355200 ...
plot(y=df$Close, x=df$Date, col="red", type = "l") # look at ?plot for more details
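As a side note, the conversion can also happen at import time. The following is a minimal sketch, assuming the CSV has the Yahoo-style columns shown above. The inline rows reuse the rounded values from the str() output rather than exact quotes, and the names csv_text and prices are illustrative.

```r
# Inline stand-in for the downloaded file (rounded values, for illustration only)
csv_text <- "Date,Open,High,Low,Close,Adj Close,Volume
2019-07-10,202,204,202,203,201,17897100
2019-07-11,203,204,202,202,199,20191800
2019-07-12,202,204,202,203,201,17595200"

# colClasses tells read.csv to parse the Date column with as.Date on import,
# so it never becomes a factor or a plain character column
prices <- read.csv(text = csv_text, colClasses = c(Date = "Date"))

# Confirm the column classes
sapply(prices, class)
```

With a real file, the text argument would be replaced by the file path, and plot(prices$Date, prices$Close, type = "l") should then draw a connected line.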

Related

Create a dataframe in R

I would like to create a dataframe with 117 columns and 90 rows, the first ones being: ID, date1, date2, Category, DR1, DRM01, DRM02, DRM03 .... up to DRM111. The first column would have values ranging from 1 to 3. date1 would have a fixed value, "2022-01-05", and date2 would have values starting at 2021-12-20 and running up to whatever maximum the number of rows gives. Category can be ABC or ERF, DR1 would hold values varying from 200 to 250, and finally the DRM columns would hold values varying from 0 to 300. Is it possible to create a dataframe like this?
I'm wondering if this is an effort at simulation. The first few tasks seem blindingly obvious, but the last call to replicate with simplify=FALSE might have been a bit less than trivial.
test <- data.frame( ID = rep(1:3, length=90),
                    date1 = as.Date( "2022-01-05"),
                    date2 = seq( as.Date("2021-12-20"), length.out=90, by=1),
                    #Category = ???? so far not specified
                    # replace=TRUE is needed: 90 draws exceed the 51 values in 200:250
                    DR1 = sample( 200:250, 90, replace=TRUE),
                    # one list element (column) per DRM variable
                    setNames( replicate(111, { sample(0:300, 90)}, simplify=FALSE) ,
                              nm=paste("DRM",1:111) ) )
Snipped the last 106 rows of the output from str:
str(test)
'data.frame': 90 obs. of 115 variables:
$ ID : int 1 2 3 1 2 3 1 2 3 1 ...
$ date1 : Date, format: "2022-01-05" "2022-01-05" "2022-01-05" "2022-01-05" ...
$ date2 : Date, format: "2021-12-20" "2021-12-21" "2021-12-22" "2021-12-23" ...
$ DR1 : int 229 218 240 243 221 202 242 221 237 208 ...
$ DRM.1 : int 41 238 142 100 19 56 224 152 85 84 ...
$ DRM.2 : int 150 185 141 55 34 83 88 105 165 294 ...
$ DRM.3 : int 144 22 237 174 78 291 120 63 261 236 ...
$ DRM.4 : int 223 105 263 214 45 226 129 80 182 15 ...
$ DRM.5 : int 27 108 288 237 129 251 150 70 300 243 ...
# additional rows elided
The last item in that construction returns a list that has 111 "columns" with ascending numbered names. I admit to being puzzled about why there were periods in the DRM names but then realized that the data.frame function uses check.names to make sure they are legitimate, so the spaces from paste were converted to periods. If you don't like periods then use paste0.
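A reduced sketch of the same construction with period-free names, assuming 5 rows and 3 DRM columns to keep it short. The Category line is one possible reading of the question (a random draw from the two labels), and sprintf is used here only because the question asked for zero-padded names such as DRM01; paste0("DRM", 1:3) would give DRM1, DRM2, DRM3 instead.

```r
n_rows <- 5   # the question asks for 90
n_drm  <- 3   # the question asks for 111

# Named list of columns; "%02d" pads to two digits (DRM01 ... DRM99, then DRM100 ...)
drm_cols <- setNames(
  replicate(n_drm, sample(0:300, n_rows), simplify = FALSE),
  sprintf("DRM%02d", seq_len(n_drm))
)

test_small <- data.frame(
  ID       = rep(1:3, length.out = n_rows),
  # Assumption: Category is drawn at random from the two allowed labels
  Category = sample(c("ABC", "ERF"), n_rows, replace = TRUE),
  drm_cols
)

names(test_small)
```

Because these names contain no spaces, check.names has nothing to repair and no periods are introduced.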

How can I fully extract all elements into a data frame?

I retrieve some data from an API and convert it to a flat structure.
library(httr)
library(jsonlite) # provides fromJSON()
url <- "https://api.carbonintensity.org.uk/intensity/2019-11-25/2019-11-26"
raw_original <- GET(url)
raw <- rawToChar(raw_original$content)
raw <- fromJSON(raw)
api_extr <- do.call("rbind", lapply(raw, data.frame))
At first, all seems well (a 5-column data frame):
> head(api_extr)
from to intensity.forecast intensity.actual intensity.index
1 2019-11-24T23:30Z 2019-11-25T00:00Z 210 200 moderate
2 2019-11-25T00:00Z 2019-11-25T00:30Z 199 200 moderate
3 2019-11-25T00:30Z 2019-11-25T01:00Z 200 198 moderate
4 2019-11-25T01:00Z 2019-11-25T01:30Z 204 189 moderate
5 2019-11-25T01:30Z 2019-11-25T02:00Z 199 191 moderate
6 2019-11-25T02:00Z 2019-11-25T02:30Z 192 193 moderate
However, one of the columns (intensity) is in fact a data frame which contains three further columns.
> str(api_extr)
'data.frame': 49 obs. of 3 variables:
$ from : chr "2019-11-24T23:30Z" "2019-11-25T00:00Z" "2019-11-25T00:30Z" "2019-11-25T01:00Z" ...
$ to : chr "2019-11-25T00:00Z" "2019-11-25T00:30Z" "2019-11-25T01:00Z" "2019-11-25T01:30Z" ...
$ intensity:'data.frame': 49 obs. of 3 variables:
..$ forecast: int 210 199 200 204 199 192 191 194 197 192 ...
..$ actual : int 200 200 198 189 191 193 197 193 193 194 ...
..$ index : chr "moderate" "moderate" "moderate" "moderate" ...
I would expect the data frame to have five columns whereas instead it only has three.
At first glance this may seem insignificant, but the problems will start when it comes to working with the data (e.g. plotting it).
How can I achieve five columns?
You can pass the URL directly to fromJSON and flatten the result in a single step.
library(jsonlite)
url <- "https://api.carbonintensity.org.uk/intensity/2019-11-25/2019-11-26"
df <- fromJSON(url, flatten = TRUE)[[1]]
str(df)
'data.frame': 49 obs. of 5 variables:
$ from : chr "2019-11-24T23:30Z" "2019-11-25T00:00Z" "2019-11-25T00:30Z" "2019-11-25T01:00Z" ...
$ to : chr "2019-11-25T00:00Z" "2019-11-25T00:30Z" "2019-11-25T01:00Z" "2019-11-25T01:30Z" ...
$ intensity.forecast: int 210 199 200 204 199 192 191 194 197 192 ...
$ intensity.actual : int 200 200 198 189 191 193 197 193 193 194 ...
$ intensity.index : chr "moderate" "moderate" "moderate" "moderate" ...
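If the nested data frame has already been built, a base R idiom can unnest it without another request. This is a sketch of a different technique from the flatten = TRUE argument above, run on a small hand-made stand-in for the API result; the names nested and flat are illustrative.

```r
# Stand-in for the API result: a data frame whose "intensity" column
# is itself a data frame with three columns
nested <- data.frame(from = c("2019-11-24T23:30Z", "2019-11-25T00:00Z"),
                     to   = c("2019-11-25T00:00Z", "2019-11-25T00:30Z"))
nested$intensity <- data.frame(forecast = c(210L, 199L),
                               actual   = c(200L, 200L),
                               index    = c("moderate", "moderate"))

ncol(nested)  # the nested frame counts as a single column here

# data.frame() expands a data-frame argument into its columns and
# prefixes them with the argument name, giving intensity.forecast etc.
flat <- do.call(data.frame, nested)

names(flat)
```

The jsonlite package also exports a flatten() function for this purpose, which is what the flatten = TRUE argument applies.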

TM - Clustering data with special date variable

I've got the following data from TripAdvisor:
'data.frame': 682 obs. of 6 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ id : Factor w/ 674 levels "id","rn106322397",..: 672 671 670 669 668 667 666 665 664 663 ...
$ quote : Factor w/ 606 levels "\"Picturesque Lake Konigssee\"",..: 389 139 113 149 384 39 176 598 199 603 ...
$ rating : Factor w/ 6 levels "1","2","3","4",..: 3 5 5 5 4 5 5 5 4 5 ...
$ date : Factor w/ 505 levels "date","Reviewed 1 August 2014\n",..: 200 200 427 427 427 443 434 351 313 494 ...
$ reviewnospace: Factor w/ 674 levels "- Good car parking facilities- Organized boat trips- Ensure that you have enough time at hand for the boat trip",..: 624 573 144 211 507 26 351 672 451 249 ...
I am trying to cluster the data on the basis of the date, to get two groups: winter and summer vacationers. I want to analyse the reviews afterwards using this clustering. I am using the tm package and tried it with the following code:
> x <- read.csv ("seeganz.csv", header = TRUE, stringsAsFactors = FALSE, sep = ",")
> corp <- VCorpus(VectorSource(x$reviewnospace), readerControl = list(language = "eng"))
> meta(corp,tag = "date") <- x$date
> idx <- meta(corp, "date") == 'December'
But it is not working, as the content shows 0 documents:
> corp [idx]
<<VCorpus>>
Metadata: corpus specific: 0, document level (indexed): 1
Content: documents: 0
As the date has the structure "Reviewed 1 August 2014", how do I have to adapt this code to get, for example just the reviews from Nov - Feb?
Do you have any idea how I can solve this problem?
Thank you.
Generic Approach:
Use substr(date, 10, nchar(date)) to get "1 August 2014"; call this new vector dateNew.
Use a standard date function, e.g. as.Date(dateNew, ...), to turn dateNew into a vector of class Date, on which you can do subsetting, subtraction and other operations.
References from http://www.statmethods.net/input/dates.html
# use as.Date( ) to convert strings to dates
mydates <- as.Date(c("2007-06-22", "2004-02-13"))
# number of days between 6/22/07 and 2/13/04
days <- mydates[1] - mydates[2]
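Applied to the review dates, the two steps might look like the sketch below. It assumes strings of the form "Reviewed 1 August 2014\n" as in the question, and an English locale, since %B matches month names in the current locale. The sample values and the names raw_dates, review_dates and is_winter are made up for illustration.

```r
# Hypothetical sample with the same shape as x$date in the question
raw_dates <- c("Reviewed 1 August 2014\n", "Reviewed 23 December 2013\n",
               "Reviewed 5 February 2015\n", "Reviewed 30 June 2014\n")

# Step 1: drop the 9-character "Reviewed " prefix and any trailing newline
date_text <- trimws(substr(raw_dates, 10, nchar(raw_dates)))

# Step 2: parse; %d = day, %B = full month name, %Y = four-digit year
review_dates <- as.Date(date_text, format = "%d %B %Y")

# Month number of each review, then a logical index for November to February
review_month <- as.integer(format(review_dates, "%m"))
is_winter <- review_month %in% c(11, 12, 1, 2)
```

A logical vector like is_winter can then be used to subset the rows of x, for example x[is_winter, ], before building the corpus.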

Failure passing plots to `saveHTML(){animation}` in R

The background:
I am trying to create an animation with saveHTML(){animation} to show how runners' pace between two consecutive laps changes over time. I tried to pass the plots into the expr block with the following code:
MakeSpLaps <- function(finishers.pace, lap1, lap2, start.lap) {
sp <- qplot(lap1, lap2, data=finishers.pace,
color=gender, alpha = I(.7) )
# + additional elements removed;
return(sp)
}
MakeSpLapsAnimation <- function(){
brk <- seq(0, 3000, 60)
lbl <- seconds_to_period(brk)
oopt = ani.options(interval = 0.2, nmax = 20)
saveHTML({
par(mar = c(4, 4, 0.5, 0.5))
for (i in 3:11){
# The problematic line below
MakeSpLaps(p, p[[i]], p[[i+1]], i-2)
ani.pause()
}
}, img.name = "lap_plot", imgdir = "lap_dir", htmlfile = "laps.html",
autobrowse = FALSE, title = "Plots of consecutive laps.",
description = "Plots of consecutive laps.")
}
Where the data.frame p looks like this:
'data.frame': 17051 obs. of 11 variables:
$ bib : int 10001 10003 10004 10005 10006 10009 10010 10011 10012 10013 ...
$ gender : Factor w/ 3 levels "","F","M": 3 3 3 3 3 3 3 3 3 3 ...
$ X5km_lap : num 290 204 196 315 228 ...
$ X10km_lap : num 280 204 201 322 225 ...
$ X15km_lap : num 283 205 204 326 235 ...
$ X20km_lap : num 282 206 204 342 229 ...
$ X25km_lap : num 280 210 205 371 235 ...
$ X30km_lap : num 280 225 216 407 254 ...
$ X35km_lap : num 279 274 231 404 267 ...
$ X40km_lap : num 284 251 257 357 262 ...
$ Finish_lap: num 289 242 247 333 265 ...
The problem and question:
Running MakeSpLaps(p, p[[3]], p[[4]], 1) alone creates the graph I want, but when I plug it into saveHTML(), no plot is created except for a blank PNG. The HTML files are created with the following messages. How can I correctly pass the plots to the function saveHTML()?
animation option 'nmax' changed: 20 --> 1
animation option 'nmax' changed: 1 --> 20
HTML file created at: laps.html
The actual code is here: https://github.com/hktang/rscraper/blob/3d542b18b5f6fbf1a1fa31b0bd3936f1179cdc59/r/visuals.R#L145

can't draw the grouped value above stacked bar plot in ggplot2

I have a ggplot2 question. The code below draws the stacked bar plot correctly, but without a value above each bar:
p=ggplot(data=essnn)
p+geom_bar(binwidth=0.5,stat="identity")+ #
aes(x=reorder(classname,-amount,sum), y=amount, label=amount, fill = sort(year))+
theme()
I want to add the summed amount, grouped by year, for each class; here is my code:
+geom_text(aes(x=classes,y=total,label=total), data=essnnta, fill=NULL, size=3)
But an error message appears:
Error in fill = year, can not find object "year"
That's my problem: why can the object "year" be found when I draw the stacked bar plot without the summed amounts, but not when I add the summed amounts grouped by year?
> str(essnn)
'data.frame': 48619 obs. of 15 variables:
$ id : int 2006051337 2006051337 2006051337 2006051337 2006051337 2006051337 2004070648 2006031360 2006031360 2004070062 ...
$ gender : Factor w/ 3 levels "","F","M": 3 3 3 3 3 3 3 3 3 3 ...
$ age : num 30 30 30 30 30 30 38 43 43 37 ...
$ class : Factor w/ 92 levels "100ab","100aa",..: 18 18 18 18 18 18 18 18 18 18 ...
$ classname: Factor w/ 1136 levels "cad"," Office2010",..: 111 111 111 111 111 111 116 107 107 107 ...
$ grade : num 7 5 6 8 3 4 1 4 3 2 ...
$ year : Factor w/ 6 levels "98","99","100",..: 3 3 3 3 2 2 4 5 5 3 ...
$ ses : num 212 210 211 213 207 208 217 221 220 210 ...
$ date : int 1010421 1001115 1010214 1010701 1000411 1000627 1020424 1030304 1021121 1001108 ...
$ money : num 5800 5800 5800 5800 5200 5200 3000 0 5500 5500 ...
$ discount : num 1160 1160 1160 1160 1040 1040 600 0 275 550 ...
$ amount : num 4640 4640 4640 4640 4160 ...
$ idc : Factor w/ 7 levels "在校生","校友",..: 2 2 2 2 2 2 2 7 7 7 ...
$ mdy : Date, format: "2012-04-21" "2011-11-15" "2012-02-14" "2012-07-01" ...
$ day : num 1123 1281 1190 1052 1499 ...
> str(essnnta)
'data.frame': 10 obs. of 2 variables:
$ classes: Factor w/ 10 levels "JD","JF",..: 1 7 8 4 6 10 3 5 2 9
$ total : num 55603526 43708950 43555010 35649129 33214372 ...
Your problem might be that your x-axes are not the same in the two data frames, so ggplot does not know which value corresponds to which stack. I am not sure about this, as I don't understand the way you define your x axis in the original barplot. I also find it a bit strange to define the aes outside of the ggplot function or the geom_bar, but that might just be me being used to a different kind of syntax.
All in all I find it difficult to help you as you do not provide any reproducible example.
Here is a small bit of data, and a plot that sort of works. If you supplement your question with your data (or a subset of it), we can see whether this also works there. You may also want to position the label at the top of each bar.
library(ggplot2)
essnn <- data.frame(year = c(98, 99, 100, 101, 102),
                    classname = c("a", "b", "c", "d", "e"),
                    amount = c(1e6, 2e6, 3e6, 4e6, 5e6))
essnnta <- data.frame(total = c(10, 20, 30, 40, 50))
ggplot(data = essnn, aes(x = reorder(classname, -amount, sum), y = amount, fill = year)) +
  # binwidth removed: it applies to histograms, not to bars with stat = "identity"
  geom_bar(stat = "identity", position = "stack") +
  # x comes from essnn (classname), not "classes"; totals are taken from essnnta
  geom_text(aes(x = classname, y = essnnta$total, label = essnnta$total), size = 3)
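The label data frame holds one total per class, which is the height of each stacked bar. A minimal base R sketch of how such a frame could be computed, assuming toy data with hypothetical class names and amounts (essnn_toy and totals are illustrative names, not the asker's objects):

```r
# Toy data: two years of amounts for three hypothetical classes
essnn_toy <- data.frame(
  classname = c("a", "a", "b", "b", "c", "c"),
  year      = factor(c(98, 99, 98, 99, 98, 99)),
  amount    = c(100, 150, 200, 50, 300, 25)
)

# One row per class with the summed amount over all years
totals <- aggregate(amount ~ classname, data = essnn_toy, FUN = sum)

# Rename to match the column names used in the question's label data frame
names(totals) <- c("classes", "total")
totals
```

Such a frame has no year column. In ggplot2, a layer that receives it through its data argument still inherits the plot-level fill mapping unless inherit.aes = FALSE is set on that layer, which is one plausible source of the "can not find object year" error.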
