Yahoo financial data in R with zoo [closed]

Hello, I want to import a .csv file in R. I have the following code:
fbmonthly<-read.zoo("E:\\R\\Stockforecast\\Data\\AAPLmonthly.csv",sep=",",header= TRUE, format = '%m/%Y', FUN=as.Date)
However, I get this error:
Error in read.zoo("E:\R\Stockforecast\Data\AAPLmonthly.csv", sep = ",", :
index has bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
My csv file looks like this:
01-03-13 0 0 0 0 0 0
01-04-13 63.128571 63.607143 55.014286 63.254284 47.519821 2740872400
01-05-13 63.494286 66.535713 59.842857 64.247147 48.265705 2361882600
01-06-13 64.389999 64.918571 55.552856 56.647144 44.609531 1754634000
01-07-13 57.527142 65.334282 57.317142 64.647141 50.909512 1634528700
01-08-13 65.10714 73.391426 64.751427 69.602859 54.812115 2014584600
01-09-13 70.442856 72.559998 63.888573 68.10714 56.215424 2157735300
01-10-13 68.349998 77.035713 68.325714 74.671425 61.633572 1959433000
01-11-13 74.860001 79.761429 73.197144 79.438568 65.568352 1306288900
01-12-13 79.714287 82.162857 76.971428 80.145714 68.953758 1764349300
01-01-14 79.382858 80.028572 70.507141 71.514282 61.527653 2191488600
01-02-14 71.80143 78.741432 71.328575 75.177139 64.679031 1470091700
01-03-14 74.774284 78.428574 74.687141 76.677139 68.836685 1250424700
01-04-14 76.822861 85.632858 73.047142 84.298569 75.678795 1608765200
01-05-14 84.571426 92.024284 82.904289 90.428574 81.181992 1433917100
01-06-14 90.565712 95.050003 88.928574 92.93 86.802559 1206934800
01-07-14 93.519997 99.440002 92.57 95.599998 89.296494 1035086000
01-08-14 94.900002 102.900002 93.279999 102.5 95.741524 937077000
Can you please help me?
Thanks

It's not a "CSV" file. Its delimiter appears to be whitespace, which is what read.zoo uses by default. There is no header, which is also the default for read.zoo. You only need to correct the date format. The rows are monthly and every date starts with 01, so the fields are day-month-year with a two-digit year:
read.zoo(text="01-03-13 0 0 0 0 0 0
01-04-13 63.128571 63.607143 55.014286 63.254284 47.519821 2740872400",
format = '%d-%m-%y')
                 V2       V3       V4       V5       V6         V7
2013-03-01  0.00000  0.00000  0.00000  0.00000  0.00000          0
2013-04-01 63.12857 63.60714 55.01429 63.25428 47.51982 2740872400
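To see how much the format string matters here, the following base R sketch parses a few of the question's date strings under two different readings. It only uses as.Date; the names x, mdy and dmy are chosen for illustration.

```r
# A few index strings copied from the question's data
x <- c("01-03-13", "01-04-13", "01-12-13")

# Month-day-year reading: every value lands in January
mdy <- as.Date(x, format = "%m-%d-%y")

# Day-month-year reading: every value is the first day of a month
dmy <- as.Date(x, format = "%d-%m-%y")

print(data.frame(raw = x, month_first = mdy, day_first = dmy))
```

For a monthly series whose first field is always 01, only the day-first reading yields one date per month.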

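If your file really does have a header and is comma-separated, then it seems only your date format is wrong. Your dates are day-month-year with a two-digit year, so use %y rather than %Y:
fbmonthly<-read.zoo("E:\\R\\Stockforecast\\Data\\AAPLmonthly.csv",sep=",",header= TRUE, format = '%d-%m-%y', FUN=as.Date)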

So you are saying I should use fbmonthly<-read.zoo("E:\\R\\Stockforecast\\Data\\AAPLmonthly.csv",sep=".",header= TRUE, format = '%m-%Y', FUN=as.yearmon)
instead?

Related

Conversion of strings to numbers

Hello, I'm looking for a way to turn user-entered strings into matrices of numbers, using this mapping:
28 = SPACE, 27 = ?, 26 = 0, 25 = A, 24 = B, 23 = C, 22 = D, 21 = E, 20 = F, 19 = G, 18 = H,
17 = I, 16 = J, 15 = K, 14 = L, 13 = M, 12 = N, 11 = O, 10 = P, 9 = Q, 8 = R, 7 = S, 6 = T, 5 = U, 4 = V, 3 = W, 2 = X, 1 = Y, 0 = Z
For example:
"HI HOW ARE YOU?" -> "[18 17 28 18 11][3 28 25 8 21][28 1 11 5 27]"
Each letter or symbol of the string is converted to a numerical value (I especially don't know how to turn a space into a number). I'll be using these matrices to make a cryptograph.
You could use utf8ToInt. We need pmin to get your condition 28 = SPACE right.
x <- "HI HOW ARE YOU?"
pmin(abs(utf8ToInt(x) - utf8ToInt("Z")), 28)
# [1] 18 17 28 18 11 3 28 25 8 21 28 1 11 5 27
From ?utf8ToInt :
Conversion of UTF-8 encoded character vectors to and from integer vectors representing a UTF-32 encoding.
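The first step is
utf8ToInt("HI HOW ARE YOU?")
# [1] 72 73 32 72 79 87 32 65 82 69 32 89 79 85 63
from which we subtract utf8ToInt("Z"), i.e. 90, because you wrote 0=Z.
Call abs on the result to get positive numbers.
abs(utf8ToInt("HI HOW ARE YOU?") - utf8ToInt("Z"))
# [1] 18 17 58 18 11 3 58 25 8 21 58 1 11 5 27
The last piece is your condition 28 = SPACE, which is where pmin helps you out: it caps every value at 28, so the 58 produced by a space becomes 28. Note that any other character more than 28 code points away from Z is capped as well, so the digit 0 (which you want as 26) would come out as 28 and needs separate handling.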
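If the arithmetic on code points feels too implicit, an explicit lookup table is another option. The sketch below is only one possible approach: it builds the table from the mapping given in the question and splits the result into groups of five, as in the question's example. The names symbols, codes and encode, and the width argument, are made up for this illustration.

```r
# Lookup table from the question: Z = 0, Y = 1, ..., A = 25, "0" = 26, "?" = 27, space = 28
symbols <- c(rev(LETTERS), "0", "?", " ")
codes <- setNames(0:28, symbols)

# Hypothetical helper: encode a string and cut it into blocks of `width` values
encode <- function(s, width = 5) {
  chars <- strsplit(toupper(s), "")[[1]]
  vals <- unname(codes[chars])  # NA for any symbol that is not in the table
  split(vals, ceiling(seq_along(vals) / width))
}

blocks <- encode("HI HOW ARE YOU?")
print(blocks)
```

Unlike the pmin approach, this handles the digit 0 according to the table, and unknown symbols show up as NA instead of being silently mapped to 28.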

Writing functions: Creating data processing functions with R software [closed]

Hello fellow R users!
I'm a beginner and would appreciate help writing a data processing function in R. I have three .csv files named "x2013", "x2014" and "x2015", one per year, each with the same 6 columns. I started by typing these commands:
install.packages("plyr")
library(plyr)
filenames=list.files()
import.list=adply(filenames,1,read.csv)
What I really want is to summarize all the calls from the three source files (csv). Any kind of help would be appreciated. Thank you for assisting me!
If you want to combine the results of read.csv into one data.frame, you can use do.call with rbind, provided the csv files all have the same columns. The code below reads all csv files from the project home directory and concatenates them into one data.frame:
# simulation of 3 data.frames with 6 columns and 10 rows
df1 <- as.data.frame(matrix(1:(10 * 6), ncol = 6))
df2 <- df1 * 2
df3 <- df1 * 3
write.csv(df1, "X2012.csv")
write.csv(df2, "X2013.csv")
write.csv(df3, "X2014.csv")
# Load all csv files from home directory
filenames <- list.files(".", pattern = "csv$")
import.list<- lapply(filenames, read.csv)
# concatenate list of data.frames into one data.frame
df_res <- do.call(rbind, import.list)
str(df_res)
The output is a data.frame with 30 rows and 7 columns: the 6 data columns plus X, which holds the row names that write.csv saved by default:
'data.frame': 30 obs. of 7 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ V1: int 1 2 3 4 5 6 7 8 9 10 ...
$ V2: int 11 12 13 14 15 16 17 18 19 20 ...
$ V3: int 21 22 23 24 25 26 27 28 29 30 ...
$ V4: int 31 32 33 34 35 36 37 38 39 40 ...
$ V5: int 41 42 43 44 45 46 47 48 49 50 ...
$ V6: int 51 52 53 54 55 56 57 58 59 60 ...
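A variation on the same idea, sketched under a few assumptions: the files are written to a temporary directory as stand-ins for the real yearly CSVs, row names are not written, and a column named source (a name invented here) records which file each row came from so that later summaries can be done per year.

```r
# Create three small example files in a fresh temporary directory
tmp <- tempfile("csv_demo")
dir.create(tmp)
df1 <- as.data.frame(matrix(1:(10 * 6), ncol = 6))
for (i in 1:3) {
  # row.names = FALSE avoids the extra X column on re-import
  write.csv(df1 * i, file.path(tmp, paste0("X201", i + 2, ".csv")), row.names = FALSE)
}

# Read every csv file and tag each row with the file it came from
filenames <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
import.list <- lapply(filenames, function(f) {
  d <- read.csv(f)
  d$source <- basename(f)
  d
})

# Stack the list of data.frames into one
df_res <- do.call(rbind, import.list)
print(table(df_res$source))
```

With the source column in place, a per-file summary such as aggregate(V1 ~ source, data = df_res, FUN = sum) becomes possible.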

How do I get rid of commas and periods, etc in R? [duplicate]

This is my data set:
Depth.Fe
1 0,14.21
2 3,19.35
3 10,17.22
4 14,15.87
5 23,13.62
6 30,16.31
7 36,14.13
8 48,13.95
9 59,15
10 66,14.23
11 68,16.81
12 81,15.93
13 94,16.02
14 96,17.85
15 102,17.02
16 115,15.87
17 121,19.84
18 130,16.94
19 163,16.72
20 168,19.2
21 205,20.41
22 239,16.88
23 251,18.74
24 283,16.67
25 297,18.56
26 322,18.87
27 335,20.81
28 351,24.52
29 370,25.03
30 408,25.11
31 416,23.28
32 419,22.56
33 425,19
34 429,20.53
35 443,19.08
36 447,22.83
37 465,21.06
38 474,24.96
39 493,19.12
40 502,22.24
41 522,26.88
42 550,21.15
43 558,28.92
44 571,27.96
45 586,25.03
46 596,26.27
I want Depth and Fe to be separated into individual columns, but nothing I try is working.
Please help.
First of all, @akrun is definitely right in his comment on your post. If this is a dataset imported from somewhere, then follow his comment.
Assuming that somehow you were handed this weird dataset, I would try this:
# split each "depth,Fe" string on the comma and fill a two-column matrix row by row
df <- data.frame(matrix(as.numeric(unlist(strsplit(as.character(df$Depth.Fe), split = ","))), ncol = 2, byrow = TRUE), stringsAsFactors = FALSE)
colnames(df) <- c("Depth","Fe")
This would take a dataset that looks like this:
Depth.Fe
1 0,14.21
2 3,19.35
to this:
  Depth    Fe
1     0 14.21
2     3 19.35
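Another way to get the same result, offered only as a sketch, is to let read.table parse the combined strings through its text argument. The three-row data frame below is a stand-in built from values in the question; parsed is a name chosen for illustration.

```r
# Stand-in for the single-column data set from the question
df <- data.frame(Depth.Fe = c("0,14.21", "3,19.35", "59,15"), stringsAsFactors = FALSE)

# Treat each element as one comma-separated line of input
parsed <- read.table(text = as.character(df$Depth.Fe), sep = ",",
                     col.names = c("Depth", "Fe"))
print(parsed)
```

This avoids having to state the number of rows or columns by hand, and each column gets its own type.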

Saving an output from R into excel format?

After running the predict function for glm i get an output in the below format:
1 2 3 4 5 6 7 8 9 10 11 12
3.954947e-01 8.938624e-01 7.775473e-01 1.294646e-02 3.954947e-01 9.625746e-01 9.144256e-01 4.739872e-01 1.443219e-01 1.180850e-04 2.138978e-01 7.775473e-01
13 14 15 16 17 18 19 20 21 22 23 24
5.425436e-03 2.069844e-04 2.723969e-01 4.739872e-01 9.144256e-01 1.091998e-01 2.070056e-02 5.114936e-01 1.443219e-01 5.922029e-01 7.578099e-02 8.937642e-01
25 26 27 28 29 30 31 32 33 34 35 36
6.069970e-02 6.069970e-02 1.337947e-01 1.090992e-01 4.841467e-02 9.205547e-01 3.954947e-01 3.874915e-05 3.855242e-02 1.344839e-01 6.318574e-04 2.723969e-01
37 38 39 40 41 42 43 44 45 46 47 48
7.400276e-04 8.593199e-01 6.666800e-01 2.069844e-04 8.161623e-01 4.916555e-05 3.060374e-02 3.402079e-01 2.256598e-03 9.363767e-01 6.116082e-01 3.940969e-03
49 50 51 52 53 54 55 56 57 58 59 60
7.336723e-01 2.425257e-02 3.369967e-03 5.624262e-02 1.090992e-01 1.357630e-06 1.278169e-04 3.046189e-01 8.938624e-01 4.535894e-01 5.132348e-01 3.220426e-01
61 62 63 64 65 66 67 68 69 70 71 72
3.366492e-03 1.357630e-06 1.014721e-01 1.294646e-02 9.144256e-01 1.636988e-02 2.070056e-02 1.012835e-01 5.000274e-03 8.165247e-02 1.357630e-06 8.033850e-03
Is there any code by which I can get the complete output vertically or in an Excel format? Thank you in advance!
The simplest way is to write a character-separated values file using a comma as the delimiter (with acknowledgement to Roland's comment):
write.csv(data.frame(predict(yourGLM)), "file.csv")
Excel reads these automatically, especially if you save the file with a .csv extension.
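As a small sketch of the same idea with explicit column names, the snippet below uses a short named vector as a stand-in for the vector that predict returns (three values copied from the question's output). The column names observation and prediction and the file name predictions.csv are assumptions made for this example.

```r
# Stand-in for the named vector returned by predict() on a fitted model
pred <- c("1" = 0.3954947, "2" = 0.8938624, "3" = 0.7775473)

# One row per observation: the vector names become an id column
out <- data.frame(observation = names(pred), prediction = unname(pred))

# Write to a temporary location; replace with any path Excel can open
path <- file.path(tempdir(), "predictions.csv")
write.csv(out, path, row.names = FALSE)
print(out)
```

Setting row.names = FALSE keeps the file to exactly the two named columns.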
If it's just a matter of viewing it vertically, first create the data:
# create test data
example(predict.glm)
pred <- predict(budworm.lg)
1) Separate R window: use View to display it in a separate R window:
View(pred)
2) R console: display it on the R console vertically:
data.frame(pred)
3) Browser: display it in the browser vertically:
library(R2HTML)
HTMLStart(); HTML(data.frame(pred)); w <- HTMLStop()
browseURL(w)
4) Excel: display it in Excel vertically, using the w we just computed (shell is available on Windows only):
shell(paste("start excel", w))

Transpose with multiple variables and more than one metrics in R

I was previously a SAS user. Since I don't have SAS anymore, I need to learn to use R for work.
The dataset has the following columns:
market date sitename impression clicks
I want to transpose it into:
market date sitename-impression sitename-clicks
I think in SAS I used to do:
Proc Transpose
by market date;
id sitename;
var impression clicks;
run;
I do have a book on R and googled a lot, but couldn't find the solution that works...
Would really appreciate if anyone can help.
Thanks in advance!!!
Let me start by saying welcome to Stack Overflow. Glad to have a new user. When you ask a question it's helpful and encouraged to provide the code you're using and a reproducible data set that looks like the original. This is called a minimal reproducible example. To get a data set into a post you have several options; here are two: call dput() on the object and copy and paste what is displayed in the console, or just post the data frame directly. For the code, provide everything necessary to replicate your problem. I hope you find this helpful for future questions.
I may not fully understand but I think you want to transform, not transpose, the data.
dat <- data.frame(market=rnorm(10), date=rnorm(10), #let's create a data set
sitename=rnorm(10), impression=rnorm(10), clicks=rnorm(10))
dat #look at it (I pasted it below)
# > dat
# market date sitename impression clicks
# 1 -0.9593797 -0.08411994 1.6079129 -0.5204772 -0.31633966
# 2 -0.5088689 1.78799500 -0.2469315 1.3476964 -0.04344779
# 3 -0.1527465 0.81673996 1.7824969 -1.5531260 -1.28304384
# 4 -0.7026194 0.52072913 -0.1174356 0.5722210 -1.20474443
# 5 -0.4537490 -0.69139062 1.1124277 -0.2452974 -0.33025320
# 6 0.7466588 0.36318337 -0.4623319 -0.9036768 -0.65754302
# 7 0.8007612 2.59588554 0.1820732 0.4318629 -0.36308748
# 8 1.0781715 -1.01512734 0.2297475 0.9219439 -1.15687902
# 9 0.3731450 -0.19004572 0.5190749 -1.4020371 -0.97370295
# 10 0.7724259 1.76528303 0.5781786 -0.5490849 -0.83819036
#now to create the new columns (I think this is what you want)
#the easiest way is to use transform. See ?transform for more
dat.new <- transform(dat, sitename.clicks=sitename-clicks,
impression.clicks=impression-clicks)
dat.new #here's the new data set. Notice it has the new and old columns.
#To get rid of the old columns you can use indexing and specify the columns you want.
dat.new[, c(1:2, 6:7)]
#We could have also done:
dat.new[, c(1,2,6,7)]
#or said the columns not wanted with negative indexing:
dat.new[, -c(3:5)]
EDIT In looking at Brian's comments and the variables I would think that a long to wide transformation is what the poster desires. I would likely approach it using Wickham's reshape2 package as well, as this method is easier for me to work with and I imagine it would be easier for an R beginner as well. However, here is a base way to do the long to wide format using the same data set Brian provided:
wide <- reshape(DF, v.names=c("impression", "clicks"), idvar=c("market", "date"),
timevar="sitename", direction="wide")
reshape(wide)
The reshape function is very flexible but takes some getting used to. I'm leaving my previous response up as well to keep the history of this post, though I now believe it is not what the poster intended. It serves as a reminder that a reproducible example is very helpful in providing clarity to your query.
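For readers who want to try the base reshape call on its own, here is a self-contained sketch with a tiny made-up data set (two markets, two dates, two sites). The values are arbitrary, and sep = "_" is an optional choice made here so that the new columns are named like impression_a rather than impression.a.

```r
# Tiny long-format example: one row per market, date and site
DF <- expand.grid(market = c("A", "B"),
                  date = as.Date("2012-02-28") + 0:1,
                  sitename = c("a", "b"))
DF$impression <- seq_len(nrow(DF))
DF$clicks <- seq_len(nrow(DF)) * 10

# Long to wide: one row per market and date, one column per site and metric
wide <- reshape(DF, v.names = c("impression", "clicks"),
                idvar = c("market", "date"),
                timevar = "sitename", direction = "wide", sep = "_")
print(wide)
```

Each combination of market and date becomes a single row, with the metrics for every site spread across columns.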
Example data, as Tyler said, is important. I interpreted your question differently because I thought your data was different. I didn't take the - as a literal subtraction of numerics, but a combination of variables.
DF <- expand.grid(market = LETTERS[1:5],
date = Sys.Date()+(0:5),
sitename = letters[1:2])
n <- nrow(DF)
DF$impression <- sample(100, n, replace=TRUE)
DF$clicks <- sample(100, n, replace=TRUE)
I find the reshape2 package useful for these sorts of transpositions/transformations/rearrangements.
library("reshape2")
dcast(melt(DF, id.vars=c("market","date","sitename")),
market+date~sitename+variable)
gives
market date a_impression a_clicks b_impression b_clicks
1 A 2012-02-28 74 97 11 71
2 A 2012-02-29 34 30 88 35
3 A 2012-03-01 40 85 40 49
4 A 2012-03-02 46 12 99 20
5 A 2012-03-03 6 95 85 56
6 A 2012-03-04 61 61 42 64
7 B 2012-02-28 4 53 74 9
8 B 2012-02-29 43 27 92 59
9 B 2012-03-01 34 26 86 43
10 B 2012-03-02 81 47 84 35
11 B 2012-03-03 3 5 91 48
12 B 2012-03-04 19 26 99 21
13 C 2012-02-28 22 31 100 53
14 C 2012-02-29 40 83 95 27
15 C 2012-03-01 78 89 81 29
16 C 2012-03-02 57 55 79 87
17 C 2012-03-03 37 61 3 97
18 C 2012-03-04 83 61 41 77
19 D 2012-02-28 81 18 47 3
20 D 2012-02-29 90 100 17 83
21 D 2012-03-01 12 40 35 93
22 D 2012-03-02 85 14 63 67
23 D 2012-03-03 63 53 29 58
24 D 2012-03-04 40 79 56 70
25 E 2012-02-28 97 62 68 31
26 E 2012-02-29 24 84 17 63
27 E 2012-03-01 94 93 32 2
28 E 2012-03-02 6 26 86 26
29 E 2012-03-03 100 34 37 80
30 E 2012-03-04 89 87 72 11
The column names have a _ between them rather than a -, but you can change that if you want. I wouldn't recommend it, though, because then you will have problems later referencing the column since the - will be taken as subtraction (you would need to quote the name).
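To illustrate the quoting issue, here is a minimal base R sketch with a hypothetical one-column data frame. It shows that a name containing a hyphen has to be created with check.names = FALSE and then referenced with backticks or as a string.

```r
# With the default check.names = TRUE, R rewrites the name to be syntactically valid
d_default <- data.frame(`a-clicks` = 1:3)
print(names(d_default))

# check.names = FALSE keeps the hyphen in the column name
d <- data.frame(`a-clicks` = 1:3, check.names = FALSE)

# The hyphenated name must be quoted: backticks with $, or a string with [[ ]]
total <- sum(d$`a-clicks`)
same <- sum(d[["a-clicks"]])
print(c(total, same))
```

Writing d$a-clicks without the backticks would instead be parsed as d$a minus an object called clicks.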
