I am new to R. I looked for other solutions, such as converting the data type or exporting as CSV (which produced odd formatting), and was unable to find a fix. I think I am overlooking something simple. Thank you in advance!
I exported my data frame (dfCCVul) to Excel via write_xlsx(). The data exported fine, except for the column "logPopDens.PopDensity", which I created by taking the log of another column (PopDensity). That column exports blank.
Here is a snippet of the data:
PerPoverty PerNotWhit PerServWor logPopDens.PopDensity
13.1 42.5 12.92 6.288305
30.2 48.9 13.03 4.861129
10.1 17.1 9.16 4.819233
26.3 49.8 23.32 4.862599
16.6 42.8 20.24 5.02263
12.5 25.6 8.28 4.448282
15.3 20.3 5.89 5.048188
When I check the structure of the new column, the values appear to be embedded in a nested data frame:
$ logPopDens:'data.frame': 1315 obs. of 1 variable:
..$ PopDensity: num 3.52 3.07 2.64 1.16 2.27 ...
When I check the class, the output is:
> class(dfCCVul$logPopDens)
[1] "data.frame"
My thought was to convert the data type, but every syntax I tried produced an error, for example:
> data$logPopDens <- as.numeric(as.character(data$logPopDens))
Error in data$col11 : object of type 'closure' is not subsettable
> data$logPopDens.PopDensity <- as.numeric(as.character(data$logPopDens.PopDensity))
Error in data$logPopDens.PopDensity :
object of type 'closure' is not subsettable
Is there another way to export the values of the logPopDens column?
Thank you!
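dfCCVul$logPopDens is a data frame; convert it into a vector. One way is with unlist(). (The "closure" errors occur because data is the name of a base R function, not your data frame; use dfCCVul instead.)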
dfCCVul$logPopDens <- unlist(dfCCVul$logPopDens)
Alternatively, extracting the inner column should work as well:
dfCCVul$logPopDens <- dfCCVul$logPopDens$PopDensity
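A small reproducible sketch of the problem and the fix. The question does not show how logPopDens was built, so this assumes it came from log(dfCCVul["PopDensity"]); the data frame df and its values are hypothetical.

```r
# Hypothetical stand-in for dfCCVul
df <- data.frame(PopDensity = c(537.2, 129.2, 123.9))

# Assumed cause: log() of a one-column data frame returns a data frame,
# so this stores a nested data-frame column rather than a numeric vector
df$logPopDens <- log(df["PopDensity"])
class(df$logPopDens)  # "data.frame"

# Fix: replace the nested column with its inner numeric vector
df$logPopDens <- df$logPopDens$PopDensity
class(df$logPopDens)  # "numeric"

# Taking the log of the vector avoids the problem in the first place
df$logPopDens2 <- log(df$PopDensity)
```

With a plain numeric column, write_xlsx() should then export the values normally.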
I'm trying to convert prices scraped from Book Depository's bestsellers page into numeric data so that I can graph them.
My code currently is:
selector <- ".rrp"
library(rvest)
url <- "https://www.bookdepository.com/bestsellers"
doc <- read_html(url)
prices <- html_nodes(doc, selector)
html_text(prices)
library(readr)
Spiral <- read_csv("C:/Users/Ellis/Desktop/INFO204/Spiral.csv")
View(Spiral)
My attempt to clean the data:
text <- gsub('[$NZ]', '', Spiral) # removes NZ$ from data
But the data now looks like this:
[1] "c(\"16.53\", \"55.15\", \"36.39\", \"10.80\", \"27.57\", \"34.94\",
\"27.57\", \"22.06\", \"22.00\", \"16.20\", \"22.06\", \"22.06\",
\"19.84\", \"19.81\", \"27.63\", \"22.06\", \"10.80\", \"27.57\",
\"22.06\", \"22.94\", \"16.53\", \"25.36\", \"27.57\", \"11.01\",
\"14.40\", \"15.39\")"
and when I try to run:
as.numeric(text)
I get:
Warning message:
NAs introduced by coercion
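How do I clean the data so that NZ$ is removed from each price and I can plot the cleaned data?
You have a single string that contains R code, not numbers: gsub() was applied to the whole data frame Spiral, and coercing a data frame to character turns each column into one deparsed string. You need to evaluate that code first.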
as.numeric(eval(parse(text=text)))
[1] 16.53 55.15 36.39 10.80 27.57 34.94 27.57 22.06 22.00 16.20 22.06 22.06 19.84
[14] 19.81 27.63 22.06 10.80 27.57 22.06 22.94 16.53 25.36 27.57 11.01 14.40 15.39
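Evaluating deparsed code works, but a simpler route is to clean the column itself rather than the whole data frame. A minimal sketch, assuming the prices sit in a character column; the column name price and the three values are hypothetical:

```r
# Hypothetical stand-in for Spiral: one character column of scraped prices
Spiral <- data.frame(price = c("NZ$16.53", "NZ$55.15", "NZ$36.39"),
                     stringsAsFactors = FALSE)

# gsub() on the whole data frame deparses the column into a single string
bad <- gsub("[$NZ]", "", Spiral)
length(bad)  # 1

# Cleaning the column keeps one element per price
prices <- as.numeric(gsub("NZ$", "", Spiral$price, fixed = TRUE))
prices
```

The resulting numeric vector prices can be passed straight to plot().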
Several options work directly on html_text(prices), without going through a CSV file:
# option 1
as.numeric(gsub('(\\d+\\.\\d+).*', '\\1', html_text(prices)))
# option 2
as.numeric(gsub('\\s.*$', '', html_text(prices)))
# option 3
library(readr)
parse_number(html_text(prices))
all result in:
[1] 21.00 9.99 31.49 19.49 6.49 13.50 22.49 11.99 11.49 7.99 10.99 7.99 10.99 9.99 7.99 9.99 11.49 8.49 11.99 9.99 14.95 8.99 20.13 13.50 8.49 6.49
NOTES:
The result is a vector of prices in euros. Because of localisation, prices may differ when you scrape from another country.
When the decimal separator in html_text(prices) is a comma (,), replace the first two options with as.numeric(gsub('(\\d+),(\\d+).*', '\\1.\\2', html_text(prices))) to get the correct result. In that case, change the third option to parse_number(html_text(prices), locale = locale(decimal_mark = ',')).
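A quick way to check the base-R options against sample strings. The strings below are hypothetical and assume each price is followed by a space and the currency symbol, which is what option 2 implies about the scraped text:

```r
# Hypothetical scraped strings with period and comma decimal separators
dot   <- c("21.00 \u20ac", "9.99 \u20ac", "31.49 \u20ac")
comma <- c("21,00 \u20ac", "9,99 \u20ac", "31,49 \u20ac")

# Option 1: keep the first "digits.digits" run (dot escaped = literal period)
p1 <- as.numeric(gsub('(\\d+\\.\\d+).*', '\\1', dot))
# Option 2: drop everything from the first whitespace character onward
p2 <- as.numeric(gsub('\\s.*$', '', dot))
# Comma variant: rebuild the number with a period as the decimal mark
p3 <- as.numeric(gsub('(\\d+),(\\d+).*', '\\1.\\2', comma))
p1
```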
I am trying to download some stock data, but the quantmod functions don't seem to work. For example:
getSymbols.yahoo("F",env= globalenv(), return.class = 'xts',
from = "2017-01-01",
to = Sys.Date())
[1] "F"
The package is up to date, and I have set the time locale with Sys.setlocale("LC_TIME", "C"). I also tried getSymbols.google and changing the return class, but neither works.
getSymbols() currently (as of 0.4-10) loads the data into an environment, just like the load() function. In quantmod 0.5-0, it will return the data, like read.table() and most other functions.
If you want getSymbols() to return the data, you can set auto.assign = FALSE.
Data <- getSymbols("F", from = "2017-01-01", to = Sys.Date(), auto.assign = FALSE)
Also note that you should not call getSymbols.yahoo() directly (as it says in ?getSymbols.yahoo).
That output is expected: getSymbols() loaded the data into an object named F in your environment. To see the historical data, just type F:
> head(F)
F.Open F.High F.Low F.Close F.Volume F.Adjusted
2017-01-03 12.20 12.60 12.13 12.59 40510800 12.22555
2017-01-04 12.77 13.27 12.74 13.17 77638100 12.78876
2017-01-05 13.21 13.22 12.63 12.77 75628400 12.40034
2017-01-06 12.80 12.84 12.64 12.76 40315900 12.39063
2017-01-09 12.79 12.86 12.63 12.63 39183400 12.26440
2017-01-10 12.70 13.02 12.66 12.85 58703500 12.47803
I have a text file with closing-price data that I am trying to convert to xts format.
I am able to read it into R, but cannot figure out a way to convert it to xts. Below is the sample data I am working with.
05/31/2017,32.78,FCOM
05/30/2017,32.72,FCOM
05/26/2017,32.56,FCOM
05/25/2017,32.57,FCOM
05/24/2017,32.47,FCOM
05/31/2017,35.63,FDIS
05/30/2017,35.71,FDIS
05/26/2017,35.67,FDIS
05/25/2017,35.54,FDIS
05/24/2017,35.23,FDIS
05/31/2017,18.17,FENY
05/30/2017,18.26,FENY
05/26/2017,18.53,FENY
05/25/2017,18.51,FENY
05/24/2017,18.90,FENY
05/31/2017,36.52,FHLC
05/30/2017,36.40,FHLC
05/26/2017,36.50,FHLC
05/25/2017,36.62,FHLC
05/24/2017,36.41,FHLC
05/31/2017,34.28,FIDU
05/30/2017,34.34,FIDU
05/26/2017,34.33,FIDU
05/25/2017,34.31,FIDU
05/24/2017,34.17,FIDU
05/31/2017,30.56,FMAT
05/30/2017,30.66,FMAT
05/26/2017,30.68,FMAT
05/25/2017,30.62,FMAT
05/24/2017,30.70,FMAT
05/31/2017,34.26,FNCL
05/30/2017,34.60,FNCL
05/26/2017,34.86,FNCL
05/25/2017,34.90,FNCL
05/24/2017,34.85,FNCL
05/31/2017,23.96,FREL
05/30/2017,23.96,FREL
05/26/2017,24.02,FREL
05/25/2017,24.21,FREL
05/24/2017,24.16,FREL
Thank you in advance for any assistance you can provide me with!
Use the split argument of read.zoo() to indicate which column holds the values (here, the ticker symbols) that should become separate columns.
library(zoo)
x <- read.zoo(text = "05/31/2017,32.78,FCOM
05/30/2017,32.72,FCOM
05/26/2017,32.56,FCOM
05/25/2017,32.57,FCOM
05/24/2017,32.47,FCOM
05/31/2017,35.63,FDIS
05/30/2017,35.71,FDIS
05/26/2017,35.67,FDIS
05/25/2017,35.54,FDIS
05/24/2017,35.23,FDIS
05/31/2017,18.17,FENY
05/30/2017,18.26,FENY
05/26/2017,18.53,FENY
05/25/2017,18.51,FENY
05/24/2017,18.90,FENY
05/31/2017,36.52,FHLC
05/30/2017,36.40,FHLC
05/26/2017,36.50,FHLC
05/25/2017,36.62,FHLC
05/24/2017,36.41,FHLC
05/31/2017,34.28,FIDU
05/30/2017,34.34,FIDU
05/26/2017,34.33,FIDU
05/25/2017,34.31,FIDU
05/24/2017,34.17,FIDU
05/31/2017,30.56,FMAT
05/30/2017,30.66,FMAT
05/26/2017,30.68,FMAT
05/25/2017,30.62,FMAT
05/24/2017,30.70,FMAT
05/31/2017,34.26,FNCL
05/30/2017,34.60,FNCL
05/26/2017,34.86,FNCL
05/25/2017,34.90,FNCL
05/24/2017,34.85,FNCL
05/31/2017,23.96,FREL
05/30/2017,23.96,FREL
05/26/2017,24.02,FREL
05/25/2017,24.21,FREL
05/24/2017,24.16,FREL", sep = ",", format = "%m/%d/%Y", split = 3)
Setting split = 3 tells read.zoo to use the 3rd column in the file to create columns, one per distinct ticker. Then x is a zoo object:
R> x
FCOM FDIS FENY FHLC FIDU FMAT FNCL FREL
2017-05-24 32.47 35.23 18.90 36.41 34.17 30.70 34.85 24.16
2017-05-25 32.57 35.54 18.51 36.62 34.31 30.62 34.90 24.21
2017-05-26 32.56 35.67 18.53 36.50 34.33 30.68 34.86 24.02
2017-05-30 32.72 35.71 18.26 36.40 34.34 30.66 34.60 23.96
2017-05-31 32.78 35.63 18.17 36.52 34.28 30.56 34.26 23.96
You can convert x to xts with x <- as.xts(x) after loading the xts package.
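Since the data lives in a text file, read.zoo() can read the file directly instead of using text =. A sketch assuming the zoo and xts packages are installed; the temporary file is a hypothetical stand-in for your actual file, whose path isn't given:

```r
library(zoo)
library(xts)

# Hypothetical stand-in for the user's text file: a few rows of the sample data
path <- tempfile(fileext = ".txt")
writeLines(c("05/31/2017,32.78,FCOM",
             "05/30/2017,32.72,FCOM",
             "05/31/2017,35.63,FDIS",
             "05/30/2017,35.71,FDIS"), path)

# Read straight from the file; split = 3 turns each ticker into its own column
z <- read.zoo(path, sep = ",", format = "%m/%d/%Y", split = 3)

# Convert the zoo object to xts
x <- as.xts(z)
x
```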
Using Python, I created a simple text file with 100 real numbers between 0 and 10 (one value per line).
I then read it into a variable a in R with the read.table() function.
The mean() function works fine, but median() returns the following error when I pass a as the argument (my R installation is the pt_BR version, so I'm translating the error messages into English myself; they may differ slightly from the original English wording):
#Error in median.default(a) : need numeric data
So I tried to convert it to numeric:
as.numeric(a)
#Error: object (a) cannot be coerced to type 'double'
Then I tried converting it to a list and taking the median:
a <- as.list(a)
median(a)
#Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
#'x' must be atomic
Printing the list:
a
$V1
[1] 0.003 0.161 0.227 0.331 0.416 0.441 0.536 0.619 0.730 0.737 0.764 0.799
[13] 0.939 1.009 1.036 1.217 1.321 1.615 1.684 1.878 1.930 1.933 1.949 2.018
[25] 2.053 2.126 2.181 2.464 2.488 2.725 2.838 2.874 2.893 2.954 3.054 3.092
[37] 3.149 3.192 3.216 3.233 3.422 3.424 3.695 3.720 3.743 4.097 4.229 4.229
[49] 4.264 4.317 4.447 4.461 4.529 4.794 4.992 5.121 5.138 5.161 5.241 5.264
[61] 5.286 5.428 5.430 5.430 5.498 5.520 5.706 5.928 5.956 6.074 6.154 6.398
[73] 6.402 6.536 6.549 6.748 6.994 7.196 7.397 7.440 7.840 7.854 7.862 7.913
[85] 7.976 8.002 8.151 8.185 8.237 8.485 8.632 8.688 8.718 9.200 9.372 9.401
[97] 9.487 9.615 9.701 9.702
What is this $V1?
How do I get the median?
You have read the data in as a data frame, which means its basic structure is a list of columns. Even though there's only one column in this data frame, you need to extract it before you can apply a numeric operation like computing the median. As you can see in ?"[[", there are a variety of ways of indexing a data frame.
median(a$V1)
median(a[[1]])
both pull out the first column.
median(unlist(a))
drops the list structure.
median(scan("data.txt"))
uses scan() instead, which reads the results in as a single vector rather than as a list of vectors (i.e. a data frame).
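A self-contained sketch of the approaches above. The five values written to a temporary file are a hypothetical stand-in for the Python-generated file:

```r
# Hypothetical stand-in for the data file: one value per line
path <- tempfile(fileext = ".txt")
writeLines(c("0.003", "2.053", "4.264", "7.976", "9.702"), path)

a <- read.table(path)  # a data frame with a single column named V1
class(a)               # "data.frame"

m1 <- median(a$V1)                      # extract the column by name
m2 <- median(a[[1]])                    # or by position
m3 <- median(scan(path, quiet = TRUE))  # scan() returns a plain numeric vector
c(m1, m2, m3)
```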