I am trying to obtain Bitcoin data from yahoo finance using the following code:
getSymbols("BTC-USD",from= "2020-01-01",to="2020-12-31",warnings=FALSE,auto.assign = TRUE)
BTC-USD=BTC-USD[,"BTC-USD.Adjusted"]
However, I get the following error:
Warning message:
BTC-USD contains missing values. Some functions will not work if objects contain missing values in the middle of the series. Consider using na.omit(), na.approx(), na.fill(), etc to remove or replace them.
How can I fix this?
Thanks.
You've got a first problem which is you're trying to assign to an invalid symbol. Use _ instead of - which is the subtraction operator. If you really want the -, you can use backticks around the symbol.
Then you can use is.na to find the NA values and replace them with 0.
library(quantmod)
getSymbols("BTC-USD",from= "2020-01-01",to="2020-12-31",warnings=FALSE,auto.assign = TRUE)
BTC_USD <- `BTC-USD`[,"BTC-USD.Adjusted"]
BTC_USD[is.na(BTC_USD)] <- 0
BTC_USD[100:110,]
# BTC-USD.Adjusted
#2020-04-09 7302.089
#2020-04-10 6865.493
#2020-04-11 6859.083
#2020-04-12 6971.092
#2020-04-13 6845.038
#2020-04-14 6842.428
#2020-04-15 6642.110
#2020-04-16 7116.804
#2020-04-17 0.000
#2020-04-18 7257.665
#2020-04-19 7189.425
A better plan is probably to just remove the NA rows instead of replacing them with 0:
BTC_USD <- BTC_USD[!is.na(BTC_USD),]
Related
I have a numeric column ("value") in a dataframe ("df"), and I would like to generate a new column ("valueBin") based on "value." I have the following conditional code to define df$valueBin:
df$valueBin[which(df$value<=250)] <- "<=250"
df$valueBin[which(df$value>250 & df$value<=500)] <- "250-500"
df$valueBin[which(df$value>500 & df$value<=1000)] <- "500-1,000"
df$valueBin[which(df$value>1000 & df$value<=2000)] <- "1,000 - 2,000"
df$valueBin[which(df$value>2000)] <- ">2,000"
I'm getting the following error:
"Error in $<-.data.frame(*tmp*, "valueBin", value = c(NA, NA, NA, :
replacement has 6530 rows, data has 6532"
Every element of df$value should fit into one of my which() statements. There are no missing values in df$value. Although even if I run just the first conditional statement (<=250), I get the exact same error, with "...replacement has 6530 rows..." although there are way fewer than 6530 records with value<=250, and value is never NA.
This SO link notes a similar error when using aggregate() was a bug, but it recommends installing the version of R I have. Plus the bug report says its fixed.
R aggregate error: "replacement has <foo> rows, data has <bar>"
This SO link seems more related to my issue, and the issue here was an issue with his/her conditional logic that caused fewer elements of the replacement array to be generated. I guess that must be my issue as well, and figured at first I must have a "<=" instead of an "<" or vice versa, but after checking I'm pretty sure they're all correct to cover every value of "value" without overlaps.
R error in '[<-.data.frame'... replacement has # items, need #
The answer by #akrun certainly does the trick. For future googlers who want to understand why, here is an explanation...
The new variable needs to be created first.
The variable "valueBin" needs to be already in the df in order for the conditional assignment to work. Essentially, the syntax of the code is correct. Just add one line in front of the code chuck to create this name --
df$newVariableName <- NA
Then you continue with whatever conditional assignment rules you have, like
df$newVariableName[which(df$oldVariableName<=250)] <- "<=250"
I blame whoever wrote that package's error message... The debugging was made especially confusing by that error message. It is irrelevant information that you have two arrays in the df with different lengths. No. Simply create the new column first. For more details, consult this post https://www.r-bloggers.com/translating-weird-r-errors/
You could use cut
df$valueBin <- cut(df$value, c(-Inf, 250, 500, 1000, 2000, Inf),
labels=c('<=250', '250-500', '500-1,000', '1,000-2,000', '>2,000'))
data
set.seed(24)
df <- data.frame(value= sample(0:2500, 100, replace=TRUE))
TL;DR ...and late to the party, but that short explanation might help future googlers..
In general that error message means that the replacement doesn't fit into the corresponding column of the dataframe.
A minimal example:
df <- data.frame(a = 1:2); df$a <- 1:3
throws the error
Error in $<-.data.frame(*tmp*, a, value = 1:3) : replacement
has 3 rows, data has 2
which is clear, because the vector a of df has 2 entries (rows) whilst the vector we try to replace has 3 entries (rows).
I am attempting to check a column in my dataset that is all character values with values like: "1","2","12","NAME1","NAME2",...
I am attempting to pick out the values that have non-numeric names and change them to 99. This is what I have attempted so far:
install.packages("stringi")
library(stringi)
stacked_data$NewCol=ifelse(stri_detect_fixed(stacked_data$OldCol,"NAME")==TRUE,99,stacked_data)
I get this error message when I run this code:
Error in table(stacked_data$NewCol) :
attempt to make a table with >= 2^31 elements
Can someone help point me in the right direction? Any help would be appreciated! Thank you!
One easier option is
i1 <- is.na(as.numeric(df1$col))
df1$col[i1] <- 99
I am having an issue with subsetting my Spark DataFrame.
I have a DataFrame called nfe, which contains a column called ITEM_PRODUTO that is formatted as a string. I would like to subset this DataFrame based on whether the item column contains the word "AREIA". I can easily subset the data based on an exact phrase:
nfe.subset1 <- subset(nfe, nfe$ITEM_PRODUTO == "AREIA LAVADA FINA")
nfe.subset2 <- subset(nfe, nfe$ITEM_PRODUTO %in% "AREIA")
However, what I would like is a subset of all rows that contain the word "AREIA" in the ITEM_PRODUTO column. When I try to use grep, though, I receive an error message:
nfe.subset3 <- subset(nfe, grep("AREIA", nfe$ITEM_PRODUTO))
# Error in as.character.default(x) :
# no method for coercing this S4 class to a vector
I've tried multiple iterations of syntax, and tried grepl as well, but nothing seems to work. It's probably a syntax error, but could anyone help me out?
Thanks!
Standard R functions cannot be applied to SparkDataFrame. Use either like`:
where(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))
or rlike:
where(nfe, rlike(nfe$ITEM_PRODUTO, ".*AREIA.*"))
Another stumbling block. I have a large set of data (called "brightly") with about ~180k rows and 165 columns. I am trying to create a correlation matrix of these columns in R.
Several problems have arisen, none of which I can resolve with the suggestions proposed on this site and others.
First, how I created the data set: I saved it as a CSV file from Excel. My understanding is that CSV should remove any formatting, such that anything that is a number should be read as a number by R. I loaded it with
brightly = read.csv("brightly.csv", header=TRUE)
But I kept getting "'x' must be numeric" error messages every time I ran cor(brightly), so I replaced all the NAs with 0s. (This may be altering my data, but I think it will be all right--anything that's "NA" is effectively 0, either for the continuous or dummy variables.)
Now I am no longer getting the error message about text. But any time I run cor()--either on all of the variables simultaneously or combinations of the variables--I get "Warning message:
In cor(brightly$PPV, brightly, use = "complete") :
the standard deviation is zero"
I am also having some of the correlations of that one variable with others show up as "NA." I have ensured that no cell in the data is "NA," so I do not know why I am getting "NA" values for the correlations.
I also tried both of the following to make REALLY sure I wasn't including any NA values:
cor(brightly$PPV, brightly, use = "pairwise.complete.obs")
and
cor(brightly$PPV,brightly,use="complete")
But I still get warnings about the SD being zero, and I still get the NAs.
Any insights as to why this might be happening?
Finally, when I try to do corrplot to show the results of the correlations, I do the following:
brightly2 <- cor(brightly)
Warning message:
In cor(brightly) : the standard deviation is zero
corrplot(brightly2, method = "number")
Error in if (min(corr) < -1 - .Machine$double.eps || max(corr) > 1 + .Machine$double.eps) { :
missing value where TRUE/FALSE needed
And instead of making my nice color-coded correlation matrix, I get this. I have yet to find an explanation of what that means.
Any help would be HUGELY appreciated! Thanks very much!!
Please check if you replaced your NAs with 0 or '0' as one is character and other is int. Or you can even try using as.numeric(column_name) function to convert your char 0s with int 0. Also this error occurs if your dataset has factors, because those are not int values corrplot throws this error.
It would be helpful of you put sample of your data in the question using
str(head(your_dataset))
That would be helpful for you to check the datatypes of columns.
Let me know if I am wrong.
Cheerio.
I have two arrays "begin" and "end_a" which contain some integer indices, except that some of the entries in "end_a" are NA.
And panelDataset is a matrix which contains the data. I want to take the means of the rows of panelDataset corresponding to non-NA entries of begin and end_a.
I have this working in serial fashion and it works fine, but when I tried to vectorize it as follows
switch_mu=ifelse(!is.na(end_a),mean(panelDataset[begin: end_a,4]),NA)
It gives an error: Error in begin:end_a : NA/NaN argument.
When I check the entries of end_a separately for NAs using is.na(end_a), it does show the correct entries of the array as NA. So, that is not an issue.
I know I am missing something trivial. Any thoughts?
Try this:
means <- apply(na.omit(cbind(begin, end_a)), 1,
function(x) mean(panelDataset[x[1]:x[2], 4]))
replace(end_a, !is.na(end_a), means)