I have a data frame to which I want to add a new column: the product of two other columns, divided by 100.
The command I'm trying to use is:
fulldata$Conpolls <- fulldata$Conprct/100 * fulldata$Total.seats
for which I receive:
Error: unexpected input in "full.data$Conpolls <- fulldata$Conprct /100 * fulldata$Total.seats"
When I try to break up the process in 2 steps as:
fulldata$Conpolls <- fulldata$Conprct * fulldata$Total.seats
I get the error:
non-numeric argument to binary operator.
Any tips or help from experienced users greatly appreciated!
Veerendra Gadekar's answer should be correct if all the columns are numeric values.
If the columns with which you are doing the operations are not guaranteed to be numeric, you may turn them into numeric values with as.numeric(). It should look like this:
fulldata$Conpolls <- (as.numeric(fulldata$Conprct) * as.numeric(fulldata$Total.seats))/100
fulldata$Conpolls <- (fulldata$Conprct * fulldata$Total.seats)/100
This doesn't answer the question, but it is the proper syntax for this kind of arithmetic. And yes, as mentioned in the comments, you should check the class of the objects you are using to find out what is wrong.
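As a rough illustration of the class check suggested above, the sketch below builds a small stand-in for fulldata. The values are invented and only the column names come from the question. It assumes the columns were read in as character or factor. Note that as.numeric() applied directly to a factor returns the internal level codes, so going through as.character() first is the safer route.

```r
# Hypothetical stand-in for fulldata; the values are invented for illustration
fulldata <- data.frame(
  Conprct     = c("36.1", "42.5", "29.0"),
  Total.seats = factor(c("650", "650", "646")),
  stringsAsFactors = FALSE
)

# Inspect the classes: anything other than numeric/integer will break `*`
sapply(fulldata, class)

# Convert via as.character() so factor levels are read as their labels,
# not as their internal integer codes
conprct <- as.numeric(as.character(fulldata$Conprct))
seats   <- as.numeric(as.character(fulldata$Total.seats))

fulldata$Conpolls <- conprct * seats / 100
```

If sapply() already reports numeric for both columns, the conversion is unnecessary and the plain expression works as written.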
I want to multiply a data frame by the number 4.193215e+12. My code is
df <- cbind(Dataset = df$Dataset, df[,2:4] * 4.193215e^12
However an error appears. What is the proper way to code this number 4.193215e+12 in R?
While this is documented in the not-quite-obvious location ?NumericConstants, I am hard-pressed to think of a language in which Xe^Y is syntactically correct. Use either e or ^ for powers, never both.
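A minimal sketch of the two valid spellings, applied to a made-up data frame (the column names x, y, z and the values are assumptions, since the original df is not shown):

```r
# Scientific notation: the exponent follows `e` directly, with no `^`
a <- 4.193215e+12
b <- 4.193215e12            # the `+` sign is optional
c_pow <- 4.193215 * 10^12   # equivalent form using `^`

# Hypothetical data frame: one label column and three numeric columns
df <- data.frame(Dataset = c("a", "b"), x = c(1, 2), y = c(3, 4), z = c(5, 6))

# Scale only the numeric columns and keep Dataset unchanged
scaled <- cbind(Dataset = df$Dataset, df[, 2:4] * a)
```

Note that the original call was also missing its closing parenthesis, which would raise an error of its own.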
I am attempting to check a column in my dataset that is all character values with values like: "1","2","12","NAME1","NAME2",...
I am attempting to pick out the values that have non-numeric names and change them to 99. This is what I have attempted so far:
install.packages("stringi")
library(stringi)
stacked_data$NewCol=ifelse(stri_detect_fixed(stacked_data$OldCol,"NAME")==TRUE,99,stacked_data)
I get this error message when I run this code:
Error in table(stacked_data$NewCol) :
attempt to make a table with >= 2^31 elements
Can someone help point me in the right direction? Any help would be appreciated! Thank you!
One easier option is to coerce the column to numeric and replace whatever becomes NA:
i1 <- is.na(as.numeric(df1$col))
df1$col[i1] <- 99
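For a self-contained run of this approach, here is a sketch on a made-up column (df1 and its values are assumptions modelled on the question). Two caveats: as.numeric() emits a coercion warning for the non-numeric strings, and any value that was already NA will also be replaced by 99.

```r
# Hypothetical column mixing numeric strings and names
df1 <- data.frame(col = c("1", "2", "12", "NAME1", "NAME2"),
                  stringsAsFactors = FALSE)

# Non-numeric strings become NA; suppressWarnings() hides the coercion warning
i1 <- is.na(suppressWarnings(as.numeric(df1$col)))

# The column is character, so 99 is stored as the string "99" here
df1$col[i1] <- 99

# Convert the whole column once every entry is a numeric string
df1$col <- as.numeric(df1$col)
```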
I am having an issue with subsetting my Spark DataFrame.
I have a DataFrame called nfe, which contains a column called ITEM_PRODUTO that is formatted as a string. I would like to subset this DataFrame based on whether the item column contains the word "AREIA". I can easily subset the data based on an exact phrase:
nfe.subset1 <- subset(nfe, nfe$ITEM_PRODUTO == "AREIA LAVADA FINA")
nfe.subset2 <- subset(nfe, nfe$ITEM_PRODUTO %in% "AREIA")
However, what I would like is a subset of all rows that contain the word "AREIA" in the ITEM_PRODUTO column. When I try to use grep, though, I receive an error message:
nfe.subset3 <- subset(nfe, grep("AREIA", nfe$ITEM_PRODUTO))
# Error in as.character.default(x) :
# no method for coercing this S4 class to a vector
I've tried multiple iterations of syntax, and tried grepl as well, but nothing seems to work. It's probably a syntax error, but could anyone help me out?
Thanks!
Standard R functions cannot be applied to a SparkDataFrame. Use either like:
where(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))
or rlike:
where(nfe, rlike(nfe$ITEM_PRODUTO, ".*AREIA.*"))
I am a second year M.Sc student and I am running into a bit of a snag running my statistics.
I am trying to run a contingency table and Fisher's test, and I keep getting an error.
Error in fisher.test(GAL4UAS) : if 'x' is not a matrix, 'y' must be given
If anyone can see what I have done wrong or may be missing, I would really appreciate it.
This is the code:
setwd("/Users/Pria/Desktop/Data Analysis/")
GAL4UAS <-- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- lapply(GAL4UAS, abs)
fisher.test(GAL4UAS)
fisher.test(GAL4UAS[c(1,2)])
fisher.test(GAL4UAS[c(1,3)])
fisher.test() is anticipating a matrix as an input and not a data frame. Try putting your data into a matrix. One option among several would be:
m <- matrix(c(20,21,19,10,9,11),nrow = 3,ncol=2,byrow=FALSE)
fisher.test(m)
When you apply abs() using lapply(), the output is a list, not a data.frame. apply(), by contrast, returns a matrix, which is what fisher.test() expects. So maybe you can try this:
GAL4UAS <- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- apply(GAL4UAS, abs, MARGIN=c(1,2))
fisher.test(GAL4UAS)
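Another possible route, sketched below, is to build the data frame with a plain `<-` and convert it with as.matrix(). In the question, `<--` is parsed as assigning the negated data frame, which is probably why abs() was needed at all. The pairwise tests assume the intent was to compare rows 1 and 2, then rows 1 and 3. That is a guess, since the table has only two columns.

```r
# Plain `<-` keeps the counts positive, so no abs() step is needed
GAL4UAS <- data.frame(Yes = c(20, 21, 19), No = c(10, 9, 11))

# as.matrix() keeps the counts and the column names
m <- as.matrix(GAL4UAS)
res <- fisher.test(m)

# Pairwise comparisons: select rows of the matrix (assumed intent)
res12 <- fisher.test(m[c(1, 2), ])
res13 <- fisher.test(m[c(1, 3), ])
```

The p-value of each test is available as res$p.value, res12$p.value, and so on.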
New to R and having a problem with a very simple task! I have read a few columns of .csv data into R. The variables take values in the natural numbers plus zero, and some values are missing. After trying to use the non-parametric package, I have two problems. First, if I use the simple command bw=npregbw(ydat=y, xdat=x, na.omit), where x and y are column vectors, I get the error "number of regression data and response data do not match". Why do I get this, given that I have the same number of elements in each vector?
Second, I would like to call the data ordered and tell npregbw this, using the command bw=npregbw(ydat=y, xdat=ordered(x)). When I do that, I get the error that x must be atomic for sort.list. But how is x not atomic, it is just a vector with natural numbers and NA's?
Any clarifications would be greatly appreciated!
1) You probably have a different number of NA's in y and x.
2) Can't be sure about this, since there is no example. If it is of the following type:
x <- c(3,4,NA,2)
Then ordered(x) should work fine. Please provide an example of your case.
EDIT: You of course tried bw=npregbw(ydat=y, xdat=x)? ordered() makes your vector an ordered factor (see ?ordered), which is not an atomic vector (see 2.1.1 link and ?factor)
EDIT2: So the problem was the way the data were subset. Note the difference between the various ways of subsetting: data$x and data[,i] (where i is the column number of x) give you vectors, while data[c("x")] and data[i] give a data frame. Functions expect vectors unless they take a data = argument, in which case they work with column names.
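The subsetting difference, and the mismatched NA counts from point 1, can be checked on a small made-up data frame (dat and its values are assumptions, not the asker's data). Filtering both vectors with the same complete.cases() index is one way to make sure they stay the same length.

```r
# Hypothetical data with missing values in different positions
dat <- data.frame(x = c(3, 4, NA, 2), y = c(1, NA, 5, 6))

v1 <- dat$x      # numeric vector
v2 <- dat[, 1]   # numeric vector
d1 <- dat["x"]   # one-column data frame
d2 <- dat[1]     # one-column data frame

is.atomic(v1)      # TRUE
is.data.frame(d1)  # TRUE

# Keep only the rows where both x and y are present, so the lengths match
ok <- complete.cases(dat$x, dat$y)
x_clean <- dat$x[ok]
y_clean <- dat$y[ok]
```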