Thanks in advance for your response.
I am trying to create a stacked bar plot from a csv file, and I have run into the following hiccup:
First I put the csv into a variable:
test <- read.csv(file="test4.csv",sep=",",head=TRUE)
Then I try to create a bar plot using the following
barplot(test)
and I get the following error,
Error in barplot.default(test) : 'height' must be a vector or a matrix
so I try
barplot(t(test))
and it works, but as expected the axes are switched, so I try
barplot(t(t(test)))
and it works, but I feel there must be a better solution than transposing the transposed.
The issue is that read.csv outputs a data frame and barplot expects either a vector or a matrix. The barplot function works when you transpose because t() coerces data frames to matrices.
If you either start with
test <- as.matrix(read.csv(file="test4.csv",sep=",",head=TRUE))
or later on do
barplot(as.matrix(test))
then you should be fine.
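As a minimal sketch of the conversion described above, the following uses a small made-up data frame in place of test4.csv (its contents are not shown in the question) and assumes every column is numeric. If the real file has a text column, as.matrix() would produce a character matrix and barplot() would still fail.

```r
# Made-up stand-in for read.csv("test4.csv"); all columns are numeric
test <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(7, 8, 9))

is.matrix(test)   # FALSE: read.csv() and data.frame() return a data frame

# Convert once, explicitly, instead of relying on t(t(test))
m <- as.matrix(test)
is.matrix(m)      # TRUE

# One stacked bar per column, one segment per row
barplot(m)
```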
I would like to perform an HCPC on the columns of my dataset, after performing a CA. For some reason I also have to specify at the start that all of my columns are of type 'factor', only to loop over them afterwards and convert them to numeric. I don't know exactly why, because if I check the type of each column (without specifying them as factor) they appear to be numeric. When I don't load and convert the data like this, however, I get an error like the following:
Error in eigen(crossprod(t(X), t(X)), symmetric = TRUE) : infinite or
missing values in 'x'
Could this be due to the fact that there are columns in my dataset that only contain 0's? If so, how come that it works perfectly fine by reading everything in first as factor and then converting it to numeric before applying the CA, instead of just performing the CA directly?
The original issue with the HCPC, then, is the following:
# read in data; 40 x 267 data frame
data_for_ca <- read.csv("./data/data_clean_CA_complete.csv",row.names=1,colClasses = c(rep('factor',267)))
# loop over first 267 columns, converting them to numeric
for(i in 1:267)
data_for_ca[[i]] <- as.numeric(data_for_ca[[i]])
# perform CA
data.ca <- CA(data_for_ca,graph = F)
# perform HCPC for rows (i.e. individuals); up until here everything works just fine
data.hcpc <- HCPC(data.ca,graph = T)
# now I start having trouble
# perform HCPC for columns (i.e. variables); use their coordinates that are stored in the CA object that was created earlier
data.cols.hcpc <- HCPC(data.ca$col$coord,graph = T)
The code above shows me a dendrogram in the last case and even lets me cut it into clusters, but then I get the following error:
Error in catdes(data.clust, ncol(data.clust), proba = proba, row.w =
res.sauv$call$row.w.init) : object 'data.clust' not found
It's worth noting that when I perform MCA on my data and try to perform HCPC on my columns in that case, I get the exact same error. Would anyone have any clue as to how to fix this, or what exactly I am doing wrong? For completeness I insert a screenshot of the upper-left corner of my dataset to show what it looks like:
Thanks in advance for any possible help!
I know this is old, but because I've been troubleshooting this problem for a while today:
HCPC says that it accepts a data frame, but any time I try to simply pass it $col$coord or $colcoord from a standard ca object, it returns this error. My best guess is that there's some metadata it actually needs/is looking for that isn't in a data frame of coordinates, but I can't figure out what that is or how to pass it in.
The current version of FactoMineR will actually just allow you to give HCPC the whole CA object and tell it whether to cluster the rows or columns. So your last line of code should be:
data.cols.hcpc <- HCPC(data.ca, cluster.CA = "columns", graph = T)
I am trying to plot 9 barplots in a 3x3 grid in R, using base R inside a for loop. (I am working on a workhorse solution for visualizing every column before I begin manipulating the data.) Below is the code:
library(ISLR);
library(ggplot2);
# load wage data
data(Wage)
par(mfrow=c(3,3))
for(i in 1:(dim(Wage)[2]-2)){
plot(Wage[,i],main = paste0(names(Wage)[i]),las = 2)
}
Unfortunately, this does not work properly for the first 2 columns, because they are numeric and really need a histogram. I understand that I need an if-else condition somewhere inside the for() loop, but my attempts give me errors. Below is the output, where the first 2 columns are plotted wrongly. (year and age are numeric, and I may need to put them on the x-axis instead of defaulting them to y.)
Could anyone suggest an edit or hack? I also learnt that I can't use par() when wrapping ggplot inside a for loop, so I had to use base R; otherwise ggplot would have been great aesthetically.
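One possible way to place the if-else inside the loop is sketched below. It is only an illustration: plot_columns is a hypothetical helper name, and the small demo data frame is made up so that the example runs without the ISLR package. Numeric columns get a histogram, everything else gets a barplot of its counts.

```r
# Hypothetical helper: one panel per column, plot type chosen by column type
plot_columns <- function(df, grid = c(3, 3)) {
  old_par <- par(mfrow = grid)   # set the panel grid, remember the old setting
  on.exit(par(old_par))          # restore it when the function exits
  kinds <- character(ncol(df))
  for (i in seq_along(df)) {
    col <- df[[i]]
    if (is.numeric(col)) {
      # numeric column: values go on the x-axis of a histogram
      hist(col, main = names(df)[i], xlab = names(df)[i])
      kinds[i] <- "hist"
    } else {
      # factor or character column: bar heights are category counts
      barplot(table(col), main = names(df)[i], las = 2)
      kinds[i] <- "barplot"
    }
  }
  invisible(kinds)               # which plot type was used for each column
}

# Made-up data standing in for a few columns of Wage
demo <- data.frame(
  year = c(2003, 2004, 2005, 2006, 2007, 2008),
  age = c(23, 35, 47, 52, 29, 41),
  education = factor(c("HS", "College", "HS", "Grad", "College", "HS"))
)
kinds <- plot_columns(demo, grid = c(1, 3))
```

With the real data, the call would presumably be plot_columns(Wage[, 1:(ncol(Wage) - 2)]), mirroring the column range in the original loop.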
I am a second year M.Sc student and I am running into a bit of a snag running my statistics.
I am trying to run a contingency table and Fisher's test, and I keep getting an error:
Error in fisher.test(GAL4UAS) : if 'x' is not a matrix, 'y' must be given
If anyone can see what I have done wrong/may be missing I would really appreciate it?
This is the code:
setwd("/Users/Pria/Desktop/Data Analysis/")
GAL4UAS <-- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- lapply(GAL4UAS, abs)
fisher.test(GAL4UAS)
fisher.test(GAL4UAS[c(1,2)])
fisher.test(GAL4UAS[c(1,3)])
fisher.test() is expecting a matrix as input (a data frame is converted to one internally), but after the lapply() call GAL4UAS is a plain list, which is neither. Try putting your data into a matrix. One option among several would be:
m <- matrix(c(20,21,19,10,9,11),nrow = 3,ncol=2,byrow=FALSE)
fisher.test(m)
When you apply abs() using lapply(), the output is a list and not a data.frame. The apply() function, by contrast, returns its output as a matrix, which is what fisher.test() expects. So maybe you can try this:
GAL4UAS <- data.frame(Yes=c(20,21,19),No=c(10,9,11))
GAL4UAS <- apply(GAL4UAS, abs, MARGIN=c(1,2))
fisher.test(GAL4UAS)
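To make the difference between the two answers concrete, here is a small sketch using the counts from the question. The variable names (counts, as_list, as_mat, res_*) are illustrative only. It also assumes that the pairwise calls in the question were meant to compare rows; on a list or data frame, GAL4UAS[c(1,2)] selects columns, so matrix row indexing is used instead.

```r
counts <- data.frame(Yes = c(20, 21, 19), No = c(10, 9, 11))

as_list <- lapply(counts, abs)   # plain list of two vectors
as_mat  <- as.matrix(counts)     # 3 x 2 numeric matrix

is.matrix(as_list)   # FALSE: this is what triggers the error
is.matrix(as_mat)    # TRUE

# Full 3 x 2 table
res_all <- fisher.test(as_mat)

# Pairwise comparisons of rows (assumed intent of the original calls)
res_12 <- fisher.test(as_mat[c(1, 2), ])
res_13 <- fisher.test(as_mat[c(1, 3), ])
```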
I am trying to create a variable that contains "buckets" of a numeric value in another column. For example:
nts$size_bucket<-cut(nts$loansize, c(0, 5000,10000, 25000,50000,100000,200000,300000, 500000,Inf),
c('<$5K', '5-10K', '10-25K', '25-50K', '50-100K', '100-200K', '200-300K', '300-500K', '500K+'))
In normal R, cut would work perfectly, but it doesn't appear to work with a SparkR dataframe and gives the exception:
'x' must be numeric
even though x is numeric.
Any suggestions for how to accomplish this in SparkR?
Thanks!
New to R and having a problem with a very simple task! I have read a few columns of .csv data into R. The variables take values in the natural numbers plus zero, and have missing values. After trying to use the non-parametric package, I have two problems: first, if I use the simple command bw=npregbw(ydat=y, xdat=x, na.omit), where x and y are column vectors, I get the error that "number of regression data and response data do not match". Why do I get this, as I have the same number of elements in each vector?
Second, I would like to call the data ordered and tell npregbw this, using the command bw=npregbw(ydat=y, xdat=ordered(x)). When I do that, I get the error that x must be atomic for sort.list. But how is x not atomic, it is just a vector with natural numbers and NA's?
Any clarifications would be greatly appreciated!
1) You probably have a different number of NA's in y and x.
2) Can't be sure about this, since there is no example. If it is of the following type:
x <- c(3,4,NA,2)
Then ordered(x) should work fine. Please provide an example of your case.
EDIT: You of course tried bw=npregbw(ydat=y, xdat=x)? ordered() makes your vector an ordered factor (see ?ordered), which is not an atomic vector (see 2.1.1 link and ?factor)
EDIT2: So the problem was the way of subsetting the data. Note the difference between the various ways of subsetting: data$x and data[,i] (where i is the column number of column x) give you vectors, while data[c("x")] and data[i] give a data frame. Functions expect vectors, unless they call for data = (your data); in that case they work with column names.
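The subsetting difference, and the point about unequal numbers of NA's, can be checked on a small made-up data frame. The name dat and its values are illustrative only. The last step shows one way, among others, to drop incomplete pairs so that x and y keep the same length.

```r
dat <- data.frame(x = c(3, 4, NA, 2), y = c(1, NA, 5, 6))

is.numeric(dat$x)          # TRUE: a plain vector
is.numeric(dat[, 1])       # TRUE: a plain vector
is.data.frame(dat["x"])    # TRUE: still a data frame
is.data.frame(dat[1])      # TRUE: still a data frame

# Keep only the rows where both x and y are observed, so the pairs stay aligned
ok <- complete.cases(dat$x, dat$y)
x <- dat$x[ok]
y <- dat$y[ok]
length(x) == length(y)     # TRUE
```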