I'm trying to create a stacked bar chart using R. I know a bit of R, but I mainly use SPSS. Bar charts are really ugly in SPSS, so I have been trying to use ggplot2 to make something more elegant.
Following other posts, I have tried to make my variables work. I converted the data to long form. Because this is original research I can't give too many specifics on the case. The first column is categorical data. The second is stored as numeric because I imported it from SPSS, but it is actually categorical as well.
In long form there are 110 observations and 2 variables. My code is:
Barchart <- ggplot(psydatacomp, aes(x=PsyType, y=Agreement, fill=row)) +
geom_bar(stat = "identity")
psydatacomp is the matrix I created in order to remove the NaNs.
The error message I receive is:
Don't know how to automatically pick scale for object of type function. Defaulting to continuous.
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows: 110, 0
I have a basic background in R, but it's not strong enough for me to interpret what this error message is saying. Any help would be great.
It seems that one of your variables is being interpreted as a function. For example, "row" is a function (just run ?row in R). You should change the column name here from "row" to "Row".
Here is a similar case: ggplot Error: Don't know how to automatically pick scale for object of type function
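To illustrate the point about "row", here is a minimal sketch. The data frame psy is a made-up stand-in for psydatacomp, and I am assuming the bars should show counts per PsyType split by Agreement, which the question does not state. The eval() line only mimics how a name that is not a column falls through to the base function.

```r
# Hypothetical stand-in for psydatacomp: two categorical columns in long form.
psy <- data.frame(
  PsyType   = c("A", "A", "B", "B", "B"),
  Agreement = c(1, 2, 1, 2, 2)
)

# There is no column called "row", so the name resolves to the base R function.
has_row_column  <- "row" %in% names(psy)
resolved        <- eval(quote(row), psy, baseenv())
row_is_function <- is.function(resolved)

# Treat the SPSS-imported numeric codes as categories.
psy$Agreement <- factor(psy$Agreement)

# Count rows per combination; these counts become the stacked bar heights.
counts <- as.data.frame(table(PsyType = psy$PsyType, Agreement = psy$Agreement))

# Plot only if ggplot2 is installed; fill uses a real column this time.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(counts, aes(x = PsyType, y = Freq, fill = Agreement)) +
    geom_bar(stat = "identity")
}
```

Printing p draws the chart. Any aesthetic that names something other than a column of the data can trigger the same "object of type function" message.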
An alternative to R would be to run the analysis in SPSS and use Excel to visualize your results. It's much easier to run a simple SPSS analysis and drop the output into Excel than to import to R. A stacked bar chart takes no time to produce in Excel. I only mention this because it sounds like you are new to R but are more familiar with SPSS.
Related
I'm looking to plot the effects for a TOBIT regression model at -1SD and +1SD using a bar graph. I would normally use a line graph, but my co-authors have asked for a bar graph instead. I have had some help identifying what I should do here, but I am getting an error that no one seems to be able to figure out.
Creating dataframe - this runs fine.
tobitforgraph <- survreg(Surv(S1, S2, type='right') ~ +(Var1centered)*Var2centered*Var3standardized, data=Datafilename, dist='gaussian', robust=TRUE)
Load effects library.
library(effects)
Extract values for plotting (the values coming from here are then supposed to be used to plot as normal in Excel, although there may be a way to do it in R; regardless, this is where I am getting an error).
print((intplot<-as.data.frame(ef <- Effect(c("Var1centered","Var3standardized"),mod=tobitforgraph,xlevels=list(Var1centered=c(-.4987531,.5012469),Var3standardized=c(-1.0,1.0), data=subset(Datafilename, Var4==-0.5025))))))
Error that comes from this value extraction:
Error in Effect.default(c("Var1centered", "Var3standardized"), mod = tobitforgraph, :
argument "offset" is missing, with no default
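This does not explain the "offset" error, but one possible workaround is to skip the effects package and get the values for the bar graph from predict() on a small grid. The sketch below is only that: the data are simulated, the names x1, x3, y and observed are made up, and the model is simplified to a two-way interaction.

```r
library(survival)  # provides Surv() and survreg()

set.seed(1)
n <- 200
# Simulated stand-ins for a centered and a standardized predictor.
dat <- data.frame(x1 = rnorm(n), x3 = as.numeric(scale(rnorm(n))))
latent <- 1 + 0.5 * dat$x1 + 0.3 * dat$x3 + 0.2 * dat$x1 * dat$x3 + rnorm(n)

# Right-censor the outcome at 2, as in a Tobit setup.
dat$y        <- pmin(latent, 2)
dat$observed <- as.numeric(latent < 2)   # 1 = observed, 0 = censored

fit <- survreg(Surv(y, observed, type = "right") ~ x1 * x3,
               data = dat, dist = "gaussian")

# Every combination of -1 SD and +1 SD on both predictors.
grid <- expand.grid(x1 = c(-1, 1) * sd(dat$x1), x3 = c(-1, 1))

# Predicted values on the latent scale, one per bar.
grid$predicted <- predict(fit, newdata = grid, type = "response")
```

The four rows of grid can then go into Excel or into barplot(). With a third predictor in the model, it would need its own column in grid, fixed at whatever value is of interest.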
I am working with weighted survey data of the class survey.design2 and survey.design. With the package survey, and the function call svytable, I can create contingency tables for survey data. With these contingency tables, I can then create normal bar-charts using lattice. The standard way for doing this (e.g. barchart(cars ~ mpg | factor(cyl), data=mtcars,...)) doesn't work for this data type.
I am used to working with ggplot2, and would like to create either stacked or grouped bar charts, if possible with facet wraps as well. Unfortunately, ggplot2 does not know how to deal with data of the type survey.design2 either. As far as I know, there is no add-on that would allow ggplot2 to deal with this kind of data.
So far I have:
subsetted my data set,
converted it into class survey.design2 with the function call svydesign(),
plotted multiple bar charts in one window using grid.arrange(). This sort of provides a workaround for faceting, but still doesn't allow me to create stacked or grouped bar charts.
I'd be grateful for any suggestions.
Thank you
Good morning MatthewR
I have a data set with 62732 observations and 691 variables.
Original Data Set
So any example based on a random number generator should work as well, I guess. I am really just interested in a work around to this issue, not necessarily the final code.
I then convert the data frame into survey.design format using:
df_Survey <- svydesign(id=~1, weights=~IXPXHJ, data=df)
IXPXHJ is the variable by which the original sample data set will be weighted so as to get the entire population. head(df$IXPXHJ) looks something like this:
87.70876
78.51809
91.95209
94.38899
105.32005
56.30210
str(df_Survey) looks something like this.
Survey Data Structure
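One workaround that may help: ggplot2 does not need the design object, only the weighted table. As far as I know, svytable() returns an ordinary table object that as.data.frame() can flatten, so the same steps should apply to it. The sketch below uses base xtabs() with the weights on the left-hand side as a stand-in, so it runs without the survey package. The columns region and answer, and the random data, are made up.

```r
set.seed(42)
n <- 500
# Made-up survey data; IXPXHJ mimics the weights shown above.
df <- data.frame(
  region = sample(c("North", "South"), n, replace = TRUE),
  answer = sample(c("Yes", "No", "Unsure"), n, replace = TRUE),
  IXPXHJ = runif(n, 50, 110)
)

# Weighted contingency table: sums the weights within each cell.
weighted_tab <- xtabs(IXPXHJ ~ region + answer, data = df)

# Flatten to long form with columns region, answer, Freq.
plot_data <- as.data.frame(weighted_tab)

# Stacked bars if ggplot2 is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(plot_data, aes(x = region, y = Freq, fill = answer)) +
    geom_bar(stat = "identity", position = "stack")
}
```

Switching to position = "dodge" gives grouped bars, and a third variable in the table formula could be passed to facet_wrap().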
I had to teach my friend how to run a t-test in R (using the t.test function) and I just wished that the function was more interactive. Newcomers could run the function easily if it guided them through the test. I was unable to find such a function online, so I decided to make one myself. Trying to make an interactive function is a huge challenge for me, but it is a fun breather in my graduate school life.
I want my function to be able to run like myttest(x, y, paired = T) so that Rmarkdown could produce the output normally. I also want the function to run interactively by typing myttest(). Thus I decided to base my function on t.test.default and add readline in the source code where needed.
I used getAnywhere(t.test.default) to display the source code, and I put the following code right after the first { so that R could ask for a vector like data$GPA.
if (missing(x)) {x <- readline("What is the name of the data set?")}
However, I got the following error message when I ran myttest() and typed data$GPA in the interactive dialogue.
Error in myttest() : not enough 'x' observations
In addition: Warning message:
In myttest() : NAs introduced by coercion
The data set data actually exists in the Global Environment and it has the column GPA, so I think the problem is in my code. Why isn't R reading the observations in the GPA column?
(Also, one of my final goals is to make R ask for only a data set. R would read the columns of the data set, display them in the interactive dialogue, and ask "Which variable do you want to use as the DV?". Then I could type in GPA, for example. I also think it would be good if R asked for the type of t-test at the beginning (e.g. one-sample, two-sample, or paired-sample). Do you think this level of interaction is possible?)
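For what it's worth, readline() always returns a character string, so x ends up as the text "data$GPA" rather than the numbers in that column. Coercing that text to numeric gives NA, which matches both the warning and the "not enough 'x' observations" error. A minimal sketch of one way around it follows. The helper resolve_input is a made-up name, and the wrapper calls t.test() instead of editing the t.test.default source.

```r
# Hypothetical helper: turn typed text such as "data$GPA" into the object it names.
# Note that eval(parse()) runs whatever is typed, so it only suits trusted input.
resolve_input <- function(txt, envir = globalenv()) {
  eval(parse(text = txt), envir = envir)
}

# Simplified wrapper: ask for x only when it was not supplied.
myttest <- function(x, ...) {
  if (missing(x)) {
    typed <- readline("Which vector should be tested (e.g. data$GPA)? ")
    x <- resolve_input(typed)
  }
  t.test(x, ...)
}

# Made-up data for the demonstration.
data <- data.frame(GPA = c(3.1, 3.4, 2.9, 3.8, 3.6))

typed_text <- "data$GPA"                            # what readline() would hand back
coerced    <- suppressWarnings(as.numeric(typed_text))  # NA: text, not numbers
resolved   <- resolve_input(typed_text, envir = environment())

# Non-interactive use still works as usual.
result <- myttest(data$GPA, mu = 3)
```

Listing the columns for the user could be done the same way, for example by printing names(data) before a second readline() call.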
My question is with regards to creating a weighted box plot using ENmisc library. I have a dataframe and I want to plot the boxplot based on two different categories (both type chr).
The error given is ## Error: missing value where TRUE/FALSE needed from the line wtd.boxplot(df2J$mean_P32 ~ df2J$mode_Litho,weights=df2J$length). I've attached a log of the portion of code in question below which shows the values of each data type as well as that there is not any data missing. The last line produces a boxplot similar to the one I would expect from the line above.
Unfortunately I don't know how to recreate this error with a general example so I haven't provided code that can be run.
If anyone could shed some light on this error it would be much appreciated.
Other Info:
The plots work if I use the base package boxplot function.
There are other ways I could create weighted boxplots if needed such as this but I really don't see any reason this shouldn't work.
wtd.boxplot function
ENmisc library
I'm not sure why this doesn't show up in the knitr output, but the error that shows up in the R console is Error in if (any(out[nna])) stats[c(1, 5)] <- range(x[!out], na.rm = TRUE) :
missing value where TRUE/FALSE needed
I have the same problem and it happens because (I think) you have only 1 member for one of your groups. Check it.
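A quick way to check that suspicion is to count the rows per group. This is a sketch with made-up values; only the column names are taken from the question, and whether a single-member group is really the cause there is not confirmed.

```r
# Made-up data using the column names from the question.
df2J <- data.frame(
  mode_Litho = c("A", "A", "A", "B", "B", "C"),
  mean_P32   = c(1.2, 0.8, 1.5, 2.1, 1.9, 0.4),
  length     = c(10, 12, 8, 15, 9, 11)
)

# Number of observations in each group.
group_sizes <- table(df2J$mode_Litho)

# Groups with fewer than two members.
singletons <- names(group_sizes)[group_sizes < 2]

# Drop those groups before plotting.
filtered <- df2J[!df2J$mode_Litho %in% singletons, ]
```

If singletons is empty for the real data, the cause is probably something else.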
I run a bunch of simulations to evaluate type I error, so the result is a vector such as
pdata = c(0,0,0,0,0,0,0,0,0,0.07,0,0.02,0.03)
The mean of the simulated vector should be 0.05. Now I am thinking of a way to display the results via boxplots. The default function in R
boxplot(pdata)
gives a boxplot in which it is rather hard to see the typical value, as there are many 0's. In addition, it shows the median, but what I really want is the mean to be displayed on the plot. Is there any graphical display that is effective in such a situation? I know that I can simply report the numerical values, but because my simulation involves other factors which I hope to compare, a boxplot-like graph would be ideal. Thanks!
Something like this maybe :
plot(table(pdata))
Here is a ggplot2 version:
ggplot(as.data.frame(table(pdata)),aes(x=pdata,y=Freq))+geom_bar(stat="identity")
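If the mean should appear on the plot as well, one option is to draw it on top of a base boxplot. A minimal sketch using the vector from the question follows; the marker style is an arbitrary choice.

```r
pdata <- c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0.07, 0, 0.02, 0.03)

# Frequency of each distinct value, as used in the plots above.
freq <- table(pdata)

# The quantity the question wants to see on the plot.
pdata_mean <- mean(pdata)

# A single boxplot is drawn at x = 1, so the mean marker goes there too.
boxplot(pdata, ylab = "Type I error rate")
points(1, pdata_mean, pch = 18, col = "red", cex = 1.5)
```

With several simulation settings, boxplot(value ~ setting) followed by points() at the group means would extend the same idea.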