Sorry for the basic question; I am new to R.
I would like to draw boxplots split by subcategory, and then show measurements taken over time side by side.
For example I have tried this:
boxplot(field_data$week_1~field_data$field, ylab= 'number of infected plants')
This gives me two boxplots (field is either ‘north’ or ‘south’). I want to split each boxplot in two by the "position" variable (1 or 2). Is there a way to keep a plot with two main categories defined by "field", where each category consists of two boxplots defined by "position"? I would then also like to plot the results from the ‘week_2’ readings next to the 'week_1' set of boxplots. All of the data is in one data frame. I have other variables ('beds' and 'rows'), each with several levels, that also categorise the measurements.
I have tried ggplot, but I am not sure how to do this or whether it is the right tool.
Thank you.
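One possible approach, sketched with made-up data: reshape the week columns into long format, then let ggplot2 dodge the boxes by "position" within each "field" and facet by week. The data below is invented for illustration; only the column names (field, position, week_1, week_2) come from the question, and the use of tidyr's pivot_longer is an assumption about which packages are available.

```r
library(ggplot2)
library(tidyr)

# Invented stand-in for field_data; only the column names follow the question
set.seed(1)
field_data <- data.frame(
  field    = rep(c("north", "south"), each = 20),
  position = factor(rep(c(1, 2), times = 20)),
  week_1   = rpois(40, 5),
  week_2   = rpois(40, 8)
)

# Long format: one row per measurement, with the week stored in its own column
long <- pivot_longer(field_data, cols = c(week_1, week_2),
                     names_to = "week", values_to = "infected")

# x = field gives the two main groups, fill = position splits each group
# into two boxes, and facet_wrap puts week_1 and week_2 side by side
p <- ggplot(long, aes(x = field, y = infected, fill = position)) +
  geom_boxplot() +
  facet_wrap(~ week) +
  ylab("number of infected plants")
print(p)
```

Further grouping variables such as 'beds' or 'rows' could probably be handled the same way, for example with facet_grid, though that depends on how many levels they have.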
Related
I would like to split the legend that is generated by ggplot based on the variables that are taken from the data frame. I'm starting from melted data.
Using the Iris data set, I'd like to get this:
What I'm looking for is to obtain a legend box that includes only the Sepal variables and another legend box with Petal variables. Originally I'm dealing with similar data from different sources and would like to make clear to which source belong certain variables.
I am trying to make a line graph that plots data over four time points for different conditions. Right now, I have the conditions as one variable, but the values for each time point are each in their own column.
I can't figure out how to best graph it such that the y-axis shows each condition, and the x-axis shows the "score" over each time point.
How do I graph variables that represent different time points?
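A minimal sketch of one common way to do this, assuming the data looks roughly like the invented table below (the names condition, t1 to t4, time and score are hypothetical): reshape the time-point columns into a single column, then draw one line per condition.

```r
library(ggplot2)
library(tidyr)

# Invented example: one row per condition, one column per time point
scores <- data.frame(
  condition = c("A", "B", "C"),
  t1 = c(10, 12, 9),
  t2 = c(11, 15, 9),
  t3 = c(13, 15, 10),
  t4 = c(14, 18, 12)
)

# Gather t1..t4 into a 'time' column and a 'score' column
long_scores <- pivot_longer(scores, cols = t1:t4,
                            names_to = "time", values_to = "score")

# group = condition tells geom_line to connect points within each condition
p <- ggplot(long_scores,
            aes(x = time, y = score, colour = condition, group = condition)) +
  geom_line() +
  geom_point()
print(p)
```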
I am trying to create a bar chart using ggplot that adds up difference scores and groups them with positive or negative values and then creates a graph of the percentage. I can't seem to figure out the right code to do this however and could use some guidance.
I have two columns I am focusing on: one for the grade level and then another column with the difference score. I tried summing up the values of positive and negative for an aggregate total, but kept running into errors manipulating that data.
I ended up making a new column and merged it to the data frame if the values in a row were less than or greater than 0. I was able to graph this, but I struggle to create a 100% stacked bar chart.
Ideally what I hope to do is to create a stacked bar chart with grades 6th - 10th in the X-axis and the y-axis being the percentage of students in that grade with a positive difference score against the % with a negative score.
# Attempting to create a new column of boolean values to create the chart
Pos_Neg_df <- c(Fall_Math_Data$RITDifference >0)
Percentage_Math_Data <- cbind(Fall_Math_Data, Pos_Neg_df)
# Plotted this
ggplot(Percentage_Math_Data) + geom_bar(aes(x = Grade, fill = Pos_Neg_df))
Can you provide some sample data? It's difficult to see what exactly you're trying to do. That said, since you want a 100% stacked chart, adding position = "fill" to your geom_bar may be what you're looking for ("stack" is already the default); see the ggplot2 documentation.
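In the absence of sample data, here is a sketch with invented numbers showing how a 100% stacked bar chart could be built. Grade and RITDifference follow the column names in the question; the Direction column and its labels are made up for illustration, and position = "fill" is what rescales each bar to proportions.

```r
library(ggplot2)

# Invented data standing in for Fall_Math_Data
set.seed(42)
Fall_Math_Data <- data.frame(
  Grade         = factor(sample(6:10, 200, replace = TRUE)),
  RITDifference = round(rnorm(200, mean = 1, sd = 5))
)

# Label each student by the sign of the difference score
# (zero is lumped with negative here; adjust if it needs its own group)
Fall_Math_Data$Direction <- ifelse(Fall_Math_Data$RITDifference > 0,
                                   "positive", "zero or negative")

# position = "fill" scales every bar to 1, so the y-axis is a share per grade
p <- ggplot(Fall_Math_Data, aes(x = Grade, fill = Direction)) +
  geom_bar(position = "fill") +
  scale_y_continuous(labels = function(x) paste0(100 * x, "%")) +
  ylab("share of students")
print(p)
```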
I am using R to analyse some ecological data, and when I try to create a boxplot two different boxes appear for one of my variables.
Here is my code:
plot(Ratio.gap.per ~ Circadian, data=circ)
which should produce a boxplot with a box for each of my x axis factors, but I always get two different bars for my category N (for 'nocturnal'). Boxplot shown here:
Does anyone know how to correct this?
It looks like some of your data is coded as "N " (with a space) rather than "N".
Try
circ$Circadian[circ$Circadian=="N "] <- "N"
and then if needed (i.e. if circ$Circadian is a factor),
circ$Circadian <- as.factor(as.character(circ$Circadian))
to get rid of the extra factor level.
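A more general variant, sketched on invented data, is to strip stray whitespace from every value with trimws and rebuild the factor in one step. The values below are made up; only the column names come from the question.

```r
# Invented example with one value coded as "N " (trailing space)
circ <- data.frame(
  Circadian     = factor(c("D", "N", "N ", "C", "N")),
  Ratio.gap.per = c(0.2, 0.5, 0.6, 0.3, 0.4)
)

# Before cleaning, "N" and "N " are separate levels
n_levels_before <- nlevels(circ$Circadian)

# Trim whitespace, then rebuild the factor so the stale "N " level disappears
circ$Circadian <- factor(trimws(as.character(circ$Circadian)))
n_levels_after <- nlevels(circ$Circadian)

# One box per category once the duplicate level is gone
plot(Ratio.gap.per ~ Circadian, data = circ)
```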
I've been trying to create a proportional stacked bar graph using ggplot and a huge data set that has one column holding a dummy variable and one column holding a factor variable with 14 different levels.
I posted a small sample of the data here.
Despite not having a clear y-variable in my data, I can produce a plot, but it is only really useful for the factor levels that have a lot of observations; when a level has only one or two, you can't see the proportion at all. The code I used is here.
ggplot(data,aes(factor(data$factor),fill=data$dummy))+
geom_bar()
ggplot says you need to apply a ddply function to the data frame.
ce<-ddply(data,"factor",transform, percent_y=y/sum(y)*100)
Their example doesn't really apply in the case of this data since there's no clear y-variable to call in the plot; just counts of each factor that is 1 or 0.
My best guess for a ddply function spits out an error about differing numbers of rows.
ce<-ddply(plot,"factor(data$factor)",transform,
percent=sum(data$dummy)*100/(dim(data$dummy)[1]))
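One way this might be done without ddply, sketched on invented data: geom_bar counts the rows itself, and position = "fill" rescales each bar to proportions, so levels with only a couple of observations are as tall as the rest. The column is called group here (a hypothetical name) to avoid clashing with R's factor() function; dummy follows the question.

```r
library(ggplot2)

# Invented data: one common level and two rare ones
dat <- data.frame(
  group = factor(rep(c("a", "b", "c"), times = c(50, 5, 2))),
  dummy = c(rep(0:1, 25), c(1, 1, 0, 0, 1), c(0, 1))
)

# No y variable needed: geom_bar counts rows, and position = "fill"
# converts the counts within each bar to proportions
p <- ggplot(dat, aes(x = group, fill = factor(dummy))) +
  geom_bar(position = "fill") +
  labs(fill = "dummy", y = "proportion")
print(p)

# The same proportions as a table, one row per level of 'group'
props <- prop.table(table(dat$group, dat$dummy), margin = 1)
```

If the percentages themselves are needed as numbers, the props table holds them, with each row summing to 1.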