Box-plot by month in R, remove RA

I'm very new to R and I am making a boxplot using data collected annually. The question is about depression, and the answers fall into these categories: all, most, some, a little, or none of the time. I want to see whether there is any correlation between frequency of depression and month of the year.
How do I remove RA?

If you mean how to avoid plotting the NA values, try this:
library(dplyr)
library(ggplot2)
ggplot(data %>% filter(!is.na(menthlth)), aes(...)) +
  geom_boxplot()
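As a rough, self-contained sketch of the same idea: the data frame `survey`, its values, and the choice of axes below are assumptions made for illustration; only the column name `menthlth` comes from the answer above. Rows with a missing response are dropped before plotting, so no NA category appears on the axis.

```r
library(ggplot2)

# Hypothetical survey data; `month` and the values are illustrative only.
survey <- data.frame(
  month    = c(1, 3, 5, 7, 9, 11, 2, 4),
  menthlth = factor(c("All", "Most", NA, "Some", "None", NA, "A little", "Most"))
)

# Keep only rows where the response is not missing.
survey_complete <- survey[!is.na(survey$menthlth), ]

# Assumed axis mapping: response category on x, month on y.
p <- ggplot(survey_complete, aes(x = menthlth, y = month)) +
  geom_boxplot()
```

Printing `p` draws the plot. The base R subsetting used here should be equivalent to the `filter(!is.na(menthlth))` call for this purpose.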

Related

How to plot a gg barplot for a single factor column?

My data frame has 621 rows and each column describes something about it. I'm trying to do an exploratory data analysis where I plot out all the data into a bar plot.
I have a factor column called phenotype, which has 86 levels that describe the main condition in my cohort. I want to plot this out as 86 separate bar plots, each with the total number of people who have that condition, using ggplot.
I've attached a screenshot of my data below. I basically want the x axis to have the condition name, like 'Bardet-Biedl Syndrome', 'Classic Ehlers Danlos Syndrome' etc., and the y axis to have the number of people who have that condition, such as 3, 4, 5 as displayed below. I got the below data by doing
table(data.frame$Phenotype)
I'm using the below code to generate my ggplot
ggplot(tiering, aes(x = Phenotype, y = count(tiering$Phenotype))) +
  theme_bw() +
  geom_bar(stat = "identity")
I'm sure the answer is out there, but I've looked on the R help websites and I can't seem to figure this out, so would be very grateful for the help.
EDIT: I got to a barplot with the help of the below code. I am now trying to reorder the bars/columns in decreasing order and tried this method, but it hasn't worked. Would anyone have any suggestions?
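One possible way to get a count per level with the bars in decreasing order is sketched below. The small `tiering` data frame and its condition names are made-up stand-ins for the real data. The idea is to let `geom_bar()` do the counting and to set the factor levels in order of frequency first.

```r
library(ggplot2)

# Hypothetical stand-in for the `tiering` data frame from the question.
tiering <- data.frame(
  Phenotype = c("Condition A", "Condition B", "Condition B",
                "Condition C", "Condition C", "Condition C")
)

# Count each level, then order the factor levels from most to least frequent.
counts <- sort(table(tiering$Phenotype), decreasing = TRUE)
tiering$Phenotype <- factor(tiering$Phenotype, levels = names(counts))

# geom_bar() counts the rows per level itself, so no y aesthetic is needed.
p <- ggplot(tiering, aes(x = Phenotype)) +
  geom_bar() +
  theme_bw()
```

With 86 levels the axis labels will probably overlap; adding `coord_flip()` to the plot may make them easier to read.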

plotting multiple lines in ggplot R

I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. What I thus ideally want is the frequencies (5,10,20,30,40) in the x-axis and the amount of synapses/cells plotted on the y-axis (usually a numerical value from 10 - 20). The graph then will contain 5 lines of the different ages (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I try this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except that my first data point (5 kHz) is now plotted after my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph with the 5 kHz data point oddly located, and a photo of my data in Excel.
[Images: example data; example code and graph]
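A possible explanation, offered only as a guess since the data is not shown as text: if Frequency was read in as text or as a factor, its values sort as "10", "20", "30", "40", "5", which would put 5 kHz last and would also trigger the "only one observation" message. The sketch below assumes that cause. All values are invented, and `puncta17` is a hypothetical column name for a second age group.

```r
library(ggplot2)

# Hypothetical data; the values are illustrative only. Frequency is stored
# as text here, which is one possible cause of the 5 kHz point sorting last.
mydata <- data.frame(
  Frequency = c("5", "10", "20", "30", "40"),
  puncta6   = c(12, 15, 18, 17, 14),
  puncta17  = c(11, 14, 17, 16, 13)
)

# Convert the text labels to numbers so the x-axis is ordered numerically.
mydata$Frequency <- as.numeric(mydata$Frequency)

# Reshape to long format: one row per frequency and age group.
long <- data.frame(
  Frequency = rep(mydata$Frequency, times = 2),
  Age       = rep(c("6 weeks", "17 weeks"), each = nrow(mydata)),
  Puncta    = c(mydata$puncta6, mydata$puncta17)
)

# One line per age group; colour defines the groups, so group = 1 is not needed.
p <- ggplot(long, aes(x = Frequency, y = Puncta, colour = Age)) +
  geom_line() +
  geom_point()
```

The same long-format layout extends to all five ages by adding the remaining columns to `long`.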

R ggplot2 Visualize categorical variable that levels appear more than once

I am trying to visualize some tennis data with ggplot2 in R.
Here are my data:
Year<-c(1999:2020)
Player <- rep("Federer",22)
Rank <-
c("Q1","3R","3R","4R","4R","W","SF","W","W","SF","F","W","SF","SF","SF","SF","3R",
"SF","W","W","4R","SF")
data <- data.frame(Year, Player, Rank)
data$Rank <- factor(data$Rank, levels = unique(data$Rank))
What I want is a diagram that looks like a bar plot but is not actually one. I would like Year (1999 to 2020) on the x-axis, with each year mapped to its Rank level.
My problem is that Rank, which I converted to a categorical variable, has some levels that appear more than once over time, and this makes things difficult for me.
I am looking to do something like the following pic from Wikipedia, with a specific color for every level of the Rank variable.
The Australian Open result is what I want to visualize.
Maybe something like this, using geom_tile() to make something like a heatmap instead of a barplot:
library(ggplot2)
library(ggthemes)
ggplot(data, aes(x = factor(Year), y = Player, fill = Rank)) +
  geom_tile() + scale_fill_economist()
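A variation on the tile idea that needs only ggplot2 is sketched below. The level ordering (earliest exit to title) and the "YlGn" palette are assumptions chosen for illustration, not part of the answer above. Each tile is also labelled with its result.

```r
library(ggplot2)

Year <- 1999:2020
Rank <- c("Q1", "3R", "3R", "4R", "4R", "W", "SF", "W", "W", "SF", "F",
          "W", "SF", "SF", "SF", "SF", "3R", "SF", "W", "W", "4R", "SF")
data <- data.frame(Year, Player = "Federer", Rank)

# Assumed ordering from earliest exit to title; adjust as needed.
data$Rank <- factor(data$Rank, levels = c("Q1", "3R", "4R", "SF", "F", "W"))

# One tile per year, filled by result and labelled with the result code.
p <- ggplot(data, aes(x = factor(Year), y = Player, fill = Rank)) +
  geom_tile(colour = "white") +
  geom_text(aes(label = Rank), size = 3) +
  scale_fill_brewer(palette = "YlGn")
```

Because the fill is mapped to the factor, a level that occurs in several years simply reuses the same color each time.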

Density plot for multiple group shows one line, however legend shows 3

I am analyzing US election search volume data from Google Trends. I ran the command below in RStudio.
The poliData dataframe contains the SearchVolume for all months for three Politicians.
ggplot(data = poliData, aes(x=Date, group=Politician, colour=Politician)) +
geom_density()
But I only get the density line (blue) for one politician with the above command. See the attached picture. Can you please help?
I guess you got three lines on top of each other because the Date values are the same for all three politicians. My understanding of your analysis could be something like this:
ggplot(data = poliData,
aes(x=Date, colour=Politician,
weight = SearchVolume/sum(SearchVolume))) +
geom_density()
Adding weight should produce distinct lines for different politicians. If this is not what you wanted, please dput your data for others to work out a solution for you. Also, as I do not have the data, I have not tested the above code yet. Please let me know if it does not work.
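A variant of the weighting idea, sketched with made-up data: the `poliData` values, dates, and politician labels below are hypothetical, and the column `w` is introduced only for this example. Here the weights are scaled to sum to 1 within each politician rather than across the whole data frame, which may or may not be what the analysis needs.

```r
library(ggplot2)

# Hypothetical stand-in for poliData; all values are illustrative only.
poliData <- data.frame(
  Date         = rep(seq(as.Date("2020-01-01"), by = "month", length.out = 6),
                     times = 3),
  Politician   = rep(c("A", "B", "C"), each = 6),
  SearchVolume = c(10, 20, 40, 80, 40, 20,
                   80, 40, 20, 10, 20, 40,
                   30, 30, 30, 30, 30, 30)
)

# Scale search volume so the weights sum to 1 within each politician.
poliData$w <- poliData$SearchVolume /
  ave(poliData$SearchVolume, poliData$Politician, FUN = sum)

# Weighted density of dates, one curve per politician.
p <- ggplot(poliData, aes(x = Date, colour = Politician, weight = w)) +
  geom_density()
```

If the goal is simply to compare search volume over time, `geom_line()` with `y = SearchVolume` might be a more direct choice than a density.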

I am using ggplot2 to make a bar chart and can't get the years correct along the x-axis

I am using ggplot2 to make a bar chart of the number of participants per year by gender. If I have 14 years included, I would like 2 bars for each year corresponding to the number of males and females for that year. I am not getting each year along the x-axis. I think data is being binned. I have tried changing the bin width, using scale_x_date and am still stuck.
Can you help me figure out how to have the data for EACH year in my graph?
As an example, here is my data for years 2004-2017:
year=c(2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017)
gender=c("male" , "female")
Participants is by gender, male then female respectively per year:
Participants=c(1307,443,1847,630,2109,765, 1824,691,2250,952,3123,1421,4097,1904,6415,3284,8788,4678,11581,6694,13141,8478,16389,10575,20990,13811,26951,19729)
data=data.frame(year,gender,Participants)
Here is how I am trying to generate my plot:
MyPlot <- ggplot(data, aes(fill=gender, y=Participants, x=year)) +
geom_bar(position="dodge", stat="identity",width = .8)
print(MyPlot + ggtitle("Annual Number of Participants by Gender"))
On the x-axis, the years 2006, 2010, 2014 and 2018 are marked and the bars correspond to data from two years. I want data for each year, both in terms of the bars and in terms of the ticks on the x-axis.
Any help would be appreciated!
You have more participants than years, so you don't have a clear dataframe design to serve as an input to ggplot.
Start by reading this: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
The key points are:
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
Then once you have a tibble/data frame your ggplot2 code should work fine. I'd kill the width= option until you have it working.
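A minimal sketch of what a tidy version could look like, assuming the Participants values are ordered male then female within each year, as the question states. With the original vectors, `data.frame()` recycles `year` as 2004 to 2017 twice, so the years no longer line up with the counts. Repeating each year twice fixes that, and treating year as a factor gives one tick per year. The use of `geom_col()` and `factor(year)` is a suggestion, not something the answer above prescribes.

```r
library(ggplot2)

# One row per year and gender: each year repeated twice, genders alternating.
year   <- rep(2004:2017, each = 2)
gender <- rep(c("male", "female"), times = 14)
Participants <- c(1307, 443, 1847, 630, 2109, 765, 1824, 691, 2250, 952,
                  3123, 1421, 4097, 1904, 6415, 3284, 8788, 4678, 11581, 6694,
                  13141, 8478, 16389, 10575, 20990, 13811, 26951, 19729)
data <- data.frame(year, gender, Participants)

# factor(year) makes the x-axis discrete, so every year gets its own tick.
MyPlot <- ggplot(data, aes(x = factor(year), y = Participants, fill = gender)) +
  geom_col(position = "dodge") +
  labs(x = "year", title = "Annual Number of Participants by Gender")
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")`, so the rest of the original plotting code can stay much the same.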
