Is there a way to run a random sample of that data I want to plot in one swoop?
ggplot()+
labs(y="Monthly Pay $", title="Monthly Pay per Pay Instance by Employee")+
geom_line(aes(y=MonthlyPay, x=PayInstance, group=EMPLID),
data = linetechanalysisTermed)
I have 7,759 in this population and myplot is completely unreadable (I thought about altering the axis but I don't think it will help with such a large population).
Related
I'm working on a figure for a publication where we are looking at a combination of plant coverage and other environmental data on differing communities. I am trying to make a multi-panel figure, with panels that display all the envfit results, one that displays only the plants, and one that displays only the other enviro. Because of the complexity of the figure, it's actually a little easier to construct in the base plot function than in ggvegan.
My challenge is figuring out how to subset the results of the envfit analysis object for the different panels. A simplified example would be:
library(vegan)
data("mite")
data("mite.env")
set.seed(55)
nmds<-metaMDS(mite)
set.seed(55)
ef<-envfit(nmds, mite.env, permu=999)
plot(ef, p.max = .05)
which produces this figure
For sake of the example, does anyone have suggestions on a way I could create two separate figures, one with only the WatrCont vector and one with only the SubsDens vector? I'm sure there is a way to pull specific results out of the ef object, but my coding is not savvy enough to understand how.
Additionally, is there a way to have the jumble of text at the center not overlap, similar to jitter in ggplot?
Thank y'all for all of your help!
I would suggest extracting the data from nmds and ef and using ggplot to add the required elements to your plots.
Here is an example:
library(vegan)
library(ggplot2)
data("mite")
data("mite.env")
set.seed(55)
nmds<-metaMDS(mite)
set.seed(55)
ef<-envfit(nmds, mite.env, permu=999)
# Get the NMDS scores
nmds_values <- as.data.frame(scores(nmds))
# Get the coordinates of the vectors produced for continuous predictors in your envfit
vector_coordinates <- as.data.frame(scores(ef, "vectors")) * ordiArrowMul(ef)
# Plot the vectors separately
ggplot(nmds_values,
aes(x=NMDS1, y = NMDS2)) +
geom_point() +
geom_segment(aes(x=0, y=0, xend=NMDS1, yend=NMDS2),
vector_coordinates[1,]) +
geom_text(aes(x=NMDS1,y=NMDS2),
vector_coordinates[1,],
label=row.names(vector_coordinates[1,]))
ggplot(nmds_values,
aes(x=NMDS1, y = NMDS2)) +
geom_point() +
geom_segment(aes(x=0, y=0, xend=NMDS1, yend=NMDS2),
vector_coordinates[2,]) +
geom_text(aes(x=NMDS1,y=NMDS2),
vector_coordinates[2,],
label=row.names(vector_coordinates[2,]))
You can play around with the colours, size of the different elements as you see fit. Coordinates for categorical predictors can be extracted in a similar manner.
I'm trying to make a stack of histograms (or a ridgeplot) so I can compare distributions at certain timepoints in my observations.
I used this source for the histogram, and this for the ridge plots.
However, I cannot figure out how to set up my code to make either a stacked histogram of each length (L) by week, so that I can see L distributions at different weeks. I have tried the fill option in ggplot (which in the example seems to produce automatic color differences for the weeks because it is in the aes()?) and other "stacks" using the y= argument, but haven't had much success, I think due to the way my data is set up. If anyone can help me figure out how to make multiple histograms by week, that would be useful!
Thanks!
#fake data
L = rnorm(100, mean=10, sd=2)
t = c((rep.int(7,10)), (rep.int(14,20)), rep.int(21,30), rep.int(28,20), (rep.int(31, 20)), (rep.int(36,10)))
fake = data.frame(cbind(L,t))
#subset data into weeks for convenience
dayofweek = seq(7,120,7)
fake2 = as.data.frame(subset(fake, t %in% dayofweek))
fake2$week <- floor(fake2$t/7)
#Plots, basic code
ggplot(fake2, aes(x=L, fill=week)) +
geom_histogram()
I tried facet_grid before, but for some reason facet_wrap actually at least separated the graphs correctly, AND magically made the color fill work:
ggplot(fake2, aes(x=L, fill = week)) +
geom_histogram()+
facet_wrap(.~week)
I am loving using geom_density_ridges(), with individual points also included for each group. However, some groups have small sample sizes (e.g. n=1 or 2) precluding the generation of the density ridges. For these groups, I'd like to be able to plot the locations of the existing observations - even though no probability density function will be shown.
In this example, I'd like to be able to plot the 2 data points for May on the appropriate line.
library(tidyverse)
library(ggridges)
data("lincoln_weather")
#pull weather from all months that are NOT May
lincoln_weather_nomay<-lincoln_weather[which(lincoln_weather$Month!="May"),]
#pull weather just from May
lincoln_weather_may<-lincoln_weather[which(lincoln_weather$Month=="May"),]
#recombine, keeping only the first two rows for the May dataset
new_weather<-rbind(lincoln_weather_nomay,lincoln_weather_may[c(1:2),])
ggplot( new_weather, aes(x=`Min Temperature [F]`,y=Month,fill=Month))+
geom_density_ridges(alpha = 0.5,jittered_points = TRUE, point_alpha=1,point_shape=21) +
labs(x="Average temperature (F)",y='')+
guides(fill=FALSE,color=FALSE)
How can I add the points for the May observations to the appropriate location (i.e. the May slot) and at the appropriate location along the x-axis?
Simply add a separate geom_point() call to the function, in which you subset the data to include only observations for the previously-unplotted categories. You can apply any of the usual customizations to either 'match' the points plotted for the other categories, or to make these points 'stand out'.
ggplot( new_weather, aes(x=`Min Temperature [F]`,y=Month,fill=Month))+
geom_density_ridges(alpha = 0.5,jittered_points = TRUE, point_alpha=1,point_shape=21) +
geom_point(data=subset(new_weather, Month %in% c("May")),
aes(),shape=13)+
labs(x="Average temperature (F)",y='')+
guides(fill=FALSE,color=FALSE)
I'm hoping to get some help on making the following histogram looks as nice and understandable as possible. I am plotting the salaries of Immigrant versus US Born workers. I am wondering
1. How would you modify colors, axis intervals, etc. to make the graph more clear/appealing?
2. How could I add a key to indicate purple is for US born workers, and pink is for foreign born?
3. How can I add two different lines to indicate the median of each group? And a corresponding label for each?
My current code is set up as this:
ggplot(NHIS1,aes(x=adj_SALARY, y=..density..)) +
geom_histogram(data=subset(NHIS1,IMMIGRANT=='0'), alpha=.5,binwidth=800, fill="purple",position="identity") + xlim(4430.4,50000) +
geom_vline(xintercept=median(NHIS1$adj_SALARY), col="black", linetype="dashed") +
geom_histogram(data=subset(NHIS1,IMMIGRANT=='1'), alpha=.5,binwidth=800,fill="red") + xlim(4430.4,50000)
geom_vline(xintercept=median(NHIS1$adj_SALARY), col="black", linetype="dashed")
And my final histogram at the moment appears as this:
If you have two variables, one for income , one for immigrant status, you do not need to plot two histograms but one will suffice if you specify the grouping. Also, I'd suggest you also use density lines, which help smooth over the histogram's bumps:
Assuming this is roughly like your data:
df <- data.frame(income = sample(1000:5000, 1000),
born = sample(c("US", "Foreign"), 1000, replace = T))
Then a crude way to plot one histogram as well as density lines for the two groups would be this:
ggplot(df, aes(x=income, color=born, fill=born)) +
geom_histogram(aes(y=..density..), alpha=0.5, binwidth=100,
position="identity") +
geom_density(alpha=.2)
This question has been asked before: overlaying-histograms-with-ggplot2-in-r discusses several options with many examples. You should definitely take a look at it.
Another option to compare the distributions could be violin plots using geom_violin(). I see violin plots as the better option when you need to compare distributions because they give you more flexibility and are still clearer. But that may be just me. Refer to the examples in the manual.
I am analyzing US election data volume from Google trend. I type the below command in R studio.
The poliData dataframe contains the SearchVolume for all months for three Politicians.
ggplot(data = poliData, aes(x=Date, group=Politician, colour=Politician)) +
geom_density()
But I only get the density line (blue) for one politician only with the above command.See the attached picture. Can you please help
I guess you got three lines on top each other because Date variable values are the same for all three politicians. My understanding of your analysis could be something like this:
ggplot(data = poliData,
aes(x=Date, colour=Politician,
weight = SearchVolume/sum(SearchVolume))) +
geom_density()
Adding weight should produce distinct lines for different politicians. If this is not what you wanted, please dput your data for others to work out a solution for you. Also, as I do not have the data, I have not tested the above code yet. Please let me know if it does not work.