How do I graph variables that represent different timepoints? - r

I am trying to make a line graph that plots data over four time points for different conditions. Right now, the conditions are in one variable, but the values for each time point are each in their own column.
I can't figure out how best to graph it so that the y-axis shows each condition and the x-axis shows the "score" over each time point.
How do I graph variables that represent different time points?
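One common way to do this in R, sketched here under assumptions, is to reshape the data from wide to long format with tidyr's pivot_longer(). The four time-point columns then become a single timepoint variable and a single score variable, and ggplot2 can draw one line per condition. The data frame, its column names (condition, time1 to time4) and its values are hypothetical stand-ins for your data. The sketch puts time on the x-axis and score on the y-axis, which is the usual layout for this kind of plot, and distinguishes conditions by colour.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one row per condition, one column per time point
wide <- data.frame(
  condition = c("control", "treatment"),
  time1 = c(10, 12),
  time2 = c(11, 15),
  time3 = c(12, 18),
  time4 = c(12, 21)
)

# Reshape to long format: one row per condition x time point
long <- pivot_longer(wide, cols = starts_with("time"),
                     names_to = "timepoint", values_to = "score")

# One line per condition: time on the x-axis, score on the y-axis
p <- ggplot(long, aes(x = timepoint, y = score,
                      colour = condition, group = condition)) +
  geom_line() +
  geom_point()

print(p)
```

The group = condition aesthetic tells geom_line() which points belong to the same line; it is needed because timepoint is a discrete (character) variable.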

Related

Counts, bars, bins for each pandas DataFrame histogram subplot

I am making separate histograms of travel distance per departure hour. However, for making further calculations I'd like to have the value of each bin in a histogram, for all histograms.
Up until now, I have the following:
df['Distance'].hist(by=df['Departuretime'], color='red', edgecolor='black',
                    figsize=(15, 15), sharex=True, density=True)
In my case, this creates a figure with 21 small histograms.
With single histograms, I'd paste counts, bins, bars = in front of the entire line and the variable counts would contain the data I was looking for, however, in this case it does not work.
Ideally I'd like a dataframe or list of some sort for each histogram, containing the density values of the bins. I hope someone can help me out! Thanks in advance!
Edit:
Data I'm using: about 2500 rows like this; Distance is float64 and Departuretime is str.
Histogram output I'm receiving
Of all these histograms I want to know the y-axis value of each bar, preferably in a dataframe with the distance binning as rows and the hours as columns
By using the cut function, you can extract the requested data directly from your dataframe instead of from the graph. This is less error-prone.
df['DistanceBin'] = pd.cut(df['Distance'], bins=10)
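Then, you can use pivot_table to obtain a table with the counts for each combination of DistanceBin (rows) and Departuretime (columns), as you asked. Passing values='Distance' keeps the result from getting an extra column level:
df.pivot_table(index='DistanceBin', columns='Departuretime', values='Distance', aggfunc='count')
Note that these are counts, and pd.cut uses one set of bins for all hours, whereas hist(by=...) bins each hour separately. To get values comparable to density=True, divide each column by its total and by the bin width.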

Plot points only when there are observations, else draw nothing in ggplot in R

I have a big data frame with IDs, dates, and test results in it. I want to run a for loop over all the IDs and, for each one, plot a line graph that shows how the results evolve over time, adding some key indicators as points on the graph (the key indicators come from a different data frame).
Not all the IDs have all the indicators. When I plot these points, I use dplyr to filter for ID == i and something like key_ind != 0. The problem is that when a certain ID has no key indicators, the filtered data frame has 0 observations and ggplot returns an error.
When there are no points to plot, I want the points to be skipped while the line graph of the test results is still plotted. Does that make sense? How can I do that? I have tried using tryCatch(), but it didn't work.
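A minimal sketch of one way around this: build the line plot first, and add the geom_point() layer only when the filtered indicator data has at least one row. It assumes base R subsetting in place of dplyr and uses hypothetical data frames results and indicators (columns ID, date, result and key_ind). Placing each indicator on the result line with merge() is also an assumption about where you want the points drawn.

```r
library(ggplot2)

# Hypothetical example data; column names are assumptions
results <- data.frame(
  ID = rep(c("a", "b"), each = 4),
  date = rep(as.Date("2023-01-01") + 0:3, times = 2),
  result = c(5, 6, 7, 6, 3, 4, 4, 5)
)
indicators <- data.frame(
  ID = c("a", "a"),
  date = as.Date(c("2023-01-02", "2023-01-04")),
  key_ind = c(1, 2)
)

plots <- list()
for (i in unique(results$ID)) {
  res_i <- results[results$ID == i, ]
  ind_i <- indicators[indicators$ID == i & indicators$key_ind != 0, ]

  # The line layer is always drawn
  p <- ggplot(res_i, aes(x = date, y = result)) +
    geom_line()

  # Add the indicator points only when the filtered data has rows
  if (nrow(ind_i) > 0) {
    # Attach the result value for each indicator date so the point sits on the line
    ind_i <- merge(ind_i, res_i, by = c("ID", "date"))
    p <- p + geom_point(data = ind_i, aes(x = date, y = result), colour = "red")
  }
  plots[[i]] <- p
}
```

Checking nrow() before adding the layer avoids relying on tryCatch(), and the line is drawn for every ID either way.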

ggplot2 Line graphs in R: Plotting dependent variable on y axis

I am trying to plot the vertical concentration profile of a pollutant. By convention, altitude is plotted on the vertical axis, and concentration is on the x (even though altitude is the independent variable). When plotting the concentrations for pollutants that do not fit a one-to-one function, R connects the points in a most annoying zig-zag pattern, instead of connecting them in order by altitude.
I tried changing the concentration values to factors, with levels based on altitude values:
concSummary$value <- factor(concSummary$value,
                            levels = concSummary$value[order(concSummary$altitude)])
But this didn't seem to work.
Does anyone know how to get around this problem?
Update: Someone posted a useful solution here: controlling order of points in ggplot2 in R?
Using geom_path() instead of geom_line() tells ggplot2 to connect points in the order in which they appear in the data frame, whereas geom_line() connects them in order of the x variable. This happened to work for me because my data were already ordered by altitude.
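A minimal sketch of the fix, using a hypothetical profile data frame with altitude and value columns: sort the rows by altitude explicitly, so that geom_path() does not depend on the order of the input. As an alternative, ggplot2 3.3.0 and later accept orientation = "y" in geom_line(), which makes it sort along the y variable instead of x.

```r
library(ggplot2)

# Hypothetical vertical profile; concentration is not a one-to-one function of altitude
profile <- data.frame(
  altitude = c(100, 500, 300, 200, 400),
  value    = c(2.0, 1.5, 3.5, 2.8, 3.0)
)

# Sort rows by altitude so geom_path() connects them from bottom to top
profile <- profile[order(profile$altitude), ]

p_path <- ggplot(profile, aes(x = value, y = altitude)) +
  geom_path() +
  geom_point()

# Alternative (ggplot2 >= 3.3.0): geom_line() sorted along the y axis
p_line <- ggplot(profile, aes(x = value, y = altitude)) +
  geom_line(orientation = "y") +
  geom_point()

print(p_path)
print(p_line)
```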

Box plot categories from two variables

Sorry for the basic question; I am new to R.
I would like to make a box plot with subcategories, and then with measurements taken over time.
For example I have tried this:
boxplot(field_data$week_1~field_data$field, ylab= 'number of infected plants')
This gives me two box plots (field is either ‘north’ or ‘south’). I want to split each box plot into two box plots by the "position" variable (1 or 2). Is there a way to still have a plot with two main categories defined by "field", where each consists of two box plots defined by "position"? I would then also like to plot the ‘week_2’ readings next to the ‘week_1’ set of box plots. All of the data is in one data frame. I also have other variables (‘beds’ and ‘rows’) with different levels that categorise the measurements.
I have tried ggplot, but I am not sure how to do this or whether it is the right function.
Thank you.
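Both base R and ggplot2 can do this; the sketch below uses a simulated, hypothetical field_data with the column names from the question (field, position, week_1, week_2). In base R, a formula with two grouping variables (week_1 ~ position + field) gives one box per combination. In ggplot2, reshaping week_1 and week_2 into a single column lets you fill the boxes by position and show the weeks in side-by-side panels with facet_wrap().

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for field_data; the real values will differ
set.seed(1)
field_data <- data.frame(
  field    = rep(c("north", "south"), each = 20),
  position = rep(c(1, 2), times = 20),
  week_1   = rpois(40, 5),
  week_2   = rpois(40, 8)
)

# Base R: one box per position x field combination for week_1
boxplot(week_1 ~ position + field, data = field_data,
        ylab = "number of infected plants")

# ggplot2: stack the week columns into one column first
long <- pivot_longer(field_data, cols = c(week_1, week_2),
                     names_to = "week", values_to = "infected")

# Boxes grouped by field, split by position, one panel per week
p <- ggplot(long, aes(x = field, y = infected, fill = factor(position))) +
  geom_boxplot() +
  facet_wrap(~ week) +
  labs(y = "number of infected plants", fill = "position")

print(p)
```

The same pattern extends to ‘beds’ or ‘rows’, for example by faceting on them with facet_grid().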

R + ggplot2, multiple histograms in the same plot with each histogram normalised to unit area?

Sorry for the newbie R question...
I have a data.frame that contains measurements of a single variable. These measurements will be distributed differently depending on whether the thing being measured is of type A or type B; that is, you can imagine that my column names are: measurement, type label (A or B).
I want to plot the histograms of the measurements for A and B separately and put the two histograms in the same plot, with each histogram normalised to unit area (because I expect the proportions of A and B to differ significantly). By unit area, I mean that A and B each have unit area, not that A+B have unit area. Basically, I want something like geom_density, but I don't want smoothed distributions; I want the histogram bars. Not interleaved, but plotted one on top of the other. Not stacked, although it would be interesting to know how to do this too. (The purpose of this plot is to explore differences in the shapes of the distributions that would indicate quantitative differences between A and B that could be used to distinguish between them.)
That's all: two or more histograms, not smoothed density plots, in the same plot, each normalised to unit area. Thanks!
Something like this?
# generate example
set.seed(1)
df <- data.frame(Type=c(rep("A",1000),rep("B",4000)),
                 Value=c(rnorm(1000,mean=25,sd=10),rchisq(4000,15)))
# you start here...
library(ggplot2)
ggplot(df, aes(x=Value))+
  # after_stat(density) replaces the ..density.. notation, deprecated since ggplot2 3.4.0
  geom_histogram(aes(y=after_stat(density),fill=Type),color="grey80")+
facet_grid(Type~.)
Note that there are 4 times as many samples of type B.
You can also let the y-axis scale vary between panels by using scales="free_y" in the call to facet_grid(...).
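For the overlaid (rather than faceted) version the question asks for, here is a sketch that reuses the simulated data from the answer and puts both types in one panel with position = "identity" and some transparency. Because fill = Type splits the data into groups, after_stat(density) is computed separately for each group, so each histogram has unit area on its own. The alpha and bins values are arbitrary choices.

```r
library(ggplot2)

# Same simulated data as in the answer
set.seed(1)
df <- data.frame(Type = c(rep("A", 1000), rep("B", 4000)),
                 Value = c(rnorm(1000, mean = 25, sd = 10), rchisq(4000, 15)))

# Overlaid, not stacked: density is computed per Type group,
# so each histogram integrates to 1 independently
p <- ggplot(df, aes(x = Value, fill = Type)) +
  geom_histogram(aes(y = after_stat(density)), bins = 30,
                 position = "identity", alpha = 0.5, colour = "grey80")

print(p)
```

Using position = "stack" instead gives the stacked variant mentioned in the question.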
