Plot every year as a line with months on the X-axis and a variable on the Y-axis from NetCDF - netcdf

I have NetCDF data with lat, lon, time as dimensions and temperature temp as the variable. It has daily temperature data for 10 years.
For a single location I can plot the time series. But how do I plot every year separately, with year as hue, months on the X axis and temp on the Y axis? So I want 10 lines for the 10 years on my graph. Every line is a year, which represents 12 monthly means or the daily data. An example is here.
And if possible, please tell me how to add the mean and median of all the years as separate lines among these 10 yearly line plots. (Example picture.)

I'm tempted to agree with the comment that it would be good to show a little more effort in terms of what you've tried. It would also be good to mention what you've read (in e.g. the xarray documentation: https://xarray.pydata.org/en/stable/), which I believe has many of the components you need.
I'll start by setting up some mock data, like you mention, with five years of daily (random) data.
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", "2004-12-31")
base = xr.DataArray(
    data=np.ones((time.size, 3, 2)),
    dims=("time", "lat", "lon"),
    coords={
        "time": time,
        "lat": [1, 2, 3],
        "lon": [0.5, 1.5],
    },
)
To make the data a bit more comparable with your example, I'm going to add yearly seasonality (based on day of year), and make every year increase by 0.1.
seasonality = xr.DataArray(
    data=np.sin((time.dayofyear / 365.0) * (2 * np.pi)),
    coords={"time": time},
    dims=["time"],
)
trend = xr.DataArray(
    data=(time.year - 2000) * 0.1,
    coords={"time": time},
    dims=["time"],
)
da = base + seasonality + trend
(You can obviously skip these two parts; in your case, you'd only do an xarray.open_dataset() or xarray.open_dataarray().)
I don't think your example is grouped by month: it's too smooth. So I'm going to group by day of year instead.
Let's start by getting a single location, then using the dt accessor:
https://xarray.pydata.org/en/stable/time-series.html#datetime-components
In this case, it's also most convenient to store the data as a DataFrame, since it essentially becomes a table (month or dayofyear as the rows, separate years etc. as columns). First we select one location, calculate the minimum and maximum values per day of year, and store them in a pandas DataFrame:
location = da.isel(lat=0, lon=0)
# drop_vars replaces the deprecated DataArray.drop for removing coordinates
dataframe = location.groupby(location["time"].dt.dayofyear).min().drop_vars(["lat", "lon"]).to_dataframe(name="min")
dataframe["max"] = location.groupby(location["time"].dt.dayofyear).max().values
Next, grab the year by year data, and add it to the DataFrame:
for year, yearda in location.groupby(location["time"].dt.year):
    dataframe[year] = pd.Series(index=yearda["time"].dt.dayofyear, data=yearda.values)
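If you want monthly values instead, add another groupby step. In that case, also group the min and max above by month rather than dayofyear, so that the DataFrame index holds months and the values line up:
for year, yearda in location.groupby(location["time"].dt.year):
    monthly_mean = yearda.groupby(yearda["time"].dt.month).mean()
    dataframe[year] = pd.Series(index=monthly_mean["month"], data=monthly_mean.values)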
Note that by turning the data into a pandas Series first, it can add the values appropriately, based on the values of the index (dayofyear here), even though we don't have 366 values for every year.
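To illustrate the index alignment mentioned above, here is a minimal sketch using only pandas and NumPy. The names table, leap and regular are made up for this example; they stand in for the answer's DataFrame and the per-year Series.

```python
import numpy as np
import pandas as pd

# Empty table indexed by day of year (1..366), like the answer's DataFrame.
table = pd.DataFrame(index=pd.RangeIndex(1, 367, name="dayofyear"))

# A leap year has 366 daily values, a regular year only 365.
leap = pd.Series(np.arange(366.0), index=np.arange(1, 367))
regular = pd.Series(np.arange(365.0), index=np.arange(1, 366))

# Column assignment aligns on the index labels, not on position,
# so the shorter Series simply leaves day 366 empty (NaN).
table[2000] = leap
table[2001] = regular

missing_2001 = table[2001].isna().sum()
last_day_2001 = table.loc[366, 2001]
```

Here missing_2001 counts the single missing entry for day 366 of the regular year; passing a plain NumPy array of length 365 instead of a Series would raise a length mismatch error.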
Next, plot it:
dataframe.plot()
It will automatically assign hue based on the columns.
(My minimum and maximum coincide with 2000 and 2004 due to the way I set up the mock data, but you get the idea.)
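The question also asks for the mean and median over all years as separate lines. One possible approach, sketched here with a hypothetical stand-in for the DataFrame built above (year columns plus "min" and "max"), is to compute row-wise statistics over the year columns only and store them as two extra columns:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the answer's DataFrame: one column per year,
# plus "min" and "max", indexed by day of year.
days = pd.RangeIndex(1, 366, name="dayofyear")
season = np.sin(days.to_numpy() / 365.0 * 2 * np.pi)
dataframe = pd.DataFrame(
    {year: season + 0.1 * (year - 2000) for year in range(2000, 2005)},
    index=days,
)
dataframe["min"] = dataframe[2000]
dataframe["max"] = dataframe[2004]

# Pick only the year columns (integer labels), not "min"/"max".
year_columns = [
    col for col in dataframe.columns if isinstance(col, (int, np.integer))
]

# Row-wise statistics across years; NaN entries (e.g. day 366) are skipped.
dataframe["mean"] = dataframe[year_columns].mean(axis=1)
dataframe["median"] = dataframe[year_columns].median(axis=1)
```

Because they are ordinary columns, dataframe.plot() should draw "mean" and "median" as two additional lines next to the yearly ones.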
In terms of styling, options, etc., you might like seaborn better:
https://seaborn.pydata.org/index.html
import seaborn as sns
sns.lineplot(data=dataframe)
If you want to use different styling or different kinds of plots (e.g. the colored zones your example has), you'll have to combine different plots, e.g. as follows:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.fill_between(x=dataframe.index, y1=dataframe["min"], y2=dataframe["max"], alpha=0.5, color="orange")
dataframe.plot(ax=ax)
Note that seaborn, pandas, xarray, etc. all use matplotlib behind the scenes. Many of the plotting functions also accept an ax argument, to draw on top of an existing plot.
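If you specifically want months on the X axis with one line per year, an alternative to the loop is to build the month-by-year table directly in pandas. This is only a sketch under the assumption that the single-location data has been converted to a pandas Series with a datetime index (for example via location.to_series()); the mock series below just mimics that.

```python
import numpy as np
import pandas as pd

# Mock daily series for one location, with a seasonal cycle and a
# 0.1 per year trend (assumed stand-in for location.to_series()).
time = pd.date_range("2000-01-01", "2004-12-31")
values = (
    np.sin(time.dayofyear.to_numpy() / 365.0 * 2 * np.pi)
    + (time.year.to_numpy() - 2000) * 0.1
)
series = pd.Series(values, index=time)

# Group by (month, year), average, then move the years into columns:
# the result has 12 rows (months) and one column per year.
monthly = (
    series.groupby([series.index.month, series.index.year])
    .mean()
    .rename_axis(["month", "year"])
    .unstack("year")
)
```

Calling monthly.plot() would then give one line per year over the 12 months, and row-wise mean or median columns can be added to this table in the same way as for the daily one.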

Related

Problem running asdetect package in r with original age data

I am trying to run the asdetect package on my time series data. The age data for my time series data starts at -65 years and goes to 13800 years. The dt function by default counts number of rows instead of the actual age. When I customize the interval between tick marks on the x-axis (dt) based on the number of data points divided by the maximum age, it causes the plot to start at 0, which is not accurate. Is there a way to change the x-axis labeling from a time-based scale to the actual age data?
This is what I tried:
detect1 <- asdetect::as_detect(charcoal$influx_area_mm2xcm2xyr, dt=0.5)
or
detect1 <- asdetect::as_detect(charcoal$influx_area_mm2xcm2xyr, dt=38.4)
plot(detect1, type="l", xlab='Time', ylab='Detection Value', ylim=c())

Add points to geom_density_ridges for groups with small number of observations

I am loving using geom_density_ridges(), with individual points also included for each group. However, some groups have small sample sizes (e.g. n=1 or 2) precluding the generation of the density ridges. For these groups, I'd like to be able to plot the locations of the existing observations - even though no probability density function will be shown.
In this example, I'd like to be able to plot the 2 data points for May on the appropriate line.
library(tidyverse)
library(ggridges)
data("lincoln_weather")
#pull weather from all months that are NOT May
lincoln_weather_nomay<-lincoln_weather[which(lincoln_weather$Month!="May"),]
#pull weather just from May
lincoln_weather_may<-lincoln_weather[which(lincoln_weather$Month=="May"),]
#recombine, keeping only the first two rows for the May dataset
new_weather<-rbind(lincoln_weather_nomay,lincoln_weather_may[c(1:2),])
ggplot(new_weather, aes(x=`Min Temperature [F]`, y=Month, fill=Month)) +
  geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha=1, point_shape=21) +
  labs(x="Average temperature (F)", y='') +
  guides(fill=FALSE, color=FALSE)
How can I add the points for the May observations to the appropriate location (i.e. the May slot) and at the appropriate location along the x-axis?
Simply add a separate geom_point() layer to the plot, in which you subset the data to include only observations for the previously unplotted categories. You can apply any of the usual customizations to either match the points plotted for the other categories, or to make these points stand out.
ggplot(new_weather, aes(x=`Min Temperature [F]`, y=Month, fill=Month)) +
  geom_density_ridges(alpha = 0.5, jittered_points = TRUE, point_alpha=1, point_shape=21) +
  geom_point(data=subset(new_weather, Month %in% c("May")),
             aes(), shape=13) +
  labs(x="Average temperature (F)", y='') +
  guides(fill=FALSE, color=FALSE)

plotting multiple lines in ggplot R

I have neuroscientific data where we count synapses/cells in the cochlea and quantify these per frequency. We do this for animals of different ages. What I thus ideally want is the frequencies (5,10,20,30,40) in the x-axis and the amount of synapses/cells plotted on the y-axis (usually a numerical value from 10 - 20). The graph then will contain 5 lines of the different ages (6 weeks, 17 weeks, 43 weeks, 69 weeks and 96 weeks).
I try this with ggplot and first just want to plot one age. When I use the following command:
ggplot(mydata, aes(x=Frequency, y=puncta6)) + geom_line()
I get a graph, but no line and the following error: 'geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?'
So I found I have to adjust the code to:
ggplot(mydata, aes(x=Frequency, y=puncta6, group = 1)) + geom_line()
This works, except for the fact that my first data point (5 kHz) is now plotted behind my last data point (40 kHz). (This also happens without the 'group = 1' addition.) How do I solve this, or is there an easier way to plot this kind of data?
I couldn't add a file, so I added a photo of my code and graph, with the 5 kHz data point oddly located, and a photo of my data in Excel.
example data
example code and graph

choosing specific values on the X axis when using ggplot2

I am trying to plot a graph showing the number of events at the Olympics as a function of the year that a specific Olympic took place.
My data frame is called supertable and it consists of 2 columns, the first is the year and the second is the number of events in the games held that year.
My problem is that on the x axis I only get the years 1920 and 1980, and I would like to have 1920, 1950, 1980, 2010.
This is my code:
ggplot(data = supertable, aes(x=year, y=no.of.events)) + geom_point(colour='red') +
  scale_x_discrete(breaks=c(1920,1950,1980,2010))
This is the picture I get
I tried doing this:
scale_x_discrete(breaks=c(1920,1950,1980,2010), limits=c(1920,1950,1980,2010))
but it didn't help.
I am assuming it is something small that I am missing. I tried searching for the answer but didn't find it.
Your x-axis is a continuous variable, so you need to use scale_x_continuous.
You used breaks correctly to indicate where your ticks on the x axis are, but the limits value should be a c(min, max) of the range of the plot you want to show.
Try this: scale_x_continuous(breaks=c(1920,1950,1980,2010), limits = c(1920, 2019))

Plotting multiple frequency polygon lines using ggplot2

I have a dataset with records that have two variables: "time" which are id's of decades, and "latitude" which are geographic latitudes. I have 7 time periods (numbered from 26 to 32).
I want to visualize a potential shift in latitude through time. So what I need ggplot2 to do is plot a graph with latitude on the x-axis and the count of records at a certain latitude on the y-axis. I need it to do this for the separate time periods and plot everything in one graph.
I understood that I need the function freqpoly from ggplot2, and I got this so far:
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25)
This gives me the correct graph of the data, ignoring the time. But how can I include the time? I tried subsetting the data, but I can't really figure out if this is the best way.
So basically I'm trying to get a graph with 7 lines showing the frequency distribution in each decade in order to look for a latitude shift.
Thanks!!
Without sample data it is hard to answer, but try adding color=factor(time) (where time is the name of your column with the time periods). This will draw a line for each time period in a different color.
qplot(latitude, data = lat_data, geom = "freqpoly", binwidth = 0.25,
      color=factor(time))

Resources