I'm trying to create a histogram that shows the count of events by date, so I can see the total for each month. When I create the histogram, the y-axis shows a density instead of a count.
How do I get a graph that shows the total number of events per month?
Example code (rough indication only)
data_toview <- read.csv("file_with_data.csv", stringsAsFactors = FALSE)
#Distribution of count per day, so i know how the data can spike
hist(data_toview$interesting_date, breaks = "days")
Note: I may be using the wrong plot type, which is why I did not specify a histogram in the question title. Any suggestions for getting the months on the axis labels are also welcome.
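The aggregation being asked for (one count per month, rather than a density) can be sketched independently of the plotting call. This is a minimal Python sketch, not R, and the event dates are made up to stand in for data_toview$interesting_date; drawing the resulting counts as a bar chart would give totals on the y-axis.

```python
from collections import Counter
from datetime import date

# Hypothetical event dates standing in for data_toview$interesting_date.
event_dates = [
    date(2019, 1, 3), date(2019, 1, 3), date(2019, 1, 20),
    date(2019, 2, 7), date(2019, 3, 1), date(2019, 3, 15),
]

# Key each date by (year, month) so the same month in different years stays separate.
monthly_counts = Counter((d.year, d.month) for d in event_dates)

for (year, month), count in sorted(monthly_counts.items()):
    # strftime("%b %Y") builds a month label suitable for an axis tick.
    label = date(year, month, 1).strftime("%b %Y")
    print(label, count)
```

Keying on (year, month) rather than month alone is a design choice: it keeps January 2019 and January 2020 in separate bars.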
I am trying to run the asdetect package on my time series data. The age data for my time series data starts at -65 years and goes to 13800 years. The dt function by default counts number of rows instead of the actual age. When I customize the interval between tick marks on the x-axis (dt) based on the number of data points divided by the maximum age, it causes the plot to start at 0, which is not accurate. Is there a way to change the x-axis labeling from a time-based scale to the actual age data?
This is what I tried:
detect1 <- asdetect::as_detect(charcoal$influx_area_mm2xcm2xyr, dt=0.5)
or
detect1 <- asdetect::as_detect(charcoal$influx_area_mm2xcm2xyr, dt=38.4)
plot(detect1, type="l", xlab='Time', ylab='Detection Value', ylim=c())
I have netcdf data with lat,lon,time as dimensions and temperature temp as variable. It has daily temperature data for 10 years.
For a single location I can plot a time series. But how do I plot one line per year, with year as the hue, months on the x-axis and temp on the y-axis? I want 10 lines on my graph, one per year, each representing either 12 monthly means or the daily data. An example is here.
And if possible, please explain how to add the mean and median of all the years as separate lines among these 10 yearly line plots (see the example image).
I'm tempted to agree with the comment that it would be good to show a little more effort in terms of what you've tried. It would also be good to mention what you've read (in e.g. the xarray documentation: https://xarray.pydata.org/en/stable/), which I believe has many of the components you need.
I'll start by setting up some mock data like you describe, with five years (2000 through 2004) of daily data.
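import numpy as np
import pandas as pd
import xarray as xr

# Daily timestamps covering 2000 through 2004
time = pd.date_range("2000-01-01", "2004-12-31")
base = xr.DataArray(
    data=np.ones((time.size, 3, 2)),
    dims=("time", "lat", "lon"),
    coords={
        "time": time,
        "lat": [1, 2, 3],
        "lon": [0.5, 1.5],
    },
)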
To make the data a bit more comparable with your example, I'm going to add yearly seasonality (based on day of year), and make every year increase by 0.1.
seasonality = xr.DataArray(
    data=np.sin((time.dayofyear / 365.0) * (2 * np.pi)),
    coords={"time": time},
    dims=["time"],
)
trend = xr.DataArray(
    data=(time.year - 2000) * 0.1,
    coords={"time": time},
    dims=["time"],
)
da = base + seasonality + trend
(You can obviously skip these two parts; in your case, you'd only call xarray.open_dataset() or xarray.open_dataarray().)
I don't think your example is grouped by month: it's too smooth. So I'm going to group by day of year instead.
Let's start by getting a single location, then using the dt accessor:
https://xarray.pydata.org/en/stable/time-series.html#datetime-components
In this case, it's also most convenient to store the data as a DataFrame, since it essentially becomes a table (month or dayofyear as the rows, separate years etc. as columns). First we select one location, calculate the minimum and maximum values per day of year, and store them in a pandas DataFrame:
location = da.isel(lat=0, lon=0)
dataframe = location.groupby(da["time"].dt.dayofyear).min().drop_vars(["lat", "lon"]).to_dataframe(name="min")
dataframe["max"] = location.groupby(da["time"].dt.dayofyear).max().values
Next, grab the year by year data, and add it to the DataFrame:
for year, yearda in location.groupby(location["time"].dt.year):
    dataframe[year] = pd.Series(index=yearda["time"].dt.dayofyear, data=yearda.values)
If you want monthly values, add another groupby step:
for year, yearda in location.groupby(location["time"].dt.year):
    monthly_mean = yearda.groupby(yearda["time"].dt.month).mean()
    dataframe[year] = pd.Series(index=monthly_mean["month"], data=monthly_mean.values)
Note that by turning the data into a pandas Series first, pandas can place the values appropriately, based on the values of the index (dayofyear here), even though we don't have 366 values for every year.
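To make the alignment idea concrete without pandas, here is a rough plain-Python stand-in (dictionaries instead of Series, and made-up placeholder values, so this is only an illustration of the principle, not what pandas does internally). Each year is keyed by day of year, and a year that lacks day 366 simply gets an empty slot there:

```python
from datetime import date, timedelta


def daily_values(year):
    """Build a hypothetical {day_of_year: value} mapping for one year."""
    values = {}
    day = date(year, 1, 1)
    while day.year == year:
        values[day.timetuple().tm_yday] = float(year)  # placeholder value
        day += timedelta(days=1)
    return values


# 2000 is a leap year (366 days), 2001 is not (365 days).
columns = {year: daily_values(year) for year in (2000, 2001)}

# The union of all day-of-year labels plays the role of the DataFrame index.
index = sorted(set().union(*columns.values()))

# Align every column on that index; a missing label becomes None (NaN in pandas).
table = {doy: [columns[y].get(doy) for y in sorted(columns)] for doy in index}

print(len(index))   # 366
print(table[366])   # [2000.0, None]
```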
Next, plot it:
dataframe.plot()
It will automatically assign hue based on the columns.
(My minimum and maximum coincide with 2000 and 2004 because of the way I set up the mock data, but you get the idea.)
In terms of styling, options, etc., you might like seaborn better:
https://seaborn.pydata.org/index.html
import seaborn as sns
# seaborn has no plain plot() function; lineplot() accepts a wide-form DataFrame
sns.lineplot(data=dataframe)
If you want different styling or different kinds of plots (e.g. the colored zones in your example), you'll have to combine several plots, e.g. as follows:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.fill_between(x=dataframe.index, y1=dataframe["min"], y2=dataframe["max"], alpha=0.5, color="orange")
dataframe.plot(ax=ax)
Note that seaborn, pandas, xarray, etc. all use matplotlib behind the scenes. Many of the plotting functions also accept an ax argument, to draw on top of an existing plot.
I am plotting a time series bar chart with a measure for different categories. When I plot the time series bar chart, the width of the bars fills over many dates so that the neighbouring bars touch, even if they are a month apart, but this means that it is unclear which date that bar corresponds to. How do I change the code so that the bars only appear over the date in the underlying dataframe?
I have successfully plotted another time series bar chart with exactly the same ggplot code but different underlying data and so it is unclear to me why this is happening with this particular dataframe.
In this following example, I use a dataframe with only one category for simplicity in highlighting the issue:
data <- data.frame(a = c(as.Date("2019-05-30"), as.Date("2019-06-19")), b = c("FX FORWARD", "FX FORWARD"), c = c(29.2, 74.7))
colnames(data ) <- c("Expiration Date", "Security Type", "Exposure $M")
plot <- ggplot(data , aes(x=`Expiration Date`, y=`Exposure $M`, fill=`Security Type`)) +
geom_bar(stat="identity") + scale_x_date(labels = scales::date_format("%d-%b"), date_breaks = "3 day")
I expected the bars to appear only above the day in which they are stored in the dataframe and not as it is shown in the chart, i.e. $29.2 above 31st May 2019 only and not spreading from 23rd May to 8th June; same for the second data point. Can anyone advise how I may correct this in my code?
Thanks in advance for any help, I've tried looking all over for a solution.
I am a beginner of R.
The task is: given this data set, plot the monthly average temperature over the entire length of the data set. That is, plot the months on the x-axis (numbered 1, 2, 3, ..., 12, 13, 14, ...) and the monthly averages on the y-axis.
I already have the data set like this
my attempt is
plot(Temp$month,Temp$averagetemp)
the result is
I wonder how to change the code.
You can create a new variable that counts by months:
Temp$month_additive <- Temp$month + (Temp$year - min(Temp$year)) * 12
plot(Temp$month_additive, Temp$averagetemp)
Assuming 2003 is your lowest year, this will add 12 for every year past 2003 to the month number, creating a cumulative count of months.
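The same arithmetic as a minimal Python sketch; the function name cumulative_month and the sample (year, month) pairs are made up for illustration and are not taken from the original data set:

```python
def cumulative_month(year, month, first_year):
    """Return a running month number: 1..12 for first_year, 13..24 for the next year, and so on."""
    return month + (year - first_year) * 12


# Hypothetical (year, month) pairs; 2003 is assumed to be the earliest year.
records = [(2003, 1), (2003, 12), (2004, 1), (2005, 6)]
first_year = min(year for year, _ in records)
indices = [cumulative_month(y, m, first_year) for y, m in records]
print(indices)  # [1, 12, 13, 30]
```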
I am trying to plot a graph showing the number of events at the Olympics as a function of the year that a specific Olympic took place.
My data frame is called supertable and it consists of 2 columns, the first is the year and the second is the number of events in the games held that year.
My problem is that on the x axis I only get the years 1920 and 1980 and I would like to have 1920,1950,1980,2010
this is my code
ggplot(data = supertable,aes(x=year,y=no.of.events))+geom_point(colour='red')+
scale_x_discrete(breaks=c(1920,1950,1980,2010))
This is the picture I get
I tried doing this
scale_x_discrete(breaks=c(1920,1950,1980,2010),limits=c(1920,1950,1980,2010))
but it didn't help
I assume it is something small that I am missing. I tried searching for the answer but didn't find it.
Your x-axis is a continuous variable, so you need to use scale_x_continuous.
You used breaks correctly to indicate where your ticks on the x axis are, but the limits value should be a c(min, max) of the range of the plot you want to show.
Try this: scale_x_continuous(breaks=c(1920,1950,1980,2010), limits = c(1920, 2019))