I would like to create a time series portrayed visually as a spiral graph like this one. I would like the ticks to be months instead of hours, and each turn of the spiral to represent a year instead of a day. I would also like the option of either breaking each main (month) tick into four minor ticks (representing weeks) or showing no minor ticks and only the monthly main ticks.
Time-Spiral Graph
I have included a sample of mock data. The daily temperature means could be binned into four bins (as represented by weeks).
Year Month Day Temperature
1993 January 1 9
1993 January 2 6
1993 January 3 6
1993 January 4 5
1993 January 5 5
1993 January 6 5
1993 January 7 8
1993 January 8 9
1993 January 9 6
1993 January 10 5
1993 January 11 7
1993 January 12 10
1993 January 13 7
1993 January 14 10
1993 January 15 5
1993 January 16 5
1993 January 17 7
1993 January 18 7
1993 January 19 10
1993 January 20 8
1993 January 21 9
1993 January 22 8
1993 January 23 9
1993 January 24 9
1993 January 25 5
1993 January 26 6
1993 January 27 7
1993 January 28 6
1993 January 29 8
1993 January 30 8
1993 January 31 10
1993 February 1 8
1993 February 2 9
1993 February 3 9
1993 February 4 6
1993 February 5 5
1993 February 6 9
1993 February 7 8
1993 February 8 10
1993 February 9 9
1993 February 10 6
1993 February 11 6
1993 February 12 9
1993 February 13 8
1993 February 14 6
1993 February 15 6
1993 February 16 9
1993 February 17 10
1993 February 18 5
1993 February 19 7
1993 February 20 6
1993 February 21 8
1993 February 22 9
1993 February 23 5
1993 February 24 10
1993 February 25 10
1993 February 26 8
1993 February 27 10
1993 February 28 9
1993 March 1 10
1993 March 2 9
1993 March 3 9
1993 March 4 6
1993 March 5 7
1993 March 6 6
1993 March 7 5
1993 March 8 10
1993 March 9 9
1993 March 10 8
1993 March 11 9
1993 March 12 7
1993 March 13 7
1993 March 14 6
1993 March 15 6
1993 March 16 9
1993 March 17 7
1993 March 18 6
1993 March 19 10
1993 March 20 7
1993 March 21 6
1993 March 22 6
1993 March 23 10
1993 March 24 9
1993 March 25 8
1993 March 26 6
1993 March 27 5
1993 March 28 5
1993 March 29 10
1993 March 30 7
1993 March 31 8
1993 April 1 6
1993 April 2 7
1993 April 3 10
1993 April 4 7
1993 April 5 8
1993 April 6 5
1993 April 7 7
1993 April 8 5
1993 April 9 10
1993 April 10 7
1993 April 11 6
1993 April 12 9
1993 April 13 10
1993 April 14 10
1993 April 15 6
1993 April 16 5
There is a thread that shows the code needed to achieve this (How to Create A Time-Spiral Graph Using R); however, I am having difficulty understanding the code and modifying it to fit my purpose. I am hoping someone can either point me in the right direction or help me customize the code.
Thank you!!
As #42 said, it sounds like you have some other pre-processing to do to get your data ready for what you want.
In ggplot, here's the approach I would take. First get your data printing as a bar chart. Then add an ascending baseline. Finally, use coord_polar to put it around an annual circle.
library(ggplot2)
sample <- data.frame(date = seq.Date(from = as.Date("1993-01-01"), to = as.Date("1996-12-31"), by = 1),
                     day_num = 1:1461,
                     temp = rnorm(1461, 10, 2))
# as normal bar
ggplot(sample, aes(date, temp, fill = temp)) +
  geom_col() +
  scale_fill_viridis_c() + theme_minimal()
# or use the fill pattern below to replicate OP picture:
# scale_fill_gradient2(low="green", mid="yellow", high="red", midpoint=10)
# as ascending bar
ggplot(sample, aes(date, 0.01*day_num + temp/2,
                   height = temp, fill = temp)) +
  geom_tile() +
  scale_fill_viridis_c() + theme_minimal()
# as spiral
ggplot(sample, aes(day_num %% 365,
                   0.05*day_num + temp/2, height = temp, fill = temp)) +
  geom_tile() +
  scale_y_continuous(limits = c(-20, NA)) +
  scale_x_continuous(breaks = 30*0:11, minor_breaks = NULL, labels = month.abb) +
  coord_polar() +
  scale_fill_viridis_c() + theme_minimal()
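The pre-processing step (binning the daily means into four bins per month) is not shown above. Here is a minimal sketch of one way to do it, written in plain Python rather than R. The helper names `week_bin`, `bin_temperatures` and `spiral_slot` are made up for illustration, and the 1-7 / 8-14 / 15-21 / 22-end split is an assumption about how the "weeks" should be cut:

```python
from collections import defaultdict
from statistics import mean

MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def week_bin(day):
    """Map a day of the month to one of four bins: 1-7, 8-14, 15-21, 22-end."""
    return min((day - 1) // 7, 3)


def spiral_slot(month_index, bin_index):
    """Position within one turn of the spiral: 12 months x 4 bins = 48 slots."""
    return month_index * 4 + bin_index


def bin_temperatures(rows):
    """Average the daily temperatures per (year, month index, week bin)."""
    groups = defaultdict(list)
    for year, month, day, temp in rows:
        groups[(year, MONTHS.index(month), week_bin(day))].append(temp)
    return {key: mean(temps) for key, temps in sorted(groups.items())}


# First eight days of the mock data from the question
rows = [(1993, "January", day, temp)
        for day, temp in zip(range(1, 9), [9, 6, 6, 5, 5, 5, 8, 9])]
binned = bin_temperatures(rows)
for (year, month_index, bin_index), avg in binned.items():
    print(year, MONTHS[month_index], spiral_slot(month_index, bin_index), avg)
```

Each year then contributes 48 values, so the slot number can play the role that `day_num %% 365` plays in the spiral plot, with month breaks every 4 slots.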
The following is my data, of which I would like to plot the monthly frequency. There are missing values.
YEAR MONTH
1960 5
1961 7
1961 8
1961 11
1962 5
1963 6
1964
1965 7
1966 7
1966 7
1966 10
1967 4
1967 8
1968
1969
1970 8
1971 6
1971 9
1971 10
1972 7
1973 6
1973 9
1974 10
1974 10
1975 10
1976
1977
1978 9
1979 11
1980 7
1980 7
1980 8
1981
1982 10
1982 12
1983
1984 7
1985 9
1986
1987
1988 9
1988 10
1989 7
1989 10
1990
1991 7
1992
1993 6
1993 7
1993 9
1993 9
1994
1995 7
1996 8
1996 9
1997 5
1998 8
1998 9
1998 10
1999 8
1999 9
2000 9
2001
2002 1
2003 5
2003 7
2003 8
2003 9
2003 10
2004
2005 11
2006 7
2006 10
2007 9
2007 11
2007 11
2008 5
2009 5
2009 7
2009 9
2009 9
2010 10
2011 5
2011 9
2011 9
2012 8
2013 7
2014 9
2015 7
2016
2017 8
2018 10
2019 11
2020
I used the following code in a Jupyter Notebook. There are other columns but I selected only the month.
#Plot Frequency
ISA = pd.read_csv (r'G\:data.csv', encoding="ISO-8859-1")
ISA = pd.DataFrame(ISA,columns=['YEAR','MONTH','TYPE'])
ISA= ISA[ISA['YEAR'].between(1960,2020, inclusive="both")]
ISA['YEAR'] = pd.to_datetime(ISA['MONTH'])
ISA = ISA.set_index('YEAR')
ISA=ISA.drop(['MSW','TC NAME', 'KNOTS','PAR BEG', 'PAR END'],axis=1)
ISA=ISA.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
ax=ISA.groupby([ISA.index.month, 'MONTH']).count().plot(kind='bar',color='lightgray',width=1, edgecolor='darkgray')
plt.xlabel('Month', color='black', fontsize=14, weight='bold')
plt.ylabel('Monthly frequency' , color='black', fontsize=14, weight='bold',)
plt.xticks([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct','Nov','Dec'],rotation=0, fontsize=12)
ax.yaxis.set_major_formatter(FormatStrFormatter('%.0f'))
plt.yticks(fontsize=12)
plt.ylim(0,20)
plt.suptitle("Monthly Frequency",fontweight='bold',y=0.95,x=0.53)
plt.title("ISA", pad=0)
L=plt.legend()
L.get_texts()[0].set_text('Frequency')
plt.bar_label(ax.containers[0], label_type='center', fontsize=11)
plt.plot()
plt.tight_layout()
plt.show()
With this code, the resulting plot shows bars for February and other months that never occur in the data; their frequency should be zero. Can you help me adjust the bar chart, or tell me if there is something wrong with my code?
This comes close with your supplied example data:
# Read the initial data to a dataframe
import pandas as pd
from matplotlib import pyplot as plt
import matplotlib.ticker as mtick
ISA = pd.read_csv(r'data.txt', sep=r'\s+')  # whitespace-delimited; `delim_whitespace=True` is deprecated in recent pandas
ISA = pd.DataFrame(ISA,columns=['YEAR','MONTH'])
ISA['MONTH'] = ISA['MONTH'].astype(dtype='Int64')
ISA= ISA[ISA['YEAR'].between(1960,2020, inclusive="both")]
# Use `value_counts()` with that dataframe to collect counts fixing for the month numbers that are missing
# because no values ever reported for those months in imported data
months_count_collected = {}
for x in range(1, 13):
    if x in ISA['MONTH'].value_counts():
        months_count_collected[x] = ISA['MONTH'].value_counts()[x]
        #print(ISA['MONTH'].value_counts()[x])
    else:
        months_count_collected[x] = 0
        #print(0)
# Make a dataframe with the frequency from `months_count_collected` where those with zero counts added back in
df = pd.DataFrame.from_dict(months_count_collected, orient='index', columns = ["Frequency"])
# Make plot from frequency dataframe
ax = df.sort_index().plot(kind='bar',color='lightgray',width=1, edgecolor='darkgray')
# Note: `sort_index()` isn't needed here, but it is left in because it would come in handy if values for
# unrepresented months were added later or differently; `sort_index()` used based on https://stackoverflow.com/a/57876952/8508004 .
# Set tick labels to the month names based on https://stackoverflow.com/a/30280076/8508004
ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct','Nov','Dec'],rotation=0, fontsize=12);
ax.set_xlabel('Month', color='black', fontsize=14, weight='bold')
ax.set_ylabel('Annual frequency' , color='black', fontsize=14, weight='bold',)
#ax.set_title("Passage Frequency", pad=0);
#plt.yaxis.set_major_formatter(FormatStrFormatter('%.0f'))
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.0f')) # based on OP code and https://stackoverflow.com/a/36319915/8508004 to import and use `mtick` with Pandas
plt.yticks(fontsize=12)
ax.set_ylim(0,20)
plt.suptitle("Monthly Frequency",fontweight='bold',y=0.95,x=0.53)
plt.title("ISA", pad=0)
L=plt.legend()
L.get_texts()[0].set_text('Frequency')
plt.bar_label(ax.containers[0], label_type='center', fontsize=11)
plt.plot()
plt.tight_layout()
plt.show();
There's probably a more clever way to fill in the months unrepresented in the input.
And titles and labels get generated but may not be correct text right now.
What it makes: [image: bar chart of the frequency per month, Jan to Dec]
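On the "more clever way" to fill in the unrepresented months: one option is to let pandas do it with `reindex`. This is only a sketch on a handful of made-up `MONTH` values in the style of the question's data (the short `months` series is an assumption, not the real file):

```python
import pandas as pd

# A few MONTH values like those in the question; None marks a year with no month
months = pd.Series([5, 7, 8, 11, 5, 6, None, 7, 7, 7, 10], dtype="Int64")

# Count each month, then force the index to be 1..12 so that months
# which never occur show up with a count of 0 instead of being absent
frequency = (
    months.dropna()
          .astype(int)
          .value_counts()
          .reindex(range(1, 13), fill_value=0)
)

df = frequency.to_frame(name="Frequency")
print(df)
```

The resulting `df` has the same shape as the one built from `months_count_collected`, so the plotting code should work on it unchanged.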
I need to assign populations (denominators) to a data frame in R. For each age group and each year, the populations are different.
My data frame is
Year agegroup count
2000 0-4 24
2000 5-9 36
....
2021 0-4 42
2021 95+ 132
How can I assign each year and age group (row) a different population?
I don't know how to do it, can someone help me? Thanks
Thank you,
I have this data frame:
head(pop)
   Year Age_group Count Population
1: 1993         7    12
2: 1994         7    18
3: 1995         7    14
4: 1993         8    16
5: 1994         8    26
6: 1995         8    27
7: 1996         8    21
… Continue
And I want to put in the populations column the data that I have in another dataframe, so that the result is this:
head(pop1)
   Year Age_group Count Population
1: 1993         7    12     133404
2: 1994         7    18     155638
3: 1995         7    14     100053
4: 1993         8    16     211223
5: 1994         8    26     111170
6: 1995         8    27     255691
7: 1996         8    21     255691
… Continue
Sorry, I have this data frame:
Year agegroup count
2000 0-4 24
2000 5-9 36
....
2021 0-4 42
2021 95+ 132
And I want to put in the populations column the data that I have in another dataframe, so that the result is this:
Year agegroup count population
2000 0-4 24 123500
2000 5-9 36 132600
....
2021 0-4 42 145200
2021 95+ 132 187540
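One way to think about this is as a left join on the two key columns, year and age group. Below is a minimal sketch of that idea in Python/pandas, using the four rows shown above; the `populations` table and its column names are assumptions about what the second data frame looks like. In R, the analogous call would be `merge()` with `by = c("Year", "agegroup")` and `all.x = TRUE`.

```python
import pandas as pd

# Counts per year and age group (rows taken from the example above)
counts = pd.DataFrame({
    "Year": [2000, 2000, 2021, 2021],
    "agegroup": ["0-4", "5-9", "0-4", "95+"],
    "count": [24, 36, 42, 132],
})

# Hypothetical lookup table: one population per year and age group
populations = pd.DataFrame({
    "Year": [2000, 2000, 2021, 2021],
    "agegroup": ["0-4", "5-9", "0-4", "95+"],
    "population": [123500, 132600, 145200, 187540],
})

# Left join: every row of `counts` is kept and picks up its matching population
merged = counts.merge(populations, on=["Year", "agegroup"], how="left")
print(merged)
```

A year and age group with no match in the lookup table would end up with a missing population rather than being dropped.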
I might be overcomplicating things - would love to know if there is an easier way to solve this. I have a data frame (df) with 5654 observations - 1332 are foreign-born, and 4322 Canada-born subjects.
The variable df$YR_IMM captures: "In what year did you come to live in Canada?"
See the following distribution of observations per immigration year table(df$YR_IMM) :
1920 1926 1928 1930 1939 1942 1944 1946 1947 1948 1949 1950 1951 1952 1953 1954
2 1 1 2 1 2 1 1 1 9 5 1 7 13 3 5
1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970
10 5 8 6 6 1 5 1 6 3 7 16 18 12 15 13
1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986
10 17 8 18 25 16 15 12 16 27 13 16 11 9 17 16
1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003
24 21 31 36 26 30 26 24 22 30 29 26 47 52 53 28 9
Naturally these are only foreign-born individuals (mean = 1985) - however, 348 foreign-born subjects are missing a value. There are a total of 4670 NAs, which also include the Canada-born subjects.
How can I code these df$YR_IMM NAs in such a way that
348 (NA) --> 1985
4322(NA) --> 100
Additionally, the status is given by df$Brthcoun with 0 = "born in Canada" and 1 = "born outside of Canada".
Hope this makes sense - thank you!
EDIT: This was the solution ->
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 0] <- 100
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 1] <- 1985
Try the below code:
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 0] <- 100
df$YR_IMM[is.na(df$YR_IMM) & df$Brthcoun == 1] <- 1985
I hope this helps!
Something like this should also work:
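# keep existing years; only fill the NAs (100 for Canada-born, 1985 for foreign-born)
df$YR_IMM <- ifelse(is.na(df$YR_IMM), ifelse(df$Brthcoun == 0, 100, 1985), df$YR_IMM)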
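For anyone doing the same thing in Python, the masked assignment carries over to pandas almost one to one. This is only a sketch on a made-up four-row frame that reuses the column names from the question:

```python
import numpy as np
import pandas as pd

# Tiny made-up example: two known years, one missing foreign-born, one Canada-born
df = pd.DataFrame({
    "YR_IMM": [1990.0, np.nan, np.nan, 2001.0],
    "Brthcoun": [1, 1, 0, 1],
})

# Compute the NA mask once, before any value is filled in
missing = df["YR_IMM"].isna()

df.loc[missing & (df["Brthcoun"] == 0), "YR_IMM"] = 100   # born in Canada
df.loc[missing & (df["Brthcoun"] == 1), "YR_IMM"] = 1985  # born outside of Canada
print(df)
```

Rows that already have a year are never touched, because both assignments are restricted to the `missing` mask.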
I have two data frames. The first one looks like
Country Year production
Germany 1996 11
France 1996 12
Greece 1996 15
UK 1996 17
USA 1996 24
The second one contains all the countries that are in the first data frame plus a few more countries for year 2018. It looks like this
Country Year production
Germany 2018 27
France 2018 29
Greece 2018 44
UK 2018 46
USA 2018 99
Austria 2018 56
Japan 2018 66
I would like to merge the two data frames, and the final table should look like this:
Country Year production
Germany 1996 11
France 1996 12
Greece 1996 15
UK 1996 17
USA 1996 24
Austria 1996 NA
Japan 1996 NA
Germany 2018 27
France 2018 29
Greece 2018 44
UK 2018 46
USA 2018 99
Austria 2018 56
Japan 2018 66
I've tried several functions including full_join, merge, and rbind but they didn't work. Does anybody have any ideas?
With dplyr and tidyr, you may use:
library(dplyr)
library(tidyr)
bind_rows(df1, df2) %>%
  complete(Country, Year)
Country Year production
<chr> <int> <int>
1 Austria 1996 NA
2 Austria 2018 56
3 France 1996 12
4 France 2018 29
5 Germany 1996 11
6 Germany 2018 27
7 Greece 1996 15
8 Greece 2018 44
9 Japan 1996 NA
10 Japan 2018 66
11 UK 1996 17
12 UK 2018 46
13 USA 1996 24
14 USA 2018 99
Consider base R with expand.grid and merge (and avoid any dependencies should you be a package author):
# BUILD DF OF ALL POSSIBLE COMBINATIONS OF COUNTRY AND YEAR
all_country_years <- expand.grid(Country=unique(c(df_96$Country, df_18$Country)),
Year=c(1996, 2018))
# MERGE (LEFT JOIN)
final_df <- merge(all_country_years, rbind(df_96, df_18), by=c("Country", "Year"),
all.x=TRUE)
# ORDER DATA AND RESET ROW NAMES
final_df <- data.frame(with(final_df, final_df[order(Year, Country),]),
row.names = NULL)
final_df
# Country Year production
# 1 Germany 1996 11
# 2 France 1996 12
# 3 Greece 1996 15
# 4 UK 1996 17
# 5 USA 1996 24
# 6 Austria 1996 NA
# 7 Japan 1996 NA
# 8 Germany 2018 27
# 9 France 2018 29
# 10 Greece 2018 44
# 11 UK 2018 46
# 12 USA 2018 99
# 13 Austria 2018 56
# 14 Japan 2018 66
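The same "build every Country/Year combination, then left join" idea can be sketched in Python/pandas as well, in case that is useful for comparison. The frames are rebuilt from the question's data; reindexing on a `MultiIndex.from_product` is my substitute for `expand.grid` plus `merge`, not something taken from the answers above:

```python
import pandas as pd

df_96 = pd.DataFrame({
    "Country": ["Germany", "France", "Greece", "UK", "USA"],
    "Year": 1996,
    "production": [11, 12, 15, 17, 24],
})
df_18 = pd.DataFrame({
    "Country": ["Germany", "France", "Greece", "UK", "USA", "Austria", "Japan"],
    "Year": 2018,
    "production": [27, 29, 44, 46, 99, 56, 66],
})

combined = pd.concat([df_96, df_18], ignore_index=True)

# Every (Year, Country) pair, with countries in order of first appearance
full_index = pd.MultiIndex.from_product(
    [combined["Year"].unique(), combined["Country"].unique()],
    names=["Year", "Country"],
)

# Reindexing adds the missing pairs with NaN as production
final_df = (
    combined.set_index(["Year", "Country"])
            .reindex(full_index)
            .reset_index()[["Country", "Year", "production"]]
)
print(final_df)
```

Note that `production` becomes a float column here, since NaN cannot be stored in a plain integer column.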
I have managed to aggregate some data into the following:
Month Year Number
1 1 2011 3885
2 2 2011 3713
3 3 2011 6189
4 4 2011 3812
5 5 2011 916
6 6 2011 3813
7 7 2011 1324
8 8 2011 1905
9 9 2011 5078
10 10 2011 1587
11 11 2011 3739
12 12 2011 3560
13 1 2012 1790
14 2 2012 1489
15 3 2012 1907
16 4 2012 1615
I am trying to create a barplot where the bars for the months are next to each other, so for the above example January through April will have two bars (one for 2011 and one for 2012) and the remaining months will only have one bar representing 2011.
I know I have to use beside=T, but I guess I need to create some sort of matrix in order to get the barplot to display properly. I am having trouble figuring out what that step is. I have a feeling it may involve matrix, but for some reason I am completely stumped by what seems like a very simple problem.
Also, I have this data: y=c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec') which I would like to use in my names.arg. When I try to use it with the above data it tells me undefined columns selected which I am taking to mean that I need 16 variables in y. How can I fix this?
To use barplot you need to rearrange your data:
dat <- read.table(text = " Month Year Number
1 1 2011 3885
2 2 2011 3713
3 3 2011 6189
4 4 2011 3812
5 5 2011 916
6 6 2011 3813
7 7 2011 1324
8 8 2011 1905
9 9 2011 5078
10 10 2011 1587
11 11 2011 3739
12 12 2011 3560
13 1 2012 1790
14 2 2012 1489
15 3 2012 1907
16 4 2012 1615",sep = "",header = TRUE)
y <- c('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec')
barplot(rbind(dat$Number[1:12],c(dat$Number[13:16],rep(NA,8))),
beside = TRUE,names.arg = y)
Or you can use ggplot2 with the data pretty much as is:
library(ggplot2)
dat$Year <- factor(dat$Year)
dat$Month <- factor(dat$Month)
ggplot(dat,aes(x = Month,y = Number,fill = Year)) +
  geom_bar(stat = "identity", position = "dodge") +  # stat = "identity" is needed because y is supplied
  scale_x_discrete(labels = y)
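The "rearrange your data" step for `barplot` is really a long-to-wide reshape: one row per month, one column per year, with NA where a year has no value. As a rough illustration of that reshape outside R, here is a Python/pandas sketch on the same numbers (the use of `pivot` is my choice for the example, not part of the answer above):

```python
import pandas as pd

dat = pd.DataFrame({
    "Month": list(range(1, 13)) + [1, 2, 3, 4],
    "Year": [2011] * 12 + [2012] * 4,
    "Number": [3885, 3713, 6189, 3812, 916, 3813, 1324, 1905,
               5078, 1587, 3739, 3560, 1790, 1489, 1907, 1615],
})

# Long to wide: months become rows, years become columns, gaps become NaN
wide = dat.pivot(index="Month", columns="Year", values="Number")
print(wide)
```

The 12 x 2 table `wide` is the transpose of the 2 x 12 matrix that the `rbind(...)` call builds by hand; in pandas, `wide.plot(kind="bar")` would then draw the bars for each month side by side.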