I have daily temperature values for several years, 1949-2010. I would like to calculate monthly means. Here is an example of the data:
head(tmeasmax)
TIMESTEP MEAN.C. MINIMUM.C. MAXIMUM.C. VARIANCE.C.2. STD_DEV.C. SUM COUNT
1949-01-01 6.836547 6.65 7.33 0.02850574 0.1688364 1.426652 6
1949-01-02 10.533371 10.24 10.74 0.06140426 0.2477988 1.426652 6
1949-01-03 18.746729 18.02 19.78 0.18507860 0.4302076 1.426652 6
1949-01-04 21.244562 20.09 22.40 0.76106980 0.8723931 1.426652 6
1949-01-05 3.826716 3.11 5.37 0.52706647 0.7259935 1.426652 6
1949-01-06 9.127782 8.46 10.26 0.20236358 0.4498484 1.426652 6
str(tmeasmax)
'data.frame': 22645 obs. of 8 variables:
$ TIMESTEP : Date, format: "1949-01-01" "1949-01-02" ...
$ MEAN.C. : num 6.84 10.53 18.75 21.24 3.83 ...
$ MINIMUM.C. : num 6.65 10.24 18.02 20.09 3.11 ...
$ MAXIMUM.C. : num 7.33 10.74 19.78 22.4 5.37 ...
$ VARIANCE.C.2.: num 0.0285 0.0614 0.1851 0.7611 0.5271 ...
$ STD_DEV.C. : num 0.169 0.248 0.43 0.872 0.726 ...
$ SUM : num 1.43 1.43 1.43 1.43 1.43 ...
$ COUNT : int 6 6 6 6 6 6 6 6 6 6 ...
There is a previous question that I couldn't make heads or tails of. I imagine I can probably use aggregate, but I don't know how to break up the dates into the years and months and then approach the nesting of the months inside the years. I tried a loop inside of a loop, but I can never get nested loops to work.
EDIT to reply to comments/questions:
I was looking for the mean of "MEAN.C."
Here's a quick data.table solution. I'm assuming you want the means of MEAN.C. (?)
library(data.table)
setDT(tmeasmax)[, .(MonthlyMeans = mean(MEAN.C.)), by = .(year(TIMESTEP), month(TIMESTEP))]
# year month MonthlyMeans
# 1: 1949 1 11.71928
You can also do this for all the columns at once if you want
tmeasmax[, lapply(.SD, mean), by = .(year(TIMESTEP), month(TIMESTEP))]
# year month MEAN.C. MINIMUM.C. MAXIMUM.C. VARIANCE.C.2. STD_DEV.C. SUM COUNT
# 1: 1949 1 11.71928 11.095 12.64667 0.2942481 0.482513 1.426652 6
Here's a way to do it with the dplyr package:
library(dplyr)
library(lubridate)
tmeasmax$TIMESTEP = ymd(tmeasmax$TIMESTEP)
tmeasmax %>%
  group_by(Year = year(TIMESTEP), Month = month(TIMESTEP)) %>%
  summarise(meanDailyMin = mean(MINIMUM.C.),
            meanDailyMean = mean(MEAN.C.))
Year Month meanDailyMin meanDailyMean
1 1949 1 11.095 11.71928
You can summarise any other column by month in a similar way.
You can use the lubridate package to create a new factor variable consisting of the year-month combinations, then use aggregate.
library('lubridate')
tmeasmax2 <- within(tmeasmax, {
monthlies <- paste(year(TIMESTEP),
month(TIMESTEP))
})
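# monthlies lives inside tmeasmax2, so refer to it with $; keep only the
# numeric columns, because mean() is not meaningful for the character key
aggregate(tmeasmax2[sapply(tmeasmax2, is.numeric)],
          list(monthlies = tmeasmax2$monthlies), mean, na.rm = TRUE)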
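For comparison, the same grouping idea can be sketched in base R alone, without lubridate. This is only a minimal sketch: the small tmeasmax below is synthetic stand-in data, and the yearmon column name is made up for illustration. A "YYYY-MM" key built with format() has the side benefit of sorting chronologically as plain text.

```r
# Synthetic stand-in for tmeasmax: daily values for Jan-Feb 1949 (assumed data).
set.seed(1)
tmeasmax <- data.frame(
  TIMESTEP = seq(as.Date("1949-01-01"), as.Date("1949-02-28"), by = "day"),
  MEAN.C.  = runif(59, 0, 25)
)

# Build a "YYYY-MM" grouping key from the Date column (hypothetical column name).
tmeasmax$yearmon <- format(tmeasmax$TIMESTEP, "%Y-%m")

# One row per year-month, holding the mean of MEAN.C. for that month.
monthly <- aggregate(MEAN.C. ~ yearmon, data = tmeasmax, FUN = mean)
print(monthly)
```

The formula interface drops rows with missing values by default, so no explicit na.rm is needed in this form.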
So this is I'm sure a fairly elementary problem. I have a data frame that has data for 10 years for a bunch of countries. It looks like this. The data frame is df.
X2003 X2004 X2005 X2006 X2007 X2008 X2009 X2010 X2011 X2012
Afghanistan 7.321 7.136 6.930 6.702 6.456 6.196 5.928 5.659 5.395 5.141
Albania 2.097 2.004 1.919 1.849 1.796 1.761 1.744 1.741 1.748 1.760
Algeria 2.412 2.448 2.507 2.580 2.656 2.725 2.781 2.817 2.829 2.820
Angola 6.743 6.704 6.657 6.598 6.523 6.434 6.331 6.218 6.099 5.979
Antigua and Barbuda 2.268 2.246 2.224 2.203 2.183 2.164 2.146 2.130 2.115 2.102
Argentina 2.340 2.310 2.286 2.268 2.254 2.241 2.228 2.215 2.201 2.188
The first column is metadata. It hasn't got a name. I'd like to use qplot to plot time series for each of the rows. Something like the following command:
library(ggplot2)
qplot(data = df, binwidth = 1, geom = "freqpoly")
but I get the following error:
Error: stat_bin requires the following missing aesthetics: x.
I would like to set x = first column but I don't have a name on that column. Do I have to create a first column of country names? If so, how do I do that?
Seems like there should be an easier way. Sorry if this is so elementary.
Not sure what you need, maybe something like this?
library(reshape2)
library(ggplot2)
df$metadata <- row.names(df)
df <- melt(df, "metadata")
ggplot(df, aes(variable, value, group = metadata, color = metadata)) +
geom_line()
Following your comments, I guess you want this kind of graphic?
# Create a "long" data frame rather than a "wide" data frame.
country <- rep(c("Afghanistan", "Albania", "Algeria","Angola",
"Antigua and Barbuda", "Argentina"),each = 10, times = 1)
year <- rep(c(2003:2012), each = 1, times = 6)
value <- runif(60, 0, 50)
foo <- data.frame(country,year,value,stringsAsFactors=F)
foo$year <- as.factor(foo$year)
# Draw a ggplot figure
ggplot(foo, aes(x=year, y = value,group = country, color = country)) +
geom_line() +
geom_point()
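To make the wide-to-long step concrete, here is a rough base R sketch of what the reshaping does, using a three-country, three-year excerpt of the question's data. The long data frame and its column names are illustrative assumptions, not part of the original answers. It also converts the X2003-style column names to integer years so the x axis is numeric.

```r
# Excerpt of the question's data: country names are row names, one column per year.
df <- data.frame(
  X2003 = c(7.321, 2.097, 2.412),
  X2004 = c(7.136, 2.004, 2.448),
  X2005 = c(6.930, 1.919, 2.507),
  row.names = c("Afghanistan", "Albania", "Algeria")
)

# unlist() walks the data frame column by column, so each year's block of
# values lines up with the country names repeated once per column.
long <- data.frame(
  country = rep(rownames(df), times = ncol(df)),
  year    = rep(as.integer(sub("^X", "", names(df))), each = nrow(df)),
  value   = unlist(df, use.names = FALSE)
)
print(long)

# Build the plot only if ggplot2 is installed; print(p) would draw it.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(long, aes(x = year, y = value, colour = country)) +
    geom_line()
}
```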
Hi. Here is a very similar solution to what Charles correctly suggested using melt. I've used the package ggvis to produce the plot and made sure the scale of the y-axis is fixed at 0. The block of code below assumes that df is already read into R.
R Code:
library(reshape2)
library(ggvis)
str(df) # just to demonstrate the initial structure of df; results in the comment block below
# 'data.frame': 6 obs. of 11 variables:
# $ Country: chr "Afghanistan" "Albania" "Algeria" "Angola" ...
# $ X2003 : num 7.32 2.1 2.41 6.74 2.27 ...
# $ X2004 : num 7.14 2 2.45 6.7 2.25 ...
# $ X2005 : num 6.93 1.92 2.51 6.66 2.22 ...
# $ X2006 : num 6.7 1.85 2.58 6.6 2.2 ...
# $ X2007 : num 6.46 1.8 2.66 6.52 2.18 ...
# $ X2008 : num 6.2 1.76 2.73 6.43 2.16 ...
# $ X2009 : num 5.93 1.74 2.78 6.33 2.15 ...
# $ X2010 : num 5.66 1.74 2.82 6.22 2.13 ...
# $ X2011 : num 5.39 1.75 2.83 6.1 2.12 ...
# $ X2012 : num 5.14 1.76 2.82 5.98 2.1 ...
df1 <- melt(df, "Country")
df1 %>% ggvis(~factor(variable),~value,stroke=~Country) %>% layer_lines(strokeWidth:=2.5) %>%
add_axis("x",title="Year") %>% scale_numeric("y",zero=TRUE)
I never really started using ggplot, but when I saw ggvis and especially its use of %>% pipe operator introduced in the magrittr package I was hooked. Best....
I got a report from a large function in R. It prints a table for plotting. I copied the text format output like this
cyl wt mpg se LLCI ULCI
4.0000 2.1568 27.3998 0.7199 25.9250 28.8746
6.0000 2.1568 23.2805 0.8261 21.5882 24.9727
8.0000 2.1568 19.1611 1.5544 15.9770 22.3452
4.0000 3.3250 21.0658 1.2733 18.4575 23.6742
6.0000 3.3250 18.8352 0.6544 17.4947 20.1758
8.0000 3.3250 16.6046 0.7792 15.0084 18.2008
4.0000 3.8436 18.2540 1.8031 14.5604 21.9476
6.0000 3.8436 16.8619 0.8921 15.0345 18.6892
8.0000 3.8436 15.4697 0.6120 14.2161 16.7234
and pasted it using x <- readClipboard(). Then I ran summary(x) and got
Length Class Mode
10 character character
How can I change x into a numeric table with headings for plotting? Thanks!
If we need to convert to a data.frame, one option is soread() from the overflow package, called after copying the text (Ctrl + C)
library(overflow)
df1 <- soread()
The output is a data.frame which can be used for plotting
str(df1)
#'data.frame': 9 obs. of 6 variables:
# $ cyl : num 4 6 8 4 6 8 4 6 8
# $ wt : num 2.16 2.16 2.16 3.33 3.33 ...
# $ mpg : num 27.4 23.3 19.2 21.1 18.8 ...
# $ se : num 0.72 0.826 1.554 1.273 0.654 ...
# $ LLCI: num 25.9 21.6 16 18.5 17.5 ...
# $ ULCI: num 28.9 25 22.3 23.7 20.2 ...
Once we have the data.frame, it can be used with ggplot after reshaping into 'long' format
library(ggplot2)
library(dplyr)
library(tidyr)
df1 %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn) %>%
ggplot(aes(x = rn, y = value, color = name)) +
geom_line() +
theme_bw()
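If installing an extra package is not an option, a base R sketch of the same conversion is possible with read.table() and its text argument. The vector x below is typed in by hand to mimic what readClipboard() returned in the question (one string per line, shortened to three data rows); with the real clipboard contents, only the read.table() call would be needed.

```r
# Stand-in for the character vector returned by readClipboard():
# one element per copied line, header first.
x <- c(
  "cyl wt mpg se LLCI ULCI",
  "4.0000 2.1568 27.3998 0.7199 25.9250 28.8746",
  "6.0000 2.1568 23.2805 0.8261 21.5882 24.9727",
  "8.0000 2.1568 19.1611 1.5544 15.9770 22.3452"
)

# Parse the lines as whitespace-separated columns, using the first line as names.
df1 <- read.table(text = x, header = TRUE)
str(df1)
```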
I am simply looking to standardise my set of data frame variables to a 100 point scale. The original variables were on a 10 point scale with 4 decimal points.
I can see that my error is not unheard of, e.g.:
Why am I getting a function error in seemingly similar R code?
Error: only defined on a data frame with all numeric variables with ddply on large dataset
but I have verified that all variables are numeric using
library(foreign)
library(scales)
ches <- read.csv("chesshort15.csv", header = TRUE)
ches2 <- ches[1:244, 3:10]
rescale(ches2, to = c(0,100), from = range(ches2, na.rm = TRUE, finite = TRUE))
This gives the error: Error in FUN(X[[i]], ...) :
only defined on a data frame with all numeric variables
I have verified that all variables are of type numeric using str(ches2) - see below:
'data.frame': 244 obs. of 8 variables:
$ galtan : num 8.8 9 9.65 8.62 8 ...
$ civlib_laworder : num 8.5 8.6 9.56 8.79 8.56 ...
$ sociallifestyle : num 8.89 7.2 9.65 9.21 8.25 ...
$ immigrate_policy : num 9.89 9.6 9.38 9.43 9.13 ...
$ multiculturalism : num 9.9 9.6 9.57 8.77 9.07 ...
$ ethnic_minorities : num 8.8 9.6 9.87 9 8.93 ...
$ nationalism : num 9.4 10 9.82 9 8.81 ...
$ antielite_salience: num 8 9 9.47 8.88 8.38
In short, I'm stumped as to why it refuses to carry out the code.
For info, head(bb) gives:
galtan civlib_laworder sociallifestyle immigrate_policy multiculturalism ethnic_minorities
1 8.800 8.500 8.889 9.889 9.900 8.800
2 9.000 8.600 7.200 9.600 9.600 9.600
3 9.647 9.563 9.647 9.375 9.571 9.867
4 8.625 8.786 9.214 9.429 8.769 9.000
5 8.000 8.563 8.250 9.133 9.071 8.929
6 7.455 8.357 7.923 8.800 7.800 8.455
nationalism antielite_salience
1 9.400 8.000
2 10.000 9.000
3 9.824 9.471
4 9.000 8.882
5 8.813 8.375
6 8.000 8.824
The rescale function is throwing that error because it expects a numeric vector, and you are feeding it a dataframe instead. You need to iterate; go through every column on your dataframe and scale them individually.
Try this:
sapply(ches2, rescale, to = c(0,100))
You don't need the range(ches2, na.rm = TRUE, finite = TRUE) portion of your code, because by default rescale already ignores NA and non-finite values when it works out the range of each column.
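One thing to be aware of: rescaling each column by its own range stretches every column's observed minimum to 0 and its maximum to 100. If the goal is instead to map the original 10-point scale straight onto a 100-point scale, a fixed input range is needed. The sketch below assumes the original scale runs from 0 to 10, and rescale_fixed is a hypothetical base R helper written for illustration; passing from = c(0, 10) to rescale should have the same effect. The small ches2 here is made-up data with one NA added.

```r
# Made-up excerpt standing in for ches2 (the NA is added for illustration).
ches2 <- data.frame(
  galtan          = c(8.800, 9.000, 9.647, NA),
  civlib_laworder = c(8.500, 8.600, 9.563, 8.786)
)

# Hypothetical helper: linear map from a fixed input range to a target range.
# The 0-10 input range is an assumption about the original scale.
rescale_fixed <- function(x, from = c(0, 10), to = c(0, 100)) {
  (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
}

# lapply() + as.data.frame() keeps a data frame; sapply() would return a matrix.
ches100 <- as.data.frame(lapply(ches2, rescale_fixed))
print(ches100)
```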
I'm using continuous Morlet wavelet transform (cwt) analysis over a time series by the use of the R-package dplR. The time series corresponds to a 15min data (gam_15min) with length 7968 (corresponding to 83 days of measurements).
I have the following output:
cwtGamma=morlet(gam_15min,x1=seq_along(gam_15min),p2=NULL,dj=0.1,siglvl=0.95)
str(cwtGamma)
List of 9
$ y : Time-Series [1:7968] from 1 to 1993: 672 674 673 672 672 ...
$ x : int [1:7968] 1 2 3 4 5 6 7 8 9 10 ...
$ wave : cplx [1:7968, 1:130] -0.00332+0.0008i 0.00281-0.00181i -0.00194+0.00234i ...
$ coi : num [1:7968] 0.73 1.46 2.19 2.92 3.65 ...
$ period: num [1:130] 1.03 1.11 1.19 1.27 1.36 ...
$ Scale : num [1:130] 1 1.07 1.15 1.23 1.32 ...
$ Signif: num [1:130] 0.000382 0.001418 0.005197 0.018514 0.062909 ...
$ Power : num [1:7968, 1:130] 1.17e-05 1.11e-05 9.26e-06 7.09e-06 5.54e-06 ...
$ siglvl: num 0.95
In my analysis I want to truncate the time series (I suppose $wave) by removing 1 period length at the beginning and 1 period length at the end. How do I do that? Maybe it's easy, but I'm not seeing how... Thanks
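One possible reading of this, sketched under stated assumptions: the time dimension is the first dimension of $wave and $Power and the full length of $y, $x and $coi, so truncating means dropping the same rows from each of them. The list below is a small synthetic stand-in with the same layout as the str() output, not real dplR output, and n_cut (the number of time steps to drop at each end) is a hypothetical value; for 15-minute data and a one-day period it would be 96.

```r
set.seed(42)
n  <- 20  # number of time steps (7968 in the question)
np <- 5   # number of scales/periods (130 in the question)

# Synthetic stand-in mimicking the layout of the morlet() result.
cwt <- list(
  y      = rnorm(n),
  x      = seq_len(n),
  wave   = matrix(complex(real = rnorm(n * np), imaginary = rnorm(n * np)), n, np),
  coi    = runif(n),
  period = seq_len(np),
  Power  = matrix(runif(n * np), n, np)
)

n_cut <- 3                       # hypothetical: time steps to drop at each end
keep  <- (n_cut + 1):(n - n_cut) # indices of the time steps that remain

# Trim every time-indexed component; per-scale components stay unchanged.
trimmed <- cwt
trimmed$y     <- cwt$y[keep]     # note: [ on a ts object drops its time attributes
trimmed$x     <- cwt$x[keep]
trimmed$coi   <- cwt$coi[keep]
trimmed$wave  <- cwt$wave[keep, , drop = FALSE]
trimmed$Power <- cwt$Power[keep, , drop = FALSE]

str(trimmed)
```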