I am trying to run a discriminant analysis to identify the variables that separate different populations and seasons. I have six estimated continuous variables that I am using to identify the separation between these populations and seasons.
My factor variables are season (seas) and SITE. My continuous variables are calcNDVI, meanNDVI, maxNDVI, minNDVI, cvNDVI, diffNDVIvals.
head(df)
X x y date dx dy dist dt R2n abs.angle
3 6677 15.380 52.210 2010-08-12 1.960 -5.900 6.2170411 86400 16.95890 -1.250063
4 6678 17.340 46.310 2010-08-13 -3.300 -0.900 3.4205263 86400 105.41690 -2.875341
5 6679 14.040 45.410 2010-08-14 -1.980 -0.055 1.9807637 86400 106.77890 -3.113822
6 6680 12.060 45.355 2010-08-15 -0.495 0.675 0.8370484 86400 108.54852 2.203545
7 6681 11.565 46.030 2010-08-16 -0.360 0.105 0.3750000 86400 96.40842 2.857799
8 6682 11.205 46.135 2010-08-17 -0.245 -0.485 0.5433691 86400 95.70065 -2.038559
rel.angle id burst SITE COUNTRY year month newDate
3 -0.02783079 21333_A31271 21333_A31271 SOUTH.SWEDEN SWEDEN 2010 8 X2010.08.12
4 -1.62527754 21333_A31271 21333_A31271 SOUTH.SWEDEN SWEDEN 2010 8 X2010.08.13
5 -0.23848141 21333_A31271 21333_A31271 SOUTH.SWEDEN SWEDEN 2010 8 X2010.08.14
6 -0.96581813 21333_A31271 21333_A31271 SOUTH.SWEDEN SWEDEN 2010 8 X2010.08.15
7 0.65425338 21333_A31271 21333_A31271 SOUTH.SWEDEN SWEDEN 2010 8 X2010.08.16
8 1.38682762 21333_A31271 21333_A31271 SOUTH.SWEDEN SWEDEN 2010 8 X2010.08.17
calcNDVI meanNDVI maxNDVI minNDVI cvNDVI diffNDVIvals yDay seas
3 7542.487 6296.268 8399 978 20.82924 7421 224 Aug-Sep
4 5018.169 5906.929 7908 3181 22.97476 4727 225 Aug-Sep
5 7513.909 6390.036 8172 3803 22.54474 4369 226 Aug-Sep
6 5763.429 4564.911 7120 2456 25.60007 4664 227 Aug-Sep
7 6161.736 6115.429 8052 1217 25.97495 6835 228 Aug-Sep
8 7995.656 6207.036 7852 2191 20.11494 5661 229 Aug-Sep
As far as I know my variables are in the correct format, i.e. numeric and factor.
Now when I run a DA using the ade4 package, I get an error that I am not sure how to interpret:
df.pca = dudi.pca(df[, 19:24], scannf = F)
df.dis = discrimin(df.pca, interaction(df$SITE, df$seas), scannf = F)
Error in if (any(row.w < 0)) stop("row weight < 0") :
missing value where TRUE/FALSE needed
At first I thought it was because of NAs, but it's not.
Any thoughts?
I replicated the error with mtcars, since you didn't provide dput() output and pasting from the clipboard didn't work:
> df = mtcars
> df.pca = dudi.pca(df, scannf = F)
> df.disc = discrimin(dudi = df.pca, interaction(df$carb, df$cyl), scan = F)
Gives:
Error in if (any(row.w < 0)) stop("row weight < 0") :
missing value where TRUE/FALSE needed
However, a little tweak fixed the problem: I just specified the fac argument and wrapped the grouping variable in factor(), even though str(interaction(df$carb, df$cyl)) already returns a factor.
df.disc = discrimin(dudi = df.pca, fac = factor(interaction(df$carb, df$cyl)), scan = F)
This returns no errors.
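A likely cause (my inference from how interaction() builds its levels, not something the ade4 documentation states outright): interaction() keeps every carb × cyl combination as a level by default, including combinations that never occur in the data, and those empty classes appear to produce the NA that breaks the row-weight check; factor() on an existing factor drops unused levels, which is why the tweak works. droplevels() or interaction(..., drop = TRUE) should work equally well:

```r
library(ade4)

df <- mtcars
df.pca <- dudi.pca(df, scannf = FALSE)

## interaction() keeps all 6 x 3 = 18 carb/cyl combinations by default,
## although mtcars only realises 9 of them
grp <- interaction(df$carb, df$cyl)
nlevels(grp)             # 18
nlevels(droplevels(grp)) # 9

## dropping the empty levels up front avoids the error
df.disc <- discrimin(dudi = df.pca,
                     fac = interaction(df$carb, df$cyl, drop = TRUE),
                     scannf = FALSE)
```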
Related
I'm having a lot of trouble plotting my time series data in RStudio. My data is laid out as follows:
tsf
Time Series:
Start = 1995
End = 2021
Frequency = 1
Jan Feb Mar Apr May Jun July Aug Sep Oct Nov Dec
1995 10817 8916 9697 10314 9775 7125 9007 6000 4155 3692 2236 996
1996 12773 12562 13479 14280 13839 9168 10959 6582 5162 4815 3768 1946
1997 14691 12982 13545 14131 14162 10415 11420 7870 6340 6869 6777 6637
1998 17192 15480 14703 16903 15921 13381 13779 9127 6676 6511 5419 3447
1999 13578 19470 23411 18190 18979 17296 16588 12561 10405 8537 7304 4003
2000 20100 29419 30125 27147 27832 23874 19728 15847 11477 9301 6933 3486
2001 16528 22258 22146 19027 19436 15688 14558 10609 6799 6563 4816 2480
2002 14724 19424 21391 17215 18775 13017 14385 10044 7649 6598 4497 2766
2003 17051 20182 18564 18484 15365 12180 13313 8859 6830 6371 3781 2012
2004 16875 20084 21150 19057 16153 13619 14144 9599 7390 5830 3763 2033
2005 20002 24153 23160 20864 18331 14950 14149 11086 7475 6290 3779 2134
2006 24605 26384 24858 20634 18951 15048 14905 10749 7259 5479 3074 1509
2007 29281 26495 25974 21427 20232 15465 15738 10006 6674 5301 2857 1304
2008 32961 24290 20190 17587 12172 7369 16175 6822 4364 2699 1174 667
2009 10996 8793 7345 5558 4840 4833 4355 2422 2272 1596 948 474
2010 10469 11707 12379 9599 8893 8314 7018 5310 4683 3742 2146 647
2011 13624 13470 12390 11171 9359 9240 6953 3653 2861 2216 1398 597
2012 14507 10993 10581 9388 7986 5481 6164 3736 2783 2442 1421 774
2013 10735 9671 10596 8113 7095 3293 9306 4504 3257 2832 1307 639
2014 15975 11906 11485 11757 7767 3390 14037 6201 4376 3082 1465 920
2015 20105 15384 17054 13166 9027 3924 21290 8572 5924 3943 1874 847
2016 27106 21173 20096 14847 10125 4143 22462 9781 5842 3831 1846 679
2017 26668 16905 17180 13427 9581 3585 21316 8105 4828 3255 1594 601
2018 25813 16501 16088 11557 9362 3716 20743 7681 4397 2874 1647 778
2019 22279 14178 14404 13794 9126 3858 18741 7202 4104 3214 1676 729
2020 20665 13263 10239 1338 1490 2189 15329 7360 5747 4189 1468 1032
2021 16948 11672 10672 8214 7337 4980 20232 8563 6354 3882 2167 832
When I attempt rudimentary code to plot the data I get the following
plot(tsf)
'Error in plotts(x = x, y = y, plot.type = plot.type, xy.labels = xy.labels, :
cannot plot more than 10 series as "multiple"'
My data is monthly, so its 12 months exceed this apparent limit of 10 series. I've been able to make a plot by excluding two months, but that is not practical for me.
I've looked at lots of answers on this, many of which recommend ggplot() from {ggplot2}.
The link below had data most closely resembling my data but I still wasn't able to apply it.
issues plotting multivariate time series in R
Any help greatly appreciated.
I think the problem is with the shape of your data. Frequency = 1 indicates that R treats the monthly columns as twelve separate yearly time series rather than one continuous series across months. To plot the whole time span, reshape your time series to monthly frequency:
tsf_switched <- ts(as.vector(t(tsf)), start = 1995, frequency = 12)
plot(tsf_switched)
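A quick self-contained check of the reshape (simulated counts standing in for tsf, which I don't have as dput() output):

```r
set.seed(1)
## stand-in for tsf: 27 yearly rows x 12 monthly columns
m   <- matrix(rpois(27 * 12, 10000), nrow = 27)
tsf <- ts(m, start = 1995, frequency = 1)   # prints as 12 separate yearly series

## flatten row-wise into one continuous monthly series
tsf_switched <- ts(as.vector(t(tsf)), start = 1995, frequency = 12)
frequency(tsf_switched) # 12
length(tsf_switched)    # 324 = 27 years x 12 months
plot(tsf_switched)      # one panel, so no "more than 10 series" error
```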
One solution with {ggplot2} and two convenience libraries:
library(ggplot2)
library(dplyr)
library(tsbox)     ## for convenient ts -> data frame conversion
library(lubridate) ## time and date manipulation
## example monthly data for twelve years:
example_ts <- ts(runif(144), start = 2010, frequency = 12)
ts_data.frame(example_ts) %>% ## {tsbox}
  mutate(year = year(time),
         month = month(time)) %>%
  ggplot() +
  geom_line(aes(month, value, group = year))
Ways to convert time series to data frames (required as ggplot input): Transforming a time-series into a data frame and back
I have a dataframe with 18 columns, and I want to see the seasonally adjusted state of each variable on a single chart.
Here is the head of my dataframe:
head(cityFootfall)
Istanbul Eskisehir Mersin
1 44280 12452 11024
2 58713 13032 12773
3 21235 5629 5749
4 20934 5968 5764
5 21667 6022 5752
6 21386 6281 5920
Ankara Bursa Adana Izmir
1 19073 5098 8256 15623
2 22812 7551 10631 18511
3 8777 2260 3733 8625
4 8798 2252 3536 8573
5 8893 2398 3641 9713
6 8765 2391 3618 10542
Kayseri Antalya Konya
1 8450 2969 4492
2 8378 4421 0
3 3491 1744 0
4 3414 1833 0
5 3596 1733 0
6 3481 1785 1154
Samsun Kahramanmaras Aydin
1 4472 4382 4376
2 4996 4773 5561
3 1662 1865 2012
4 1775 1710 1957
5 1700 1704 1940
6 1876 1848 1437
Gaziantep Sanliurfa Izmit
1 3951 3752 3825
2 5412 4707 4125
3 2021 1326 1890
4 1960 1411 1918
5 1737 1204 1960
6 1833 1143 2047
Denizli Malatya
1 2742 3809
2 3658 4346
3 1227 1975
4 1172 1884
5 1102 2073
6 1171 2060
Here is my function for this:
plot_seasonality = function(x){
  par(mfrow = c(6, 3))
  plot_draw = lapply(x, function(col) {
    dec = decompose(ts(col, freq = 7), type = "additive")
    plot(dec$x - dec$seasonal)
  })
}
plot_seasonality(cityFootfall)
When I run this function I get the error Error in plot.new() : figure margins too large, but when I change par(mfrow=c(6,3)) to par(mfrow=c(3,3)) it works and gives me the last nine columns' plots, like this image. However, I want to see all the variables in a single chart.
Could anyone help me solve this problem?
Fundamentally, your plotting window is not big enough to plot that:
1) open a big window with dev.new() (or X11() under Linux/RStudio, or quartz() under MacOSX);
2) simplify your ylab, which will free space.
# made-up data
x <- seq(0, 14, length.out = 14 * 10)
y <- matrix(rnorm(14 * 10 * 6 * 3), nrow = 3 * 6)
# large window (may use `X11()` on linux/RStudio to force opening of a new window)
dev.new(width = 20, height = 15)
par(mfrow = c(6, 3))
# I know you could do that with `lapply`, but don't listen to the fatwas
# against `for` loops -- they often do exactly the job you need in R
for (i in 1:nrow(y)) {
  plot(x, y[i, ], xlab = "Time", ylab = paste("variable", i), type = "l")
}
You should also consider plotting several variables in the same graph (using lines() after an initial plot()).
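For the single-chart goal specifically, base R's matplot() draws every column of a matrix in one panel; a sketch on simulated data (the 18 columns and freq = 7 are assumptions matching the question):

```r
set.seed(42)
## simulated stand-in for cityFootfall: 18 columns of daily counts
cityFootfall <- as.data.frame(replicate(18, rpois(140, 5000)))

## seasonally adjust each column, then draw all of them in one panel
adjusted <- sapply(cityFootfall, function(col) {
  dec <- decompose(ts(col, frequency = 7), type = "additive")
  as.numeric(dec$x - dec$seasonal)
})
matplot(adjusted, type = "l", lty = 1,
        xlab = "Time", ylab = "Seasonally adjusted footfall")
```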
As suggested: transform the data into long format with package tidyr; see the function gather().
I added a time variable since it was missing.
library(dplyr)
library(tidyr)
temp <- cityFootfall %>% mutate(time = row_number()) %>% gather(variable, key, -time)
Now plot it with ggplot2 (default settings; you can adjust this as you like):
library(ggplot2)
ggplot(temp, aes(x = time, y = key, group = variable, color = variable)) + geom_point() + geom_line()
Please read my question carefully before marking it as a duplicate!
I am new to R and I am trying to figure out how to calculate the sequential date difference from one row to the next, in weeks, and to create another column for it so I can make a graph accordingly.
There are a couple of answers here (Q1, Q2, Q3), but none of them specifically talks about taking differences within one column sequentially between rows, say from top to bottom.
Below is the example and the expected results:
Date Var1
2/6/2017 493
2/20/2017 558
3/6/2017 595
3/20/2017 636
4/6/2017 697
4/20/2017 566
5/5/2017 234
Expected
Date Var1 week
2/6/2017 493 0
2/20/2017 558 2
3/6/2017 595 4
3/20/2017 636 6
4/6/2017 697 8
4/20/2017 566 10
5/6/2017 234 12
You can use a similar approach to the one in your first linked answer by saving the difftime() result as a new column in your data frame.
# Set up data
df <- read.table(text = "Date Var1
2/6/2017 493
2/20/2017 558
3/6/2017 595
3/20/2017 636
4/6/2017 697
4/20/2017 566
5/5/2017 234", header = T)
df$Date <- as.Date(as.character(df$Date), format = "%m/%d/%Y")
# Create exact week variable (base R subsetting; dplyr's first() would also work)
df$week <- difftime(df$Date, df$Date[1], units = "weeks")
# Create rounded week variable
df$week2 <- floor(difftime(df$Date, df$Date[1], units = "weeks"))
df
# Date Var1 week week2
# 2017-02-06 493 0.000000 weeks 0 weeks
# 2017-02-20 558 2.000000 weeks 2 weeks
# 2017-03-06 595 4.000000 weeks 4 weeks
# 2017-03-20 636 6.000000 weeks 6 weeks
# 2017-04-06 697 8.428571 weeks 8 weeks
# 2017-04-20 566 10.428571 weeks 10 weeks
# 2017-05-05 234 12.571429 weeks 12 weeks
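One small follow-up: difftime columns carry a "weeks" unit label when printed; if that gets in the way of plotting or further arithmetic, as.numeric() strips it. A minimal sketch:

```r
dates <- as.Date(c("2017-02-06", "2017-02-20", "2017-04-06"))
wk <- as.numeric(difftime(dates, dates[1], units = "weeks"))
wk        # 0.000000 2.000000 8.428571
floor(wk) # 0 2 8
```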
This question already has answers here:
Aggregate Daily Data to Month/Year intervals
(9 answers)
Closed 7 years ago.
I have day-wise data on interest rates for 15 years, from 01-01-2000 to 01-01-2015.
I want to convert these daily data to monthly data, keeping only month and year:
take the mean of the values of all the days in a month and make it the single value for that month.
How can I do this in R?
> str(mibid)
'data.frame': 4263 obs. of 6 variables:
$ Days: int 1 2 3 4 5 6 7 8 9 10 ...
$ Date: Date, format: "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" ...
$ BID : num 8.82 8.82 8.88 8.79 8.78 8.8 8.81 8.82 8.86 8.78 ...
$ I.S : num 0.092 0.0819 0.0779 0.0801 0.074 0.0766 0.0628 0.0887 0.0759 0.073 ...
$ BOR : num 9.46 9.5 9.52 9.36 9.33 9.37 9.42 9.39 9.4 9.33 ...
$ R.S : num 0.0822 0.0817 0.0828 0.0732 0.084 0.0919 0.0757 0.0725 0.0719 0.0564 ...
> head(mibid)
Days Date BID I.S BOR R.S
1 1 2000-01-03 8.82 0.0920 9.46 0.0822
2 2 2000-01-04 8.82 0.0819 9.50 0.0817
3 3 2000-01-05 8.88 0.0779 9.52 0.0828
4 4 2000-01-06 8.79 0.0801 9.36 0.0732
5 5 2000-01-07 8.78 0.0740 9.33 0.0840
6 6 2000-01-08 8.80 0.0766 9.37 0.0919
>
I'd do this with xts:
set.seed(21)
mibid <- data.frame(Date=Sys.Date()-100:1,
BID=rnorm(100, 8, 0.1), I.S=rnorm(100, 0.08, 0.01),
BOR=rnorm(100, 9, 0.1), R.S=rnorm(100, 0.08, 0.01))
require(xts)
# convert to xts
xmibid <- xts(mibid[,-1], mibid[,1])
# aggregate
agg_xmibid <- apply.monthly(xmibid, colMeans)
# convert back to data.frame
agg_mibid <- data.frame(Date=index(agg_xmibid), agg_xmibid, row.names=NULL)
head(agg_mibid)
# Date BID I.S BOR R.S
# 1 2015-04-30 8.079301 0.07189111 9.074807 0.06819096
# 2 2015-05-31 7.987479 0.07888328 8.999055 0.08090253
# 3 2015-06-30 8.043845 0.07885779 9.018338 0.07847999
# 4 2015-07-31 7.990822 0.07799489 8.980492 0.08162038
# 5 2015-08-07 8.000414 0.08535749 9.044867 0.07755017
A small example of how this might be done using dplyr and lubridate:
set.seed(321)
dat <- data.frame(day=seq.Date(as.Date("2010-01-01"), length.out=200, by="day"),
x = rnorm(200),
y = rexp(200))
head(dat)
day x y
1 2010-01-01 1.7049032 2.6286754
2 2010-01-02 -0.7120386 0.3916089
3 2010-01-03 -0.2779849 0.1815379
4 2010-01-04 -0.1196490 0.1234461
5 2010-01-05 -0.1239606 2.2237404
6 2010-01-06 0.2681838 0.3217511
require(dplyr)
require(lubridate)
dat %>%
mutate(year = year(day),
monthnum = month(day),
month = month(day, label=T)) %>%
group_by(year, month) %>%
arrange(year, monthnum) %>%
select(-monthnum) %>%
summarise(x = mean(x),
y = mean(y))
Source: local data frame [7 x 4]
Groups: year
year month x y
1 2010 Jan 0.02958633 0.9387509
2 2010 Feb 0.07711820 1.0985411
3 2010 Mar -0.06429982 1.2395438
4 2010 Apr -0.01787658 1.3627864
5 2010 May 0.19131861 1.1802712
6 2010 Jun -0.04894075 0.8224855
7 2010 Jul -0.22410057 1.1749863
Another option is using data.table which has several very convenient datetime functions. Using the data of #SamThomas:
library(data.table)
setDT(dat)[, lapply(.SD, mean), by=.(year(day), month(day))]
this gives:
year month x y
1: 2010 1 0.02958633 0.9387509
2: 2010 2 0.07711820 1.0985411
3: 2010 3 -0.06429982 1.2395438
4: 2010 4 -0.01787658 1.3627864
5: 2010 5 0.19131861 1.1802712
6: 2010 6 -0.04894075 0.8224855
7: 2010 7 -0.22410057 1.1749863
On the data of #JoshuaUlrich:
setDT(mibid)[, lapply(.SD, mean), by=.(year(Date), month(Date))]
gives:
year month BID I.S BOR R.S
1: 2015 5 7.997178 0.07794925 8.999625 0.08062426
2: 2015 6 8.034805 0.07940600 9.019823 0.07823314
3: 2015 7 7.989371 0.07822263 8.996015 0.08195401
4: 2015 8 8.010541 0.08364351 8.982793 0.07748399
If you want the names of the months instead of numbers, you will have to include [, Date:=as.IDate(Date)] after the setDT() part and use months() instead of month():
setDT(mibid)[, Date:=as.IDate(Date)][, lapply(.SD, mean), by=.(year(Date), months(Date))]
Note: especially on larger datasets, data.table will probably be (a lot) faster than the other two solutions.
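For completeness, the same monthly means can also be computed in base R with aggregate() and a "YYYY-MM" grouping key (no extra packages; the column layout is assumed to match @JoshuaUlrich's simulated data):

```r
set.seed(21)
mibid <- data.frame(Date = Sys.Date() - 100:1,
                    BID = rnorm(100, 8, 0.1), I.S = rnorm(100, 0.08, 0.01),
                    BOR = rnorm(100, 9, 0.1), R.S = rnorm(100, 0.08, 0.01))

## mean of every numeric column, grouped by year-month
aggregate(mibid[, -1], by = list(month = format(mibid$Date, "%Y-%m")), FUN = mean)
```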
This is probably a very simple question that has been asked already, but...
I have a data frame constructed from a CSV file generated in Excel. The observations are not homogeneously sampled, i.e. they are for "On Peak" times of electricity usage, which means they exclude different days each year. I have 20 years of data (1993-2012) and am running both non-robust and robust LOESS to extract seasonal and linear trends.
After the decomposition has been done, I want to focus only on the observations from June through September.
How can I create a new data frame of just those results?
Sorry about the formatting, too.
Date MaxLoad TMAX
1 1993-01-02 2321 118.6667
2 1993-01-04 2692 148.0000
3 1993-01-05 2539 176.0000
4 1993-01-06 2545 172.3333
5 1993-01-07 2517 177.6667
6 1993-01-08 2438 157.3333
7 1993-01-09 2302 152.0000
8 1993-01-11 2553 144.3333
9 1993-01-12 2666 146.3333
10 1993-01-13 2472 177.6667
As Joran notes, you don't need anything other than base R:
## Reproducible data
df <-
data.frame(Date = seq(as.Date("2009-03-15"), as.Date("2011-03-15"), by="month"),
MaxLoad = floor(runif(25,2000,3000)), TMAX=runif(25,100,200))
## One option
df[months(df$Date) %in% month.name[6:9],]
# Date MaxLoad TMAX
# 4 2009-06-15 2160 188.4607
# 5 2009-07-15 2151 164.3946
# 6 2009-08-15 2694 110.4399
# 7 2009-09-15 2460 150.4076
# 16 2010-06-15 2638 178.8341
# 17 2010-07-15 2246 131.3283
# 18 2010-08-15 2483 112.2635
# 19 2010-09-15 2174 160.9724
## Another option: strftime() will be more _generally_ useful than months()
df[as.numeric(strftime(df$Date, "%m")) %in% 6:9,]
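A side note on why strftime() travels better (my addition, not part of the original answer): months() returns locale-dependent month names, so matching against the English month.name can silently select nothing in a non-English locale, whereas the numeric test is locale-proof:

```r
df <- data.frame(Date = seq(as.Date("2009-01-15"), by = "month", length.out = 12))
mo <- as.numeric(strftime(df$Date, "%m")) # always 1..12, whatever the locale
df[mo %in% 6:9, , drop = FALSE]           # June-September rows only
```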