I have the following dataframe:
Count Year
32 2018
346 2017
524 2016
533 2015
223 2014
1 2010
3 2008
1 1992
Is it possible to exclude the years 1992 and 2008? I tried different ways, but couldn't find a flexible solution.
I would like to have the same dataframe without the years 1992 and 2008.
Many thanks in advance,
jeemer
library(dplyr)
filter(df, !Year %in% c(1992, 2008))  # drop the rows for 1992 and 2008
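A common pitfall when excluding several values is joining `!=` tests with `|`: every year differs from at least one of the two values, so the condition is always TRUE and nothing is dropped. Here is a minimal base R sketch of the same exclusion, assuming the data frame is called `df` and the column is named `Year` as in the question; `exclude` is just an illustrative helper vector.

```r
# Data from the question
df <- data.frame(
  Count = c(32, 346, 524, 533, 223, 1, 3, 1),
  Year  = c(2018, 2017, 2016, 2015, 2014, 2010, 2008, 1992)
)

# Years to drop; extend this vector to exclude more years
exclude <- c(1992, 2008)

# Pitfall: each year is different from 1992 OR different from 2008,
# so this is TRUE for every row
always_true <- df$Year != 1992 | df$Year != 2008

# Keep only the rows whose Year is not in the exclusion list
kept <- df[!df$Year %in% exclude, ]
print(kept)
```

Because the years live in a vector, the same line works for any number of years without rewriting the condition.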
The question says: Find the number of storms per year since 2010.
So far, I have this as my code in R.
The data set is "storms" which is a dataset that is loaded into R, and is a subset of the NOAA Atlantic hurricane database.
storms %>%
select(status, year) %>%
filter(year == 2010) %>%
tally()
What I don't know is if the "since" keyword means before 2010 or should I just count the number of storms found in 2010?
"Storms per year since 2010" means the number of storms in each year from 2010 onward, including 2010 itself. Maybe this is what the question is asking:
storms2 = storms %>% filter(year>= 2010)
storms2 %>% count(year)
# A tibble: 11 × 2
year n
<dbl> <int>
1 2010 402
2 2011 323
3 2012 454
4 2013 202
5 2014 139
6 2015 220
7 2016 396
8 2017 306
9 2018 266
10 2019 330
11 2020 570
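One caveat, as far as I know: in `storms` each row is one recorded position of a storm, not one storm, so counting rows per year gives observations per year. If the question wants distinct storms, the duplicates have to be dropped first. The sketch below uses a small made-up data frame (`obs` and its values are hypothetical, not the real `storms` data) and base R only, so it runs without any packages.

```r
# Hypothetical toy data: one row per observation, several rows per storm
obs <- data.frame(
  name = c("Alex", "Alex", "Alex", "Bonnie", "Bonnie",
           "Arlene", "Arlene", "Bret"),
  year = c(2010, 2010, 2010, 2010, 2010, 2011, 2011, 2011)
)

# Rows per year: this counts observations, not storms
rows_per_year <- table(obs$year)

# Distinct storms per year: keep one row per name/year pair, then count.
# The pair is used because a storm name can be reused in a later year.
storm_years <- unique(obs[c("name", "year")])
storms_per_year <- table(storm_years$year)

print(rows_per_year)
print(storms_per_year)
```

With dplyr the same idea would be `distinct(name, year)` followed by `count(year)`.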
I have a df that resembles this:
Year Country Sales($M)
2013 Australia 120
2013 Australia 450
2013 Armenia 80
2013 Armenia 175
2013 Armenia 0
2014 Australia 500
2014 Australia 170
2014 Armenia 0
2014 Armenia 100
I'd like to combine the rows that match Year and Country, adding the Sales column. The result should be:
Year Country Sales($M)
2013 Australia 570
2013 Armenia 255
2014 Australia 670
2014 Armenia 100
I'm sure I could write a long loop to check whether Year and Country are the same and then add the Sales from those rows, but this is R so there must be a simple function that I'm totally missing.
Many thanks in advance.
library(tidyverse)
df %>%
group_by(Year,Country) %>%
summarise(Sales = sum(Sales))
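For comparison, here is a base R sketch of the same grouped sum using `aggregate()`. It assumes the sales column is simply named `Sales` (the `Sales($M)` header from the question is not a syntactic R name, so what it becomes depends on how the file was read).

```r
# Data from the question, with the sales column assumed to be called Sales
df <- data.frame(
  Year    = c(2013, 2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014),
  Country = c("Australia", "Australia", "Armenia", "Armenia", "Armenia",
              "Australia", "Australia", "Armenia", "Armenia"),
  Sales   = c(120, 450, 80, 175, 0, 500, 170, 0, 100)
)

# Sum Sales within each Year/Country combination
totals <- aggregate(Sales ~ Year + Country, data = df, FUN = sum)
print(totals)
```

The result has one row per Year/Country pair; the row order may differ from the dplyr output.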
I tried to search on the website but I didn't find the answer to my question; if there is already one please write the link.
I have two data frames from a national survey. Each year, some families have already been interviewed and others are new. I want to merge the data frames so that only the families present in both are kept, and arrange them so that, for each individual, the 2014 values are in one row and the 2012 values in the next (for the sake of simplicity I omitted the other social variables present in the survey).
For example: df1 and df2
> df1 <- data.frame(nquest=c(173, 526, 1066, 1066), nord=c(1,1,1,2), year=c(2014, 2014, 2014, 2014))
> structure(df1)
nquest nord year
1 173 1 2014
2 526 1 2014
3 1066 1 2014
4 1066 2 2014
> df2 <- data.frame(nquest=c(173, 526, 3456, 3456), nord=c(1,1,1,2), year=c(2012, 2012, 2012, 2012))
> structure(df2)
nquest nord year
1 173 1 2012
2 526 1 2012
3 3456 1 2012
4 3456 2 2012
where nquest is the number of the family and nord the component of the family (ex. 1 father, 2 mother).
I want to merge them in this way:
> df <- data.frame(nquest=c(173, 173, 526,526), nord=c(1,1,1,1), year=c(2014, 2012, 2014, 2012))
> structure(df)
nquest nord year
1 173 1 2014
2 173 1 2012
3 526 1 2014
4 526 1 2012
I tried to merge them:
tot <- merge(df1, df2, by=c("nquest", "nord"))
structure(tot)
nquest nord year.x year.y
1 173 1 2014 2012
2 526 1 2014 2012
and I tried the rbind function:
> tot <- rbind(df1, df2)
> structure(tot)
nquest nord year
1 173 1 2014
2 526 1 2014
3 1066 1 2014
4 1066 2 2014
5 173 1 2012
6 526 1 2012
7 3456 1 2012
8 3456 2 2012
Thank you
This is an approach using "dplyr"; there is probably a better way to do the filtering, though.
library(dplyr)
bind_rows(df1, df2) %>%
filter( nquest %in% df1$nquest & nquest %in% df2$nquest) %>%
arrange(nquest, desc(year))
The second condition in the "arrange" call, which specifies year, is not necessary in this case, but I am including it for completeness.
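Filtering on nquest alone keeps every member of a matching family, even a member who appears in only one of the two years. If individuals should be matched on both nquest and nord, as in the attempted merge, a base R sketch could look like this; `keys`, `common`, `stacked` and `panel` are names made up for the illustration.

```r
# Data from the question
df1 <- data.frame(nquest = c(173, 526, 1066, 1066),
                  nord   = c(1, 1, 1, 2),
                  year   = 2014)
df2 <- data.frame(nquest = c(173, 526, 3456, 3456),
                  nord   = c(1, 1, 1, 2),
                  year   = 2012)

keys <- c("nquest", "nord")

# Family/member combinations present in both years (inner join on the keys)
common <- merge(unique(df1[keys]), unique(df2[keys]), by = keys)

# Stack both years, then keep only the individuals found in both
stacked <- rbind(df1, df2)
panel <- merge(stacked, common, by = keys)

# Sort by family, then member, then newest year first
panel <- panel[order(panel$nquest, panel$nord, -panel$year), ]
rownames(panel) <- NULL
print(panel)
```

On the example data this gives the same four rows as the desired output in the question.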
I'm trying to analyse a time series of inflation rates from 1960 to 2015. The dataset is a yearly time series covering 56 years, with one real value per year, as follows:
Year Inflation percentage
1960 1.783264746
1961 1.752021563
1962 3.57615894
1963 2.941176471
1964 13.35403727
1965 9.479452055
1966 10.81081081
1967 13.0532972
1968 2.996404315
1969 0.574712644
1970 5.095238095
1971 3.081105573
1972 6.461538462
1973 16.92815855
1974 28.60169492
1975 5.738605162
1976 -7.63438068
1977 8.321619342
1978 2.517518817
1979 6.253164557
1980 11.3652609
1981 13.11510484
1982 7.887270664
1983 11.86886396
1984 8.32157969
1985 5.555555556
1986 8.730811404
1987 8.798689021
1988 9.384775808
1989 3.26256011
1990 8.971233545
1991 13.87024609
1992 11.78781925
1993 6.362038664
1994 10.21150033
1995 10.22488756
1996 8.977149075
1997 7.16425362
1998 13.2308409
1999 4.669821024
2000 4.009433962
2001 3.684807256
2002 4.392199745
2003 3.805865922
2004 3.76723848
2005 4.246353323
2006 6.145522388
2007 6.369996746
2008 8.351816444
2009 10.87739112
2010 11.99229692
2011 8.857845297
2012 9.312445605
2013 10.90764331
2014 6.353194544
2015 5.872426595
'stock1' contains my data where the first column stands for Year, and the second for 'Inflation.percentage', as follows:
stock1<-read.csv("India-Inflation time series.csv", header=TRUE, stringsAsFactors=FALSE, as.is=TRUE)
The following is my code for creating the time series object:
stock <- ts(stock1$Inflation.percentage,start=(1960), end=(2015),frequency=1)
Following this, I am trying to decompose the time series object 'stock' using the following line of code:
decom_add <- (decompose(stock, type ="additive"))
Here I get an error:
Error in decompose(stock, type = "additive") : time series has no
or less than 2 periods
Why is this so? I initially thought it has something to do with frequency, but since the data is annual, the frequency has to be 1 right? If it is 1, then aren't there definitely more than 2 periods in the data?
Why isn't decompose() working? What am I doing wrong?
Thanks a lot in advance!
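decompose() needs a seasonal series: the frequency must be greater than 1, and the data must cover at least two full periods. With frequency=1 there is no seasonal component to estimate, which is why you get this error. You could try frequency=2, but that changes your model by imposing an artificial two-year cycle. For me the better way is to load data that also contains a month column, so that the frequency is 12.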
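A small sketch of the difference, using simulated numbers rather than the inflation data (the series `annual` and `monthly` are made up for the illustration). The last step is only one possible alternative for annual data, a plain centred moving average, and is my assumption about what might be useful, not something decompose() does for you.

```r
set.seed(1)

# Annual series, frequency = 1: decompose() stops with the error from the question
annual <- ts(rnorm(56), start = 1960, frequency = 1)
msg <- tryCatch(
  {
    decompose(annual, type = "additive")
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Hypothetical monthly series, frequency = 12, ten full years: this works
monthly <- ts(rnorm(120) + rep(sin(2 * pi * (1:12) / 12), 10),
              start = c(2006, 1), frequency = 12)
dec <- decompose(monthly, type = "additive")

# For annual data, a trend can still be estimated without a seasonal part,
# for example with a 5-year centred moving average
trend <- stats::filter(annual, rep(1 / 5, 5), sides = 2)
```

`dec` then holds the `trend`, `seasonal` and `random` components, which can be shown with `plot(dec)`.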
I started learning R recently, and I am completely new. Sorry if my question will seem lame to some of you but I have spent more than an hour trying to research how to do this using indexing or subset but couldn't find anything.
So here it goes :
I have a file which has
temperature lower rain month yr
10.8 6.5 12.2 1 1987
10.5 4.5 1.3 1 1987
7.5 -1 0.1 1 1987
This file contains 6,940 lines of data.
I read the file into R and wanted to find the average rainfall per year, for which I used:
A <- tapply(temperature,yr,mean)
this function returned:
1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005
13.27014 13.79126 15.54986 15.62986 14.11945 14.61612 14.30984 15.12877 15.81260 13.98082 15.63918 15.02568 15.63736 14.94071 14.90849 15.47589 16.03260 15.25109 15.06000
Now I need the year in which the average rain is at its minimum.
when I apply :
min(A)
It returns 13.27014, which corresponds to the year 1987, but how do I query for the year that corresponds to the minimum value?
And when I try :
A[,min(A)]
It returns an error
Sorry again for the lame question but this is driving me crazy
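One way to get the year is `which.min()`, which returns the position of the smallest element, combined with `names()`. The result of `tapply()` here is one-dimensional, so `A[, ...]` with a comma fails, and `min(A)` is a value rather than a position, so it cannot be used as an index either. A minimal sketch with a few made-up rows (the `weather` data frame is hypothetical; the column names follow the question):

```r
# Hypothetical sample in the same layout as the file in the question
weather <- data.frame(
  temperature = c(10.8, 10.5, 7.5, 12.0, 14.0, 9.0, 11.0),
  yr          = c(1987, 1987, 1987, 1988, 1988, 1989, 1989)
)

# Yearly means, named by year
A <- tapply(weather$temperature, weather$yr, mean)

# Position of the smallest mean, then the year label at that position
pos <- which.min(A)
min_year <- names(A)[pos]

print(min_year)
print(A[pos])
```

`names()` returns the year as a character string; wrap it in `as.numeric()` if a number is needed. To work with rainfall instead, pass the rain column to `tapply()`.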