Dropping dates except specific year from panel data [duplicate] - r

This question already has answers here:
Filter data.frame rows by a logical condition
(9 answers)
Closed 4 years ago.
I have panel data from 2000 to 2017. I want to select only the rows where the year is 2005.
Amazingly
mydata <- subset(mydata, select= c(mydata$Year>="2005"))
did not work. Any suggestions?

This assumes the Year column is numeric (if it holds dates, extract the year first).
Rows from 2005 onward:
library(dplyr)
mydata %>%
  filter(Year >= 2005)
Only 2005:
library(dplyr)
mydata %>%
  filter(Year == 2005)
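For what it's worth, the call in the question most likely failed because the select argument of subset() chooses columns, while rows are chosen by its second (subset) argument. A minimal base R sketch, assuming a made-up mydata whose id and value columns are hypothetical:

```r
# Hypothetical panel data; only the Year column name comes from the question
mydata <- data.frame(
  id    = rep(1:2, each = 3),
  Year  = rep(c(2004, 2005, 2006), times = 2),
  value = c(10, 11, 12, 20, 21, 22)
)

# Rows are chosen by the second argument of subset(); `select` is for columns
only_2005 <- subset(mydata, Year == 2005)

# Equivalent logical indexing without subset()
same_rows <- mydata[mydata$Year == 2005, ]
```

Comparing against the number 2005 rather than the string "2005" avoids an implicit conversion when Year is numeric.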

Related

Summarise a set of dates contained in a column [duplicate]

This question already has answers here:
Count number of unique levels of a variable
(7 answers)
Closed 2 years ago.
I have a dataset with transactions made from 2018/07/01 to 2019/06/30 and I want to find how many unique dates are in the "DATE" column (it has over 260k rows, so a date can be repeated several times).
I have tried the following but it just lists all the dates contained in the "DATE" column:
numberofdates <- dplyr::summarize(transactionData, DATE)
Thanks for your help!
Try one of these approaches:
In dplyr:
library(dplyr)
#Code 1
numberofdates <- transactionData %>% summarize(n_distinct(DATE))
Or in base R:
#Code 2
length(unique(transactionData$DATE))
With data.table we can use
library(data.table)
numberofdates <- uniqueN(transactionData$DATE)
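As a small self-contained check of the base R approach, here is a sketch on made-up data. The amount column and the five sample rows are assumptions for illustration only; just the DATE column name comes from the question.

```r
# Hypothetical transactions; DATE is parsed from the yyyy/mm/dd format used in the question
transactionData <- data.frame(
  DATE = as.Date(
    c("2018/07/01", "2018/07/01", "2018/07/02", "2019/06/30", "2019/06/30"),
    format = "%Y/%m/%d"
  ),
  amount = c(5, 7, 3, 9, 1)
)

# Number of distinct dates, returned as a plain integer
numberofdates <- length(unique(transactionData$DATE))

# Optional: how many rows fall on each date
rows_per_date <- table(transactionData$DATE)
```

Note that the dplyr version returns a one-row data frame, whereas length(unique(...)) and uniqueN() return a single number.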

How to sort the dataframe by items? [duplicate]

This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 3 years ago.
I have a data frame like this:
dataframe <- data.frame(country=c("Japan","Korea","China","Japan","Korea","China","Japan","Korea","China"),
count=c(4,5,6,1,2,3,0,2,3))
Now I want to sort it by country, like this:
dataframe <- data.frame(country=c("Japan","Japan","Japan","Korea","Korea","Korea","China","China","China"),
count=c(4,1,0,5,2,2,6,3,3))
I tried the grouped_by function, but it doesn't work.
Please tell me how to do this.
You can use the arrange function of the dplyr package:
library(tidyverse)
dataframe <- dataframe %>%
arrange(match(country, c("Japan", "Korea", "China")))
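The same custom ordering can be sketched in base R with order() and match(), in case loading the tidyverse is not desirable. The country_order and sorted names are introduced here for illustration.

```r
dataframe <- data.frame(
  country = c("Japan", "Korea", "China", "Japan", "Korea", "China",
              "Japan", "Korea", "China"),
  count   = c(4, 5, 6, 1, 2, 3, 0, 2, 3)
)

# Desired order of the countries (not alphabetical)
country_order <- c("Japan", "Korea", "China")

# match() gives each row the position of its country in country_order;
# order() then sorts by that position and keeps ties in their original order
sorted <- dataframe[order(match(dataframe$country, country_order)), ]
rownames(sorted) <- NULL
```

Rows within each country keep their original relative order, which matches the expected output in the question.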

Subsetting dataframe based on two values in one column [duplicate]

This question already has answers here:
Filter data.frame rows by a logical condition
(9 answers)
Filtering a data frame on a vector [duplicate]
(2 answers)
Closed 3 years ago.
I'm trying to subset two years of microclimate data from a larger dataset in R. I can subset one year, but am struggling to subset two years in the same operation.
This operation works fine:
ChamberTemp <- subset(ChamberTemp,
subset=year=="2011",
select=c(year,month,chamber,cat1.avg,cat2.avg,cat3.avg))
How do I subset by two years? i.e. 2011 and 2012
Thank you!
To match against a vector of length greater than 1, use %in% instead of ==:
subset(ChamberTemp,
       subset = year %in% c("2011", "2012"),
       select = c(year, month, chamber, cat1.avg, cat2.avg, cat3.avg))
With dplyr, this can be done using:
library(dplyr)
ChamberTemp %>%
  filter(year %in% c("2011", "2012")) %>%
  select(year, month, chamber, matches("^cat[1-3]\\.avg$"))
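To see why == is the wrong tool here, consider this small sketch on a made-up year vector. With ==, the shorter vector is recycled and compared position by position, so some matching rows are silently dropped.

```r
# Hypothetical year values, stored as strings as in the question
year <- c("2011", "2012", "2012", "2011")

# == recycles c("2011", "2012") and compares element by element:
# "2011"=="2011", "2012"=="2012", "2012"=="2011", "2011"=="2012"
recycled <- year == c("2011", "2012")

# %in% tests each element for membership in the set
membership <- year %in% c("2011", "2012")
```

Here recycled is TRUE, TRUE, FALSE, FALSE even though every element is one of the two target years, while membership is TRUE throughout.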

Filtering Column by Multiple values [duplicate]

This question already has answers here:
Filter multiple values on a string column in dplyr
(6 answers)
Closed 2 years ago.
I would like to filter rows based on multiple values in one column.
For example, one data.frame has S&P 500 tickers, and I have to pick 20 of them along with their associated closing prices. How can I do it?
If I understand your question correctly, you can do this with dplyr:
library(dplyr)
target <- c("Ticker1", "Ticker2", "Ticker3")
filter(df, Ticker %in% target)
The answer can be found in https://stackoverflow.com/a/25647535/9513536
Cheers !
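A base R sketch of the same idea, which also keeps only the ticker and closing price columns. The data and the Close column name are assumptions, since the question does not show the actual data frame.

```r
# Hypothetical price data; column names are assumed
df <- data.frame(
  Ticker = c("Ticker1", "Ticker2", "Ticker3", "Ticker4", "Ticker1"),
  Close  = c(10.5, 20.1, 30.2, 40.0, 10.9)
)

# Tickers to keep
target <- c("Ticker1", "Ticker3")

# Keep rows whose Ticker is in target, and only the two columns of interest
picked <- df[df$Ticker %in% target, c("Ticker", "Close")]
```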

How to get sum up rows of a column by grouping another column? [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 5 years ago.
My DF has two columns: one is the state name and the other is the number of events that happened.
The data originally had more columns. Now I want to sum the number of events by state. How can I do that?
How about using dplyr:
library(dplyr)
DF %>%
group_by(state) %>%
summarize(total_event = sum(event))
Here I have assumed that your data is in a data frame called DF with columns named state and event.
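If dplyr is not available, a roughly equivalent base R sketch uses aggregate(). The sample rows below are made up; the state and event column names follow the assumption in the answer.

```r
# Hypothetical data with the assumed column names
DF <- data.frame(
  state = c("TX", "CA", "TX", "NY", "CA"),
  event = c(3, 1, 4, 6, 2)
)

# Sum event within each state; the formula reads "event grouped by state"
totals <- aggregate(event ~ state, data = DF, FUN = sum)

# Rename the summed column to match the dplyr output
names(totals)[names(totals) == "event"] <- "total_event"
```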
