Combining two columns to form a Date in R using zoo

I am trying to use zoo to formulate a Date using two columns in a data.table.
data$Date <- as.yearmon(paste(data$Month,data$Year), "%Y %m")
But all I get is NAs.
Here is what the data looks like:
Year Month State County Rate
2015 October California Santa Clara County 4.0
2015 March California Santa Clara County 4.4
2015 August California Santa Clara County 4.1
2015 May California Santa Clara County 4.1
2015 January California Santa Clara County 4.7

You have two issues. First, you're pasting Month then Year, but the format string says Year then Month. Second, %m is the month as a decimal number (01-12); for the full month name you want %B. So swap the order in paste() and change the format.
data$Date <- as.yearmon(paste(data$Year,data$Month), "%Y %B")
Year Month State County Rate Date
1: 2015 October California Santa Clara County 4.0 Oct 2015
2: 2015 March California Santa Clara County 4.4 Mar 2015
3: 2015 August California Santa Clara County 4.1 Aug 2015
4: 2015 May California Santa Clara County 4.1 May 2015
5: 2015 January California Santa Clara County 4.7 Jan 2015
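The effect of the two fixes can be checked without zoo. The following is a minimal base R sketch, not the answer's own code: it uses as.Date() instead of as.yearmon(), so a dummy day "01" is appended (an assumption made only so the string parses as a full date). The data frame name dat is hypothetical, and %B matches month names in the current locale, so an English (or C) locale is assumed.

```r
# Three rows from the question's data
dat <- data.frame(
  Year  = c(2015, 2015, 2015),
  Month = c("October", "March", "August"),
  stringsAsFactors = FALSE
)

# Original attempt: Month pasted first and %m (numeric month) in the format.
# The string "October 2015 01" does not match "%Y %m %d", so every value is NA.
wrong <- as.Date(paste(dat$Month, dat$Year, "01"), format = "%Y %m %d")

# Fixed: Year first and %B (full month name, locale dependent).
right <- as.Date(paste(dat$Year, dat$Month, "01"), format = "%Y %B %d")

print(wrong)
print(format(right, "%b %Y"))
```

The same format string, minus the day part, is what the corrected as.yearmon() call above uses.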

Related

Create a dataframe from some rows of an already existing dataframe in R

I need to create a new dataframe from the data of 4 different countries of this already existing dataframe:
Country Happiness.Score GDP year
Switzerland 7.587 1.39651 2015
Iceland 7.561 1.30232 2015
Denmark 7.527 1.32548 2015
Norway 7.522 1.459 2015
Canada 7.427 1.32629 2015
Finland 7.406 1.29025 2015
Netherlands 7.378 1.32944 2015
Sweden 7.364 1.33171 2015
New Zealand 7.286 1.25018 2015
Australia 7.284 1.33358 2015
Israel 7.278 1.22857 2015
Costa Rica 7.226 0.95578 2015
Austria 7.2 1.33723 2015
Mexico 7.187 1.02054 2015
United States 7.119 1.39451 2015
Brazil 6.983 0.98124 2015
Ireland 6.94 1.33596 2015
Belgium 6.937 1.30782 2015
Note: the dataframe above is just an example. The original dataframe has many more countries, and each one has data for the following years: 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022
Something like
selected_countries <- c("Iceland", "Norway", "Spain", "Nigeria")
new_dd <- dd[dd$Country %in% selected_countries, ]
or
new_dd <- subset(dd, Country %in% selected_countries)
or
library(dplyr)
new_dd <- dd %>% filter(Country %in% selected_countries)
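As a quick illustration of how %in% behaves here, the sketch below runs the first variant on four rows copied from the question. The name missing_countries is hypothetical. Countries in the selection that do not occur in the data are silently skipped, so checking for them with setdiff() may be worthwhile.

```r
# Four rows taken from the question's example data
dd <- data.frame(
  Country         = c("Switzerland", "Iceland", "Denmark", "Norway"),
  Happiness.Score = c(7.587, 7.561, 7.527, 7.522),
  GDP             = c(1.39651, 1.30232, 1.32548, 1.459),
  year            = c(2015, 2015, 2015, 2015),
  stringsAsFactors = FALSE
)

selected_countries <- c("Iceland", "Norway", "Spain", "Nigeria")

# Keep only rows whose Country appears in the selection
new_dd <- dd[dd$Country %in% selected_countries, ]

# Selected countries that are absent from the data (no error is raised for them)
missing_countries <- setdiff(selected_countries, dd$Country)

print(new_dd)
print(missing_countries)
```

With the full dataset, a year condition could be combined in the same way, for example dd$Country %in% selected_countries & dd$year == 2015.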

Replace For Loop to fill column depending on other column value

I have a two-column dataframe (HOME & AWAY) called 'gamelist' with sports games. The HOME column also includes some dates with the corresponding games listed below.
HOME AWAY
15 Oct 2019 Pre-season
Phoenix Suns Denver Nuggets
Utah Jazz Sacramento Kings
Dallas Mavericks Oklahoma City Thunder
Memphis Grizzlies Charlotte Hornets
14 Oct 2019 Pre-season
Miami Heat Atlanta Hawks
13 Oct 2019 Pre-season
Orlando Magic Philadelphia 76ers
Toronto Raptors Chicago Bulls
Washington Wizards Milwaukee Bucks
I want to create a new column with the dates for each game. Coming from an Excel VBA approach, I've used a for loop, which gives the intended result, but I was wondering if there is a more efficient approach in R, and I'm sure there is.
This is the code I've used:
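# add_column() is from tibble, str_count() is from stringr
gamelist <- add_column(gamelist, SDATE = "", .before = 1)
for (i in 1:nrow(gamelist)) {
  if (str_count(gamelist$HOME[i], "\\d") == 6) {
    # HOME holds a date such as "15 Oct 2019" (six digits)
    gamelist$SDATE[i] <- gamelist$HOME[i]
  } else {
    # otherwise carry the date forward from the previous row
    gamelist$SDATE[i] <- gamelist$SDATE[i - 1]
  }
}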
Which gives me this as intended
SDATE HOME AWAY
15 Oct 2019 15 Oct 2019 Pre-season
15 Oct 2019 Phoenix Suns Denver Nuggets
15 Oct 2019 Utah Jazz Sacramento Kings
15 Oct 2019 Dallas Mavericks Oklahoma City Thunder
15 Oct 2019 Memphis Grizzlies Charlotte Hornets
14 Oct 2019 14 Oct 2019 Pre-season
14 Oct 2019 Miami Heat Atlanta Hawks
13 Oct 2019 13 Oct 2019 Pre-season
13 Oct 2019 Orlando Magic Philadelphia 76ers
13 Oct 2019 Toronto Raptors Chicago Bulls
13 Oct 2019 Washington Wizards Milwaukee Bucks
My apologies for the dataframe formatting, couldn't figure out how to reproduce one properly here.
Thanks for your help
We could use str_extract to pull out only the dates, so that rows with no match get NA, and then use fill to replace the NA elements with the previous non-NA value:
library(dplyr)
library(tidyr)
library(stringr)
gamelist %>%
  mutate(SDATE = str_extract(HOME, "^\\d+ [A-Za-z]+ \\d{4}")) %>%
  fill(SDATE)
# HOME AWAY SDATE
#1 15 Oct 2019 Pre-season 15 Oct 2019
#2 Phoenix Suns Denver Nuggets 15 Oct 2019
#3 Utah Jazz Sacramento Kings 15 Oct 2019
#4 Dallas Mavericks Oklahoma City Thunder 15 Oct 2019
#5 Memphis Grizzlies Charlotte Hornets 15 Oct 2019
#6 14 Oct 2019 Pre-season 14 Oct 2019
#7 Miami Heat Atlanta Hawks 14 Oct 2019
#8 13 Oct 2019 Pre-season 13 Oct 2019
#9 Orlando Magic Philadelphia 76ers 13 Oct 2019
#10 Toronto Raptors Chicago Bulls 13 Oct 2019
#11 Washington Wizards Milwaukee Bucks 13 Oct 2019
If we need the SDATE column first, we can use select
gamelist %>%
  mutate(SDATE = str_extract(HOME, "^\\d+ [A-Za-z]+ \\d{4}")) %>%
  fill(SDATE) %>%
  select(SDATE, everything())
Or use add_column from tibble with either .after or .before
library(tibble)
gamelist %>%
  add_column(SDATE = str_extract(.$HOME, "^\\d+ [A-Za-z]+ \\d{4}"),
             .before = 1) %>%
  fill(SDATE)
data
gamelist <- structure(list(HOME = c("15 Oct 2019", "Phoenix Suns", "Utah Jazz",
"Dallas Mavericks", "Memphis Grizzlies", "14 Oct 2019", "Miami Heat",
"13 Oct 2019", "Orlando Magic", "Toronto Raptors", "Washington Wizards"
), AWAY = c("Pre-season", "Denver Nuggets", "Sacramento Kings",
"Oklahoma City Thunder", "Charlotte Hornets", "Pre-season", "Atlanta Hawks",
"Pre-season", "Philadelphia 76ers", "Chicago Bulls", "Milwaukee Bucks"
)), class = "data.frame", row.names = c(NA, -11L))
If the date is always in the HOME column when the AWAY column is "Pre-season" (or some other predictable condition), then you could do something like:
# data
gamelist <- data.frame(
stringsAsFactors = FALSE,
HOME = c("15-Oct-19","Phoenix Suns",
"Utah Jazz","Dallas Mavericks","Memphis Grizzlies",
"14-Oct-19","Miami Heat","13-Oct-19","Orlando Magic",
"Toronto Raptors","Washington Wizards"),
AWAY = c("Pre-season","Denver Nuggets",
"Sacramento Kings","Oklahoma City Thunder",
"Charlotte Hornets","Pre-season","Atlanta Hawks","Pre-season",
"Philadelphia 76ers","Chicago Bulls","Milwaukee Bucks")
)
# create blank column to fill in
gamelist$date <- NA
# fill cases where there's a date
gamelist$date[gamelist$AWAY=="Pre-season"] <- gamelist$HOME[gamelist$AWAY=="Pre-season"]
# use zoo::na.locf() to fill in missing values
gamelist$date <- zoo::na.locf(gamelist$date)
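For comparison, the same fill-forward can be sketched in base R without extra packages. This is only one possible approach, and it assumes that the first row is a date row and that dates look like "15 Oct 2019". The names is_date and games are hypothetical. The idea is that cumsum() over the logical vector numbers each block of games by the date row that opens it.

```r
# Shortened version of the question's data
gamelist <- data.frame(
  HOME = c("15 Oct 2019", "Phoenix Suns", "Utah Jazz",
           "14 Oct 2019", "Miami Heat"),
  AWAY = c("Pre-season", "Denver Nuggets", "Sacramento Kings",
           "Pre-season", "Atlanta Hawks"),
  stringsAsFactors = FALSE
)

# TRUE for rows where HOME is a date such as "15 Oct 2019"
is_date <- grepl("^\\d+ [A-Za-z]+ \\d{4}$", gamelist$HOME)

# Assumption: the table starts with a date row, otherwise the index below
# would contain zeros and the lengths would not line up
stopifnot(is_date[1])

# cumsum(is_date) is 1 for the first block, 2 for the second, and so on,
# so it indexes directly into the vector of date strings
gamelist$SDATE <- gamelist$HOME[is_date][cumsum(is_date)]

# Optionally drop the date header rows and put SDATE first
games <- gamelist[!is_date, c("SDATE", "HOME", "AWAY")]

print(gamelist)
print(games)
```

Because the pattern is anchored at the start of the string, a team name such as "Philadelphia 76ers" is not mistaken for a date.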

decompose() for yearly time series in R

I'm trying to analyse a time series of inflation rates from 1960 to 2015. The dataset is a yearly time series over 56 years with one real value per year, as follows:
Year Inflation percentage
1960 1.783264746
1961 1.752021563
1962 3.57615894
1963 2.941176471
1964 13.35403727
1965 9.479452055
1966 10.81081081
1967 13.0532972
1968 2.996404315
1969 0.574712644
1970 5.095238095
1971 3.081105573
1972 6.461538462
1973 16.92815855
1974 28.60169492
1975 5.738605162
1976 -7.63438068
1977 8.321619342
1978 2.517518817
1979 6.253164557
1980 11.3652609
1981 13.11510484
1982 7.887270664
1983 11.86886396
1984 8.32157969
1985 5.555555556
1986 8.730811404
1987 8.798689021
1988 9.384775808
1989 3.26256011
1990 8.971233545
1991 13.87024609
1992 11.78781925
1993 6.362038664
1994 10.21150033
1995 10.22488756
1996 8.977149075
1997 7.16425362
1998 13.2308409
1999 4.669821024
2000 4.009433962
2001 3.684807256
2002 4.392199745
2003 3.805865922
2004 3.76723848
2005 4.246353323
2006 6.145522388
2007 6.369996746
2008 8.351816444
2009 10.87739112
2010 11.99229692
2011 8.857845297
2012 9.312445605
2013 10.90764331
2014 6.353194544
2015 5.872426595
'stock1' contains my data where the first column stands for Year, and the second for 'Inflation.percentage', as follows:
stock1<-read.csv("India-Inflation time series.csv", header=TRUE, stringsAsFactors=FALSE, as.is=TRUE)
The following is my code for creating the time series object:
stock <- ts(stock1$Inflation.percentage,start=(1960), end=(2015),frequency=1)
Following this, I am trying to decompose the time series object 'stock' using the following line of code:
decom_add <- (decompose(stock, type ="additive"))
Here I get an error:
Error in decompose(stock, type = "additive") : time series has no
or less than 2 periods
Why is this so? I initially thought it has something to do with frequency, but since the data is annual, the frequency has to be 1 right? If it is 1, then aren't there definitely more than 2 periods in the data?
Why isn't decompose() working? What am I doing wrong?
Thanks a lot in advance!
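decompose() needs a seasonal series: the frequency must be greater than 1 and the data must cover at least two full periods, so a ts object with frequency=1 triggers this error. Setting frequency=2 would make the call run, but it changes your model by imposing a two-year cycle that may not exist in the data. In my view the better option is to load data that also contains a month column, so that the frequency is 12.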
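The difference can be reproduced with a small sketch. The annual values are the first six from the question. The monthly series is made-up illustrative data (a sine-shaped yearly pattern plus noise), not real inflation figures, and the names annual, monthly, err and dec are hypothetical.

```r
# Annual series: frequency = 1, so there is no seasonal period to estimate
annual <- ts(c(1.783264746, 1.752021563, 3.57615894,
               2.941176471, 13.35403727, 9.479452055),
             start = 1960, frequency = 1)

# Capture the error message instead of stopping the script
err <- tryCatch(
  { decompose(annual, type = "additive"); NULL },
  error = function(e) conditionMessage(e)
)
print(err)

# Made-up monthly series: four years, 12 observations per year
set.seed(1)
monthly <- ts(10 + rep(sin(2 * pi * (1:12) / 12), times = 4) +
                rnorm(48, sd = 0.1),
              start = c(1960, 1), frequency = 12)

# With frequency = 12 and four full periods, decompose() runs
dec <- decompose(monthly, type = "additive")
print(round(dec$figure, 2))  # one estimated seasonal effect per month
```

If only annual observations exist, there is no within-year pattern for decompose() to separate out, which is why the call cannot succeed on the series as given.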

State FIPS, county FIPS AND FIPS to latitude longitude?

I have a dataset looking like this, with 600 columns:
COUNTY_NAME STATE_NAME STATE_FIPS CNTY_FIPS FIPS Year
Boone Illinois 17 007 17007 2010
Bureau Illinois 17 011 17011 2008
Champaign Illinois 17 019 17019 2010
Cook Illinois 17 031 17031 2006
I need to get the centroids of smallest possible unit/area (counties?) for further analysis.
Is it possible to get this information in latitude longitude in R?

Mean of time - hh:mm:ss - group by a variable

Need to calculate the mean of Time by Country. Time is a Date variable - hh:mm:ss.
This command with(df,tapply(as.numeric(times(df$Time)),Country,mean))
is not returning the correct mean in hh:mm:ss.
Country Time
1 Germany 2:26:21
2 Germany 2:19:19
3 Brazil 2:06:34
4 USA 2:06:17
5 Eth 2:18:58
6 Japan 2:08:35
7 Morocco 2:05:27
8 Germany 2:13:57
9 Romania 2:21:30
10 Spain 2:07:23
Output:
>with(df,tapply(as.numeric(times(df$Time)),Country,mean))
Andorra Australia Brazil Canada China
0.09334491 0.09634259 0.09578125 0.09634645 0.09481192
Eritrea Ethiopia France Germany Great Britain
0.09709491 0.09010031 0.10025463 0.09713349 0.09524306
Ireland Italy Japan Kenya Morocco
0.09593750 0.09520255 0.09579630 0.08934854 0.09400463
New Zeland Peru Poland Romania Russia
0.09664931 0.09809606 0.09638889 0.09875000 0.09327932
Spain Switzerland Uganda United States Zimbabwe
0.09314236 0.09620949 0.10068287 0.09399016 0.09892940
I see you've discovered the agony of working with date and time values in R...
Is this what you had in mind?
df$nTime <- difftime(strptime(df$Time,"%H:%M:%S"),
strptime("00:00:00","%H:%M:%S"),
units="secs")
df.means <- aggregate(df$nTime,by=list(df$Country),mean)
df.means$Time <- format(.POSIXct(df.means$x,tz="GMT"), "%H:%M:%S")
df.means
# Group.1 x Time
# 1 Brazil 7594.000 02:06:34
# 2 Eth 8338.000 02:18:58
# 3 Germany 8392.333 02:19:52
# 4 Japan 7715.000 02:08:35
# 5 Morocco 7527.000 02:05:27
# 6 Romania 8490.000 02:21:30
# 7 Spain 7643.000 02:07:23
# 8 USA 7577.000 02:06:17
The first line adds a column nTime which is the time, in seconds, since midnight.
The second line calculates the means.
The third line converts back to H:M:S.
The problem you were having is that strptime(...), when forced to convert to numeric, returns the number of seconds between 1970-01-01 and the indicated time today. So, a really big number. This code just subtracts out the number of seconds between 1970-01-01 and 00:00:00 today.
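The same computation can also be sketched with plain arithmetic, which avoids date classes altogether. The helpers to_seconds() and to_hms() below are hypothetical names, not functions from any package, and only a few rows of the question's data are used. The last lines rest on the assumption that the numbers in the question's output are fractions of a day, which is how chron's times class stores values: 0.09713349 of a day is about 8392 seconds.

```r
df <- data.frame(
  Country = c("Germany", "Germany", "Brazil", "Germany"),
  Time    = c("2:26:21", "2:19:19", "2:06:34", "2:13:57"),
  stringsAsFactors = FALSE
)

# "h:mm:ss" -> seconds
to_seconds <- function(hms) {
  parts <- strsplit(hms, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# seconds -> "hh:mm:ss" (rounded to the nearest second)
to_hms <- function(secs) {
  secs <- as.integer(round(secs))
  sprintf("%02d:%02d:%02d", secs %/% 3600L, (secs %% 3600L) %/% 60L, secs %% 60L)
}

# Mean time per country, in seconds, then formatted back
means <- tapply(to_seconds(df$Time), df$Country, mean)
result <- setNames(to_hms(as.numeric(means)), names(means))
print(result)

# Assumption: a value such as 0.09713349 is a fraction of a day,
# so multiplying by 86400 gives seconds
from_fraction <- to_hms(0.09713349 * 86400)
print(from_fraction)
```

Under that assumption, the means in the question were already correct and only needed converting from days back to hh:mm:ss.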
Are you trying to do this -
dades$Time <- strptime(dades$Time,'%H:%M:%S')
by(dades$Time, dades$Country, mean)
If I've misunderstood your question, could you please post a sample of the expected output?
