I have a data frame in R with several variables; right now I am concerned with two of them, Title and Date. Here is a short sample that resembles the real data frame:
Title                                 Date
Veterans, Sacrame                     1997
Action Newsmaker                      2005
New Tri-Cable                         1990 mar
EFEST June 16, 1987                   28494
The Inhuman Perception: what we do    1999 june
New Tri-Cable                         2003 july/august
Interviews Concerning His/her         1991-1992
Festival EFEST June 6, 1997           83443
Intervention of the people            Undated
What I want is to create a new variable, year, that contains only the year (no day, month, or anything else).
I can extract the year from a proper date format or from text that follows one exact pattern, but here the values are irregular and differ from row to row. Is there an easy way to create such a 'year' variable in RStudio? In some rows the Date value is something like 83443; for those I can see the year in the Title, but the dataset is far too large to extract it manually.
Use mdy to convert Title to Date class and then year to extract the year. Rows whose title contains no parseable date come back as NA.
library(lubridate)
year(mdy(dat1$Title, quiet = TRUE))
## [1] NA NA NA 1987 NA NA NA 1997 NA
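The mdy call only recovers years that sit inside a full date in Title. As a rough complement, here is a base R sketch that pulls the first four-digit year out of Date and falls back to Title when Date has none. The helper name extract_year and the assumption that every year lies between 1900 and 2099 are mine, not part of the original data description; for a range such as 1991-1992 it keeps only the first year.

```r
# Sample data rebuilt by hand from the question.
dat1 <- data.frame(
  Title = c("Veterans, Sacrame", "Action Newsmaker", "New Tri-Cable",
            "EFEST June 16, 1987", "The Inhuman Perception: what we do",
            "New Tri-Cable", "Interviews Concerning His/her",
            "Festival EFEST June 6, 1997", "Intervention of the people"),
  Date = c("1997", "2005", "1990 mar", "28494", "1999 june",
           "2003 july/august", "1991-1992", "83443", "Undated"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first standalone 19xx/20xx number in each string, else NA.
extract_year <- function(x) {
  x <- as.character(x)
  m <- regexpr("\\b(19|20)[0-9]{2}\\b", x, perl = TRUE)
  hit <- !is.na(m) & m > 0
  out <- rep(NA_integer_, length(x))
  out[hit] <- as.integer(regmatches(x, m))
  out
}

# Prefer the year found in Date; fall back to the one found in Title.
year_from_date <- extract_year(dat1$Date)
dat1$year <- ifelse(is.na(year_from_date),
                    extract_year(dat1$Title),
                    year_from_date)
dat1$year
```

Rows such as "Undated" with no year in either column stay NA, which makes them easy to find and review by hand.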
Note
The data in reproducible form:
# Title and Date are separated by two or more spaces
Lines <- "Title                                 Date
Veterans, Sacrame                     1997
Action Newsmaker                      2005
New Tri-Cable                         1990 mar
EFEST June 16, 1987                   28494
The Inhuman Perception: what we do    1999 june
New Tri-Cable                         2003 july/august
Interviews Concerning His/her         1991-1992
Festival EFEST June 6, 1997           83443
Intervention of the people            Undated"
L <- readLines(textConnection(Lines))
# Replace the first run of 2+ spaces with ";" so titles containing commas or single spaces stay intact
dat1 <- read.csv(text = sub(" {2,}", ";", trimws(L)), sep = ";")
Related
My data set is monthly, from Jan 1997 to Dec 2021. I need the Month column to be in a proper date format, but as.Date doesn't recognise the cell contents as they are. Please help.
Month BrentSpot GDP Agriculture Production Construction Services
1 Jan-1997 23.54 63.8229 53.5614 81.9963 87.2775 59.4453
2 Feb-1997 20.85 64.7182 53.9091 82.1917 87.8350 60.5018
3 Mar-1997 19.13 64.9264 54.2569 81.6142 88.6714 60.8375
4 Apr-1997 17.56 65.2327 55.1264 82.0006 89.5170 61.0981
5 May-1997 19.02 64.7336 55.8220 82.0093 89.8144 60.4470
6 Jun-1997 17.58 65.1322 56.3438 82.3350 89.4891 60.8886
library(lubridate)  # provides ymd()
Gdp_Brent_Table$Month <- seq(ymd('1997-01-01'), ymd('2021-12-01'), by = 'months')
(this seemed to do the trick)
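Overwriting the column with a generated sequence works only if the rows are complete and already in order. A sketch of parsing the existing labels instead, in base R, is shown here; the vector month_chr is a made-up stand-in for Gdp_Brent_Table$Month, and matching against month.abb assumes English three-letter month abbreviations.

```r
# Stand-in for the Month column (assumed "Mon-YYYY" with English abbreviations).
month_chr <- c("Jan-1997", "Feb-1997", "Mar-1997", "Dec-2021")

# Split each label into its month abbreviation and its year.
parts <- strsplit(month_chr, "-", fixed = TRUE)
mon <- match(vapply(parts, `[`, character(1), 1), month.abb)
yr  <- as.integer(vapply(parts, `[`, character(1), 2))

# Build first-of-month dates; an unrecognised label would need separate handling.
month_date <- as.Date(sprintf("%d-%02d-01", yr, mon))
month_date
```

Using month.abb rather than the "%b" format code keeps the result independent of the session locale.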
I have just started learning R. We have to tidy up a dataset whose date column stores dates like May_08. The column has to be split into month and year, e.g. May_08 becomes May 2008. This is the code I have so far:
dataset %>%
separate(date, c("month","year"))
You could use strftime. Just paste a day in front of the string beforehand.
x <- "May_08"
strftime(as.Date(paste(1, x), format="%d %b_%y"), "%b %Y")
# [1] "May 2008"
You can also use lubridate.
x <- "May_08"
library(lubridate)
d <- parse_date_time(x, "my")  # parse once, reuse below
paste(month(d, label = TRUE), year(d), sep = " ")
# [1] "May 2008"
If you know the cutoff that separates 20th-century from 21st-century years in your series, a simple ifelse statement will do. In this example the cutoff is 90, so two-digit values above 90 are treated as 19xx and the rest as 20xx:
yrs <- c(91,94,97,00,03,06,09,12,15,18,21)
yrs <- ifelse(yrs>90, yrs+1900, yrs+2000)
> print(yrs)
[1] 1991 1994 1997 2000 2003 2006 2009 2012 2015 2018 2021
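The same cutoff idea can be applied directly to strings such as May_08. The sketch below is only illustrative: the function name expand_two_digit_year and its pivot argument are hypothetical, and the input vector is made up. For comparison, the "%y" format code used by as.Date and strptime has a fixed rule of its own (00 to 68 become 20xx, 69 to 99 become 19xx), so an explicit cutoff only matters when that default does not suit the data.

```r
# Hypothetical helper: expand two-digit years around a chosen cutoff.
expand_two_digit_year <- function(yy, pivot = 90) {
  ifelse(yy > pivot, 1900 + yy, 2000 + yy)
}

x <- c("May_08", "Jun_97", "Jan_00")       # made-up sample values
mon <- sub("_.*", "", x)                   # text before the underscore
yy  <- as.integer(sub(".*_", "", x))       # two digits after the underscore

full_year <- expand_two_digit_year(yy)
paste(mon, full_year)
```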
I would like help with replicating a vlookup from Excel in R. I have two data tables of the following kind, but with several more rows and attributes. I have cut them down for the sake of simplicity:
FX <- data.table(Currency = c("USD","EUR","AUD"), Y2014 = c(2.13,3.45,1.8), Y2015 = c(2.16,3.48,1.7), Y2016 = c(2.19,3.49,1.6))
DATA <- data.table(Customer = c("Abc","Def","Ghi","Jkl","Mno"), Year = c(2013,2014,2015,2012,2018), CurrencyCode = c("AUD","USD","USD","EUR","USD"))
FX has a list of currencies as the rows and different years as columns denoting their exchange rate against a fixed currency (SEK) and DATA has some customer deals which were originally reported in that fixed currency (SEK).
I would like to add another attribute to DATA called, as an example, ConversionRate by first matching the Currency attribute in FX to CurrencyCode in DATA and then selecting the corresponding conversion rate for the year given in Year from DATA by matching it to the column Yxxxx in FX.
It would result in something like this -
data <- data.table(Customer = c("Abc","Def","Ghi","Jkl","Mno"), Year = c(2013,2014,2015,2012,2018), ConversionRate = c(1.8,2.13,2.16,3.45,2.19))
Please note that for Year < 2014, I would like it to pick up the rate of the corresponding currency in 2014 and for Year > 2016, I would like it to pick up the rate of the corresponding currency in 2016 as it has done for Row 1,4,5.
I have tried using loops, merge, and even a custom vlookup function but it seems I am going wrong when it comes to comparing the Year to the column names Yxxxx.
Any idea on how this can be achieved?
Thank you!
After melting FX to long format and converting the "Y2016" etc. values to numbers, you can do an update join to DATA with this fx_long. If you want to join on a year other than the year in the data, you can first create a new column join_year and join on that instead.
library(data.table)
fx_long <- melt(FX, 'Currency')[, Year := as.numeric(sub('Y', '', variable))]
DATA[, join_year := pmin(pmax(Year, 2014), 2016)]
DATA[fx_long, on = .(join_year = Year, CurrencyCode = Currency), ConversionRate := i.value]
DATA
# Customer Year CurrencyCode join_year ConversionRate
# 1: Abc 2013 AUD 2014 1.80
# 2: Def 2014 USD 2014 2.13
# 3: Ghi 2015 USD 2015 2.16
# 4: Jkl 2012 EUR 2014 3.45
# 5: Mno 2018 USD 2016 2.19
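For readers without data.table, the same clamp-then-look-up logic can be sketched in base R by treating FX as a matrix with currencies as row names and indexing it with (row name, column name) pairs. This is an alternative illustration rather than the method above; the object names rates and join_year are chosen here for the example.

```r
FX <- data.frame(Currency = c("USD", "EUR", "AUD"),
                 Y2014 = c(2.13, 3.45, 1.8),
                 Y2015 = c(2.16, 3.48, 1.7),
                 Y2016 = c(2.19, 3.49, 1.6),
                 stringsAsFactors = FALSE)
DATA <- data.frame(Customer = c("Abc", "Def", "Ghi", "Jkl", "Mno"),
                   Year = c(2013, 2014, 2015, 2012, 2018),
                   CurrencyCode = c("AUD", "USD", "USD", "EUR", "USD"),
                   stringsAsFactors = FALSE)

# Rates as a matrix: one row per currency, one column per year.
rates <- as.matrix(FX[, c("Y2014", "Y2015", "Y2016")])
rownames(rates) <- FX$Currency

# Clamp each deal year into the range covered by FX.
join_year <- pmin(pmax(DATA$Year, 2014), 2016)

# A two-column character matrix selects one (row name, column name) cell per deal.
DATA$ConversionRate <- rates[cbind(DATA$CurrencyCode, paste0("Y", join_year))]
DATA
```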
I'm confused about the way the paste() function is behaving. I have a dplyr table with the following columns:
Year Month DayofMonth
2001 May 21
2001 May 22
2001 June 9
2001 March 4
I'd like to combine these into a single column called "Date". I figured I'd use the command:
df2 = mutate(df, Date = paste(c(Year, Month, DayofMonth), sep = "-",))
Unfortunately, this seems to concatenate every element in Year, then every element in Month, then every element in DayofMonth so the result looks something like this:
2001-2001-2001-2001 ... May-May-June-March ... 21-22-9-4
How should I modify my command so that the paste function iterates over each row individually?
P.S. This is part of a DataCamp course, so I am running commands through whatever version of R they have on their server.
Currently you are concatenating all the columns together. Take c() out of your paste() call to paste them together element-by-element.
mutate(df, Date = paste(Year, Month, DayofMonth, sep = "-"))
# Year Month DayofMonth Date
# 1 2001 May 21 2001-May-21
# 2 2001 May 22 2001-May-22
# 3 2001 June 9 2001-June-9
# 4 2001 March 4 2001-March-4
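The pasted column is still plain text. If a real Date column is wanted, one possible sketch in base R is to translate the month names to numbers first; this assumes Month always holds full English month names, and the data frame below is rebuilt by hand from the question.

```r
df <- data.frame(Year = c(2001L, 2001L, 2001L, 2001L),
                 Month = c("May", "May", "June", "March"),
                 DayofMonth = c(21L, 22L, 9L, 4L),
                 stringsAsFactors = FALSE)

# month.name is a built-in vector of English month names; match() gives 1 to 12.
month_num <- match(df$Month, month.name)

# Build ISO "YYYY-MM-DD" strings row by row, then convert them to Date.
df$Date <- as.Date(sprintf("%d-%02d-%02d", df$Year, month_num, df$DayofMonth))
df
```

Like paste() without c(), sprintf() works element by element, so each row gets its own date.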
I am trying to do something which seems simple but is proving a bit of a challenge so I hope someone can help!
I have a time series of observations of temperature:
Lines <-"1971-01-17 298.9197
1971-01-17 298.9197
1971-02-16 299.0429
1971-03-17 299.0753
1971-04-17 299.3250
1971-05-17 299.5606
1971-06-17 299.2380
2010-07-14 298.7876
2010-08-14 298.5529
2010-09-14 298.3642
2010-10-14 297.8739
2010-11-14 297.7455
2010-12-14 297.4790"
DF <- read.table(textConnection(Lines), col.names = c("Date", "Value"))
DF$Date <- as.Date(DF$Date)
mean.ts <- aggregate(DF["Value"], format(DF["Date"], "%m"), mean)
This produces:
> mean.ts
Date Value
1 01 1.251667
2 02 1.263333
This is just an example; my real data covers many years, so I can calculate a full monthly average.
What I then want to do is calculate the difference between each individual January and the mean January calculated above.
If I moved away from the Date/Time classes I could do this with some loops, but is there a "neat" way to do this in R? Any ideas?
You can just add the year as an aggregating variable. This is easier using the formula interface:
> aggregate(Value~format(Date,"%m")+format(Date,"%Y"),data=DF,mean)
format(Date, "%m") format(Date, "%Y") Value
1 01 1971 298.9197
2 02 1971 299.0429
3 03 1971 299.0753
4 04 1971 299.3250
5 05 1971 299.5606
6 06 1971 299.2380
7 07 2010 298.7876
8 08 2010 298.5529
9 09 2010 298.3642
10 10 2010 297.8739
11 11 2010 297.7455
12 12 2010 297.4790
As I understand your question, you want the difference between each month's value and the mean of all values for that calendar month, so you probably want to use ave rather than aggregate:
diff.mean.ts <- ave(DF[["Value"]],
list(format(DF[["Date"]], "%m")), FUN=function(x) x-mean(x) )
If you wanted it in the same dataframe, then just assign it as a column:
DF$diff.mean.ts <- diff.mean.ts
The ave function is well suited to adding columns to existing data frames because it returns a vector of the same length as its first argument, in this case DF[["Value"]]. In the present instance it returns all 0's, which is the correct answer because each calendar month has only one distinct value in the sample data.
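With more than one year per calendar month the differences are no longer zero. A small sketch with made-up values (not the poster's data) shows the pattern; the column names month_mean and anomaly are chosen here for illustration.

```r
# Made-up values: two Januaries and two Februaries.
DF <- data.frame(
  Date  = as.Date(c("1971-01-17", "1972-01-17", "1971-02-16", "1972-02-16")),
  Value = c(298, 300, 299, 303)
)

# ave() defaults to FUN = mean and returns one value per row, grouped by month.
DF$month_mean <- ave(DF$Value, format(DF$Date, "%m"))

# Difference of each observation from the long-run mean of its calendar month.
DF$anomaly <- DF$Value - DF$month_mean
DF
```

Keeping the monthly mean as its own column makes it easy to check the subtraction row by row.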