Two data frames: A, companies with their listing dates; B, daily trading data.
Problem: merging by incrementing each listing date by one year creates NA values, because some of the resulting dates fall on weekends or holidays. I need to find the trading dates nearest the one-year mark.
Any ideas?
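One approach is a rolling join, which snaps each one-year target date to the nearest trading date that actually exists in B. A minimal sketch with data.table, where the column names company, listing_date, and trade_date are assumptions; adjust them to your data:
library(data.table)
A <- as.data.table(A)
B <- as.data.table(B)
# Target date: one calendar year after listing (may land on a weekend/holiday)
A[, target_date := listing_date + 365]
# Rolling join: for each target_date, take the nearest trade_date present
# in B, so weekend/holiday targets no longer produce NA rows
result <- B[A, on = .(company, trade_date = target_date), roll = "nearest"]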
Related
I have a dataset covering every day of 2015. Each row is an action that happened on a given day; some days have more actions than others, so some days have many more entries than others.
I am trying to write a function that creates an individual dataset per day of the year, without having to code 365 of these:
df <- subset(dataset, date == "2015-01-01")
I have looked at dplyr's group_by(), but I do not want a per-day summary; it is important that I see every observation on a given day, for graphing purposes.
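Rather than 365 subset() calls, base R's split() can do this in one step. A minimal sketch, assuming the column is named date as in the subset() call above:
# One data frame per day, as a named list; every row is kept,
# unlike a group_by() summary
daily <- split(dataset, dataset$date)
daily[["2015-01-01"]]   # all observations for 1 January 2015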
I have longitudinal data in a long-format data frame in R, such that a person can appear on several rows, each row with a specific date, but never the same date twice. The data are sorted first by personal ID and second by date, so earlier dates for an individual come first.
Following is what I would like to accomplish:
The first date for each individual should be kept. For the remaining dates, I want to remove every date occurring within 30 days of a previously kept date for that person; once a row is removed, no later dates should be compared against it. The dates should be processed in order, from top to bottom. For example, if a person has the dates 14 May 2020, 20 May 2020, 22 May 2020, and 17 June 2020, I would like to remove the rows with the two middle dates, as they are close to the first date, 14 May 2020. I have been able to do this with for loops, but that is far too slow for big data. Does anybody know how I could solve this in a better way?
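One way to avoid the row-by-row loop is to carry the last kept date through the vector with Reduce(), then keep only the rows whose date matches what the scan kept. A minimal sketch with dplyr, assuming columns named id and date (both names are assumptions) and data already sorted as described:
library(dplyr)
# TRUE for dates kept by the greedy scan: a date survives only if it is
# more than 30 days after the most recently kept date
keep_spaced <- function(d) {
  d <- as.numeric(d)                    # days since epoch
  kept <- Reduce(function(last, x) if (x - last > 30) x else last,
                 d, accumulate = TRUE)
  d == kept
}
df_filtered <- df %>%
  group_by(id) %>%
  filter(keep_spaced(date)) %>%
  ungroup()
For the example dates above, keep_spaced() keeps 14 May and 17 June 2020 and drops the two middle dates, without ever comparing later rows against a removed date.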
I hope we're all doing great
I have several decades of daily rainfall data from several monitoring stations. The data all begin at separate dates. I have combined them into a single data frame with the date in the first column and the rainfall depth in the second column. I want to sort the variable 'Total' by the variable 'Date and Time' (please see the links below).
ms1 <- read.csv('ms1.csv')
ms2 <- read.csv('ms2.csv')
# ...and so on for each monitoring station
df <- merge(ms1, ms2, by = "Date and Time")  # extended with each further station
The problem is that the range of dates differs for each monitoring station (CSV file). There may also be missing dates within a range. Is there a way around this?
Would I have to create a separate vector with the greatest possible date range? Or would it automatically detect the earliest start date from the imported data?
[link: data for monitoring station 1 (ms1)]
[link: data for monitoring station 2 (ms2)]
Note: the data continues to the current date
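One way around it is a full (outer) join: merge() with all = TRUE keeps every date that appears in any station and fills NA where a station has no reading, so no master date vector is needed. A minimal sketch; the file names are placeholders, each CSV is assumed to have the two columns described, and note that read.csv() turns the header "Date and Time" into "Date.and.Time":
files <- c("ms1.csv", "ms2.csv")   # add the remaining stations here
stations <- lapply(seq_along(files), function(i) {
  d <- read.csv(files[i])
  names(d) <- c("Date.and.Time", paste0("Total_ms", i))  # unique rainfall column per station
  d
})
# merge() joins only two frames at a time, so fold it across the list
df <- Reduce(function(x, y) merge(x, y, by = "Date.and.Time", all = TRUE),
             stations)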
I am working on the stock markets of two different nations, namely China and the US. I used the "quantmod" library in R to import daily historical prices from Yahoo Finance. My sample runs from 01 Jan 2010 to 31 Mar 2015, but the two nations observe holidays on different dates, and the stock markets are closed on those days. Hence, I have a different number of rows of data and cannot apply the GARCH model to these values. For example, the Chinese stock market has 1267 rows (one column) and the US market has 1303 rows (one column).
Now my question is: how can I make a data frame with only the matching dates, deleting/skipping the values whose dates differ?
My code and the error in R are given below.
library("rugarch")
library("rmgarch")
library("quantmod")
startdate <- as.Date("2010-01-01")
enddate <- as.Date("2015-03-31")
getSymbols("^SSEC", from = startdate, to = enddate)
getSymbols("^GSPC", from = startdate, to = enddate)
rsse <- dailyReturn(SSEC$SSEC.Close)   # calculate returns
rgspc <- dailyReturn(GSPC$GSPC.Close)  # calculate returns
returns <- data.frame(rsse, rgspc)     # data frame with both market returns
**Error**
Error in data.frame(rsse, rgspc) :
arguments imply differing number of rows: 1267, 1303
You should do an inner join on the two data frames. Each data frame needs to have a date and the price for that day. I don't know the structure of your data frames, but something like:
dplyr::inner_join(SSEC, GSPC, by='my.date.variable')
or if the two dataframes have different names for their date variables, for example SSEC_date and GSPC_date:
dplyr::inner_join(SSEC, GSPC, by=c('SSEC_date' = 'GSPC_date'))
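Alternatively, since dailyReturn() returns xts objects indexed by date, merge.xts can do the inner join directly on the return series from the question:
# Keeps only the dates present in both markets, dropping
# country-specific holidays
returns <- merge(rsse, rgspc, join = "inner")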
I have a data set with sales by date, where date is not unique and not all dates are represented: my data set has dates (the date of the sale), quantity, and totalprice. This is an irregular time series.
What I'd like is a vector of sales by date, with every date represented exactly once, and quantities and totalprice summed by date, with zeros where there are no sales.
I have part of this now; I can make a sequence containing all dates:
first_date <- as.Date(min(dates))
last_date <- as.Date(max(dates))
all_dates <- seq(first_date, by = 1, to = last_date)
And I can aggregate the sales data by sale date:
quantitybydate <- aggregate(quantity, by = list(as.Date(dates)), sum)
But I'm not sure what to do next. If this were Python, I'd loop through one of the date arrays, setting or getting the related quantity. But this being R, I suspect there's a better way.
Make a data frame with all_dates as a column, then merge it with quantitybydate, using the grouping column as by.y and all.x = TRUE. Then replace the NAs with 0.
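A minimal sketch of that recipe, continuing from the code in the question (the names all_days and sales_by_day are introduced here for illustration):
names(quantitybydate) <- c("date", "quantity")   # aggregate() returned Group.1 and x
all_days <- data.frame(date = all_dates)
# Left join: keep every date in the full sequence, NA where no sales
sales_by_day <- merge(all_days, quantitybydate, by = "date", all.x = TRUE)
# Zero-fill the dates with no sales
sales_by_day$quantity[is.na(sales_by_day$quantity)] <- 0
totalprice can be aggregated and joined the same way.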