My reference file has 500 rows and 11 columns (years 2007 to 2017), with a date as the value in each column.
I have to create a dummy data frame of 5000 rows and 11 columns (years 2007 to 2017). I want to fill it with random dates within one month of the dates in the reference file. I think the following line would create a random date within a month:
reviewdate$x2007_1 <- as.Date(reviewdate$X2007, format = "%d/%m/%Y") + sample((-15:15), 1)
I need to write a for loop over selected columns of my data frame so that I can create random dates for the whole 2007 to 2017 period.
My second question: the reference file has only 500 records for each year, but I want 5000 records for each year. How can I generate random dates for 5000 rows per year from a reference file that has 500 rows per year?
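One possible approach, sketched under assumptions: draw 5000 reference dates per column with replacement, then give each row its own offset. Note that `sample((-15:15), 1)` returns a single number, so every row in the column would be shifted by the same amount. The `reviewdate` data below is a made-up stand-in for the real reference file, and the names `n_out` and `dummy` are only for illustration.

```r
set.seed(42)  # only to make the sketch reproducible

# Hypothetical stand-in for the reference file: 500 rows, one text column per year.
years <- 2007:2017
reviewdate <- as.data.frame(
  setNames(
    lapply(years, function(y) {
      d <- as.Date(paste0(y, "-06-15")) + sample(-100:100, 500, replace = TRUE)
      format(d, "%d/%m/%Y")
    }),
    paste0("X", years)
  ),
  stringsAsFactors = FALSE
)

n_out <- 5000
dummy_cols <- list()

for (col in names(reviewdate)) {
  ref <- as.Date(reviewdate[[col]], format = "%d/%m/%Y")
  # Draw 5000 reference dates with replacement from the 500 available ...
  base <- ref[sample.int(length(ref), n_out, replace = TRUE)]
  # ... and give every row its own offset between -15 and +15 days.
  offset <- sample(-15:15, n_out, replace = TRUE)
  dummy_cols[[col]] <- base + offset
}

dummy <- as.data.frame(dummy_cols)
```

To process only selected columns, the loop could run over a subset such as `c("X2007", "X2010")` instead of `names(reviewdate)`.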
I have a data frame (df) with columns (ID), (Adm_Date), (ICD_10) and (points). It has 1,000,000 rows.
(points) is the value assigned to each (ICD_10) code.
(ID): each ID has many rows.
(Adm_Date) runs from 2010-01-01 to 2018-01-01.
For each (ID), I want the sum of (points), without counting duplicate (ICD_10) codes, over the rows that fall between (Adm_Date) and 2 years before (Adm_Date).
The periods look like this:
01-01-2010 to 31-01-2012,
01-02-2010 to 29-02-2012,
01-03-2010 to 31-03-2012, and so on, up to the last period, 01-12-2016 to 31-12-2018.
My problem is with filtering the dates. The rows are not filtered by period: (points) is summed for each (ID), without duplicates, over all data from 2010 to 2018 instead of per period for each (ID).
I used this code:
start.date= seq(as.Date (df$Adm_Date))
end.date = seq(as.Date (df$Adm_Date+ years(-2)))
Sum_df<- df %>% dplyr::filter(Adm_Date >=start.date & Adm_Date<=end.date) %>%
group_by(ID) %>%
mutate(sum_points = sum(points*!duplicated(ICD_10)))
But the filter did not work: it sums (points) for each (ID) over all dates from 2010 to 2018 instead of summing them per period for each (ID).
sum_points should start from 01-01-2012: I need the sum for any Adm_Date >= 01-01-2012.
Take the patient with ID = 11. I would sum points from row 3 to row 23, ignoring repeated ICD_10 codes (e.g. G81 and I69 are repeated in this period), so the result looks like this:
ID(11), Adm_Date(07-05-2012), sum_points(17). For the same patient at Adm_Date(13-06-2013), I would sum from row 11 to row 27, because the look-back is 2 years from Adm_Date. So:
ID(11), Adm_Date(13-06-2013), sum_points(14.9)
I have about half a million IDs and more than a million rows.
I hope I explained it well. Thank you.
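A minimal base R sketch of a per-row, two-year look-back sum, assuming the window for each row is [Adm_Date minus 2 years, Adm_Date] within the same ID and that a repeated ICD_10 code counts once per window. The data and the helper names `two_years_back` and `window_sum` are hypothetical, and leap-day handling is not addressed here.

```r
# Hypothetical example data; the real df has the same four columns.
df <- data.frame(
  ID = c(11, 11, 11, 11, 12, 12),
  Adm_Date = as.Date(c("2010-03-01", "2011-06-15", "2012-02-07", "2013-06-13",
                       "2011-01-10", "2012-02-20")),
  ICD_10 = c("G81", "I69", "G81", "E11", "I10", "I10"),
  points = c(4.4, 2.6, 4.4, 1.2, 0.5, 0.5),
  stringsAsFactors = FALSE
)

# Shift each date back by two calendar years.
two_years_back <- function(d) {
  lt <- as.POSIXlt(d)
  lt$year <- lt$year - 2
  as.Date(lt)
}

# For one ID: for every admission, sum points of distinct ICD_10 codes
# seen in the two years up to and including that admission.
window_sum <- function(dates, codes, pts) {
  start <- two_years_back(dates)
  vapply(seq_along(dates), function(i) {
    in_win <- dates >= start[i] & dates <= dates[i]
    sum(pts[in_win][!duplicated(codes[in_win])])
  }, numeric(1))
}

df$sum_points <- NA_real_
for (id in unique(df$ID)) {
  rows <- which(df$ID == id)
  df$sum_points[rows] <- window_sum(df$Adm_Date[rows], df$ICD_10[rows],
                                    df$points[rows])
}
```

The key difference from a single `filter()` call is that the window is recomputed for every row rather than once for the whole table. This loop is quadratic within each ID, so with half a million IDs a grouped or join-based implementation would probably be needed for speed.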
My goal is to manage both variables and their values in a spreadsheet.
Basically, I want to be able to add the values for a new year in a new column and load them into R.
I then want to assign to each variable named in the first column the corresponding value from either the second or the third column.
Input spreadsheet:
Variable   Year2013               Year2018
age        12                     17
pets       c(cat,dog,elephant)    c(dog,mouse)
cars       cars$name              cars$name
Desired Output:
For year 2013
import("dataspreadsheet.csv")
derived from this -->
age <- 12
pets <- c(cat,dog,elephant)
cars <- cars$name
Is there any way to tell R to make this assignment?
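One way this might be done is to treat each cell as R code, evaluate it, and `assign()` the result to the name in the first column. The sketch below assumes the cells contain valid R expressions (so strings are quoted, e.g. `c('cat', 'dog')`), uses an inline data frame in place of the imported file, and introduces a hypothetical helper `assign_from_sheet`. A cell such as `cars$name` would work the same way, provided an object called `cars` already exists where the code is evaluated.

```r
# Hypothetical stand-in for the imported spreadsheet; every cell is text.
sheet <- data.frame(
  Variable = c("age", "pets", "n_pets"),
  Year2013 = c("12", "c('cat', 'dog', 'elephant')", "length(pets)"),
  Year2018 = c("17", "c('dog', 'mouse')", "length(pets)"),
  stringsAsFactors = FALSE
)

# Evaluate each cell of the chosen year column and assign the result
# to the variable named in the first column.
assign_from_sheet <- function(sheet, year_col, envir = globalenv()) {
  for (i in seq_len(nrow(sheet))) {
    value <- eval(parse(text = sheet[[year_col]][i]), envir = envir)
    assign(sheet$Variable[i], value, envir = envir)
  }
  invisible(sheet$Variable)
}

# Use a separate environment here so the global workspace stays untouched.
env2013 <- new.env()
assign_from_sheet(sheet, "Year2013", envir = env2013)
```

Because `eval(parse())` runs whatever text is in the cell, this should only be used with spreadsheets from a trusted source.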
I want to create a new variable called REF_YEARCPI that aggregates the CPI values for all 12 months of each year. The table has a variable called REF_MONTHCPI, which I need to turn into an annual variable (REF_YEARCPI) that aggregates the 12 monthly CPI values within the year. In the image, there are 2 columns: REF_MONTHCPI stores the monthly reference periods and CPI_RESTAURANT stores the CPI for the month.
I don't know the name of your data frame, so I will assume it is df.
df$REF_YEARCPI <- df$REF_MONTHCPI * 12
You can replace df in the above code with the name of your dataframe.
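If REF_MONTHCPI holds a period label rather than a number, as the question describes, a grouped aggregation of CPI_RESTAURANT may be closer to what is wanted. The sketch below assumes REF_MONTHCPI is text in "YYYY-MM" form (the actual format is not shown) and uses the mean as the yearly aggregate; both are assumptions, and the data is made up.

```r
# Hypothetical monthly data: two years, twelve months each.
df <- data.frame(
  REF_MONTHCPI = c(sprintf("2019-%02d", 1:12), sprintf("2020-%02d", 1:12)),
  CPI_RESTAURANT = c(rep(100, 12), rep(110, 12)),
  stringsAsFactors = FALSE
)

# Take the year part of the monthly reference period.
df$REF_YEARCPI <- substr(df$REF_MONTHCPI, 1, 4)

# One row per year, combining the twelve monthly values.
yearly <- aggregate(CPI_RESTAURANT ~ REF_YEARCPI, data = df, FUN = mean)
```

Replacing `FUN = mean` with `FUN = sum` would give a yearly total instead of an average.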
I'm trying to figure out how I can add something to the name of a data frame df, based on a variable (i.e. a date), ending up with a data frame named df_17 if the variable is equal to 2017, for example.
The reason why I want this is because I'm importing datasets from several years and quarters, and I would like to make sure that they are named according to the year variable they have. Each dataset only has 1 date. I know I can do it manually but it would take me less time to automate it.
I know how to do it with columns and rows, but I can't figure it out for objects.
EDIT:
Example 1:
Data frame name "df"
A B Date
1 4 2017
2 3 2017
New data frame name "df_2017"
Example 2:
Data frame name "df"
A B Date
1 4 2016
2 3 2016
New data frame name "df_2016"
The assign function should do what you want. A solution could look like
assign(paste0("df_", year), dataframe_read_from_file, pos = 1)
If you use assign inside a function or a loop, make sure that you set the pos argument correctly.
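A small sketch of that idea, using the first example from the question. It assumes every data set contains exactly one distinct value in Date; `pos = 1` makes the assignment in the global environment.

```r
# Example 1 from the question.
df <- data.frame(A = c(1, 2), B = c(4, 3), Date = c(2017, 2017))

# Each data set is expected to hold a single year.
year <- unique(df$Date)
stopifnot(length(year) == 1)

# Creates an object called df_2017 in the global environment.
assign(paste0("df_", year), df, pos = 1)

# Alternative: keep the data sets in a named list instead of separate objects.
datasets <- list()
datasets[[paste0("df_", year)]] <- df
```

The named list is often easier to loop over later than a set of separately named objects.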
I'm trying to merge together two data-frames by matching their Tickers. However, in my first data-frame, some companies are listed with two tickers (as below)
Company Ticker Returns
1800-Flowers FLWS, FWC .01
First Busey Corp BUSE .02
First Bancshare Inc FBSI 0
In the second data frame, there is only one ticker.
Ticker Other Info
FLWS 50
BUSE 60
FBSI 20
How can I merge these two files together in R so that it will recognize that the data from FLWS belongs to the first row in data frame 1 because it contains the Ticker?
Please note that in data frame 1, most companies have only 1 ticker listed, many have 2, and some companies have 3 or 4 tickers.
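One possible approach is to split the multi-ticker cells so that each ticker gets its own row, and then merge as usual. The sketch below rebuilds the two example tables by hand; the object names `returns`, `info`, `long` and the column name `OtherInfo` are assumptions, and tickers are assumed to be separated by commas.

```r
returns <- data.frame(
  Company = c("1800-Flowers", "First Busey Corp", "First Bancshare Inc"),
  Ticker = c("FLWS, FWC", "BUSE", "FBSI"),
  Returns = c(0.01, 0.02, 0),
  stringsAsFactors = FALSE
)
info <- data.frame(
  Ticker = c("FLWS", "BUSE", "FBSI"),
  OtherInfo = c(50, 60, 20),
  stringsAsFactors = FALSE
)

# Split "FLWS, FWC" into separate tickers (works for 1 to 4 tickers per cell).
tickers <- strsplit(returns$Ticker, ",\\s*")

# Repeat each company row once per ticker, then fill in the single tickers.
long <- returns[rep(seq_len(nrow(returns)), lengths(tickers)), ]
long$Ticker <- trimws(unlist(tickers))

# Inner join: rows for tickers missing from the second table (here FWC) drop out.
merged <- merge(long, info, by = "Ticker")
```

If a company had two tickers that both appear in the second table, it would show up twice in `merged`, so a de-duplication step might then be needed.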