Currently I have a timestamp column in time format with values such as:
1041592
1040583
1048448
When I apply the datetime18. format to this column, I get the following values:
27OCT78:16:21:21
15SEP78:16:21:21
09AUG79:08:58:39
Could you please help me see where I am going wrong? I want the column to be displayed in date and time format, but currently I am getting the values above.
You need to show your work. I don't get the values you say you get.
data _null_;
   input dt @@;
   put 'NOTE: ' (3*dt) (=best. =datetime. =time12.);
cards;
1041592 1040583 1048448
;;;;
NOTE: dt=1041592 dt=13JAN60:01:19:52 dt=289:19:52
NOTE: dt=1040583 dt=13JAN60:01:03:03 dt=289:03:03
NOTE: dt=1048448 dt=13JAN60:03:14:08 dt=291:14:08
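The log above is what you would expect if the numbers are counted in seconds from midnight on 01JAN1960, which is how SAS stores datetime values. The arithmetic can be checked outside SAS. Here is a minimal sketch in R, assuming the raw values really are seconds; the variable names are made up for illustration.

```r
# Assumption: the raw values are seconds since 1960-01-01 00:00:00.
sas_seconds <- c(1041592, 1040583, 1048448)

# Read the counts as date-times in UTC (what the datetime. format shows).
as_datetime <- as.POSIXct(sas_seconds, origin = "1960-01-01", tz = "UTC")
datetime_text <- format(as_datetime, "%Y-%m-%d %H:%M:%S")
print(datetime_text)

# Read the same counts as a duration (what the time12. format shows):
# whole hours, then the leftover minutes and seconds.
hours <- as.integer(sas_seconds %/% 3600)
time_text <- sprintf("%d:%s", hours, format(as_datetime, "%M:%S"))
print(time_text)
```

The first value comes out as 13 January 1960 at 01:19:52, or 289:19:52 as a duration, which matches the log.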
I have two tables: One with participants and one with an encoding of scores based on birth dates.
The score table looks like this:
score_table
Key | Value
--------------------
01/01/1900 | 15
01/01/1940 | 25
01/01/1950 | 30
All participants with birth dates between 01/01/1900 and 01/01/1940 should get a score of 15.
Participants born between 01/01/1940 and 01/01/1950 should get a score of 25, etc.
My participants' table looks like this:
participant_table
BirthDate | Gender
-----------------------
05/05/1930 | M
02/07/1954 | V
01/11/1941 | U
I would like to add a score to get the output table:
BirthDate | Gender | Score
------------------------------------
05/05/1930 | M | 15
02/07/1954 | V | 30
01/11/1941 | U | 25
I built several solutions for similar problems when the exact values are in the score table (using dplyr::left_join or base::match) or for numbers which can be rounded to another value. Here, the intervals are irregular and the dates arbitrary.
I know I can build a solution by iterating through the score table, using this method:
as.Date("05/05/1930", format="%d/%m/%Y") < as.Date("01/01/1900", format="%d/%m/%Y")
Which returns a Boolean and thus allows me to walk through the scores until I find a date which is bigger and then use the last score. However, there must be a better way to do this.
Maybe I can create some sort of bins from the dataframe, as such:
Bin 1 | Bin 2 | Bin 3
Date 1 : Date 2 | Date 2 : Date 3 | Date 3 : inf
But I don't yet see how. Does anyone see an efficient way to create such bins from a dataframe, so that I can efficiently retrieve scores from this table?
MRE:
Score table:
structure(list(key=c("1/1/1900", "2/1/2013", "2/1/2014","2/1/2015", "4/1/2016", "4/1/2017"), value=c(65,65,67,67,67,68)), row.names=1:6, class="data.frame")
Participant File:
structure(list(birthDate=c("10/10/1968", "6/5/2015","10/10/2017"), Gender=c("M", "U", "F")), row.names=1:3, class="data.frame")
Goal File:
structure(list(birthDate=c("10/10/1968", "6/5/2015","10/10/2017"), Gender=c("M", "U", "F"), Score = c(65,67,68)), row.names=1:3, class="data.frame")
Here is an approach using lead() from dplyr along with sqldf:
library(dplyr)   # for lead()
library(sqldf)   # for running SQL against data frames
# Date2 is the start of the next interval; it is NA for the last row
score_table$Key2 <- as.Date(lead(score_table$Key), format="%d/%m/%Y")
score_table$Key <- as.Date(score_table$Key, format="%d/%m/%Y")
names(score_table) <- c("Date1", "Value", "Date2")
participant_table$BirthDate <- as.Date(participant_table$BirthDate, format="%d/%m/%Y")
# Match each birth date to the interval [Date1, Date2)
sql <- "SELECT p.BirthDate, p.Gender, s.Value AS Score
FROM participant_table p
INNER JOIN score_table s
ON (p.BirthDate >= s.Date1 OR s.Date1 IS NULL) AND
(p.BirthDate < s.Date2 OR s.Date2 IS NULL)"
participant_table <- sqldf(sql)
The logic here is to join the participant to the score table using a range of matching dates in the latter. For the edge cases of the first and last rows of the score table, we allow a missing date in either column to represent any date whatsoever. For example, in the last row of the score table, the only requirement for a match is that a date be greater than the lower portion of the range.
I actually do not have R running locally at the moment, but here is a demo link to SQLite showing that the SQL logic works correctly:
Demo
I have found a very simple solution using only arithmetic.
To retrieve a score, I count how many keys in the score table the input date is later than:
rownum <- sum(as.Date(input_date, format="%d/%m/%Y") >
as.Date(score_table$Key, format="%d/%m/%Y"))
Then, the corresponding score can be found using:
score <- score_table[["Value"]][rownum]
Thus, the spacing of the dates becomes irrelevant and it works quite fast.
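The same counting idea can be applied to a whole column at once with base R's findInterval(). The sketch below uses the MRE data from the question and assumes the dates are day/month/year and the keys are already in ascending order. Note one difference: findInterval() counts keys that are less than or equal to the date, so a birth date that falls exactly on a key gets the later score, whereas the `>` comparison above gives it the earlier one.

```r
# MRE data from the question.
score_table <- data.frame(
  key = c("1/1/1900", "2/1/2013", "2/1/2014", "2/1/2015", "4/1/2016", "4/1/2017"),
  value = c(65, 65, 67, 67, 67, 68),
  stringsAsFactors = FALSE
)
participant_table <- data.frame(
  birthDate = c("10/10/1968", "6/5/2015", "10/10/2017"),
  Gender = c("M", "U", "F"),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year dates, keys sorted ascending.
keys <- as.numeric(as.Date(score_table$key, format = "%d/%m/%Y"))
births <- as.numeric(as.Date(participant_table$birthDate, format = "%d/%m/%Y"))

# For each birth date, count how many keys are <= that date.
idx <- findInterval(births, keys)

# An index of 0 means the date is before the first key; give those NA.
idx[idx == 0] <- NA
participant_table$Score <- score_table$value[idx]
print(participant_table)
```

With this data the scores come out as 65, 67 and 68, the same as the goal file.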
I thought I'd share my solution in case it might be of use.
Thanks everyone for the effort and responses!
I am trying to write a script that loops through month-end dates and compares associated fields, but I am unable to find a way to do this.
I have my data in a flatfile and subset based on 'TheDate'
For instance I have:
date.range <- subset(raw.data, observation_date == theDate)
Say TheDate = 2007-01-31
I want to find the next month included in my data flatfile which is 2007-02-28. How can I reference this in my loop?
I currently have:
date.range.t1 <- subset(raw.data, observation_date == theDate+1)
This obviously doesn't work, as my data is not daily.
EDIT:
To make it more clear, my data is like below
ticker observation_date Price
ADB 31/01/2007 1
ALS 31/01/2007 2
ALZ 31/01/2007 3
ADB 28/02/2007 2
ALS 28/02/2007 5
ALZ 28/02/2007 1
I am using a loop, so I want to skip from 31/01/2007 to 28/02/2007 by recognising that it is the next date in the data, and use that value to subset my data.
First get unique values of date like so:
unique_dates<-unique(raw.data$observation_date)
Then sort these unique dates:
unique_dates_ordered<-unique_dates[order(as.Date(unique_dates, format="%Y-%m-%d"))]
Now you can subset based on the index of unique_dates_ordered i.e.
subset(raw.data, raw.data$observation_date == unique_dates_ordered[i])
Where i = 1 for the first value, i = 2 for the second value etc.
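Put together, the loop might look something like the sketch below. It uses the sample rows from the question and assumes the dates are stored as day/month/year strings; the merge on ticker and the `change` column are only an illustration of comparing a date with the next one, not something from the original post.

```r
# Sample rows from the question.
raw.data <- data.frame(
  ticker = c("ADB", "ALS", "ALZ", "ADB", "ALS", "ALZ"),
  observation_date = c("31/01/2007", "31/01/2007", "31/01/2007",
                       "28/02/2007", "28/02/2007", "28/02/2007"),
  Price = c(1, 2, 3, 2, 5, 1),
  stringsAsFactors = FALSE
)

# Assumption: day/month/year strings. Convert once, then sort the unique dates.
raw.data$observation_date <- as.Date(raw.data$observation_date, format = "%d/%m/%Y")
unique_dates_ordered <- sort(unique(raw.data$observation_date))

changes <- list()
# Stop one short of the end so that i + 1 always exists.
for (i in seq_len(length(unique_dates_ordered) - 1)) {
  date.range    <- subset(raw.data, observation_date == unique_dates_ordered[i])
  date.range.t1 <- subset(raw.data, observation_date == unique_dates_ordered[i + 1])

  # Illustrative comparison: price change per ticker between the two dates.
  both <- merge(date.range, date.range.t1, by = "ticker", suffixes = c(".t0", ".t1"))
  both$change <- both$Price.t1 - both$Price.t0
  changes[[i]] <- both
}
print(changes[[1]][, c("ticker", "Price.t0", "Price.t1", "change")])
```

Looping over the index rather than over the dates themselves keeps the Date class intact and makes the "next date" simply element i + 1.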
Values for multiple salesmen come back in a single report from a single stored procedure, as listed below:
Salesman1 | Months | Percentage
------------------------------------
M1 | 25 | 25%
M2 | 30 | 30%
M3 | 45 | 45%
Total | 100 | (how do I get the % value based on this total?)
Salesman2 | Months | Percentage
------------------------------------
M1 | 12.5 | 25%
M2 | 15 | 30%
M3 | 22.5 | 45%
Total | 50 |
I am calling some values from the procedure and adding them to the Crystal Report based on count. I need each value as a percentage of that particular salesman's total, but my parameter value brings back the counts for all salesmen, so I cannot calculate the percentage from the parameter value. Instead of changing my procedure, is there any way to get the percentage within the given group-by value?
My stored procedure parameters are Months and Salesmen.
Thanks in advance.
In Crystal Reports you can create a formula field. Once you open the Formula Editor, there is a built-in function, PercentOfSum, which you can use; then drop this field beside your month. It will give you the percentage for each value based on its group total.
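To make the arithmetic of a percentage within a group explicit, here is a small sketch in R rather than Crystal syntax. It uses the figures from the question; the column names Salesman, Month and Count are assumptions made for the illustration.

```r
# Figures from the question; column names are assumed for illustration.
sales <- data.frame(
  Salesman = rep(c("Salesman1", "Salesman2"), each = 3),
  Month = rep(c("M1", "M2", "M3"), times = 2),
  Count = c(25, 30, 45, 12.5, 15, 22.5),
  stringsAsFactors = FALSE
)

# ave() repeats each salesman's total on every row of that salesman.
group_total <- ave(sales$Count, sales$Salesman, FUN = sum)

# Percentage of the row's value relative to its own group total.
sales$Percentage <- 100 * sales$Count / group_total
print(sales)
```

Both salesmen end up with 25%, 30% and 45%, because each value is divided by its own group's total (100 and 50) rather than by the overall total.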
Thank you for the help. I am attempting to write an equation that uses values selected from a .csv file. It looks something like this; let's call it df.
df<-read.csv("SiteTS.csv", header=TRUE,sep=",")
df
Site TS
1 H4A1 -42.75209
2 H4A2 -43.75101
3 H4A3 -41.75318
4 H4C3 -46.76770
5 N1C1 -42.68940
6 N1C2 -36.95200
7 N1C3 -43.16750
8 N2A2 -38.58040
9 S4C1 -35.32000
10 S4C2 -34.52420
My equation requires the value in the TS column for each site. I am attempting to create a new column called SigmaBS with the results of the equation using TS.
df["SigmaBS"] <- 10^(subset(df, Site=="H4A1")/10)
This is where I am running into issues, as the subset function returns every column of the row where Site equals H4A1:
subset(df, Site =="H4A1")
Site TS
1 H4A1 -42.75209
But again, I only need the value -42.75209.
I apologize if this is a simple question, but I would very much appreciate any help you may be able to offer.
If you insist on using the subset function, it has a select argument:
subset(df, Site=="H4A1", select="TS")
A better option is to use [] notation:
df[df$Site=="H4A1", "TS"]
Or the $ operator:
subset(df, Site=="H4A1")$TS
You can use this simple command:
df$SigmaBS <- 10 ^ (df$TS / 10)
It sounds like you're trying to create a new column called SigmaBS where the value in each row is 10^(value of TS / 10).
If so, this code should work:
SigmaBS <- sapply(df$TS, function(x) 10^(x/10))
df$SigmaBS <- SigmaBS
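For what it's worth, here is a small self-contained check on the sample data from the question, showing that the vectorised form and the sapply form give the same column, and how a single site's value can be pulled out as a plain number. The names sigma_h4a1 and same are made up for the example.

```r
# Sample data from the question, typed in instead of read from SiteTS.csv.
df <- data.frame(
  Site = c("H4A1", "H4A2", "H4A3", "H4C3", "N1C1",
           "N1C2", "N1C3", "N2A2", "S4C1", "S4C2"),
  TS = c(-42.75209, -43.75101, -41.75318, -46.76770, -42.68940,
         -36.95200, -43.16750, -38.58040, -35.32000, -34.52420),
  stringsAsFactors = FALSE
)

# Vectorised: arithmetic on a column applies to every row at once.
df$SigmaBS <- 10^(df$TS / 10)

# Single site: [] indexing returns the bare number, not a data frame.
sigma_h4a1 <- 10^(df[df$Site == "H4A1", "TS"] / 10)

# The sapply version computes the same values, one element at a time.
same <- isTRUE(all.equal(df$SigmaBS, sapply(df$TS, function(x) 10^(x / 10))))
print(df)
```

Since `^` and `/` are already vectorised in R, the sapply call is not needed here; it is just a longer way of writing the same thing.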