My data file has no column separators or headers, and the first row looks like this:
AB365960091120112011311260000005311300000001ES020000040036ES1400N
I know that characters 1 to 8 are the ID, 9 to 15 the year of birth, 16 to 28 the year of death, and so on. How can I split each row into table columns by character position? For example, how do I tell R that ID = characters 1 to 8?
I want my table to look like this:
ID       birth date  death date
AB36596  9112011     201131126
You can use read_fwf from the readr package.
library(readr)
library(dplyr)
df <- read_fwf(file = "test.txt", fwf_widths(c(9, 7, 9))) %>%
  `colnames<-`(c("id", "birth date", "death date"))
df
Output is:
  id        `birth date` `death date`
1 AB3659600 9112011      201131126
Sample data (contents of test.txt):
AB365960091120112011311260000005311300000001ES020000040036ES1400N
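The widths above fold the two filler characters at positions 8 and 9 into id, which is why it reads AB3659600 rather than the AB36596 asked for. A minimal base R sketch, assuming those two characters really are padding to be dropped, uses read.fwf from utils, where a negative width skips that many characters. The temporary file and the column names are illustrative stand-ins for test.txt and your own names.

```r
# Write the sample row to a temporary file (a stand-in for test.txt).
tmp <- tempfile(fileext = ".txt")
writeLines("AB365960091120112011311260000005311300000001ES020000040036ES1400N", tmp)

# Widths: 7 characters for ID, skip 2, 7 for birth date, 9 for death date.
# Assumption: characters 8-9 are filler; the rest of the line is not read here.
fixed <- read.fwf(
  tmp,
  widths     = c(7, -2, 7, 9),
  col.names  = c("ID", "birth_date", "death_date"),
  colClasses = "character"  # keep every field as text, no numeric coercion
)
fixed
```

To read the remaining fields, extend the widths vector with one entry per field.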
Here is a solution based on your example:
Input data:
x<-"AB365960091120112011311260000005311300000001ES020000040036ES1400N"
Split the string into one substring per variable and combine them in a data.frame:
df <- data.frame(ID = substr(x, 1, 7),
                 birth_date = substr(x, 10, 16),
                 death_date = substr(x, 17, 25))
This gives your desired output:
df
       ID birth_date death_date
1 AB36596    9112011  201131126
With the same approach and the substr function you can extract the remaining fields.
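Because substr is vectorized, the same idea scales to every row of a file if the field positions are kept in a small lookup table. The sketch below is one possible layout, not the only one: spec, lines and df_all are hypothetical names, and the sample row is simply repeated to stand in for a multi-row file.

```r
x <- "AB365960091120112011311260000005311300000001ES020000040036ES1400N"
# The same row twice, repeated only to show that substr() works on vectors.
lines <- c(x, x)

# Field layout taken from the answer above; add one row per remaining field.
spec <- data.frame(
  name  = c("ID", "birth_date", "death_date"),
  start = c(1, 10, 17),
  end   = c(7, 16, 25),
  stringsAsFactors = FALSE
)

# Extract one column per field, for all rows at once.
fields <- lapply(seq_len(nrow(spec)), function(i) {
  substr(lines, spec$start[i], spec$end[i])
})
names(fields) <- spec$name
df_all <- as.data.frame(fields, stringsAsFactors = FALSE)
df_all
```

For a real file, lines would come from readLines("test.txt") instead.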
How can I use the age_calc function in RStudio to calculate the age of employees (in years) in a large data set (more than 19,000 entries) where "Date of Birth" is a column name, and then add the ages to the data set as a new column?
1. Create reproducible example data
data <- data.frame(ID = 1:5,
                   Date_of_Birth = c("1999-03-05", "1999-05-19", "1999-07-01", "1999-02-27", "1999-06-11"))
2. Convert the date string column to an actual date using as.Date
data$Date_of_Birth <- as.Date(data$Date_of_Birth, "%Y-%m-%d")
3. Define a calc_age_days function that takes dates as its argument and returns the number of days elapsed since each date
calc_age_days <- function(date_value) {
  return(Sys.Date() - date_value)
}
4. Use the calc_age_days function to calculate the Age column
data$Age <- calc_age_days(date_value = data$Date_of_Birth)
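The function above returns the age in days, while the question asks for years. A minimal base R sketch of whole-year ages follows. Note that age_years and its ref argument are hypothetical names introduced here; this is not the age_calc function from the question. A fixed reference date is used only so the result is reproducible.

```r
data <- data.frame(
  ID = 1:2,
  Date_of_Birth = as.Date(c("1999-03-05", "1999-05-19"))
)

# Completed years between dob and ref (hypothetical helper, base R only).
age_years <- function(dob, ref = Sys.Date()) {
  dob <- as.POSIXlt(dob)
  ref <- as.POSIXlt(ref)
  years <- ref$year - dob$year
  # Has the birthday already occurred in the reference year?
  had_birthday <- ref$mon > dob$mon |
    (ref$mon == dob$mon & ref$mday >= dob$mday)
  years - ifelse(had_birthday, 0, 1)
}

# Fixed reference date for reproducibility; omit ref to use today's date.
data$Age <- age_years(data$Date_of_Birth, ref = as.Date("2020-04-01"))
data
```

Counting completed birthdays avoids the small drift that comes from dividing a day count by 365.25.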
I have daily stock data from 1970 to 2019 subset by year. My goal is to get the date that the minimum occurred in each year in one column and the minimum value in another column. I can iterate through the subsets to get the minimum of each year using "foreach", ".combine=rbind", "lapply" and "which.min" to build a list of the rows that I want but I can't get the date out of the index this way.
mindates <- foreach(i = 1:length(GSPC_yearly), .combine=rbind) %do% {
# I would like to be able to attach the corresponding row label date from the index to this code
spyannmin<-as.numeric(lapply(GSPC_yearly[[i]]$GSPC.Low,min))
spyannmindate<-(lapply(GSPC_yearly[[i]]$GSPC.Low,which.min))
# Or be able to bind this code row wise because its output already includes the row label date from the index of the source data. This is only giving me the result of the last [[i]], i want a table of rows with all the [[i]]'s
spyannmindexdate<-GSPC_yearly[[i]][spyannmindate,3]
result.data<-c(spyannmin, spyannmindate,spyannmindexdate)
}
head(mindates,3)
spyannmindexdate
This gives me output such as...
GSPC.Low
result.1 68.61 101 68.61
result.2 89.34 227 89.34
result.3 100.87 2 100.87
# But I would like the date to appear where the result.# appears or in a new column, I'm not sure which would be better.
GSPC.Low
2019-01-03 2443.96
# Or I would like this data exactly, but pasted with the respective row from each yearly subset of the larger source data. Again I want output that includes the row for each year.
If I use just "which.min" to get the row index then I CAN get the date and the minimum value, but without "lapply" I don't know how to build the table with ".combine=rbind".
So I have two approaches to my problem but am missing a key ingredient in each. I would appreciate a solution to either. Solutions to both would help advance my understanding of R programming. Thank you in advance.
Here's one quick-and-dirty way:
library(dplyr)

df <-
  tribble(
    ~year, ~date, ~price,
    2018, "2018-12-30", 30,
    2018, "2018-12-31", 32,
    2019, "2019-01-01", 34,
    2019, "2019-01-02", 36,
    2019, "2019-01-03", 35
  )

df %>%
  group_by(year) %>%
  filter(price == min(price)) %>%
  ungroup()

# A tibble: 2 x 3
   year date       price
  <dbl> <chr>      <dbl>
1  2018 2018-12-30    30
2  2019 2019-01-01    34
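The which.min route from the question can also be made to work in base R by keeping the whole row, not just the index, and binding the rows at the end. This is only a sketch on a plain data frame: prices, date and low are hypothetical names, and the values are the toy numbers from the example above, not real index data.

```r
prices <- data.frame(
  date = as.Date(c("2018-12-30", "2018-12-31",
                   "2019-01-01", "2019-01-02", "2019-01-03")),
  low  = c(30, 32, 34, 36, 35)
)

# One data frame per calendar year.
by_year <- split(prices, format(prices$date, "%Y"))

# Keep the entire row at the yearly minimum, so the date comes along with it.
min_rows <- lapply(by_year, function(d) d[which.min(d$low), ])

# Stack the one-row results into a single table.
yearly_min <- do.call(rbind, min_rows)
yearly_min
```

Unlike the filter approach, which.min returns only the first minimum when a year has ties.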
I am trying to convert a column in my dataset that holds times in H:M:S format into seconds.
Below is what my dataset looks like:
Participant  Event ID  Event_start  Event_time
Joe          1         3            1:49:52
Arya         1         2            1:37:39
Cynthia      1         1            1:40:17
I used this:
dataset %>%
  mutate(Timeinsec = period_to_seconds(hms("Event_time")))
but it gives me a warning.
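The warning is because Event_time is quoted, so hms() tries to parse the literal string "Event_time" instead of the column. Try it without quotes:
library(lubridate)
dataset %>%
  mutate(Timeinsec = hms(Event_time))
If you want the total number of seconds as a plain number, wrap it in period_to_seconds:
dataset %>%
  mutate(Sec = period_to_seconds(hms(Event_time)))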
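For comparison, the same conversion can be sketched in base R without any packages. The helper hms_to_seconds is a hypothetical name, and it assumes every value has exactly three colon-separated fields (hours, minutes, seconds).

```r
# Convert "H:M:S" strings to total seconds (hypothetical helper, base R only).
hms_to_seconds <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  # Weight hours, minutes and seconds, then add them up per element.
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

event_time <- c("1:49:52", "1:37:39", "1:40:17")
secs <- hms_to_seconds(event_time)
secs
# [1] 6592 5859 6017
```

In a data frame this would be dataset$Sec <- hms_to_seconds(dataset$Event_time), provided the column is character.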
I have an Excel file whose date column runs from 1/1/15 to 12/31/15. I want to change every year from 15 to 14, so that the dates run from 1/1/14 to 12/31/14. How can I do that in R? Right now I use the replace function to change the dates manually, but there are 150000 more records.
If you don't want to convert to the 'Date' class and want to keep the same format, one option is sub. Here we match a trailing 15 (the last two characters) and replace it with 14.
sub('15$', '14', v1)
#[1] "1/1/14" "12/31/14" "1/1/14"
data
v1 <- c('1/1/15', '12/31/15', '1/1/14')
You could use lubridate, which lets you subtract any number of years directly.
library(lubridate)
# some random 2015 dates
df <- data.frame(dates = mdy("01/13/2015", "02/25/2015"))
# subtract 1 year
df$dates <- with(df, dates - years(1))
df
       dates
1 2014-01-13
2 2014-02-25
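If the dates arrive as text in m/d/yy form, a base R variant of the same idea is to parse them, shift the year field, and convert back. This is a sketch under that assumption; v and shifted are hypothetical names.

```r
# Text dates in month/day/two-digit-year form (assumed input format).
v <- c("1/1/15", "12/31/15")
d <- as.Date(v, format = "%m/%d/%y")

# POSIXlt exposes the year as a field that can be shifted directly.
lt <- as.POSIXlt(d)
lt$year <- lt$year - 1
shifted <- as.Date(lt)
shifted
# [1] "2014-01-01" "2014-12-31"
```

The result is of class Date, so it prints in ISO form; use format() if the original text layout is needed.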
I have a data frame:
> df
Age  year sex
 12 80210   F
 13  9123   M
I want to convert the year value 80210 to 26 June 1982. How can I convert these Julian days so that the new data frame contains year in day-month-year format?
You can convert Julian dates to dates using as.Date and specifying the appropriate origin:
as.Date(8210, origin=as.Date("1960-01-01"))
#[1] "1982-06-24"
However, for 80210 to land in 1982, the origin would have to be far in the past (in the 1760s).
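Applied to a whole column, the same call might look like the sketch below. It assumes the values count days since 1960-01-01 (the origin used above) and uses 8210 rather than the 80210 shown in the question, as the answer does. The column names date and date_dmy are introduced here for illustration.

```r
df <- data.frame(Age = c(12, 13), year = c(8210, 9123), sex = c("F", "M"))

# Assumption: 'year' holds a day count from the 1960-01-01 origin.
df$date <- as.Date(df$year, origin = "1960-01-01")

# Day/month/year text representation of the same dates.
df$date_dmy <- format(df$date, "%d/%m/%Y")
df
```

If the real data use a different origin, only the origin argument needs to change.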
You can subtract the reference value (80210) from the year column and use its known date as the origin:
as.Date(c(80210,9123)-80210,origin='1982-06-26')
[1] "1982-06-26" "1787-11-08"
There are some options for doing this job in the R package date.
See, for example, the function date.mmddyy (page 4), which is documented as follows:
Given a vector of Julian dates, this returns them in the form “10/11/89”, “28/7/54”, etc.
Try this code:
age = c(12,13)
year = c(8210,9123)
sex = c("F","M")
df = data.frame(age, year, sex)  # no cbind, so age and year stay numeric
library(date)
date = date.mmddyy(year, sep = "/")
df2 = transform(df,year=date) #hint provided by jilber
df2
  age     year sex
1  12  6/24/82   F
2  13 12/23/84   M