I have the following problem. I have a lot of numbers:
x <- c(200103, 200106,200109)
These are actually dates, and I want them in the format 2001.03, 2001.06, 2001.09, etc., i.e. I want to insert a dot after the first four digits. Is there a simple way to do that in R?
You can capture the data in two groups, the first 4 characters and the next 2, and insert a "." between them.
x <- c(200103, 200106,200109)
sub('(.{4})(.{2})', '\\1.\\2', x)
#[1] "2001.03" "2001.06" "2001.09"
The standard way would be to convert to a date and use format to get the data in the required format.
format(as.Date(paste0(x, 1), '%Y%m%d'), '%Y.%m')
We could also convert to the yearmon class and then use format:
library(zoo)
format(as.yearmon(as.character(x), "%Y%m"), "%Y.%m")
#[1] "2001.03" "2001.06" "2001.09"
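Because the values are numeric, the dot can also be inserted arithmetically, without a regular expression. This is only a minimal sketch, assuming every value is a whole number of the form yyyymm; the names `year`, `month` and `dotted` are illustrative and not part of the answer above.

```r
x <- c(200103, 200106, 200109)

# Split each number into its year and month parts with integer arithmetic
year  <- x %/% 100   # 2001 2001 2001
month <- x %% 100    # 3 6 9

# Reassemble with a dot, zero-padding the month to two digits
dotted <- sprintf("%04d.%02d", year, month)
dotted
#[1] "2001.03" "2001.06" "2001.09"
```

Like the sub() approach, this returns character strings, not dates.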
Related
I know this question has probably been answered in different ways, but I am still struggling with it. I am working with a dataset where the first date column, date1, has values like '2/1/2000', '5/12/2000', '6/30/2015' and its class() is character. The second date column, date2, has values like '2015-07-06', '2015-08-01', '2017-10-09' and its class() is "POSIXct" "POSIXt".
I am attempting to standardize both columns so I can compute the difference in days between them using something like this
abs(difftime(date1 ,date2 , units = c("days")))
I have tried numerous ways of converting date1 to the same class, using strptime, lubridate, etc. What is the best way to standardize both columns so that I can compute the difference in days?
sample data
x <- c('2/1/2000', '5/12/2000', '6/30/2015')
y <- as.POSIXct(c('2015-07-06', '2015-08-01', '2017-10-09'))
code
#make both posixct
x2 <- as.POSIXct(x, format = "%m/%d/%Y")
abs(x2 - y)
# Time differences in days
# [1] 5633.958 5559.000 832.000
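If only whole days matter, both columns can instead be converted to the Date class, which has no time-of-day or daylight-saving component (the likely source of the fractional 5633.958 above). A minimal sketch, assuming the POSIXct column can be treated as UTC; the `tz = "UTC"` setting and the names `d1`, `d2` and `day_diff` are assumptions for illustration.

```r
x <- c("2/1/2000", "5/12/2000", "6/30/2015")
# tz = "UTC" is an assumption; use the time zone your data was recorded in
y <- as.POSIXct(c("2015-07-06", "2015-08-01", "2017-10-09"), tz = "UTC")

# Convert both to Date so the subtraction counts calendar days
d1 <- as.Date(x, format = "%m/%d/%Y")
d2 <- as.Date(y, tz = "UTC")

day_diff <- abs(as.numeric(d1 - d2))
day_diff
#[1] 5634 5559  832
```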
I have a dataset in which all of the date variables are messed up. All of the columns are character. They look like this:
name <- c("Ana", "Maria", "Rachel", "Julia")
date_of_birth <- c("9/8/1997", "22/3/1966", "24/10/1969", "25/6/2019")
data <- as.data.frame(cbind(name, date_of_birth))
I need to turn those dates into dd/mm/yyyy format. They are already in this order, but I need to add a leading zero when dd or mm has only one digit.
For example, “9/8/1997” should be “09/08/1997”.
We can try this
format(as.Date(date_of_birth, format = "%d/%m/%Y"), "%d/%m/%Y")
#[1] "09/08/1997" "22/03/1966" "24/10/1969" "25/06/2019"
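To write the padded dates back into the data frame, the same conversion can be applied to the column itself. The sketch below also checks that every entry parsed, since as.Date returns NA for strings that do not match the format; the `parsed` variable and the check are additions for illustration, not part of the answer above.

```r
name <- c("Ana", "Maria", "Rachel", "Julia")
date_of_birth <- c("9/8/1997", "22/3/1966", "24/10/1969", "25/6/2019")
data <- data.frame(name, date_of_birth)

# Parse as day/month/year; entries that do not match become NA
parsed <- as.Date(data$date_of_birth, format = "%d/%m/%Y")
stopifnot(!anyNA(parsed))  # stop early if any entry failed to parse

# Overwrite the column with the zero-padded text form
data$date_of_birth <- format(parsed, "%d/%m/%Y")
data$date_of_birth
#[1] "09/08/1997" "22/03/1966" "24/10/1969" "25/06/2019"
```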
I have columns that are named "X1.1.21", "X12.31.20" etc.
I can get rid of all the "X"s by using the substring function:
names(df) <- substring(names(df), 2, 8)
I've been trying many different methods to change "1.1.21" into a date format in R, but I'm having no luck so far. How can I go about this?
R doesn't like column names that start with a number (hence the X in front of them). However, you can still force R to allow column names that start with a number by using check.names = FALSE while reading the data.
If you want to use dates as column names, you can use:
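As a small illustration of check.names = FALSE, the sketch below reads a made-up CSV from an inline string; `csv_text` and its contents are hypothetical stand-ins for the real file.

```r
# Hypothetical CSV content standing in for the real file
csv_text <- "id,1.1.21,12.31.20\na,1,2\nb,3,4"

# check.names = FALSE keeps the headers exactly as written (no "X" prefix)
df <- read.csv(text = csv_text, check.names = FALSE)
names(df)
#[1] "id"       "1.1.21"   "12.31.20"

# The date-like headers can then be parsed without stripping anything
col_dates <- as.Date(names(df)[-1], format = "%m.%d.%y")
col_dates
#[1] "2021-01-01" "2020-12-31"
```

Columns with such names have to be referenced with backticks or quotes, e.g. df[["1.1.21"]].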
df <- data.frame(X1.1.21 = rnorm(5), X12.31.20 = rnorm(5))
names(df) <- as.Date(names(df), 'X%m.%d.%y')
names(df)
#[1] "2021-01-01" "2020-12-31"
However, note that they look like dates but are still of type 'character':
class(names(df))
#[1] "character"
So if you are going to use the column names for some date calculation, you need to convert them to the Date type first.
as.Date(names(df))
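One example of such a calculation is putting the columns in chronological order. This is a sketch under the assumption that every column name follows the X%m.%d.%y pattern; `col_dates` is an illustrative name.

```r
df <- data.frame(X1.1.21 = 1:3, X12.31.20 = 4:6)

# Parse the names into real Date values (kept separate from the names)
col_dates <- as.Date(names(df), format = "X%m.%d.%y")

# Reorder the columns from earliest to latest date
df <- df[order(col_dates)]
names(df)
#[1] "X12.31.20" "X1.1.21"

# Number of days between consecutive column dates
gap_days <- as.numeric(diff(sort(col_dates)))
gap_days
#[1] 1
```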
I am working with a CSV file that has a column named "statistics_lastLocatedTime", as shown in the CSV file image.
I would like to subtract the second row of "statistics_lastLocatedTime" from the first row, the third row from the second row, and so on until the last row, then store all these differences in a separate column and combine that column with the other related columns, as shown in the code below:
##select related features
data <- read.csv("D:/smart tech/store/2016-10-11.csv")
(columns <- data[with(data, macAddress == "7c:11:be:ce:df:1d" ),
c(2,10,11,38,39,48,50) ])
write.csv(columns, file = "updated.csv", row.names = FALSE)
## take time difference
date_data <- read.csv("D:/R/data/updated.csv")
(dates <- date_data[1:40, c(2)])
NROW(dates)
for (i in 1:NROW(dates)) {
  j <- i+1
  r1 <- strptime(paste(dates[i]),"%Y-%m-%d %H:%M:%S")
  r2 <- strptime(paste(dates[j]),"%Y-%m-%d %H:%M:%S")
  diff <- as.numeric(difftime(r1,r2))
  print (diff)
}
## combine time difference with other related columns
combine <- cbind(columns, diff)
combine
Now the problem is that I am able to get the differences between rows, but I am not able to store these values as a column and then combine that column with the other related columns. Please help me. Thanks in advance.
This is a four-liner:
Define a custom class 'myDate', and a converter function for your custom datetime, as per Specify custom Date format for colClasses argument in read.table/read.csv
Read in the datetimes as actual datetimes; no need to repeatedly convert later.
Simply use the vectorized diff operator on your date column (it sees their type, and automatically dispatches a diff function for POSIXct Dates). No need for for-loops:
setClass('myDate') # this is not strictly necessary
setAs('character','myDate', function(from) {
  # hours and minutes; the original '%H:%S' would have parsed minutes as seconds
  as.POSIXct(from, format='%d-%m-%y %H:%M', tz='UTC') # or whatever timezone
})
data <- read.csv("D:/smart tech/store/2016-10-11.csv",
colClasses=c('character','myDate','myDate','numeric','numeric','integer','factor'))
# ...
data$date_diff <- c(NA, diff(data$statistics_lastLocatedTime))
Note that diff() produces a result one element shorter than the vector we diffed. Hence we have to pad it (e.g. with a leading NA, or whatever you want).
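One detail worth spelling out: diff() on POSIXct values returns a difftime whose units are chosen automatically, and c(NA, ...) drops those units. The sketch below fixes the units explicitly before padding. The sample timestamps are made up, and the "%d-%m-%y %H:%M" format is an assumption about the file.

```r
# Made-up timestamps in the assumed day-month-year hour:minute format
stamps <- as.POSIXct(c("11-10-16 09:00", "11-10-16 09:05", "11-10-16 09:35"),
                     format = "%d-%m-%y %H:%M", tz = "UTC")

# Convert the difftime to plain minutes so the unit is known, then pad
gap_mins <- c(NA, as.numeric(diff(stamps), units = "mins"))
gap_mins
#[1] NA  5 30
```

With this layout, each row holds the time elapsed since the previous row.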
Consider directly assigning the diff variable using vapply. Also, there is no need for the separate date_data df as all operations can be run on the columns df. Notice too the change in time format to align to the format currently in dataframe:
columns$diff <- vapply(seq(nrow(columns)), function(i){
  r1 <- strptime(paste(columns$statistics_lastLocatedTime[i]),"%d-%m-%y %H:%M")
  r2 <- strptime(paste(columns$statistics_lastLocatedTime[i+1]),"%d-%m-%y %H:%M")
  diff <- difftime(r1, r2)
}, numeric(1))
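The same row-minus-next-row difference can also be computed without a per-row function, by parsing the column once and subtracting shifted copies of it. This is a sketch with made-up data; the column contents, the format string and the choice of minutes as the unit are assumptions.

```r
# Made-up stand-in for the filtered data frame
columns <- data.frame(
  statistics_lastLocatedTime = c("11-10-16 09:35", "11-10-16 09:05", "11-10-16 09:00"),
  stringsAsFactors = FALSE
)

# Parse the whole column once
stamps <- as.POSIXct(columns$statistics_lastLocatedTime,
                     format = "%d-%m-%y %H:%M", tz = "UTC")
n <- length(stamps)

# Row i minus row i+1, in minutes; the last row has no successor, so NA
columns$diff <- c(as.numeric(difftime(stamps[-n], stamps[-1], units = "mins")), NA)
columns$diff
#[1] 30  5 NA
```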
I know this is a really stupid problem, but it's driving me nuts.
I'm trying to combine two columns in a dataframe.
One column is year, with numbers such as 2006, 2007, etc.
The other column is month, with numbers from 1-12.
I want to create a column called date that looks like this:
2012 and 12 becomes 201212
2012 and 4 becomes 201204
This should be really simple, but I can't seem to get the 0 between the 2012 and 4!!!!!!
The dataframe is called x. I have tried a number of variations of this:
attach(x)
x$mymonth <- as.character(mymonth)
x[!(mymonth=="10"|mymonth=="11"|mymonth=="12"),]$mymonth <- paste0("0",x[!(mymonth=="10"|mymonth=="11"|mymonth=="12"),]$mymonth)
x$mymonth <- as.character(mymonth)
x$date <- paste0(as.character(year),as.character(mymonth),"")
detach(x)
This doesn't work.
We can use sprintf and specify the appropriate fmt.
df1$date <- sprintf("%04d%02d", df1$year, df1$month)
df1$date
#[1] "201501" "201502" "201503" "201504" "201505" "201506" "201507" "201508"
#[9] "201509" "201510" "201511" "201512"
Or another option would be str_pad from library(stringr) and then paste the columns
library(stringr)
paste0(df1$year, str_pad(df1$month, width=2, pad="0"))
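If adding a package is not desirable, base R's formatC can do the same zero-padding. A minimal sketch using the same `df1` as in this answer; `date_base` is an illustrative name.

```r
df1 <- data.frame(year = 2015, month = 1:12)

# flag = "0" pads with zeros up to the requested width
date_base <- paste0(df1$year, formatC(df1$month, width = 2, flag = "0"))
date_base[c(1, 12)]
#[1] "201501" "201512"
```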
NOTE: It is not recommended to use attach. Instead we can use with, within etc.
data
df1 <- data.frame(year=2015, month=1:12)
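Applied to the question's own data frame, the note about avoiding attach could look like the sketch below. The sample rows are made up to match the question's examples, and the column names `year` and `mymonth` are taken from the question's code.

```r
# Made-up rows matching the examples in the question
x <- data.frame(year = c(2012, 2012), mymonth = c(12, 4))

# with() evaluates the expression inside x, so no attach()/detach() is needed
x$date <- with(x, sprintf("%d%02d", year, mymonth))
x$date
#[1] "201212" "201204"
```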