I am trying to import an Excel file into R, and one of the columns is in %m.%Y format. In Excel it looks like this:
5.2017
5.2017
5.2017
5.2017
2.2017
9.2017
and when I import it into R using readxlsx it comes out as a character variable:
5.2016999999999998
5.2016999999999998
5.2016999999999998
5.2016999999999998
2.2017000000000002
9.2017000000000007
Any ideas on how to import the data so that I can format the column as %m.%Y?
To create dates you can for example use the lubridate package:
library(lubridate)
library(dplyr)
df <- data.frame(date = c(5.2016999999999998, 5.2016999999999998, 5.2016999999999998, 5.2016999999999998, 2.2017000000000002, 9.2017000000000007))
df %>%
  mutate(date = dmy(paste0("01.", round(date, 4))))
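A base R variant of the same idea, as a minimal sketch: it assumes the column is numeric (apply as.numeric() first if it arrives as character). The value 1.202 is a hypothetical addition, included to show a month such as 1.2020 whose trailing zero is lost when the cell is stored as a number. sprintf("%.4f") rounds to four decimals and restores that zero before parsing.

```r
# Values as they arrive from Excel, with floating-point noise.
# 1.202 is a made-up extra value standing in for "1.2020".
raw <- c(5.2016999999999998, 2.2017000000000002, 9.2017000000000007, 1.202)

# Round to four decimals and keep trailing zeros: "5.2017", ..., "1.2020"
month_year <- sprintf("%.4f", raw)

# Prepend a day so that as.Date() has a complete date to parse
dates <- as.Date(paste0("01.", month_year), format = "%d.%m.%Y")
dates
```

The result is a Date vector set to the first day of each month; format(dates, "%m.%Y") gives back a month.year string for display.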
I was working on an assignment,
library(tidyverse)
library(quantmod)
library(lubridate)
macro <- c("GDPC1", "CPIAUCSL","DTB3", "DGS10", "DAAA", "DBAA", "UNRATE", "INDPRO", "DCOILWTICO")
rm(macro_factors)
for (i in 1:length(macro)){
getSymbols(macro[i], src = "FRED")
data <- as.data.frame(get(macro[i]))
data$date <- as.POSIXlt.character(rownames(data))
rownames(data) <- NULL
colnames(data)[1] <- "macro_value"
data$quarter <- as.yearqtr(data$date)
data$macro_ticker <- rep(macro[i], dim(data)[1])
data <- data%>%
mutate(date = ymd(date))%>%
group_by(quarter)%>%
top_n(1,date) %>%
filter(date >= "1980-01-01", date <= "2019-12-31") %>%
if(i == 1){macro_factors <- data} else {macro_factors <- rbind(macro_factors, data)}
}
but this came out
Error in as.POSIXlt.character(rownames(data)) :
character string is not in a standard unambiguous format
I tried following an online tutorial that uses as.POSIXct() after converting the data from character to numeric, but it did not work in my case. I checked the data: it is shown as "year-month-day" and is of class character, so shouldn't as.POSIXlt() work?
There are several problems:
POSIXlt class should not be used in data frames. Also do not use POSIXct for dates since you can get into needless time zone problems.
To convert an xts object, such as the one produced by getSymbols, to a data frame, use fortify.zoo.
Depending on what you want to do, you might not need to convert from xts to a data frame in the first place. I suggest reading about xts and zoo in the documentation of those packages.
This gives a list of data frames L and then a long data frame DF containing them all.
library(dplyr, exclude = c("filter", "lag"))
library(quantmod) # also brings in xts and zoo
macro <- c("GDPC1", "CPIAUCSL")
getData <- function(symb) symb %>%
getSymbols(src = "FRED", auto.assign = FALSE) %>%
aggregate(as.yearqtr, tail, 1) %>%
window(start = "1980q1", end = "2019q4") %>%
fortify.zoo
L <- Map(getData, macro)
DF <- bind_rows(L, .id = "id")
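The time zone problem mentioned in the first point can be seen in a small base R sketch. The instant and the "Asia/Tokyo" zone are arbitrary choices for illustration, and the tz argument is passed explicitly so that the result does not depend on defaults.

```r
# One instant: midnight at the start of 1 January 2020 in Tokyo
x <- as.POSIXct("2020-01-01 00:00:00", tz = "Asia/Tokyo")

# The calendar date depends on the time zone used for the conversion
in_tokyo <- as.Date(x, tz = "Asia/Tokyo")  # 2020-01-01
in_utc   <- as.Date(x, tz = "UTC")         # 2019-12-31

# A Date stores only the day, so there is no time zone to get wrong
d <- as.Date("2020-01-01")
```

With daily or quarterly series such as the FRED data, a shift like this can move an observation into the previous day or quarter, which is why Date (or yearqtr) is the safer class here.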
I have 33 csv files that have dates such as 1/01/2020...31/01/2020. I have to import the files using read_csv() from the tidyverse's readr package and store them in a tibble. So my code looks like this:
main_df <- read_csv("./data/202001.csv",
skip = 7,
col_types = cols(
Date = col_date(format = "%d/%m/%Y")
)
)
Date is the column name here. When I run this code it gives NA values in the tibble, because "%d" in the format only recognizes two-digit days, so dates like "1/01/2020" produce NA. How can I overcome this issue? TIA.
Use as.POSIXct:
main_df <- read_csv("./data/202001.csv", skip = 7)
main_df['Date'] = as.POSIXct(main_df$Date, format="%d/%m/%Y")
main_df
Or if the above doesn't work, try the stringr library with the str_pad function.
Also use as.POSIXct:
library(stringr)
main_df <- read_csv("./data/202001.csv", skip = 7)
main_df['Date'] = as.POSIXct(str_pad(main_df$Date, 10, pad = "0"), format="%d/%m/%Y")
main_df
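If a plain date without a time component is enough, the same conversion can be sketched with base R's as.Date(), which accepts a day without a leading zero under "%d". The vector below is made-up sample data standing in for the Date column after it has been read as character.

```r
# Made-up sample values in the same layout as the Date column
raw <- c("1/01/2020", "15/01/2020", "31/01/2020")

# as.Date() parses one- and two-digit days with the same format string
dates <- as.Date(raw, format = "%d/%m/%Y")
dates
```

The result is of class Date rather than POSIXct, so no time zone is attached to the column.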
I am trying to convert a character class column to a date class.
The original character class format is %d/%m/%Y which I am trying to convert to date class preserving the same format.
Here's my attempt:
library(dplyr)
testing_df <- data.frame(mes=c('01/02/2021',
'01/01/2021', '01/12/2020',
'01/11/2020', '01/10/2020'))
test_pipeline <- testing_df %>%
dplyr::filter(mes %in% c('01/02/2021',
'01/01/2021', '01/12/2020',
'01/11/2020', '01/10/2020')) %>%
dplyr::mutate(mes=format(as.Date(mes, format='%Y-%m-%d'), '%d/%m/%Y'))
Which returns a column composed of NA.
The idea is that class(test_pipeline$mes) returns Date while mes column preserves the same date format.
How could I accomplish this task?
I am assuming here that your date format is day/month/year. Is this what you are looking for? You should only need one as.Date() call.
library(dplyr)
testing_df <- data.frame(mes=c('01/02/2021',
'01/01/2021', '01/12/2020',
'01/11/2020', '01/10/2020'))
test_pipeline <- testing_df %>%
mutate(mes = as.Date(mes, format = '%d/%m/%Y'))
We can use dmy from lubridate
library(lubridate)
testing_df$mes <- dmy(testing_df$mes)
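A note on "preserving the same format": a Date object does not store a display format, and format() always returns character. A minimal base R sketch of the distinction, using three of the values from the question:

```r
mes <- c("01/02/2021", "01/01/2021", "01/12/2020")

# Parse once; keep this column for sorting, filtering and arithmetic
d <- as.Date(mes, format = "%d/%m/%Y")
class(d)       # "Date"

# Format only at the point of display; the result is character again
shown <- format(d, "%d/%m/%Y")
class(shown)   # "character"

# Dates sort chronologically, the strings sort alphabetically
sorted_dates   <- sort(d)
sorted_strings <- sort(mes)
```

So the usual approach is to keep the column as Date in the data frame and call format() only when printing, plotting or exporting.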
I have a large dataset that I'm importing from a txt file. It has multiple date variables that come in as number values such as 20190101. Is there a way to assign a date format as part of the import? There is no header in the file, and I'm assigning names and lengths; sample code below.
df <- read_fwf("file name",
fwf_cols(id = 8,
update_date = 8,
name = 35),
skip = 0)
Or is there a way to convert multiple values in one statement vs one at a time?
df$update_date <- as.Date(as.character(df$update_date), "%Y%m%d")
Here is a way to convert multiple columns into Dates in one statement
(assuming the yyyymmdd layout). Here we target all columns whose names end with "date".
library(dplyr)
df <- data.frame(update_date = c(20190101, 20190102, 20190103),
end_date = c(20200101, 20200102, 20200103))
df %>% mutate_at(vars(ends_with("date")), ~as.Date(as.character(.x),format="%Y%m%d"))
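You might similarly use
df %>% mutate_at(vars(starts_with("date")), ~as.Date(as.character(.x), format = "%Y%m%d"))
or
df %>% mutate_at(vars(update_date, end_date), ~as.Date(as.character(.x), format = "%Y%m%d"))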
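In dplyr 1.0.0 and later, mutate_at() is superseded by across(). A sketch of the same conversion in that style follows; the id column is an addition for illustration, to show that columns not matched by ends_with("date") are left untouched.

```r
library(dplyr)

# Sample data in the yyyymmdd layout; id is a made-up extra column
df <- data.frame(id = 1:3,
                 update_date = c(20190101, 20190102, 20190103),
                 end_date = c(20200101, 20200102, 20200103))

# Convert every column whose name ends with "date"
converted <- df %>%
  mutate(across(ends_with("date"),
                ~ as.Date(as.character(.x), format = "%Y%m%d")))

str(converted)
```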
I am a beginner user in R and have been compiling a code to create a custom function to execute a specific task on some data that I possess. The custom function is structured to identify missing data in a csv file and patch this using the mean value. Thereafter, I want to summarize the data by year and month and export this as a csv file. I have multiple csv files that are sitting in a folder and would like to perform this task on each of these files. Thus far, I am able to get the code to perform the task at hand but don't know how to write a unique output for each csv file that has been processed and save these to a new folder. I would also like to retain the original file name in the processed output but have the words "_processed" appended to it. Additionally, any suggestions on how this code can be improved are most welcome. Thanks in advance.
# Load all packages required by the script
library(tidyverse) # data science package
library(lubridate) # work with dates
library(dplyr) # data manipulation (filter, summarize, mutate)
library(ggplot2) # graphics
library(gridExtra) # tile several plots next to each other
library(scales)
# Set the working directory #
setwd("H:/Shaeden_Post_Doc/Genus_Exchange/GEE_Data/MODIS_Product_Data_Raw/Cold_Temperate_Moist")
#create a function to summarize data by year and month
#patch missing values using the average
summarize_by_month = function(df){
# counting unique, missing and mean values in the ET column
df %>% summarise(n = n_distinct(ET),
na = sum(is.na(ET)),
med = mean(ET, na.rm = TRUE))
# assign mean values to the missing data and modify the dataframe
df = df %>%
mutate(ET = replace(ET,is.na(ET),mean(ET, na.rm = TRUE)))
df
#separate data into year, month and day
df$date = as.Date(df$date,format="%Y/%m/%d")
#summarize by year and month
df %>%
mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
group_by(year, month) %>%
summarise(mean_monthly = mean(ET))
}
#import all files and execute custom function for each
file_list = list.files(pattern="AET", full.names=TRUE)
file_list
my_AET_files = lapply(file_list, read_csv)
monthly_AET = lapply(my_AET_files, summarize_by_month)
monthly_AET
A link to the sample datasets is provided below
https://drive.google.com/drive/folders/1pLHt-vT87lxzW2We-AS1PwVcne3ALP2d?usp=sharing
You can read the data, manipulate it, and write the csv in the same function:
library(dplyr)
summarize_by_month = function(file) {
  df <- readr::read_csv(file)
  # assign mean values to the missing data and modify the dataframe
  df = df %>% mutate(ET = replace(ET, is.na(ET), mean(ET, na.rm = TRUE)))
  # convert the date column from character to Date
  df$date = as.Date(df$date, format = "%Y/%m/%d")
  # summarize by year and month
  new_df <- df %>%
    mutate(year = format(date, "%Y"), month = format(date, "%m")) %>%
    group_by(year, month) %>%
    summarise(mean_monthly = mean(ET))
  # keep the original file name and append "_processed"; output_folder must already exist
  write.csv(new_df, sprintf('output_folder/%s_processed.csv',
                            tools::file_path_sans_ext(basename(file))), row.names = FALSE)
  # return the summary so that lapply() collects the results
  new_df
}
monthly_AET = lapply(file_list, summarize_by_month)
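The file naming step can be tried on its own. The sketch below uses a hypothetical helper, processed_path(), a made-up file name and a temporary folder; it also creates the output folder first, since write.csv() does not create missing folders.

```r
# Hypothetical helper: build "<out_dir>/<original name>_processed.csv"
processed_path <- function(file, out_dir) {
  stem <- tools::file_path_sans_ext(basename(file))
  file.path(out_dir, paste0(stem, "_processed.csv"))
}

# Create the output folder before writing (a temporary one for this demo)
out_dir <- file.path(tempdir(), "output_folder")
dir.create(out_dir, showWarnings = FALSE)

# "AET_site1.csv" is a made-up file name used only for illustration
out_file <- processed_path("./AET_site1.csv", out_dir)
write.csv(data.frame(year = "2020", month = "01", mean_monthly = 1.5),
          out_file, row.names = FALSE)

basename(out_file)  # "AET_site1_processed.csv"
```

Using file.path() rather than pasting a "/" by hand keeps the separator handling in one place.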
library(stringr) # str_sub
library(readr) # write_excel_csv
path <- "your_preferred_path/" # set a path to where you want to save the files
x <- list.files(pattern = "your_pattern") # create a vector of your file names
name <- str_sub(x, start = xL, end = yL) # xL and yL being the positions of the part of the name you want to keep
for (i in seq_along(monthly_AET)) {
  # [[i]] extracts the i-th data frame and name[i] the matching file name;
  # paste0 builds a custom name from variables and static strings
  write_excel_csv(monthly_AET[[i]], paste0(path, name[i], "_processed.csv"))
}
Note: this is only an assumption and may have to be tweaked to suit your needs.