I am importing some data into R and want the code to stop running if there is no file or there is no data in the file. I'm using base R and readxl. Please can you help with the syntax?
I've tried
if (dim(Llatest) == NULL) {stop('STOP NO DATA')}
if (dim(Llatest)[1] == 0) {stop('STOP NO DATA')}
if (isTRUE(dim(Llatest) == NULL)) {stop('STOP NO DATA')}
Some data imported from Sep19import.xlsx
ID Code Received Actioned Decision
1 123 Jul 01 2019 Sep 02 2019 Hold
2 456 Jul 11 2019 Sep 13 2019 No action
3 789 Nov 26 2018 Sep 25 2019 Investigate
4 321 Sep 12 2019 Sep 12 2019 Await decision
5 654 Aug 30 2019 Sep 26 2019 Hold
6 987 Feb 22 2019 Sep 02 2019 Investigate
Obtain list of files for import
LFiles <- list.files(path = "C:/Projects/Sep/code", pattern = "*import.xlsx", full.names = TRUE)
***I wish to stop here if LFiles is empty
Identify the latest file
Llatest <- subset(LFiles, LFiles == max(LFiles))
Extract data from file
LMonthly <- read_excel(Llatest)
***I wish to stop here if LMonthly is empty
Error Messages received - no non-missing arguments, returning NA
I expect the output to be 'STOP NO DATA'
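A minimal sketch of one way to get this behaviour. `dim(x) == NULL` evaluates to a zero-length logical, which `if` cannot use, so the checks below rely on `length()` and `nrow()` instead. The helper names `stop_if_no_files` and `stop_if_empty` are made up for illustration, a small data frame stands in for the result of read_excel, and the sketch assumes an empty sheet comes back as a zero-row result.

```r
# Hypothetical helpers (names are illustrative, not from any package).
stop_if_no_files <- function(files) {
  # list.files() returns character(0) when nothing matches the pattern
  if (length(files) == 0) stop("STOP NO DATA")
  invisible(files)
}

stop_if_empty <- function(df) {
  # treat NULL or a zero-row table as "no data"
  if (is.null(df) || nrow(df) == 0) stop("STOP NO DATA")
  invisible(df)
}

# Stand-ins for list.files(...) and read_excel(...) in the question
LFiles   <- character(0)
LMonthly <- data.frame(ID = integer(0), Code = integer(0))

# Capture the error messages instead of halting, just for demonstration
msg_files <- tryCatch(stop_if_no_files(LFiles),
                      error = function(e) conditionMessage(e))
msg_data  <- tryCatch(stop_if_empty(LMonthly),
                      error = function(e) conditionMessage(e))
```

In the script itself, the first check would go right after list.files() and before max(LFiles), and the second right after read_excel().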
I have a list of dfs and a list of annual budgets.
Each df represents one business year, and each budget represents a total spend for that year.
# the business year starts from Feb and ends in Jan.
# the budget column is first populated with the % of annual budget allocation
df <- data.frame(monthly_budget=c(0.06, 0.13, 0.07, 0.06, 0.1, 0.06, 0.06, 0.09, 0.06, 0.06, 0.1, 0.15),
month=month.abb[c(2:12, 1)])
# dfs for 3 years
df2019_20 <- df
df2020_21 <- df
df2021_22 <- df
# budgets for 3 years
budget2019_20 <- 6000000
budget2020_21 <- 7000000
budget2021_22 <- 8000000
# into lists
df_list <- list(df2019_20, df2020_21, df2021_22)
budget_list <- list(budget2019_20, budget2020_21, budget2021_22)
I've written the following function to both apply the right year to Jan and fill in the rest by deparsing the respective dfs name.
It works perfectly if I supply a single df and a single budget.
budget_func <- function(df, budget){
df_name <- deparse(substitute(df))
df <- df %>%
mutate(year=ifelse(month=="Jan",
as.numeric(str_sub(df_name, -2)) + 2000,
as.numeric(str_extract(df_name, "\\d{4}(?=_)")))
)
for (i in 1:12){
df[i,1] <- df[i,1] * budget
i <- i+1
}
return(df)
}
To speed things up I want to pass both lists as arguments to mapply. However I don't get the results I want - what am I doing wrong?
final_budgets <- mapply(budget_func, df_list, budget_list)
deparse(substitute(df)) works when a single dataset is passed by name, but inside mapply/Map the function receives an element of the list rather than the original symbol, so the deparsed text is no longer the object name. Instead, we may add a new argument to pass the names. The list also needs to carry those names: we can either build it with list(df2019_20 = df2019_20, ...), use setNames, or, more simply, use dplyr::lst, which names each element after the object passed.
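library(dplyr)    # mutate, %>%
library(stringr)  # str_sub, str_extract

budget_func <- function(df, budget, nm1){
  # nm1 is the name of the data frame, e.g. "df2019_20"
  df_name <- nm1
  df <- df %>%
    mutate(year=ifelse(month=="Jan",
                       as.numeric(str_sub(df_name, -2)) + 2000,
                       as.numeric(str_extract(df_name, "\\d{4}(?=_)")))
    )
  # turn the allocation shares in column 1 into amounts
  for (i in 1:12){
    df[i,1] <- df[i,1] * budget
  }
  return(df)
}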
-testing
df_list <- dplyr::lst(df2019_20, df2020_21, df2021_22)
budget_list <- list(budget2019_20, budget2020_21, budget2021_22)
Map(budget_func, df_list, budget_list, names(df_list))
-output
$df2019_20
monthly_budget month year
1 360000 Feb 2019
2 780000 Mar 2019
3 420000 Apr 2019
4 360000 May 2019
5 600000 Jun 2019
6 360000 Jul 2019
7 360000 Aug 2019
8 540000 Sep 2019
9 360000 Oct 2019
10 360000 Nov 2019
11 600000 Dec 2019
12 900000 Jan 2020
$df2020_21
monthly_budget month year
1 420000 Feb 2020
2 910000 Mar 2020
3 490000 Apr 2020
4 420000 May 2020
5 700000 Jun 2020
6 420000 Jul 2020
7 420000 Aug 2020
8 630000 Sep 2020
9 420000 Oct 2020
10 420000 Nov 2020
11 700000 Dec 2020
12 1050000 Jan 2021
$df2021_22
monthly_budget month year
1 480000 Feb 2021
2 1040000 Mar 2021
3 560000 Apr 2021
4 480000 May 2021
5 800000 Jun 2021
6 480000 Jul 2021
7 480000 Aug 2021
8 720000 Sep 2021
9 480000 Oct 2021
10 480000 Nov 2021
11 800000 Dec 2021
12 1200000 Jan 2022
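To see why the original version fails under mapply, here is a small base R sketch. The helper `name_of` and the objects `a` and `b` are made up for illustration. It compares what deparse(substitute()) reports for a direct call and for a call made through Map, and then shows the names being carried alongside the values instead.

```r
# Hypothetical helper: reports how its argument was written at the call site.
name_of <- function(x) deparse(substitute(x))

a <- 1
b <- 2

# Called with a symbol: the deparsed text is the object name
direct <- name_of(a)

# Called on list elements: the function sees an internal indexing
# expression, not the names "a" and "b"
via_map <- Map(name_of, list(a, b))

# Carrying the names alongside the values instead
named     <- list(a = a, b = b)
via_names <- Map(function(x, nm) nm, named, names(named))
```

Map is used here rather than mapply because it does not try to simplify the result, so each element stays a separate object, as in the output above.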
My project:
I am looping through shapefiles in a folder, and running some calculations to add new columns with new values in the output shapefile
My problem:
The calculations are correct for the first iteration. However these values are then added as columns to every subsequent shapefile (rather than doing new calculations per iteration). Below is the code. The final columns resulting from this code running are: final_year, final_month, final_day, final_date.
My code:
library(rgdal)
library(tidyverse)
library(magrittr)
library(dplyr)
input_path<- "/Users/JohnDoe/Desktop/Zone_Fixup/Z4/Z4_Split/"
output_path<- "/Users/JohnDoe/Desktop/Zone_Fixup/Z4/Z4_Split_Out/"
files<- list.files(input_path, pattern = "[.]shp$")
for(f in files){
ifile<- list.files(input_path, f)
shp_paste<- paste(input_path, ifile, sep = "")
tryCatch({shp0<- readOGR(shp_paste, verbose=FALSE)}, error = function(e){print("Error1.")})
#Order shapefile by filename
shp1<- as.data.frame(shp0)
shp2<- shp1[order(shp1$filename),]
#Sort final dates by relative length values.
#If it's increasing, it's day1; if it's decreasing it's day3, etc.
shp2$final_day1<- ifelse(lag(shp2$Length1)<shp2$Length1, paste0(shp2$day1), paste0(shp2$day3))
shp2$final_month1<- ifelse(lag(shp2$Length1)<shp2$Length1, paste0(shp2$month1), paste0(shp2$month3))
shp2$final_year1<- ifelse(lag(shp2$Length1)<shp2$Length1, paste0(shp2$year1), paste0(shp2$year3))
#Remove first NA value of each column
if(is.na(shp2$final_day1[1])){
ex1<- shp2$day1[1]
ex2<- as.character(ex1)
ex3<- as.numeric(ex2)
shp2$final_day1[1]<- ex2
}
if(is.na(shp2$final_month1[1])){
ex4<- shp2$month1[1]
ex5<- as.character(ex4)
ex6<- as.numeric(ex5)
shp2$final_month1[1]<- ex5
}
if(is.na(shp2$final_year1[1])){
ex7<- shp2$year1[1]
ex8<- as.character(ex7)
ex9<- as.numeric(ex8)
shp2$final_year1[1]<- ex9
}
#Add final dates to shapefile as new columns
shp0$final_year<- shp2$final_year1
shp0$final_month<- shp2$final_month1
shp0$final_day<- shp2$final_day1
final_paste<- paste(shp0$final_year, "_", shp0$final_month, "_", shp0$final_day, sep = "")
shp0$final_date<- final_paste
#Create new shapefile for write out
shp44<- shp0
#Write out shapefile
ifile1<- substring(ifile, 1, nchar(ifile)-4)
#tryCatch({writeOGR(shp44, output_path, layer = ifile1, driver = "ESRI Shapefile", overwrite_layer = TRUE)}, error = function(e){print("Error2.")})
test1<- head(shp44)
print(test1)
}
My results:
Here are two head() tables. The first table is correct. The second table is not correct. Notice that the final_year, final_month, final_day, and final_date columns are identical in the two tables. NOTE: These columns are the last four in the table.
Table 1:
coordinates Length1 Bathy Vector filename zone year1 year2 year3 month1 month2 month3 day1 day2 day3 final_year final_month final_day final_date
1 (-477786.3, 1110917) 29577.64 -6.455580 0 Zone4_2000_02_05_2000_02_15_2000_02_24 Zone4 2000 2000 2000 02 02 02 05 15 24 1997 02 15 1997_02_15
2 (-477786.3, 1110917) 29577.64 -6.455580 0 Zone4_2000_02_24_2000_03_10_2000_03_17 Zone4 2000 2000 2000 02 03 03 24 10 17 1997 03 26 1997_03_26
3 (-477848.2, 1113468) 27025.88 -2.100153 0 Zone4_2000_03_24_2000_04_03_2000_04_10 Zone4 2000 2000 2000 03 04 04 24 03 10 1997 04 19 1997_04_19
4 (-477871, 1114406) 26087.98 -4.700025 0 Zone4_2006_03_10_2006_03_27_2006_04_03 Zone4 2006 2006 2006 03 03 04 10 27 03 1998 02 08 1998_02_08
5 (-477876.1, 1114616) 25877.25 -7.598877 0 Zone4_2008_03_06_2008_03_16_2008_03_25 Zone4 2008 2008 2008 03 03 03 06 16 25 1998 03 28 1998_03_28
6 (-477878.8, 1114730) 25764.14 -7.598877 0 Zone4_2008_03_30_2008_04_09_2008_04_23 Zone4 2008 2008 2008 03 04 04 30 09 23 1998 04 21 1998_04_21
Table 2:
coordinates Length1 Bathy Vector filename zone year1 year2 year3 month1 month2 month3 day1 day2 day3 final_year final_month final_day final_date
1 (-477813.5, 1110939) 29612.26 -6.455580 1 Zone4_2000_02_05_2000_02_15_2000_02_24 Zone4 2000 2000 2000 02 02 02 05 15 24 1997 02 15 1997_02_15
2 (-477813.5, 1110939) 29612.26 -6.455580 1 Zone4_2000_02_24_2000_03_10_2000_03_17 Zone4 2000 2000 2000 02 03 03 24 10 17 1997 03 26 1997_03_26
3 (-477883.4, 1113392) 27158.05 -2.100153 1 Zone4_2000_03_24_2000_04_03_2000_04_10 Zone4 2000 2000 2000 03 04 04 24 03 10 1997 04 19 1997_04_19
4 (-477909.9, 1114319) 26230.17 -4.700025 1 Zone4_2006_03_10_2006_03_27_2006_04_03 Zone4 2006 2006 2006 03 03 04 10 27 03 1998 02 08 1998_02_08
5 (-477916.7, 1114558) 25991.57 -7.598877 1 Zone4_2008_03_06_2008_03_16_2008_03_25 Zone4 2008 2008 2008 03 03 03 06 16 25 1998 03 28 1998_03_28
6 (-477920.1, 1114678) 25871.39 -7.598877 1 Zone4_2008_03_30_2008_04_09_2008_04_23 Zone4 2008 2008 2008 03 04 04 30 09 23 1998 04 21 1998_04_21
It looks like my code is taking the column values from the first iteration and adding them to shapefiles in subsequent iterations. How can my code be modified to run new calculations with each iteration, and add those unique values to their respective shapefiles?
Thank you
I think your problem may be with the start of your for loop.
files<- list.files(input_path, pattern = "[.]shp$") #keep this line to get your files
for (f in 1:length(files)){ # change this to the length of files to iterate over files one by one
ifile<- list.files(input_path, f) #delete this line from your code
shp_paste<-paste(input_path,files[f],sep="") # use this line to iterate over each shp file
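Keep the rest of your code as it is and see if this helps. Note that ifile is used later to build the output layer name, so if you delete that line, use files[f] there instead.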
Thank you for your help, everyone, I found the problem. A tad embarrassing, I wasn't sorting the filename by ascending order before adding the new columns in. Therefore it seemed like the values in the new columns were wrong, because they weren't matched to the correct rows. A clumsy error on my part, thanks to all who offered advice.
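The row mismatch described above can be reproduced with a small base R sketch. The data and the column names `final`, `wrong` and `right` are invented for illustration, and this is only one possible way to line the rows up again, not necessarily the fix the poster used. Values computed on a sorted copy are written back through the same ordering index so that they land on their original rows.

```r
# Toy stand-in for the attribute table of one shapefile (made-up values)
shp0 <- data.frame(filename = c("c", "a", "b"),
                   Length1  = c(30, 10, 20),
                   stringsAsFactors = FALSE)

# Sorted copy, as in the question
ord  <- order(shp0$filename)
shp2 <- shp0[ord, ]

# Some value computed per row on the sorted copy
shp2$final <- paste0("val_", shp2$filename)

# Positional copy: rows no longer correspond to the unsorted table
shp0$wrong <- shp2$final

# Writing back through the ordering index restores the correspondence
shp0$right <- NA_character_
shp0$right[ord] <- shp2$final
```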
I have the following data frame, df, whose month columns are all NA:
Vehicle Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
123X
435y
...
The number of rows equals the number of unique vehicle numbers, and there are 13 columns.
The following data frame is generated in a for loop for each vehicle, to count the breakdowns of that vehicle in each month.
[for a single vehicle]
occurrences:
Month Freq
Jan 1
Mar 3
Jul 5
May 3
Each time the occurrences data frame is generated, I want to plug the Freq values into the df data frame by month name.
I have tried using a for loop, which is becoming very complex.
Is there an easier way to plug the values from the occurrences data frame into the df data frame?
I am expecting the following result for df:
Vehicle Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
123X 1 na 3 na 3 na 5 na na na na na
435y
...
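One possible approach, sketched in base R: because the month names in occurrences match the column names of df, they can be used directly as a column index for the row of the vehicle in question. The function name `fill_months` is hypothetical, and the sketch assumes each vehicle appears in exactly one row of df.

```r
# Empty result table: one row per vehicle, one NA column per month
df <- data.frame(Vehicle = c("123X", "435y"),
                 matrix(NA_integer_, nrow = 2, ncol = 12,
                        dimnames = list(NULL, month.abb)),
                 stringsAsFactors = FALSE)

# Hypothetical helper: write one vehicle's counts into its row,
# using the month names as column names
fill_months <- function(df, vehicle, occ) {
  df[df$Vehicle == vehicle, as.character(occ$Month)] <- occ$Freq
  df
}

# Occurrence table for a single vehicle, as in the question
occurrences <- data.frame(Month = c("Jan", "Mar", "Jul", "May"),
                          Freq  = c(1L, 3L, 5L, 3L),
                          stringsAsFactors = FALSE)

df <- fill_months(df, "123X", occurrences)
```

Inside the existing loop, the call to fill_months would replace the nested loop over months.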
I've got some data. I want to add a column, but not in the regular way.
data <- data.frame(month_num = 1:12, month_name = month.abb)
data
month_num month_name
1 1 Jan
2 2 Feb
3 3 Mar
4 4 Apr
5 5 May
6 6 Jun
7 7 Jul
8 8 Aug
9 9 Sep
10 10 Oct
11 11 Nov
12 12 Dec
Now, I want to add a third column to this data. For example I want to make the following vector a column within data:
sentiment = c(rep("cold", 3), rep("hot", 6), rep("cold", 3))
What I would normally do (in baseR) is one of the following:
Add it using $
data$sentiment <- sentiment
Add it via column index creation
data[,3] <- sentiment
Add it in initial creation
data.frame(month_num = 1:12, month_name = month.abb, sentiment = sentiment)
Yes, data.table also has this nicely done within its reference semantics.
data <- data.table(month_num = 1:12, month_name = month.abb)
data[,`:=`(sentiment = sentiment)]
data
month_num month_name sentiment
1: 1 Jan cold
2: 2 Feb cold
3: 3 Mar cold
4: 4 Apr hot
5: 5 May hot
6: 6 Jun hot
7: 7 Jul hot
8: 8 Aug hot
9: 9 Sep hot
10: 10 Oct cold
11: 11 Nov cold
12: 12 Dec cold
However, I don't want to add it in this way. I want to use dplyr related functions to do this task. Is there any function within dplyr that will let me perform this task of column creation?
NOTE: mutate() will not work! (or as I know of it right now).
data%>%
mutate(sentiment = sentiment)
month_num month_name V3 sentiment
1 1 Jan cold cold
2 2 Feb cold cold
3 3 Mar cold cold
4 4 Apr hot hot
5 5 May hot hot
6 6 Jun hot hot
7 7 Jul hot hot
8 8 Aug hot hot
9 9 Sep hot hot
10 10 Oct cold cold
11 11 Nov cold cold
12 12 Dec cold cold
As you can see the column is duplicated and I'm not really sure why that's happening. Perhaps it has to do with the number of unique values in sentiment?
All in all, is there a way to accomplish this within dplyr using mutate() or other related functions?
The simplest way I know is using the function case_when:
data <- data.frame(month_num = 1:12, month_name = month.abb)
data
sentiment = c(rep("cold", 3), rep("hot", 6), rep("cold", 3))
data <- data %>%
mutate(sentiment=case_when(
month_num<=3 | month_num>=10 ~ "cold",
month_num>=4 & month_num<=9 ~ "hot"
))
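The duplicated column in the question most likely comes from the earlier `data[,3] <- sentiment` step rather than from mutate(): base R names a column created by index "V3", which matches the extra column in the printed output. A short sketch under that assumption follows. The object names `fresh` and `reused` are illustrative, and the dplyr call only runs if dplyr is installed.

```r
sentiment <- c(rep("cold", 3), rep("hot", 6), rep("cold", 3))

# A fresh copy of the data with only the two original columns
fresh <- data.frame(month_num = 1:12, month_name = month.abb)

# Adding a column by index, as tried earlier in the question,
# creates a third column that base R names "V3"
reused <- fresh
reused[, 3] <- sentiment
names(reused)

# On the fresh copy, mutate() adds just the one new column
if (requireNamespace("dplyr", quietly = TRUE)) {
  out <- dplyr::mutate(fresh, sentiment = sentiment)
  names(out)
}
```

So re-creating data before calling mutate(), or dropping V3 first, should avoid the apparent duplication.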
The text file looks like this:
#---*----1----*----2----*---
Name Time.Period Value
A Jan 2013 10
B Jan 2013 11
C Jan 2013 12
A Feb 2013 9
B Feb 2013 11
C Feb 2013 15
A Mar 2013 10
B Mar 2013 8
C Mar 2013 13
I tried to use read.table with readLines and count.fields, as shown below:
> path <- list.files()
> data <- read.table(text=readLines(path)[count.fields(path, blank.lines.skip=FALSE) == 4])
Warning message:
In readLines(path) : incomplete final line found on 'data1.txt'
> data
V1 V2 V3 V4
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
The problem is that this gives four columns instead of three. I therefore reshaped the data as below, but I am looking for an alternative.
> library(zoo)
> data$Name <- as.character(data$V1)
> data$Time.Period <- as.yearmon(paste(data$V2, data$V3, sep=" "))
> data$Value <- as.numeric(data$V4)
> DATA <- data[, 5:7]
> DATA
Name Time.Period Value
1 A Jan 2013 10
2 B Jan 2013 11
3 C Jan 2013 12
4 A Feb 2013 9
5 B Feb 2013 11
6 C Feb 2013 15
7 A Mar 2013 10
8 B Mar 2013 8
9 C Mar 2013 13
You can use read.fwf to read fixed width files. You need to correctly specify the width of each column, in spaces.
data <- read.fwf(path, widths=c(-12, 8, -4, 2), header=T)
The key there is how you specify the width. Negative means skip that many places, positive means read that many. I am assuming entries in the last column have only 2 digits. Change widths accordingly if this is not the case. You will probably also have to fix the column names.
You will have to change the indices if the file format changes, or come up with some clever regexp to read it from the first few rows. A better solution would be to enclose your strings in " or, even better, avoid the format altogether.
?count.fields
As the R documentation states, count.fields counts the number of fields, as separated by sep, in each line of the file read. Filtering on count.fields(path, blank.lines.skip=FALSE) == 4 therefore drops the header row, which has only three fields.
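The field counts can be checked on a few in-memory lines. This is a minimal sketch: the lines are a shortened copy of the sample data, and the intermediate column names `Month` and `Year` are assumptions introduced here. It reads the four-field rows with explicit column names and then joins month and year into a single Time.Period column.

```r
# A shortened in-memory copy of the sample file
txt <- c("Name Time.Period Value",
         "A Jan 2013 10",
         "B Jan 2013 11",
         "C Jan 2013 12")

# Count whitespace-separated fields per line: the header has 3, data rows 4
con    <- textConnection(txt)
counts <- count.fields(con)
close(con)

# Read only the data rows, naming the four fields explicitly
body <- read.table(text = txt[counts == 4],
                   col.names = c("Name", "Month", "Year", "Value"),
                   stringsAsFactors = FALSE)

# Rebuild the three intended columns
body$Time.Period <- paste(body$Month, body$Year)
result <- body[, c("Name", "Time.Period", "Value")]
```

If a proper date class is needed, Time.Period could then be passed to zoo::as.yearmon as in the question.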