I have a DF with account numbers and their balance on a monthly basis. For several months there is no value (NA).
I want to fill all NAs with 0 until R finds the first month with a value. From the first observation onward, each NA should be filled with the last value found before it. At the end of the DF I want to carry the last observation forward. Here is a sample of the DF to make it easier to understand:
Test = structure(list(Account = c("9876", "9876", "9876", "9876", "9876",
"9876", "9876", "9876", "9876", "9876", "9876", "9876", "9876",
"9876", "9876", "9876", "9876", "9876", "9876", "9876", "9876",
"9876", "9876", "9876", "9876", "9876", "9876", "9876", "1234",
"1234", "1234", "1234", "1234", "1234", "1234", "1234", "1234",
"1234", "1234", "1234", "1234", "1234", "1234", "1234", "1234",
"1234", "1234", "1234", "1234", "1234", "1234", "1234", "1234",
"1234", "1234", "1234"), Date = structure(c(17409, 17439, 17470,
17500, 17531, 17562, 17590, 17621, 17651, 17682, 17712, 17743,
17774, 17804, 17835, 17865, 17896, 17927, 17955, 17986, 18016,
18047, 18077, 18108, 18139, 18169, 18200, 18230, 17409, 17439,
17470, 17500, 17531, 17562, 17590, 17621, 17651, 17682, 17712,
17743, 17774, 17804, 17835, 17865, 17896, 17927, 17955, 17986,
18016, 18047, 18077, 18108, 18139, 18169, 18200, 18230), class = "Date"),
Balance = c(NA, NA, NA, 0, NA, -0.0025, -0.0025, NA, 0, 0,
NA, NA, 0, NA, 0, NA, NA, 0, NA, 0, 0, 0, NA, NA, NA, NA,
0, 0, NA, -2097.774, -2097.774, NA, NA, -3339.004, NA, NA,
NA, -5791.112, NA, NA, 0, 0, 0, NA, NA, 0, 0, 0, NA, 0, -90.30116,
-90.30116, NA, NA, NA, -474.4858), `First Observation` = c(NA,
NA, NA, 1L, NA, 0L, 0L, NA, 0L, 0L, NA, NA, 0L, NA, 0L, NA,
NA, 0L, NA, 0L, 0L, 0L, NA, NA, NA, NA, 0L, 0L, NA, 1L, 0L,
NA, NA, 0L, NA, NA, NA, 0L, NA, NA, 0L, 0L, 0L, NA, NA, 0L,
0L, 0L, NA, 0L, 0L, 0L, NA, NA, NA, 0L)), class = c("data.table",
"data.frame"), row.names = c(NA, -56L))
Account number 9876:
So my desired output is 0 in the balance from 2017-08 to 2017-10, until the first observation in 2017-11 is reached.
For 2017-12 I would like to get the previous observation (2017-11), a balance of 0.
For 2018-03 I would like to get the previous observation (2018-02), a balance of -0.0025; for 2018-06 and 2018-07, the previous observation (2018-05), a balance of 0.
From 2019-11 until the end of my time series I would like to get the last observation (here that is 2019-11, which is also the end of the sample because I cut the data; the original series runs until 2022-08).
Account 1234:
For 2017-08 I would like to get 0, since there is no observation with a value yet.
For 2017-11 and 2017-12 I would like to get -2097.774 (the balance of 2017-10).
So basically I would like 0s at the beginning of each account's time series until the first balance appears, then the last known balance for every later missing date, carried forward to the end of the time series.
Thank you!
The function fill from the tidyr package fills missing values with the preceding or following entry.
After filling the values downwards within each account, the only NAs left are those before the first observation, and they can then be replaced with 0 using replace_na.
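library(tidyr)
library(dplyr)

Test %>%
  group_by(Account) %>%                    # fill within each account only
  fill(Balance, .direction = 'down') %>%   # carry the last known balance forward
  ungroup() %>%
  replace_na(list(Balance = 0))            # leading NAs (before the first observation) become 0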
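For comparison, the same idea can be sketched in base R without tidyr. This is an illustrative alternative, not part of the answer above; the helper name locf_then_zero and the toy data are hypothetical. It relies on cumsum(!is.na(x)), which counts how many observed values have been seen so far, so it can index into a vector that starts with 0.

```r
# Hypothetical helper: fill NAs with the last observed value (LOCF),
# and use 0 for NAs that occur before the first observation.
locf_then_zero <- function(x) {
  seen <- cumsum(!is.na(x))      # 0 before the first observation, then 1, 2, ...
  c(0, x[!is.na(x)])[seen + 1]   # index 1 is the leading 0, later indices the observed values
}

# Toy data (hypothetical), two accounts in long format, already sorted by date
toy <- data.frame(
  Account = c("9876", "9876", "9876", "9876", "1234", "1234", "1234", "1234"),
  Balance = c(NA, 0, NA, -0.0025, NA, -2097.774, NA, NA)
)

# Apply the helper separately within each account
toy$Filled <- ave(toy$Balance, toy$Account, FUN = locf_then_zero)
print(toy)
```

On the question's data this would be `Test$Balance <- ave(Test$Balance, Test$Account, FUN = locf_then_zero)`, assuming the rows are sorted by Date within each account, as they are in the sample.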
I have a data set that looks like this:
structure(list(Date2 = structure(c(18428, 18438, 18428, 18438,
18428, 18438, 18428, 18438, 18428, 18438, 18428, 18438, 18428,
18438, 18428, 18438, 18428, 18438, 18428, 18438, 18428, 18438,
18428, 18438, 18428, 18438, 18428, 18438, 18428, 18438, 18428,
18438, 18428, 18438, 18428, 18438, 18428, 18438, 18428, 18438,
18428, 18438, 18428, 18438), class = "Date"), Fish_ID = c("Fork1",
"Fork1", "Fork10", "Fork10", "Fork12", "Fork12", "Fork13", "Fork13",
"Fork14", "Fork14", "Fork15", "Fork15", "Fork16", "Fork16", "Fork17",
"Fork17", "Fork18", "Fork18", "Fork19", "Fork19", "Fork2", "Fork2",
"Fork20", "Fork20", "Fork21", "Fork21", "Fork22", "Fork22", "Fork23",
"Fork23", "Fork3", "Fork3", "Fork4", "Fork4", "Fork5", "Fork5",
"Fork6", "Fork6", "Fork7", "Fork7", "Fork8", "Fork8", "Fork9",
"Fork9"), Lat2 = c(32.9394, NA, 32.92935, NA, NA, 32.9047333333333,
NA, 32.9093833333333, NA, 32.9509833333333, 32.9160666666667,
NA, NA, 32.9074333333333, NA, 32.9029, NA, 32.90775, NA, 32.9094,
NA, NA, 32.9455166666667, 32.9459166666667, 32.9431, 32.9437666666667,
32.90365, 32.9044333333333, 32.9056166666667, 32.90585, NA, 32.9475333333333,
32.94325, NA, 32.9288833333333, NA, NA, NA, 32.9297, NA, NA,
NA, 32.9303, NA), Long2 = c(-95.6334, NA, -95.6406, NA, NA, -95.6531666666667,
NA, -95.6486, NA, -95.6252333333333, -95.648, NA, NA, -95.6391166666667,
NA, -95.64155, NA, -95.6393666666667, NA, -95.63895, NA, NA,
-95.6391166666667, -95.6389333333333, -95.6365, -95.6401333333333,
-95.6535666666667, -95.6532833333333, -95.6560333333333, -95.6575166666667,
NA, -95.63015, -95.6334333333333, NA, -95.6395, NA, NA, NA, -95.6398833333333,
NA, NA, NA, -95.6425166666667, NA), lag.Lat2 = c(NA, 32.9394,
NA, 32.92935, NA, NA, NA, NA, NA, NA, NA, 32.9160666666667, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32.9455166666667, NA,
32.9431, NA, 32.90365, NA, 32.9056166666667, NA, NA, NA, 32.94325,
NA, 32.9288833333333, NA, NA, NA, 32.9297, NA, NA, NA, 32.9303
), lag.Long2 = c(NA, -95.6334, NA, -95.6406, NA, NA, NA, NA,
NA, NA, NA, -95.648, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, -95.6391166666667, NA, -95.6365, NA, -95.6535666666667, NA,
-95.6560333333333, NA, NA, NA, -95.6334333333333, NA, -95.6395,
NA, NA, NA, -95.6398833333333, NA, NA, NA, -95.6425166666667)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -44L), groups = structure(list(
Fish_ID = c("Fork1", "Fork10", "Fork12", "Fork13", "Fork14",
"Fork15", "Fork16", "Fork17", "Fork18", "Fork19", "Fork2",
"Fork20", "Fork21", "Fork22", "Fork23", "Fork3", "Fork4",
"Fork5", "Fork6", "Fork7", "Fork8", "Fork9"), .rows = structure(list(
1:2, 3:4, 5:6, 7:8, 9:10, 11:12, 13:14, 15:16, 17:18,
19:20, 21:22, 23:24, 25:26, 27:28, 29:30, 31:32, 33:34,
35:36, 37:38, 39:40, 41:42, 43:44), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -22L), .drop = TRUE))
Lat2 and Long2 are the locations of each fish, and the lag columns (lag.Lat2, lag.Long2) hold each fish's location the previous time it was located.
I am trying to calculate the distance between each Long2/Lat2 pair and the corresponding lag.Long2/lag.Lat2 pair, so that I can get the distance traveled since each fish was last located. I know how to do this by hand for each one using the geosphere package, but I'm wondering if there is a for loop I could write to do this calculation for each individual fish and automate the process?
Thanks!
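Using your data as fish_df, we're looking for the fish that have both a current and a previous location, put into two matrices fishx and fishy with longitude before latitude, so perhaps:
complete <- which(complete.cases(fish_df))   # rows with both a current and a lagged position
fishx <- as.matrix(fish_df[complete, 4:3])   # columns 4:3 = Long2, Lat2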
fishx
Long2 Lat2
24 -95.63893 32.94592
26 -95.64013 32.94377
28 -95.65328 32.90443
30 -95.65752 32.90585
fishy <- as.matrix(fish_df[complete, 6:5])   # columns 6:5 = lag.Long2, lag.Lat2
geosphere::distm(fishx, fishy, fun = geosphere::distGeo)
[,1] [,2] [,3] [,4]
[1,] 47.55876 386.4671 4883.23872 4746.9509
[2,] 216.11520 347.7147 4623.08009 4484.7115
[3,] 4745.03178 4566.5523 90.82777 288.8099
[4,] 4723.80758 4574.9848 442.81633 141.1616
diag(geosphere::distm(fishx, fishy, fun = geosphere::distGeo))
[1] 47.55876 347.71468 90.82777 141.16164
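# distances in meters (distGeo returns meters)
You know your study area, so check whether these values are plausible. The point is that no loop is needed: build the two matrices and take the diagonal of the distance matrix, which pairs each row of fishx with the same row of fishy.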
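If installing geosphere is not an option, the pairwise distances can also be computed in a vectorized way with the haversine formula. This is a swapped-in technique, not what the answer above uses: it assumes a spherical Earth with radius 6,371 km, whereas geosphere::distGeo works on an ellipsoid, so results differ slightly. The function name haversine_m is hypothetical. Because it is vectorized, rows without a lagged position simply give NA, so again no loop is needed.

```r
# Great-circle distance in meters using the haversine formula.
# Assumption: spherical Earth (r = 6,371,000 m); geosphere::distGeo uses an ellipsoid instead.
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))   # pmin guards against rounding slightly above 1
}

# Row 24 of the question's data (Fork20): previous and current position,
# plus row 6 (Fork12), which has no previous position
lag_long <- c(-95.6391166666667, NA)
lag_lat  <- c(32.9455166666667, NA)
long     <- c(-95.6389333333333, -95.6531666666667)
lat      <- c(32.9459166666667, 32.9047333333333)

d <- haversine_m(lag_long, lag_lat, long, lat)
print(d)
```

On the full data this would be `fish_df$dist_m <- haversine_m(fish_df$lag.Long2, fish_df$lag.Lat2, fish_df$Long2, fish_df$Lat2)`, giving NA for rows without a previous location.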
I have a data.frame that looks like this:
I want to quickly reshape it so that there is only one record for each ID, something that looks like this:
df can be built using this code:
df<-structure(list(ID = structure(c("05-102", "05-102", "05-102",
"01-103", "01-103", "01-103", "08-104", "08-104", "08-104", "05-105",
"05-105", "05-105", "02-106", "02-106", "02-106", "05-107", "05-107",
"05-107", "08-108", "08-108", "08-108", "02-109", "02-109", "02-109",
"05-111", "05-111", "05-111", "07-115", "07-115", "07-115"), label = "Unique Subject Identifier", format.sas = "$"),
EXSTDTC1 = structure(c(NA, NA, NA, 17022, NA, NA, 17024,
NA, NA, 17032, NA, NA, 17038, NA, NA, 17092, NA, NA, 17108,
NA, NA, 17155, NA, NA, 17247, NA, NA, 17333, NA, NA), class = "Date"),
EXSTDTC6 = structure(c(NA, 16885, NA, NA, NA, 17031, NA,
NA, 17032, NA, NA, 17041, NA, NA, 17047, NA, NA, 17100, NA,
NA, 17116, NA, 17164, NA, NA, NA, 17256, NA, 17342, NA), class = "Date"),
EXSTDTC3 = structure(c(NA, NA, 16881, NA, 17027, NA, NA,
17029, NA, NA, 17037, NA, NA, 17043, NA, NA, 17097, NA, NA,
17113, NA, NA, NA, 17160, NA, 17252, NA, NA, NA, 17338), class = "Date"),
EXDOSEA1 = c("73.8+147.6", NA, NA, "64.5+129", NA, NA, "62.7+125.4",
NA, NA, "114+57", NA, NA, "60+117.5", NA, NA, "48.6+97.2",
NA, NA, "61.2+122.4", NA, NA, "47.7+95.4", NA, NA, "51.6+103.2",
NA, NA, "68+136", NA, NA), EXDOSEA6 = c(NA, "100", NA, NA,
NA, "86", NA, NA, "83.5", NA, NA, "76", NA, NA, "39.2", NA,
NA, "32", NA, NA, "81.5", NA, "69.6", NA, NA, NA, "68", NA,
"91", NA), EXDOSEA3 = c(NA, NA, "1600", NA, "4302", NA, NA,
"4185", NA, NA, "3900", NA, NA, "3921", NA, NA, "3300", NA,
NA, "4080", NA, NA, NA, "3183", NA, "3300", NA, NA, NA, "1514"
)), row.names = c(NA, -30L), class = c("tbl_df", "tbl", "data.frame"
))
Right now my code is:
df %>%
group_by(ID) %>%
summarise(across(EXSTDTC1:EXDOSEA3, na.omit))
But it seems to drop 05-102, because that ID has no value in EXSTDTC1. How can this be addressed? Is it possible to keep using across?
Many thanks.
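We could use an if/else condition to handle the cases where a column is entirely NA within an ID: return NA there, otherwise keep the non-missing value.
library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(across(EXSTDTC1:EXDOSEA3,
                   # all-NA column within this ID -> NA; otherwise keep the non-missing value(s)
                   ~ if (all(is.na(.))) NA else .[complete.cases(.)]),
            .groups = 'drop')
Output: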
# A tibble: 10 x 7
# ID EXSTDTC1 EXSTDTC6 EXSTDTC3 EXDOSEA1 EXDOSEA6 EXDOSEA3
# <chr> <date> <date> <date> <chr> <chr> <chr>
# 1 01-103 2016-08-09 2016-08-18 2016-08-14 64.5+129 86 4302
# 2 02-106 2016-08-25 2016-09-03 2016-08-30 60+117.5 39.2 3921
# 3 02-109 2016-12-20 2016-12-29 2016-12-25 47.7+95.4 69.6 3183
# 4 05-102 NA 2016-03-25 2016-03-21 73.8+147.6 100 1600
# 5 05-105 2016-08-19 2016-08-28 2016-08-24 114+57 76 3900
# 6 05-107 2016-10-18 2016-10-26 2016-10-23 48.6+97.2 32 3300
# 7 05-111 2017-03-22 2017-03-31 2017-03-27 51.6+103.2 68 3300
# 8 07-115 2017-06-16 2017-06-25 2017-06-21 68+136 91 1514
# 9 08-104 2016-08-11 2016-08-19 2016-08-16 62.7+125.4 83.5 4185
#10 08-108 2016-11-03 2016-11-11 2016-11-08 61.2+122.4 81.5 4080
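A type-stable variant of the same idea is to take the first non-missing value in each group: x[!is.na(x)][1] returns an NA of the column's own type (for example an NA Date) when everything is missing. The sketch below is an illustrative alternative, not the answer's code; the helper name first_non_na and the toy data are hypothetical, and it assumes each ID has at most one non-missing value per column (otherwise only the first is kept). It uses base R so that it runs without dplyr; with dplyr, the same helper could be passed as summarise(across(EXSTDTC1:EXDOSEA3, first_non_na)).

```r
# Hypothetical helper: first non-missing value, or an NA of the same type/class
first_non_na <- function(x) x[!is.na(x)][1]

# Toy data (hypothetical) shaped like the question's df
toy <- data.frame(
  ID       = c("05-102", "05-102", "01-103", "01-103"),
  EXSTDTC1 = as.Date(c(NA, NA, "2016-08-09", NA)),
  EXDOSEA6 = c(NA, "100", NA, "86")
)

# One row per ID: apply first_non_na() to every column except ID within each group
collapsed <- do.call(rbind, lapply(split(toy, toy$ID), function(g) {
  data.frame(ID = g$ID[1], lapply(g[-1], first_non_na))
}))
print(collapsed)
```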
I am trying to figure out how to reshape a dataset of the names of political parties from wide to long using dplyr and pivot_longer.
For each Party_ID, there are a number of constant columns (Party_Name_Short, Party_Name, Country, Party_in_orig_title) and a number of time-changing factors as well: election, Date, Rename, Reason, Party_Title, alliance, member_parties, split, parent_party, merger, child_party, successor, predecessor. The time-changing factors were recorded up to 11 times for each party, as reflected by the index in the column name.
To provide a sample, I selected the first three sets of time-changing columns for each party and 5 random rows:
structure(list(Party_Name_Short = c("LZJ-PS", "ZiZi", "MNR",
"MDP", "E200"), Party_Name = c("Lista Zorana Jankovica – Pozitivna Slovenija",
"Živi zid", "Mouvement national républicain", "Movimento Democrático Português",
"Erakond Eesti 200"), Country = c("SVN", "HRV", "FRA", "PRT",
"EST"), Party_ID = c(1987, 2612, 1263, 1281, 2720), Party_in_orig_title = c(0,
0, 0, 0, 0), Date1 = c(2011, NA, 1999, 1987, NA), Rename1 = c("Lista Zorana Jankovica – Pozitivna Slovenija",
NA, "Mouvement national républicain", "ID", NA), Reason1 = c("foundation",
NA, "split from FN", "split", NA), Party_Title1 = c(0, NA, 0,
0, NA), alliance1 = c(0, NA, 0, 0, NA), member_parties1 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
split1 = c(0, NA, 1, 1, NA), parent_party1 = c(NA, NA, "FN",
"MDP", NA), merger1 = c(0, NA, 0, 0, NA), child_party1 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), successor1 = c(0, NA, 0, 0, NA), predecessor1 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), Date2 = c(2012, NA, NA, NA, NA), Rename2 = c("Pozitivna Slovenija",
NA, NA, NA, NA), Reason2 = c("renamed", NA, NA, NA, NA),
Party_Title2 = c(0, NA, NA, NA, NA), alliance2 = c(0, NA,
NA, NA, NA), member_parties2 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), split2 = c(0,
NA, NA, NA, NA), parent_party2 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), merger2 = c(0,
NA, NA, NA, NA), child_party2 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), successor2 = c(0,
NA, NA, NA, NA), predecessor2 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), Date3 = c(2014,
NA, NA, NA, NA), Rename3 = c("ZaAB", NA, NA, NA, NA), Reason3 = c("split",
NA, NA, NA, NA), Party_Title3 = c(0, NA, NA, NA, NA), alliance3 = c(0,
NA, NA, NA, NA), member_parties3 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), split3 = c(1,
NA, NA, NA, NA), parent_party3 = c("LZJ-PS", NA, NA, NA,
NA), merger3 = c(0, NA, NA, NA, NA), child_party3 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), successor3 = c(0, NA, NA, NA, NA), predecessor3 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), election1 = structure(c(15309, 16740, 11839, 6390, 17956
), class = "Date"), election2 = structure(c(16252, NA, NA,
NA, NA), class = "Date"), election3 = structure(c(16344,
NA, NA, NA, NA), class = "Date")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
I would like the data to follow a "long" structure where each Party_ID and its constant factors are repeated 11 times and each time-changing factor has a single column. Following the top-rated answer formulated here I tried different variations of the following command:
pivot_longer(cols = starts_with(c("election", "Date", "Rename", "Reason", "Party_Title",
"alliance", "member_parties", "split", "parent_party",
"merger", "child_party", "successor", "predecessor")),
names_to = c(".value", "election", "Date", "Rename", "Reason", "Party_Title",
"alliance", "member_parties", "split", "parent_party",
"merger", "child_party", "successor", "predecessor"), names_sep = "_") %>%
select(-matches("election[1-9]"), -matches("Date[1-9]"), -matches("Rename[1-9]"),
-matches("Reason[1-9]"), -matches("alliance[1-9]"), -matches("member_parties[1-9]"),
-matches("split[1-9]"), -matches("parent_party[1-9]"), -matches("merger[1-9]"),
-matches("child_party[1-9]"), -matches("successor[1-9]"), -matches("predecessor[1-9]"),
-matches("Party_Title[1-9]"), -matches("election1[0-2]"), -matches("Date1[0-2]"), -matches("Rename1[0-2]"),
-matches("Reason1[0-2]"), -matches("alliance1[0-2]"), -matches("member_parties1[0-2]"),
-matches("split1[0-2]"), -matches("parent_party1[0-2]"), -matches("merger1[0-2]"),
-matches("child_party1[0-2]"), -matches("successor1[0-2]"), -matches("predecessor1[0-2]"),
-matches("Party_Title1[0-2]"))
However, for some reason, I get a lot of missing values and do not achieve the shape of the data I would like to have. I'd appreciate any hint if you have an idea of how to do this. Thanks!
Update:
I would like the final output to look something like:
structure(list(Party_Name_Short = c("LZJ-PS", "ZiZi", "MNR",
"MDP", "E200", "LZJ-PS", "ZiZi", "MNR", "MDP", "E200", "LZJ-PS",
"ZiZi", "MNR", "MDP", "E200"), Party_Name = c("Lista Zorana Jankovica – Pozitivna Slovenija",
"Živi zid", "Mouvement national républicain", "Movimento Democrático Português",
"Erakond Eesti 200", "Lista Zorana Jankovica – Pozitivna Slovenija",
"Živi zid", "Mouvement national républicain", "Movimento Democrático Português",
"Erakond Eesti 200", "Lista Zorana Jankovica – Pozitivna Slovenija",
"Živi zid", "Mouvement national républicain", "Movimento Democrático Português",
"Erakond Eesti 200"), Country = c("SVN", "HRV", "FRA", "PRT",
"EST", "SVN", "HRV", "FRA", "PRT", "EST", "SVN", "HRV", "FRA",
"PRT", "EST"), Party_ID = c(1987, 2612, 1263, 1281, 2720, 1987,
2612, 1263, 1281, 2720, 1987, 2612, 1263, 1281, 2720), Party_in_orig_title = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), time = c(1, 1, 1,
1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3), Date = c(2011, NA, 1999,
1987, NA, 2012, NA, NA, NA, NA, 2014, NA, NA, NA, NA), Rename = c("Lista Zorana Jankovica – Pozitivna Slovenija",
NA, "Mouvement national républicain", "ID", NA, "Pozitivna Slovenija",
NA, NA, NA, NA, "ZaAB", NA, NA, NA, NA), Reason = c("foundation",
NA, "split from FN", "split", NA, "renamed", NA, NA, NA, NA,
"split", NA, NA, NA, NA), Party_Title = c(0, NA, 0, 0, NA, 0,
NA, NA, NA, NA, 0, NA, NA, NA, NA), alliance = c(0, NA, 0, 0,
NA, 0, NA, NA, NA, NA, 0, NA, NA, NA, NA), member_parties = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), split = c(0,
NA, 1, 1, NA, 0, NA, NA, NA, NA, 1, NA, NA, NA, NA), parent_party = c(NA,
NA, "FN", "MDP", NA, NA, NA, NA, NA, NA, "LZJ-PS", NA, NA, NA,
NA), merger = c(0, NA, 0, 0, NA, 0, NA, NA, NA, NA, 0, NA, NA,
NA, NA), child_party = c(NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA), successor = c(0, NA, 0, 0, NA, 0, NA,
NA, NA, NA, 0, NA, NA, NA, NA), predecessor = c(NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), election = structure(c(1322697600,
1446336000, 1022889600, 552096000, 1551398400, 1404172800, NA,
NA, NA, NA, 1412121600, NA, NA, NA, NA), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
Note the newly added time column, and note that this is only an example with three sets of time-changing factors, whereas the real data has 11.
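Use pivot_longer with a regex in names_sep that splits between a non-digit and the digits at the end of each column name:
library(tidyr)
library(dplyr)

df1 %>%
  pivot_longer(cols = matches('\\d+$'),                 # all time-indexed columns
               names_to = c(".value", 'time'),          # stem -> column name, suffix -> time
               names_sep = "(?<=\\D)(?=\\d+$)") %>%     # split just before the trailing digits
  arrange(as.integer(time))   # time is character; sort numerically so 10 and 11 come after 9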
# A tibble: 15 x 19
# Party_Name_Short Party_Name Country Party_ID Party_in_orig_t… time Date Rename Reason Party_Title alliance member_parties split
# <chr> <chr> <chr> <dbl> <dbl> <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl>
# 1 LZJ-PS Lista Zor… SVN 1987 0 1 2011 Lista… found… 0 0 <NA> 0
# 2 ZiZi Živi zid HRV 2612 0 1 NA <NA> <NA> NA NA <NA> NA
# 3 MNR Mouvement… FRA 1263 0 1 1999 Mouve… split… 0 0 <NA> 1
# 4 MDP Movimento… PRT 1281 0 1 1987 ID split 0 0 <NA> 1
# 5 E200 Erakond E… EST 2720 0 1 NA <NA> <NA> NA NA <NA> NA
# 6 LZJ-PS Lista Zor… SVN 1987 0 2 2012 Pozit… renam… 0 0 <NA> 0
# 7 ZiZi Živi zid HRV 2612 0 2 NA <NA> <NA> NA NA <NA> NA
# 8 MNR Mouvement… FRA 1263 0 2 NA <NA> <NA> NA NA <NA> NA
# 9 MDP Movimento… PRT 1281 0 2 NA <NA> <NA> NA NA <NA> NA
#10 E200 Erakond E… EST 2720 0 2 NA <NA> <NA> NA NA <NA> NA
#11 LZJ-PS Lista Zor… SVN 1987 0 3 2014 ZaAB split 0 0 <NA> 1
#12 ZiZi Živi zid HRV 2612 0 3 NA <NA> <NA> NA NA <NA> NA
#13 MNR Mouvement… FRA 1263 0 3 NA <NA> <NA> NA NA <NA> NA
#14 MDP Movimento… PRT 1281 0 3 NA <NA> <NA> NA NA <NA> NA
#15 E200 Erakond E… EST 2720 0 3 NA <NA> <NA> NA NA <NA> NA
# … with 6 more variables: parent_party <chr>, merger <dbl>, child_party <chr>, successor <dbl>, predecessor <chr>, election <date>
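To check what the split does, including for two-digit indices such as Party_Title10 in the full data, the same stem/index separation can be reproduced in base R with an equivalent capture-group pattern. This is only an illustrative sketch with hypothetical column names; tidyr's pivot_longer also accepts such a pattern through names_pattern instead of names_sep.

```r
# Hypothetical column names, including two-digit indices as in the full data (up to 11)
cols <- c("Date1", "Rename2", "member_parties3", "Party_Title10", "election11")

# Greedy stem ending in a non-digit, followed by the trailing digits
pattern <- "^(.*\\D)(\\d+)$"
stem <- sub(pattern, "\\1", cols)
time <- as.integer(sub(pattern, "\\2", cols))

print(data.frame(cols, stem, time))

# Character indices sort lexicographically, integer ones numerically
print(sort(c("1", "2", "10", "11")))   # "1" "10" "11" "2"
print(sort(c(1L, 2L, 10L, 11L)))
```

With tidyr, the equivalent call would be `pivot_longer(df1, cols = matches('\\d+$'), names_to = c('.value', 'time'), names_pattern = '^(.*\\D)(\\d+)$')`, after which time can be converted with as.integer() before sorting.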
I am working on a project in R which is fairly code-heavy, at least compared to my previous R projects. The code uses multiple ifelse statements on data from previous columns and then creates a new column with the results. Because the data is on a 5-minute timeframe, I have to write a new line of code for every 5-minute slice of time. The data runs from 09:30 to 16:00, so that is a lot of lines of code, around 75 by my calculations. Example of my data:
Date Open High Low Close doy
1 2015-09-21 09:30:00 164.6700 164.7100 164.3700 164.5300 264
2 2015-09-21 09:35:00 164.5300 164.9000 164.5300 164.6400 264
3 2015-09-21 09:40:00 164.6600 164.8900 164.6000 164.8900 264
4 2015-09-21 09:45:00 164.9100 165.0900 164.9100 164.9736 264
5 2015-09-21 09:50:00 164.9399 165.0980 164.8200 164.8200 264
This data is then filtered into a table like this:
data <- structure(list(doy = c(264, 265, 266, 267, 268, 271, 272, 11,12, 13), Date = structure(c(1442824200, 1442910600, 1442997000,1443083400, 1443169800, 1443429000, 1443515400, 1452504600, 1452591000,1452677400), class = c("POSIXct", "POSIXt"), tzone = ""), Or_High = c(164.71,162.96, 163.38, 161.37, 163.91, 162.06, 160.22, 164.5, 165.23,165.84), OR_Low = c(164.37, 162.62, 162.98, 161.06, 163.57, 161.66,159.7, 164.06, 164.84, 165.4), HOD = c(165.56, 163.36, 163.38,162.24, 164.43, 162.06, 160.96, 164.5, 165.78, 165.84), LOD = c(165.22,163.1, 162.98, 161.95, 164.24, 161.66, 160.75, 164.06, 165.56,165.4), Close = c(164.92, 163.02, 162.58, 161.85, 162.94, 159.84,160.19, 163.83, 165.02, 161.38), Range = c(0.340000000000003,0.260000000000019, 0.400000000000006, 0.29000000000002, 0.189999999999998,0.400000000000006, 0.210000000000008, 0.439999999999998, 0.219999999999999,0.439999999999998), `A-val` = c(NA, NA, NA, NA, NA, NA, NA, 0.0673439999999994,0.0659639999999996, 0.0729499999999996), `A-up` = c(NA, NA, NA,NA, NA, NA, NA, 164.567344, 165.295964, 165.91295), `A-down` = c(NA,NA, NA, NA, NA, NA, NA, 163.992656, 164.774036, 165.32705), `09:35` = structure(c(NA,NA, NA, NA, NA, NA, NA, 0, 0, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `09:40` = structure(c(NA, NA, NA, NA, NA,NA, NA, -1, 1, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL,"Low")), `09:45` = structure(c(NA, NA, NA, NA, NA, NA, NA,0, 1, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")),`09:50` = structure(c(NA, NA, NA, NA, NA, NA, NA, -1, 1,0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `09:55` = structure(c(NA,NA, NA, NA, NA, NA, NA, -1, 0, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:00` = structure(c(NA, NA, NA, NA,NA, NA, NA, -1, 0, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:05` = structure(c(NA, NA, NA, NA,NA, NA, NA, -1, 0, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:10` = structure(c(NA, NA, NA, NA,NA, NA, NA, -1, 0, 0), .Dim = c(10L, 
1L), .Dimnames = list(NULL, "Low")), `10:15` = structure(c(NA, NA, NA, NA,NA, NA, NA, -2, 0, -1), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:20` = structure(c(NA, NA, NA, NA,NA, NA, NA, 0, 0, -1), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:25` = structure(c(NA, NA, NA, NA,NA, NA, NA, -2, -1, -1), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:30` = structure(c(NA, NA, NA, NA,NA, NA, NA, 0, 0, -1), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:35` = structure(c(NA, NA, NA, NA,NA, NA, NA, 0, 0, -1), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:40` = structure(c(NA, NA, NA, NA,NA, NA, NA, 0, -1, -2), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:45` = structure(c(NA, NA, NA, NA,NA, NA, NA, 0, -1, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:50` = structure(c(NA, NA, NA, NA,NA, NA, NA, -1, -1, -2), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low")), `10:55` = structure(c(NA, NA, NA, NA,NA, NA, NA, -1, -1, 0), .Dim = c(10L, 1L), .Dimnames = list(NULL, "Low"))), .Names = c("doy", "Date", "Or_High","OR_Low", "HOD", "LOD", "Close", "Range", "A-val", "A-up", "A-down","09:35", "09:40", "09:45", "09:50", "09:55", "10:00", "10:05","10:10", "10:15", "10:20", "10:25", "10:30", "10:35", "10:40","10:45", "10:50", "10:55"), row.names = c(1L, 2L, 3L, 4L, 5L,6L, 7L, 78L, 79L, 80L), class = "data.frame")
This is what the lines of code look like:
data[,14] <- ifelse(df %>% filter(hour(Date) == 09 & minute(Date) == 45) %>% select(Low) > data[,10], 1, ifelse(df %>% filter(hour(Date) == 09 & minute(Date) == 45) %>% select(High) < data[,11], -1, 0))
Then the next line of code would look like;
data[,15] <- ifelse(df %>% filter(hour(Date) == 09 & minute(Date) == 50) %>% select(Low) > data[,10], 1, ifelse(df %>% filter(hour(Date) == 09 & minute(Date) == 50) %>% select(High) < data[,11], -1, 0))
And the next like this etc;
data[,16] <- ifelse(df %>% filter(hour(Date) == 09 & minute(Date) == 55) %>% select(Low) > data[,10], 1, ifelse(df %>% filter(hour(Date) == 09 & minute(Date) == 55) %>% select(High) < data[,11], -1, 0))
As you can see with each new line of code only certain parts of the code are changed, such as the hours, minutes and column references for summing. Perhaps the below example will make it clearer.
Example;
colnames(data)[14] <- "09:45"
colnames(data)[15] <- "09:50"
colnames(data)[16] <- "09:55"
colnames(data)[17] <- "10:00"
colnames(data)[18] <- "10:05"
Is there any way to change the column reference and the times in this code without editing each line by hand? I realise copy and paste in Notepad could be used, but that still means writing each change individually. My main concern is not the time it takes to write this, but the risk of errors from manual input.
If anyone has any tips or tricks for how this can be done, or another way of achieving the same result without writing multiple if statements in the structure of my existing code, I would be most grateful for your help. This question is related to a previous question I posted here, which may add clarity about what I am trying to achieve.
Thanks.
As vanao veneri mentioned, it is better to use a text editor for writing bulk code quickly.
I found that Sublime Text 3 with the Text Pastry add-on did exactly what I needed, using the insert nums command.
Thanks for the help.
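As an alternative to generating the repeated lines in an editor, the repetition can also be removed in R itself by looping over the 5-minute slots and naming each new column after its slot. This is a hedged sketch, not the asker's code: bars and daily are hypothetical stand-ins for df and data, and A_up/A_down stand in for the A-up and A-down columns (data[,10] and data[,11]). Matching bars to days with match() avoids relying on both tables having the same row order.

```r
# Hypothetical intraday bars: two days, three 5-minute slots each
bars <- data.frame(
  Date = as.POSIXct(c("2016-01-11 09:45", "2016-01-11 09:50", "2016-01-11 09:55",
                      "2016-01-12 09:45", "2016-01-12 09:50", "2016-01-12 09:55"),
                    tz = "UTC"),
  High = c(164.8, 165.5, 163.9, 166.5, 165.5, 164.9),
  Low  = c(164.2, 165.1, 163.5, 166.2, 165.2, 164.5)
)

# Hypothetical daily table with the upper/lower thresholds (stand-ins for A-up, A-down)
daily <- data.frame(
  day    = as.Date(c("2016-01-11", "2016-01-12")),
  A_up   = c(165, 166),
  A_down = c(164, 165)
)

# Slot labels "09:45", "09:50", "09:55"; for the real data, extend length.out or use `to =`
slots <- format(seq(as.POSIXct("2016-01-11 09:45", tz = "UTC"),
                    by = "5 min", length.out = 3), "%H:%M", tz = "UTC")

for (s in slots) {
  bar <- bars[format(bars$Date, "%H:%M", tz = "UTC") == s, ]
  i <- match(daily$day, as.Date(bar$Date, tz = "UTC"))  # this slot's bar for each day
  # 1 if the bar is entirely above A_up, -1 if entirely below A_down, else 0
  daily[[s]] <- ifelse(bar$Low[i] > daily$A_up, 1,
                       ifelse(bar$High[i] < daily$A_down, -1, 0))
}
print(daily)
```

Days without a bar in a given slot get NA in that column, because match() returns NA for them.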