How to remove some variables in R with names that are dates

I have a dataset in R where some of the variable names are dates; see a simplified example of the input data below (in Excel):
What I want to do is remove the columns whose names are dates older than or equal to a certain date, e.g. 2019-01-31. See a simplified example of the desired output data below (in Excel):
Now, I am able to achieve this by transposing the data, filtering out the rows with a date lower than or equal to 31 January 2019, and finally transposing the data back. However, I am wondering whether there is a different way to do this using just the column names, without pivoting back and forth?
# Example data to copy and paste into R for easy reproduction of problem:
df <- data.frame (id = c("apples", "pears", "grapes", "tomatoes", "carrots", "cucumber", "rabbit", "cat", "dog"),
type = c("fruit", "fruit", "fruit", "veggies", "veggies", "veggies", "pets", "pets", "pets"),
color = c("red", "green", "purple", "red", "orange", "green", "grey", "black", "brown"),
'2019-04-30' = c(353, 91, 270, 2029, 107, 62, 30, 61, 137),
'2019-03-31' = c(349, 90, 267, 2028, 104, 60, 29, 59, 133),
'2019-02-28' = c(345, 89, 264, 2027, 101, 58, 28, 57, 129),
'2019-01-31' = c(341, 88, 261, 2026, 98, 56, 27, 55, 125),
'2018-12-31' = c(337, 87, 258, 2025, 95, 54, 26, 53, 121),
'2018-11-30' = c(333, 86, 255, 2024, 92, 52, 25, 51, 117),
check.names = FALSE)
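For reference, the transpose-and-filter workaround described above might look roughly like the sketch below (the asker's actual code is not shown; this uses tidyr pivots, which is essentially the approach one of the answers spells out):
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(-c(id, type, color), names_to = "date", values_to = "value") %>%  # "transpose"
  filter(as.Date(date) > as.Date("2019-01-31")) %>%                              # drop old dates
  pivot_wider(names_from = "date", values_from = "value")                        # transpose back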

We can do this in base R. Your dates are conveniently in YYYY-MM-DD format, which means they will be ordered correctly by the >= and <= operators. We can also use a simple regex to preserve any columns that are not in date format:
df[!grepl('\\d{4}-\\d{2}-\\d{2}', colnames(df)) | colnames(df) >= '2019-02-28']
id type color 2019-04-30 2019-03-31 2019-02-28
1 apples fruit red 353 349 345
2 pears fruit green 91 90 89
3 grapes fruit purple 270 267 264
4 tomatoes veggies red 2029 2028 2027
5 carrots veggies orange 107 104 101
6 cucumber veggies green 62 60 58
7 rabbit pets grey 30 29 28
8 cat pets black 61 59 57
9 dog pets brown 137 133 129
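A quick illustration of the point about YYYY-MM-DD strings (not part of the original answer): because the fields run from most to least significant, plain lexicographic comparison of these strings agrees with chronological order.
"2018-12-31" < "2019-01-31"                        # TRUE
sort(c("2019-02-28", "2018-11-30", "2019-01-31"))  # "2018-11-30" "2019-01-31" "2019-02-28"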

The approach is as follows:
extract the column names
convert each name to a Date where possible, and to NA where it is not date-like
build a logical vector that keeps the non-date columns (the NAs from the previous step) and drops the date columns that are too old
Sample Data
## sample data frame
m <- matrix(1, 3, 10)
colnames(m) <- c("a", "b", as.character(seq.Date(as.Date("2021-1-1"), length.out = 8, by = "days")))
(d <- as.data.frame(m))
# a b 2021-01-01 2021-01-02 2021-01-03 2021-01-04 2021-01-05 2021-01-06 2021-01-07 2021-01-08
# 1 1 1 1 1 1 1 1 1 1 1
# 2 1 1 1 1 1 1 1 1 1 1
# 3 1 1 1 1 1 1 1 1 1 1
Filter
r <- vapply(names(d), as.Date, numeric(1), optional = TRUE)
d[, is.na(r) | r <= as.Date("2021-1-3")]
# a b 2021-01-01 2021-01-02 2021-01-03
# 1 1 1 1 1 1
# 2 1 1 1 1 1
# 3 1 1 1 1 1
r <- vapply(names(df), as.Date, numeric(1), optional = TRUE)
df[, is.na(r) | r >= as.Date("2019-1-31")]
# id type color 2019-04-30 2019-03-31 2019-02-28 2019-01-31
# 1 apples fruit red 353 349 345 341
# 2 pears fruit green 91 90 89 88
# 3 grapes fruit purple 270 267 264 261
# 4 tomatoes veggies red 2029 2028 2027 2026
# 5 carrots veggies orange 107 104 101 98
# 6 cucumber veggies green 62 60 58 56
# 7 rabbit pets grey 30 29 28 27
# 8 cat pets black 61 59 57 55
# 9 dog pets brown 137 133 129 125
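A note on why vapply() with numeric(1) works here, and why the numeric r can be compared directly with as.Date() (an illustration, not part of the original answer): a Date is stored as the number of days since 1970-01-01, and comparing a Date with a plain number compares those day counts.
d <- as.Date("2021-01-03")
unclass(d)   # 18630: days since 1970-01-01
d >= 18629   # TRUE: the comparison uses the underlying day count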

Description
One can re-shape the data to the long format and filter based on the date column.
Data
Same data as provided in the example
df <- data.frame (id = c("apples", "pears", "grapes", "tomatoes", "carrots", "cucumber", "rabbit", "cat", "dog"),
type = c("fruit", "fruit", "fruit", "veggies", "veggies", "veggies", "pets", "pets", "pets"),
color = c("red", "green", "purple", "red", "orange", "green", "grey", "black", "brown"),
'2019-04-30' = c(353, 91, 270, 2029, 107, 62, 30, 61, 137),
'2019-03-31' = c(349, 90, 267, 2028, 104, 60, 29, 59, 133),
'2019-02-28' = c(345, 89, 264, 2027, 101, 58, 28, 57, 129),
'2019-01-31' = c(341, 88, 261, 2026, 98, 56, 27, 55, 125),
'2018-12-31' = c(337, 87, 258, 2025, 95, 54, 26, 53, 121),
'2018-11-30' = c(333, 86, 255, 2024, 92, 52, 25, 51, 117),
check.names = FALSE)
Solution
library(dplyr)
library(tidyr)
df %>%
tidyr::pivot_longer(cols = !c(id, type, color), names_to = 'date', values_to = 'value') %>%
dplyr::mutate(date = as.Date(date, format = '%Y-%m-%d')) %>%
dplyr::filter( date >= as.Date('2019-01-31')) %>%
tidyr::pivot_wider(names_from = 'date', values_from = 'value')
Desired output
id type color `2019-04-30` `2019-03-31` `2019-02-28` `2019-01-31`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 apples fruit red 353 349 345 341
2 pears fruit green 91 90 89 88
3 grapes fruit purple 270 267 264 261
4 tomatoes veggies red 2029 2028 2027 2026
5 carrots veggies orange 107 104 101 98
6 cucumber veggies green 62 60 58 56
7 rabbit pets grey 30 29 28 27
8 cat pets black 61 59 57 55
9 dog pets brown 137 133 129 125

Related

Find the Maximum Value with respect to another within two data frames (VLOOKUP which returns Max Value) in R

I am trying to write a function which is similar to a VLOOKUP in Excel, but which returns the maximum value along with the other values in the same row.
The data frames which I am dealing with are given below:
dput(Book3)
structure(list(Item = c("ABA", "ABB", "ABC", "ABD", "ABE", "ABF"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-6L))
dput(Book4)
structure(list(Item = c("ABA", "ABB", "ABC", "ABD", "ABE", "ABF",
"ABA", "ABB", "ABC", "ABD", "ABE", "ABF", "ABA", "ABB", "ABC",
"ABD", "ABE", "ABF"), Max1 = c(12, 68, 27, 17, 74, 76, 78, 93,
94, 98, 46, 90, 5, 58, 67, 64, 34, 97), Additional1 = c(40, 66,
100, 33, 66, 19, 8, 70, 21, 93, 48, 34, 44, 89, 74, 20, 0, 47
), Additional2 = c(39, 31, 85, 58, 0, 2, 57, 28, 31, 32, 15,
22, 93, 41, 57, 81, 95, 46)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L))
The Expected output for this is given below:
You are looking for slice_max:
library(dplyr)
Book4 %>%
group_by(Item) %>%
slice_max(Max1)
# Item Max1 Additional1 Additional2
# 1 ABA 78 8 57
# 2 ABB 93 70 28
# 3 ABC 94 21 31
# 4 ABD 98 93 32
# 5 ABE 74 66 0
# 6 ABF 97 47 46
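If Max1 can be tied within an Item and you want exactly one row per group, slice_max() also has a with_ties argument (a sketch, not part of the original answer):
Book4 %>%
  group_by(Item) %>%
  slice_max(Max1, n = 1, with_ties = FALSE) %>%  # keep a single row even when the maximum is tied
  ungroup()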
Using base R
subset(Book4, Max1 == ave(Max1, Item, FUN = max))
-output
# A tibble: 6 × 4
Item Max1 Additional1 Additional2
<chr> <dbl> <dbl> <dbl>
1 ABE 74 66 0
2 ABA 78 8 57
3 ABB 93 70 28
4 ABC 94 21 31
5 ABD 98 93 32
6 ABF 97 47 46
An alternative base solution that is more resilient to floating-point precision problems (cf. "Why are these numbers not equal?", https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f). It also offers two options for behavior when there are duplicate max values:
if you want all of them, use ties.method = "min";
if you want the first (or just one) of them, then ties.method = "first".
Book4[ave(Book4$Max1, Book4$Item, FUN = function(z) rank(-z, ties.method = "first")) == 1,]
# # A tibble: 6 x 4
# Item Max1 Additional1 Additional2
# <chr> <dbl> <dbl> <dbl>
# 1 ABE 74 66 0
# 2 ABA 78 8 57
# 3 ABB 93 70 28
# 4 ABC 94 21 31
# 5 ABD 98 93 32
# 6 ABF 97 47 46
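For completeness, the "keep all ties" variant mentioned above uses the same idiom with ties.method = "min", which assigns rank 1 to every row tied for the group maximum (a sketch):
Book4[ave(Book4$Max1, Book4$Item, FUN = function(z) rank(-z, ties.method = "min")) == 1, ]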
Using base R aggregate + max + merge
> merge(Book4, aggregate(Max1~Item, data = Book4, max), by = c("Item", "Max1"))
Item Max1 Additional1 Additional2
1 ABA 78 8 57
2 ABB 93 70 28
3 ABC 94 21 31
4 ABD 98 93 32
5 ABE 74 66 0
6 ABF 97 47 46

Sum cells across similar data frames within a list in R, by data frame names

I have a list of data frames that look like this:
df1_BC <- data.frame(name=c("name1", "name2", "name3"),
year1=c(23, 45, 54),
year2=c(54, 23, 79),
year3=c(67, 29, 76))
df2_BC <- data.frame(name=c("name1", "name2", "name3"),
year1=c(93, 32, 56),
year2=c(82, 96, 72),
year3=c(54, 76, 19))
df3_BC <- data.frame(name=c("name1", "name2", "name3"),
year1=c(83, 41, 92),
year2=c(76, 73, 65),
year3=c(63, 62, 95))
df1_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
year1=c(23, 35, 54, 41),
year2=c(84, 23, 79, 69),
year3=c(97, 29, 76, 0))
df2_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
year1=c(93, 32, 56, 64),
year2=c(82, 96, 53, 0),
year3=c(54, 76, 19, 3))
df3_BA <- data.frame(name=c("name1", "name2", "name3", "name4"),
year1=c(83, 41, 92, 5),
year2=c(76, 3, 65, 82),
year3=c(3, 62, 95, 6))
list_dfs <- list(df1_BC, df2_BC, df3_BC, df1_BA, df2_BA, df3_BA)
As you can see, data frames with the same suffix ('BA' or 'BC') have the same columns and number of rows.
What I want to do is to sum across the cells of the two groups of data frames (the ones with the 'BA' suffix and the ones with the 'BC' suffix).
If I do it on the dataframes alone, without listing them, I get the expected result:
result_BA <- df1_BA[,-1] + df2_BA[,-1] + df3_BA[,-1]
result_BC <- df1_BC[,-1] + df2_BC[,-1] + df3_BC[,-1]
print(result_BA)
year1 year2 year3
1 199 242 154
2 108 122 167
3 202 197 190
4 110 151 9
As you can also see, it is necessary to leave the name column out to do the sum. EDIT: Then I would like to put it back. Something like this:
result_BA <- cbind(df1_BA["name"], result_BA)
so that the column of names is added back to each corresponding data frame in the list.
This is a simplified example from much larger lists, so doing it as a list and matching the dataframes to add up by suffix really simplifies the task.
Thanks!
The list doesn't have any names, so we need to construct it with names. One option is to create a named list, split the list by the suffix extracted from the names, and use Reduce to add up (`+`) the inner list elements:
list_dfs <- list(df1_BC = df1_BC, df2_BC = df2_BC, df3_BC = df3_BC,
df1_BA = df1_BA, df2_BA = df2_BA, df3_BA = df3_BA)
lapply(split(list_dfs, sub(".*_", "", names(list_dfs))),
\(x) Reduce(`+`, lapply(x, `[`, -1)))
-output
$BA
year1 year2 year3
1 199 242 154
2 108 122 167
3 202 197 190
4 110 151 9
$BC
year1 year2 year3
1 199 212 184
2 118 192 167
3 202 216 190
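If, as the question's EDIT asks, you also want the name column back on each summed data frame, one option is to take it from the first data frame of each group (a sketch reusing the named list above):
grp  <- split(list_dfs, sub(".*_", "", names(list_dfs)))
sums <- lapply(grp, \(x) Reduce(`+`, lapply(x, `[`, -1)))
# put the name column of each group's first data frame back in front of the sums
Map(\(s, g) cbind(g[[1]]["name"], s), sums, grp)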
Or this may be done with the tidyverse, using a group-by approach:
library(dplyr)
library(tidyr)
library(data.table)
list_dfs <- lst(df1_BC, df2_BC, df3_BC, df1_BA, df2_BA, df3_BA)
bind_rows(list_dfs, .id = 'name') %>%
separate(name, into = c("name1", "name2")) %>%
mutate(grp = rowid(name1, name2)) %>%
group_by(name2, grp) %>%
summarise(across(where(is.numeric), sum), .groups = "drop") %>%
select(-grp)
-output
# A tibble: 7 × 4
name2 year1 year2 year3
<chr> <dbl> <dbl> <dbl>
1 BA 199 242 154
2 BA 108 122 167
3 BA 202 197 190
4 BA 110 151 9
5 BC 199 212 184
6 BC 118 192 167
7 BC 202 216 190

Counting instances of a string in one dataframe, then attaching the result to a row in another dataframe with a matching device name?

So here is part of an example dataset I'm working with:
`D1` `D2` `D3` `D4` `D5` `D6` `D7`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 921 917 935 457 462 451 465
2 898 E9 914 446 452 440 455
3 817 806 814 407 412 398 411
4 644 632 624 321 327 314 324
5 E9 399 385 207 213 200 206
6 136 127 127 69 72 66 66
7 223 233 209 117 106 117 118
8 475 E9 443 239 234 238 246
9 684 685 665 340 341 337 348
10 816 814 828 406 409 400 412
...
This is after I've worked with it a bit, and you can see the first two columns have a couple instances of "E9" in them, which is what I'm looking to count by running this:
df2 <- df %>% select(-c(Time))
devices$Exclusions <- str_count(df2, "E9")
Here is my final result:
Device ID Exclusions
<chr> <int> <int>
1 D4 145287 14
2 D5 145286 16
3 D6 145285 0
4 D7 145284 0
5 D1 145280 0
6 D2 145277 0
7 D3 145278 0
So this leads me to my problem. The devices aren't necessarily in the same order and when it counts the instances of "E9" it is simply attaching them to the other dataframe in the order those devices are in, rather than matching them up with their names. What can I add in order to add that str_count from the D1 column to the D1 row in the other dataframe, rather than just the top row?
Here's a solution in the tidyverse.
Solution
library(tidyverse)
# ...
# Code to generate 'df'.
# ...
df_counts <- df %>%
# Homogenize columns as text.
mutate(across(everything(), as.character)) %>%
# Pivot columns into a 'Device | Code' format.
pivot_longer(everything(), names_to = "Device", values_to = "Code") %>%
# For each device...
group_by(Device) %>%
# ...count how many times "E9" appears among its codes.
summarize(Exclusions = sum(Code == "E9"))
Speculating about the structure of your devices dataset, I can enrich the result with those IDs from your sample output:
# ...
# Code to generate 'devices'.
# ...
devices <- devices %>%
full_join(df_counts, by = "Device", keep = FALSE)
Result
Given a df dataset like your example
df <- structure(
list(
D1 = c("921", "898", "817", "644", "E9", "136", "223", "475", "684", "816"),
D2 = c("917", "E9", "806", "632", "399", "127", "233", "E9", "685", "814"),
D3 = c(935, 914, 814, 624, 385, 127, 209, 443, 665, 828),
D4 = c(457, 446, 407, 321, 207, 69, 117, 239, 340, 406),
D5 = c(462, 452, 412, 327, 213, 72, 106, 234, 341, 409),
D6 = c(451, 440, 398, 314, 200, 66, 117, 238, 337, 400),
D7 = c(465, 455, 411, 324, 206, 66, 118, 246, 348, 412)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -10L)
)
this workflow should yield a result for df_counts like this:
# A tibble: 7 x 2
Device Exclusions
<chr> <int>
1 D1 1
2 D2 2
3 D3 0
4 D4 0
5 D5 0
6 D6 0
7 D7 0
Furthermore, given a devices dataset like your example
devices <- structure(
list(
Device = c("D4", "D5", "D6", "D7", "D1", "D2", "D3"),
ID = c(145287L, 145286L, 145285L, 145284L, 145280L, 145277L, 145278L)
),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -7L)
)
this solution should yield a devices dataset like this:
# A tibble: 7 x 3
Device ID Exclusions
<chr> <int> <int>
1 D4 145287 0
2 D5 145286 0
3 D6 145285 0
4 D7 145284 0
5 D1 145280 1
6 D2 145277 2
7 D3 145278 0
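A base-R sketch of the same match-by-name idea (using the example df and devices constructed above; if your real df still has a Time column, drop it first):
counts <- colSums(df == "E9", na.rm = TRUE)           # named vector: one "E9" count per device column
devices$Exclusions <- unname(counts[devices$Device])  # look up counts by device name, not by position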

R - Weeks of supply

I am trying to calculate the number of weeks the inventory on hand will last given the sales projections, for a dataset with tens of millions of rows. I have listed the expected output in the last column of the data structure given below. I have also attached the implementation of this in Excel.
Logic
WeeksofSupply = the number of weeks the current inventory on hand will last.
Example: in the attached image (SKU_CD 222, STORE_CD 33) the inventory on hand is 19 and the cumulative sales are WK1 + WK2 = 15 and WK1 + WK2 + WK3 = 24, which is greater than 19, so we pick 2, the number of whole weeks the current inventory will last.
Expected output in the last column
Data = structure(list(
SKU_CD = c(111, 111, 111, 111, 111, 111, 111,111, 111, 111, 111, 111, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222, 222),
STORE_CD = c(22, 22, 22, 22, 22, 22, 22,22, 22, 22, 22, 22, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33),
FWK_CD = c(201627, 201628, 201629, 201630, 201631, 201632,201633, 201634, 201635, 201636, 201637, 201638, 201627, 201628, 201629, 201630, 201631, 201632, 201633, 201634, 201635, 201636, 201637, 201638),
SALES = c(5, 2, 2, 2, 1, 3, 2, 2, 3, 2, 3, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 7, 5),
INVENTORY = c(29, 27, 25, 23, 22, 19, 17, 15, 12, 10, 25, 1, 19, 17, 15, 13, 12,9, 7, 5, 2, 0, 25, 18),
WeeksofSupply = c("11", "10", "9", "8", "8", "6", "5", "4", "3", "2", "Inventory More", "Inventory Less", "2", "2", "1", "1", "1", "Inventory Less", "Inventory Less", "Inventory Less", "Inventory Less", "Inventory Less", "Inventory More", "Inventory More")),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -24L),
.Names = c("SKU_CD", "STORE_CD", "FWK_CD", "SALES", "INVENTORY", "WeeksofSupply"))
Current Excel code (here the weeks are shown in columns, but they should be rows, as in the expected output):
=IF(A2<SUM(B2:K2),
   SUMPRODUCT(--(SUBTOTAL(9,OFFSET(B2:K2,,,,COLUMN(B2:K2)-COLUMN(B2)+1))<=A2))
   +LOOKUP(0,SUBTOTAL(9,OFFSET(B2:K2,,,,COLUMN(B2:K2)-COLUMN(B2)+1))-B2:K2-A2,
           (A2-(SUBTOTAL(9,OFFSET(B2:K2,,,,COLUMN(B2:K2)-COLUMN(B2)+1))-B2:K2))/B2:K2),
   IF(A2=SUM(B2:K2),COUNT(B2:K2),"Inventory exceeds forecast"))
I would appreciate any input to implement this efficiently in R. Many Thanks for your time!
For your revised data in long format, you can do the following...
library(dplyr) #for the grouping functionality
#define a function to calculate weeks Supply from Sales and Inventory
weekSup <- function(sales, inv){
  sales <- unlist(sales)
  inv <- unlist(inv)
  n <- length(sales)
  weeksup <- rep(NA, n)
  for(i in 1:n){
    if(i == n | inv[i] < sales[i]){
      weeksup[i] <- ifelse(inv[i] > sales[i], NA, inv[i]/sales[i])
    } else {
      weeksup[i] <- approxfun(cumsum(sales[i:n]), 1:(n-i+1))(inv[i])
    }
  }
  # 'Inventory More' is coded as -1 (a number) to avoid forcing the whole column to character
  weeksup <- replace(weeksup, is.na(weeksup), -1)
  return(weeksup)  # for whole weeks, change this to return(floor(weeksup))
}
Data2 <- Data %>% group_by(SKU_CD,STORE_CD) %>% mutate(weekSup=weekSup(SALES,INVENTORY))
head(Data2,20)
SKU_CD STORE_CD FWK_CD SALES INVENTORY WeeksofSupply weekSup
<dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl>
1 111 22 201627 5 29 11 11.3333333
2 111 22 201628 2 27 10 10.8333333
3 111 22 201629 2 25 9 9.8333333
4 111 22 201630 2 23 8 8.8333333
5 111 22 201631 1 22 8 8.0000000
6 111 22 201632 3 19 6 6.6666667
7 111 22 201633 2 17 5 5.8333333
8 111 22 201634 2 15 4 4.8333333
9 111 22 201635 3 12 3 3.6666667
10 111 22 201636 2 10 2 2.8333333
11 111 22 201637 3 25 Inventory More -1.0000000
12 111 22 201638 6 1 Inventory Less 0.1666667
13 222 33 201627 7 19 2 2.4444444
14 222 33 201628 8 17 2 2.0000000
15 222 33 201629 9 15 1 1.6000000
16 222 33 201630 10 13 1 1.2727273
17 222 33 201631 11 12 1 1.0833333
18 222 33 201632 12 9 Inventory Less 0.7500000
19 222 33 201633 13 7 Inventory Less 0.5384615
20 222 33 201634 14 5 Inventory Less 0.3571429
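For intuition, the whole-week count described in the Logic section reduces to a cumulative-sum comparison for a single week (a toy illustration, ignoring the "Inventory More"/"Inventory Less" edge cases handled above):
# SKU_CD 222 / STORE_CD 33, week 201627: inventory 19, future sales 7, 8, 9, ...
sales <- c(7, 8, 9, 10, 11, 12)
inventory <- 19
sum(cumsum(sales) <= inventory)   # 2 -> two whole weeks of supply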
Here is one way to do it for the earlier wide-format data (an Inventory column followed by Wk1 to Wk10, as in the output below), using linear interpolation via approxfun...
data$WeeksSupply <- sapply(1:nrow(data),function(i)
approxfun(cumsum(as.vector(c(data[i,2:11]))),1:10)(data$Inventory[i]))
data$WeeksSupply <- replace(data$WeeksSupply,is.na(data$WeeksSupply),
"Inventory Exceeds Forecast")
data
# A tibble: 2 x 12
Inventory Wk1 Wk2 Wk3 Wk4 Wk5 Wk6 Wk7 Wk8 Wk9 Wk10 WeeksSupply
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 200 20 15 25 40 35 45 30 50 45 55 6.66666666666667
2 2000 20 15 25 40 35 45 30 50 45 55 Inventory Exceeds Forecast

Merge two rows in a dataframe in R

I am trying to merge rows in my data.frame based on <NA> value.
Here is my data frame.
new <- data.frame (
Location = c(rep("Loc 1", 4), rep("Loc 2", 4)),
Place = c("Powder Springs_Original", "Bridge_Other County", "Airport", "County1", "City 4 - Duplicated", "South", "County2", "Formal place"),
Val1 = c(109, 123, NA, 117, 143, NA, 151, 142),
Val2 = c(102, 115, NA, 45, 135, NA, 144, 125),
Val3 = c(99, 112, NA, 26, 127, NA, 140, 132),
Val4 = c(90, 103, NA, 57, 125, NA, 135, 201))
I am expecting something like,
Location Place Val1 Val2 Val3 Val4
Loc 1 Powder Springs - Original 109 102 99 90
Loc 1 Bridge _ Other County 123 115 112 103
Loc 1 Airport County1 117 45 26 57
Loc 2 City 4 - Duplicated 143 135 127 125
Loc 2 South County2 151 144 140 135
Loc 2 Formal place 142 125 132 201
I want to remove the NA rows and merge them with the next row. The Location for these rows is the same. Can someone please help me here?
Thanks in advance.
First off, you shouldn't be using new as your variable name since it's a built-in R function. Second, you could do something like this:
# Find which rows are NA
na_rows <- which(apply(new, 1, function(x) all(is.na(x[paste0('Val', 1:4)]))))
# Set correct place names
new$Place <- as.character(new$Place)
new$Place[na_rows + 1] <- paste(new$Place[na_rows], new$Place[na_rows + 1])
# Remove NAs
new <- new[-na_rows, ]
# Location Place Val1 Val2 Val3 Val4
# 1 Loc 1 Powder Springs_Original 109 102 99 90
# 2 Loc 1 Bridge_Other County 123 115 112 103
# 4 Loc 1 Airport County1 117 45 26 57
# 5 Loc 2 City 4 - Duplicated 143 135 127 125
# 7 Loc 2 South County2 151 144 140 135
# 8 Loc 2 Formal place 142 125 132 201
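A tidyverse sketch of the same merge (not from the answers above; it assumes Place is a character column, which is the default with stringsAsFactors = FALSE in R >= 4.0):
library(dplyr)
new %>%
  mutate(all_na = if_all(Val1:Val4, is.na)) %>%          # flag rows whose Val columns are all NA
  mutate(Place = if_else(lag(all_na, default = FALSE),
                         paste(lag(Place), Place),       # prepend the flagged row's Place
                         Place)) %>%
  filter(!all_na) %>%
  select(-all_na)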
(edited as the initial answer was incomplete)
nu <- data.frame (
Location = c(rep("Loc 1", 4), rep("Loc 2", 4)),
Place = c("Powder Springs_Original", "Bridge_Other County", "Airport", "County1", "City 4 - Duplicated", "South", "County2", "Formal place"),
Val1 = c(109, 123, NA, 117, 143, NA, 151, 142),
Val2 = c(102, 115, NA, 45, 135, NA, 144, 125),
Val3 = c(99, 112, NA, 26, 127, NA, 140, 132),
Val4 = c(90, 103, NA, 57, 125, NA, 135, 201), stringsAsFactors=FALSE)
# notice stringsAsFactors = FALSE
# if there was justice in the world, it should be FALSE by default in R
# in any case, nu$Place should be character rather than factor so in real data
# you may need to do nu$Place <- as.character(nu$Place)
ic <- which(!complete.cases(nu))
nu$Place[ic+1] <- paste(nu$Place[ic], nu$Place[ic+1])
nu <- nu[-ic,]
Does this do what you need?
Thanks for your help and support. After a lot of trials, I got the required output below. (As suggested by @Robert Krzyzanowski, I renamed my data.frame to Test.)
This is what I did. Please suggest if anything weird is observed.
> new_DF <- subset(Test, is.na(Test$Val1))
> new_DF
Location Place Val1 Val2 Val3 Val4
3 Loc 1 Airport NA NA NA NA
6 Loc 2 South NA NA NA NA
>
> row.names(new_DF)
[1] "3" "6"
> x.num <- as.numeric(row.names(new_DF))
>
> Test$Place <- as.character(Test$Place)
> Test$Place[x.num + 1] <- paste(Test$Place[x.num], Test$Place[x.num + 1])
> Test <- Test[-x.num, ]
> Test
Location Place Val1 Val2 Val3 Val4
1 Loc 1 Powder Springs_Original 109 102 99 90
2 Loc 1 Bridge_Other County 123 115 112 103
4 Loc 1 Airport County1 117 45 26 57
5 Loc 2 City 4 - Duplicated 143 135 127 125
7 Loc 2 South County2 151 144 140 135
8 Loc 2 Formal place 142 125 132 201
Once again, thank you all for your support and your time for looking into this.
