Make the leading column value NA if condition is met using R

I have a data frame such as
structure(list(id = c(15305, 15305, 15305, 6224, 6224), transfer = c(0,
1, 0, 1, 0), hosp = c(2182, 2452, 2846, 1474, 1476), out = c(2183,
NA, 2857, NA, 1486), Insti = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
I want to insert NA in the "hosp" column of a row if, in the preceding (lagging) row, the "out" and "Insti" columns are both NA AND the "transfer" column == 1.
I want the data frame to look like this:
structure(list(id2 = c(15305, 15305, 15305, 6224, 6224), transfer2 = c(0,
1, 0, 1, 0), hosp2 = c(2182, 2452, NA, 1474, NA), out2 = c(2183,
NA, 2857, NA, 1486), Insti2 = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))

You can use the following solution:
library(dplyr)
df %>%
  mutate(hosp = case_when(
    # previous row: out and Insti are missing and transfer equals 1
    is.na(lag(out)) & is.na(lag(Insti)) & lag(transfer) == 1 ~ NA_real_,
    TRUE ~ hosp
  ))
id transfer hosp out Insti
1 15305 0 2182 2183 NA
2 15305 1 2452 NA NA
3 15305 0 NA 2857 NA
4 6224 1 1474 NA NA
5 6224 0 NA 1486 NA

To get the "lag" in base R, drop the last value and prepend a padding value. Here is a base R solution using ifelse. The lagged transfer column is padded with 0 rather than NA, so the condition for the first row is FALSE instead of NA (an NA test would make ifelse return NA there).
transform(df,
          hosp = ifelse(is.na(c(NA, out[-nrow(df)])) & is.na(c(NA, Insti[-nrow(df)])) &
                          c(0, transfer[-nrow(df)]) == 1, NA, hosp))
# id transfer hosp out Insti
# 1 15305 0 2182 2183 NA
# 2 15305 1 2452 NA NA
# 3 15305 0 NA 2857 NA
# 4 6224 1 1474 NA NA
# 5 6224 0 NA 1486 NA
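Both answers compare each row with the row directly above it, regardless of id. If the check should only look at the previous row of the same id, something like the following base R sketch may help. This is an assumption about the intended behaviour, not something stated in the question, and the helper names prev, same_id and hit are purely illustrative.

```r
df <- structure(list(id = c(15305, 15305, 15305, 6224, 6224),
                     transfer = c(0, 1, 0, 1, 0),
                     hosp = c(2182, 2452, 2846, 1474, 1476),
                     out = c(2183, NA, 2857, NA, 1486),
                     Insti = c(NA, NA, NA, NA, NA)),
                class = "data.frame", row.names = c(NA, -5L))

n <- nrow(df)
# index of the previous row; NA for the first row
prev <- c(NA, seq_len(n - 1))
# TRUE when the previous row belongs to the same id (assumed requirement)
same_id <- c(FALSE, df$id[-1] == df$id[-n])
# %in% returns FALSE (not NA) for the first row, so no extra NA handling is needed
hit <- same_id & is.na(df$out[prev]) & is.na(df$Insti[prev]) &
  df$transfer[prev] %in% 1
df$hosp[hit] <- NA
df
```

For the example data this sets hosp to NA in rows 3 and 5 only, which matches the desired result in the question.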

Related

How to set missing some columns and their corresponding columns in data frame in R

I have longitudinal data with three follow-ups. I want to set the value 99 in columns 2, 3 and 4 (v_9, v_01 and v_03) to NA, and I also want to set the corresponding columns ("d_9", "d_01", "d_03" and "a_9", "a_01", "a_03") to NA.
How can I do this for all individuals across my whole data set in R? Thanks in advance for the help.
As an example, the desired result for ID 101 is shown below:
"id" "v_9" "v_01" "v_03" "d_9" "d_01" "d_03" "a_9" "a_01" "a_03"
101 12 NA 10 2015-03-23 NA 2003-06-19 40.50650 NA 44.1065
structure(list(id = c(101, 102, 103, 104), v_9 = c(12, 99, 16,
25), v_01 = c(99, 12, 16, NA), v_03 = c(10, NA, 99, NA), d_9 = structure(c(16517,
17613, 16769, 10667), class = "Date"), d_01 = structure(c(13291,
NA, 13566, NA), class = "Date"), d_03 = structure(c(12222, NA,
12119, NA), class = "Date"), a_9 = c(40.5065, 40.5065, 30.19713,
51.40862), a_01 = c(42.5065, 41.5112, 32.42847, NA), a_03 = c(44.1065,
NA, 35.46543, NA)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))

Try this function:
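fn <- function(df){
  # loop over the follow-up suffixes shared by the v_, d_ and a_ columns
  for(s in c("_9" , "_01" , "_03")){
    # rows where the v_ column holds the value 99
    i <- which(df[[paste0("v", s)]] == 99)
    df[i, paste0("v",s)] <- NA
    df[i, paste0("d",s)] <- NA
    df[i, paste0("a",s)] <- NA
  }
  df
}
df <- fn(df)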
Output
# A tibble: 4 × 10
id v_9 v_01 v_03 d_9 d_01 d_03 a_9 a_01 a_03
<dbl> <dbl> <dbl> <dbl> <date> <date> <date> <dbl> <dbl> <dbl>
1 101 12 NA 10 2015-03-23 NA 2003-06-19 40.5 NA 44.1
2 102 NA 12 NA NA NA NA NA 41.5 NA
3 103 16 16 NA 2015-11-30 2007-02-22 NA 30.2 32.4 NA
4 104 25 NA NA 1999-03-17 NA NA 51.4 NA NA
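If the sentinel value or the column prefixes differ in other data sets, the same idea can be parameterised. The following is only a sketch: set_missing and its arguments (value_prefix, linked_prefixes, sentinel) are hypothetical names introduced here, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: for each suffix, find rows where the value column
# equals the sentinel and set the value column and all linked columns to NA.
set_missing <- function(df, suffixes, value_prefix = "v",
                        linked_prefixes = c("d", "a"), sentinel = 99) {
  for (s in suffixes) {
    # %in% yields FALSE for NA, so rows that are already missing are skipped
    hit <- df[[paste0(value_prefix, s)]] %in% sentinel
    for (p in c(value_prefix, linked_prefixes)) {
      df[[paste0(p, s)]][hit] <- NA
    }
  }
  df
}

# Made-up toy data with a single follow-up
toy <- data.frame(id  = c(1, 2),
                  v_9 = c(99, 5),
                  d_9 = as.Date(c("2020-01-01", "2020-02-01")),
                  a_9 = c(40.5, 41.5))
out <- set_missing(toy, "_9")
out
```

Assigning through `[[` keeps the column classes intact, so the date column is still of class Date after the NA is inserted.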

Determine range of time where measurements are not NA

I have a dataset with hundreds of thousands of measurements taken from several subjects. However, the measurements are only partially available, i.e., there may be large stretches with NA. I need to establish up front, for which timespan positive data are available for each subject.
Data:
df
timestamp C B A starttime_ms
1 00:00:00.033 NA NA NA 33
2 00:00:00.064 NA NA NA 64
3 00:00:00.066 NA 0.346 NA 66
4 00:00:00.080 47.876 0.346 22.231 80
5 00:00:00.097 47.876 0.346 22.231 97
6 00:00:00.099 47.876 0.346 NA 99
7 00:00:00.114 47.876 0.346 NA 114
8 00:00:00.130 47.876 0.346 NA 130
9 00:00:00.133 NA 0.346 NA 133
10 00:00:00.147 NA 0.346 NA 147
My (humble) solution so far is to pick out the timestamp values for which a subject's measurements are not NA and to select the first and last such timestamp, for each subject individually. Here's the code for subject C:
NotNA_C <- df$timestamp[which(!is.na(df$C))]
range_C <- paste(NotNA_C[1], NotNA_C[length(NotNA_C)], sep = " - ")
range_C
[1] "00:00:00.080 - 00:00:00.130"
That doesn't look elegant and, what's more, it needs to be repeated for all other subjects. Is there a more efficient way to establish the range of time for which non-NA values are available for all subjects in one go?
EDIT
I've found a base R solution:
sapply(df[,2:4], function(x)
paste(df$timestamp[which(!is.na(x))][1],
df$timestamp[which(!is.na(x))][length(df$timestamp[which(!is.na(x))])], sep = " - "))
C B A
"00:00:00.080 - 00:00:00.130" "00:00:00.066 - 00:00:00.147" "00:00:00.080 - 00:00:00.097"
but would be interested in other solutions as well!
Reproducible data:
df <- structure(list(timestamp = c("00:00:00.033", "00:00:00.064",
"00:00:00.066", "00:00:00.080", "00:00:00.097", "00:00:00.099",
"00:00:00.114", "00:00:00.130", "00:00:00.133", "00:00:00.147"
), C = c(NA, NA, NA, 47.876, 47.876, 47.876, 47.876, 47.876,
NA, NA), B = c(NA, NA, 0.346, 0.346, 0.346, 0.346,
0.346, 0.346, 0.346, 0.346), A = c(NA, NA, NA, 22.231, 22.231, NA, NA, NA, NA,
NA), starttime_ms = c(33, 64, 66, 80, 97, 99, 114, 130, 133,
147)), row.names = c(NA, 10L), class = "data.frame")

dplyr solution
library(tidyverse)
df <- structure(list(timestamp = c("00:00:00.033", "00:00:00.064",
"00:00:00.066", "00:00:00.080", "00:00:00.097", "00:00:00.099",
"00:00:00.114", "00:00:00.130", "00:00:00.133", "00:00:00.147"
), C = c(NA, NA, NA, 47.876, 47.876, 47.876, 47.876, 47.876,
NA, NA), B = c(NA, NA, 0.346, 0.346, 0.346, 0.346,
0.346, 0.346, 0.346, 0.346), A = c(NA, NA, NA, 22.231, 22.231, NA, NA, NA, NA,
NA), starttime_ms = c(33, 64, 66, 80, 97, 99, 114, 130, 133,
147)), row.names = c(NA, 10L), class = "data.frame")
df %>%
pivot_longer(-c(timestamp, starttime_ms)) %>%
group_by(name) %>%
drop_na() %>%
summarise(min = timestamp %>% min(),
max = timestamp %>% max())
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 3 x 3
#> name min max
#> <chr> <chr> <chr>
#> 1 A 00:00:00.080 00:00:00.097
#> 2 B 00:00:00.066 00:00:00.147
#> 3 C 00:00:00.080 00:00:00.130
Created on 2021-02-15 by the reprex package (v0.3.0)

You could take the cumsum of the differences of the non-NA indicator, coerce it to logical, and pick the first and last timestamp of each resulting subset.
lapply(data.frame(apply(rbind(0, diff(!sapply(df[c("C", "B", "A")], is.na))), 2, cumsum)),
function(x) c(df$timestamp[as.logical(x)][1], rev(df$timestamp[as.logical(x)])[1]))
# $C
# [1] "00:00:00.080" "00:00:00.130"
#
# $B
# [1] "00:00:00.066" "00:00:00.147"
#
# $A
# [1] "00:00:00.080" "00:00:00.097"
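The cumsum trick appears to rely on each column starting with NA, since the padded first difference is 0. A more direct variant is to index the timestamps by the positions of the non-NA values. The sketch below assumes that only the first and last non-NA timestamp matter (gaps inside the span are not reported); the function name span and its arguments are illustrative.

```r
df <- data.frame(
  timestamp = c("00:00:00.033", "00:00:00.064", "00:00:00.066", "00:00:00.080",
                "00:00:00.097", "00:00:00.099", "00:00:00.114", "00:00:00.130",
                "00:00:00.133", "00:00:00.147"),
  C = c(NA, NA, NA, 47.876, 47.876, 47.876, 47.876, 47.876, NA, NA),
  B = c(NA, NA, 0.346, 0.346, 0.346, 0.346, 0.346, 0.346, 0.346, 0.346),
  A = c(NA, NA, NA, 22.231, 22.231, NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# First and last timestamp at which x is not NA; NA for an all-NA column
span <- function(x, stamps) {
  i <- which(!is.na(x))
  if (length(i) == 0) return(c(first = NA_character_, last = NA_character_))
  c(first = stamps[i[1]], last = stamps[i[length(i)]])
}

# One column per subject, rows "first" and "last"
spans <- sapply(df[c("C", "B", "A")], span, stamps = df$timestamp)
spans
```

The result is a character matrix, so a single value can be looked up with, for example, spans["first", "C"].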

Iterate through columns' suffixes in a for loop. R

I am trying to modify my dataset with a for loop. I want to modify certain cells of some columns depending on the value of its "paired" column. My dataset could be:
data1989 <- data.frame("date" = c("1987-01-01", "1987-01-03", "1987-01-19"),
"NDVI_1" = c(NA, 0.589, 0.120),
"NDVI_3" = c(NA, 0.447, NA),
"NDVI_4" = c(NA, NA, NA),
"pixelQA_1" = c(NA, 66.897,90.599),
"pixelQA_3" = c(NA, 66.097,NA),
"pixelQA_4" = c(NA, NA, NA),
stringsAsFactors = FALSE)
> data1989
date NDVI_1 NDVI_3 NDVI_4 pixelQA_1 pixelQA_3 pixelQA_4
1 1987-01-01 NA NA NA NA NA NA
2 1987-01-03 0.589 0.447 NA 66.897 66.097 NA
3 1987-01-19 0.120 NA NA 90.599 NA NA
Columns are "paired" by their suffix, so NDVI_1 is paired with pixelQA_1, and so on. I want to modify the values in the NDVI columns depending on the values in the paired pixelQA column, as follows:
if pixelQA is NA -> then NDVI should also be NA.
if pixelQA is 66±0.5 OR 130±0.5 -> then NDVI keeps its value.
if pixelQA is anything other than 66±0.5 OR 130±0.5 -> then NDVI is set to NA (this is bad-quality data which needs to be ignored).
Applying these very simple rules my data should look like:
data1989clean <- data.frame("date" = c("1987-01-01", "1987-01-03", "1987-01-19"),
"NDVI_1" = c(NA, NA, NA),
"NDVI_3" = c(NA, 0.447, NA),
"NDVI_4" = c(NA, NA, NA),
"pixelQA_1" = c(NA, 66.897,90.599),
"pixelQA_3" = c(NA, 66.097,NA),
"pixelQA_4" = c(NA, NA, NA),
stringsAsFactors = FALSE)
> data1989clean
date NDVI_1 NDVI_3 NDVI_4 pixelQA_1 pixelQA_3 pixelQA_4
1 1987-01-01 NA NA NA NA NA NA
2 1987-01-03 NA 0.447 NA 66.897 66.097 NA
3 1987-01-19 NA NA NA 90.599 NA NA
To reach my goal I am trying the following for loop:
for(i in 1:4){
data1989$NDVI_[i] <- ifelse(data1989$pixelQA_[i] < 66.5 & data1989$pixelQA_[i] > 65.5 |
data1989$pixelQA_[i] < 130.5 & data1989$pixelQA_[i] > 129.5,
data1989$NDVI_[i], NA)
}
But so far it is not working, as the dataset output looks exactly the same as the original one. Any suggestion will be welcomed.

As suggested by @George Savva, you can achieve this by pivoting longer, correcting the data, and pivoting back to wide. Using the tidyverse, that gives:
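library(tidyverse)
newdd1 <-
  data1989 %>%
  # one row per date and suffix ("set"), with NDVI and pixelQA as columns
  pivot_longer(cols = -date,
               names_to = c(".value", "set"),
               names_sep = "_") %>%
  # keep NDVI only where pixelQA is within 66±0.5 or 130±0.5
  mutate(NDVI = case_when(is.na(pixelQA) ~ NA_real_,
                          between(pixelQA, 65.5, 66.5) ~ NDVI,
                          between(pixelQA, 129.5, 130.5) ~ NDVI,
                          TRUE ~ NA_real_)) %>%
  # back to the original wide layout (NDVI_1, ..., pixelQA_4)
  pivot_wider(names_from = set,
              values_from = c(NDVI, pixelQA))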
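If you prefer to stay with a for loop, note that `$` does not paste the loop index onto a column name; the name has to be built as a string and used with `[[`. A minimal base R sketch, assuming the suffixes are exactly 1, 3 and 4 as in the example data (the variable names qa and keep are illustrative):

```r
data1989 <- data.frame(date = c("1987-01-01", "1987-01-03", "1987-01-19"),
                       NDVI_1 = c(NA, 0.589, 0.120),
                       NDVI_3 = c(NA, 0.447, NA),
                       NDVI_4 = c(NA, NA, NA),
                       pixelQA_1 = c(NA, 66.897, 90.599),
                       pixelQA_3 = c(NA, 66.097, NA),
                       pixelQA_4 = c(NA, NA, NA),
                       stringsAsFactors = FALSE)

for (s in c(1, 3, 4)) {
  # build the paired column names from the suffix
  qa <- data1989[[paste0("pixelQA_", s)]]
  # TRUE only where pixelQA is present and within 66±0.5 or 130±0.5
  keep <- !is.na(qa) & (abs(qa - 66) <= 0.5 | abs(qa - 130) <= 0.5)
  # everything else (missing or bad-quality pixelQA) becomes NA
  data1989[[paste0("NDVI_", s)]][!keep] <- NA
}
data1989
```

With the example data only NDVI_3 in the second row (0.447) survives, as in data1989clean.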

Why do I need to call an object returned with invisible() twice for it to print?

From the documentation I read that invisible() returns a (temporarily) invisible copy of an object. However, when I use invisible() I always need to call the object twice before it is actually printed.
I use data.table and would like my function to return an invisible copy of the object when a certain condition is met (i.e. when the function exits early).
I've noticed that this need for two calls also applies if the invisibly returned object is used inside another function, which seemingly makes it unusable. What causes this behaviour? Am I doing something wrong? How do I get the function to return invisibly, and have the object printed on the first call?
Please see sample code below:
example <- function(DT) {
  if (!(1 %in% DT$RSI.verticalBottom) || !(1 %in% DT$RSI.top)) {
    # abort if there is no buy or sell signal
    DT[, `:=`(pos = NA,
              return = NA
    )]
    return(invisible(DT))
  }
  # (remainder of the function not shown)
}
> example(sample.data)
> sample.data
> sample.data
conm tic datadate cshoq gind year month yearmon fdateq pdateq fyr fyearq fqtr
1: NS GROUP INC NSS.1 2000-01-31 NA 101010 2000 1 2000_1 NA <NA> NA NA NA
2: NS GROUP INC NSS.1 2000-02-29 NA 101010 2000 2 2000_2 NA <NA> NA NA NA
3: NS GROUP INC NSS.1 2000-03-31 21.533 101010 2000 3 2000_3 NA <NA> 9 2000 2
4: NS GROUP INC NSS.1 2000-04-30 NA 101010 2000 4 2000_4 NA <NA> NA NA NA
5: NS GROUP INC NSS.1 2000-05-31 NA 101010 2000 5 2000_5 NA <NA> NA NA NA
6: NS GROUP INC NSS.1 2000-06-30 22.008 101010 2000 6 2000_6 NA <NA> 9 2000 3
req epspiq epspxq ajexq saleq saley ivncfy gsubind dpq ibmiiq ibq iby oiadpq
1: NA NA NA NA NA NA NA NA NA NA NA NA NA
2: NA NA NA NA NA NA NA NA NA NA NA NA NA
3: -58.396 -0.38 -0.38 1 100.107 186.733 10.77 10101020 5.517 NA -8.231 -21.165 -5.617
4: NA NA NA NA NA NA NA NA NA NA NA NA NA
5: NA NA NA NA NA NA NA NA NA NA NA NA NA
6: -63.168 -0.19 -0.23 1 73.652 260.385 20.90 10101020 NA NA -5.048 -26.213 NA
oiadpy oibdpq oibdpy xiq xoprq cogsy dlcchy wcapchy QEBIT.adep YEBIT.adep QEBIT.bdep
1: NA NA NA NA NA NA NA NA NA NA NA
2: NA NA NA NA NA NA NA NA NA NA NA
3: -16.924 -0.1 -5.57 0 100.207 177.826 -0.394 NA -0.05610996 -0.09063208 -0.0009989311
4: NA NA NA NA NA NA NA NA NA NA NA
5: NA NA NA NA NA NA NA NA NA NA NA
6: NA NA NA 0 NA NA -0.394 NA NA NA NA
YEBIT.bdep QEBT YEBT f_id I.QSales IWA.QEBIT IWA.QEBT I.YSales IWA.YEBIT
1: NA NA NA NA NA NA NA NA NA
2: NA NA NA NA NA NA NA NA NA
3: -0.000535524 -0.08222202 -0.1133437 2000Q2 19344.53 0.08160277 0.03577741 196223.7 0.08329726
4: NA NA NA NA NA NA NA NA NA
5: NA NA NA NA NA NA NA NA NA
6: NA -0.06853853 -0.1006702 2000Q3 19798.64 0.10680607 0.06096211 196223.7 0.08329726
IWA.YEBT QSales.pc YSales.pc RSI_QEBIT RSI_QEBT RSI_IWA.QEBIT RSI_IWA.QEBT adj.factor
1: NA NA NA NA NA NA NA 1
2: NA NA NA NA NA NA NA 1
3: 0.03875869 0.005174952 0.0009516334 41.45963 32.93934 29.96487 18.23527 1
4: NA NA NA NA NA NA NA 1
5: NA NA NA NA NA NA NA 1
6: 0.03875869 0.003720053 0.0013269806 49.83110 34.64800 37.58678 24.75847 1
dvpsxm cshtrm curcdm close high low trfm trt1m close.unAdj mktcap close.div
1: NA 4557500 USD 8.8750 10.1250 6.7500 1.0409 16.3934 8.8750 NA 8.8750
2: NA 4506100 USD 11.6875 12.1250 8.0625 1.0409 31.6901 11.6875 NA 11.6875
3: NA 4146200 USD 16.3125 16.8125 11.3750 1.0409 39.5722 16.3125 351.2571 16.3125
4: NA 3215400 USD 15.8750 16.3750 12.8750 1.0409 -2.6820 15.8750 NA 15.8750
5: NA 2948800 USD 18.3125 19.3750 16.0625 1.0409 15.3543 18.3125 NA 18.3125
6: NA 4296100 USD 20.9375 21.0000 17.7500 1.0409 14.3345 20.9375 460.7925 20.9375
RSI_close RSI.verticalBottom RSI.top return pos
1: NA NA NA NA NA
2: NA NA NA NA NA
3: NA NA NA NA NA
4: NA NA NA NA NA
5: NA NA NA NA NA
6: NA NA NA NA NA
Sample data
> dput(sample.data)
structure(list(conm = c("NS GROUP INC", "NS GROUP INC", "NS GROUP INC",
"NS GROUP INC", "NS GROUP INC", "NS GROUP INC"), tic = c("NSS.1",
"NSS.1", "NSS.1", "NSS.1", "NSS.1", "NSS.1"), datadate = structure(c(10987,
11016, 11047, 11077, 11108, 11138), class = "Date"), cshoq = c(NA,
NA, 21.533, NA, NA, 22.008), gind = c(101010L, 101010L, 101010L,
101010L, 101010L, 101010L), year = c(2000, 2000, 2000, 2000,
2000, 2000), month = c(1, 2, 3, 4, 5, 6), yearmon = c("2000_1",
"2000_2", "2000_3", "2000_4", "2000_5", "2000_6"), fdateq = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), pdateq = structure(c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), fyr = c(NA, NA, 9L, NA,
NA, 9L), fyearq = c(NA, NA, 2000L, NA, NA, 2000L), fqtr = c(NA,
NA, 2L, NA, NA, 3L), req = c(NA, NA, -58.396, NA, NA, -63.168
), epspiq = c(NA, NA, -0.38, NA, NA, -0.19), epspxq = c(NA, NA,
-0.38, NA, NA, -0.23), ajexq = c(NA, NA, 1, NA, NA, 1), saleq = c(NA,
NA, 100.107, NA, NA, 73.652), saley = c(NA, NA, 186.733, NA,
NA, 260.385), ivncfy = c(NA, NA, 10.77, NA, NA, 20.9), gsubind = c(NA,
NA, 10101020L, NA, NA, 10101020L), dpq = c(NA, NA, 5.517, NA,
NA, NA), ibmiiq = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), ibq = c(NA, NA, -8.231, NA, NA, -5.048), iby = c(NA,
NA, -21.165, NA, NA, -26.213), oiadpq = c(NA, NA, -5.617, NA,
NA, NA), oiadpy = c(NA, NA, -16.924, NA, NA, NA), oibdpq = c(NA,
NA, -0.1, NA, NA, NA), oibdpy = c(NA, NA, -5.57, NA, NA, NA),
xiq = c(NA, NA, 0, NA, NA, 0), xoprq = c(NA, NA, 100.207,
NA, NA, NA), cogsy = c(NA, NA, 177.826, NA, NA, NA), dlcchy = c(NA,
NA, -0.394, NA, NA, -0.394), wcapchy = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), QEBIT.adep = c(NA,
NA, -0.0561099623402959, NA, NA, NA), YEBIT.adep = c(NA,
NA, -0.0906320789576561, NA, NA, NA), QEBIT.bdep = c(NA,
NA, -0.000998931143676266, NA, NA, NA), YEBIT.bdep = c(NA,
NA, -0.000535523983441598, NA, NA, NA), QEBT = c(NA, NA,
-0.0822220224359935, NA, NA, -0.0685385325585184), YEBT = c(NA,
NA, -0.113343651095414, NA, NA, -0.100670161491637), f_id = c(NA,
NA, "2000Q2", NA, NA, "2000Q3"), I.QSales = c(NA, NA, 19344.526,
NA, NA, 19798.641), IWA.QEBIT = c(NA, NA, 0.0816027748625115,
NA, NA, 0.10680606815387), IWA.QEBT = c(NA, NA, 0.0357774080378087,
NA, NA, 0.0609621135107203), I.YSales = c(NA, NA, 196223.665,
NA, NA, 196223.665), IWA.YEBIT = c(NA, NA, 0.0832972567299668,
NA, NA, 0.0832972567299668), IWA.YEBT = c(NA, NA, 0.0387586889685299,
NA, NA, 0.0387586889685299), QSales.pc = c(NA, NA, 0.00517495233535316,
NA, NA, 0.00372005331072976), YSales.pc = c(NA, NA, 0.000951633433204909,
NA, NA, 0.00132698061673652), RSI_QEBIT = c(NA, NA, 41.4596290506163,
NA, NA, 49.8310957229999), RSI_QEBT = c(NA, NA, 32.939339100869,
NA, NA, 34.6480049470139), RSI_IWA.QEBIT = c(NA, NA, 29.9648696052066,
NA, NA, 37.5867809473848), RSI_IWA.QEBT = c(NA, NA, 18.2352737965041,
NA, NA, 24.7584711404174), adj.factor = c(1, 1, 1, 1, 1,
1), dvpsxm = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), cshtrm = c(4557500, 4506100, 4146200, 3215400,
2948800, 4296100), curcdm = c("USD", "USD", "USD", "USD",
"USD", "USD"), close = c(8.875, 11.6875, 16.3125, 15.875,
18.3125, 20.9375), high = c(10.125, 12.125, 16.8125, 16.375,
19.375, 21), low = c(6.75, 8.0625, 11.375, 12.875, 16.0625,
17.75), trfm = c(1.0409, 1.0409, 1.0409, 1.0409, 1.0409,
1.0409), trt1m = c(16.3934, 31.6901, 39.5722, -2.682, 15.3543,
14.3345), close.unAdj = c(8.875, 11.6875, 16.3125, 15.875,
18.3125, 20.9375), mktcap = c(NA, NA, 351.2570625, NA, NA,
460.7925), close.div = c(8.875, 11.6875, 16.3125, 15.875,
18.3125, 20.9375), RSI_close = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), RSI.verticalBottom = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), RSI.top = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), return = c(NA,
NA, NA, NA, NA, NA), pos = c(NA, NA, NA, NA, NA, NA)), .Names = c("conm",
"tic", "datadate", "cshoq", "gind", "year", "month", "yearmon",
"fdateq", "pdateq", "fyr", "fyearq", "fqtr", "req", "epspiq",
"epspxq", "ajexq", "saleq", "saley", "ivncfy", "gsubind", "dpq",
"ibmiiq", "ibq", "iby", "oiadpq", "oiadpy", "oibdpq", "oibdpy",
"xiq", "xoprq", "cogsy", "dlcchy", "wcapchy", "QEBIT.adep", "YEBIT.adep",
"QEBIT.bdep", "YEBIT.bdep", "QEBT", "YEBT", "f_id", "I.QSales",
"IWA.QEBIT", "IWA.QEBT", "I.YSales", "IWA.YEBIT", "IWA.YEBT",
"QSales.pc", "YSales.pc", "RSI_QEBIT", "RSI_QEBT", "RSI_IWA.QEBIT",
"RSI_IWA.QEBT", "adj.factor", "dvpsxm", "cshtrm", "curcdm", "close",
"high", "low", "trfm", "trt1m", "close.unAdj", "mktcap", "close.div",
"RSI_close", "RSI.verticalBottom", "RSI.top", "return", "pos"
), sorted = c("conm", "tic", "datadate", "cshoq", "gind", "year",
"month", "yearmon"), class = c("data.table", "data.frame"), row.names = c(NA,
-6L), .internal.selfref = <pointer: 0x102806978>)

Construct new column from last non-NA values for each row [duplicate]

I have a data frame Depth which consists of LON and LAT with the corresponding temperature data at each depth. For each coordinate (LON and LAT), I would like to pull out the last record (the deepest non-NA value) into a new data frame,
> Depth<-read.csv('depthdata.csv')
> head(Depth)
LAT LON X150 X175 X200 X225 X250 X275 X300 X325 X350 X375 X400 X425 X450
1 -78.375 -163.875 -1.167 -1.0 NA NA NA NA NA NA NA NA NA NA NA
2 -78.125 -168.875 -1.379 -1.3 -1.259 -1.6 -1.476 -1.374 -1.507 NA NA NA NA NA NA
3 -78.125 -167.625 -1.700 -1.7 -1.700 -1.7 NA NA NA NA NA NA NA NA NA
4 -78.125 -167.375 -2.100 -2.2 -2.400 -2.3 -2.200 NA NA NA NA NA NA NA NA
5 -78.125 -167.125 -1.600 -1.6 -1.600 -1.6 NA NA NA NA NA NA NA NA NA
6 -78.125 -166.875 NA NA NA NA NA NA NA NA NA NA NA NA NA
so that I end up with this:
LAT LON
-78.375 -163.875 -1
-78.125 -168.875 -1.507
-78.125 -167.625 -1.7
-78.125 -167.375 -2.2
-78.125 -167.125 -1.6
-78.125 -166.875 NA
I tried the tail() function but did not get the desired result.

As I understand it, you want the last non-NA value in each row, for all columns except the first two.
We can use max.col() along with is.na() with our relevant columns to get us the column number for the last non-NA value. 2 is added (shown by + 2L) to compensate for the removal of the first two columns (shown by [-(1:2)]).
idx <- max.col(!is.na(Depth[-(1:2)]), ties.method = "last") + 2L
We can use idx in cbind() to create an index matrix for retrieving the values.
Depth[cbind(seq_len(nrow(Depth)), idx)]
# [1] -1.000 -1.507 -1.700 -2.200 -1.600 NA
Bind this together with the first two columns of the original data with cbind() and we're done.
cbind(Depth[1:2], LAST = Depth[cbind(seq_len(nrow(Depth)), idx)])
# LAT LON LAST
# 1 -78.375 -163.875 -1.000
# 2 -78.125 -168.875 -1.507
# 3 -78.125 -167.625 -1.700
# 4 -78.125 -167.375 -2.200
# 5 -78.125 -167.125 -1.600
# 6 -78.125 -166.875 NA
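To see what the index matrix does, here is the same technique on a tiny made-up data frame (toy and its columns are invented for illustration, with one leading id column, hence + 1L instead of + 2L):

```r
# Made-up data: one id column followed by three value columns
toy <- data.frame(id = 1:3,
                  a = c(1, NA, NA),
                  b = c(2, 5, NA),
                  c = c(NA, 6, NA))

# Position of the last non-NA value per row, shifted by the one dropped column
idx <- max.col(!is.na(toy[-1]), ties.method = "last") + 1L

# Two-column matrix of (row, column) pairs used to pick one cell per row
pick <- cbind(seq_len(nrow(toy)), idx)
last_val <- toy[pick]
last_val
```

Here idx is 3, 4, 4 and last_val is 2, 6, NA. In the third row every entry of !is.na() is FALSE, so all columns tie and ties.method = "last" points at the final column, which is why an all-NA row yields NA.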
Data:
Depth <- structure(list(LAT = c(-78.375, -78.125, -78.125, -78.125, -78.125,
-78.125), LON = c(-163.875, -168.875, -167.625, -167.375, -167.125,
-166.875), X150 = c(-1.167, -1.379, -1.7, -2.1, -1.6, NA), X175 = c(-1,
-1.3, -1.7, -2.2, -1.6, NA), X200 = c(NA, -1.259, -1.7, -2.4,
-1.6, NA), X225 = c(NA, -1.6, -1.7, -2.3, -1.6, NA), X250 = c(NA,
-1.476, NA, -2.2, NA, NA), X275 = c(NA, -1.374, NA, NA, NA, NA
), X300 = c(NA, -1.507, NA, NA, NA, NA), X325 = c(NA, NA, NA,
NA, NA, NA), X350 = c(NA, NA, NA, NA, NA, NA), X375 = c(NA, NA,
NA, NA, NA, NA), X400 = c(NA, NA, NA, NA, NA, NA), X425 = c(NA,
NA, NA, NA, NA, NA), X450 = c(NA, NA, NA, NA, NA, NA)), .Names = c("LAT",
"LON", "X150", "X175", "X200", "X225", "X250", "X275", "X300",
"X325", "X350", "X375", "X400", "X425", "X450"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))
