Push up and tighten Dataframe. General solution - r

I want to push up (metaphorically) the dataframe in ordner to get rid of the spaces (NA-Values)
My Data:
> dput(df1)
structure(list(ID = c("CN1-1", "CN1-1", "CN1-1", "CN1-10", "CN1-10",
"CN1-10", "CN1-11", "CN1-11", "CN1-11", "CN1-12", "CN1-12", "CN1-12",
"CN1-13", "CN1-13", "CN1-13"), v1 = c(0.37673, NA, NA, 1.019972,
NA, NA, 0.515152, NA, NA, 0.375139, NA, NA, 0.508125, NA, NA),
v2 = c(NA, 0.732, NA, NA, 0, NA, NA, 0.748, NA, NA, 0.466,
NA, NA, 0.57, NA), v2 = c(NA, NA, 0.357, NA, NA, 0.816, NA,
NA, 0.519, NA, NA, 0.206, NA, NA, 0.464)), .Names = c("ID",
"v1", "v2", "v2"), row.names = c(NA, 15L), class = "data.frame")
>
Looks like:
ID v1 v2 v2
1 CN1-1 0.376730 NA NA
2 CN1-1 NA 0.732 NA
3 CN1-1 NA NA 0.357
4 CN1-10 1.019972 NA NA
5 CN1-10 NA 0.000 NA
6 CN1-10 NA NA 0.816
7 CN1-11 0.515152 NA NA
8 CN1-11 NA 0.748 NA
9 CN1-11 NA NA 0.519
10 CN1-12 0.375139 NA NA
11 CN1-12 NA 0.466 NA
12 CN1-12 NA NA 0.206
13 CN1-13 0.508125 NA NA
14 CN1-13 NA 0.570 NA
15 CN1-13 NA NA 0.464
Please note: I'm not sure if the pattern is consistent over all rows. It could also be possible, that one or more variables are prominent 2+ times per ID Group.
Desired output:
ID v1 v2 v2
1 CN1-1 0.376730 0.732 0.357
2 CN1-10 1.019972 0.000 0.816
...
My idea was to melt then get rid of all NA values and then dcast. Any better approach?
EDIT:
duplicated could look like this.
16 CN1-x 0.508125 NA NA
17 CN1-x NA 0.570 NA
18 CN1-x NA NA 0.464
19 CN1-x NA NA 0.134

do.call(rbind,
lapply(split(df1, df1$ID), function(a)
data.frame(ID = a$ID[1], lapply(a[-1], sum, na.rm = TRUE))))
# ID v1 v2 v2.1
#CN1-1 CN1-1 0.376730 0.732 0.357
#CN1-10 CN1-10 1.019972 0.000 0.816
#CN1-11 CN1-11 0.515152 0.748 0.519
#CN1-12 CN1-12 0.375139 0.466 0.206
#CN1-13 CN1-13 0.508125 0.570 0.464

Related

Make the leading column value NA if condition is met using R

I got a df such as
structure(list(id = c(15305, 15305, 15305, 6224, 6224), transfer = c(0,
1, 0, 1, 0), hosp = c(2182, 2452, 2846, 1474, 1476), out = c(2183,
NA, 2857, NA, 1486), Insti = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
And I want to insert NA in the leading "hosp" column if the lagging "out" and lagging "Insti" columns are NA AND the "transfer" column == 1
I want the df to look like this
structure(list(id2 = c(15305, 15305, 15305, 6224, 6224), transfer2 = c(0,
1, 0, 1, 0), hosp2 = c(2182, 2452, NA, 1474, NA), out2 = c(2183,
NA, 2857, NA, 1486), Insti2 = c(NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-5L))
You can use the following solution:
library(dplyr)
df %>%
mutate(hosp = case_when(
is.na(lag(out)) & is.na(lag(Insti)) & lag(transfer) == 1 ~ NA_real_,
TRUE ~ hosp
))
id transfer hosp out Insti
1 15305 0 2182 2183 NA
2 15305 1 2452 NA NA
3 15305 0 NA 2857 NA
4 6224 1 1474 NA NA
5 6224 0 NA 1486 NA
To get the "lag" you may remove last value and add NA as first value. Here a base R solution using ifelse.
transform(df,
hosp=ifelse(is.na(c(NA, out[-nrow(df)])) & is.na(c(NA, Insti[-nrow(df)])) &
c(NA, Insti[-nrow(df)]) == 1, NA, hosp))
# id transfer hosp out Insti
# 1 15305 0 NA 2183 NA
# 2 15305 1 2452 NA NA
# 3 15305 0 NA 2857 NA
# 4 6224 1 1474 NA NA
# 5 6224 0 NA 1486 NA

Modify columns' values depending on other columns

My dataset contains NDVI values and NDVI-QualityDescriptor values(PixelQa) for different areas in different dates. I basically want to erase (setting to NA) the NDVI values that are related to bad quality descriptor (PixelQa). The number suffix of the column names relates both data: PixelQa_1 is related to NDVI_1 and so on.
Therefore to "clean" my data I have to check PixelQa values in order to assess if I have to change its related NDVI value. There is 3 possible situations:
PixelQa is NA -> then NDVI should be also NA.
Pixel Qa is 66±0.5 OR 130±0.5 -> then NDVI remains the same value.
Pixel Qa is different to 66±0.5 OR 130±0.5 -> then NDVI value is set to NA (this is bad quality data which needs to be ignored).
My dataset could be:
DataNDVI_split <- data.frame("21feb1987_NDVI" = c(0.123, NA, 0.192, 0.234, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), "18jul1987_NDVI" = c(0.223, NA, 0.230, 0.334, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), stringsAsFactors = FALSE)
DataNDVI_split
X21feb1987_NDVI1 X21feb1987_PixelQa1 X18jul1987_NDVI2 X21feb1987_PixelQa2
1 0.123 66.30 0.223 66.30
2 NA NA NA NA
3 0.192 66.00 0.230 66.00
4 0.234 79.87 0.334 79.87
5 NA NA NA NA
And "clean" it should look like:
DataNDVI_split <- data.frame("21feb1987_NDVI" = c(0.123, NA, 0.192, 0.234, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), "18jul1987_NDVI" = c(0.223, NA, 0.230, 0.334, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), stringsAsFactors = FALSE)
DataNDVI_split
X21feb1987_NDVI1 X21feb1987_PixelQa1 X18jul1987_NDVI2 X21feb1987_PixelQa2
1 0.123 66.30 0.223 66.30
2 NA NA NA NA
3 0.192 66.00 0.230 66.00
4 NA 79.87 NA 79.87
5 NA NA NA NA
Here's a tentative solution.
First, I'd split up the data into two separate dataframes, thus:
df_ndvi <- DataNDVI[grepl("NDVI", DataNDVI$Data), ]
df_ndvi
Data X21feb1987 X18jul1987
1 NDVI1 0.123 0.223
2 NDVI2 NA NA
3 NDVI3 0.192 0.230
4 NDVI4 0.234 0.334
5 NDVI5 NA NA
df_pixel <- DataNDVI[!grepl("NDVI", DataNDVI$Data), ]
df_pixel
Data X21feb1987 X18jul1987
6 PixelQa1 66.30 66.00
7 PixelQa2 NA NA
8 PixelQa3 66.00 124.23
9 PixelQa4 79.87 86.00
10 PixelQa5 NA NA
To perform the desired changes, there are many possible ways. One way is by using a forloop through all the columns in df_ndvi (except the first!) and defining an ifelse statement to see whether or not the conditions hold true and to define actions to be taken in either case:
for(i in 2:3){
df_ndvi[,i] <- ifelse(df_pixel[,i] < 65.5 | df_pixel[,i] > 66.5, NA, df_ndvi[,i])
}
This results in these corrections in df_ndvi:
df_ndvi
Data X21feb1987 X18jul1987
1 NDVI1 0.123 0.223
2 NDVI2 NA NA
3 NDVI3 0.192 NA
4 NDVI4 NA NA
5 NDVI5 NA NA
EDIT:
If you prefer to split-up the data in this way:
DataNDVI_split <- data.frame("21feb1987_NDVI" = c(0.123, NA, 0.192, 0.234, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), "18jul1987_NDVI" = c(0.223, NA, 0.230, 0.334, NA), "21feb1987_PixelQa" = c(66.30, NA, 66.00, 79.87, NA), stringsAsFactors = FALSE)
DataNDVI_split
X21feb1987_NDVI X21feb1987_PixelQa X18jul1987_NDVI X21feb1987_PixelQa.1
1 0.123 66.30 0.223 66.30
2 NA NA NA NA
3 0.192 66.00 0.230 66.00
4 0.234 79.87 0.334 79.87
5 NA NA NA NA
then the for loop could be adapted thus:
for(i in c(1,3)){
DataNDVI_split[,i] <- ifelse(DataNDVI_split[,i+1] < 65.5 | DataNDVI_split[,i+1] > 66.5, NA, DataNDVI_split[,i])
}
The result is this:
DataNDVI_split
X21feb1987_NDVI X21feb1987_PixelQa X18jul1987_NDVI X21feb1987_PixelQa.1
1 0.123 66.30 0.223 66.30
2 NA NA NA NA
3 0.192 66.00 0.230 66.00
4 NA 79.87 NA 79.87
5 NA NA NA NA

Why do I need to call an object returned with invisible() twice for it to print?

From the documentation I read that invisible() returns a (temporarily) invisible copy of an object. Now when I use invisible I always need to call the object twice before it is actually printed.
I use data.table and would like my function to return an invisible copy of the object given that a certain condition is met (i.e premature abortion of function).
I've noticed that this behaviour of "needing double/two calls" also applies if the invisibly returned object is used inside another function, making its use seemingly unusable. What causes this behaviour? Am I doing something wrong? How do I get the function to return invisibly, and printed on the first call?
Please see sample code below:
example <- function(DT) {
if (!(1 %in% DT$RSI.verticalBottom) | !(1 %in% DT$RSI.top)) {
# abort if there is no buy or sell signal
DT[, `:=`(pos = NA,
return = NA
)]
return(invisible(DT))
}
> example(sample.data)
> sample.data
> sample.data
conm tic datadate cshoq gind year month yearmon fdateq pdateq fyr fyearq fqtr
1: NS GROUP INC NSS.1 2000-01-31 NA 101010 2000 1 2000_1 NA <NA> NA NA NA
2: NS GROUP INC NSS.1 2000-02-29 NA 101010 2000 2 2000_2 NA <NA> NA NA NA
3: NS GROUP INC NSS.1 2000-03-31 21.533 101010 2000 3 2000_3 NA <NA> 9 2000 2
4: NS GROUP INC NSS.1 2000-04-30 NA 101010 2000 4 2000_4 NA <NA> NA NA NA
5: NS GROUP INC NSS.1 2000-05-31 NA 101010 2000 5 2000_5 NA <NA> NA NA NA
6: NS GROUP INC NSS.1 2000-06-30 22.008 101010 2000 6 2000_6 NA <NA> 9 2000 3
req epspiq epspxq ajexq saleq saley ivncfy gsubind dpq ibmiiq ibq iby oiadpq
1: NA NA NA NA NA NA NA NA NA NA NA NA NA
2: NA NA NA NA NA NA NA NA NA NA NA NA NA
3: -58.396 -0.38 -0.38 1 100.107 186.733 10.77 10101020 5.517 NA -8.231 -21.165 -5.617
4: NA NA NA NA NA NA NA NA NA NA NA NA NA
5: NA NA NA NA NA NA NA NA NA NA NA NA NA
6: -63.168 -0.19 -0.23 1 73.652 260.385 20.90 10101020 NA NA -5.048 -26.213 NA
oiadpy oibdpq oibdpy xiq xoprq cogsy dlcchy wcapchy QEBIT.adep YEBIT.adep QEBIT.bdep
1: NA NA NA NA NA NA NA NA NA NA NA
2: NA NA NA NA NA NA NA NA NA NA NA
3: -16.924 -0.1 -5.57 0 100.207 177.826 -0.394 NA -0.05610996 -0.09063208 -0.0009989311
4: NA NA NA NA NA NA NA NA NA NA NA
5: NA NA NA NA NA NA NA NA NA NA NA
6: NA NA NA 0 NA NA -0.394 NA NA NA NA
YEBIT.bdep QEBT YEBT f_id I.QSales IWA.QEBIT IWA.QEBT I.YSales IWA.YEBIT
1: NA NA NA NA NA NA NA NA NA
2: NA NA NA NA NA NA NA NA NA
3: -0.000535524 -0.08222202 -0.1133437 2000Q2 19344.53 0.08160277 0.03577741 196223.7 0.08329726
4: NA NA NA NA NA NA NA NA NA
5: NA NA NA NA NA NA NA NA NA
6: NA -0.06853853 -0.1006702 2000Q3 19798.64 0.10680607 0.06096211 196223.7 0.08329726
IWA.YEBT QSales.pc YSales.pc RSI_QEBIT RSI_QEBT RSI_IWA.QEBIT RSI_IWA.QEBT adj.factor
1: NA NA NA NA NA NA NA 1
2: NA NA NA NA NA NA NA 1
3: 0.03875869 0.005174952 0.0009516334 41.45963 32.93934 29.96487 18.23527 1
4: NA NA NA NA NA NA NA 1
5: NA NA NA NA NA NA NA 1
6: 0.03875869 0.003720053 0.0013269806 49.83110 34.64800 37.58678 24.75847 1
dvpsxm cshtrm curcdm close high low trfm trt1m close.unAdj mktcap close.div
1: NA 4557500 USD 8.8750 10.1250 6.7500 1.0409 16.3934 8.8750 NA 8.8750
2: NA 4506100 USD 11.6875 12.1250 8.0625 1.0409 31.6901 11.6875 NA 11.6875
3: NA 4146200 USD 16.3125 16.8125 11.3750 1.0409 39.5722 16.3125 351.2571 16.3125
4: NA 3215400 USD 15.8750 16.3750 12.8750 1.0409 -2.6820 15.8750 NA 15.8750
5: NA 2948800 USD 18.3125 19.3750 16.0625 1.0409 15.3543 18.3125 NA 18.3125
6: NA 4296100 USD 20.9375 21.0000 17.7500 1.0409 14.3345 20.9375 460.7925 20.9375
RSI_close RSI.verticalBottom RSI.top return pos
1: NA NA NA NA NA
2: NA NA NA NA NA
3: NA NA NA NA NA
4: NA NA NA NA NA
5: NA NA NA NA NA
6: NA NA NA NA NA
Sample data
> dput(sample.data)
structure(list(conm = c("NS GROUP INC", "NS GROUP INC", "NS GROUP INC",
"NS GROUP INC", "NS GROUP INC", "NS GROUP INC"), tic = c("NSS.1",
"NSS.1", "NSS.1", "NSS.1", "NSS.1", "NSS.1"), datadate = structure(c(10987,
11016, 11047, 11077, 11108, 11138), class = "Date"), cshoq = c(NA,
NA, 21.533, NA, NA, 22.008), gind = c(101010L, 101010L, 101010L,
101010L, 101010L, 101010L), year = c(2000, 2000, 2000, 2000,
2000, 2000), month = c(1, 2, 3, 4, 5, 6), yearmon = c("2000_1",
"2000_2", "2000_3", "2000_4", "2000_5", "2000_6"), fdateq = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), pdateq = structure(c(NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_), class = "Date"), fyr = c(NA, NA, 9L, NA,
NA, 9L), fyearq = c(NA, NA, 2000L, NA, NA, 2000L), fqtr = c(NA,
NA, 2L, NA, NA, 3L), req = c(NA, NA, -58.396, NA, NA, -63.168
), epspiq = c(NA, NA, -0.38, NA, NA, -0.19), epspxq = c(NA, NA,
-0.38, NA, NA, -0.23), ajexq = c(NA, NA, 1, NA, NA, 1), saleq = c(NA,
NA, 100.107, NA, NA, 73.652), saley = c(NA, NA, 186.733, NA,
NA, 260.385), ivncfy = c(NA, NA, 10.77, NA, NA, 20.9), gsubind = c(NA,
NA, 10101020L, NA, NA, 10101020L), dpq = c(NA, NA, 5.517, NA,
NA, NA), ibmiiq = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), ibq = c(NA, NA, -8.231, NA, NA, -5.048), iby = c(NA,
NA, -21.165, NA, NA, -26.213), oiadpq = c(NA, NA, -5.617, NA,
NA, NA), oiadpy = c(NA, NA, -16.924, NA, NA, NA), oibdpq = c(NA,
NA, -0.1, NA, NA, NA), oibdpy = c(NA, NA, -5.57, NA, NA, NA),
xiq = c(NA, NA, 0, NA, NA, 0), xoprq = c(NA, NA, 100.207,
NA, NA, NA), cogsy = c(NA, NA, 177.826, NA, NA, NA), dlcchy = c(NA,
NA, -0.394, NA, NA, -0.394), wcapchy = c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), QEBIT.adep = c(NA,
NA, -0.0561099623402959, NA, NA, NA), YEBIT.adep = c(NA,
NA, -0.0906320789576561, NA, NA, NA), QEBIT.bdep = c(NA,
NA, -0.000998931143676266, NA, NA, NA), YEBIT.bdep = c(NA,
NA, -0.000535523983441598, NA, NA, NA), QEBT = c(NA, NA,
-0.0822220224359935, NA, NA, -0.0685385325585184), YEBT = c(NA,
NA, -0.113343651095414, NA, NA, -0.100670161491637), f_id = c(NA,
NA, "2000Q2", NA, NA, "2000Q3"), I.QSales = c(NA, NA, 19344.526,
NA, NA, 19798.641), IWA.QEBIT = c(NA, NA, 0.0816027748625115,
NA, NA, 0.10680606815387), IWA.QEBT = c(NA, NA, 0.0357774080378087,
NA, NA, 0.0609621135107203), I.YSales = c(NA, NA, 196223.665,
NA, NA, 196223.665), IWA.YEBIT = c(NA, NA, 0.0832972567299668,
NA, NA, 0.0832972567299668), IWA.YEBT = c(NA, NA, 0.0387586889685299,
NA, NA, 0.0387586889685299), QSales.pc = c(NA, NA, 0.00517495233535316,
NA, NA, 0.00372005331072976), YSales.pc = c(NA, NA, 0.000951633433204909,
NA, NA, 0.00132698061673652), RSI_QEBIT = c(NA, NA, 41.4596290506163,
NA, NA, 49.8310957229999), RSI_QEBT = c(NA, NA, 32.939339100869,
NA, NA, 34.6480049470139), RSI_IWA.QEBIT = c(NA, NA, 29.9648696052066,
NA, NA, 37.5867809473848), RSI_IWA.QEBT = c(NA, NA, 18.2352737965041,
NA, NA, 24.7584711404174), adj.factor = c(1, 1, 1, 1, 1,
1), dvpsxm = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_,
NA_real_), cshtrm = c(4557500, 4506100, 4146200, 3215400,
2948800, 4296100), curcdm = c("USD", "USD", "USD", "USD",
"USD", "USD"), close = c(8.875, 11.6875, 16.3125, 15.875,
18.3125, 20.9375), high = c(10.125, 12.125, 16.8125, 16.375,
19.375, 21), low = c(6.75, 8.0625, 11.375, 12.875, 16.0625,
17.75), trfm = c(1.0409, 1.0409, 1.0409, 1.0409, 1.0409,
1.0409), trt1m = c(16.3934, 31.6901, 39.5722, -2.682, 15.3543,
14.3345), close.unAdj = c(8.875, 11.6875, 16.3125, 15.875,
18.3125, 20.9375), mktcap = c(NA, NA, 351.2570625, NA, NA,
460.7925), close.div = c(8.875, 11.6875, 16.3125, 15.875,
18.3125, 20.9375), RSI_close = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), RSI.verticalBottom = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), RSI.top = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), return = c(NA,
NA, NA, NA, NA, NA), pos = c(NA, NA, NA, NA, NA, NA)), .Names = c("conm",
"tic", "datadate", "cshoq", "gind", "year", "month", "yearmon",
"fdateq", "pdateq", "fyr", "fyearq", "fqtr", "req", "epspiq",
"epspxq", "ajexq", "saleq", "saley", "ivncfy", "gsubind", "dpq",
"ibmiiq", "ibq", "iby", "oiadpq", "oiadpy", "oibdpq", "oibdpy",
"xiq", "xoprq", "cogsy", "dlcchy", "wcapchy", "QEBIT.adep", "YEBIT.adep",
"QEBIT.bdep", "YEBIT.bdep", "QEBT", "YEBT", "f_id", "I.QSales",
"IWA.QEBIT", "IWA.QEBT", "I.YSales", "IWA.YEBIT", "IWA.YEBT",
"QSales.pc", "YSales.pc", "RSI_QEBIT", "RSI_QEBT", "RSI_IWA.QEBIT",
"RSI_IWA.QEBT", "adj.factor", "dvpsxm", "cshtrm", "curcdm", "close",
"high", "low", "trfm", "trt1m", "close.unAdj", "mktcap", "close.div",
"RSI_close", "RSI.verticalBottom", "RSI.top", "return", "pos"
), sorted = c("conm", "tic", "datadate", "cshoq", "gind", "year",
"month", "yearmon"), class = c("data.table", "data.frame"), row.names = c(NA,
-6L), .internal.selfref = <pointer: 0x102806978>)

If there are a certain number of consecutive NAs in a column, then replace the values

I have a tibble with a column called meanSR_strong and another called meanSR_weak. If there are 10 or more consecutive NAs in the meanSR_strong column, I would like to replace the values with values from the meanSR_weak column, even if those replaced values are also NA. If there are under consecutive NAs in the meanSR_strong column, then I don't need to do any replacing.
For example, rows 3-6 are all NA, but that is only four consecutive, so it doesn't matter. However rows 15-28 are all NA (and that is more than 10 in a row), so I want to sub in values from the meanSR_weak column.
I know how to replace all the NAs, but I haven't figured out a nice way of coding this!
Here is my data
x=structure(list(meanSR_strong = c(NA, 0.376009009009009, NA, NA,
NA, NA, 0.615585585585586, NA, 0.607354054054054, 0.590210810810811,
0.57005045045045, 0.596616216216216, 0.584066666666667, 0.538597297297297,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0.639010810810811,
0.634272972972973), meanSR_weak = c(0.574724324324324, 0.562030630630631,
0.586247747747748, NA, NA, NA, 0.615585585585586, NA, 0.607354054054054,
0.590210810810811, 0.57005045045045, 0.596616216216216, 0.608510810810811,
0.538597297297297, NA, NA, NA, 0.555463063063063, 0.376715315315315,
NA, NA, NA, NA, NA, NA, 0.60972972972973, NA, NA, 0.639010810810811,
0.634272972972973), cloud.pct_strong = c(100, 36.036036036036,
98.1981981981982, 100, 100, 100, 0, 100, 0, 0, 0, 0, 3.6036036036036,
0, NA, NA, 100, 67.5675675675676, 100, 100, NA, 100, 100, 100,
100, 74.7747747747748, 100, 100, 0, 0), cloud.pct_weak = c(0,
0, 0, 100, 100, 100, 0, 100, 0, 0, 0, 0, 0, 0, NA, NA, 100, 0,
36.036036036036, 67.5675675675676, NA, 100, 100, 100, 100, 0.900900900900901,
100, 60.3603603603604, 0, 0), date = structure(c(951868800, 951955200,
952041600, 952128000, 952214400, 952300800, 952387200, 952473600,
952560000, 952646400, 952732800, 952819200, 952905600, 952992000,
953078400, 953164800, 953251200, 953337600, 953424000, 953510400,
953596800, 953683200, 953769600, 953856000, 953942400, 954028800,
954115200, 954201600, 954288000, 954374400), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), .Names = c("meanSR_strong", "meanSR_weak",
"cloud.pct_strong", "cloud.pct_weak", "date"), row.names = c(NA,
-30L), class = c("tbl_df", "tbl", "data.frame"))
The R rle function can be used for this. First build an rle-list ("values" and "lengths", see ?rle) of the is.na-values:
z <- rle(is.na(x$meanSR_strong))
Then change the z$values entries from TRUE to FALSE when the run of NA's is less than some length that you choose. Here I choose 10:
z$values[z$lengths <10& z$values==TRUE] <- FALSE
Then reconstruct a logical vector for indexing with the [<- function using the rep-function which is essentially an inverse of rle:
x [ rep( z$values, z$lengths), "meanSR_strong"] <-
x[ rep( z$values, z$lengths), "meanSR_weak"]
print(x, n=30)
# A tibble: 30 x 5
meanSR_strong meanSR_weak cloud.pct_strong cloud.pct_weak date
<dbl> <dbl> <dbl> <dbl> <dttm>
1 NA 0.5747243 100.000000 0.0000000 2000-03-01
2 0.3760090 0.5620306 36.036036 0.0000000 2000-03-02
3 NA 0.5862477 98.198198 0.0000000 2000-03-03
4 NA NA 100.000000 100.0000000 2000-03-04
5 NA NA 100.000000 100.0000000 2000-03-05
6 NA NA 100.000000 100.0000000 2000-03-06
7 0.6155856 0.6155856 0.000000 0.0000000 2000-03-07
8 NA NA 100.000000 100.0000000 2000-03-08
9 0.6073541 0.6073541 0.000000 0.0000000 2000-03-09
10 0.5902108 0.5902108 0.000000 0.0000000 2000-03-10
11 0.5700505 0.5700505 0.000000 0.0000000 2000-03-11
12 0.5966162 0.5966162 0.000000 0.0000000 2000-03-12
13 0.5840667 0.6085108 3.603604 0.0000000 2000-03-13
14 0.5385973 0.5385973 0.000000 0.0000000 2000-03-14
15 NA NA NA NA 2000-03-15
16 NA NA NA NA 2000-03-16
17 NA NA 100.000000 100.0000000 2000-03-17
18 0.5554631 0.5554631 67.567568 0.0000000 2000-03-18
19 0.3767153 0.3767153 100.000000 36.0360360 2000-03-19
20 NA NA 100.000000 67.5675676 2000-03-20
21 NA NA NA NA 2000-03-21
22 NA NA 100.000000 100.0000000 2000-03-22
23 NA NA 100.000000 100.0000000 2000-03-23
24 NA NA 100.000000 100.0000000 2000-03-24
25 NA NA 100.000000 100.0000000 2000-03-25
26 0.6097297 0.6097297 74.774775 0.9009009 2000-03-26
27 NA NA 100.000000 100.0000000 2000-03-27
28 NA NA 100.000000 60.3603604 2000-03-28
29 0.6390108 0.6390108 0.000000 0.0000000 2000-03-29
30 0.6342730 0.6342730 0.000000 0.0000000 2000-03-30
temp = inverse.rle(with(rle(is.na(x$meanSR_strong)),
list(lengths = lengths,
values = replace(values, which(lengths > 10), 2))))
replace(x$meanSR_strong, temp == 2, x$meanSR_weak[temp == 2])
# [1] NA 0.3760090 NA NA NA
# [6] NA 0.6155856 NA 0.6073541 0.5902108
#[11] 0.5700505 0.5966162 0.5840667 0.5385973 NA
#[16] NA NA 0.5554631 0.3767153 NA
#[21] NA NA NA NA NA
#[26] 0.6097297 NA NA 0.6390108 0.6342730

Construct new column from last non-NA values for each row [duplicate]

This question already has answers here:
Select last non-NA value in a row, by row
(3 answers)
Closed last month.
I have a data frame Depth which consist of LON and LAT with corresponding depths temperature data. For each coordinate (LON and LAT) I would like to pull out last record of each depth corresponding to the coordinates into a new data frame,
> Depth<-read.csv('depthdata.csv')
> head(Depth)
LAT LON X150 X175 X200 X225 X250 X275 X300 X325 X350 X375 X400 X425 X450
1 -78.375 -163.875 -1.167 -1.0 NA NA NA NA NA NA NA NA NA NA NA
2 -78.125 -168.875 -1.379 -1.3 -1.259 -1.6 -1.476 -1.374 -1.507 NA NA NA NA NA NA
3 -78.125 -167.625 -1.700 -1.7 -1.700 -1.7 NA NA NA NA NA NA NA NA NA
4 -78.125 -167.375 -2.100 -2.2 -2.400 -2.3 -2.200 NA NA NA NA NA NA NA NA
5 -78.125 -167.125 -1.600 -1.6 -1.600 -1.6 NA NA NA NA NA NA NA NA NA
6 -78.125 -166.875 NA NA NA NA NA NA NA NA NA NA NA NA NA
so that I will have this;
LAT LON
-78.375 -163.875 -1
-78.125 -168.875 -1.507
-78.125 -167.625 -1.7
-78.125 -167.375 -2.2
-78.125 -167.125 -1.6
-78.125 -166.875 NA
I tried the tail() function but I don't have the desirable result.
As I understand it, you want the last non-NA value in each row, for all columns except the first two.
We can use max.col() along with is.na() with our relevant columns to get us the column number for the last non-NA value. 2 is added (shown by + 2L) to compensate for the removal of the first two columns (shown by [-(1:2)]).
idx <- max.col(!is.na(Depth[-(1:2)]), ties.method = "last") + 2L
We can use idx in cbind() to create an index matrix for retrieving the values.
Depth[cbind(seq_len(nrow(Depth)), idx)]
# [1] -1.000 -1.507 -1.700 -2.200 -1.600 NA
Bind this together with the first two columns of the original data with cbind() and we're done.
cbind(Depth[1:2], LAST = Depth[cbind(seq_len(nrow(Depth)), idx)])
# LAT LON LAST
# 1 -78.375 -163.875 -1.000
# 2 -78.125 -168.875 -1.507
# 3 -78.125 -167.625 -1.700
# 4 -78.125 -167.375 -2.200
# 5 -78.125 -167.125 -1.600
# 6 -78.125 -166.875 NA
Data:
Depth <- structure(list(LAT = c(-78.375, -78.125, -78.125, -78.125, -78.125,
-78.125), LON = c(-163.875, -168.875, -167.625, -167.375, -167.125,
-166.875), X150 = c(-1.167, -1.379, -1.7, -2.1, -1.6, NA), X175 = c(-1,
-1.3, -1.7, -2.2, -1.6, NA), X200 = c(NA, -1.259, -1.7, -2.4,
-1.6, NA), X225 = c(NA, -1.6, -1.7, -2.3, -1.6, NA), X250 = c(NA,
-1.476, NA, -2.2, NA, NA), X275 = c(NA, -1.374, NA, NA, NA, NA
), X300 = c(NA, -1.507, NA, NA, NA, NA), X325 = c(NA, NA, NA,
NA, NA, NA), X350 = c(NA, NA, NA, NA, NA, NA), X375 = c(NA, NA,
NA, NA, NA, NA), X400 = c(NA, NA, NA, NA, NA, NA), X425 = c(NA,
NA, NA, NA, NA, NA), X450 = c(NA, NA, NA, NA, NA, NA)), .Names = c("LAT",
"LON", "X150", "X175", "X200", "X225", "X250", "X275", "X300",
"X325", "X350", "X375", "X400", "X425", "X450"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6"))

Resources