Transfer pivottable to another table in R - r

In my research I have a dataset of cancer patients with clinical information such as cancer stage and treatment. Each patient has one row in a table with this clinical information. In addition, each patient has had blood samples taken at one or several timepoints during treatment, depending on how long the patient has been followed at the clinic. The first sample is from the first visit, the second sample is from the second visit, and so on.
In the table there is a variable (i.e. column) named Sample_Time_1, which holds the date of the first sample. Sample_Time_2 holds the date of the second sample, and so on.
However, the samples were analysed at the lab and I got the results as a pivot table, i.e. a table where each sample has one row, so the results for one patient are spread over several rows.
For example, create two tables:
x <- c(1,2,2,3,3,3,3,4,5,6,6,6,6,7,8,9,9,10)
y <- as.Date(c("2011-05-17","2012-06-30","2012-08-11","2011-10-15","2011-11-25","2012-01-07","2012-02-15","2011-08-13","2012-02-03","2011-11-08","2011-12-21","2012-02-01","2012-03-12","2012-01-03","2012-04-20","2012-03-31","2012-05-10","2011-12-15"), format="%Y-%m-%d", origin="1960-01-01")
z <- c(123,185,153,153,125,148,168,187,194,115,165,167,143,151,129,130,151,134)
Sheet_1 <- matrix(c(x,y,z), ncol=3, byrow=FALSE)
colnames(Sheet_1) <- c("ID","Sample_Time", "Sample_Value")
Sheet_1 <- as.data.frame(Sheet_1)
Sheet_1$Sample_Time <- y
x1 <- c(1,2,3,4,5,6,7,8,9,10)
x2 <- c(3,3,2,3,2,2,4,2,3,3)
x3 <- c(1,2,2,3,3,1,3,1,1,2)
x4 <- as.Date(c("2011-05-17","2012-06-30","2011-10-15","2011-08-13","2012-02-03","2011-11-08","2012-01-03","2012-04-20","2012-03-31","2011-12-15"), format="%Y-%m-%d", origin="1960-01-01")
x5 <- as.Date(c(NA,"2012-08-11","2011-11-25",NA,NA,"2011-12-21",NA,NA,"2012-05-10",NA), format="%Y-%m-%d", origin="1960-01-01")
x6 <- as.Date(c(NA,NA,"2012-01-07",NA,NA,"2012-02-01",NA,NA,NA,NA), format="%Y-%m-%d", origin="1960-01-01")
x7 <- as.Date(c(NA,NA,"2012-02-15",NA,NA,"2012-03-12",NA,NA,NA,NA), format="%Y-%m-%d", origin="1960-01-01")
Sheet_2 <- as.data.frame(c(1:10))
colnames(Sheet_2) <- "ID"
Sheet_2$Stage <- x2
Sheet_2$Treatment <- x3
Sheet_2$Sample_Time_1 <- x4
Sheet_2$Sample_Time_2 <- x5
Sheet_2$Sample_Time_3 <- x6
Sheet_2$Sample_Time_4 <- x7
Sheet_2$Sample_Value_1 <- NA
Sheet_2$Sample_Value_2 <- NA
Sheet_2$Sample_Value_3 <- NA
Sheet_2$Sample_Value_4 <- NA
I would like to transfer the Sample_Value for the first date a sample was taken from a patient from Sheet_1 to Sheet_2$Sample_Value_1 and if there are more samples, I would like to transfer them to column "Sample_Value_2" and so on.
I have tried with a double for-loop. For each patient (= ID) in Sheet_1 I run through Sheet_2, and if there is a match on ID, I use another for-loop to see if there is a match on a Sample_Time and insert (using if) the Sample_Value. However, I cannot get it to work and I have a strong feeling there must be a better way.
Any suggestions?

Is this what you want?
Prepare Sheet_1 for reshaping from long to wide by adding an extra column that numbers the blood samples within each patient. The numbering follows the row order, so Sheet_1 should already be sorted by date within each ID:
Sheet_1$uniqid <- with(Sheet_1, ave(as.character(ID), ID, FUN = seq_along))
And with this, do the re-shaping
S_1 <- reshape( Sheet_1, idvar = "ID", timevar = "uniqid", direction = "wide")
which gives you
> S_1
ID Sample_Time.1 Sample_Value.1 Sample_Time.2 Sample_Value.2 Sample_Time.3
1 1 2011-05-17 123 <NA> NA <NA>
2 2 2012-06-30 185 2012-08-11 153 <NA>
4 3 2011-10-15 153 2011-11-25 125 2012-01-07
8 4 2011-08-13 187 <NA> NA <NA>
9 5 2012-02-03 194 <NA> NA <NA>
10 6 2011-11-08 115 2011-12-21 165 2012-02-01
14 7 2012-01-03 151 <NA> NA <NA>
15 8 2012-04-20 129 <NA> NA <NA>
16 9 2012-03-31 130 2012-05-10 151 <NA>
18 10 2011-12-15 134 <NA> NA <NA>
Sample_Value.3 Sample_Time.4 Sample_Value.4
1 NA <NA> NA
2 NA <NA> NA
4 148 2012-02-15 168
8 NA <NA> NA
9 NA <NA> NA
10 167 2012-03-12 143
14 NA <NA> NA
15 NA <NA> NA
16 NA <NA> NA
18 NA <NA> NA
The number after the dot in the colnames is the uniqid.
Now you can merge the relevant columns from Sheet_2
S_2 <- merge( Sheet_2[ 1:3 ], S_1, by = "ID" )
and the result should be what you are looking for:
> S_2
ID Stage Treatment Sample_Time.1 Sample_Value.1 Sample_Time.2 Sample_Value.2
1 1 3 1 2011-05-17 123 <NA> NA
2 2 3 2 2012-06-30 185 2012-08-11 153
3 3 2 2 2011-10-15 153 2011-11-25 125
4 4 3 3 2011-08-13 187 <NA> NA
5 5 2 3 2012-02-03 194 <NA> NA
6 6 2 1 2011-11-08 115 2011-12-21 165
7 7 4 3 2012-01-03 151 <NA> NA
8 8 2 1 2012-04-20 129 <NA> NA
9 9 3 1 2012-03-31 130 2012-05-10 151
10 10 3 2 2011-12-15 134 <NA> NA
Sample_Time.3 Sample_Value.3 Sample_Time.4 Sample_Value.4
1 <NA> NA <NA> NA
2 <NA> NA <NA> NA
3 2012-01-07 148 2012-02-15 168
4 <NA> NA <NA> NA
5 <NA> NA <NA> NA
6 2012-02-01 167 2012-03-12 143
7 <NA> NA <NA> NA
8 <NA> NA <NA> NA
9 <NA> NA <NA> NA
10 <NA> NA <NA> NA
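The numbering step relies on the rows of Sheet_1 being in date order within each patient. A minimal sketch of making that explicit before reshaping, using a small made-up table (the name `long` and its values are illustrative and not taken from the question):

```r
# Toy long-format table, deliberately NOT in date order (made-up values)
long <- data.frame(
  ID           = c(2, 1, 2, 1),
  Sample_Time  = as.Date(c("2012-08-11", "2011-05-17", "2012-06-30", "2011-07-01")),
  Sample_Value = c(153, 123, 185, 140)
)

# Sort by patient and then by date, so that sample 1 is the earliest one
long <- long[order(long$ID, long$Sample_Time), ]

# Number the samples within each patient: 1, 2, ...
long$uniqid <- ave(seq_along(long$ID), long$ID, FUN = seq_along)

# Long to wide: one row per patient, one column pair per sample number
wide <- reshape(long, idvar = "ID", timevar = "uniqid", direction = "wide")
wide
```

With the sort in place, Sample_Value.1 always belongs to the earliest date for that patient, regardless of the order in which the lab delivered the rows.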

Related

Extracting a numeric information align with ID from unstructured dataset in R

I am trying to extract score information for each ID and for each itemID. Here how my sample dataset looks like.
df <- data.frame(Text_1 = c("Scoring", "1 = Incorrect","Text1","Text2","Text3","Text4", "Demo 1: Color Naming","Amarillo","Azul","Verde","Azul",
"Demo 1: Errors","Item 1: Color naming","Amarillo","Azul","Verde","Azul",
"Item 1: Time in seconds","Item 1: Errors",
"Item 2: Shape Naming","Cuadrado/Cuadro","Cuadrado/Cuadro","Círculo","Estrella","Círculo","Triángulo",
"Item 2: Time in seconds","Item 2: Errors"),
School.2 = c("Teacher:","DC Name:","Date (mm/dd/yyyy):","Child Grade:","Student Study ID:",NA, NA,NA,NA,NA,NA,
0,"1 = Incorrect responses",0,1,NA,NA,NA,0,"1 = Incorrect responses",0,NA,NA,1,1,0,NA,0),
X_Elementary_School..3 = c("Bill:","X District","10/7/21","K","123-2222-2:",NA, NA,NA,NA,NA,NA,
NA,"Child response",NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA,NA,NA),
School.4 = c("Teacher:","DC Name:","Date (mm/dd/yyyy):","Child Grade:","Student Study ID:",NA, 0,NA,1,NA,NA,0,"1 = Incorrect responses",0,1,NA,NA,120,0,"1 = Incorrect responses",NA,1,0,1,NA,1,110,0),
Y_Elementary_School..2 = c("John:","X District","11/7/21","K","112-1111-3:",NA, NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA,"Child response",NA,NA,NA,NA,NA,NA, NA,NA))
> df
Text_1 School.2 X_Elementary_School..3 School.4 Y_Elementary_School..2
1 Scoring Teacher: Bill: Teacher: John:
2 1 = Incorrect DC Name: X District DC Name: X District
3 Text1 Date (mm/dd/yyyy): 10/7/21 Date (mm/dd/yyyy): 11/7/21
4 Text2 Child Grade: K Child Grade: K
5 Text3 Student Study ID: 123-2222-2: Student Study ID: 112-1111-3:
6 Text4 <NA> <NA> <NA> <NA>
7 Demo 1: Color Naming <NA> <NA> 0 <NA>
8 Amarillo <NA> <NA> <NA> <NA>
9 Azul <NA> <NA> 1 <NA>
10 Verde <NA> <NA> <NA> <NA>
11 Azul <NA> <NA> <NA> <NA>
12 Demo 1: Errors 0 <NA> 0 <NA>
13 Item 1: Color naming 1 = Incorrect responses Child response 1 = Incorrect responses Child response
14 Amarillo 0 <NA> 0 <NA>
15 Azul 1 <NA> 1 <NA>
16 Verde <NA> <NA> <NA> <NA>
17 Azul <NA> <NA> <NA> <NA>
18 Item 1: Time in seconds <NA> <NA> 120 <NA>
19 Item 1: Errors 0 <NA> 0 <NA>
20 Item 2: Shape Naming 1 = Incorrect responses Child response 1 = Incorrect responses Child response
21 Cuadrado/Cuadro 0 <NA> <NA> <NA>
22 Cuadrado/Cuadro <NA> <NA> 1 <NA>
23 Círculo <NA> <NA> 0 <NA>
24 Estrella 1 <NA> 1 <NA>
25 Círculo 1 <NA> <NA> <NA>
26 Triángulo 0 <NA> 1 <NA>
27 Item 2: Time in seconds <NA> <NA> 110 <NA>
28 Item 2: Errors 0 <NA> 0 <NA>
This sample dataset is limited to two schools, two teachers and two students.
In this step, I need to extract student responses for each item.
Wherever the first column has Item, I need to grab from there. In particular, I need to find the rows and columns programmatically rather than giving exact row and column numbers, since this will be run on multiple data files and each file has different information. There is no need to grab the ..:Errors part.
################################################################################
# ## 2-extract the score information here
# ## 1-grab item information from where "Item 1:.." starts
Here, how can I automate this part instead of hard-coding the row numbers?
score<-df[c(7:11,13:17,20:26),c(seq(2,dim(df)[2],2))] # need to automate row and columns index here
score<-as.data.frame(t(score))
rownames(score)<-seq(1,nrow(score),1)
colnames(score)<-paste0('i',seq(1,ncol(score),1)) # assign col names for items
score<-apply(score,2,as.numeric) # only keep numeric columns
score<-as.data.frame(score)
score$total<-rowSums(score,na.rm=T); score # create a total score
> score
i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16 i17 total
1 NA NA NA NA NA NA 0 1 NA NA NA 0 NA NA 1 1 0 3
2 0 NA 1 NA NA NA 0 1 NA NA NA NA 1 0 1 NA 1 5
Additionally, I need to add ID which I could not achieve here.
My desired output would be:
> score
ID i1 i2 i3 i4 i5 i6 i7 i8 i9 i10 i11 i12 i13 i14 i15 i16 i17 total
1 123-2222-2 NA NA NA NA NA NA 0 1 NA NA NA 0 NA NA 1 1 0 3
2 112-1111-3 0 NA 1 NA NA NA 0 1 NA NA NA NA 1 0 1 NA 1 5
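One possible way to avoid hard-coded indices, sketched on a cut-down copy of the layout above. The frame `toy`, its shortened column names and the regular expressions are assumptions about how the real files are laid out: item blocks start at the first "Demo n:" or "Item n:" row, score columns are the even-numbered ones, and each ID sits one column to the right of its "Student Study ID:" label.

```r
# Cut-down, made-up copy of the sheet layout (two students, one item block)
toy <- data.frame(
  Text_1   = c("Scoring", "Text3", "Item 1: Color naming", "Azul", "Verde",
               "Item 1: Time in seconds", "Item 1: Errors"),
  School.2 = c("Teacher:", "Student Study ID:", "1 = Incorrect responses",
               "0", "1", NA, "1"),
  X..3     = c("Bill:", "123-2222-2:", "Child response", NA, NA, NA, NA),
  School.4 = c("Teacher:", "Student Study ID:", "1 = Incorrect responses",
               "1", NA, "120", "1"),
  Y..2     = c("John:", "112-1111-3:", "Child response", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Rows: from the first Demo/Item header onwards, minus Errors and Time rows
first_item <- grep("^(Demo|Item) [0-9]+:", toy$Text_1)[1]
keep_rows  <- seq(first_item, nrow(toy))
keep_rows  <- keep_rows[!grepl("(Errors|Time in seconds)$", toy$Text_1[keep_rows])]

# Columns: every second column holds scores; the ID is one column to the right
score_cols <- seq(2, ncol(toy), by = 2)
id_row     <- which(toy[[2]] == "Student Study ID:")[1]
ids        <- sub(":$", "", unlist(toy[id_row, score_cols + 1]))

# One row per student; non-numeric cells (such as header text) become NA
score <- as.data.frame(t(toy[keep_rows, score_cols]), stringsAsFactors = FALSE)
score[] <- lapply(score, function(x) suppressWarnings(as.numeric(x)))
names(score) <- paste0("i", seq_len(ncol(score)))
rownames(score) <- NULL
score$total <- rowSums(score, na.rm = TRUE)
score <- data.frame(ID = unname(ids), score, stringsAsFactors = FALSE)
score
```

The patterns would need adjusting if a file labels its item headers or ID row differently, so it is worth checking `keep_rows` and `ids` on each new file before trusting the totals.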

R: na.locf not behaving as expected

I am trying to use the na.locf function in a mutate and I am getting a strange answer. The data is ordered desc by date and then if a column is NA gets the result from na.locf and otherwise uses the value in the column. For most of the data, the answer is being returned as expected, but one row is coming back not as the previous non-NA but as the next non-NA. If we order the data by date ascending and use na.rm = F and fromLast = T it works as expected, but I want to understand why the result is not working if date is ordered descending.
The example is as follows:
example = data.frame(Date = factor(c("1/14/15", "1/29/15", "2/3/15",
"2/11/15", "2/15/15", "3/4/15","3/7/15", "3/7/15", "3/11/15",
"3/18/15", "3/21/15", "4/22/15", "4/22/15", "4/23/15", "5/6/15",
"5/13/15", "5/18/15", "5/24/15", "5/26/15", "5/28/15", "5/29/15",
"5/29/15", "6/25/15", "6/25/15","8/6/15", "8/15/15", "8/20/15",
"8/22/15", "8/22/15", "8/29/15")),
Scan = c(1, rep(NA, 21),2,rep(NA,7)),
Hours = c(rep(NA,3), rep(3,3), NA, 2, rep(3,3), NA, 2, 3, 2,
rep(3,5), NA, 2, rep(c(NA, 3),2), 3, NA, 2, 3)
)
example %>%
mutate(
date = as.Date(Date, "%m/%d/%y"),
Hours = replace_na(Hours,0),
scan_date = as.Date(ifelse(is.na(Scan),
NA,
date),
origin="1970-01-01")) %>%
arrange(desc(date)) %>%
mutate(
scan_new = ifelse(is.na(Scan),
na.locf(Scan),
Scan))
The issue in the result is in row 24, the Scan is coming in as 1 rather than 2:
Date Scan Hours date scan_date scan_new
23 3/7/15 NA 0 2015-03-07 <NA> 2
24 3/7/15 NA 2 2015-03-07 <NA> 1
25 3/4/15 NA 3 2015-03-04 <NA> 2
Interestingly, other data with the same date is handled appropriately, for example on line 18-19
Date Scan Hours date scan_date scan_new
18 4/22/15 NA 0 2015-04-22 <NA> 2
19 4/22/15 NA 2 2015-04-22 <NA> 2
For reference as noted above, the following provides the expected answer:
example %>%
mutate(
date = as.Date(Date, "%m/%d/%y"),
Hours = replace_na(Hours,0),
scan_date = as.Date(ifelse(is.na(Scan),
NA,
date),
origin="1970-01-01")) %>%
arrange(desc(date)) %>%
mutate(
scan_new = ifelse(is.na(Scan),
na.locf(Scan, na.rm = F, fromLast = T),
Scan))
Date Scan Hours date scan_date scan_new
6 3/4/15 NA 3 2015-03-04 <NA> 2
7 3/7/15 NA 0 2015-03-07 <NA> 2
8 3/7/15 NA 2 2015-03-07 <NA> 2
Can someone tell me why this is behaving this way?
In your first attempt, na.locf(Scan) removes the leading NAs, so it returns a vector shorter than Scan. ifelse then recycles that shorter vector to the full length, which shifts the filled values out of alignment with the rows. For reference, here are the results with na.rm = FALSE (or na.locf0, see comments):
example %>%
mutate(
date = as.Date(Date, "%m/%d/%y"),
Hours = replace_na(Hours,0),
scan_date = as.Date(ifelse(is.na(Scan),
NA,
date),
origin="1970-01-01")) %>%
arrange(desc(date)) %>%
mutate(
scan_new = ifelse(is.na(Scan),
na.locf(Scan, na.rm = FALSE),
Scan))
# Date Scan Hours date scan_date scan_new
# 1 8/29/15 NA 3 2015-08-29 <NA> NA
# 2 8/22/15 NA 0 2015-08-22 <NA> NA
# 3 8/22/15 NA 2 2015-08-22 <NA> NA
# 4 8/20/15 NA 3 2015-08-20 <NA> NA
# 5 8/15/15 NA 3 2015-08-15 <NA> NA
# 6 8/6/15 NA 0 2015-08-06 <NA> NA
# 7 6/25/15 2 0 2015-06-25 2015-06-25 2
# 8 6/25/15 NA 3 2015-06-25 <NA> 2
# 9 5/29/15 NA 0 2015-05-29 <NA> 2
# 10 5/29/15 NA 2 2015-05-29 <NA> 2
# 11 5/28/15 NA 3 2015-05-28 <NA> 2
# 12 5/26/15 NA 3 2015-05-26 <NA> 2
# 13 5/24/15 NA 3 2015-05-24 <NA> 2
# 14 5/18/15 NA 3 2015-05-18 <NA> 2
# 15 5/13/15 NA 3 2015-05-13 <NA> 2
# 16 5/6/15 NA 2 2015-05-06 <NA> 2
# 17 4/23/15 NA 3 2015-04-23 <NA> 2
# 18 4/22/15 NA 0 2015-04-22 <NA> 2
# 19 4/22/15 NA 2 2015-04-22 <NA> 2
# 20 3/21/15 NA 3 2015-03-21 <NA> 2
# 21 3/18/15 NA 3 2015-03-18 <NA> 2
# 22 3/11/15 NA 3 2015-03-11 <NA> 2
# 23 3/7/15 NA 0 2015-03-07 <NA> 2
# 24 3/7/15 NA 2 2015-03-07 <NA> 2
# 25 3/4/15 NA 3 2015-03-04 <NA> 2
# 26 2/15/15 NA 3 2015-02-15 <NA> 2
# 27 2/11/15 NA 3 2015-02-11 <NA> 2
# 28 2/3/15 NA 0 2015-02-03 <NA> 2
# 29 1/29/15 NA 0 2015-01-29 <NA> 2
# 30 1/14/15 1 0 2015-01-14 2015-01-14 1
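The recycling can be reproduced in base R without zoo. In this sketch the vector `x` is made up, and `shortened` is typed in by hand as a stand-in for what na.locf(x) returns with its default na.rm = TRUE (the two leading NAs dropped):

```r
# Made-up column with two leading NAs, like Scan after sorting by descending date
x <- c(NA, NA, 2, NA, NA, 1)

# Stand-in for na.locf(x): leading NAs removed, so length 4 instead of 6
shortened <- c(2, 2, 2, 1)

# ifelse() silently recycles `shortened` to length 6 (2 2 2 1 2 2),
# so position 4 picks up the 1 that belongs to the last row
res <- ifelse(is.na(x), shortened, x)
res
```

Position 4 ends up as 1 although the last non-NA value before it is 2, which is the same kind of misplaced value as in row 24 of the question.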

I need to add several rows together based on the fact that they have something in common with another row

Using the information on hand, I need to predict how much of a particular product we need next month. I have several months' worth of data going back; however, the data is separated both by VPN and by warehouse number. I just need to know how much to order in general and can ignore the warehouse separation. We'll be adding that back in later.
There are multiple duplicates of many of the VPNs, and I would like to consolidate all the duplicates and sum the numbers that have been separated.
VPN Month To Date December November October September August July June May April March
0A36227-AA 15 6 4 2 NA 4 6 4 2 <NA> 4
0A36227-AA NA 1 NA NA NA NA 1 <NA> <NA> <NA> <NA>
0A36227-AA 2 3 1 NA 2 3 3 1 <NA> 2 3
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA 1 NA NA NA <NA> 1 <NA> <NA> <NA>
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA NA NA NA NA <NA> <NA> <NA> <NA> <NA>
So I want to combine all the duplicates and add all the numbers from the rows into just one row per VPN.
I've tried using the aggregate function and it didn't work for me. I may have used it wrong, though.
Any help would be appreciated!
Also, there are some cases where it may cause an infinite number to show up. If anyone has any further advice on how to handle that, it would be welcome.
You basically want to know how to compute a sum by group in your data frame.
You will find plenty of answers on that.
I have a data.table solution for your case:
plouf <- read.table(text = " VPN Month.To.Date December November October September August July June May April March
0A36227-AA 15 6 4 2 NA 4 6 4 2 <NA> 4
0A36227-AA NA 1 NA NA NA NA 1 <NA> <NA> <NA> <NA>
0A36227-AA 2 3 1 NA 2 3 3 1 <NA> 2 3
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA 1 NA NA NA <NA> 1 <NA> <NA> <NA>
0A36258-AA NA NA NA 1 NA NA <NA> <NA> 1 <NA> <NA>
0A36258-AA 1 NA NA NA NA NA <NA> <NA> <NA> <NA> <NA>",
stringsAsFactors = FALSE, header = TRUE)
Here is the code:
library(data.table)
DT <- setDT(plouf)
tochange <- names(DT)[!names(DT) %in% "VPN"]
Here the tochange vector is the list of columns you want to sum.
DT[,c(tochange) := lapply(.SD,function(x){as.numeric(x)}),.SDcols = tochange]
DT[,lapply(.SD,function(x){sum(x,na.rm = TRUE)}),.SDcols = tochange,by = VPN]
The first line converts every column except VPN to numeric.
The second line performs the sum, ignoring the NAs and grouping by VPN. I am not 100% sure that is what you wanted.
VPN Month.To.Date December November October September August July June May April March
1: 0A36227-AA 17 10 5 2 2 7 10 5 2 2 7
2: 0A36258-AA 2 0 1 2 0 0 0 1 2 0 0
I hope it helps
Here is the dplyr equivalent (dplyr >= 1.0, using across()):
library(dplyr)
plouf %>%
  mutate(across(all_of(tochange), as.numeric)) %>%
  group_by(VPN) %>%
  summarise(across(all_of(tochange), ~ sum(.x, na.rm = TRUE)))
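Since the question mentions aggregate(), here is a hedged base R sketch of the same grouped sum. The `sales` data frame and its values are made up for illustration; the real table would first need its month columns converted to numeric.

```r
# Made-up miniature of the sales table: duplicated VPNs, some missing months
sales <- data.frame(
  VPN      = c("A", "A", "B"),
  December = c(6, 1, NA),
  November = c(4, NA, 1),
  stringsAsFactors = FALSE
)

# Sum every non-VPN column within each VPN; na.rm = TRUE is passed on to sum()
totals <- aggregate(sales[-1], by = list(VPN = sales$VPN),
                    FUN = sum, na.rm = TRUE)
totals
```

The data frame interface with `by =` is used here because the formula interface of aggregate() drops rows containing NA by default, which would discard most of this kind of data before summing.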

R Fill cells with previous data

I have a table like the following:
days Debit loaddate
1 23/01/2014 138470289.4 23/01/2014
2 24/01/2014 NA NA
3 25/01/2014 NA NA
4 26/01/2014 NA NA
5 27/01/2014 NA NA
one row for each day and then in the columns loaddate after a few NA another date appears:
28 19/02/2014 NA NA
29 20/02/2014 NA NA
30 21/02/2014 NA NA
31 22/02/2014 9090967.9 22/02/2014
32 23/02/2014 NA NA
33 24/02/2014 308083.5 24/02/2014
I would like to replace each NA in loaddate column with the previous date in loaddate.
I tried:
for(i in 1:nrow(data3))
{ if (!is.na(data3[i,'Debit']))
{data3[i,'loaddate1']<-as.Date(data3[i,'loaddate'], format='%Y-%m-%d')}
else {data3[i,'loaddate1']<-data3[i-1,'loaddate1']}
}
But I got the wrong format:
> head(data3)
days Debit loaddate loaddate1
1 2014-01-23 138470289 2014-01-23 16093
2 2014-01-24 NA <NA> 16093
3 2014-01-25 NA <NA> 16093
4 2014-01-26 NA <NA> 16093
5 2014-01-27 NA <NA> 16093
6 2014-01-28 NA <NA> 16093
I need to get the date format also. If I do:
for(i in 1:nrow(data3))
{ if (!is.na(data3[i,'Debit']))
{data3[i,'loaddate1']<-as.Date(data3[i,'loaddate'], format='%Y-%m-%d')}
else {data3[i,'loaddate1']<-as.Date(data3[i-1,'loaddate1'], format='%Y-%m-%d')}
}
I got the wrong result (with NA).
days Debit loaddate loaddate1
1 2014-01-23 138470289 2014-01-23 16093
2 2014-01-24 NA <NA> <NA>
3 2014-01-25 NA <NA> <NA>
4 2014-01-26 NA <NA> <NA>
5 2014-01-27 NA <NA> <NA>
6 2014-01-28 NA <NA> <NA>
How can I get the right result and with the right format?
Also, Is there a better way to do this replacement? I mean without a loop.
Thanks.
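Try zoo::na.locf and make sure to use the appropriate date format. Passing na.rm = FALSE keeps the result the same length as the column even if it starts with NA:
library(zoo)
data3$loaddate <- as.Date(na.locf(data3$loaddate, na.rm = FALSE), format = '%d/%m/%Y')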
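If adding a package is not an option, the same fill can be sketched in base R without a loop. This swaps na.locf for a cummax() index trick; the vector `loaddate` below is made-up example data, including a leading NA to show that it stays NA.

```r
# Made-up Date vector with gaps (and a leading NA that cannot be filled)
loaddate <- as.Date(c(NA, "2014-01-23", NA, NA, "2014-02-22", NA))

# For every position, the index of the most recent non-NA entry (0 if none yet)
idx <- cummax(ifelse(is.na(loaddate), 0L, seq_along(loaddate)))

# Positions with no earlier value get NA as index, so they remain NA
idx[idx == 0L] <- NA

# Indexing a Date vector keeps its class, so no numeric values like 16093 appear
filled <- loaddate[idx]
filled
```

Because the result comes from subsetting the original Date vector, the Date class is kept and no conversion back from numbers is needed.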

How can I fill up missing information using the previous values for each column? [duplicate]

Possible duplicate: Replacing NAs with latest non-NA value
How can I fill up missing information using the previous values for each column?
Date.end Date.beg Pollster Serra.PSDB
2012-06-26 2012-06-25 Datafolha 31.0
2012-06-27 <NA> <NA> NA
2012-06-28 <NA> <NA> NA
2012-06-29 <NA> <NA> NA
2012-06-30 <NA> <NA> NA
2012-07-01 <NA> <NA> NA
2012-07-02 <NA> <NA> NA
2012-07-03 <NA> <NA> NA
2012-07-04 <NA> Ibope 22
2012-07-05 <NA> <NA> NA
2012-07-06 <NA> <NA> NA
2012-07-07 <NA> <NA> NA
2012-07-08 <NA> <NA> NA
2012-07-09 <NA> <NA> NA
2012-07-10 <NA> <NA> NA
2012-07-11 <NA> <NA> NA
2012-07-12 2012-07-09 Veritá 31.4
I'm not sure this is the best way to do it, and there is probably a package out there with exactly this functionality. The following approach may not have the best performance, but it works and should be fine for small to medium datasets. I would be cautious about applying it to very large datasets (more than a million rows or so).
fillNAByPreviousData <- function(column) {
  # First we find the positions of the entries that are NA
  navals <- which(is.na(column))
  # and the positions of the entries that are filled with data.
  filledvals <- which(!is.na(column))
  # If no two NAs followed each other, navals - 1 would give the entries
  # we need. In general we have to find, for each NA, the last filled
  # position before it. A leading NA has no such position and stays NA.
  fillup <- vapply(navals, function(x) {
    prev <- filledvals[filledvals < x]
    if (length(prev) > 0) max(prev) else NA_integer_
  }, integer(1))
  # Finally replace the NAs with the values found.
  column[navals] <- column[fillup]
  column
}
Here is an example using a test dataset:
set.seed(123)
test <- 1:20
test[floor(runif(5,1, 20))] <- NA
> test
[1] 1 2 3 4 5 NA 7 NA 9 10 11 12 13 14 NA 16 NA NA 19 20
> fillNAByPreviousData(test)
[1] 1 2 3 4 5 5 7 7 9 10 11 12 13 14 14 16 16 16 19 20
