Identifying individuals with observations across two datasets - r

I am working with R and the WGCNA package, doing an integrative analysis of the transcriptome and metabolome.
I have two data.frames, one for the transcriptome data (datExprFemale) and one for the metabolomics data (allTraits), but I am having trouble merging the two data.frames together.
> datExprFemale[1:5, 1:5]
ID gene1 gene2 gene3 gene4
F16 -0.450904880 0.90116800 -2.710879397 0.98942336
F17 -0.304889916 0.70307639 -0.245912838 -0.01089557
F18 0.001696330 0.43059153 -0.177277078 -0.24611398
F19 -0.005428231 0.32838938 0.001070509 -0.31351216
H1 0.183912553 -0.10357460 0.069589703 0.15791036
> allTraits[1:5, 1:5]
IND met1 met2 met3 met4
F15 6546 68465 56465 6548
F17 89916 7639 2838 9557
F20 6330 53 7078 11398
F1 231 938 509 351216
The individuals in allTraits have measurements in datExprFemale, but some individuals in datExprFemale do not occur in allTraits.
Here is what I have tried to merge the two data.frames together:
# First get a vector containing the row names (individual's ID) in datExprFemale
IND=rownames(datExprFemale)
# Get the rows in which two variables have the same individuals
traitRows = match(allTraits$IND, IND)
datTraits = allTraits[traitRows, -1]
This gives me the following:
met1 met2 met3 met4
11 0.0009 0.0559 7.1224 3.3894
12 0.0006 0.0370 10.5776 14.4437
15 0.0011 0.0295 5.7941 19.0225
16 0.0010 0.0531 6.1010 4.7698
17 0.0016 0.0462 7.7819 7.8796
19 0.0011 0.0192 12.7126 9.2564
20 0.0007 0.0502 9.4147 15.3579
21 0.0025 0.0455 8.4129 17.7273
NA NA NA NA NA
NA.1 NA NA NA NA
NA.2 NA NA NA NA
NA.3 NA NA NA NA
NA.4 NA NA NA NA
3 0.0017 0.0375 8.8503 8.7581
7 0.0006 0.0156 7.9272 4.9887
8 0.0011 0.0154 8.4716 8.6515
9 0.0010 0.0306 9.1220 3.5843
As you can see, there are some NA values, and I'm not sure why.
Now, when I try to assign the ID of each individual to the corresponding row using the following code:
rownames(datTraits) = allTraits[traitRows, 1]
R gives this error:
Error in `row.names<-.data.frame`(`*tmp*`, value = value) :
duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names':
I'm not sure what I'm doing wrong.

There are a few problems in your code:
In the format you've presented, datExprFemale does not have rownames (ID is an ordinary column), so the match won't work at all.
match tells you which rows in datExprFemale the individuals in allTraits correspond to, not which rows you need to extract from allTraits.
Here's the approach I would take:
# First make sure `allTraits` and `datExprFemale` actually have the right rownames
rownames(datExprFemale) = datExprFemale$ID
rownames(allTraits) = allTraits$IND
# Now get the individuals who have both transcriptomic and metabolomic
# measurements
has.both = intersect(rownames(allTraits), rownames(datExprFemale))
# Now pull out the subset of allTraits you want:
allTraits[has.both,]
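Alternatively, you can sidestep the rownames bookkeeping entirely with base R's merge(), which by default keeps only individuals present in both tables. A minimal sketch with toy data shaped like the example (the values are made up):

```r
# Toy versions of the two tables (IDs as an ordinary column)
datExprFemale <- data.frame(ID    = c("F16", "F17", "F18"),
                            gene1 = c(-0.45, -0.30, 0.002))
allTraits     <- data.frame(IND  = c("F15", "F17", "F18"),
                            met1 = c(6546, 89916, 6330))

# Inner join on the individual ID: only F17 and F18 occur in both
merged <- merge(datExprFemale, allTraits, by.x = "ID", by.y = "IND")
merged
```

From here, rownames(merged) <- merged$ID works without duplicate-rowname errors, because merge() has already dropped the unmatched individuals.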

Thanks for your reply. In fact, datTraits in the code must look like this:
Insulin_ug_l Glucose_Insulin Leptin_pg_ml Adiponectin Aortic.lesions
F2_3 944 0.42055085 15148.76 14.339 296250
F2_14 632 0.67088608 6188.74 15.439 486313
F2_15 3326 0.16746843 18400.26 11.124 180750
F2_19 426 0.89671362 8438.70 16.842 113000
F2_20 2906 0.15691672 41801.54 13.498 166750
F2_23 920 0.58804348 24133.54 14.511 234000
F2_24 1895 0.24538259 52360.00 13.813 267500
F2_26 7293 0.09090909 126880.00 14.118 198000
F2_37 653 0.65849923 17100.00 12.470 121000
F2_42 1364 0.35703812 99220.00 14.531 110000
in which rows are individuals and columns are metabolites. This variable contains the individuals who appear in both the transcriptomics and metabolomics files.
As for the code, I copied it from the WGCNA tutorial.
Thanks for any suggestion,
Behzad

Related

R count number of NA values for each row of a CSV

I'm trying to count the number of NA values for each row of a csv. How would I create a data frame containing the row name and count of NA values for that row?
Dataset -
# KJIS10 TYDA40 CSDF32 ASDF67
#c1 52.12 NA NA 67.23
#c2 NA 60.3 23.78 73.23
#c3 69.32 123.21 18.46 95.42
#c4 78.23 NA 94.36 107.43
#c5 89.15 47.98 36.18 38.54
#c6 90.45 NA 78.12 32.21
#c7 NA 67.2 NA NA
Here's what I've tried so far; I'm able to get the correct counts, but they're not summarised into a data frame.
#Created dataframe from CSV
df = as.data.frame(import)
missing = rowSums(is.na(df))
Data frame I'm trying to create -
#'c1 2
#'c2 1
#'c3 0
#'c4 1
#'c5 0
#'c6 1
#'c7 3
try this:
result <- data.frame("rowname"=rownames(df), "missing"=rowSums(is.na(df)))
result
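To see it end to end, here is the sample data above rebuilt as a data frame, together with the resulting summary:

```r
# Rebuild the example data with named rows c1..c7
df <- data.frame(KJIS10 = c(52.12, NA, 69.32, 78.23, 89.15, 90.45, NA),
                 TYDA40 = c(NA, 60.3, 123.21, NA, 47.98, NA, 67.2),
                 CSDF32 = c(NA, 23.78, 18.46, 94.36, 36.18, 78.12, NA),
                 ASDF67 = c(67.23, 73.23, 95.42, 107.43, 38.54, 32.21, NA),
                 row.names = paste0("c", 1:7))

# One NA count per row, keyed by the row name
result <- data.frame(rowname = rownames(df), missing = rowSums(is.na(df)))
result
```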

R - enter basic formula

I am new to R and struggling to understand its quirks. I'm trying to do something which should be really simple, but is turning out to be apparently very complicated.
I am used to Excel, SQL and Minitab, where you can enter a value in one column which includes references to other columns and parameters. However, R doesn't seem to be allowing me to do this.
I have a table with (currently) four columns:
Date Pallets Lt Tt
1 28/12/2011 491 NA NA
2 29/12/2011 385 NA 0.787890411
3 30/12/2011 662 NA NA
4 31/12/2011 28 NA NA
5 01/01/2012 46 NA NA
6 02/01/2012 403 NA NA
7 03/01/2012 282 NA NA
8 04/01/2012 315 NA NA
9 05/01/2012 327 NA NA
10 06/01/2012 458 NA NA
and have a parameter "beta", with a value which I have assigned as 0.0002.
All I want to do is assign a formula to rows 3:10 which is:
beta*(Pallets[t] - Pallets[t-1]) + (1-beta)*Tt[t-1]
I thought that the appropriate code might be:
Table[3:10,4]<-beta*(Table[3:10,"Pallets"]-Table[2:9,"Pallets"])+(1-beta)*Table[2:9,"Tt"]
However, this doesn't work. The first time I enter this formula, it generates:
Date Pallets Lt Tt
1 28/12/2011 491 NA NA
2 29/12/2011 385 NA 0.7878904
3 30/12/2011 662 NA 0.8431328
4 31/12/2011 28 NA NA
5 01/01/2012 46 NA NA
6 02/01/2012 403 NA NA
7 03/01/2012 282 NA NA
8 04/01/2012 315 NA NA
9 05/01/2012 327 NA NA
10 06/01/2012 458 NA NA
So it's generated the correct answer for the second item in the series, but not for any of the subsequent values.
It seems as though R doesn't automatically update each row, and the relationship to each other row, when you enter a formula, as Excel does. Having said that, Excel actually would require me to enter the formula in cell [4,Tt], and then drag this down to all of the other cells. Perhaps R is the same, and there is an equivalent to "dragging down" which I need to do?
Finally, I also noticed that when I change the value of the beta parameter, through, e.g. beta<-0.5, and then print the Table values again, they are unchanged - so the table hasn't updated even though I have changed the value of the parameter.
Appreciate that these are basic questions, but I am very new to R.
In R, the computations are not made "cell by cell", but are vectorised - in your example, R takes the vectors Table[3:10,"Pallets"], Table[2:9,"Pallets"] and Table[2:9,"Tt"] as they are at the moment, computes the resulting vector, and finally assigns it to Table[3:10,4].
If you want to make some computations "cell by cell", you have to use a for loop:
beta <- 0.5
df <- data.frame(v1 = 1:12, v2 = 0)
for (i in 3:10) {
df[i, "v2"] <- beta * (df[i, "v1"] - df[i-1, "v1"]) + (1 - beta) * df[i-1, "v2"]
}
df
v1 v2
1 1 0.0000000
2 2 0.0000000
3 3 0.5000000
4 4 0.7500000
5 5 0.8750000
6 6 0.9375000
7 7 0.9687500
8 8 0.9843750
9 9 0.9921875
10 10 0.9960938
11 11 0.0000000
12 12 0.0000000
As it comes to your second question, R will never update any values on its own (imagine having set manual calculation in Excel). So you need to repeat the computations after changing beta.
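As a middle ground, this particular recurrence can also be computed without an explicit loop by folding over the increments with Reduce(..., accumulate = TRUE). A sketch on the same toy data:

```r
beta <- 0.5
df <- data.frame(v1 = 1:12, v2 = 0)

# Increment term beta * (v1[i] - v1[i-1]) for i = 3..10
inc <- beta * diff(df$v1)[2:9]

# Fold left to right, carrying the previous v2 value along
acc <- Reduce(function(prev, d) d + (1 - beta) * prev, inc,
              init = df$v2[2], accumulate = TRUE)
df$v2[3:10] <- acc[-1]  # drop the seed value
df$v2
```

This produces the same values as the for loop above; whether it is clearer is a matter of taste.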
Although it's generally bad design, you can iterate over the rows in a loop:
Table$temp <- c(0, diff(Table$Pallets, 1))
prevTt = 0
for (i in 1:10)
{
Table$Tt[i] = Table$temp[i] * beta + (1-beta)*prevTt
prevTt = Table$Tt[i]
}
Table$temp <- NULL

R: Selection for x,y coordinates with conditions

I have some difficulties to solve a problem concerning the selection of values in a data frame. Here is the thing:
- I have a data frame containing these variables: x-coordinates, y-coordinates, diameter, G value, H value, quality value, ecological value. Each line corresponds to one individual (trees, in this exercise).
I need to find the individual with the best quality value; this I can do.
But then I have to find the second tree with a good quality value, which has to be within the next 10 meters of the reference tree (the one with the best quality value).
And this selection has to be repeated from every tree selected, each time 10 meters further!
This should give me a selection of x-y coordinates that are separated by 10 meters and represent good quality values.
Now, here is what I tried:
> kk<- function(x, y)
+ {
+ coordx<-data$x.Koordinate[data$Q==24] #I have looked before for the best quality value of the sample, which is 24
+ coordy<-data$y.Koordinate[data$Q==24]
+ x <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate,0) # I chose not to keep quality values of 15 or less
+ y<-ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate,0) # -11 meters from the reference coordinates; the next tree selected has to be in between
+ return(c(x,y))
+ }
> kk(data$x.Koordinate, data$y.Koordinate)
[1] 0 0 0 0 0 205550 205550 0 205600 205600 0 0 0 0 0 0 0
[18] 604100 0 604150 604100 0
The problem here is that we cannot clearly see the difference between the x coordinates and the y coordinates.
I tried this:
> kk<- function(x, y)
+ {
+ coordx<-data$x.Koordinate[data$Q==24]
+ coordy<-data$y.Koordinate[data$Q==24]
+ x <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate," ")
+ y<-ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate," ")
+ return(list(x,y))
+ }
> kk(data$x.Koordinate, data$y.Koordinate)
[[1]]
[1] " " " " " " " " " " "205550" "205550" " " "205600" "205600" " "
[[2]]
[1] " " " " " " " " " " " " "604100" " " "604150" "604100" " "
>
Where we can see better the two levels related to the x and y coordinates.
The first question is simple: Is it possible for this function to return the values in a form like x,y or x y ? (without any 0, or « », or space) Or should I use another R function to obtain this result?
The second question is complex: How can I say to R to repeat this function from the coordinates he finds in this first attempt, and for the whole data?
Thank you very much for your answer. It helps me a lot! The first part of my problem is solved; the second seems to have a bug somewhere... And I don't clearly see where the function tells R to go 10 meters further each time (in fact, every 50 meters, according to my data, see below)... But thank you anyway, it's a good start, and I will continue my research on this problem :)
PS: I understand it is difficult without the data. Unfortunately, I cannot show them on the net. However, I can show you a part of it:
ID Bezeichnung x.Koordinate y.Koordinate Q N hdom V Mittelstamm Fi Ta Foe Lae ueN Bu Es Ei Ah ueL Struktur
1 10,809 62 205450 603950 8 1067 21 64 10 NA NA NA NA NA 100 NA NA NA NA NA
2 10,810 63 205450 604000 16 1333 22 128 12 NA NA NA NA NA 75 NA NA 25 NA NA
3 10,811 56 205500 604050 20 800 22 160 18 NA NA NA NA NA 60 NA NA NA 40 NA
4 10,812 55 205500 604000 12 1033 20 97 12 33 NA NA NA NA 67 NA NA NA NA NA
5 10,813 54 205500 603950 20 500 56 0 23 NA NA NA NA NA 100 NA NA NA NA NA
6 10,814 46 205550 604050 16 567 32 215 19 75 NA NA NA NA 25 NA NA NA NA NA
7 10,815 47 205550 604100 16 233 26 174 30 NA 25 NA NA NA 50 NA NA NA 25 NA
8 10,816 48 205550 604150 0 1167 16 0 0 NA NA NA NA NA NA NA NA NA NA NA
9 10,817 43 205600 604150 24 633 33 366 22 83 17 NA NA NA NA NA NA NA NA NA
10 10,818 42 205600 604100 16 1500 33 282 12 NA NA NA NA NA NA NA NA 75 25 NA
Here is the result with your answer for the second problem:
> Arbres<-kk(x.Koordinate, y.Koordinate, data=data)
> for (i in 1:length(Arbres[,1])
+ kk(Arbres(i,1),Arbres[i,2])
Error: unexpected symbol in:
"for (i in 1:length(Arbres[,1])
kk"
Sorry, I just renamed it "Arbres".
Thanks again,
C.
It's a bit hard for me to try this out without your dataset, or a small example of it, but I think the following should work for your first question.
The first time you use the function, you pass the x and y coordinates of the tree with quality value 24 as x and y:
> kk<- function(x, y)
+ {
+ coordx<-x
+ coordy<-y
+ x1 <- ifelse(data$x.Koordinate>coordx-11 & data$Q>15,data$x.Koordinate,NA)
+ y1 <- ifelse(data$y.Koordinate>coordy-11 & data$Q>15,data$y.Koordinate,NA)
+ return(matrix(c(x1,y1),nrow=length(x1), ncol=2, dimnames=list(NULL, c("x","y"))))
+ }
That should give you a matrix with two columns corresponding to the x and y coordinates and a NA if the condition is not met.
The second question is more difficult because as your output already showed there are multiple trees that meet the criteria you've set. If you want all of these checked again you can use the output of your function in a loop. Something like this:
Tree1_friends<-kk(data$x.Koordinate[data$Q==24], data$y.Koordinate[data$Q==24])
for (i in 1:length(Tree1_friends[,1]))
print(kk(Tree1_friends[i,1],Tree1_friends[i,2]))
Note that this code only prints the results, but with some clever assignment strategy you can probably save them as well.
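One more note: since kk() returns NA rows for trees that fail the condition, complete.cases() is a handy way to drop them before looping over the matrix. On a toy matrix shaped like kk()'s output:

```r
# Toy output: one row per tree, NA rows for trees that failed the filter
m <- matrix(c(205550, 604100,
              NA,     NA,
              205600, 604150),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("x", "y")))

# TRUE for rows with no NA in either coordinate
keep <- complete.cases(m)
m[keep, , drop = FALSE]
```

Looping over which(keep) instead of 1:nrow(m) then skips the NA rows automatically.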

dplyr::left_join produce NA values for new joined columns

I have two tables I wish to left_join using the dplyr package. The issue is that it produces NA values for all new columns (the ones I'm after).
As you can see below, the left_join produces NA values for the new columns Incep.Price and DayCounter. Why does this happen, and how can it be resolved?
Update: Thanks to @akrun, using left_join(Avanza.XML, checkpoint, by = c('Firm' = 'Firm')) solves the issue and the columns are joined correctly.
However, the warning message is still the same; could someone explain this behaviour? Why must one explicitly specify the join columns in this case to avoid producing NA values?
> head(Avanza.XML)
Firm Gain.Month.1 Last.Price Vol.Month.1
1 Stockwik Förvaltning 131.25 0.074 131264420
2 Novestra 37.14 7.200 605330
3 Bactiguard Holding 29.55 14.250 2815572
4 MSC Group B 20.87 3.070 671855
5 NeuroVive Pharmaceutical 18.07 9.800 3280944
6 Shelton Petroleum B 16.21 3.800 2135798
> head(checkpoint)
Firm Gain.Month.1 Last.Price Vol.Month.1 Incep.Price DayCounter
1 Stockwik Förvaltning 87.50 0.06 91270090 0.032000 2016-01-25
2 Novestra 38.10 7.25 604683 5.249819 2016-01-25
3 Bactiguard Holding 29.09 14.20 2784161 11.000077 2016-01-25
4 MSC Group B 27.56 3.24 657699 2.539981 2016-01-25
5 Shelton Petroleum B 19.27 3.90 1985305 3.269892 2016-01-25
6 NeuroVive Pharmaceutical 16.87 9.70 3220303 8.299820 2016-01-25
> head(left_join(Avanza.XML, checkpoint))
Joining by: c("Firm", "Gain.Month.1", "Last.Price", "Vol.Month.1")
Firm Gain.Month.1 Last.Price Vol.Month.1 Incep.Price DayCounter
1 Stockwik Förvaltning 131.25 0.074 131264420 NA <NA>
2 Novestra 37.14 7.200 605330 NA <NA>
3 Bactiguard Holding 29.55 14.250 2815572 NA <NA>
4 MSC Group B 20.87 3.070 671855 NA <NA>
5 NeuroVive Pharmaceutical 18.07 9.800 3280944 NA <NA>
6 Shelton Petroleum B 16.21 3.800 2135798 NA <NA>
Warning message:
In left_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
There are two problems.
Not specifying the by argument in left_join: in this case, by default, all common columns are used as the variables to join by. If we look at those columns - "Gain.Month.1", "Last.Price", "Vol.Month.1" - they are all numeric and do not have matching values in both datasets. So it is better to join by "Firm" alone:
left_join(Avanza.XML, checkpoint, by = "Firm")
The "Firm" column is a factor: we get a warning when the levels of the factor column differ between the datasets (if it is a variable we join by). To remove the warning, we can either convert the "Firm" column in both datasets to character class
Avanza.XML$Firm <- as.character(Avanza.XML$Firm)
checkpoint$Firm <- as.character(checkpoint$Firm)
Or if we still want to keep the columns as factor, then change the levels in the "Firm" to include all the levels in both the datasets
lvls <- sort(unique(c(levels(Avanza.XML$Firm),
levels(checkpoint$Firm))))
Avanza.XML$Firm <- factor(Avanza.XML$Firm, levels=lvls)
checkpoint$Firm <- factor(checkpoint$Firm, levels=lvls)
and then do the left_join.
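The same default-join behaviour exists in base R's merge(), which makes the pitfall easy to reproduce without dplyr (toy data):

```r
x <- data.frame(Firm = c("A", "B"), Gain = c(1.1, 2.2))
y <- data.frame(Firm = c("A", "B"), Gain = c(1.0, 2.2), Price = c(10, 20))

# Default: join on ALL common columns (Firm and Gain).
# Firm A's Gain differs (1.1 vs 1.0), so its Price comes back NA.
merge(x, y, all.x = TRUE)

# Join on the key alone and both rows match.
merge(x, y, by = "Firm", all.x = TRUE)
```

The numeric columns act as part of the key in the first call, and any tiny difference in their values silently breaks the match, which is exactly what happened with Gain.Month.1 and friends above.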

Lagging Forward in plm

This is a very simple question, but I haven't been able to find a definitive answer, so I thought I would ask it. I use the plm package for dealing with panel data. I am attempting to use the lag function to lag a variable FORWARD in time (the default is to retrieve the value from the previous period, and I want the value from the NEXT). I found a number of old articles/questions (circa 2009) suggesting that this is possible by using k=-1 as an argument. However, when I attempt this, I get an error.
Sample code:
library(plm)
df<-as.data.frame(matrix(c(1,1,1,2,2,3,20101231,20111231,20121231,20111231,20121231,20121231,50,60,70,120,130,210),nrow=6,ncol=3))
names(df)<-c("individual","date","data")
df$date<-as.Date(as.character(df$date),format="%Y%m%d")
df.plm<-pdata.frame(df,index=c("individual","date"))
Lagging:
lag(df.plm$data,0)
##returns
1-2010-12-31 1-2011-12-31 1-2012-12-31 2-2011-12-31 2-2012-12-31 3-2012-12-31
50 60 70 120 130 210
lag(df.plm$data,1)
##returns
1-2010-12-31 1-2011-12-31 1-2012-12-31 2-2011-12-31 2-2012-12-31 3-2012-12-31
NA 50 60 NA 120 NA
lag(df.plm$data,-1)
##returns
Error in rep(1, ak) : invalid 'times' argument
I've also read that plm.data has replaced pdata.frame for some applications in plm. However, plm.data doesn't seem to work with the lag function at all:
df.plm<-plm.data(df,indexes=c("individual","date"))
lag(df.plm$data,1)
##returns
[1] 50 60 70 120 130 210
attr(,"tsp")
[1] 0 5 1
I would appreciate any help. If anyone has another suggestion for a package to use for lagging, I'm all ears. However, I do love plm because it automagically deals with lagging across multiple individuals and skips gaps in the time series.
EDIT 2: Lagging forward (= leading values) is implemented in plm CRAN releases >= 1.6-4.
The functions are lead() and lag() (the latter with a negative integer for leading values).
Take care with other attached packages that use the same function names. To be sure, you can refer to the function by its full namespace, e.g., plm::lead.
Examples from ?plm::lead:
# First, create a pdata.frame
data("EmplUK", package = "plm")
Em <- pdata.frame(EmplUK)
# Then extract a series, which becomes additionally a pseries
z <- Em$output
class(z)
# compute negative lags (= leading values)
lag(z, -1)
lead(z, 1) # same as line above
identical(lead(z, 1), lag(z, -1)) # TRUE
The collapse package on CRAN has a C++-based function flag, and associated lag/lead operators L and F. It supports continuous sequences of lags/leads (positive and negative n values), and plm pseries and pdata.frame classes. Performance: about 100x faster than plm and 10x faster than data.table (the fastest in R at the time of writing). Example:
library(collapse)
pwlddev <- plm::pdata.frame(wlddev, index = c("iso3c", "year"))
head(flag(pwlddev$LIFEEX, -1:1)) # A sequence of lags and leads
F1 -- L1
ABW-1960 66.074 65.662 NA
ABW-1961 66.444 66.074 65.662
ABW-1962 66.787 66.444 66.074
ABW-1963 67.113 66.787 66.444
ABW-1964 67.435 67.113 66.787
ABW-1965 67.762 67.435 67.113
head(L(pwlddev$LIFEEX, -1:1)) # Same as above
head(L(pwlddev, -1:1, cols = 9:12)) # Computing on columns 9 through 12
iso3c year F1.PCGDP PCGDP L1.PCGDP F1.LIFEEX LIFEEX L1.LIFEEX F1.GINI GINI L1.GINI
ABW-1960 ABW 1960 NA NA NA 66.074 65.662 NA NA NA NA
ABW-1961 ABW 1961 NA NA NA 66.444 66.074 65.662 NA NA NA
ABW-1962 ABW 1962 NA NA NA 66.787 66.444 66.074 NA NA NA
ABW-1963 ABW 1963 NA NA NA 67.113 66.787 66.444 NA NA NA
ABW-1964 ABW 1964 NA NA NA 67.435 67.113 66.787 NA NA NA
ABW-1965 ABW 1965 NA NA NA 67.762 67.435 67.113 NA NA NA
F1.ODA ODA L1.ODA
ABW-1960 NA NA NA
ABW-1961 NA NA NA
ABW-1962 NA NA NA
ABW-1963 NA NA NA
ABW-1964 NA NA NA
ABW-1965 NA NA NA
library(microbenchmark)
library(data.table)
microbenchmark(plm_class = flag(pwlddev),
ad_hoc = flag(wlddev, g = wlddev$iso3c, t = wlddev$year),
data.table = qDT(wlddev)[, shift(.SD), by = iso3c])
Unit: microseconds
expr min lq mean median uq max neval cld
plm_class 462.313 512.5145 1044.839 551.562 637.6875 15913.17 100 a
ad_hoc 443.124 519.6550 1127.363 559.817 701.0545 34174.05 100 a
data.table 7477.316 8070.3785 10126.471 8682.184 10397.1115 33575.18 100 b
I had this same problem and couldn't find a good solution in plm or any other package. ddply was tempting (e.g. s5 = ddply(df, .(country,year), transform, lag=lag(df[, "value-to-lag"], lag=3))), but I couldn't get the NAs in my lagged column to line up properly for lags other than one.
I wrote a brute force solution that iterates over the dataframe row-by-row and populates the lagged column with the appropriate value. It's horrendously slow (437.33s for my 13000x130 dataframe vs. 0.012s for turning it into a pdata.frame and using lag) but it got the job done for me. I thought I would share it here because I couldn't find much information elsewhere on the internet.
In the function below:
df is your dataframe. The function returns df with a new column containing the forward values.
group is the column name of the grouping variable for your panel data. For example, I had longitudinal data on multiple countries, and I used "Country.Name" here.
x is the column you want to generate lagged values from, e.g. "GDP"
forwardx is the (new) column that will contain the forward lags, e.g. "GDP.next.year".
lag is the number of periods into the future. For example, if your data were taken in annual intervals, using lag=5 would set forwardx to the value of x five years later.
add_forward_lag <- function(df, group, x, forwardx, lag) {
for (i in 1:(nrow(df)-lag)) {
if (as.character(df[i, group]) == as.character(df[i+lag, group])) {
# put forward observation in forwardx
df[i, forwardx] <- df[i+lag, x]
}
else {
# end of group, no forward observation
df[i, forwardx] <- NA
}
}
# last elem(s) in forwardx are NA
for (j in ((nrow(df)-lag+1):nrow(df))) {
df[j, forwardx] <- NA
}
return(df)
}
See sample output using the built-in DNase dataset. The lag doesn't make sense in the context of this dataset, but it lets you see what the columns do.
data(DNase)  # DNase is a built-in dataset, not a package, so require() is not needed
add_forward_lag(DNase, "Run", "density", "lagged_density", 3)
Grouped Data: density ~ conc | Run
Run conc density lagged_density
1 1 0.04882812 0.017 0.124
2 1 0.04882812 0.018 0.206
3 1 0.19531250 0.121 0.215
4 1 0.19531250 0.124 0.377
5 1 0.39062500 0.206 0.374
6 1 0.39062500 0.215 0.614
7 1 0.78125000 0.377 0.609
8 1 0.78125000 0.374 1.019
9 1 1.56250000 0.614 1.001
10 1 1.56250000 0.609 1.334
11 1 3.12500000 1.019 1.364
12 1 3.12500000 1.001 1.730
13 1 6.25000000 1.334 1.710
14 1 6.25000000 1.364 NA
15 1 12.50000000 1.730 NA
16 1 12.50000000 1.710 NA
17 2 0.04882812 0.045 0.123
18 2 0.04882812 0.050 0.225
19 2 0.19531250 0.137 0.207
Given how long this takes, you may want to use a different approach: backwards-lag all of your other variables.
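For what it's worth, a much faster grouped forward shift can be written in a few lines of base R with ave(); this is a sketch (the helper name lead_by_group is my own), shown on the data from the question:

```r
# Shift x forward by n positions within each group g, padding group ends with NA
lead_by_group <- function(x, g, n = 1) {
  ave(x, g, FUN = function(v) {
    len <- length(v)
    if (n >= len) rep(NA_real_, len)
    else c(v[(n + 1):len], rep(NA_real_, n))
  })
}

individual <- c(1, 1, 1, 2, 2, 3)
data       <- c(50, 60, 70, 120, 130, 210)
lead_by_group(data, individual)
```

Because ave() applies the shift per group in vectorised chunks rather than row by row, it avoids the O(rows) data-frame indexing that makes the loop above so slow.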
