copy data frame to the other matching row number - r

I have a dataframe A like below.
Note that the first column holds the row names, which are in arbitrary order.
ID
5 10
3 10
1 10
Then I have another 5 x 1 data frame B filled with NAs. I am trying to copy A into B by matching the row names of A. I want to get a data frame like the one below.
ID
1 10
2 NA
3 10
4 NA
5 10

What you are trying to do is potentially dangerous, because nothing guarantees that a row name identifies the same record in both data frames. If you are 100% sure that the row names are identifiers that match between the two data frames, here's the code.
library(dplyr)  # left_join() comes from dplyr; library(tidyverse) also loads it
# Generate a data frame that looks like yours (you don't need this)
df <- data.frame(ID=c(10, NA, 10, NA, 10))
# Store the row names in a new column of df
df$names <- row.names(df)
# Drop the NA rows; this is what your data looks like
df <- df[complete.cases(df),]
# Make a second df
df2 <- data.frame(names=as.character(1:20))
# Join by names (consider whether another shared column would be a safer key)
left_join(df2, df, by="names")
This will produce
names ID
1 1 10
2 2 NA
3 3 10
4 4 NA
5 5 10
6 6 NA
7 7 NA
8 8 NA
9 9 NA
10 10 NA
11 11 NA
12 12 NA
13 13 NA
14 14 NA
15 15 NA
16 16 NA
17 17 NA
18 18 NA
19 19 NA
20 20 NA
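For comparison, the same copy can be sketched in base R by indexing B with A's row names. The frames A and B below are reconstructions of the ones described in the question (an assumption, since the question shows only their printed form). The check before the assignment is there because assigning to a row name that B does not have would append a new row instead of failing.

```r
# Reconstruction of the question's data (assumed): A has row names 5, 3, 1
A <- data.frame(ID = c(10, 10, 10), row.names = c("5", "3", "1"))
# B is a 5 x 1 frame of NAs with row names 1..5
B <- data.frame(ID = rep(NA_real_, 5), row.names = as.character(1:5))

# Guard: every row name of A must already exist in B
stopifnot(all(rownames(A) %in% rownames(B)))

# Character indexing matches on row names, so the order of A does not matter
B[rownames(A), "ID"] <- A$ID
B
```

Rows 1, 3 and 5 of B receive the value from A; rows 2 and 4 stay NA.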

Related

How to combine/concatenate two dataframes one after the other but not merging common columns in R?

Suppose there are two dataframes as follows with same column names and I want to combine/concatenate one after the other without merging the common columns. There is a way of assigning it columnwise like df1[3]<-df2[1] but would like to know if there's some other way.
df1<-data.frame(A=c(1:10), B=c(2:5, rep(NA,6)))
df2<-data.frame(A=c(12:20), B=c(32:40))
Expected Output:
A B A.1 B.1
1 2 12 32
2 3 13 33
3 4 14 34
4 5 15 35
5 NA 16 36
6 NA 17 37
7 NA 18 38
8 NA 19 39
9 NA 20 40
10 NA NA NA
I tend to work with multiple frames like this as a list of frames. Try this:
LOF <- list(df1, df2)
maxrows <- max(sapply(LOF, nrow))
out <- do.call(cbind, lapply(LOF, function(z) z[seq_len(maxrows),]))
names(out) <- make.names(names(out), unique = TRUE)
out
# A B A.1 B.1
# 1 1 2 12 32
# 2 2 3 13 33
# 3 3 4 14 34
# 4 4 5 15 35
# 5 5 NA 16 36
# 6 6 NA 17 37
# 7 7 NA 18 38
# 8 8 NA 19 39
# 9 9 NA 20 40
# 10 10 NA NA NA
One advantage of this is that it allows you to work with an arbitrary number of frames, not just two.
One base R way could be
setNames(Reduce(cbind.data.frame,
Map(`length<-`, c(df1, df2), max(nrow(df1), nrow(df2)))),
paste0(names(df1), rep(c('', '.1'), each=2)))
# A B A.1 B.1
# 1 1 2 12 32
# 2 2 3 13 33
# 3 3 4 14 34
# 4 4 5 15 35
# 5 5 NA 16 36
# 6 6 NA 17 37
# 7 7 NA 18 38
# 8 8 NA 19 39
# 9 9 NA 20 40
# 10 10 NA NA NA
Another option is to use the merge function. The documentation can be a bit cryptic, so here is a short explanation of the arguments:
by = 0 -- per the documentation, "the name "row.names" or the number 0 specifies the row names"
all = TRUE -- keeps all rows from both data frames, including unmatched ones
suffixes -- specifies how the duplicated column names are distinguished
sort = FALSE -- does not sort the result on the by columns
merge(df1, df2, by = 0, all = TRUE, suffixes = c('', '.1'), sort = FALSE)
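One detail worth knowing about merging on row names: the result carries an extra Row.names column, and the row names are matched as character strings. A minimal sketch of tidying that up, using the df1 and df2 from the question (the explicit numeric reordering is a defensive choice of this sketch, not part of the answer above):

```r
df1 <- data.frame(A = 1:10, B = c(2:5, rep(NA, 6)))
df2 <- data.frame(A = 12:20, B = 32:40)

# Merge on row names; unmatched rows are kept because all = TRUE
m <- merge(df1, df2, by = 0, all = TRUE, suffixes = c("", ".1"))

# Row names were matched as character, so restore numeric order explicitly
m <- m[order(as.integer(as.character(m$Row.names))), ]

# Drop the helper column that merge() added and reset the row names
m$Row.names <- NULL
rownames(m) <- NULL
m
```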
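One way would be the following. Note that rbind() appends the NA vector as a single row, so this works here because df1 has exactly one more row than df2, and the result keeps the duplicated column names A and B.
cbind(
df1,
rbind(
df2,
rep(NA, nrow(df1) - nrow(df2))
)
)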

R merge two data.frame by id and sub-id while changing column names?

I have two dataframes of this format.
df1:
id x y
1 2 3
2 4 5
3 6 7
4 8 9
5 1 1
df2:
id id2 v v2
1 t 11 21
1 b 12 22
2 t 13 23
2 b 14 24
3 t 15 25
3 b 16 26
4 b 17 27
So an id from the main data frame df1 appears in df2 at most twice, sometimes once, and sometimes not at all. The expected result would be:
df_merged:
id x y v.t v2.t v.b v2.b
1 2 3 11 21 12 22
2 4 5 13 23 14 24
3 6 7 15 25 16 26
4 8 9 NA NA 17 27
5 1 1 NA NA NA NA
I have used merge but due to the fact that id2 in df2 doesn't match, I get two instances of id in df_merged like so:
id x y v v2
1 ...
1 ...
Thanks in advance!
We can start by reshaping df2 into the right format and then do a normal join.
library(dplyr)
library(tidyr)
df2 %>% gather(key,val,-id,-id2) %>% #Convert v and v2 from wide to long format
mutate(new_key=paste0(key,'.',id2)) %>% #Combine key and id2 into new_key
select(-id2,-key) %>% #Drop the unnecessary columns
spread(new_key,val) %>% #Convert back to wide format, one row per id
right_join(df1) %>% #Right join with df1 on id to keep all rows of df1
select(id,x,y,v.t,v2.t,v.b,v2.b) #Rearrange the columns
Joining, by = "id"
id x y v.t v2.t v.b v2.b
1 1 2 3 11 21 12 22
2 2 4 5 13 23 14 24
3 3 6 7 15 25 16 26
4 4 8 9 NA NA 17 27
5 5 1 1 NA NA NA NA
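The same reshape-then-join idea can also be sketched in base R with reshape() and merge(), which avoids the extra packages. The data frames below are rebuilt from the tables in the question; treating id2 as the "time" variable of reshape() is a choice made for this sketch.

```r
df1 <- data.frame(id = 1:5, x = c(2, 4, 6, 8, 1), y = c(3, 5, 7, 9, 1))
df2 <- data.frame(id  = c(1L, 1L, 2L, 2L, 3L, 3L, 4L),
                  id2 = c("t", "b", "t", "b", "t", "b", "b"),
                  v   = 11:17,
                  v2  = 21:27)

# One row per id; v and v2 are spread into v.t, v2.t, v.b, v2.b
wide <- reshape(df2, idvar = "id", timevar = "id2", direction = "wide")

# all.x = TRUE keeps ids from df1 that never occur in df2 (id 5 here)
res <- merge(df1, wide, by = "id", all.x = TRUE)
res
```

Ids with only one id2 value (id 4) get NA in the columns of the missing value, and ids absent from df2 get NA in all four new columns.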
You can solve this just using merge. Split df2 based on whether id2 equals b or t. Merge these two new objects with df1, and finally merge them together. The code includes one additional step to also include data found in df1 but not df2.
dfb <- merge(df1, df2[df2$id2=='b',], by='id')
dft <- merge(df1, df2[df2$id2=='t',], by='id')
dfRest <- df1[!df1$id %in% df2$id,]
# In the result, v.x/v2.x come from the 'b' rows and v.y/v2.y from the 't' rows
dfAll <- merge(dfb[,c('id','x','y','v','v2')], dft[,c('id','v','v2')], by='id', all.x=TRUE)
merge(dfAll, dfRest, all.x=TRUE, all.y=TRUE)
id x y v.x v2.x v.y v2.y
1 1 2 3 12 22 11 21
2 2 4 5 14 24 13 23
3 3 6 7 16 26 15 25
4 4 8 9 17 27 NA NA
5 5 1 1 NA NA NA NA

r - Extract subsequences with specific time increments

I have a data frame df. It has several columns, two of them are dates and serial_day, corresponding to the date an observation was taken and MATLAB's serial day. I would like to restrict my time series such that the increment (in days) between two consecutive observations is 3 or 4 and separate such blocks by a NA row.
It is known that consecutive daily observations never occur and the case of 2 day separation followed by 2 day separation is rare, so it can be ignored.
In the example, increment is shown for convenience, but it is easily generated using the diff function. So, if the data frame is
serial_day increment
1 4 NA
2 7 3
3 10 3
4 12 2
5 17 5
6 19 2
7 22 3
8 25 3
9 29 4
10 34 5
I would hope to get a new data frame as:
serial_day increment
1 4 NA
2 7 3
3 10 3
4 NA NA ## Entire row of NAs
5 19 NA
6 22 3
7 25 3
8 29 4
9 NA NA ## Entire row of NAs
I can't figure out a way to do this without looping, which is a bad idea in R.
First you check in which rows the increment is not equal to 3 or 4. Then you'd replace these rows with a row of NAs:
inds <- which( df$increment > 4 | df$increment < 3 )
df[inds, ] <- rep(NA, ncol(df))
# serial_day increment
# 1 4 NA
# 2 7 3
# 3 10 3
# 4 NA NA
# 5 NA NA
# 6 NA NA
# 7 22 3
# 8 25 3
# 9 29 4
# 10 NA NA
This may result in multiple consecutive rows of NAs. In order to reduce these consecutive NA-rows to a single NA-row, you'd check where the NA-rows are located with which() and then see whether these locations are consecutive with diff() and remove these rows from df:
NArows <- which(rowSums(is.na(df)) == ncol(df)) # c(4, 5, 6, 10)
inds2 <- NArows[c(FALSE, diff(NArows) == 1)] # c(5, 6)
df <- df[-inds2, ]
# serial_day increment
# 1 4 NA
# 2 7 3
# 3 10 3
# 4 NA NA
# 7 22 3
# 8 25 3
# 9 29 4
# 10 NA NA
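The answer above blanks every row whose increment is not 3 or 4, which also removes day 19 even though the question's expected output keeps it as the start of a new block. A minimal sketch of one reading of the requirement (keep an observation when the gap to either neighbour is 3 or 4, then close each block with an NA row) could look like this; the names ok, keep and block are illustrative only.

```r
serial_day <- c(4, 7, 10, 12, 17, 19, 22, 25, 29, 34)

# ok[i] is TRUE when the gap between observation i and i + 1 is 3 or 4
ok <- diff(serial_day) %in% c(3, 4)

# Keep an observation if the gap before it or the gap after it is acceptable
keep <- c(FALSE, ok) | c(ok, FALSE)

# A new block id starts after every unacceptable gap
block <- cumsum(c(TRUE, !ok))

# Rebuild each kept block with fresh increments and a trailing NA row
pieces <- lapply(split(serial_day[keep], block[keep]), function(days) {
  data.frame(serial_day = c(days, NA),
             increment  = c(NA, diff(days), NA))
})
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```

For the example data this keeps days 4, 7, 10 and 19, 22, 25, 29 as two blocks, each followed by a row of NAs.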

Add an integer to every element of data frame

Say I have a data frame as follows
rsi5 rsi10
1 NA NA
2 NA NA
3 NA NA
4 NA NA
5 NA NA
6 44.96650 NA
7 39.68831 NA
8 28.35625 NA
9 37.77910 NA
10 53.54822 NA
11 52.05308 46.01867
12 80.44368 66.09973
13 60.88418 56.04507
14 53.59851 52.10633
15 46.45874 48.23648
I wish to simply add 1 (i.e. 9 becomes 10) to each non-NA element of this data frame. There is probably a very simple solution to this but simple arithmetics on dataframes do not seem to work in R giving very strange results.
Just use + 1 as you would expect. Below is a mock example, as it wasn't worth copying your data for this.
Step One: Create a data.frame
R> df <- data.frame(A=c(NA, 1, 2, 3), B=c(NA, NA, 12, 13))
R> df
A B
1 NA NA
2 1 NA
3 2 12
4 3 13
R>
Step Two: Add one
R> df + 1
A B
1 NA NA
2 2 NA
3 3 13
4 4 14
R>
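If the data frame also holds non-numeric columns (one possible source of "strange results", although the question does not say so), the addition can be restricted to the numeric columns. A minimal sketch, where the label column is hypothetical and added only to illustrate the mixed-type case:

```r
# rsi5 values taken from the question; 'label' is a made-up character column
df <- data.frame(rsi5 = c(NA, 44.96650, 39.68831),
                 label = c("a", "b", "c"))

# Logical vector marking which columns are numeric
num_cols <- vapply(df, is.numeric, logical(1))

# Add 1 to the numeric columns only; NA stays NA
df[num_cols] <- df[num_cols] + 1
df
```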

How to replace columns containing NA with the contents of the previous column?

I have a large dataframe with random columns which contain NA values. It looks like this:
2002-06-26 2002-06-27 2002-06-28 2002-07-01 2002-07-02 2002-07-03 2002-07-05
1 US1718711062 NA BMG4388N1065 US0116591092 NA AN8068571086 GB00BYMT0J19
2 US9837721045 NA US0025671050 US03662Q1058 NA BMG3223R1088 US0097281069
3 NA US00847J1051 US06652V2088 NA BMG4388N1065 US0305061097
4 NA US04351G1013 US1046741062 NA BMG7496G1033 US03836W1036
5 NA US2925621052 US1431301027 NA CA88157K1012 US06652V2088
6 NA US34988V1061 US1897541041 NA CH0044328745 US1547604090
7 NA US3596941068 US2053631048 NA GB00B5BT0K07 US1778351056
8 NA US4180561072 US2567461080 NA IE00B5LRLL25 US1999081045
9 NA US4198791018 US2925621052 NA IE00B8KQN827 US3498531017
10 NA US45071R1095 US3989051095 NA IE00BGH1M568 US42222N1037
I need code that identifies the NA columns and fills them with the contents of the previous column. So for example column "2002-06-27" should contain "US1718711062" and "US9837721045". The NA columns are at irregular intervals.
The columns are also of varying length, some containing only one element, so I think the best way to identify columns with no values is to look at the first row like so:
row.has.na <- which(is.na(data[1,]))
[1] 2 5
To complete my comment: as you have already computed row.has.na, the vector of indices of the NA columns, here is a way to use it and get what you need:
data[, row.has.na] <- data[, row.has.na - 1]
This should work. Note that this also works if two (or more) NA columns are next to each other. Maybe there is a way around the while-loop, but...
# Create some data
data <- data.frame(col1 = 1:10, col2 = NA, col3 = 10:1, col4 = NA, col5 = NA, col6 = NA)
# Find which columns contain NA in the first row
col_NA <- which(is.na(data[1,]))
# Select the previous columns
col_replace <- col_NA - 1
# Check if any NA columns are next to each other and fix it:
while(any(diff(col_replace) == 1)){
ind <- which(diff(col_replace) == 1) + 1
col_replace[ind] <- col_replace[ind] - 1
}
# Replace the NA columns with the previous columns
data[,col_NA] <- data[,col_replace]
col1 col2 col3 col4 col5 col6
1 1 1 10 10 10 10
2 2 2 9 9 9 9
3 3 3 8 8 8 8
4 4 4 7 7 7 7
5 5 5 6 6 6 6
6 6 6 5 5 5 5
7 7 7 4 4 4 4
8 8 8 3 3 3 3
9 9 9 2 2 2 2
10 10 10 1 1 1 1
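The while-loop can be avoided by computing, for every column, the index of the nearest non-NA column at or to the left of it with cummax(). This is a sketch under the same assumptions as the answer above (a column counts as empty when its first row is NA) plus one more: the first column must not be empty, since it has nothing to copy from. The names is_na_col, src and filled are illustrative.

```r
data <- data.frame(col1 = 1:10, col2 = NA, col3 = 10:1,
                   col4 = NA, col5 = NA, col6 = NA)

# TRUE for columns whose first row is NA
is_na_col <- vapply(data, function(col) is.na(col[1]), logical(1))

# Non-NA columns keep their own index, NA columns contribute 0;
# cummax() then carries the last non-NA index forward
src <- cummax(ifelse(is_na_col, 0L, seq_along(is_na_col)))

# The first column has no predecessor, so it must not be NA
stopifnot(all(src > 0))

# Pick the source column for every position and restore the original names
filled <- data[src]
names(filled) <- names(data)
filled
```

Here src is 1, 1, 3, 3, 3, 3, so col2 copies col1 and col4 to col6 all copy col3.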
