Fill in missing rows in data in R - r

Suppose I have a data frame like this:
1 8
2 12
3 2
5 -6
6 1
8 5
I want to add a row in the places where the 4 and 7 would have gone in the first column and have the second column for these new rows be 0, so adding these rows:
4 0
7 0
I have no idea how to do this in R.
In Excel, I could use a VLOOKUP inside an IFERROR. Is there a similar combination of functions in R to make this happen?
Edit: also, suppose that row 1 was missing and needed to be filled in similarly. Would this require another solution? What if I wanted to add rows until I reached ten rows?

Use tidyr::complete to fill in the missing sequence between min and max values.
library(tidyr)
library(rlang)
complete(df, V1 = min(V1):max(V1), fill = list(V2 = 0))
#Or using `seq`
#complete(df, V1 = seq(min(V1), max(V1)), fill = list(V2 = 0))
# V1 V2
# <int> <dbl>
#1 1 8
#2 2 12
#3 3 2
#4 4 0
#5 5 -6
#6 6 1
#7 7 0
#8 8 5
If we already know the min and max we want, we can use them directly. For example, to get rows for V1 = 1 to 10:
complete(df, V1 = 1:10, fill = list(V2 = 0))
If we don't know the column names beforehand, we can build them programmatically (sym and := come from rlang):
col1 <- names(df)[1]
col2 <- names(df)[2]
complete(df, !!sym(col1) := 1:10, fill = as.list(setNames(0, col2)))
data
df <- structure(list(V1 = c(1L, 2L, 3L, 5L, 6L, 8L), V2 = c(8L, 12L,
2L, -6L, 1L, 5L)), class = "data.frame", row.names = c(NA, -6L))
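For readers looking for something closer to the Excel lookup-plus-fallback idea, here is a minimal base R sketch. It is not the tidyr approach above but a swapped-in alternative: merge() plays the role of the lookup, and replacing NA with 0 plays the role of the fallback. The target range 1:10 and the name `full` are assumptions chosen for illustration.

```r
# Same data as in the answer above
df <- data.frame(V1 = c(1L, 2L, 3L, 5L, 6L, 8L),
                 V2 = c(8L, 12L, 2L, -6L, 1L, 5L))

# Left-join the complete sequence against df; unmatched rows get NA in V2
full <- merge(data.frame(V1 = 1:10), df, by = "V1", all.x = TRUE)

# Replace the NAs introduced by the join with 0
full$V2[is.na(full$V2)] <- 0

full
```

Using min(df$V1):max(df$V1) instead of 1:10 would fill only the gaps inside the existing range.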

Related

How to merge two data frame which has jumbled column names

I have 2 data frames df1 and df2 with the same column names but in different column numbers. How to merge as df3 without creating additional columns/rows.
df1
a b c
1 3 6
df2
b c a
5 6 1
expected df3
a b c
1 3 6
1 5 6
Tried below code but it did not work
df3=merge(df1, df2, by = "col.names")
We can use bind_rows, which automatically matches columns by name. If a column is missing from one of the datasets, it is filled with NA for the rows from that dataset. The order of the columns follows the first dataset passed to bind_rows, i.e. df1.
library(dplyr)
bind_rows(df1, df2)
-output
a b c
1 1 3 6
2 1 5 6
data
df1 <- structure(list(a = 1L, b = 3L, c = 6L), class = "data.frame", row.names = c(NA,
-1L))
df2 <- structure(list(b = 5L, c = 6L, a = 1L), class = "data.frame", row.names = c(NA,
-1L))
Rearrange the columns of one data frame to match the column order of the other, so that both have the same order of column names, and then use rbind.
rbind(df1, df2[names(df1)])
# a b c
#1 1 3 6
#2 1 5 6
In this case, using rbind(df1, df2) should work too, because rbind matches the columns of data frames by name rather than by position.
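A quick base R check of the name-matching behaviour, using the same df1 and df2 as above (the names `by_name` and `reordered` are just for illustration):

```r
df1 <- data.frame(a = 1L, b = 3L, c = 6L)
df2 <- data.frame(b = 5L, c = 6L, a = 1L)

# rbind on data frames lines columns up by name, not by position
by_name <- rbind(df1, df2)

# Explicitly reordering df2 first gives the same values
reordered <- rbind(df1, df2[names(df1)])

by_name
```

Note that this only holds when both data frames have exactly the same set of column names; rbind raises an error otherwise.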

Repeat row in a column of a data table

I have a data table which includes NAs in some cells as below.
Data table: [image not included]
However, I want to repeat the value in the 1st row of the column called "Category" in the following two rows that contain "NA", without any change to the other columns, "Numeric" and "Numeric.null". The same goes for the 4th row in Category: repeat it in the 5th and 6th rows, with no change to the other columns.
New: [image not included]
I'm just learning R programming. I have tried the rep function, but I couldn't get it to work. Please help me.
We can use fill from tidyr
library(dplyr)
library(tidyr)
df1 <- df1 %>%
fill(Category)
df1
# Category Numeric Numeric.null
#1 A 1 1
#2 A 2 2
#3 A 3 4
#4 D 4 7
#5 D 5 6
#6 D 6 8
#7 E 7 11
Or using data.table with na.locf0 from zoo
library(data.table)
library(zoo)
setDT(df1)[, Category := na.locf0(Category)][]
data
df1 <- structure(list(Category = c("A", NA, NA, "D", NA, NA, "E"), Numeric = 1:7,
Numeric.null = c(1L, 2L, 4L, 7L, 6L, 8L, 11L)),
class = "data.frame", row.names = c(NA,
-7L))
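If installing packages is not an option, the same fill-down can be sketched in base R. This is an alternative to the answers above, not their method; `fill_down` is a hypothetical helper name. The idea is that cumsum(!is.na(x)) counts how many non-missing values have been seen so far, which is exactly the index of the value to carry forward.

```r
# Hypothetical helper: carry the last non-NA value forward
fill_down <- function(x) {
  idx  <- cumsum(!is.na(x))   # 0 before the first non-NA value
  vals <- x[!is.na(x)]        # the non-missing values, in order
  c(NA, vals)[idx + 1]        # leading NAs stay NA
}

df1 <- data.frame(Category = c("A", NA, NA, "D", NA, NA, "E"),
                  Numeric = 1:7,
                  Numeric.null = c(1L, 2L, 4L, 7L, 6L, 8L, 11L),
                  stringsAsFactors = FALSE)

# Only Category is touched; the other columns are unchanged
df1$Category <- fill_down(df1$Category)
df1
```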

How to use column indices to collect values from columns in R

x y z column_indices
6 7 1 1,2
5 4 2 3
1 3 2 1,3
I have the column indices of the values I would like to collect in a separate column like so, what I want to create is something like this:
x y z column_indices values
6 7 1 1,2 6,7
5 4 2 3 2
1 3 2 1,3 1,2
What is the simplest way to do this in R?
Thanks!
In base R, we can use apply, split the column_indices on ',', convert them to integer and get the corresponding value from the row.
df$values <- apply(df, 1, function(x) {
inds <- as.integer(strsplit(x[4], ',')[[1]])
toString(x[inds])
})
df
# x y z column_indices values
#1 6 7 1 1,2 6, 7
#2 5 4 2 3 2
#3 1 3 2 1,3 1, 2
data
df <- structure(list(x = c(6L, 5L, 1L), y = c(7L, 4L, 3L), z = c(1L,
2L, 2L), column_indices = structure(c(1L, 3L, 2L), .Label = c("1,2",
"1,3", "3"), class = "factor")), class = "data.frame", row.names = c(NA, -3L))
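One solution involving dplyr and tidyr could be the following (it assumes column_indices is a character column, as in the output below; convert a factor with as.character first):
library(dplyr)
library(tidyr)
df %>%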
pivot_longer(-column_indices) %>%
group_by(column_indices) %>%
mutate(values = toString(value[1:n() %in% unlist(strsplit(column_indices, ","))])) %>%
pivot_wider(names_from = "name", values_from = "value")
column_indices values x y z
<chr> <chr> <int> <int> <int>
1 1,2 6, 7 6 7 1
2 3 2 5 4 2
3 1,3 1, 2 1 3 2
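The base R apply answer above works on a character matrix, because apply converts each row to character. A variant that indexes the data frame directly might look like the sketch below; it loops over row numbers with vapply and is only one possible way to write it. The data are rebuilt here with column_indices as character, and as.character is kept so a factor column would also work.

```r
df <- data.frame(x = c(6L, 5L, 1L),
                 y = c(7L, 4L, 3L),
                 z = c(1L, 2L, 2L),
                 column_indices = c("1,2", "3", "1,3"),
                 stringsAsFactors = FALSE)

df$values <- vapply(seq_len(nrow(df)), function(i) {
  # Split "1,2" into c(1L, 2L)
  inds <- as.integer(strsplit(as.character(df$column_indices[i]), ",")[[1]])
  # Pick those columns from row i and collapse them into one string
  toString(unlist(df[i, inds]))
}, character(1))

df
```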

Sum a variable by group & create new column with frequency [duplicate]

I have 2 columns of data. The first one is an id and the second one a value.
There may be many occurrences of the same id.
I need to aggregate the data by summing all the values for the same id AND I would like to create a new column with the number of occurrences of the same id.
For example:
id value
1 15
1 10
2 5
3 7
1 4
3 12
4 16
I know I can use aggregate to sum the values and reduce the table to 4 rows, but I would like an extra column with the number of occurrences of the id like this:
id value freq
1 29 3
2 5 1
3 19 2
4 16 1
Thank you
We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(df1)), grouped by 'id', get the sum of 'value' and also the number of rows with (.N)
library(data.table)
setDT(df1)[, .(value=sum(value), freq = .N) , by = id]
# id value freq
#1: 1 29 3
#2: 2 5 1
#3: 3 19 2
#4: 4 16 1
Or as @Frank commented
dcast(setDT(df1), id ~ ., fun = list(sum, length))
Or a similar approach with dplyr
library(dplyr)
df1 %>%
group_by(id) %>%
summarise(value = sum(value), freq = n())
Using base R, one can combine aggregate() and table() like this:
cbind(aggregate(value ~ id, df1, sum), freq=as.vector(table(df1$id)))
# id value freq
#1 1 29 3
#2 2 5 1
#3 3 19 2
#4 4 16 1
data used in this example:
df1 <- structure(list(id = c(1L, 1L, 2L, 3L, 1L, 3L, 4L),
value = c(15L, 10L, 5L, 7L, 4L, 12L, 16L)),
.Names = c("id", "value"), class = "data.frame",
row.names = c(NA, -7L))
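Another base R sketch, offered as an alternative to the cbind(aggregate(), table()) answer above rather than a replacement: compute both summaries with tapply, so the sums and counts are guaranteed to share the same grouping and order. The name `res` is just for illustration.

```r
df1 <- data.frame(id = c(1L, 1L, 2L, 3L, 1L, 3L, 4L),
                  value = c(15L, 10L, 5L, 7L, 4L, 12L, 16L))

# Both results are indexed by the same sorted ids
sums   <- tapply(df1$value, df1$id, sum)
counts <- tapply(df1$value, df1$id, length)

res <- data.frame(id = as.integer(names(sums)),
                  value = as.vector(sums),
                  freq = as.vector(counts))
res
```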

Fill a column's blank spaces contingent on a second column in R

I'd appreciate some help with this one. I have something similar to the data below.
df$A df$B
1 .
1 .
1 .
1 6
2 .
2 .
2 7
What I need to do is fill in df$B with each value that corresponds to the end of the run of values in df$A. Example below.
df$A df$B
1 6
1 6
1 6
1 6
2 7
2 7
2 7
Any help would be welcome.
It seems that the missing values are denoted by ".". It is better to read the dataset with na.strings="." so that the missing values become NA. In the current dataset, the 'B' column is of character or factor class, depending on whether stringsAsFactors was FALSE or TRUE in read.table/read.csv (TRUE was the default before R 4.0.0).
Using data.table, we convert the data.frame to data.table (setDT(df1)), change the 'character' class to 'numeric' (B:= as.numeric(B)). This will also result in coercing the . to NA (a warning will appear). Grouped by "A", we change the "B" values to the last element (B:= B[.N])
library(data.table)
setDT(df1)[,B:= as.numeric(B)][,B:=B[.N] , by = A]
# A B
#1: 1 6
#2: 1 6
#3: 1 6
#4: 1 6
#5: 2 7
#6: 2 7
#7: 2 7
Or with dplyr
library(dplyr)
df1 %>%
group_by(A) %>%
mutate(B= as.numeric(tail(B,1)))
Or using ave from base R
df1$B <- with(df1, as.numeric(ave(B, A, FUN=function(x) tail(x,1))))
data
df1 <- structure(list(A = c(1L, 1L, 1L, 1L, 2L, 2L, 2L), B = c(".",
".", ".", "6", ".", ".", "7")), .Names = c("A", "B"),
class = "data.frame", row.names = c(NA, -7L))
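To illustrate the na.strings suggestion above, here is a small sketch that reads the same values from inline text (an assumption, since the original file is not shown) and then fills each group with its last value using ave. Because "." is read as NA, B comes in as a numeric column and no as.numeric coercion is needed.

```r
# Read the example data, treating "." as missing
df1 <- read.table(text = "A B
1 .
1 .
1 .
1 6
2 .
2 .
2 7", header = TRUE, na.strings = ".")

# Within each group of A, replace B with the last element of the group
df1$B <- ave(df1$B, df1$A, FUN = function(x) x[length(x)])
df1
```

This relies on the value sitting at the end of each run, as in the question; if it could appear elsewhere, picking the last non-NA element instead would be safer.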
