I would like to add a vector to a column, without specifying the other columns. I have example data as follows.
library(data.table)
dat <- fread("A B C D
one 2 three four
two 3 NA one")
vector_to_add <- c("five", "six")
Desired output:
out <- fread("A B C D
one 2 three four
two 3 NA one
NA NA five NA
NA NA six NA")
I saw some answers using an approach where vectors are used to rowbind:
row3 <- c(NA, NA, "five", NA)
I would however like to find a solution in which I do not have to specify the whole row.
EDIT: Shortly after posting I realised that it would probably be easiest to take an existing row, make the row NA, and replace the value in the column where the vector would be added, for each entry in the vector. This is however still quite a cumbersome solution I guess.
If you put your vector in a one-column data frame named after the target column, then you can rbind that column and fill the rest of the cells with NAs.
df_to_add <- data.frame(C=c("five", "six"))
rbind(dat, df_to_add, fill=TRUE)
A B C D
1: one 2 three four
2: two 3 <NA> one
3: <NA> NA five <NA>
4: <NA> NA six <NA>
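When the target column is only known at run time, the same idea can be wrapped in a small function. This is a minimal sketch, not part of data.table: the helper name append_to_column and its arguments are made up for illustration, and the example data is rebuilt with data.table() instead of fread().

```r
library(data.table)

# Hypothetical helper (not part of data.table): append `values` as new rows
# in column `col` of `dt`, leaving every other column NA.
append_to_column <- function(dt, values, col) {
  new_rows <- data.table(values)
  setnames(new_rows, col)            # name the single column after the target
  rbind(dt, new_rows, fill = TRUE)   # fill = TRUE pads the missing columns with NA
}

dat <- data.table(A = c("one", "two"), B = 2:3,
                  C = c("three", NA), D = c("four", "one"))
out <- append_to_column(dat, c("five", "six"), "C")
print(out)
```

Passing the column name as a string avoids hard-coding C = ... in the call.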
You can use the rbindlist() function from the data.table package to add a vector to a column in a data table without specifying the other columns. rbindlist() takes a list of data tables, data frames, or lists and stacks them into a single data table. It does not accept bare atomic vectors as list elements, so the vector has to be wrapped first.
In your case, wrap the vector in a one-column data table named after the target column, then append it with fill = TRUE so that the columns missing from the new rows are filled with NA. For example, the following code adds the vector vector_to_add to column C of the data table dat:
library(data.table)
dat <- fread("A B C D
one 2 three four
two 3 NA one")
vector_to_add <- c("five", "six")
# Wrap the vector in a one-column data table named after the target column
new_rows <- data.table(C = vector_to_add)
# Use rbindlist() with fill = TRUE to append the rows; missing columns become NA
out <- rbindlist(list(dat, new_rows), fill = TRUE)
After running this code, the data table out should contain the desired output:
A B C D
1: one 2 three four
2: two 3 <NA> one
3: <NA> NA five <NA>
4: <NA> NA six <NA>
You can use the rbindlist() function to append multiple vectors to the data table in a similar way.
In the case that a=matrix(c(1,2,3,4),nrow=2,ncol=2) and b=c('name',3). I am trying to merge a and b such that the outcome is [1 3 name 3] in the first row and [2 4] in the second row.
The number of rows differs in each dataframe. Therefore cbind is going to have a hard time merging the data and will by default recycle the shorter dataframe, in this case b.
I would suggest adding in the rowname as a column and then binding on that. By default, full_join will then generate NA values for dataframes missing that value of the bind. This question is partially a duplicate of Add (not merge!) two data frames with unequal rows and columns so you may find more help there.
# Load packages
library(tidyverse)
library(magrittr) # To use the inplace assignment operator (%<>%)
# Create dataframes
a <- data.frame(1:2,3:4)
b <- merge('name', 3)
# Create rowname column for each dataframe
a %<>% tibble::rownames_to_column()
b %<>% tibble::rownames_to_column()
# Use 'full join' to bind dataframes together
c <- dplyr::full_join(a, b, by = "rowname") %>%
# Remove the rowname column
dplyr::select(-rowname)
# Print c
print(c)
X1.2 X3.4 x y
1 1 3 name 3
2 2 4 <NA> NA
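The same row-index join can also be sketched in base R with merge(), in case loading the tidyverse is not desirable. The column names V1, V2, x, y and the helper column row are assumptions chosen for this illustration.

```r
# Example inputs; the column names are chosen for illustration only
a <- data.frame(V1 = 1:2, V2 = 3:4)
b <- data.frame(x = "name", y = 3)

# Add an explicit row index to each data frame
a$row <- seq_len(nrow(a))
b$row <- seq_len(nrow(b))

# all = TRUE keeps rows from both sides and fills the gaps with NA
combined <- merge(a, b, by = "row", all = TRUE)
combined$row <- NULL   # drop the helper column again
print(combined)
```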
If you are satisfied with a list, not data frame, this will work.
a <- matrix(c(1,2,3,4),nrow=2,ncol=2)
b <- c('name',3)
c <- list(a[,1],a[,2],b[1],b[2] )
If you need a data frame, you have to make the 1st and 2nd rows have the same number of columns by filling the gaps with something.
d <- as.data.frame(c)
d[2,3:4] <- NA
I have the following string of characters:
pig<-c("A","B","C","D","AB","ABC","AB","AA","CD","CA",NA)
I am trying to get R to tell me how many of each letter there are in total and how many NAs there are. Thus, in this case I would like the result to look like this:
print(cow)
A B C D NA
7 4 4 2 1
I have tried table in combination with strsplit but cannot figure out exactly how to do it. Any thoughts? Thanks!
You would need to use NULL (or the empty character "") for the split value in strsplit(), then unlist it. Then, in table() you'll want to use the useNA argument to include any NA values. Here we'll use "ifany", so that if there are any NA values they will be shown in the table and if there are not, NA will not be shown in the result at all.
table(unlist(strsplit(pig, NULL)), useNA = "ifany")
#
# A B C D <NA>
# 7 4 4 2 1
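If a plain named vector is preferred over a table object (for example to match the cow object in the question), the result can be converted afterwards. This is a small sketch under that assumption; the name cow comes from the question, tab is just a temporary name.

```r
pig <- c("A", "B", "C", "D", "AB", "ABC", "AB", "AA", "CD", "CA", NA)

# Split every string into single characters; NA entries stay NA
tab <- table(unlist(strsplit(pig, "")), useNA = "ifany")

# Convert the table to a named integer vector
cow <- setNames(as.vector(tab), names(tab))
# The NA category has a missing name; give it the literal label "NA"
names(cow)[is.na(names(cow))] <- "NA"
print(cow)
```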
I have the following dataframe: (this is just a small sample)
VALUE COUNT AREA n_dd-2000 n_dd-2001 n_dd-2002 n_dd-2003 n_dd-2004 n_dd-2005 n_dd-2006 n_dd-2007 n_dd-2008 n_dd-2009 n_dd-2010
2 16 2431 243100 NA NA NA NA NA NA 3.402293 3.606941 4.000461 3.666381 3.499614
3 16 2610 261000 3.805082 4.013435 3.98 3.490139 3.433857 3.27813 NA NA NA NA NA
4 16 35419 3541900 NA NA NA NA NA NA NA NA NA NA NA
and I would like to combine all three rows into one row replacing NA with the number that appears in each column (there's only one number per column). Just ignore the first three columns. I used this code:
bdep[4,4:9] <- bdep[3,4:9]
to replace NA's with numbers from another row, but can't figure out how to repeat it for all the columns. The columns 4 and beyond have a sequence in each row of six numbers followed by 20 NA's, so I've tried going down the road of using lapply() and seq() or for loops, but my efforts are failing.
I came up with a simple solution: replace the NAs with zeroes and sum the rows within each column. Does this work?
#data
bdep <- rbind(c(rep(NA,6),3.402293,3.606941,4.000461,3.666381,3.499614),
c(3.805082,4.013435,3.98,3.490139,3.433857,3.27813, rep(NA,5)),
c(rep(NA,11)))
#solution
bdep2 <- ifelse(is.na(bdep), 0, bdep)
bdep3 <- apply(bdep2, 2, sum)
bdep3 #the row you want?
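The zero-replacement step can probably be skipped, since colSums() has an na.rm argument that ignores NA values directly. A shortened sketch with made-up three-column data (not the original values) comparing both routes:

```r
# Shortened example matrix; the values are illustrative only
bdep <- rbind(c(NA, NA, 3.4),
              c(3.8, 4.0, NA),
              c(NA, NA, NA))

# Route from the answer above: replace NA by 0, then sum per column
via_zero <- apply(ifelse(is.na(bdep), 0, bdep), 2, sum)

# Shorter route: let colSums() drop the NA values itself
via_colsums <- colSums(bdep, na.rm = TRUE)

print(via_colsums)
```

Note that with either route a column containing only NA values ends up as 0 rather than NA.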
I finally came to a solution by patching together some code I found in other posts (esp. sequencing and for loops). I think this would be considered messy coding, so I'd welcome other solutions. This should better describe what I was trying to do in the OP, where I was trying to generalize too much. Specifically, I have 17 variables, measured over 14 years (that's 238 columns), and something happened while generating these data where the first 6 years of a variable are in one row and the following 8 years are in the other row, so instead of re-running the model, I just wanted to combine the two rows into one.
Below are some sample data, simplified from my real scenario.
Create the data frame:
df <- data.frame(
VALUE = c(16, 16, 16),
COUNT = c(2431, 2610, 35419),
AREA = c(243100, 261000, 3541900),
n_dd_2000 = c(NA, 3.805, NA),
n_dd_2001 = c(3.402, NA, NA)
)
The next two lines describe which columns to copy: start is the first column of each block and len is the number of consecutive columns in the block. In this sample, info covers column 4 (to be taken from row 2) and info2 covers column 5 (to be taken from row 1):
info <- data.frame(start=seq(4, by=1, length.out=1), len=1)
info2 <- data.frame(start=seq(5, by=1, length.out=1), len=1)
This is the code from my real dataset, where I started at column 4, repeated the pattern every 14 columns, out 17 times, and looked at the first 6, then 8 columns: info <- data.frame(start=seq(4, by=14, length.out=17), len=rep(c(6,8),17))
The first two lines below expand each block into its column indices, and the two for loops then copy those columns from row 2 and row 1 into row 3, respectively:
foo = sequence(info$len) + rep(info$start-1, info$len)
foo2 = sequence(info2$len) + rep(info2$start-1, info2$len)
for(n in 1:length(foo)){
df[3,foo[n]] <- df[2,foo[n]]
}
for(n in 1:length(foo2)){
df[3,foo2[n]] <- df[1,foo2[n]]
}
Then I removed the first two rows I got those values from and I'm left with one complete row, no NA's:
df <- df[-(1:2),]
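As one possible alternative to the index bookkeeping, each measurement column can be collapsed to its first non-NA value. This is only a sketch on the sample data: it assumes the measurements start at column 4, that each column holds at most one non-NA value, and that the first three columns should be taken from row 3. The helper first_non_na is a name introduced here.

```r
df <- data.frame(
  VALUE = c(16, 16, 16),
  COUNT = c(2431, 2610, 35419),
  AREA = c(243100, 261000, 3541900),
  n_dd_2000 = c(NA, 3.805, NA),
  n_dd_2001 = c(3.402, NA, NA)
)

value_cols <- 4:ncol(df)   # assumption: measurements start at column 4

# Hypothetical helper: first non-NA entry of a vector (NA if there is none)
first_non_na <- function(x) x[!is.na(x)][1]

# Collapse every measurement column to a single value
filled <- lapply(df[value_cols], first_non_na)

# Keep the identifying columns of row 3 and attach the collapsed values
combined <- cbind(df[3, 1:3], as.data.frame(filled))
print(combined)
```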
I am trying to do a simple task, and created a simple example. I would like to add the counts of a taxon recorded in a vector ('introduced', below) to the counts already measured in another vector ('existing'), according to the taxon name. However, when there is a new taxon (present in introduced but not in existing), I would like this taxon and its count to be added as a new entry in the vector (doesn't matter what order, but the name needs to be retained).
For example:
existing<-c(3,4,5,6)
names(existing)<-c("Tax1","Tax2","Tax3","Tax4")
introduced<-c(2,2)
names(introduced)<-c("Tax1","Tax5")
I want the new vector, called "combined" here, to look like this:
#names(combined)= c("Tax1","Tax2","Tax3","Tax4","Tax5")
#combined= c(5,4,5,6,2)
The main thing to see is that "Tax1"'s values are combined (3+2=5) and "Tax5" (2) is added on to the end.
I have looked around but previous answers similar to this have much more complex data and it is difficult to extract which function I need. I have been trying combinations of match and which, but just cannot get it right.
grp <- c(existing,introduced)
tapply(grp,names(grp),sum)
#Tax1 Tax2 Tax3 Tax4 Tax5
# 5 4 5 6 2
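tapply() returns the groups sorted by name. If the original order of existing should be kept, with new taxa appended at the end, indexing by name is one possible alternative. A minimal sketch, assuming the names within each vector are unique:

```r
existing <- c(Tax1 = 3, Tax2 = 4, Tax3 = 5, Tax4 = 6)
introduced <- c(Tax1 = 2, Tax5 = 2)

combined <- existing
# Taxa that appear only in `introduced` start with a count of 0
new_taxa <- setdiff(names(introduced), names(combined))
combined[new_taxa] <- 0
# Add the introduced counts to the matching names
combined[names(introduced)] <- combined[names(introduced)] + introduced
print(combined)
```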
Instead of keeping your data in 'loose' vectors, you may consider collecting them in one data frame. First, put your two sets of vector data in data frames:
existing <- c(3, 4, 5, 6)
taxon <- c("Tax1", "Tax2", "Tax3", "Tax4")
df1 <- data.frame(existing, taxon)
introduced <- c(2, 2)
taxon <- c("Tax1", "Tax5")
df2 <- data.frame(introduced, taxon)
Then merge the two data frames by the common column, 'taxon'. Set all = TRUE to include all rows from both data frames:
df3 <- merge(df1, df2, all = TRUE)
Finally, sum 'existing' and 'introduced' taxon, and add the result to the data frame:
df3$combined <- rowSums(df3[ , c("existing", "introduced")], na.rm = TRUE)
df3
# taxon existing introduced combined
# 1 Tax1 3 2 5
# 2 Tax2 4 NA 4
# 3 Tax3 5 NA 5
# 4 Tax4 6 NA 6
# 5 Tax5 NA 2 2
I have a data frame that contains multiple rows and multiple columns.
I have a character vector that contains the names of some of the columns in the data frame. The number of columns can vary.
For each row, I need to determine whether at least one of these columns is not NA (basically any(!is.na(df[namescolumns])) for each row), and then subset the rows for which this is TRUE.
Actually, any(!is.na(df[1,][namescolumns])) works well, but it's only for the first line.
I could easily do a for loop, which is my first reflex as a programmer and because it works for the first line, but I'm sure it's not the R way and that there is a way to do this with an "apply" (lapply, mapply, sapply, tapply or other), but I can't figure out which one and how.
Thank you.
Try using apply over the first dimension (rows):
apply(df, 1, function(x) any(!is.na(x[namescolumns])))
Because the function returns a single value per row, the result is a logical vector with one element per row, so you can use it directly to subset the rows of df.
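A vectorised variant of the same row-wise test can be sketched with rowSums(), which counts the non-NA entries per row without calling a function for every row. The example data and the names keep and subset_df are made up for illustration.

```r
# Illustrative data; only columns a and b are checked
df <- data.frame(a = c(1, NA, NA),
                 b = c("x", NA, "z"),
                 c = c(NA, NA, 3))
namescolumns <- c("a", "b")

# TRUE for rows where at least one of the selected columns is not NA
keep <- rowSums(!is.na(df[namescolumns])) > 0

subset_df <- df[keep, ]
print(subset_df)
```

Here the second row is dropped because both a and b are NA, even though c is not checked.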
You can use a combination of lapply and Reduce
complete.rows <- Reduce(`&`, lapply(colnames, function (name) !is.na(df[name])))
to get a logical vector that is TRUE for the rows with no NA values in any of the columns in colnames, which can in turn be used to subset the data. (Use `|` instead of `&` to keep the rows where at least one of the columns is not NA.)
df[complete.rows,]
For example. Given:
df <- data.frame(a = c(1,2,3,4,NA,6,7),
b = c(2,4,6,8,10,12,14),
c = c("one","two","three","four","five","six","seven"),
d = c("a",NA,"c","d","e","f","g")
)
colnames <- c("a","d")
You can get:
> df[Reduce(`&`, lapply(colnames, function (name) !is.na(df[name]))),]
a b c d
1 1 2 one a
3 3 6 three c
4 4 8 four d
6 6 12 six f
7 7 14 seven g