I have two vectors of the same length and I'm trying to combine them so that each fills in the other's missing values. For example:
a=c("",1,2,"")
b=c(5,"","",6)
I'm looking for this output:
5 1 2 6
Thanks much
In this case, the normally numeric comparison via pmax also works:
as.numeric(pmax(a,b))
#[1] 5 1 2 6
This is because R will resort to alphanumeric sorting when max/min etc are applied to character data:
max(c("b","a"))
#[1] "b"
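A small sketch of why this works here, and where it would stop working. It assumes only the two vectors from the question; the names filled and lexical_max are made up for illustration. The empty string sorts before any non-empty string, so pmax always picks the filled-in value, but the comparison is still a character comparison, not a numeric one:

```r
a <- c("", 1, 2, "")
b <- c(5, "", "", 6)

# Both vectors are character, so pmax compares strings.
# "" sorts before any non-empty string, so the filled value wins.
filled <- as.numeric(pmax(a, b))

# Caveat: strings are compared character by character, so "9" beats "10".
# This only matters if both positions could hold a value at the same time.
lexical_max <- max(c("10", "9"))
```

So the trick is probably safe only as long as exactly one of the two vectors has a value at each position.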
And:
as.numeric(paste(a,b))
#[1] 5 1 2 6
Or:
a[a==""] <- b[b!=""]
as.numeric(a)
# [1] 5 1 2 6
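Another option is to replace the empty strings with 0 and add the two vectors:
a[a == ""] <- 0
b[b == ""] <- 0
a <- as.numeric(a)
b <- as.numeric(b)
output <- a + b
Or use ifelse on the original character vectors:
as.numeric(ifelse(a != "", a, b))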
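A variation on the same idea, as a sketch rather than a definitive answer: convert both vectors to numeric first, so the empty strings become NA, and then fill the gaps in one vector from the other. The names a_num, b_num and combined are only illustrative.

```r
a <- c("", 1, 2, "")
b <- c(5, "", "", 6)

# as.numeric turns the empty strings into NA
a_num <- as.numeric(a)
b_num <- as.numeric(b)

# Take the value from a where it exists, otherwise the value from b
combined <- ifelse(is.na(a_num), b_num, a_num)
```

Unlike the a + b approach, this does not add the two values when both vectors happen to be filled at the same position; it simply prefers a.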
I am trying to make non-overlapping subsets of a totally inclusive group in R. The first subset contains pairs of elements from the totally inclusive group. The other subset should be all of the elements in the totally inclusive group, but not in the first subset.
poplength <- 10
samples <- 7
numpair <- 2
totallyinclusivegroup <- sample(1:poplength, samples)
Subset1 <- sample(totallyinclusivegroup, size = numpair*2)
I don't know how to get a "Subset2" that includes everything in "totallyinclusivegroup" but not in Subset 1. I've tried using the "-" operator, with no success. For example,
Subset2 <- totallyinclusivegroup[-Subset1]
does not work, and includes elements from Subset1. Any advice/help is appreciated.
We can negate the logical vector returned by %in% with !, so that TRUE becomes FALSE and vice versa
out <- totallyinclusivegroup[!totallyinclusivegroup %in% Subset1]
Output:
Subset1
#[1] 2 6 9 7
totallyinclusivegroup
#[1] 3 2 6 1 9 7 8
out
#[1] 3 1 8
Or an easier option is setdiff
setdiff(totallyinclusivegroup, Subset1)
#[1] 3 1 8
If there are duplicate elements, setdiff drops the repeats from its result, so it is better to use vsetdiff from vecsets
library(vecsets)
vsetdiff(totallyinclusivegroup, Subset1)
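To illustrate the duplicates point with base R only, here is a minimal sketch on made-up data (x and y are not from the question). setdiff returns unique values, while the %in% filter keeps every remaining element:

```r
x <- c(1, 1, 2, 3)  # made-up vector with a repeated value
y <- c(2)

# setdiff treats its arguments as sets, so the repeated 1 collapses
via_setdiff <- setdiff(x, y)

# Logical filtering keeps both copies of 1
via_in <- x[!x %in% y]
```

With sample() drawing without replacement, as in the question, there are no duplicates and both give the same result.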
Try:
#Code
Subset2 <- totallyinclusivegroup[-which(totallyinclusivegroup%in% Subset1 )]
Output:
totallyinclusivegroup
[1] 8 5 10 2 9 1 3
Subset1
[1] 5 10 3 9
Subset2
[1] 8 2 1
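One caveat with the -which() form, shown as a sketch on made-up data: if nothing matches, which() returns an empty vector, and indexing with a negated empty vector selects nothing at all. The logical form does not have that problem. The last line also shows why totallyinclusivegroup[-Subset1] from the question misbehaves: negative numbers are treated as positions to drop, not as values.

```r
x <- c(8, 5, 10)  # made-up data
y <- c(99)        # no element of x is in y

# which() finds no matches, so the index is empty and nothing is returned
with_which <- x[-which(x %in% y)]

# The logical version keeps everything, as intended
with_not <- x[!x %in% y]

# Negative indices drop positions 1 and 2, regardless of the values there
by_position <- x[-c(1, 2)]
```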
I am doing an exercise to practice writing functions.
I'm trying to figure out the general code before writing the function that reproduces the output from the table function. So far, I have the following:
set.seed(111)
vec <- as.integer(runif(10, 5, 20))
x <- sort(unique(vec))
for (i in x) {
c <- length(x[i] == vec[i])
print(c)
}
But this gives me the following output:
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
[1] 1
I don't think I'm subsetting correctly in my loop. I've been watching videos, but I'm not quite sure where I'm going wrong. Would appreciate any insight!
Thanks!
We can sum the logical vector and concatenate the result to count
count <- c()
for(number in x) count <- c(count, sum(vec == number))
count
#[1] 3 1 4 1 5 4 3 2 7
In the OP's for loop, it is looping over the 'x' values and not on the sequence of 'x'
If we start again from an empty count and do
for(number in x) count <- c(count, length(vec[vec == number]))
it should work as well.
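Since the exercise is about writing a function, here is one possible way to wrap the same idea. The name count_values and the small test vector are made up for illustration; this is a sketch, not the only way to do it.

```r
# Hypothetical helper: counts how often each distinct value occurs
count_values <- function(vec) {
  values <- sort(unique(vec))
  # For each distinct value, count the elements of vec equal to it
  counts <- vapply(values, function(v) sum(vec == v), integer(1))
  # Label the counts with the values, similar to what table() prints
  setNames(counts, values)
}

vec <- c(5L, 7L, 5L, 9L, 7L, 5L)  # made-up data
result <- count_values(vec)
```

Here result should hold the counts 3, 2 and 1 under the names 5, 7 and 9, which can be compared against table(vec).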
You can try sapply + setNames to achieve the same result like table, i.e.,
count <- sapply(x, function(k) setNames(sum(k==vec),k))
or
count <- sapply(x, function(k) setNames(length(na.omit(match(vec,k))),k))
such that
> count
1 2 3 4 5 6 7 8 9
3 1 4 1 5 4 3 2 7
Here is a solution without using unique and with one pass through the vector (if only R was fast with for loops!):
count = list()
for (i in vec) {
val = as.character(i)
if (is.null(count[[val]]))
count[[val]] = 1
else
count[[val]] = count[[val]] + 1
}
unlist(count)
Completely new to R and am trying to count how many numbers in a list are larger than the one right before.
This is what I have so far,
count <- 0
number <- function(value) {
for (i in 1:length(value))
{ if(value[i+1] > value[i])
{count <- count + 1}
}
}
x <- c(1,2,1,1,3,5)
number(x)
The output should be 3 based on the list.
Any help or advice would be greatly appreciated!
A base R alternative would be diff
sum(diff(x) > 0)
#[1] 3
Or we can drop the first value from one copy of the vector and the last value from another, and compare the two element by element.
sum(x[-1] > x[-length(x)])
#[1] 3
where
x[-1]
#[1] 2 1 1 3 5
x[-length(x)]
#[1] 1 2 1 1 3
You can lag your vector and count how many times your initial vector is greater than your lagged vector
library(dplyr)
sum(x>lag(x), na.rm = TRUE)
In detail, lag(x) gives:
> lag(x)
[1] NA 1 2 1 1 3
so x > lag(x) does
> x>lag(x)
[1] NA TRUE FALSE FALSE TRUE TRUE
The sum of the above is 3.
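For completeness, the loop from the question can also be made to work. This is a sketch of one possible fix, with the made-up name count_increases: the counter lives inside the function, the loop stops one element before the end so that value[i + 1] always exists, and the count is returned at the end.

```r
# Hypothetical fixed version of the loop from the question
count_increases <- function(value) {
  count <- 0
  # Stop at the second-to-last element; max() guards against empty input
  for (i in seq_len(max(length(value) - 1, 0))) {
    if (value[i + 1] > value[i]) {
      count <- count + 1
    }
  }
  count  # the last expression is the return value
}

x <- c(1, 2, 1, 1, 3, 5)
result <- count_increases(x)
```

For this x the result is 3, the same as sum(diff(x) > 0).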
Fake data for illustration:
df <- data.frame(a=c(1,2,3,4,5), b=c(2,2,2,2,NA),
c=c(NA,2,3,4,5))
This would get me the answer I want IF it weren't for the NA values:
df$count <- with(df, (a==1) + (b==2) + (c==3))
Also, would there be an even more elegant way if I was only interested in, e.g. variables==2?
df$count <- with(df, (a==2) + (b==2) + (c==2))
Many thanks!
The following works for your specific example, but I have a suspicion that your real use case is more complicated:
df$count <- apply(df,1,function(x){sum(x == 1:3,na.rm = TRUE)})
> df
a b c count
1 1 2 NA 2
2 2 2 2 1
3 3 2 3 2
4 4 2 4 1
5 5 NA 5 0
but this general approach should work. For instance, your second example would be something like this (restricted to the original columns, so that a count column added earlier is not compared as well):
df$count <- apply(df[c("a","b","c")],1,function(x){sum(x == 2,na.rm = TRUE)})
or more generally you could allow yourself to pass in a variable for the comparison:
df$count <- apply(df[c("a","b","c")],1,function(x,compare){sum(x == compare,na.rm = TRUE)},compare = 1:3)
Another way is to subtract your target vector from each row of your data.frame, negate and then do rowSums with na.rm=TRUE:
target <- 1:3
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 2 1 2 1 0
target <- rep(2,3)
rowSums(!(df-rep(target,each=nrow(df))),na.rm=TRUE)
[1] 1 3 1 1 0
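A possibly more readable variant of the same row-wise idea, sketched on the data from the question. Comparing a data frame to a single value gives a logical matrix, so rowSums can count the matches directly; for a different target per column, sweep can apply == column by column. The names count_single and count_target are only illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5),
                 b = c(2, 2, 2, 2, NA),
                 c = c(NA, 2, 3, 4, 5))

# One target value for all columns: count the cells equal to 2 in each row
count_single <- rowSums(df == 2, na.rm = TRUE)

# One target per column: compare column 1 to 1, column 2 to 2, column 3 to 3
target <- 1:3
count_target <- rowSums(sweep(as.matrix(df), 2, target, "=="), na.rm = TRUE)
```

Both calls are run before any count column is added to df, so only a, b and c take part in the comparison.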
I'm not sure how to do this without getting an error. Here is a simplified example of my problem.
Say I have this data frame DF
a b c d
1 2 3 4
2 3 4 5
3 4 5 6
Then I have a variable
x <- min(c(1,2,3))
Now I want to do the following
y <- DF[a == x]
But when I try to refer to some variable like "x" I get an error because R is looking for a column "x" in my data frame. I get the "undefined columns selected" error.
How can I do what I am trying to do in R?
You may benefit from reading an Introduction to R, especially on matrices, data.frames and indexing. Your a is a column of a data.frame, your x is a scalar. The comparison you have there does not work.
Maybe you meant
R> DF$a == min(c(1,2,3))
[1] TRUE FALSE FALSE
R> DF[,"a"] == min(c(1,2,3))
[1] TRUE FALSE FALSE
R>
which tells you that the first row fits but not the other two. Wrapping this in which() gives you indices instead.
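Putting those pieces together, a minimal sketch that rebuilds the data frame from the question (the names matches, idx and y are only illustrative). Note the comma in the last line: the logical vector goes in the row position, and the empty column position means "all columns".

```r
DF <- data.frame(a = 1:3, b = 2:4, c = 3:5, d = 4:6)
x <- min(c(1, 2, 3))

# Refer to the column explicitly; a bare `a` is not looked up inside DF
matches <- DF$a == x

# which() turns the logical vector into row indices
idx <- which(matches)

# Rows where the condition holds, all columns
y <- DF[matches, ]
```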
I think this is what you're looking for:
> x <- min(DF$a)
> DF[DF$a == x,]
a b c d
1 1 2 3 4
An easier way (avoiding the 'x' variable) would be this:
> DF[which.min(DF$a),]
a b c d
1 1 2 3 4
or this:
> subset(DF, a==min(a))
a b c d
1 1 2 3 4
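One difference between these options that may matter: which.min returns only the position of the first minimum, while the == comparison keeps every row that ties for the minimum. A small sketch on made-up data with a tie:

```r
DF <- data.frame(a = c(1, 2, 1), b = c(2, 3, 4))  # made-up data, minimum occurs twice

# Keeps both rows where a equals the minimum
all_min <- DF[DF$a == min(DF$a), ]

# Keeps only the first such row
first_min <- DF[which.min(DF$a), ]
```

In the example from the question the minimum is unique, so all three approaches give the same single row.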