I am trying to combine multiple sequences, such as 'large numeric' or 'large character' vectors, into a single vector of the same type, while keeping the duplicates and without changing the order of the elements.
union() does almost what I want:
x <- c(0,1,6,2,3,4,5)
y <- c(6,0,0,1,3,0,4,5,1,3,-1)
z <- union(x,y)
z
#results in
#[1] 0 1 6 2 3 4 5 -1
#but what I need is:
#[1] 0 1 6 2 3 4 5 6 0 0 1 3 0 4 5 1 3 -1
Since x and y can become huge (up to millions of values), a loop would probably be too slow.
R has a lot of functions for combining all kinds of data, so hours of searching did not turn up what I need, but the solution can't be that hard to find (frustration).
We can concatenate the vectors with c():
c(x, y)
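As a quick check on the vectors from the question, the sketch below compares the two calls. The names z_union and z_concat are made up for illustration; unlist() is shown as one possible way to handle vectors that are already collected in a list.

```r
x <- c(0, 1, 6, 2, 3, 4, 5)
y <- c(6, 0, 0, 1, 3, 0, 4, 5, 1, 3, -1)

z_union  <- union(x, y)  # set union: duplicates are dropped
z_concat <- c(x, y)      # concatenation: every element kept, order preserved

length(z_union)   # 8
length(z_concat)  # 18

# If the vectors are stored in a list, unlist() concatenates them the same way
z_list <- unlist(list(x, y), use.names = FALSE)
```

c() is vectorized and does not loop in R code, so it should scale to vectors with millions of values.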
I have a vector X that contains positive numbers that I want to bin/discretize. For this vector, I want the numbers [0, 10) to show up just as they exist in the vector, but numbers [10,∞) to be 10+.
I'm using:
x <- c(0,1,3,4,2,4,2,5,43,432,34,2,34,2,342,3,4,2)
binned.x <- as.factor(ifelse(x > 10,"10+",x))
but this feels klugey to me. Does anyone know a better solution or a different approach?
How about cut:
binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), '10+'))
Which yields:
# [1] 0 1 3 4 2 4 2 5 10+ 10+ 10+ 2 10+ 2 10+ 3 4 2
# Levels: 0 1 2 3 4 5 6 7 8 9 10+
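A variant of the same idea, as a sketch: with right = FALSE the intervals are closed on the left, so the breaks can be written as [0,1), [1,2), ..., [10,Inf), which matches the notation in the question. The name binned is hypothetical. Note that a non-integer such as 2.5 would land in the "2" bin here.

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# Left-closed intervals: [0,1), [1,2), ..., [9,10), [10,Inf)
binned <- cut(x,
              breaks = c(0:10, Inf),
              right  = FALSE,
              labels = c(as.character(0:9), "10+"))

table(binned)
```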
Your question is inconsistent.
In the description, 10 belongs to the "10+" group, but in the code 10 is a separate level.
If 10 should be in the "10+" group, then your code should be
as.factor(ifelse(x >= 10,"10+",x))
In this case you could truncate data to 10 (if you don't want a factor):
pmin(x, 10)
# [1] 0 1 3 4 2 4 2 5 10 10 10 2 10 2 10 3 4 2
x[x>=10]<-"10+"
This will give you a character vector. You can use as.numeric(x) to convert back to numbers ("10+" becomes NA, with a warning), or as.factor(x) to get your result above.
Note that this will modify the original vector itself, so you may want to copy to another vector and work on that.
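A minimal sketch of the copy-first approach, assuming the values below 10 are the integers 0 to 9. The name labels_x is made up for illustration.

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

labels_x <- as.character(x)   # work on a copy; x stays numeric
labels_x[x >= 10] <- "10+"    # the comparison uses the untouched numeric x

# Fixing the levels keeps them in numeric order rather than alphabetical
binned.x <- factor(labels_x, levels = c(as.character(0:9), "10+"))
```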
NOTE: I technically know how to do this, but I feel like there has to be a "nicer" way to do this. If such questions are not allowed here just delete it, but I would really like to improve my R style, so any suggestions are welcome.
I have a dataframe data <- data.frame(foo=rep(c(-1,2),5))
foo
1 -1
2 2
3 -1
4 2
5 -1
6 2
7 -1
8 2
9 -1
10 2
Now I would like to be able to set the entries of foo to a certain value (for this example, let's say 1) if the current entry is smaller than that value.
So my desired output would be
foo
1 1
2 2
3 1
4 2
5 1
6 2
7 1
8 2
9 1
10 2
I feel like there should be something like data$foo <- max(data$foo,1) that does the job (but ofc, it "maxes" over the whole column).
Is there an elegant way to do this?
data$foo <- ifelse(data$foo < 1,1,data$foo) and data$foo <- lapply(data$foo,function(x) max(1,x)) just feel somewhat "ugly".
max gives you the maximum of the whole column, but here you need pmax (parallel maximum), which returns, for each element of the vector, the larger of that element and 1.
data$foo <- pmax(data$foo, 1)
data
# foo
#1 1
#2 2
#3 1
#4 2
#5 1
#6 2
#7 1
#8 2
#9 1
#10 2
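To make the difference concrete, here is a small sketch on a plain vector with the same values as the column in the question:

```r
foo <- rep(c(-1, 2), 5)

max(foo, 1)    # one number: the largest value among all of foo and 1
pmax(foo, 1)   # one number per element: the larger of foo[i] and 1

# pmin() is the counterpart for capping values from above
pmin(foo, 1)
```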
This works:
data <- data.frame(foo=rep(c(-1,2),5))
val <- 1
data[data$foo < val, "foo"] <- val
Let's break this down. data$foo extracts the column as a vector. data$foo < val checks which elements of this vector are smaller than val, creating a logical vector of the same length with TRUE and FALSE at the corresponding positions.
Finally, the entire line data[data$foo < val, "foo"] <- val uses that vector of TRUE and FALSE to select the rows of data (the part before the comma in [ , ]) and assigns val to the foo column in those rows. Naming the column keeps the assignment from overwriting other columns if the data frame has more than one.
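The same logical-index idea can be sketched on a data frame with a second column. The column bar and the name below are hypothetical, added only to show that indexing into data$foo leaves the other columns alone.

```r
data <- data.frame(foo = rep(c(-1, 2), 5),
                   bar = letters[1:10],      # hypothetical extra column
                   stringsAsFactors = FALSE)
val <- 1

below <- data$foo < val    # logical vector, one entry per row
data$foo[below] <- val     # replace only the flagged entries of foo
```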
I have a data frame that contains multiple values in each spot, like this:
library(plyr)
ID <- c(1,1,1,2,2,2,2,3,3,4,4,4,5,6,6)
W <- c(29,72,32,33,34,44,42,78,32,42,18,26,10,34,39)
df1 <- data.frame(ID, W)
df <- ddply(df1, .(ID), summarize,
            X = paste(unique(W), collapse = ","))
  ID           X
1  1    29,72,32
2  2 33,34,44,42
3  3       78,32
4  4    42,18,26
5  5          10
6  6       34,39
I am trying to generate another column using an if-else function so that every ID that has an X value greater than 70 will show a 1, and all others will show a 0, like this:
  ID           X Y
1  1    29,72,32 1
2  2 33,34,44,42 0
3  3       78,32 1
4  4    42,18,26 0
5  5          10 0
6  6       34,39 0
This is the code that I tried:
df$Y <- ifelse(df$X>=70, 1, 0)
But it doesn't work; it only seems to put the first value of each spot through the function:
  ID           X Y
1  1    29,72,32 0
2  2 33,34,44,42 0
3  3       78,32 1
4  4    42,18,26 0
5  5          10 0
6  6       34,39 0
It worked fine on my one column that has only one value per spot. Is there a way to get to the if-else function to evaluate every value in each spot and assign a 1 if any of them fit the statement?
Thank you, I'm sorry that I do not know a lot of R vocabulary yet.
As 'X' is a string, we can split it at the commas to create a list of vectors, loop over the list with map_int, and check whether any of the values, converted to numeric, is greater than 70.
library(dplyr)
library(purrr)
df %>%
mutate(Y = map_int(strsplit(X, ","), ~ +(any(as.numeric(.x) > 70))))
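For readers who prefer not to load extra packages, a base R sketch of the same split-then-test idea follows. The data frame is rebuilt by hand here so the example is self-contained, and the name parts is made up for illustration.

```r
df <- data.frame(ID = 1:6,
                 X  = c("29,72,32", "33,34,44,42", "78,32",
                        "42,18,26", "10", "34,39"),
                 stringsAsFactors = FALSE)

# One character vector per row, e.g. c("29", "72", "32")
parts <- strsplit(df$X, ",", fixed = TRUE)

# 1 if any value in the row exceeds 70, otherwise 0
df$Y <- vapply(parts,
               function(v) as.integer(any(as.numeric(v) > 70)),
               integer(1))
```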
I have a task to multiply numbers in a vector, but only those that are divisible by 3 (remainder 0). I figured out how to replace certain elements of a vector, but it only works if I replace them with one fixed number. I wasn't able to find an answer at http://www.r-tutor.com/r-introduction/vector or on this site; everyone only extracts values into another vector.
x <- c(1,1,2,2,2,3,3)
x[x%%2==0] = 5
# [1] 1 1 5 5 5 3 3
Why doesn't this work?
x[x%%3==0] = x*3
I expect to get this:
c(1,1,5,5,5,9,9)
The vectors on the lhs and rhs of the assignment operator do not have the same length.
length(x*3)
#[1] 7
length(x[x%%3 ==0])
#[1] 2
We need to do
x[x%%3==0] <- x[x%%3==0]*3
x
#[1] 1 1 5 5 5 9 9
Instead of repeating the logical vector, we can store it in an object and then do the substitution
i1 <- x%%3 == 0
x[i1] <- x[i1]*3
In the first assignment, the right-hand side was a single value, which is recycled to replace every element where the logical condition is met.
Another option is
pmax(x, x*(!x%%3)*3)
#[1] 1 1 5 5 5 9 9
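The sketch below shows what the mismatched assignment actually does and two ways around it. The names broken, fixed and via_ifelse are hypothetical. With a length-7 right-hand side and only two positions to fill, R takes the first two values of x*3 (here 3 and 3) and emits a warning, so in this particular example the vector ends up looking unchanged.

```r
x <- c(1, 1, 5, 5, 5, 3, 3)
idx <- x %% 3 == 0           # TRUE at positions 6 and 7

# Mismatched lengths: the first two elements of x * 3 are used
broken <- x
suppressWarnings(broken[idx] <- x * 3)
broken

# Subset the right-hand side with the same index
fixed <- x
fixed[idx] <- x[idx] * 3
fixed

# ifelse() picks element by element and needs no subsetting
via_ifelse <- ifelse(idx, x * 3, x)
```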
Why is it that I can't assign a value to an entire column of a data frame, and then a single element in the same "within" statement? The code:
foo <- data.frame( a=seq(1,10) )
foo <- within(foo, {
b <- 1 # set all of b to 1
})
foo <- within(foo, {
c <- 1 # set all of c to 1
c[2] <- 20 # set one element to 20
b[2] <- 20
})
foo
Gives:
    a  b  c
1   1  1  1
2   2 20 20
3   3  1  1
4   4  1 20
5   5  1  1
6   6  1 20
7   7  1  1
8   8  1 20
9   9  1  1
10 10  1 20
The value of b is what I expected. The value of c is strange. It seems to do what I expect if the assignment to the entire column (ie b <- 1) is in a different "within" statement than the assignment to a single element (ie b[2] <- 20). But not if they're in the same "within".
Is this a bug, or something I just don't understand about R?
My guess is that the assignments to new columns are done as you "leave" the function. When doing
c <- 1
c[2] <- 20
all you have really created is a vector c <- c(1, 20). When R has to assign this to a new column, the vector is recycled, creating the 1,20,1,20,... pattern you are seeing.
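A small sketch of this explanation, together with one possible workaround: create the new column at full length inside within() before changing a single element. The name short and the use of rep() are illustrative choices, not the only way to do it.

```r
# Outside of within(): indexing past the end grows the vector to length 2
short <- 1
short[2] <- 20
short                          # a length-2 vector: 1 20

# Inside within(): build the column at full length first, then edit it
foo <- data.frame(a = seq(1, 10))
foo <- within(foo, {
  c <- rep(1, length(a))       # ten 1s, one per row
  c[2] <- 20                   # now only the second row changes
})
foo$c
```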
That's an interesting one.
It has to do with the fact that c is defined only up to length 2, and after that the typical R "recycling rule" takes over and repeats c until it matches the length of the data frame. (And as an aside, this only works for whole multiples: you would not be able to replicate a vector of length 3 or 4 in a data frame of ten rows.)
Recycling has its critics. I think it is an asset for a dynamically-typed interpreted language like R, particularly when one wants to interactively explore data. "Expanding" data to fit a container and expression is generally a good thing -- even if it gives the odd puzzle as it does here.