Why does using the colon operator to generate a sequence give different results from using seq(from, to) when the starting number is less than 0?
For example:
seq1 = seq(-1,10)
returns
-1 0 1 2 3 4 5 6 7 8 9 10
whereas
seq = seq(-1:10)
returns
1 2 3 4 5 6 7 8 9 10 11 12
I'm not sure what you are expecting with seq(-1:10). The : operator is itself a shortcut for seq(), so that call is the same as seq(seq(-1, 10)), which is also the same as
x <- -1:10
seq(x)
When you pass only a single argument to seq() and that argument has a length greater than 1, it returns a sequence of the same length as that vector, starting at one. Basically it behaves like seq_along in that case. See the ?seq help page for more info. See also
seq(c("a","b","c"))
#[1] 1 2 3
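To make the difference concrete, here is a minimal sketch that puts the two calls from the question side by side and compares the one-argument form with seq_along(). The variable names are made up for illustration.

```r
# Two-argument form: an explicit from/to sequence.
a <- seq(-1, 10)

# One-argument form: -1:10 is evaluated first (12 elements),
# and seq() then returns the positions of that vector.
b <- seq(-1:10)

# seq_along() is the explicit way to ask for positions.
c_idx <- seq_along(-1:10)

print(a)
print(b)
print(identical(b, c_idx))
```

If positions are what you want, calling seq_along() directly states the intent more clearly than relying on the one-argument behaviour of seq().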
I have a vector X that contains positive numbers that I want to bin/discretize. For this vector, I want the numbers [0, 10) to show up just as they exist in the vector, but numbers [10,∞) to be 10+.
I'm using:
x <- c(0,1,3,4,2,4,2,5,43,432,34,2,34,2,342,3,4,2)
binned.x <- as.factor(ifelse(x > 10,"10+",x))
but this feels klugey to me. Does anyone know a better solution or a different approach?
How about cut:
binned.x <- cut(x, breaks = c(-1:9, Inf), labels = c(as.character(0:9), '10+'))
Which yields:
# [1] 0 1 3 4 2 4 2 5 10+ 10+ 10+ 2 10+ 2 10+ 3 4 2
# Levels: 0 1 2 3 4 5 6 7 8 9 10+
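Note that the breaks above rely on x holding whole numbers: cut() uses right-closed intervals by default, so a value such as 9.5 would fall in (9, Inf] and be labelled 10+. A hedged variant with left-closed intervals, assuming fractional values should be binned by their integer part (x_frac is a made-up vector, not from the question):

```r
# Hypothetical input with fractional values.
x_frac <- c(0, 0.5, 3, 9.5, 10, 43)

# right = FALSE gives intervals [0,1), [1,2), ..., [9,10), [10,Inf),
# so 9.5 stays in bin "9" and 10 goes to "10+".
binned <- cut(x_frac,
              breaks = c(0:10, Inf),
              right  = FALSE,
              labels = c(as.character(0:9), "10+"))
print(binned)
```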
Your question is inconsistent.
In the description, 10 belongs to the "10+" group, but in the code 10 is a separate level.
If 10 should be in the "10+" group, then your code should be
as.factor(ifelse(x >= 10,"10+",x))
In this case you could truncate data to 10 (if you don't want a factor):
pmin(x, 10)
# [1] 0 1 3 4 2 4 2 5 10 10 10 2 10 2 10 3 4 2
x[x>=10]<-"10+"
This will give you a vector of strings. You can use as.numeric(x) to convert back to numbers ("10+" becomes NA), or as.factor(x) to get your result above.
Note that this will modify the original vector itself, so you may want to copy to another vector and work on that.
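A small sketch of the copy-first approach, using the vector from the question. The names binned and binned_f are made up here, and the explicit levels argument is an added assumption so that "10+" sorts last rather than alphabetically:

```r
x <- c(0, 1, 3, 4, 2, 4, 2, 5, 43, 432, 34, 2, 34, 2, 342, 3, 4, 2)

# Work on a copy so the original numeric vector is left untouched.
binned <- x
binned[x >= 10] <- "10+"   # the assignment coerces the copy to character

# Explicit levels keep the bins in numeric order, with "10+" last.
binned_f <- factor(binned, levels = c(as.character(0:9), "10+"))
print(binned_f)
```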
I have a function that fetches some data from a database. It takes a single parameter and returns a data.frame. I would like to take an input vector of these parameters and pipe it to map or a similar function that takes each element and returns the db results. The results can differ in number of rows, but the columns are always the same. How do I do this without looping and row-binding (for i in ..)?
I tried the following route:
library(dplyr)  # %>% and select()
library(purrr)  # map_dfr()

myfuncSingleRow <- function(nbr) {
  data.frame(a = nbr, b = nbr^2, c = nbr^3)
}
myfuncMultipleRow <- function(nbr) {
  data.frame(a = rep(nbr, 3), b = rep(nbr^2, 3), c = rep(nbr^3, 3))
}
a <- data.frame(count = c(1, 2, 3))
myfuncSingleRow(2)
myfuncMultipleRow(2)
a %>% select(count) %>% map_dfr(.f = myfuncSingleRow)   # output as expected
a %>% select(count) %>% map_dfr(.f = myfuncMultipleRow) # output not as expected
This does not work as intended. With myfuncMultipleRow, I was expecting the first 3 rows to be equal, the next 3 to be equal, and likewise for the final 3.
Getting:
a b c
1 1 1 1
2 2 4 8
3 3 9 27
4 1 1 1
5 2 4 8
6 3 9 27
7 1 1 1
8 2 4 8
9 3 9 27
Wanting:
a b c
1 1 1 1
2 1 1 1
3 1 1 1
4 2 4 8
5 2 4 8
6 2 4 8
7 3 9 27
8 3 9 27
9 3 9 27
As usual, I am probably not using the functions correctly, but I am a bit stuck here and do not want to resort to the old loop and rbind, which would probably be a performance bottleneck. Any takers?
EDIT: As pointed out, the "each" argument of "rep" does solve this example, but it does not solve the main issue. If map iterated and called the function once per element, then the "each" and "times" arguments of "rep" would yield the same result. The function passed to map is not vectorized; it assumes a single parameter of length 1.
The solution needs to do the equivalent of:
res<-data.frame()
for(i in a$count) res<-rbind(res,myfuncMultipleRow(i))
After upgrading to the latest purrr, 0.3.0 (I was on an older version), map_depth pointed me in the right direction.
a %>% select(count)%>% map_depth(.depth=2,.f=myfuncMultipleRow) %>% map_dfr(.f=bind_rows)
Dropping map_depth() and bind_rows() and nesting map_dfr() instead:
a %>% select(count)%>% map_dfr(~map_dfr(.,myfuncMultipleRow))
a %>% select(count)%>% map_dfr(.f=function(x) map_dfr(x,.f=myfuncMultipleRow))
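The behaviour in the question follows from a data frame being a list of columns: map-style functions iterate over its columns, so the function receives the whole count vector in one call. Iterating over the column itself calls the function once per value. Below is a minimal sketch of that idea, reusing myfuncMultipleRow from the question. It uses base R lapply() with do.call(rbind, ...) in place of purrr, which is a swap made here so the example runs without extra packages.

```r
myfuncMultipleRow <- function(nbr) {
  data.frame(a = rep(nbr, 3), b = rep(nbr^2, 3), c = rep(nbr^3, 3))
}
a <- data.frame(count = c(1, 2, 3))

print(length(a))        # number of columns in the data frame
print(length(a$count))  # number of values in the column

# One call per value of the column, then stack the pieces.
pieces <- lapply(a$count, myfuncMultipleRow)
res <- do.call(rbind, pieces)
print(res)
```

With purrr loaded, map_dfr(a$count, myfuncMultipleRow) should be the analogous call, since it also iterates over the vector rather than over the data frame.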
I have a task to multiply the numbers in a vector, but only those that are divisible by 3 (remainder 0). I figured out how to replace certain elements in a vector, but it works only if I replace them with one fixed number. I wasn't able to find an answer at http://www.r-tutor.com/r-introduction/vector or on this site. Everyone only extracts values into another vector.
x <- c(1,1,2,2,2,3,3)
x[x%%2==0] = 5
# [1] 1 1 5 5 5 3 3
Why doesn't this work?
x[x%%3==0] = x*3
I expect to get this:
c(1,1,5,5,5,9,9)
The vectors on the lhs and rhs of the assignment operator do not have the same length.
length(x*3)
#[1] 7
length(x[x%%3 ==0])
#[1] 2
We need to do
x[x%%3==0] <- x[x%%3==0]*3
x
#[1] 1 1 5 5 5 9 9
Instead of repeating the logical vector, we can store it in an object and then do the substitution
i1 <- x%%3 == 0
x[i1] <- x[i1]*3
In the first assignment, the right-hand side was a single value, so it was recycled to every position where the logical condition is met.
Another option, applied to the original x and valid only for non-negative values, is
pmax(x, x*(!x%%3)*3)
#[1] 1 1 5 5 5 9 9
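The length mismatch and the fix can be checked in a few lines. This sketch also shows ifelse() as a further alternative, which is an addition not taken from the answer above; the names idx, y and z are made up for illustration.

```r
x <- c(1, 1, 5, 5, 5, 3, 3)
idx <- x %% 3 == 0          # TRUE where x is divisible by 3

print(length(x[idx]))       # positions being replaced
print(length(x * 3))        # values offered by the failing assignment

# Index both sides with the same logical vector so the lengths match.
y <- x
y[idx] <- y[idx] * 3

# ifelse() picks element-wise between x * 3 and x.
z <- ifelse(idx, x * 3, x)

print(y)
print(z)
```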
Suppose I have the following data.
x<- c(1,2, 3,4,5,1,3,8,2)
y<- c(4,2, 5,6,7,6,7,8,9)
data<-cbind(x,y)
x y
1 1 4
2 2 2
3 3 5
4 4 6
5 5 7
6 1 6
7 3 7
8 8 8
9 2 9
Now, if I subset this data to select only the observations with "x" between 1 and 3 I can do:
s1<- subset(data, x>=1 & x<=3)
and obtain my desired output:
x y
1 1 4
2 2 2
3 3 5
4 1 6
5 3 7
6 2 9
However, if I subset using the colon operator I obtained a different result:
s2<- subset(data, x==1:3)
x y
1 1 4
2 2 2
3 3 5
This time it only includes the first observation in which "x" was 1,2, or 3. Why?
I would like to use the ":" operator because I am writing a function so the user would input a range of values from which she wants to see an average calculated over the "y" variable. I would prefer if they can use ":" operator to pass this argument to the subset function inside my function but I don't know why subsetting with ":" gives me different results.
I'd appreciate any suggestions in this regard.
You can use %in% instead of ==
subset(data, x %in% 1:3)
In general, when comparing two vectors of unequal length, use %in%. With ==, R recycles the shorter vector and compares position by position, which is why only the first three rows matched. Recycling can sometimes be used to advantage (it can fail too) when the length of one vector is a multiple of the length of the other. Some examples with descriptions are here.
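A short sketch of the two comparisons on the data from the question, followed by a hypothetical helper (mean_y_for is a made-up name) of the kind the question describes, where the caller passes the values with the : operator:

```r
x <- c(1, 2, 3, 4, 5, 1, 3, 8, 2)
y <- c(4, 2, 5, 6, 7, 6, 7, 8, 9)
dat <- data.frame(x, y)

# What == actually compares x against: 1:3 recycled to length 9.
recycled <- rep(1:3, length.out = length(x))
eq_mask <- x == 1:3
in_mask <- x %in% 1:3

print(recycled)
print(sum(eq_mask))   # rows kept by ==
print(sum(in_mask))   # rows kept by %in%

# Hypothetical helper: mean of y over the rows whose x is in `values`.
mean_y_for <- function(d, values) {
  mean(d$y[d$x %in% values])
}
print(mean_y_for(dat, 1:3))
```

Keep in mind that %in% 1:3 matches only the exact values 1, 2 and 3. If x can hold non-integer values and a true range is intended, the x >= 1 & x <= 3 form from the question is the safer choice.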