rep function strange error - r

When I perform:
a <- seq(1,1.5,0.1)
b <- c(1,1.1,1.4,1.5)
x <- rep(c(a,b),times=c(2,1))
Error in rep(c(a, b), c(2, 1)) : invalid 'times' argument
Why?

When two vectors are concatenated with c(), the result is a single vector, here of length 10, so times=c(2,1) matches neither length 1 nor the length of that vector. If the idea is to replicate 'a' twice and 'b' once, place them in a list and use rep. The output is a list, which can be unlisted to get a vector.
unlist(rep(list(a,b), c(2,1)))
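As a minimal sketch of the point above (the names ab and err_msg are only illustrative), the following compares the failing call with the list-based one. It assumes nothing beyond base R.

```r
a <- seq(1, 1.5, 0.1)
b <- c(1, 1.1, 1.4, 1.5)

# c(a, b) is one flat vector of length 10, so a 'times' vector of
# length 2 matches neither length 1 nor length(x)
ab <- c(a, b)
err_msg <- tryCatch(rep(ab, times = c(2, 1)),
                    error = function(e) conditionMessage(e))

# With a list, 'times' has one entry per list element
x <- unlist(rep(list(a, b), times = c(2, 1)))
length(x)                  # 16, i.e. 2 * 6 + 1 * 4
identical(x, c(a, a, b))   # TRUE
```

Here err_msg holds the error text instead of stopping the script, which makes it easy to inspect.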

The accepted answer is already perfect; here is an alternative using mapply:
unlist(mapply(function(x,n)rep(x,n),list(a,b),c(2,1)))

Related

apply() works not as expected

I'm trying to get a hold on how the apply function works. Here is what I tried:
df = data.frame(x=c(1,2,3,4,5), x2=c(1,2,3,4,5))
apply(df$x2, 2, function(x) x*2) #doesn't work
apply(df["x2"], 2, function(x) x*2) #works
apply(df[,2], 2, function(x) (x*2)) #doesn't work
apply(df[2], 2, function(x) x*2) #works (surprisingly)
apply(df[2,], 1, function(x) x*2) #works, but gives me vertical vector
apply(df[2,], 2, function(x) x*2) #works; this gives me the output I expected in line above
Questions (as indicated by comments):
Why doesn't line 2 work although line 3 does?
Why can I use [2,] to refer to row 2 (line 6), but cannot use [,2] to refer to column 2 (line 4), but have to use [2] (line 5) instead?
In line 6 I expected to get what I got from line 7: row 2 (with double values) in a row. Why didn't I get this from line 6, I indicated row with MARGIN=2?
apply needs to be used on something whose dim attribute has positive length. For simplicity, think of an object that has rows and columns.
That's why MARGIN takes 1 or 2, standing for the row-wise and column-wise operation.
Check your input values like this:
dim(df["x2"])
dim(df[,2]) #this is null, so it does not work
df[,2] gives you a vector, the same as df$x2. A vector does not have rows and columns, so it does not work with apply.
In order to understand what you are doing wrong:
Type ?"[" into your console and read everything. Also play around... what you are already doing!
Have a closer look at the drop argument.
Lastly, with df[2,] you're subsetting a single row. It's still a data frame.
Check dim(df[2,])
apply(df[2,], 1, function(x) x*2) #works, but gives me vertical vector
apply(df[2,], 2, function(x) x*2) #works; this gives me the output I expected in line above
The reason you don't get the same output is the whole reason why apply exists. Please read ?apply to understand.
When you have questions after reading the two mentioned resources, feel free to ask more.
Here is a little example:
m <- matrix(1:9,nrow=3)
m
apply(m,1,max) #row-wise max value
apply(m,2,max) #col-wise max value
The problem is subsetting:
First:
df$x2 and df[, 2] are different from df["x2"] and df[2], as the former return a numeric vector, the latter return a data.frame.
Second:
df[2, ] returns the second row of your data.frame. If you use MARGIN = 1 you go through the rows, each row is represented as a (named) vector of length equal to the number of columns in your data.frame.
If you use MARGIN = 2 you go through the columns, again, each column is represented as a (named) vector of length equal to the number of rows in your data.frame.
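The subsetting difference described above can be checked directly. This is a small sketch using the df from the question; the name doubled is only illustrative.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))

# These two forms drop to a plain numeric vector: no dim attribute
is.null(dim(df$x2))          # TRUE
is.null(dim(df[, 2]))        # TRUE

# These forms keep the data.frame structure: 5 rows, 1 column
dim(df["x2"])                # 5 1
dim(df[, 2, drop = FALSE])   # 5 1

# With drop = FALSE the [, 2] form can be passed to apply
doubled <- apply(df[, 2, drop = FALSE], 2, function(x) x * 2)
dim(doubled)                 # 5 1
```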
Why doesn't line 2 work although line 3 does?
df$x2 is a vector, i.e. c(1,2,3,4,5), whereas df["x2"] is a data frame with just one column. The vector has no second dimension to apply over. See ?'[' in R for details of how subsetting works; this isn't really related to the apply function.
Why can I use [2,] to refer to row 2 (line 6), but cannot use [,2] to refer to column 2 (line 4), but have to use [2] (line 5) instead?
Again, see the subsetting help page, but df[,2,drop=FALSE] is probably what you need.
In line 6 I expected to get what I got from line 7: row 2 (with double values) in a row. Why didn't I get this from line 6, I indicated row with MARGIN=2?
The value section of ?apply explains the dimensions that you can expect as output from a call to apply:
If each call to FUN returns a vector of length n, then apply returns an array of dimension c(n, dim(X)[MARGIN]) if n > 1. If n equals 1, apply returns a vector if MARGIN has length 1 and an array of dimension dim(X)[MARGIN] otherwise.
In this case we see that:
> dim(df[2,])
# [1] 1 2
and so:
apply(df[2,], 1, function(x) x*2)
has n=2 and dim(df[2,])[1]=1, so you should expect an output with dimensions c(2,1).
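The dimension rule quoted above can be seen in a short sketch with the same df (the names row2, by_row and by_col are only illustrative). With MARGIN=1 there is one call that returns two values, so the result is a 2 x 1 array; with MARGIN=2 there are two calls that each return one value, so the result is a plain vector.

```r
df <- data.frame(x = c(1, 2, 3, 4, 5), x2 = c(1, 2, 3, 4, 5))
row2 <- df[2, ]
dim(row2)      # 1 2

# MARGIN = 1: one row, FUN returns a vector of length n = 2
by_row <- apply(row2, 1, function(x) x * 2)
dim(by_row)    # 2 1

# MARGIN = 2: two columns, FUN returns a vector of length n = 1 each time
by_col <- apply(row2, 2, function(x) x * 2)
dim(by_col)    # NULL: a plain vector

# Transposing the MARGIN = 1 result gives the row layout
dim(t(by_row)) # 1 2
```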
You should look at each type and dimension of the expression
> typeof(df$x2)
[1] "double"
> dim(df$x2)
NULL
> typeof(df["x2"])
[1] "list"
> dim(df["x2"])
[1] 5 1
> typeof(df[, 2])
[1] "double"
> dim(df[, 2])
NULL
> typeof(df[2])
[1] "list"
> dim(df[2])
[1] 5 1
> typeof(df[2, ])
[1] "list"
> dim(df[2,])
[1] 1 2
Line 2 does not work because you try to apply a function to a variable that has a NULL dimension (dim(X) must have positive length). The rest is similar. You must pay attention to the type of the expression passed to apply. I recommend simply printing the values to check whether they are suitable for the apply function.

What does the small x mean in lapply

I have the variables:
trims<- c(0,0.1,0.2,0.5)
x<-rcauchy(100)
and the following operation:
lapply(trims, mean, x=x)
What does the small x refer to in this case? The documentation for lapply does not explain it well either. I know that lapply takes a function and applies it to each element of the list, which I believe is trims in this case. How does x come in then?
If we use anonymous function, it will be clear.
res <- lapply(trims, function(y) mean(x, trim=y))
res1 <- lapply(trims, mean, x=x)
identical(res, res1)
#[1] TRUE
lapply loops through each element of 'trims'. mean has x as its first argument and trim as its second. Because the first argument is already supplied by name with x=x (the object created with rcauchy), each value from 'trims' is matched to the next unfilled argument, which is trim.
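The argument matching described above can be made visible with a toy function. This is only a sketch: show_args is a hypothetical helper that returns its arguments, standing in for mean.

```r
# Hypothetical stand-in for mean(): same first two argument names
show_args <- function(x, trim = 0) c(x = x, trim = trim)

# x is fixed by name, so each element of the first vector
# is matched to the next unfilled argument, trim
res <- lapply(c(0.1, 0.5), show_args, x = 99)
res[[1]]   # x = 99, trim = 0.1
res[[2]]   # x = 99, trim = 0.5
```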

Using apply() over columns to output subsets

I have a data frame in R where the majority of columns are values, but there is one character column. For each column excluding the character column I want to subset the values that are over a threshold and obtain the corresponding value in the character column.
I'm unable to find a built-in dataset that contains the pattern of data I want, so a dput of my data can be accessed here.
When I use subsetting, I get the output I'm expecting:
> df[abs(df$PA3) > 0.32,1]
[1] "SSI_01" "SSI_02" "SSI_04" "SSI_05" "SSI_06" "SSI_07" "SSI_08" "SSI_09"
When I try to iterate over the columns of the data frame using apply, I get a recursion error:
> apply(df[2:10], 2, function(x) df[abs(df[[x]])>0.32, 1])
Error in .subset2(x, i, exact = exact) :
recursive indexing failed at level 2
Any suggestions where I'm going wrong?
The reason your solution didn't work is that the x being passed to your user-defined function is actually a column of df. Therefore, you could get your solution working with a small modification (replacing df[[x]] with x):
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
You could use the ... argument to apply to pass an extra argument. In this case, you would want to pass the first column:
apply(df[2:10], 2, function(x, y) y[abs(x) > 0.32], y=df[,1])
Yet another variation:
apply(abs(df[-1]) > .32, 2, subset, x=df[[1]])
The cute trick here is to "curry" subset by specifying the x parameter. I was hoping I could do it with [ but that doesn't deal with named parameters in the typical way because it is a primitive function :..(
A quick and non-sophisticated solution might be:
sapply(2:10, function(x) df[abs(df[,x])>0.32, 1])
Try:
lapply(df[,2:10],function(x) df[abs(x)>0.32, 1])
Or using apply:
apply(df[2:10], 2, function(x) df[abs(x)>0.32, 1])
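Since the original data is not available here, the following sketch uses a small made-up data frame (the column names id, PA1 and PA2 and all values are assumptions) to show that the function receives the column values themselves, not column names or indices.

```r
# Hypothetical data: one character column followed by numeric columns
df <- data.frame(id  = c("SSI_01", "SSI_02", "SSI_03"),
                 PA1 = c(0.5, -0.1, 0.4),
                 PA2 = c(0.2, -0.6, 0.1),
                 stringsAsFactors = FALSE)

# x is a numeric vector holding one column's values,
# so it can be used directly to build the logical index
hits <- lapply(df[-1], function(x) df[abs(x) > 0.32, 1])
hits$PA1   # "SSI_01" "SSI_03"
hits$PA2   # "SSI_02"
```

lapply is used in this sketch because it always returns a list, which suits results of different lengths per column.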

R apply statement does not work with a matrix

I'd like to know the reason why the following does not work on the matrix structure I have posted here (I've used the dput command).
When I try running:
apply(mymatrix, 2, sum)
I get:
Error in FUN(newX[, i], ...) : invalid 'type' (list) of argument
However, when I check to make sure it's a matrix I get the following:
is.matrix(mymatrix)
[1] TRUE
I realize that I can get around this problem by unlisting the data into a temp variable and then just recreating the matrix, but I'm curious why this is happening.
?is.matrix says:
'is.matrix' returns 'TRUE' if 'x' is a vector and has a '"dim"'
attribute of length 2 and 'FALSE' otherwise.
Your object is a list with a dim attribute. A list is a type of vector (even though it is not an atomic type, which is what most people think of as vectors), so is.matrix returns TRUE. For example:
> l <- as.list(1:10)
> dim(l) <- c(10,1)
> is.matrix(l)
[1] TRUE
To convert mymatrix to an atomic matrix, you need to do something like this:
mymatrix2 <- unlist(mymatrix, use.names=FALSE)
dim(mymatrix2) <- dim(mymatrix)
# now your apply call will work
apply(mymatrix2, 2, sum)
# but you should really use (if you're really just summing columns)
colSums(mymatrix2)
The elements of your matrix are not numeric; they are lists. To see this, you can do:
apply(m,2, class) # here m is your matrix
So if you want the column sum you have to 'coerce' them to be numeric and then apply colSums which is a shortcut for apply(x, 2, sum)
colSums(apply(m, 2, as.numeric)) # this will give you the sum you want.
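A quick way to spot this situation is to check typeof in addition to is.matrix. The sketch below builds a small list-matrix by hand (m and m_num are illustrative names, not the asker's data) and converts it to an atomic matrix.

```r
# A list with a dim attribute still passes is.matrix()
m <- matrix(list(1, 2, 3, 4), nrow = 2)
is.matrix(m)   # TRUE
typeof(m)      # "list"

# Rebuild it as an atomic numeric matrix with the same shape
m_num <- matrix(unlist(m), nrow = nrow(m), dimnames = dimnames(m))
typeof(m_num)  # "double"
colSums(m_num) # 3 7
```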

How to plot values from atomic vectors (matrix) in R

I am having a problem plotting and accessing the following matrix I have created.
Here I have created a version everyone may follow w/o my data.
a<-rnorm(10,0,1)
b<-rnorm(10,2,1)
J<-matrix(0,10,2)
colnames(J)<-c("a","b")
J[,1]<-a
J[,2]<-b
I then wish to plot, but I get error messages I do not understand:
with(J,plot(a,b))
Error in eval(substitute(expr), data, enclos = parent.frame()) :
  numeric 'envir' arg not of length one
and with
plot(J$a,J$b)
Error in J$a: $ operator is invalid for atomic vectors
Does anyone have any idea?
Kind Regards from GER
It would work if J were defined as a data.frame, with columns a and b:
a<-rnorm(10,0,1)
b<-rnorm(10,2,1)
J <- data.frame(a,b)
with(J,plot(a,b))
$ only works with list objects (including data.frame). If you stick with the matrix, then you grab from the columns using brackets with indices or names:
J <- cbind(a,b)
plot(J[,1],J[,2])
plot(J[,"a"],J[,"b"])
In your case, where you have a 2 column matrix J
plot(J)
will work as will
plot(J[,'a'], J[,'b'])
The `$` operator is not defined for matrices, but is for lists or data.frames
with will not work with matrices because a matrix cannot be an environment or an enclosure
plot(J[,1], J[,2])
and
with(as.data.frame(J), plot(a,b))
both work
If you want to access the columns of a matrix by their names:
plot(J[ , colnames(J) %in% c("a", "b")])
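The difference between the two containers can be checked without plotting. In this sketch (set.seed and the names msg and J_df are additions for illustration), `$` fails on the matrix but works after conversion to a data frame, and both give the same column values.

```r
set.seed(1)  # only so the sketch is reproducible
J <- cbind(a = rnorm(10, 0, 1), b = rnorm(10, 2, 1))

is.list(J)   # FALSE: a matrix is an atomic vector with dims

# Capture the error from `$` instead of stopping
msg <- tryCatch(J$a, error = function(e) conditionMessage(e))

# After conversion, `$` works and matches bracket access on the matrix
J_df <- as.data.frame(J)
identical(J_df$a, J[, "a"])   # TRUE
```

Either J[, "a"] and J[, "b"] or J_df$a and J_df$b can then be passed to plot().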
