What type does table return in R? - r

I wrote the lines of code below.
I want to get the most frequent value in matrix:
matrix7 <- matrix(sample(1:36, 100, replace = TRUE), nrow = 1)
t <- table(matrix7)
print(t)
a <- which.max(table(matrix7))
print(unlist(a))
it prints this:
> matrix7 <- matrix(sample(1:36, 100, replace = TRUE), nrow = 1)
> t <- table(matrix7)
> print(t)
matrix7
1 2 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 25 26 27 28 29 30 31 32 34 35 36
4 5 1 5 2 5 1 3 1 4 2 2 2 5 5 1 3 7 2 3 2 3 2 1 4 4 2 2 2 5 2 5 3
> a <- which.max(table(matrix7))
> print(unlist(a))
19
18
>
What type are my t and a variables,
and how can I get the most frequent value from the matrix?

To know the "type" of variable use:
class(t)
class(a)
Note that t is a table because you created it with t <- table(matrix7), while your variable a is an integer.
To get the most common element of your matrix, sort the table of its values in decreasing order; the first entry is the most frequent:
sort(table(as.vector(matrix7)), decreasing = TRUE)

In general, if you want to know the "type" (more properly called the class) of an object, use the function class:
> class(t)
[1] "table"
There are a few ways you can find the most frequent value. Given that you have already calculated the which.max, you can take the corresponding name of t:
> as.numeric(names(t)[a])
[1] 5 ## I have a different random number seed to you :)
Note that you can't just take t[a], since that returns the count stored at that position (7 in your output), not the value the count belongs to.
In your example, the object a is an integer vector of length one. The "data" is 18, and it has the "name" 19. Hence another and perhaps simpler way to get the most frequent value is to take names(a).

You can either use class() to get the class attribute of an R object or typeof() to get the type or storage mode.
Class and type of a are 'integer', the class of t is 'table' and the type is 'integer'.
Note that a is a named integer, which is why two values are printed: the name (19) and the data (18). names(a) returns only the name, that is, the most frequent value, as a character string.
If you use which.max(tabulate(matrix7)) it will return the value without the need to change it further.
which.max(tabulate(matrix7))
[1] 16
(Side note: since no seed is set in your code, the result differs between runs; you can fix it with set.seed(x), where x is an integer.)
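As a minimal sketch of the points in the answers above, the following uses a small fixed matrix so that the result does not depend on the random seed. The names m, tab, most_frequent and via_tabulate are assumptions made for this illustration and do not appear in the question.

```r
# Fixed input instead of sample(), so the result is reproducible
m <- matrix(c(3L, 7L, 7L, 2L, 7L, 3L), nrow = 1)
tab <- table(m)

class(tab)    # class attribute of the object
typeof(tab)   # underlying storage mode

# which.max() returns a named integer:
# the name is the value from the matrix, the data is its position in tab
a <- which.max(tab)
most_frequent <- as.numeric(names(a))

# tabulate() counts the bins 1..max(m), so the position equals the value;
# this only works because the values are positive integers
via_tabulate <- which.max(tabulate(m))
```

Both most_frequent and via_tabulate hold the most frequent value itself, while unname(a) is only its position among the distinct values in tab.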

Related

r - lapply divides a column by an integer value from different dataset, unexpected result

I have two data.frames, one with genotype counts and one with a number that I need to normalize my counts from the first dataset.
countsdata=data.frame(genotype1=rep(c(10,20,30,40),each=1),
genotype2=rep(c(100,200,300,400),each=1),
genotype3=rep(c(40,50,60,70),each=1),
genotype4=rep(c(40,50,60,70),each=1)
)
coldata = data.frame(Group =c('genotype1', 'genotype2', 'genotype3', 'genotype4'),
Treatment = rep(c("control","treated"),each = 2),
Norm=rep(c(1,2,5,5)))
I made sure my variables don't have factors
factorsCharacter <- function(d) modifyList(d, lapply(d[, sapply(d, is.factor)],
as.character))
coldata=factorsCharacter(coldata)
Then I checked that lapply loops through my counts one column at a time, and through my coldata that contains the normalization value (Norm). All looks good until I combine the two actions in the same step:
> lapply(coldata['Group'],function(group_i){group_i})
$Group
[1] "genotype1" "genotype2" "genotype3" "genotype4"
> lapply(coldata['Group'],function(group_i){countsdata[,group_i]})
$Group
genotype1 genotype2 genotype3 genotype4
1 10 100 40 40
2 20 200 50 50
3 30 300 60 60
4 40 400 70 70
> lapply(coldata['Group'],function(group_i){as.integer(coldata[coldata$Group==group_i,'Norm'])})
$Group
[1] 1 2 5 5
> lapply(coldata['Group'],function(group_i){
+ countsdata[,group_i]/as.integer(coldata[coldata$Group==group_i,'Norm'])
+ })
$Group
genotype1 genotype2 genotype3 genotype4
1 10 100 40 40
2 10 100 25 25
3 6 60 12 12
4 8 80 14 14
Here the result is not what I was expecting (each column divided by its normalization number). After further inspection I noticed it is normalizing by rows; in other words, it is normalizing across different columns, which shouldn't be the case since I am looping through one column at a time. I am probably missing a basic concept, but I looked through other SO posts and didn't find anything I could use. My goal is to fix the code so it makes the right calculation, but I would also like to understand why the code above is not working. Thanks so much.
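The problem is in using [ and not [[. coldata['Group'] is still a data frame, so instead of looping through each element of the 'Group' column, lapply gets a list of length 1 that holds all the elements at once. The function is therefore called once with all four names: countsdata[, group_i] is the whole data frame and the divisor is the whole Norm vector c(1, 2, 5, 5), which R recycles down the rows, so row i is divided by Norm[i]. To loop over the elements, use coldata[, 'Group'], coldata[['Group']] or coldata$Group.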
countsdataNew <- countsdata
countsdataNew[] <- lapply(coldata[['Group']],function(group_i)
countsdata[,group_i]/coldata$Norm[coldata$Group==group_i])
countsdataNew
# genotype1 genotype2 genotype3 genotype4
#1 10 50 8 8
#2 20 100 10 10
#3 30 150 12 12
#4 40 200 14 14
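The difference between [ and [[ can be checked directly. This is a small sketch that rebuilds coldata from the question; the names n_single, n_double and recycled are only illustrative.

```r
coldata <- data.frame(
  Group = c("genotype1", "genotype2", "genotype3", "genotype4"),
  Norm = c(1, 2, 5, 5),
  stringsAsFactors = FALSE
)

# [ keeps the data frame, so lapply sees one element (the whole column)
n_single <- length(lapply(coldata["Group"], identity))

# [[ extracts the vector, so lapply sees one element per group
n_double <- length(lapply(coldata[["Group"]], identity))

# With the whole Norm vector as divisor, R recycles it down the rows,
# which reproduces the first column of the unexpected output
recycled <- c(10, 20, 30, 40) / coldata$Norm
```

Here n_single is 1 and n_double is 4, and recycled matches the genotype1 column of the unexpected result in the question.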
If the column names in 'countsdata' and the 'Group' column from 'coldata' are in the same order, we can do this easily with Map
Map(`/`, countsdata, coldata$Norm)
Or just replicate the 'Norm' and do a simple division
countsdata/coldata$Norm[col(countsdata)]
Or with sweep
sweep(countsdata, 2, coldata$Norm, "/")
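As a rough check that the three shortcuts agree, here is a sketch using the data from the question. It assumes the columns of countsdata are in the same order as the normalization values; the names norm, by_map, by_col, by_sweep and same are made up for this example.

```r
countsdata <- data.frame(
  genotype1 = c(10, 20, 30, 40),
  genotype2 = c(100, 200, 300, 400),
  genotype3 = c(40, 50, 60, 70),
  genotype4 = c(40, 50, 60, 70)
)
norm <- c(1, 2, 5, 5)  # assumed to be in the same order as the columns

# Map returns a list, so convert it back to a data frame
by_map   <- as.data.frame(Map(`/`, countsdata, norm))
# col() gives the column index of every cell, used to expand norm
by_col   <- countsdata / norm[col(countsdata)]
# sweep divides along margin 2 (columns)
by_sweep <- sweep(countsdata, 2, norm, "/")

# Compare the numeric contents only, ignoring attributes
same <- all(unname(as.matrix(by_map)) == unname(as.matrix(by_col))) &&
  all(unname(as.matrix(by_col)) == unname(as.matrix(by_sweep)))
```

Note that Map returns a plain list, which is why it is wrapped in as.data.frame here; the other two return a data frame directly.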

Switch statement throws error

I would like to use a switch statement to convert a number. If the case is 1 the number should be 13, case 2 should be 14 etc...
Therefore I wrote the following statement:
settime <- function(time){
switch(time,
"1" = 13,
2 = 14,
3 = 15,
4 = 16,
5 = 17,
6 = 18,
7 = 19,
8 = 20,
9 = 21,
10 = 22,
11 = 23,
12 = 24)
}
This however gives me the following error:
Error: unexpected '=' in: " switch(time,
1 ="
Any thought on where I go wrong?
The more obvious way to get what you want is time + 12.
It is simpler than switch and works on a whole vector, not just a single value.
But your real problem might be more complicated than the example. If you do need switch (which accepts only one value at a time), you have two options, as described in this excerpt from help(switch):
switch works in two distinct ways depending whether the first argument
evaluates to a character string or a number.
If the value of EXPR is not a character string it is coerced to
integer. Note that this also happens for factors, with a warning, as
typically the character level is meant. If the integer is between 1
and nargs()-1 then the corresponding element of ... is evaluated and
the result returned: thus if the first argument is 3 then the fourth
argument is evaluated and returned.
If EXPR evaluates to a character string then that string is matched
(exactly) to the names of the elements in .... If there is a match
then that element is evaluated unless it is missing, in which case the
next non-missing element is evaluated, so for example switch("cc", a =
1, cc =, cd =, d = 2) evaluates to 2. If there is more than one match,
the first matching element is used. In the case of no match, if there
is a unnamed element of ... its value is returned. (If there is more
than one such argument an error is returned.)
Either time is a character variable:
time <- "3"
switch(time, "1"=13, "2"=14, "3"=15, "4"=16)
# [1] 15
Or time is numeric:
time <- 3
switch(time, 13, 14, 15, 16)
# [1] 15
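Putting the two forms into functions might look like the sketch below. The names settime_num and settime_chr and the stop() fallback are illustrative additions, not part of the question.

```r
# Numeric form: switch() picks the alternative by position
settime_num <- function(time) {
  switch(time, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24)
}

# Character form: the names must be quoted, because a bare number
# is not a valid argument name (this is what caused the error)
settime_chr <- function(time) {
  switch(as.character(time),
         "1" = 13, "2" = 14, "3" = 15, "4" = 16,
         stop("unsupported time: ", time))  # unnamed element = default
}

settime_num(3)
settime_chr(2)

# switch() takes one value at a time, so a vector needs sapply()
converted <- sapply(c(1, 5, 12), settime_num)
```

For this particular mapping, c(1, 5, 12) + 12 gives the same values as converted without any switch.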
We could do this without using any switch. I am not sure how efficient switch will be for large vectors. But, the below method should be fast enough.
res <- setNames(13:24, 1:12)[as.character(v1)]
res
#4 3 9 7 8 12 4 10 10 4 8 5 9 9 4 11 3 1 7 2 2 7 9 2 3 9
#16 15 21 19 20 24 16 22 22 16 20 17 21 21 16 23 15 13 19 14 14 19 21 14 15 21
From the above, it is easier to remove the name.
unname(res)
Or
as.vector(res)
We do not strictly need as.character here, because the elements are 1:12 and so coincide with the positions in the lookup vector. For a different set of values, index by name with as.character to be safe.
data
set.seed(24)
v1 <- sample(1:12, 30, replace=TRUE)
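To see why indexing by name matters when the keys are not 1, 2, 3 and so on, consider this sketch with hypothetical keys 10, 20 and 30 (the names lookup, v, by_name and by_position are assumptions for the example).

```r
# Lookup vector whose names are the keys; setNames stores them as character
lookup <- setNames(c(13, 14, 15), c(10, 20, 30))
v <- c(20, 10, 30)

# Indexing by name finds the right entries
by_name <- unname(lookup[as.character(v)])

# Indexing by number is positional: positions 20, 10 and 30 do not exist
by_position <- unname(lookup[v])
```

by_name holds the translated values, while by_position is all NA because a numeric index is treated as a position, not as a key.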

Creating a numerical variable order

I have a set of data with 3 columns: index column (with no name), colour, colour of seed, and germination time.
How do I create a numerical variable called 'order' with values 1 to 22 (the number of data sets)?
I don't know if I get you right, but simplest way would be:
> order <- c(1:22)
> order
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Now, if you run:
class(order)
you will get:
[1] "integer"
but you can easily access every element of the order object (for example in a loop):
for(i in seq_along(order)){
print(order[i])
}
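If the goal is a column inside the data set rather than a separate vector, something like the following may be closer. The data frame seeds and its columns are a made-up stand-in for the 22-row data set, which is not shown in the question.

```r
# Hypothetical stand-in for the real data (22 rows)
seeds <- data.frame(
  colour = rep(c("red", "white"), 11),
  germination_time = seq(5, 26),
  stringsAsFactors = FALSE
)

# seq_len(nrow(...)) gives 1..22 without hard-coding the number of rows
seeds$order <- seq_len(nrow(seeds))

# Integer by default; convert if a double is really required
seeds$order_num <- as.numeric(seeds$order)
```

Using nrow() keeps the column correct if rows are added later. Naming the column order is harmless, although a standalone variable called order shares its name with the base function order().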

returning a list in R and functional programming behavior

I have a basic question regarding functional programming in R.
Given a function that returns a list, such as:
myF <- function(x){
return (list(a=11,b=x))
}
why is it that the list returned when calling the function with a range or vector always has the same length for 'a'?
Ex:
myF(1:10)
returns:
$a
[1] 11
$b
[1] 1 2 3 4 5 6 7 8 9 10
How can one change the behavior so that the 'a' element has the same length as b?
I am actually working with a bunch of S4 objects that I cannot easily convert to a list (using as.list), so _apply is not my first choice.
Thanks for any insight or help!
EDIT (Added further explanations)
I am not necessarily looking to just pad 'a' to make its length equal to b's. However, the solution
as.list(data.frame(a=myA,b=x)) pads 'a' with the same value, computed once.
myF <- function(x){
myA = ceiling(runif(1, max=100))
return (as.list(data.frame(a=myA
,b=x)))
}
myF(1:10)
$a
[1] 79 79 79 79 79 79 79 79 79 79
$b
[1] 1 2 3 4 5 6 7 8 9 10
I still am not sure why that happens!
Thanks
Are you just looking to have 11 repeated so that a is the same length as b? If so:
> myF <- function(x){
+ return (list(a=rep(11,length(x)),b=x))
+ }
> myF(1:10)
$a
[1] 11 11 11 11 11 11 11 11 11 11
$b
[1] 1 2 3 4 5 6 7 8 9 10
EDIT based on OP's clarification/comments. If you want 'a' to instead be a random vector with length equal to 'b':
> myF <- function(x){
+ return (list(a=ceiling(runif(length(x),max=100)),b=x))
+ }
> myF(1:10)
$a
[1] 4 31 8 45 25 74 36 95 64 32
$b
[1] 1 2 3 4 5 6 7 8 9 10
I don't quite understand what you mean by not being able to use as.list. You should be able to get a version of your function satisfying the requirement that all components of the list be equally long by doing:
myF <- function(x){
return(as.list(data.frame(a=11,b=x)))
}
EDIT:
The reason list does not work the way you expect is that list applied to a number of lists, vectors, etc. is just that: a list of those lists, vectors, etc. It does not "inspect" their structure.
What I think you want is the additional semantics that the vectors contained in the list should match up and produce a set of "rows", each with one corresponding element from each of your vectors. This is exactly what a data frame is supposed to be (and, I think, how a data frame is represented in R). The final as.list call does little more than change what type it is tagged as.
EDIT2:
Note that if I'm wrong above (and that's not the general behaviour you want), then Mac's solution is more appropriate, as it gives you vectors of the same length without implying that they should "line up".
Using a data frame in that case would both confuse anyone reading the code (since a data.frame implies you think of your vectors as matching up) and force any additional elements you add to the list to be converted into vectors of the appropriate length (which may or may not be what you want).
In case I did not understand you correctly last time, here is another possibility:
If you want to generate a second vector, given some function/expression, of the same length as your argument you could do something like:
myF <- function(x){
return (list(a=replicate(length(x),f),b=x))
}
in your example f could be runif(1, max=100), though in the specific case of runif you could explicitly tell it to generate a vector of appropriate length by calling runif(length(x), max=100) inside the function.
replicate simply re-evaluates f the number of times you request, and gives you the vector of all the results.
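The three behaviours discussed in this answer can be compared side by side. This is only a sketch: the names plain, recycled and fresh are made up, and set.seed(1) is there just to make the run repeatable.

```r
set.seed(1)
x <- 1:5

# list() stores each argument as given: no recycling
plain <- list(a = 11, b = x)

# data.frame() recycles the single value to match the longer column
recycled <- as.list(data.frame(a = 11, b = x))

# replicate() re-evaluates the expression once per element,
# so each entry of a is drawn independently
fresh <- list(a = replicate(length(x), ceiling(runif(1, max = 100))), b = x)

lengths(plain)     # element lengths of each list
lengths(recycled)
lengths(fresh)
```

In plain the element a keeps length 1, in recycled it is one value repeated, and in fresh it has one newly evaluated value per element of x.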
It appears that your function is "hard coding" a. So no matter what you specify it will always give 11.
If for example you changed the function to:
myF <- function(x){ return (list(a=x,b=x)) }
myF(1:10)
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] 1 2 3 4 5 6 7 8 9 10
a is allowed to change like b.
or
myF <- function(x,y){ return (list(a=y,b=x)) }
myF(10:1,1:10)
$a
[1] 1 2 3 4 5 6 7 8 9 10
$b
[1] 10 9 8 7 6 5 4 3 2 1
Now a is allowed to change independent of b.

Hash or List-Backed Levels of a Factor

I'm dealing with a categorical variable retrieved from a database and am wanting to use factors to maintain the "fullness" of the data.
For instance, I have a table which stores colors and their associated numerical ID
ID | Color
------+-------
1 | Black
1805 | Red
3704 | White
So I'd like to use a factor to store this information in a data frame such as:
Car Model | Color
----------+-------
Civic | Black
Accord | White
Sentra | Red
where the color column is a factor and the underlying data stored, rather than being a string, is actually c(1, 3704, 1805), the IDs associated with each color.
So I can create a custom factor by modifying the levels attribute of an object of the factor class to achieve this effect.
Unfortunately, as you can see in the example, my IDs are not incremented. In my application, I have ~30 levels and the maximum ID for one level is ~9,000. Because the levels are stored in an array for a factor, that means I'm storing an integer vector of length 9,000 with only 30 elements in it.
Is there any way to use a hash or list to accomplish this effect more efficiently? i.e. if I were to use a hash in the levels attribute of a factor, I could store all 30 elements with whatever indices I please without having to create an array of size max(ID).
Thanks in advance!
Well, I'm pretty sure you can't change how factors work. A factor always has level ids that are integer numbers 1..n where n is the number of levels.
...but you can easily have a translation vector to get to your color ids:
# The translation vector...
colorIds <- c(Black=1,Red=1805,White=3704)
# Create a factor with the correct levels
# (but with level ids that are 1,2,3...)
f <- factor(c('Red','Black','Red','White'), levels=names(colorIds))
as.integer(f) # 2 1 2 3
# Translate level ids to your color ids
colorIds[f] # 1805 1 1805 3704
Technically, colorIds does not need to define the names of the colors, but it makes it easier to have in one place since the names are used when creating the levels for the factor. You want to specify the levels explicitly so that the numbering of them matches even if the levels are not in alphabetical order (as yours happen to be).
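Going the other way, from database IDs to a factor and back, could look like the sketch below. The cars data frame and its column names are hypothetical, modelled on the example table in the question.

```r
colorIds <- c(Black = 1, Red = 1805, White = 3704)

# IDs as they might arrive from the database (hypothetical rows)
cars <- data.frame(
  Model = c("Civic", "Accord", "Sentra"),
  ColorId = c(1, 3704, 1805),
  stringsAsFactors = FALSE
)

# Build the factor from the IDs: levels are the IDs, labels are the names,
# so the stored codes are 1..3 rather than 1..3704
cars$Color <- factor(cars$ColorId, levels = colorIds, labels = names(colorIds))

# Translate back to database IDs by name
ids_back <- unname(colorIds[as.character(cars$Color)])
```

Indexing with as.character(cars$Color) looks entries up by name, so it does not depend on the factor's levels being in the same order as colorIds.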
EDIT It is however possible to create a class deriving from factor that has the codes as an attribute. Let's call this new glorious class foo:
foo <- function(x = character(), levels, codes) {
f <- factor(x, levels)
attr(f, 'codes') <- codes
class(f) <- c('foo', class(f))
f
}
`[.foo` <- function(x, ...) {
y <- NextMethod('[')
attr(y, 'codes') <- attr(x, 'codes')
y
}
as.integer.foo <- function(x, ...) attr(x,'codes')[unclass(x)]
# Try it out
set.seed(42)
f <- foo(sample(LETTERS[1:5], 10, replace=TRUE), levels=LETTERS[1:5], codes=101:105)
d <- data.frame(i=11:15, f=f)
# Try subsetting it...
d2 <- d[2:5,]
# Gets the codes, not the level ids...
as.integer(d2$f) # 105 102 105 104
You could then also fix print.foo etc...
In thinking about it, the only feature that a "level" needs to implement in order to have a valid factor is the [ accessor. So any object implementing the [ accessor could be viewed as a vector from the standpoint of any interfacing function.
I looked into the hash class, but saw that it uses the normal R behavior (as is seen in lists) of returning a slice of the original hash when only using a single bracket (while extracting the actual value when using the double bracket). However, if I were to override this using setMethod(), I was actually able to get the desired behavior.
library(hash)
setMethod(
'[' ,
signature( x="hash", i="ANY", j="missing", drop = "missing") ,
function(
x,i,j, ... ,
drop
) {
if (is.factor(i)){
#presumably trying to lookup the values associated with the ordered keys in this hash
toReturn <- NULL
for (k in make.keys(as.integer(i))){
toReturn <- c(toReturn, get(k, envir=x@.xData))
}
return(toReturn)
}
#default, just make keys and get from the environment
toReturn <- NULL
for (k in make.keys(i)){
toReturn <- c(toReturn, get(k, envir=x@.xData))
}
return(toReturn)
}
)
as.character.hash <- function(h){
as.character(values(h))
}
print.hash <- function(h){
print(as.character(h))
}
h <- hash(1:26, letters)
df <- data.frame(ID=1:26, letter=26:1, stringsAsFactors=FALSE)
attributes(df$letter)$class <- "factor"
attributes(df$letter)$levels <- h
> df
ID letter
1 1 z
2 2 y
3 3 x
4 4 w
5 5 v
6 6 u
7 7 t
8 8 s
9 9 r
10 10 q
11 11 p
12 12 o
13 13 n
14 14 m
15 15 l
16 16 k
17 17 j
18 18 i
19 19 h
20 20 g
21 21 f
22 22 e
23 23 d
24 24 c
25 25 b
26 26 a
> attributes(df$letter)$levels
<hash> containing 26 key-value pair(s).
1 : a
10 : j
11 : k
12 : l
13 : m
14 : n
15 : o
16 : p
17 : q
18 : r
19 : s
2 : b
20 : t
21 : u
22 : v
23 : w
24 : x
25 : y
26 : z
3 : c
4 : d
5 : e
6 : f
7 : g
8 : h
9 : i
>
> df[1,2]
[1] z
Levels: a j k l m n o p q r s b t u v w x y z c d e f g h i
> as.integer(df$letter)
[1] 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2
[26] 1
Any feedback on this? As best I can tell, everything's working. It looks like it works properly as far as printing, and the underlying data stored in the actual data.frame is untouched, so I don't feel like I'm jeopardizing anything there. I may even be able to get away with adding a new class into my package which just implements this accessor to avoid having to add a dependency on the hash class.
Any feedback or points on what I'm overlooking would be much appreciated.
