I cannot seem to convert a list to a matrix. I load a .csv file using:
dat = read.csv("games.csv", header = TRUE)
> typeof(dat)
[1] "list"
But when I try to convert it into a numeric matrix using:
games = data.matrix(dat)
The entries' values are all changed for some reason. What is the problem?
While Nathaniel's solution worked for you, I think it's important to point out that you might need to adjust your perception of what is going on.
The typeof(dat) might be a list but the class is a data.frame.
This might help illustrate the difference:
# typeof() & class() of `pts` is `list`
# whereas typeof() `dat` in your example is `list` but
# class() of `dat` in your example is a `data.frame`
pts <- list(x = cars[,1], y = cars[,2])
as.matrix(pts)
## [,1]
## x Numeric,50
## y Numeric,50
head(as.matrix(data.frame(pts)))
## x y
## [1,] 4 2
## [2,] 4 10
## [3,] 7 4
## [4,] 7 22
## [5,] 8 16
## [6,] 9 10
Those are two substantially different outcomes from the as.matrix() function.
I just want to make sure you aren't disappointed by the outcome if you try this in a different context outside of read.csv.
Without any other information being provided, perhaps you might try:
games <- as.matrix(dat)
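As a rough illustration of why the values can appear to change, here is a minimal sketch. The data frame below is made up (it is not your games.csv), and it assumes the columns were read in as factors. data.matrix() replaces factors with their internal integer codes, whereas as.matrix() keeps the printed values as characters.

```r
# Hypothetical stand-in for read.csv("games.csv"); the column names are invented.
dat <- data.frame(team  = c("b", "a", "b"),
                  score = c("10", "7", "3"),
                  stringsAsFactors = TRUE)

dat_type  <- typeof(dat)   # "list": how the object is stored
dat_class <- class(dat)    # "data.frame": how R dispatches on it

# data.matrix() returns the factor codes, not the numbers you see printed
codes <- data.matrix(dat)

# as.matrix() keeps the original values, as a character matrix
chars <- as.matrix(dat)

# Recover the real numbers from a factor column by going through character
score_num <- as.numeric(as.character(dat$score))
```

If the columns are already numeric, both functions should give the same numbers, so it may be worth checking str(dat) first.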
I want to ensure that the result of which(..., arr.ind = TRUE) is always ordered, specifically: arranged ascending by (col, row). I do not see such a remark in the which function documentation, although it seems to be the case based on some experiments I made. How can I check or learn whether this is guaranteed?
Example. When I run the code below, the output is a matrix in which the results are arranged ascending by (col, row) columns.
> set.seed(1)
> vals <- rnorm(10)
> valsall <- sample(as.numeric(replicate(10, vals)))
> mat <- matrix(valsall, 10, 10)
> which(mat == max(mat), arr.ind = TRUE)
row col
[1,] 1 1
[2,] 3 1
[3,] 1 2
[4,] 2 2
[5,] 10 2
[6,] 1 6
[7,] 2 8
[8,] 4 8
[9,] 1 9
[10,] 6 9
Part1:
This answers the part of your question about how to understand functions on a deeper level when the documentation is not enough, without going into the details of which().
As which() is not a primitive function (primitives are written in C), i.e. it is written in R itself, we can check what's going on behind the scenes by printing the function. Note that backticks make it possible to inspect functions with reserved names, e.g. +, so they are optional in this example. This dense R code can be extremely tiresome to read, but I've found it very educational and it does untie some mental knots every once in a while.
> print(`which`)
function (x, arr.ind = FALSE, useNames = TRUE)
{
wh <- .Internal(which(x))
if (arr.ind && !is.null(d <- dim(x)))
arrayInd(wh, d, dimnames(x), useNames = useNames)
else wh
}
<bytecode: 0x00000000058673e0>
<environment: namespace:base>
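If you are unsure whether a function can be inspected this way, a quick check along these lines may help. This is only a sketch using base R's is.primitive() and body(); the variable names are mine.

```r
# which() is an ordinary R closure, so its R-level source can be printed
which_is_primitive <- is.primitive(which)   # FALSE

# sum() and `+` are primitives implemented in C; printing them shows no R body
sum_is_primitive  <- is.primitive(sum)      # TRUE
plus_is_primitive <- is.primitive(`+`)      # TRUE

# body() returns just the code between the braces of a closure
which_body <- body(which)
```

Note that the interesting work in which() still happens inside .Internal(), so the R-level source only tells part of the story.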
Part2:
So after giving up on trying to understand the which and arrayInd functions in the way described above, I'm trying it with common sense. The most efficient way to check each value of a matrix/array that makes sense to me is to convert it to a one-dimensional object at some point. R stores matrices in column-major order, so coercing a matrix to an atomic vector (or otherwise dropping its dimensions) always concatenates the columns one after another. To me it is natural that higher-level functions also follow this fundamental rule.
> testmat <- matrix(1:10, nrow = 2, ncol = 5)
> testmat
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 10
> as.numeric(testmat)
[1] 1 2 3 4 5 6 7 8 9 10
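To connect this back to which(), here is a small sketch on a made-up matrix. It checks that the linear indices come back in ascending order and that converting them with arrayInd() gives rows that are already sorted by (col, row). It is an experiment on one example, not a proof of documented behaviour.

```r
# Small made-up matrix; the value 3 appears in several columns
mat <- matrix(c(3, 1, 3, 2,
                1, 3, 3, 1,
                2, 2, 1, 3), nrow = 4)

lin <- which(mat == 3)                  # linear (column-major) indices
idx <- which(mat == 3, arr.ind = TRUE)  # same hits as (row, col) pairs

# The linear indices are ascending ...
lin_sorted <- !is.unsorted(lin)

# ... and arr.ind = TRUE is just arrayInd() applied to them
same_as_arrayInd <- identical(unname(idx), unname(arrayInd(lin, dim(mat))))

# so re-sorting by (col, row) leaves the row order unchanged
already_ordered <- identical(order(idx[, "col"], idx[, "row"]),
                             seq_len(nrow(idx)))
```

Since ascending linear indices in a column-major layout are exactly "ascending by column, then by row", the ordering you observed follows from how which() is put together.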
I found Hadley Wickham's Advanced R an extremely valuable resource in answering your question, especially the chapters about functions and data structures:
http://adv-r.had.co.nz/
I think that this is similar to, but not the same as, a previous question that I asked here: Pull specific rows
Here is the code that I am now working with:
City <- c("x","x","y","y","z","z")
Type <- c("a","b","a","b","a","b")
Value <- c(1,3,2,5,6,10)
cbind.data.frame(City,Type,Value)
Which produces:
City Type Value
1 x a 1
2 x b 3
3 y a 2
4 y b 5
5 z a 6
6 z b 10
I want to do something similar to before, but now two different conditions must be met to pull a specific number. Let's say we had a matrix,
testmat <- matrix(c("x","x","y","a","b","b"),ncol=2)
Which looks like this:
[,1] [,2]
[1,] "x" "a"
[2,] "x" "b"
[3,] "y" "b"
The desired outcome is
[,1] [,2] [,3]
[1,] "x" "a" 1
[2,] "x" "b" 3
[3,] "y" "b" 5
Another Question PLEASE ANSWER THIS PART
library(dplyr)  # provides inner_join()
City <- c("x","x","x","x","y","y","x","z")
Type <- c("a","a","a","a","a","b","a","b")
Value <- c(1,3,2,5,6,10,11,15)
mat <- cbind.data.frame(City,Type,Value)
mat
testmat <- matrix(c("y","x","b","a"),ncol=2)
testmat <- data.frame(testmat)
testmat
test <- inner_join(mat,testmat,by = c("City"="X1", "Type"="X2"))
Why does the inner_join function give me a warning message when I use it? Here is the warning message that I get:
In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
The desired output is:
City Type Value
1 y b 10
2 x a 1
3 x a 3
4 x a 2
5 x a 5
6 x a 11
but it is producing...
City Type Value
1 x a 1
2 x a 3
3 x a 2
4 x a 5
5 y b 10
6 x a 11
I want the inner_join function to return the values in the order in which they first appear in testmat, as shown above. So since City "y" of Type "b" comes first in testmat, I want it to come first in the values for "test".
The solution is to just switch the order of testmat and mat, like so:
test <- inner_join(testmat,mat,by = c("X1"="City", "X2"="Type"))
I find it interesting that the order of the by parameter needs to match the order of the data frames passed to the inner_join function.
The warning arises because data.frame() converts character vectors to factors by default (in R versions before 4.0.0). You can change this behaviour by running the following code at the start of your script:
options(stringsAsFactors = FALSE)
Answer to second part:
The warning states that you are trying to join on two factors with different levels. Therefore, the variables are coerced to "character" before joining; there's no problem with that. As Mostafa Rezaei mentioned in his answer, R converts character vectors to factors when creating a data frame. Usually it's best to leave them as characters:
mat <- data.frame(City,Type,Value, stringsAsFactors=FALSE)
testmat <- data.frame(testmat, stringsAsFactors=FALSE)
Concerning your real question:
The order of the result of a join is not defined. If order is crucial to you, you can use an additional sorting variable:
mat %>%
mutate(rn = row_number()) %>%
semi_join(testmat, by = c("City"="X1", "Type"="X2")) %>%
arrange(rn)
Btw: I think you're looking for a semi_join rather than an inner_join; read the help file for the differences.
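If the goal is to get the rows back in the order of testmat, one possible approach is to carry an explicit sort key on both tables. The sketch below uses base R's merge() instead of dplyr so that it runs without extra packages. The helper columns key_order and row_order are names I made up for illustration.

```r
mat <- data.frame(City  = c("x","x","x","x","y","y","x","z"),
                  Type  = c("a","a","a","a","a","b","a","b"),
                  Value = c(1,3,2,5,6,10,11,15),
                  stringsAsFactors = FALSE)
testmat <- data.frame(X1 = c("y","x"), X2 = c("b","a"),
                      stringsAsFactors = FALSE)

# Hypothetical helper columns that remember the original positions
testmat$key_order <- seq_len(nrow(testmat))  # order of the lookup keys
mat$row_order     <- seq_len(nrow(mat))      # order of the data rows

# Join on both conditions; do not rely on the row order merge() returns
joined <- merge(mat, testmat,
                by.x = c("City", "Type"), by.y = c("X1", "X2"))

# Sort by key position first, then by original row position
joined <- joined[order(joined$key_order, joined$row_order),
                 c("City", "Type", "Value")]
rownames(joined) <- NULL
```

With the example data this gives the "y"/"b" row first, followed by the "x"/"a" rows in their original order. The same idea should carry over to dplyr with mutate(), inner_join() and arrange().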
My question is a continuation of this:
Split a vector into chunks
What would be the best way to access all of these chunks? For example, is there an easy way to access these mini-vectors if I have around a hundred of them? I need to find the minimum of each of these chunks and store the results in a new vector.
Look at the plyr package; it has a family of functions for processing lists or vectors. The post you mentioned produces lists, so use llply to take a list as input and return a list as output; for vectors, aaply is your choice.
# Examples from ?lapply, with plyr's llply() in place of lapply()
library(plyr)
x <- list(a = 1:10, beta = exp(-3:3), logic = c(TRUE,FALSE,FALSE,TRUE))
> x
$a
[1] 1 2 3 4 5 6 7 8 9 10
$beta
[1] 0.04978707 0.13533528 0.36787944 1.00000000 2.71828183 7.38905610 20.08553692
$logic
[1] TRUE FALSE FALSE TRUE
llply(x, mean)
llply(x, quantile, probs = 1:3/4)
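For the specific task in the question (the minimum of each chunk, stored in a new vector), base R may already be enough. This is a minimal sketch on made-up data, assuming chunks of length 3 built with split().

```r
# Made-up input vector; chunk_size is an assumed value for illustration
x <- c(5, 3, 8, 1, 9, 2, 7, 4, 6, 10)
chunk_size <- 3

# split() returns a named list of chunks: "1", "2", "3", "4"
chunks <- split(x, ceiling(seq_along(x) / chunk_size))

# vapply() applies min() to each chunk and returns a plain numeric vector
mins <- vapply(chunks, min, numeric(1))

# Individual chunks can still be reached by position or by name
second_chunk <- chunks[[2]]
```

Here mins holds one minimum per chunk, and the list itself gives access to any single chunk, however many there are.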
I have an algorithm that takes data, sorts it, analyzes it, and then returns scores for the sorted data. However, the scores correspond to the sorted data, and I'd really like to return scores that correspond to the unsorted data. I figured there had to be some default R function that does this, but I've had no luck finding anything. Here's a MWE, and code I wrote that works but is really slow:
orig = rnorm(10)
ord = order(orig)
new = orig[ord]
reprod = sapply(1:length(orig), function(x)new[which(ord==x)] )
all(reprod==orig)
Are there any ways to "un-sort" data more efficiently?
What about just:
orig = rnorm(100000)
ord = order(orig)
new = orig[ord]
reprod = rep(0,length(new))
reprod[ord] = new
One nice thing about the ordering vector (the result of the order function) is that running order on it gives you the inverse permutation; in other words, it tells you how to unsort. Here is a quick, simple example of what I think you are trying to do:
> orig <- rnorm(10)
> ord <- order(orig)
> score <- seq_along(orig)
> cbind(orig, score[ order(ord) ])
orig
[1,] -0.2429266384 4
[2,] 0.6346488818 8
[3,] 1.2956779160 9
[4,] -0.5563531517 3
[5,] 1.3299626650 10
[6,] -1.6062497717 1
[7,] -1.1444093167 2
[8,] -0.0004719915 5
[9,] 0.2734227278 7
[10,] 0.0357991850 6
You could also do something like:
new[ order(ord) ]
to see that it returns the sorted data to the original order.
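Both approaches can be checked against each other. The sketch below uses made-up scores (just the sorted positions) and confirms that order(ord) is the inverse permutation, so no loop or which() is needed.

```r
set.seed(42)
orig   <- rnorm(10)
ord    <- order(orig)
sorted <- orig[ord]

# Hypothetical scores computed on the sorted data (here: the sorted position)
scores <- seq_along(sorted)

# Approach 1: index with the inverse permutation
inv <- order(ord)
unsorted_scores <- scores[inv]

# Approach 2: assign back through ord (as in the other answer)
unsorted_scores2 <- numeric(length(scores))
unsorted_scores2[ord] <- scores

# Sanity check: unsorting the sorted data recovers the original vector
restored <- sorted[inv]
```

Both versions are vectorised, so either should be far faster than the sapply/which version on long vectors.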
I'm quite new to R and having some problems understanding the reorder function.
Let's say I have a list with 3 vectors like:
myList <- list(c(7,5,2),c(2,3,4),c(1,1,1))
and I want my list to be reordered by the median of each vector so that boxplotting the list gives me an ordered plot. Now how would I do this? I read the help page for ?reorder but I can't seem to adapt the given example for my list.
Any help would be appreciated.
I think you want
myList <- list(c(7,5,2),c(2,3,4),c(1,1,1))
unordered.median <- unlist(lapply(myList, median))
ordered.median <- order(unordered.median)
myList[ordered.median]
[[1]]
[1] 1 1 1
[[2]]
[1] 2 3 4
[[3]]
[1] 7 5 2
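A small variation that may be handy for plotting: if the list elements are named, the names travel with the reordering and end up as the box labels. The names a, b and c below are my own addition for illustration.

```r
# Same data as above, with hypothetical names added
myList <- list(a = c(7, 5, 2), b = c(2, 3, 4), c = c(1, 1, 1))

# vapply() returns a named numeric vector of medians
med <- vapply(myList, median, numeric(1))

# Reorder the list by ascending median; the names are kept
sortedList <- myList[order(med)]
sortedNames <- names(sortedList)

# boxplot(sortedList)   # uncomment to draw the boxes in median order
```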