Import Excel Data into R - r

I'm working with an Excel file containing a 261 x 10 matrix. The matrix consists of the weekly returns of 10 stocks from 2010 to 2015. So, I have 10 variables (stocks) and 261 observations (weekly returns) for each variable.
For my master's thesis I have to apply a "rearrangement algorithm" developed by Rüschendorf and Puccetti (2012) to my matrix. I'm not going into further detail on the theoretical side of that concept. The thing is that I downloaded a package capable of performing the rearrangement algorithm in R. I tested it and it works perfectly.
All I need to know is how to import my Excel matrix into R so that I can run the rearrangement algorithm on it. I could retype the matrix in R manually, entering every element with the matrix constructor:
A = matrix( c(), nrow= , ncol= , byrow=TRUE)
The problem is that doing so for such a big matrix (261 x 10) would be very time-consuming. Is there any way to import my Excel matrix into R so that R recognizes it as a matrix of numerical values ready for calculations (just as if I had entered it manually)? That way I would only have to run the "rearrangement algorithm" function provided in R.
Thanks in advance.

I made a selection in an open Excel sheet and copied it to the clipboard. This then worked on a Mac:
> con=pipe("pbpaste")
> dat <- data.matrix( read.table(con) )
> dat
V1 V2 V3
[1,] 1 1 1
[2,] 2 2 2
[3,] 3 3 3
[4,] 4 4 4
[5,] 5 5 5
[6,] 6 6 6
[7,] 7 7 7
[8,] 8 8 8
[9,] 9 9 9
[10,] 10 10 10
[11,] 11 11 11
[12,] 12 12 12
[13,] 13 13 13
[14,] 14 14 14
The method is somewhat different on Windows, but the help page ?connections should describe the technique for your OS.

You didn't provide a minimal reproducible example, so the answers are probably going to be of lesser quality. Anyway, you should be able to load the Excel file with something like:
require(XLConnect)
wrkbk <- loadWorkbook("path/to/your/data.xlsx")
df <- readWorksheet(wrkbk, sheet = 1, header = TRUE)
And then convert the data.frame to a matrix via
ans <- as.matrix(df)
Otherwise, you need to save your file as a .txt or .csv plain-text file and use read.table or read.csv and the like. Consult their respective help pages.
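As a minimal sketch of the plain-text route, assuming the sheet was exported from Excel as a CSV file with one header row of stock names: the file contents, the column names (stockA, stockB, stockC) and the variable names below are made up for illustration, and a temporary file stands in for the real export.

```r
# Stand-in for a sheet exported from Excel as CSV (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("stockA,stockB,stockC",
             "0.012,-0.004,0.021",
             "-0.007,0.015,0.003",
             "0.009,0.002,-0.011"), csv_path)

# Read the file into a data.frame, then convert it to a numeric matrix
returns_df <- read.csv(csv_path, header = TRUE)
returns <- as.matrix(returns_df)

is.numeric(returns)  # TRUE when every column was read as numbers
dim(returns)         # rows = observations, columns = stocks
```

If is.numeric() returns FALSE, some column was probably read as text (for example because of a stray label or a decimal comma), and that is worth checking before passing the matrix on.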

Related

How to get values on testdata in RSNNS

I have two files, "testi" containing a few numbers and "testo" containing their square roots. I have another file named "test" which contains some numbers whose square roots I want. I used the commands
model <- mlp(testi,testo,size=50,learnFuncParams = c(0.001),maxit = 5000)
xyz <- predict(model,test)
The values which I get from "xyz" are
xyz
#[1,] 0.9971085
#[2,] 0.9992253
#[3,] 0.9992997
#[4,] 0.9993009
#[5,] 0.9993009
#[6,] 0.9993009
#[7,] 0.9993009
Whereas "test" contains
1 4
2 16
3 36
4 64
5 100
6 144
7 196
Could you please tell me why this happens?
mlp has a logistic output by default, so its predictions are confined to the interval (0, 1); you need to specify linOut=TRUE. In general, normalizing your data would also help.
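To illustrate why the predictions above sit just below 1, here is a small base-R sketch that does not use RSNNS at all. The logistic() helper and the min-max scaling are written out by hand for illustration; they are assumptions about what "normalizing" could look like, not the package's own routines.

```r
# A logistic activation squashes any input into the open interval (0, 1)
logistic <- function(x) 1 / (1 + exp(-x))

# The desired outputs are the square roots of the values in "test"
targets <- sqrt(c(4, 16, 36, 64, 100, 144, 196))   # 2, 4, ..., 14
logistic(targets)   # every value is below 1, so 2..14 can never be produced

# Hand-written min-max scaling of the targets to [0, 1] and back
rng <- range(targets)
scaled <- (targets - rng[1]) / (rng[2] - rng[1])
restored <- scaled * (rng[2] - rng[1]) + rng[1]
```

With scaled targets a logistic output can at least reach the required range; with linOut=TRUE the output is no longer bounded in the first place.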

Creating large matrices in R in reasonable time

I am working on a movie recommender that predicts a user's rating for an unseen movie. Most of the work is done and I have created a 7000x3000 matrix userRatingsNew containing 7000 users and their ratings for 3000 movies, with all the missing values replaced by the predicted rating.
I was provided two other files, mapping and test, and used read.csv() to load them into matrices of the following format.
mapping is an 8,400,000x3 matrix that contains id, user, movie, where id is basically the transaction id associated with a user's rating of movie x.
test is an 8,400,000x2 matrix that contains id, rating, where rating is the user's rating for the movie associated with id. The values in the rating column are empty and I need to fill them in using the predicted values that I have already calculated.
Here is my code
writeResult <- function(userRatingsNew, mapping, test, writeToFile = FALSE){
start <- Sys.time()
result <- test
entries <- nrow(test)
for (i in 1:entries){
result[i,2] <- userRatingsNew[mapping[i,2], mapping[i,3]]
}
if (writeToFile)
write.csv(result, "result.csv", row.names=FALSE)
print(Sys.time()-start)
return(result)
}
My problem is that for i=1:100, it takes ~7 seconds. So in order to process all 8.4 million entries, it'd take ~163 hours. I tried using doMC() and implemented parallel processing, but I ran into the problem where my computer ran out of memory. What exactly can I do to speed this process up?
You can index a matrix with another matrix, as in:
M <- matrix(1:25, nrow = 5, ncol = 5)
M
# [,1] [,2] [,3] [,4] [,5]
# [1,] 1 6 11 16 21
# [2,] 2 7 12 17 22
# [3,] 3 8 13 18 23
# [4,] 4 9 14 19 24
# [5,] 5 10 15 20 25
m <- cbind(1:5,5:1)
m
# [,1] [,2]
# [1,] 1 5
# [2,] 2 4
# [3,] 3 3
# [4,] 4 2
# [5,] 5 1
M[m]
# [1] 21 17 13 9 5
So try
result[,2] <- userRatingsNew[mapping[,2:3]]
You should not need a loop.
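Here is a toy version of that lookup, as a sketch on made-up data. The small ratings matrix and the column names id, user, movie and rating are assumptions for illustration. The as.matrix() call is there in case mapping is actually a data.frame (which is what read.csv() returns), since the index has to be a two-column matrix of row and column numbers.

```r
# Toy stand-ins for userRatingsNew, mapping and test (hypothetical data)
ratings <- matrix(c(5, 3, 1,
                    4, 2, 5), nrow = 2, byrow = TRUE)   # 2 users x 3 movies
mapping <- data.frame(id = 101:104,
                      user = c(1, 2, 2, 1),
                      movie = c(3, 1, 2, 2))
test <- data.frame(id = 101:104, rating = NA_real_)

# Two-column index matrix: one (row, column) pair per rating to look up
idx <- as.matrix(mapping[, c("user", "movie")])

# One vectorised lookup replaces the whole for loop
test$rating <- ratings[idx]
test$rating
# [1] 1 4 2 3
```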
A thought:
Instead of attaching the 3000-sized dimension directly to the 7000-sized dimension, you could attach to each user an array of (movie id, rating) pairs, where the movie id is the movie's position in an array of film names. Presumably most users will not rate all 3000 films. Let's say they rate 20 movies on average. Then you only need about 7000 x (20x2 + 20) operations, where 20x2 refers to the 20 ratings plus the references to the films, and the other 20 is the cost of retrieving the film names. You can compile all reports first using the array positions and attach the names afterwards by looking them up in the array of film names.

Using rollapply() to find modal value

I've got panel data and have been playing around with k-means clustering. So now I've got a panel of factor values that are mostly stable but I'd like to smooth that out a bit more so that (for example) the data says "Wyoming was in group 1 in earlier years, moved into group 2, then moved into group 5" rather than "Wyoming was in group 1,1,1,2,3,2,2,5,5,5".
So the approach I'm taking is to use rollapply() to calculate the modal value. Below is code that works to calculate the mode ("Mode()"), and a wrapper for that ("ModeR()") that (perhaps clumsily) resolves the problem of multi-modal windows by randomly picking a mode. All that is fine, but when I put it into rollapply() I'm getting problems.
library(dplyr) # arrange(), desc(), filter()
library(zoo) # rollapply()
Mode <- function(vect){ # take a vector as input
temp <- as.data.frame(table(vect))
temp <- arrange(temp,desc(Freq)) # from dplyr
max.f <- temp[1,2]
temp <- filter(temp,Freq==max.f) # cut out anything that isn't modal
return(temp[,1])
}
ModeR <- function(vect){
out <- Mode(vect)
return(out[round(runif(1,min=0.5000001,max=length(out)+0.499999999))])
}
temp <- round(runif(20,min=1,max=10)) # A vector to test this out on.
cbind(temp,rollapply(data=temp,width=5,FUN=ModeR,fill=NA,align="right"))
which returned:
temp
[1,] 5 NA
[2,] 6 NA
[3,] 5 NA
[4,] 5 NA
[5,] 7 1
[6,] 6 1
[7,] 5 1
[8,] 5 1
[9,] 3 2
[10,] 1 3
[11,] 5 3
[12,] 7 3
[13,] 5 3
[14,] 4 3
[15,] 3 3
[16,] 4 2
[17,] 8 2
[18,] 5 2
[19,] 6 3
[20,] 6 3
Compare that with:
> ModeR(temp[1:5])
[1] 5
Levels: 5 6 7
> ModeR(temp[2:6])
[1] 6
Levels: 5 6 7
So it seems like the problem is in how ModeR is being applied in rollapply(). Any ideas?
Thanks!
Rick
Thanks to /u/murgs! His comment pointed me in the right direction (in addition to helping me streamline ModeR() using sample()).
ModeR() as written above returns a factor (as does Mode()). I need it to be a number. I can fix this by updating my code as follows:
Mode <- function(vect){ # take a vector as input
temp <- as.data.frame(table(vect))
temp <- arrange(temp,desc(Freq))
max.f <- temp[1,2]
temp <- filter(temp,Freq==max.f) # cut out anything that isn't modal
return(as.numeric(as.character(temp[,1]))) #HERE'S THE BIG CHANGE
}
ModeR <- function(vect){
out <- Mode(vect)
return(out[sample(1:length(out),1)]) #HERE'S SOME IMPROVED CODE!
}
Now rollapply() does what I expected it to do! There's still that weird as.character() bit (otherwise it rounds down the number). I'm not sure what's going on there, but the code works so I won't worry about it...
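A likely explanation for the as.character() step, shown as a small base-R sketch on a made-up factor: as.numeric() applied directly to a factor returns the internal integer level codes rather than the labels, so the labels have to be turned into text first and then parsed as numbers.

```r
# table() turns the values into factor levels; a factor stores integer codes
f <- factor(c(5, 7, 7, 9))

as.numeric(f)                # level codes, not the original values
# [1] 1 2 2 3
as.numeric(as.character(f))  # labels converted back to numbers
# [1] 5 7 7 9
as.numeric(levels(f))[f]     # equivalent, converts each level only once
# [1] 5 7 7 9
```

This matches the symptom above, where rollapply() showed small integers such as 1, 2 and 3 instead of the modal values.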

R Pooled DataFrame analysis

I'm trying to perform several analyses on subsets of data in a data frame in R, and I was wondering if there is a generic way of doing this.
Say, I have a dataframe like:
one two three four
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 11 18
[4,] 4 9 11 19
[5,] 5 10 15 20
How could I apply some computation (e.g. cumulative counting) to the values in column "one", conditional on (grouped by) the value in column "three"?
That is, I want to do something to one column, based on a grouping in another column. I can do this with loops, but I feel there might be a standard way to do it all at once.
Thank you in advance!
ddply(data, .(coln), Stat) from the plyr package does the trick exactly.
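For comparison, the same kind of grouped computation can be sketched in base R with ave(), swapped in here so the example runs without plyr. The data frame reproduces the one in the question; the new column names count_in_group and cumsum_one are made up for illustration.

```r
# The example data from the question
df <- data.frame(one   = 1:5,
                 two   = 6:10,
                 three = c(11, 12, 11, 11, 15),
                 four  = 16:20)

# Cumulative count of rows within each value of "three"
df$count_in_group <- ave(df$one, df$three, FUN = seq_along)

# Cumulative sum of "one" within each value of "three"
df$cumsum_one <- ave(df$one, df$three, FUN = cumsum)

df$count_in_group
# [1] 1 1 2 3 1
df$cumsum_one
# [1] 1 2 4 8 5
```

ave() returns a vector of the same length as its input, in the original row order, which makes it convenient for adding a per-group column without reshaping the data.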

Perform 'cross product' of two vectors, but with addition

I am trying to use R to perform an operation (ideally with similarly displayed output) such as
> x<-1:6
> y<-1:6
> x%o%y
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
[2,] 2 4 6 8 10 12
[3,] 3 6 9 12 15 18
[4,] 4 8 12 16 20 24
[5,] 5 10 15 20 25 30
[6,] 6 12 18 24 30 36
where each entry is found through addition not multiplication.
I would also be interested in creating the 36 ordered pairs (1,1) , (1,2), etc...
Furthermore, I want to use another vector like
z<-1:4
to create all the ordered triplets possible between x, y, and z.
I am using R to look into the likelihoods of the possible totals when rolling dice with varied numbers of sides.
Thank you for all your help! This site has been a big help to me. I appreciate anyone that takes the time to answer a stranger's question.
UPDATE: I found that outer(x,y,'+') does the first thing I wanted. But I still don't know how to create the ordered pairs or ordered triplets.
Your first question is easily handled by outer:
outer(1:6,1:6,"+")
For the others, I suggest you try expand.grid, although there are specialized combination and permutation functions out there as well if you do a little searching.
expand.grid can answer your second question:
expand.grid(1:6,1:6)
expand.grid(1:6,1:6,1:4)
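Since the stated goal is the likelihood of each total, here is a short sketch that goes one step further, assuming fair dice so that every ordered pair or triplet is equally likely. The variable names are illustrative.

```r
x <- 1:6
y <- 1:6
z <- 1:4

# All ordered pairs and triplets, one per row
pairs    <- expand.grid(x = x, y = y)          # 36 rows
triplets <- expand.grid(x = x, y = y, z = z)   # 144 rows

# Count how often each total occurs, then turn counts into probabilities
pair_totals   <- table(rowSums(pairs))
pair_probs    <- pair_totals / nrow(pairs)
triplet_probs <- table(rowSums(triplets)) / nrow(triplets)

pair_totals[["7"]]   # number of ways to roll a total of 7 with two six-sided dice
# [1] 6
```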

Resources