I have a 207x7 xts object (called temp). I have a 207x3 matrix (called ac.topn), each row of which contains the columns I'd like from the corresponding row in the xts object.
For example, given the following top two rows of temp and ac.topn,
temp
v1 v2 v3 v4 v5 v6 v7
1997-09-30 14.5 8.7 -5.8 2.6 4.7 1.9 17.2
1997-10-31 6.0 -2.0 -25.7 2.9 4.9 9.6 8.4
head(ac.topn)
Rank1 Rank2 Rank3
1997-09-30 7 4 2
1997-10-31 6 5 7
I would like to get the result:
1997-09-30 17.2 2.6 8.7 (elements 7, 4, and 2 from the first row of temp)
1997-10-31 9.6 4.9 8.4 (elements 6, 5, 7 from the second row of temp)
My first attempt was temp[,ac.topn]. I've browsed for help, but am struggling to word my request effectively.
Thank you.
Well, this works, but I've got to think there's a better way...
result <- do.call(rbind,lapply(index(temp),function(i)temp[i,ac.topn[i]]))
colnames(result) <- colnames(ac.topn)
result
# Rank1 Rank2 Rank3
# 1997-09-30 17.2 2.6 8.7
# 1997-10-31 9.6 4.9 8.4
You may subset a matrix version of the xts object, using indexing via a numeric matrix:
m <- as.matrix(temp)
cols <- as.vector(ac.topn)
rows <- rep(1:nrow(ac.topn), ncol(ac.topn))
vals <- m[cbind(rows, cols)]
xts(x = matrix(vals, nrow = nrow(temp)), order.by = index(temp))
# [,1] [,2] [,3]
# 1997-09-30 17.2 2.6 8.7
# 1997-10-31 9.6 4.9 8.4
However, I say the same as @jlhoward: I've got to think there's a better way...
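A slightly more compact form of the same matrix-indexing idea, as a rough sketch: row() supplies the row number of every cell, so there is no need to build the row vector by hand. The names temp_m, topn and picked are made up for illustration, and plain matrices stand in for the xts object so the snippet runs without extra packages.

```r
# Toy data mirroring the two rows shown in the question (plain matrices, not xts)
temp_m <- matrix(c(14.5,  8.7,  -5.8, 2.6, 4.7, 1.9, 17.2,
                    6.0, -2.0, -25.7, 2.9, 4.9, 9.6,  8.4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("1997-09-30", "1997-10-31"), paste0("v", 1:7)))
topn <- matrix(c(7, 4, 2,
                 6, 5, 7),
               nrow = 2, byrow = TRUE,
               dimnames = list(rownames(temp_m), paste0("Rank", 1:3)))

# A two-column index matrix picks one (row, column) cell per line;
# row(topn) pairs every column number in topn with the row it came from
picked <- matrix(temp_m[cbind(c(row(topn)), c(topn))],
                 nrow = nrow(topn), dimnames = dimnames(topn))
picked
```

With an xts object, the result could presumably be wrapped back up with xts(picked, order.by = index(temp)), as in the answer above.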
I'm trying to carry out the following action on the columns of a dataframe (df1):
term1+term2+term3*req_no
req_no is a range of numbers: 20:24
df1:
ID term1 term2 term3
X299 1.2 2.3 0.12
X300 1.4 0.6 2.4
X301 0.3 1.6 1.2
X302 0.9 0.6 0.4
X303 0.3 1.8 0.3
X304 1.3 0.3 2.1
I need help to get this output; my attempt follows below.
Required output:
ID 20 21 22 23 24
X299 5.9 6.02 6.14 6.26 6.38
X300 50 52.4 54.8 57.2 59.6
X301 25.9 27.1 28.3 29.5 30.7
X302 9.5 9.9 10.3 10.7 11.1
X303 8.1 8.4 8.7 9 9.3
X304 43.6 45.7 47.8 49.9 52
Here's my attempt:
results <- list()
req_no <- 20:25
for(i in 1:nrow(df1){
for(j in rq_no){
res <- term1+term2+term3*j
results[j] <- res
}
results[[i]]
}
results2 <- do.call("rbind",result)
Help will be appreciated.
Here are a couple different approaches, though neither as succinct as Parfait's. Sample data:
df <- data.frame(ID=c("X299", "X300"),
term1=c(1.2, 1.4),
term2=c(2.3, 0.6),
term3=c(0.12, 2.4))
req_no <- 20:25
Loop approach
Your initial approach is headed in the right direction, but in the future, it would help to specify exactly what your error or problem is. For an iterated and perhaps easier-to-read approach, here's one answer:
results <- matrix(data=NA, nrow=nrow(df), ncol=length(req_no)) # Empty matrix to store our results
colnames(results) <- req_no # Optional; name columns based off of req_no values
for(i in 1:nrow(df)) {
# Do the calculation we want; returns a vector length 6
res <- df[i,]$term1 + df[i,]$term2 + (df[i,]$term3 * req_no)
# Save results for row i of df into row i of results matrix
results[i,] <- res
}
# Now bind the columns (named 20 through 25) to the respective rows of df
output <- cbind(df, results)
output
From your initial attempt, note:
We only do one loop, since it is easy to multiply by a vector in R
There are a few ways to subset data from a data frame in R. In this case, df[i,] gets everything in the i-th row, while $termX gets the value in the column named termX
Using a results matrix instead of a list makes it very easy to copy the temporary computations (for each row) into rows of the matrix
Rather than rbind() (row bind), we want cbind() (column bind) to bind those results to new columns of the original rows.
Output:
ID term1 term2 term3 20 21 22 23 24 25
1 X299 1.2 2.3 0.12 5.9 6.02 6.14 6.26 6.38 6.5
2 X300 1.4 0.6 2.40 50.0 52.40 54.80 57.20 59.60 62.0
Dplyr/purrr functions
This could also be solved using tidy functions. In essence it's a pretty similar approach to Parfait's answer, but I've made the steps a bit more verbose to see what's going on.
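library(dplyr) # mutate(), select() and the %>% pipe
library(purrr) # map(), pmap()
library(tidyr) # unnest_wider()
# Use purrr's map functions to do the computation we want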
nested_df <- df %>%
# Make new column holding term3 * req_no (stores a vector in each new cell)
mutate(term3r = map(term3, ~ .x * req_no)) %>%
# Make new column which sums the three columns of interest (stores a vector in each new cell)
mutate(sum = pmap(list(term1, term2, term3r), ~ ..1 + ..2 + ..3))
# "Unnest" those vectors which store our sums, and keep only those and ID
output <- nested_df %>%
# Creates six new columns (named ...1 to ...6) with the elements of each sum
unnest_wider(sum) %>%
# Keeps only the output data and IDs
select(ID, ...1:...6)
output
Output:
# A tibble: 2 x 7
ID ...1 ...2 ...3 ...4 ...5 ...6
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 X299 5.9 6.02 6.14 6.26 6.38 6.5
2 X300 50 52.4 54.8 57.2 59.6 62
Consider directly assigning new columns with sapply using your formula:
df[paste0(req_no)] <- sapply(req_no, function(r) with(df, term1 + term2 + term3 * r))
df
# ID term1 term2 term3 20 21 22 23 24
# 1 X299 1.2 2.3 0.12 5.9 6.02 6.14 6.26 6.38
# 2 X300 1.4 0.6 2.40 50.0 52.40 54.80 57.20 59.60
# 3 X301 0.3 1.6 1.20 25.9 27.10 28.30 29.50 30.70
# 4 X302 0.9 0.6 0.40 9.5 9.90 10.30 10.70 11.10
# 5 X303 0.3 1.8 0.30 8.1 8.40 8.70 9.00 9.30
# 6 X304 1.3 0.3 2.10 43.6 45.70 47.80 49.90 52.00
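Another fully vectorised option, as a minimal sketch: term3 * req_no for every row and every req_no is just an outer product, so outer() can build the whole block at once. The names df1, res and out are only for illustration, and only the first two rows of the sample data are used.

```r
df1 <- data.frame(ID    = c("X299", "X300"),
                  term1 = c(1.2, 1.4),
                  term2 = c(2.3, 0.6),
                  term3 = c(0.12, 2.4))
req_no <- 20:24

# outer() gives a matrix with one row per data row and one column per req_no;
# the length-nrow vector term1 + term2 is recycled down each column
res <- (df1$term1 + df1$term2) + outer(df1$term3, req_no)
colnames(res) <- req_no

out <- cbind(df1["ID"], res)
out
```

This avoids both the explicit loop and the sapply call, at the cost of being a little less obvious to read.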
I have a small problem. I have a dataset with 8208 rows of data. It's a single column of data, and I want to take every n rows as a block and add each block to a new data frame.
So, for example:
newdf has column 1 to column 23.
column 1 is composed of rows 289:528 from the original dataset
column 2 is composed of rows 625:864 from the original dataset
And so on. The "block" size is 239 rows, the jump between blocks is every 336 rows.
I can do this manually, but it just becomes tedious. I have to repeat this entire procedure for another 11 sets of data so obviously a more automated approach would be preferable.
The trick here is to create an index of integers that refer to the row numbers you want to keep. This is simple enough with some use of rep, sequences and R's recycling rule.
Let me demonstrate using iris. Say you want to skip 25 rows, then return 3 rows:
skip <- 25
take <- 3
total <- nrow(iris)
reps <- total %/% (skip + take)
index <- rep(0:(reps-1), each=take) * (skip + take) + (1:take) + skip
The index now is:
index
[1] 26 27 28 54 55 56 82 83 84 110 111 112 138 139 140
And the rows of iris:
iris[index, ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
54 5.5 2.3 4.0 1.3 versicolor
55 6.5 2.8 4.6 1.5 versicolor
56 5.7 2.8 4.5 1.3 versicolor
82 5.5 2.4 3.7 1.0 versicolor
83 5.8 2.7 3.9 1.2 versicolor
84 6.0 2.7 5.1 1.6 versicolor
110 7.2 3.6 6.1 2.5 virginica
111 6.5 3.2 5.1 2.0 virginica
112 6.4 2.7 5.3 1.9 virginica
138 6.4 3.1 5.5 1.8 virginica
139 6.0 3.0 4.8 1.8 virginica
140 6.9 3.1 5.4 2.1 virginica
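Adapting the same indexing trick to the numbers in the question might look like the sketch below. It assumes the first block starts at row 289, each block is 240 rows long (289:528), and blocks start every 336 rows; the variable names are made up.

```r
start <- 289   # first row of the first block
take  <- 240   # rows per block (289:528 is 240 rows)
step  <- 336   # distance between block starts
total <- 8208  # rows in the original data

# Start row of every block that still fits completely inside the data
starts <- seq(start, total - take + 1, by = step)

# Repeat each start 'take' times and add the within-block offsets 0..(take - 1),
# relying on recycling of the shorter offset vector
index <- rep(starts, each = take) + (seq_len(take) - 1)

length(starts)   # number of blocks
head(index)
```

The index can then be used exactly like the iris one, for example x[index] for a vector x of length 8208.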
Update
Note that the OP states the block size is 239 elements, but it is clear from the example rows indicated that the block size is 240:
> length(289:528)
[1] 240
I'll leave the example below at a block length of 239, but adjust if it is really 240.
It isn't clear from the Question, but assuming that you have something like this
df <- data.frame(A = runif(8208))
a data frame with 8208 rows.
First compute the indices of the elements of A that you need to keep. This is done via
want <- sapply(seq(289, nrow(df)-239, by = 336),
function(x) x + (seq_len(239) - 1))
Then we can use the fact that R fills matrices by columns and convert the required elements of A to a matrix with 239 rows
mat <- matrix(df$A[want], nrow = 239)
This works
> all.equal(mat[,1], df$A[289:527])
[1] TRUE
but do note that I have taken a block length of 239 here (289:527) not the indices the OP quotes as that is a block size of 240 (see Update above)
If you want this as a data frame, just add
df2 <- as.data.frame(mat)
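Since the OP has another 11 data sets to process, the steps above could be wrapped in a small helper. This is only a sketch: extract_blocks is a hypothetical name, and the defaults assume a block size of 240 (rows 289:528) starting every 336 rows.

```r
# Hypothetical helper: pull fixed-size blocks out of a vector, one block per column
extract_blocks <- function(x, start = 289, size = 240, step = 336) {
  # Start positions of all blocks that fit entirely within x
  starts <- seq(start, length(x) - size + 1, by = step)
  # One column of row indices per block
  idx <- sapply(starts, function(s) s + seq_len(size) - 1)
  # R fills matrices column by column, so each block becomes one column
  as.data.frame(matrix(x[idx], nrow = size))
}

x <- seq_len(8208)          # stand-in data: the values equal their own row numbers
blocks <- extract_blocks(x)
dim(blocks)
range(blocks[[2]])          # the second column should span rows 625 to 864
```

For a one-column data frame, passing the column itself (for example df$A) should work the same way.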
Try this:
1) Create a list of indices
Indices <- lapply(seq(289, 8208 - 239, by = 336), function(X) X:(X + 239)) # blocks start at row 289, every 336 rows
2) Select Data
Columns <- lapply(Indices, function(X) OldDF[X,])
3) Combine selected data in columns
NewDF <- do.call(cbind, Columns)
Why not just:
as.data.frame(matrix(orig, nrow=528 )[289:528 ,])
Since 8208 is not an exact multiple of the row count, we need to determine the number of columns:
> 8208/528
[1] 15.54545 # so either 15 or 16
> 8208-15*528
[1] 288 # all in the to-be-discarded section
as.data.frame(matrix(orig, nrow=528, ncol=15 )[289:528 ,])
Or:
as.data.frame(matrix(orig, nrow=528, ncol=8208 %/% 528)[289:528 ,])
I have some data to reshape in R but can not figure out how.
Here is the scenario:
I have data like this
a<- c("exam1", "exam2", "exam3","exam4")
date1<- c(8.2,4.3,6.7,3.9)
date2<- c(11.2,9.3,6.5,4.1)
date3<- c(8.2,9.1,4.3,4.4)
dr.df.a <- cbind(a,date1,date2,date3)
a date1 date2 date3
[1,] "exam1" "8.2" "11.2" "8.2"
[2,] "exam2" "4.3" "9.3" "9.1"
[3,] "exam3" "6.7" "6.5" "4.3"
[4,] "exam4" "3.9" "4.1" "4.4"
b<- c("exam1", "exam2", "exam3","exam4")
date1<- c(8.6,14.3,6.7,13.9)
date2<- c(11.2,8.3,16.5,14.1)
date3<- c(4.2,9.1,4.3,14.4)
dr.df.b <- cbind(b,date1,date2,date3)
b date1 date2 date3
[1,] "exam1" "8.6" "11.2" "4.2"
[2,] "exam2" "14.3" "8.3" "9.1"
[3,] "exam3" "6.7" "16.5" "4.3"
[4,] "exam4" "13.9" "14.1" "14.4"
mylist <- list(dr.df.a, dr.df.b)
The example is for reproducibility purposes. I get the data in this format (dr.df.a and dr.df.b), and there are multiple data frames in the list object.
Now I need to reshape it so that each data frame becomes one single row, with variable names like
exam1_date1, exam1_date2, exam1_date3, exam2_date1, exam2_date2 ... and so on. Essentially, I would like to get a data frame with one row of exam1_date1, exam1_date2, exam1_date3, exam2_date1, exam2_date2 ... for every data frame in the list object.
How can I reshape this data, and which function should I use?
Try this:
library(reshape2)
# convert the first row (the one defined by variable 'a' in post) into column names
dr.df.2 <- setNames(dr.df[-1,], dr.df[1, ])
m <- melt(dr.df.2)
d <- dcast(m, 1 ~ ...)[-1]
names(d) <- sub("_", "_exam", names(d)) # fix up names (optional)
Giving this:
> d
date1_exam1 date1_exam2 date1_exam3 date1_exam4 date2_exam1 date2_exam2
1 8.2 4.3 6.7 3.9 11.2 9.3
date2_exam3 date2_exam4 date3_exam1 date3_exam2 date3_exam3 date3_exam4
1 6.5 4.1 8.2 9.1 4.3 4.4
UPDATE: simplified dcast formula
If your dr.df object were a data.frame instead of a matrix, you could easily create a named vector as demonstrated below:
Create your data, but as a data.frame this time:
a <- c("exam1", "exam2", "exam3","exam4")
date1 <- c(8.2,4.3,6.7,3.9)
date2 <- c(11.2,9.3,6.5,4.1)
date3 <- c(8.2,9.1,4.3,4.4)
dr.df <- rbind(date1, date2, date3)
colnames(dr.df) <- a
dr.df <- as.data.frame(dr.df)
dr.df
# exam1 exam2 exam3 exam4
# date1 8.2 4.3 6.7 3.9
# date2 11.2 9.3 6.5 4.1
# date3 8.2 9.1 4.3 4.4
The "reshaping" step
You can now simply use stack to get the data in a long form.
dr.dfL <- data.frame(stack(dr.df), date = rownames(dr.df))
The values for the vector you want are in the "values" column, and the names for those values can be obtained using paste.
setNames(dr.dfL$values, paste(dr.dfL$ind, dr.dfL$date, sep = "_"))
# exam1_date1 exam1_date2 exam1_date3 exam2_date1 exam2_date2 exam2_date3
# 8.2 11.2 8.2 4.3 9.3 9.1
# exam3_date1 exam3_date2 exam3_date3 exam4_date1 exam4_date2 exam4_date3
# 6.7 6.5 4.3 3.9 4.1 4.4
Note that the result here is just a named vector, not a data.frame, as in the other answers.
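To go from one named vector to the one-row-per-object data frame the question asks for, something like the following sketch might work on the original list of character matrices. flatten_one and wide are hypothetical names, and the code assumes the first column of each matrix holds the exam names and the remaining columns are the dates.

```r
# Rebuild the question's list of character matrices
dr.df.a <- cbind(a = c("exam1", "exam2", "exam3", "exam4"),
                 date1 = c(8.2, 4.3, 6.7, 3.9),
                 date2 = c(11.2, 9.3, 6.5, 4.1),
                 date3 = c(8.2, 9.1, 4.3, 4.4))
dr.df.b <- cbind(b = c("exam1", "exam2", "exam3", "exam4"),
                 date1 = c(8.6, 14.3, 6.7, 13.9),
                 date2 = c(11.2, 8.3, 16.5, 14.1),
                 date3 = c(4.2, 9.1, 4.3, 14.4))
mylist <- list(dr.df.a, dr.df.b)

# Hypothetical helper: turn one matrix into a named numeric vector
flatten_one <- function(m) {
  exams <- m[, 1]
  dates <- colnames(m)[-1]
  # Transposing first makes the values run exam by exam (all dates of exam1, then exam2, ...)
  vals <- as.numeric(t(m[, -1]))
  names(vals) <- paste(rep(exams, each = length(dates)), dates, sep = "_")
  vals
}

# One row per list element
wide <- as.data.frame(do.call(rbind, lapply(mylist, flatten_one)))
wide
```

The column names come from the first list element, so this assumes every object in the list has the same exams and dates in the same order.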
You can use reshape from base R:
new <- reshape(dr, varying = list(c("date1","date2","date3")), direction = "long")
new$newname <- apply(new, 1, function(x) paste(x[1],paste("date",x[2],sep=""),sep="_"))
new <- new[,c("date1","newname")]
names(new) <- c("info","exam")
Outputs:
> new
info exam
1.1 8.2 exam1_date1
2.1 4.3 exam2_date1
3.1 6.7 exam3_date1
4.1 3.9 exam4_date1
1.2 11.2 exam1_date2
2.2 9.3 exam2_date2
3.2 6.5 exam3_date2
4.2 4.1 exam4_date2
1.3 8.2 exam1_date3
2.3 9.1 exam2_date3
3.3 4.3 exam3_date3
4.3 4.4 exam4_date3
I asked a question like this before, but I decided to simplify my data format because I'm very new at R and didn't understand what was going on. Here's the link for the question: How to handle more than multiple sets of data in R programming?
But I edited what my data should look like and decided to leave it in this format:
X1.0 X X2.0 X.1
0.9 0.9 0.2 1.2
1.3 1.4 0.8 1.4
As you can see, I have four columns of data. The real data I'm dealing with has up to 2000 data points. Columns "X1.0" and "X2.0" refer to "Time", so what I want is the average of "X" and "X.1" every 100 seconds, based on my 2 columns of time, "X1.0" and "X2.0". I can do it using this command:
cuts <- cut(data$X1.0, breaks=seq(0, max(data$X1.0)+400, 400))
by(data$X, cuts, mean)
But this will only give me the average from one set of data, which is "X1.0" and "X". How can I do it so that I get averages from more than one data set? I also want to stop having this kind of output:
cuts: (0,400]
[1] 0.7
------------------------------------------------------------
cuts: (400,800]
[1] 0.805
Note that the output was done every 400 s. I really want a list of those cuts, which are the averages at different intervals. Please help. I just used data=read.delim("clipboard") to get my data into the program.
It is a little unclear what output you want to get.
First I change the column names, but this is optional:
colnames(dat) <- c('t1','v1','t2','v2')
Then I will use ave, which is like by but with better output. I am using the trick of a matrix to index the columns:
matrix(1:ncol(dat),ncol=2) ## column 1 is col1 and col2...
[,1] [,2]
[1,] 1 3
[2,] 2 4
Then I use this matrix with apply. Here is the entire solution:
cbind(dat,
apply(matrix(1:ncol(dat),ncol=2),2,
function(x,by=10){ ## by 10 seconds! you can replace this
## with 100 or 400 in your real data
t.col <- dat[,x][,1] ## txxx
v.col <- dat[,x][,2] ## vxxx
ave(v.col,cut(t.col,
breaks=seq(0, max(t.col),by)),
FUN=mean)})
)
EDIT: corrected the cut and simplified the code (the value column, not the time column, is what gets averaged within each time bin)
cbind(dat,
apply(matrix(1:ncol(dat),ncol=2),2,
function(x,by=10)ave(dat[,x][,2], dat[,x][,1] %/% by)))
X1.0 X X2.0 X.1 1 2
1 0.9 0.9 0.2 1.2 1.38 1.3
2 1.3 1.4 0.8 1.4 1.38 1.3
3 2.0 1.7 1.6 1.1 1.38 1.3
4 2.6 1.9 2.2 1.6 1.38 1.3
5 9.7 1.0 2.8 1.3 1.38 1.3
6 10.7 0.8 3.5 1.1 1.65 1.3
7 11.6 1.5 4.1 1.8 1.65 1.3
8 12.1 1.4 4.7 1.2 1.65 1.3
9 12.6 1.8 5.4 1.2 1.65 1.3
10 13.2 2.1 6.3 1.3 1.65 1.3
11 13.7 1.6 6.9 1.1 1.65 1.3
12 14.2 2.2 9.4 1.3 1.65 1.3
13 14.6 1.8 10.0 1.5 1.65 1.5
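If what is wanted is one average per interval rather than a value repeated on every row, tapply may be closer to the "list of averages" the OP describes. A minimal sketch using the 13 rows printed above; interval_means and width are hypothetical names, and the bin width of 10 mirrors the by = 10 used above.

```r
dat <- data.frame(
  t1 = c(0.9, 1.3, 2.0, 2.6, 9.7, 10.7, 11.6, 12.1, 12.6, 13.2, 13.7, 14.2, 14.6),
  v1 = c(0.9, 1.4, 1.7, 1.9, 1.0, 0.8, 1.5, 1.4, 1.8, 2.1, 1.6, 2.2, 1.8),
  t2 = c(0.2, 0.8, 1.6, 2.2, 2.8, 3.5, 4.1, 4.7, 5.4, 6.3, 6.9, 9.4, 10.0),
  v2 = c(1.2, 1.4, 1.1, 1.6, 1.3, 1.1, 1.8, 1.2, 1.2, 1.3, 1.1, 1.3, 1.5)
)
width <- 10  # interval length; 100 or 400 for the real data

# Hypothetical helper: mean of 'value' within each time bin of length 'width'
interval_means <- function(time, value, width) {
  bin_start <- (time %/% width) * width  # left edge of the bin each time falls in
  tapply(value, bin_start, mean)         # one mean per bin, named by its left edge
}

m1 <- interval_means(dat$t1, dat$v1, width)
m2 <- interval_means(dat$t2, dat$v2, width)
m1
m2
```

Each result is a short named vector (names are the bin start times), which avoids the dashed-line printout that by() produces.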
I have a data frame laid out in the following manner:
Species Trait.p Trait.y Trait.z
a 20.1 7.2 14.1
b 20.4 8.3 15.2
b 19.2 6.8 13.9
I would like to compute, for each species combination, (Xa) - (Xb), where X is the trait value, the letter is the species, and Xa > Xb. That is, the larger value of each species combination has to come first, and this is calculated for every trait.
Would this be a multi-step process?
An example output could be
Combination Trait.p Trait.y Trait.z
a/b 0.3 1.1 1.1
I assumed you choose the largest value but David brings up a good point. I doubt this is the best approach but I think it gives you what you're after. Note I added a c as I'm sure your problem is a bit more complex than just a and b:
dat <- read.table(text="Species Trait.p Trait.y Trait.z
a 20.1 7.2 14.1
b 20.4 8.3 15.2
b 19.2 6.8 13.9
c 14.2 3.8 11.9", header=TRUE)
li <- lapply(split(dat, dat$Species), function(x) apply(x[, -1], 2, max))
com <- expand.grid(names(li), names(li))
inds <- com[com[, 1] != com[, 2], ]
inds <- t(apply(inds, 1, sort))
inds <- inds[!duplicated(inds), ]
ans <- lapply(1:nrow(inds), function(i) {
abs(li[[inds[i, 1]]]-li[[inds[i, 2]]])
})
cbind(Combination = paste(inds[, 1], inds[, 2], sep="/"),
as.data.frame(do.call(rbind, ans)))
This gives us:
Combination Trait.p Trait.y Trait.z
1 a/b 0.3 1.1 1.1
2 a/c 5.9 3.4 2.2
3 b/c 6.2 4.5 3.3
Sorry for the lack of annotation but I'm heading to class.
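For what it's worth, the same result can probably be reached a bit more directly with aggregate() and combn(). This is a sketch under the same assumption as above (take the per-species maximum of each trait, then the absolute difference for every pair); the names mx, traits, pairs, diffs and out are made up.

```r
dat <- data.frame(Species = c("a", "b", "b", "c"),
                  Trait.p = c(20.1, 20.4, 19.2, 14.2),
                  Trait.y = c(7.2, 8.3, 6.8, 3.8),
                  Trait.z = c(14.1, 15.2, 13.9, 11.9))

# Per-species maximum of every trait
mx <- aggregate(. ~ Species, data = dat, FUN = max)
traits <- as.matrix(mx[, -1])
rownames(traits) <- as.character(mx$Species)

# Every unordered pair of species, one pair per column
pairs <- combn(rownames(traits), 2)

# abs() makes the order within a pair irrelevant (larger minus smaller)
diffs <- abs(traits[pairs[1, ], , drop = FALSE] - traits[pairs[2, ], , drop = FALSE])
rownames(diffs) <- NULL

out <- data.frame(Combination = paste(pairs[1, ], pairs[2, ], sep = "/"), diffs)
out
```

combn() generates each pair once, so the expand.grid / sort / duplicated steps are not needed.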