R Compute Statistics on Lagged Partitions

I have a data.frame with one column containing categorical data, one column containing dates, and one column containing numeric values. For simplicity, see the sample below:
  A          B   C
1 L 2015-12-01 5.7
2 M 2015-11-30 2.1
3 K 2015-11-01 3.2
4 L 2015-10-05 5.7
5 M 2015-12-05 1.2
6 L 2015-11-15 2.3
7 L 2015-12-03 4.4
I would like to, for each category in A, compute a lagging average (e.g. average of the previous 30 days' values in column C).
I cannot for the life of me figure this one out. I have tried sapply with a custom function that subsets the data.frame on category and date (or on a deep copy of it) and returns the statistic (think mean or sd). That works fine for single values, but inside sapply it returns all NAs.
Any help you can give is appreciated.

This could be done more compactly, but here I have drawn it out to make it easy to understand. The core is the split, the lapply/apply, and the reassembly afterwards. It uses a date window rather than a sort-based solution, so it is very general. I also put the object back in its original order to enable direct comparison.
# set up the data
set.seed(100)
# create a data.frame with about a two-month period for each category of A
df <- data.frame(A = rep(c("K", "L", "M"), each = 60),
                 B = rep(seq(as.Date("2015-01-01"), as.Date("2015-03-01"), by="days"), 3),
                 C = round(runif(180)*6, 1))
head(df)
##   A          B   C
## 1 K 2015-01-01 1.8
## 2 K 2015-01-02 1.5
## 3 K 2015-01-03 3.3
## 4 K 2015-01-04 0.3
## 5 K 2015-01-05 2.8
## 6 K 2015-01-06 2.9
tail(df)
##     A          B   C
## 175 M 2015-02-24 4.8
## 176 M 2015-02-25 2.0
## 177 M 2015-02-26 5.7
## 178 M 2015-02-27 3.9
## 179 M 2015-02-28 2.8
## 180 M 2015-03-01 3.6
# preserve original order
df$originalOrder <- 1:nrow(df)
# randomly shuffle the order
randomizedOrder <- order(runif(nrow(df)))
df <- df[randomizedOrder, ]
# split on A - your own data might need coercion of A to a factor
df.split <- split(df, df$A)
# set the window size
window <- 30
# compute the moving average
listD <- lapply(df.split, function(tmp) {
  apply(tmp, 1, function(x) mean(tmp$C[tmp$B <= as.Date(x["B"]) & tmp$B > (as.Date(x["B"]) - window)]))
})
# combine the result with the original data
result <- cbind(do.call(rbind, df.split), rollingMean = unlist(listD))
# and tidy up:
# return to original order
result <- result[order(result$originalOrder), ]
result$originalOrder <- NULL
# remove the row names
row.names(result) <- NULL
result[c(1:5, 59:65), ]
##    A          B   C rollingMean
## 1  K 2015-01-01 1.8    1.800000
## 2  K 2015-01-02 1.5    1.650000
## 3  K 2015-01-03 3.3    2.200000
## 4  K 2015-01-04 0.3    1.725000
## 5  K 2015-01-05 2.8    1.940000
## 59 K 2015-02-28 3.6    3.080000
## 60 K 2015-03-01 1.3    3.066667
## 61 L 2015-01-01 2.8    2.800000
## 62 L 2015-01-02 3.9    3.350000
## 63 L 2015-01-03 5.8    4.166667
## 64 L 2015-01-04 4.1    4.150000
## 65 L 2015-01-05 2.7    3.860000
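For reference, the same windowed mean can be written much more compactly with dplyr. This is just a sketch under the assumption that dplyr is installed; it reuses the df and window objects from above, and row order does not matter because the window is defined by dates:
library(dplyr)
result2 <- df %>%
  group_by(A) %>%
  mutate(rollingMean = sapply(seq_along(B),
                              function(i) mean(C[B <= B[i] & B > B[i] - window]))) %>%
  ungroup() %>%
  arrange(originalOrder)
The rollingMean column matches the one computed above; drop originalOrder if you do not need it.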


For loop with a function for a moving/rolling average?

Essentially (in R), I want to apply a moving average function over a period of time (e.g. date and time variables) to see how a particular metric changes over time. However, the metric itself is a function. The scores can be 1 (pro), 0 (neutral), or -1 (neg). The function for the metric is:
function(pro, neg, total) {
  x <- (pro / total) * 100
  y <- (neg / total) * 100
  x - y
}
So the percentage of 1's minus the percentage of -1's is the metric value.
Given timestamps for each recorded score, I want to evaluate the metric as a moving average across all rows. I assumed that a for loop would be the best way to apply this, but I am stuck on how to do it.
Does anyone have any thoughts / advice?
As mentioned in the comments, rollapply() from zoo is a good option. I took the liberty of generating some example data; apologies if it doesn't resemble yours.
library(zoo)
f <- function(x, l) {
  p <- sum(x == 1) / l
  n <- sum(x == -1) / l
  (p - n) * 100
}
# Or more efficiently: with scores of 1, 0 and -1, sum(x) is already
# the count of pros minus the count of negs
f <- function(x, l=length(x)) {
  (sum(x)/l)*100
}
set.seed(1)
N <- 25
dtf <- data.frame(time=as.Date(15000 + (1:N), origin="1970-01-01"),
                  score=sample(-1:1, N, rep=TRUE))
score <- read.zoo(dtf)
l <- 8
zts <- cbind(score, rolling=rollapply(score, l, f, l, fill=NA))
zts
#            score rolling
# 2011-01-27    -1      NA
# 2011-01-28     0      NA
# 2011-01-29     0      NA
# 2011-01-30     1    12.5
# 2011-01-31    -1    25.0
# 2011-02-01     1    12.5
# 2011-02-02     1     0.0
# 2011-02-03     0   -25.0
# 2011-02-04     0     0.0
# 2011-02-05    -1   -12.5
# 2011-02-06    -1   -12.5
# 2011-02-07    -1   -12.5
# 2011-02-08     1     0.0
# 2011-02-09     0    25.0
# 2011-02-10     1    37.5
# 2011-02-11     0    62.5
# 2011-02-12     1    62.5
# 2011-02-13     1    50.0
# 2011-02-14     0    37.5
# 2011-02-15     1    25.0
# 2011-02-16     1     0.0
# 2011-02-17    -1      NA
# 2011-02-18     0      NA
# 2011-02-19    -1      NA
# 2011-02-20    -1      NA
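The NA rows at the edges come from fill = NA. If you would rather have those edge values computed over the shorter windows that are actually available, rollapply() also accepts partial = TRUE; this sketch assumes the second definition of f above, whose default l = length(x) adapts the denominator to the window size actually used:
zts2 <- cbind(score, rolling = rollapply(score, l, f, partial = TRUE))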

Pull subset of rows of dataframe based on conditions from other columns

I have a dataframe like the one below:
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
                Type=c("put","call","put","call","call","put","call","put","call","put","call"),
                Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
                Other=sample(20,11))
    Tickers Type Strike Other
 1:       A  put   35.0     6
 2:       A call   37.5     5
 3:       A  put   37.5    13
 4:       B call   10.0    15
 5:       B call   11.0    12
 6:       B  put   11.0     4
 7:       B call   12.0    20
 8:       D  put   40.0     7
 9:       D call   40.0    11
10:       D  put   42.0    10
11:       D call   42.0     1
I am trying to analyze a subset of the data. The subset I would like to take is the rows where the ticker and strike are the same, but only if both a put and a call exist under Type. With the data above, for example, I would like to return the following result:
x[c(2,3,5,6,8:11),]
   Tickers Type Strike Other
1:       A call   37.5     5
2:       A  put   37.5    13
3:       B call   11.0    12
4:       B  put   11.0     4
5:       D  put   40.0     7
6:       D call   40.0    11
7:       D  put   42.0    10
8:       D call   42.0     1
I'm not sure what the best way to go about doing this. My thought process is that I should create another column vector like
x$id <- paste(x$Tickers,x$Strike,sep="_")
Then use this vector to only pull values where there are multiple ids.
x[x$id %in% x$id[duplicated(x$id)],]
   Tickers Type Strike Other     id
1:       A call   37.5     5 A_37.5
2:       A  put   37.5    13 A_37.5
3:       B call   11.0    12   B_11
4:       B  put   11.0     4   B_11
5:       D  put   40.0     7   D_40
6:       D call   40.0    11   D_40
7:       D  put   42.0    10   D_42
8:       D call   42.0     1   D_42
I'm not sure how efficient this is, as my actual data consists of many more rows.
Also, this solution does not check the Type condition of there being one put and one call.
(The wording of the title could also be a lot better; I apologize.)
EDIT: Having checked out the post Finding ALL duplicate rows, including "elements with smaller subscripts",
I could also use this solution:
x$id <- paste(x$Tickers,x$Strike,sep="_")
x[duplicated(x$id) | duplicated(x$id,fromLast=T),]
You could try something like:
x[, select := (.N >= 2 & all(c("put", "call") %in% unique(Type))),
  by = .(Tickers, Strike)][which(select)]
#    Tickers Type Strike Other select
# 1:       A call   37.5    17   TRUE
# 2:       A  put   37.5    16   TRUE
# 3:       B call   11.0    11   TRUE
# 4:       B  put   11.0    20   TRUE
# 5:       D  put   40.0     1   TRUE
# 6:       D call   40.0    12   TRUE
# 7:       D  put   42.0     6   TRUE
# 8:       D call   42.0     2   TRUE
Another idea might be a merge:
x[x, on = .(Tickers, Strike),
  select := (length(Type) >= 2 & all(c("put", "call") %in% Type)),
  by = .EACHI][which(select)]
I'm not entirely sure how to get around the group-by operations since you want to make sure for each group they have both "call" and "put". I was thinking about using keys, but haven't been able to incorporate the "call"/"put" aspect.
An edit to your data to give a case where both put and call do not exist (I changed the very last "call" to "put"):
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
                Type=c("put","call","put","call","call","put","call","put","call","put","put"),
                Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
                Other=sample(20,11))
Since you are using data.table, you can use the built in counter .N along with by variables to count groups and subset with that. If by counting Type you can reliably determine there is both put and call, this could work:
x[, `:=`(n = .N, types = uniqueN(Type)), by = c('Tickers', 'Strike')][n > 1 & types == 2]
The part enclosed in the first set of [] does the counting, and then the [n > 1 & types == 2] does the subsetting.
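For comparison, the same grouped filter sketched with dplyr (assuming that package is an option for you):
library(dplyr)
x %>%
  group_by(Tickers, Strike) %>%
  filter(n() >= 2, all(c("put", "call") %in% Type)) %>%
  ungroup()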
I am not a user of package data.table so this code is base R only.
agg <- aggregate(Type ~ Tickers + Strike, data = x, length)
result <- merge(x, subset(agg, Type > 1)[1:2],
                by = c("Tickers", "Strike"))[, c(1, 3, 2, 4)]
result
#    Tickers Type Strike Other
# 1:       A call   37.5    17
# 2:       A  put   37.5     7
# 3:       B call   11.0    14
# 4:       B  put   11.0    20
# 5:       D  put   40.0    15
# 6:       D call   40.0     2
# 7:       D  put   42.0     8
# 8:       D call   42.0     1
rm(agg) # final clean up
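Note that aggregate() above only counts rows per group, so like the id approach in the question it does not enforce the put-and-call condition. A base R sketch that does check it (same merge idea, different aggregation):
both <- aggregate(Type ~ Tickers + Strike, data = x,
                  function(t) all(c("put", "call") %in% t))
result2 <- merge(x, subset(both, Type)[1:2], by = c("Tickers", "Strike"))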

How to look up by row and column names?

I am trying to figure out how to look up time data by university name (first row: A, ..., F), field name (first column: Acute, ..., Fo) and/or graduation time (time) in the following file DS.csv.
I am thinking of a dplyr approach, but could not expand the numerical ID lookup (thread answer How to overload function parameters in R?) to a lookup by three variables.
Challenges
How to look up by the first row? Maybe something similar to $1 == "A".
How to expand the university lookup to two columns? In pseudocode, $1 == "A" refers to the second and third columns, ..., $1 == "F" to the last two columns.
How to look up by three criteria: the first row (no header), the first column with header Field, and the header time. Pseudocode:
times <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")
Data: DS.csv has the data; the first row denotes the experiment. The data below is in crosstab format such that
,A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3
and in straight-table format such that
Field,time,T,Experiment
Acut,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
Acute,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
Acute,8.6,2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
Acute,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
Acute,8.3,4,F
An,8.2,326,F
En,14.6,4,F
Fo,7.9,3,F
Pseudocode
library('dplyr')
ow <- options("warn")
DF <- read.csv("/home/masi/CSV/DS.csv", header = T)
# Lookup by first row, Lookup by Field, lookup by Field's first column?
times <- getTimes($1 == "A", Field == "Ane", by = "desc(time)")
Expected output: 9
Expected output generalised: a, b, c, ...
## Data where values marked by small letters a, b, c, ... are wanted
#        uni1      uni2    ...
#        time T    time T  ...
# Field1 a         c
# Field2 b         ...
# ...
R: 3.3.3 (2017-03-06)
OS: Debian 8.7
Hardware: Asus Zenbook UX303UA
Taking your initial raw data as starting point:
# read the data & skip 1st & 2nd line which contain only header information
DF <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, stringsAsFactors=FALSE, skip=2)
# read the first two lines which contain the header information
headers <- read.csv(text=",A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3", header=FALSE, stringsAsFactors=FALSE, nrow=2)
# extract the university names for the 'headers' data.frame
universities <- unlist(headers[1,])
universities <- universities[universities != '']
# create column names from the 'headers' data.frame
vec <- headers[2,][headers[2,] == 'T']
headers[2,][headers[2,] == 'T'] <- paste0(vec, seq_along(vec))
names(DF) <- paste0(headers[2,],headers[1,])
Your dataframe now looks as follows:
> DF
   Field timeA  T1 timeB T2 timeC T3 timeD T4 timeE T5 timeF  T6
1: Acute   0.0   0   8.3  1   7.5  1   8.6  2   0.0  0   8.3   4
2:   Ane   9.0 120   7.7 26   7.9 43   7.8 77   7.9 60   8.2 326
3:    En  15.6   2  12.9  1   0.0  0   0.0  0  14.3  1  14.6   4
4:    Fo   9.2   2   0.0  0   5.4  1   0.0  0   0.0  0   7.9   3
It is better to transform your data into long format:
library(data.table)
DT <- melt(setDT(DF), id = 1,
           measure.vars = patterns('^time', '^T'),
           variable.name = 'university',
           value.name = c('time', 't')
          )[, university := universities[university]][]
Now your data looks like:
> DT
    Field university time   t
 1: Acute          A  0.0   0
 2:   Ane          A  9.0 120
 3:    En          A 15.6   2
 4:    Fo          A  9.2   2
 5: Acute          B  8.3   1
 6:   Ane          B  7.7  26
 7:    En          B 12.9   1
 8:    Fo          B  0.0   0
 9: Acute          C  7.5   1
10:   Ane          C  7.9  43
11:    En          C  0.0   0
12:    Fo          C  5.4   1
13: Acute          D  8.6   2
14:   Ane          D  7.8  77
15:    En          D  0.0   0
16:    Fo          D  0.0   0
17: Acute          E  0.0   0
18:   Ane          E  7.9  60
19:    En          E 14.3   1
20:    Fo          E  0.0   0
21: Acute          F  8.3   4
22:   Ane          F  8.2 326
23:    En          F 14.6   4
24:    Fo          F  7.9   3
Now you can select the required info:
DT[university == 'A' & Field == 'Ane']
which gives:
   Field university time   t
1:   Ane          A    9 120
Several dplyr examples to filter the data:
library(dplyr)
DT %>%
  filter(Field=="En" & t > 1)
gives:
  Field university time t
1    En          A 15.6 2
2    En          F 14.6 4
Or:
DT %>%
  arrange(desc(time)) %>%
  filter(time < 14 & t > 3)
gives:
  Field university time   t
1   Ane          A  9.0 120
2 Acute          F  8.3   4
3   Ane          F  8.2 326
4   Ane          C  7.9  43
5   Ane          E  7.9  60
6   Ane          D  7.8  77
7   Ane          B  7.7  26
Change your crosstab
,A,,B,,C,,D,,E,,F,
Field,time,T,time,T,time,T,time,T,time,T,time,T
Acute,0,0,8.3,1,7.5,1,8.6,2,0,0,8.3,4
Ane,9,120,7.7,26,7.9,43,7.8,77,7.9,60,8.2,326
En,15.6,2,12.9,1,0,0,0,0,14.3,1,14.6,4
Fo,9.2,2,0,0,5.4,1,0,0,0,0,7.9,3
into a straight data format
Field,time,T,Experiment
Acut,0,0,A
An,9,120,A
En,15.6,2,A
Fo,9.2,2,A
Acute,8.3,1,B
An,7.7,26,B
En,12.9,1,B
Fo,0,0,B
Acute,7.5,1,C
An,7.9,43,C
En,0,0,C
Fo,5.4,1,C
Acute,8.6,2,D
An,7.8,77,D
En,0,0,D
Fo,0,0,D
Acute,0,0,E
An,7.9,60,E
En,14.3,1,E
Fo,0,0,E
Acute,8.3,4,F
An,8.2,326,F
En,14.6,4,F
Fo,7.9,3,F
where I used the Vim csv plugin and visual-block mode.
Multiple ways to do the selection
This is very easy to do in multiple ways after tidying the data into an easy-to-handle straight table (not a crosstab). I would prefer SQL, so I demonstrate the sqldf package below; it is very inefficient with large data, but this data is small so it will work.
Also, instead of the relatively slow built-in functions such as read.csv, I would prefer the very efficient fread from the data.table package for reading files.
sqldf
> library(data.table); library(sqldf)
> a <- fread("~/DS_straight_table.csv")
> sqldf("select time from a where Experiment='A' and Field='An'")
  time
1    9
Without sqldf
> library(data.table)
> a <- fread("~/DS_straight_table.csv")
> a[Experiment=='A' & Field=='An']
   Field time   T Experiment
1:    An    9 120          A
Using the "Tall" (straight table) format and library dplyr. Your data only has one value per Field, Experiment.
library(dplyr)
## this is the more general result
df %>%
  group_by(Field, Experiment) %>%
  top_n(1, wt = -time)
## example function
getTimes <- function(data, field, experiment) {
  data %>%
    filter(Field == field, Experiment == experiment) %>%
    top_n(1, wt = -time)
}
getTimes(df, 'An', 'A')
#   Field time   T Experiment
# 1    An    9 120          A
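A base R sketch of the same lookup for readers without dplyr (the helper name getTimesBase is invented here, and ties on time are ignored):
getTimesBase <- function(data, field, experiment) {
  rows <- data[data$Field == field & data$Experiment == experiment, ]
  rows[which.min(rows$time), ]
}
getTimesBase(df, "An", "A")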

How to calculate mean with expanding window

I have the data frame below. I wonder how to calculate the mean for column 'value_t' with an expanding window starting from '2014-01-5', e.g. val(1) = mean(1:5), val(2) = mean(1:6), val(3) = mean(1:7). I hope the algorithm is efficient (i.e., without a loop).
df <- data.frame(date_t=paste('2014-01-', 1:15, sep=""), value_t=1:15)
> df
       date_t value_t
1   2014-01-1       1
2   2014-01-2       2
3   2014-01-3       3
4   2014-01-4       4
5   2014-01-5       5
6   2014-01-6       6
7   2014-01-7       7
8   2014-01-8       8
9   2014-01-9       9
10 2014-01-10      10
11 2014-01-11      11
12 2014-01-12      12
13 2014-01-13      13
14 2014-01-14      14
15 2014-01-15      15
How about sapply(5:NROW(df), function(.) mean(df$value_t[1:.]))? It involves kind of a loop (sapply) but it should be reasonably fast.
Have a look at this
df$val <- cumsum(df$value_t) / 1:nrow(df)
df$val[1:4] <- NA
#      date_t value_t val
#   2014-01-1       1  NA
#   2014-01-2       2  NA
#   2014-01-3       3  NA
#   2014-01-4       4  NA
#   2014-01-5       5 3.0
#   2014-01-6       6 3.5
#   2014-01-7       7 4.0
#   2014-01-8       8 4.5
#   2014-01-9       9 5.0
#  2014-01-10      10 5.5
#  2014-01-11      11 6.0
#  2014-01-12      12 6.5
#  2014-01-13      13 7.0
#  2014-01-14      14 7.5
#  2014-01-15      15 8.0
If you just want the vector, and you don't want to tamper with df, do
val <- (cumsum(df$value_t) / 1:nrow(df))[-(1:4)]
# 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
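If dplyr is an option, its cummean() gives the same expanding mean; a sketch that blanks the first four values to match the requirement above:
library(dplyr)
df %>% mutate(val = ifelse(row_number() >= 5, cummean(value_t), NA))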
The sapply(...) solution is faster than the for(...) loop, but only just (about 2%, well within the margin of error). It turns out that extracting the column from the data frame at every step slows things down considerably. If you grab that column as a vector first, you get a ~25% improvement.
df <- data.frame(value=1:1e4)
f.sapply <- function() sapply(5:nrow(df), function(.) mean(df$value[1:.]))
f.loop <- function() {
  result <- numeric(nrow(df)-4)
  for (i in 5:nrow(df)) result[i-4] <- mean(df$value[1:i])
  result
}
f.vec <- function() {
  vec <- df$value
  sapply(5:nrow(df), function(.) mean(vec[1:.]))
}
# do they produce identical results?
identical(f.sapply(),f.loop())
# [1] TRUE
identical(f.sapply(),f.vec())
# [1] TRUE
# which is faster?
library(microbenchmark)
microbenchmark(f.sapply(),f.loop(),f.vec())
# Unit: milliseconds
#        expr      min       lq   median        uq      max neval
#  f.sapply() 904.2934 929.7361 947.7621  978.8775 1496.455   100
#    f.loop() 927.5288 950.3632 963.5926 1012.2407 1347.889   100
#     f.vec() 669.5615 697.3639 711.1498  751.2634 1060.056   100
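For completeness, the fully vectorized cumsum approach from the other answer avoids the repeated mean() calls altogether and should beat all three variants above by a wide margin on data this size (a sketch; it agrees with f.sapply() up to floating-point rounding):
f.cumsum <- function() (cumsum(df$value) / seq_len(nrow(df)))[-(1:4)]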

How to approach loop with increasing variable name in R

My dataset is currently a set of answers to twenty questions with 300 observations.
Each of the questions is labeled q1, q2, q3, etc.
Each observation gives a 1 to 10 response.
The code below is what I have. What I want is for the q1 to change when the counter changes in R.
totaltenq1 <- sum(UpdatedQualtrix$tenq1)
totalnineq1 <- sum(UpdatedQualtrix$nineq1)
totaleightq1 <- sum(UpdatedQualtrix$eightq1)
totalsevenq1 <- sum(UpdatedQualtrix$sevenq1)
totalsixq1 <- sum(UpdatedQualtrix$sixq1)
totalfiveq1 <- sum(UpdatedQualtrix$fiveq1)
totalfourq1 <- sum(UpdatedQualtrix$fourq1)
totalthreeq1 <- sum(UpdatedQualtrix$threeq1)
totaltwoq1 <- sum(UpdatedQualtrix$twoq1)
totaloneq1 <- sum(UpdatedQualtrix$oneq1)
totaltenq2 <- sum(UpdatedQualtrix$tenq2)
totalnineq2 <- sum(UpdatedQualtrix$nineq2)
totaleightq2 <- sum(UpdatedQualtrix$eightq2)
totalsevenq2 <- sum(UpdatedQualtrix$sevenq2)
totalsixq2 <- sum(UpdatedQualtrix$sixq2)
totalfiveq2 <- sum(UpdatedQualtrix$fiveq2)
totalfourq2 <- sum(UpdatedQualtrix$fourq2)
totalthreeq2 <- sum(UpdatedQualtrix$threeq2)
totaltwoq2 <- sum(UpdatedQualtrix$twoq2)
totaloneq2 <- sum(UpdatedQualtrix$oneq2)
I would like to have code that is
count = 20
for (i in 1:count){
  totaltenq(i) <- sum(UpdatedQualtrix$tenq(i))
  totalnineq(i) <- sum(UpdatedQualtrix$nineq(i))
  etc
}
That way, when I do this again in the future, I can tell R how many questions there are and it will adjust. Then I won't have 10,000 lines of code from copying and pasting my code 20 times.
I don't think you need any loops at all. It all depends on how you want to store those values. I'm a big fan of not having more variables than necessary.
Here's some sample data. I'll just make 10 rows (observations) with values 1-5.
set.seed(15)
Q<-3
numbs<-c("one","two","three","four","five","six","seven","eight","nine","ten")
qs<-paste0("q",1:Q)
qnumbs <- outer(numbs, qs, paste0)
UpdatedQualtrix <- data.frame(ID=1:10,
                              matrix(sample(1:5, 10*length(numbs)*Q, replace=T), nrow=10))
colnames(UpdatedQualtrix) <- c("ID", qnumbs)
Now I can sum up each of the columns with
( Qsums <- colSums(UpdatedQualtrix[, qnumbs]) )
#   oneq1   twoq1 threeq1  fourq1  fiveq1   sixq1 sevenq1 eightq1  nineq1   tenq1
#      37      35      29      26      32      39      40      33      40      26
#   oneq2   twoq2 threeq2  fourq2  fiveq2   sixq2 sevenq2 eightq2  nineq2   tenq2
#      37      31      19      29      25      38      36      35      28      27
#   oneq3   twoq3 threeq3  fourq3  fiveq3   sixq3 sevenq3 eightq3  nineq3   tenq3
#      37      30      31      31      24      31      29      31      25      41
And if we want the totals per question we can do
sapply(qs, function(a, b) sum(Qsums[paste0(b,a)]), b=numbs)
#  q1  q2  q3
# 337 305 310
Or if we want the counts per response we can do
sapply(numbs, function(a, b) sum(Qsums[paste0(a,b)]), b=qs)
#  one  two three four five  six seven eight nine  ten
#  111   96    79   86   81  108   105    99   93   94
You might want to also consider melting your data since it's so structured. You can use the reshape2 library to help. You can do
require(reshape2)
mm <- melt(UpdatedQualtrix, id.vars="ID")
mm <- cbind(mm[,-2], colsplit(mm$variable, "q", c("resp","q")))
mm$resp <- factor(mm$resp, levels=numbs)
to turn your data into a "tall" format so each value has its own row with a column for ID, value, response and question.
str(mm)
# 'data.frame': 300 obs. of 4 variables:
#  $ ID   : int 1 2 3 4 5 6 7 8 9 10 ...
#  $ value: int 4 1 5 4 2 5 5 2 4 5 ...
#  $ resp : Factor w/ 10 levels "one","two","three",..: 1 1 1 1 1 1 1 1 1 1 ...
#  $ q    : int 1 1 1 1 1 1 1 1 1 1 ...
And then we can more easily do other calculations. Of you want the total scores by question, you could do
aggregate(value~q, mm, sum)
#   q value
# 1 1   337
# 2 2   305
# 3 3   310
If you wanted the average value for each question/response you could do
with(mm, tapply(value, list(q,resp), mean))
#   one two three four five six seven eight nine ten
# 1 3.7 3.5   2.9  2.6  3.2 3.9   4.0   3.3  4.0 2.6
# 2 3.7 3.1   1.9  2.9  2.5 3.8   3.6   3.5  2.8 2.7
# 3 3.7 3.0   3.1  3.1  2.4 3.1   2.9   3.1  2.5 4.1
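And if you want a question-by-response table of totals rather than means, reshape2's dcast() can tabulate the melted data directly (a sketch using the mm built above):
dcast(mm, q ~ resp, value.var = "value", fun.aggregate = sum)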
