I have a data frame in which each individual is spread across two columns: columns 1 and 2 represent individual 1, columns 3 and 4 represent individual 2, and so on.
Basically, I want to add each pair of contiguous columns so that I get each individual's real score.
In this example, V1 and V2 represent individual I, and V3 and V4 represent individual II. The resulting data frame will have half as many columns and the same number of rows, and each value will be the sum of the corresponding values in two contiguous columns.
Data
V1 V2 V3 V4
1 0 0 1 1
2 1 0 0 0
3 0 1 1 1
4 0 1 0 1
Desired output
I II
1 0 2
2 1 0
3 1 2
4 1 1
I tried something like this
a <- data.frame(V1= c(0,1,0,0),V2=c(0,0,1,1),V3=c(1,0,1,0),V4=c(1,0,1,1))
b <- matrix(NA, nrow = nrow(a), ncol = ncol(a))  # placeholder with the same shape as a
for (i in seq(2,ncol(a),by=2)){
for (k in 1:nrow(a)){
b[k,i] <- a[k,i] + a[k,i-1]
}
}
b <- as.data.frame(b)
b <- b[,-c(seq(1,length(b),by=2))]
Is there a simpler way to do this?
We could use split.default to split the data and then do rowSums by looping over the list
sapply(split.default(a, as.integer(gl(ncol(a), 2, ncol(a)))), rowSums)
1 2
[1,] 0 2
[2,] 1 0
[3,] 1 2
[4,] 1 1
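To make the grouping step easier to see, here is a minimal sketch that builds the column-to-individual mapping explicitly before splitting. The name grp is only illustrative, and the integer-division formula is an assumed alternative to the gl() call above, not part of the original answer.

```r
a <- data.frame(V1 = c(0, 1, 0, 0), V2 = c(0, 0, 1, 1),
                V3 = c(1, 0, 1, 0), V4 = c(1, 0, 1, 1))

# Map column positions 1, 2, 3, 4 to group ids 1, 1, 2, 2
grp <- (seq_along(a) + 1) %/% 2

# Split the columns by group, then sum each group row by row
res <- sapply(split.default(a, grp), rowSums)
res
```

Each element of the split list is a two-column data frame, so rowSums returns one score per row for that individual.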
You can use vector recycling to select columns and add them.
res <- a[c(TRUE, FALSE)] + a[c(FALSE, TRUE)]
names(res) <- paste0('col', seq_along(res))
res
# col1 col2
#1 0 2
#2 1 0
#3 1 2
#4 1 1
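As a rough illustration of what the recycling does, the sketch below spells out the logical index that c(TRUE, FALSE) is expanded to and checks that the column count is even first. The names odd and res, and the Roman-numeral labels, are just assumptions chosen to match the desired output in the question.

```r
a <- data.frame(V1 = c(0, 1, 0, 0), V2 = c(0, 0, 1, 1),
                V3 = c(1, 0, 1, 0), V4 = c(1, 0, 1, 1))

# Pairing only makes sense with an even number of columns
stopifnot(ncol(a) %% 2 == 0)

# This is the full-length index that c(TRUE, FALSE) is recycled to
odd <- rep_len(c(TRUE, FALSE), ncol(a))

# Odd-numbered columns plus even-numbered columns, element by element
res <- a[odd] + a[!odd]

# Label the individuals I, II, ... as in the question
names(res) <- as.character(as.roman(seq_along(res)))
res
```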
dplyr's approach with row-wise operations (rowwise is a special type of grouping per row)
a <- data.frame(V1= c(0,1,0,0),V2=c(0,0,1,1),V3=c(1,0,1,0),V4=c(1,0,1,1))
library(dplyr)
a%>%
rowwise()%>%
transmute(I=sum(c(V1,V2)),
II=sum(c(V3,V4)))
or alternatively with a built-in row-wise variant of the sum
a %>% transmute(I = rowSums(across(1:2)),
II = rowSums(across(3:4)))
Imagine a group of three machines (a, b, c) capturing data in a series of tests. For each test, I need to count how many times each possible outcome has occurred.
Using this test data and sample output, how might you solve it (assume that the test results may be numbers or alpha).
library(data.table)
tests <- data.table(
a = c(1,2,2,3,0),
b = c(1,2,3,0,3),
c = c(2,2,3,0,2)
)
sumry <- data.table(
V0 = c(0,0,0,2,1),
V1 = c(2,0,0,0,0),
V2 = c(1,3,1,0,1),
V3 = c(0,0,2,1,1),
V4 = c(0,0,0,0,0)
)
tests
sumry
The output from sumry shows a column for each possible outcome/value (prefixed with V as in 'value' measured). Note: the sumry output indicates that there is the potential for a value of 4, but that value is not observed in any of the test data here, so its column is always zero.
> tests
a b c
1: 1 1 2
2: 2 2 2
3: 2 3 3
4: 3 0 0
5: 0 3 2
> sumry
V0 V1 V2 V3 V4
1: 0 2 1 0 0
2: 0 0 3 0 0
3: 0 0 1 2 0
4: 2 0 0 1 0
5: 1 0 1 1 0
The V0 column of sumry indicates how many times the value zero is observed from any machine in each test. For this set of test data, zero is only observed in the 4th and 5th tests. The same logic holds for V1-V4.
I'm sure there's a simple name for this.
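Here's one solution built around tabulate(). Passing nbins makes every row return a vector of the same length, so the rows line up without relying on rbind() recycling shorter vectors:
res <- t(apply(tests+1L,1L,tabulate,nbins=max(unlist(tests))+1L));
colnames(res) <- paste0('V',seq(0L,len=ncol(res)));
res;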
## V0 V1 V2 V3
## [1,] 0 2 1 0
## [2,] 0 0 3 0
## [3,] 0 0 1 2
## [4,] 2 0 0 1
## [5,] 1 0 1 1
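If outcomes that never occur (such as 4) should still get a column, one possible variation is to count against a fixed set of levels with factor() and table(). This is only a sketch: outcomes is an assumed, user-supplied vector of all possible values, and a plain data frame stands in for the data.table so the example has no package dependency.

```r
tests <- data.frame(a = c(1, 2, 2, 3, 0),
                    b = c(1, 2, 3, 0, 3),
                    c = c(2, 2, 3, 0, 2))

# Assumed full set of possible outcomes, including the unobserved 4
outcomes <- 0:4

# For each row, count how often each level occurs; unseen levels count as 0
res <- t(apply(tests, 1, function(r) table(factor(r, levels = outcomes))))
colnames(res) <- paste0("V", outcomes)
res
```

Because the counting goes through factor levels rather than integer bins, the same pattern should also work when the outcomes are character values, provided outcomes lists them as strings.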
I am trying to split one column in a data frame into multiple columns, using the values from the original column as the new column names. Each new column should then hold a 1 if its value occurred in the original column for that row, or a 0 if it did not. I realize this is not the best way to explain it, so here is an example:
df <- data.frame(subject = c(1:4), Location = c('A', 'A/B', 'B/C/D', 'A/B/C/D'))
# subject Location
# 1 1 A
# 2 2 A/B
# 3 3 B/C/D
# 4 4 A/B/C/D
I would like to expand it to a wide format like the following, with 1's and 0's (or T and F):
# subject A B C D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
I have looked into tidyr and its separate function, and reshape2 and its cast function, but I seem to be getting hung up on producing the logical values. Any help on the issue would be greatly appreciated. Thank you.
You may try cSplit_e from package splitstackshape:
library(splitstackshape)
cSplit_e(data = df, split.col = "Location", sep = "/",
type = "character", drop = TRUE, fill = 0)
# subject Location_A Location_B Location_C Location_D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
You could take the following step-by-step approach.
## get the unique values after splitting
u <- unique(unlist(strsplit(as.character(df$Location), "/")))
## compare 'u' with 'Location'
m <- vapply(u, grepl, logical(nrow(df)), x = df$Location)  # one logical per row of df
## coerce to integer representation
m[] <- as.integer(m)
## bind 'm' to 'subject'
cbind(df["subject"], m)
# subject A B C D
# 1 1 1 0 0 0
# 2 2 1 1 0 0
# 3 3 0 1 1 1
# 4 4 1 1 1 1
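One caveat with grepl() is that it matches substrings, so a location named "A" would also match a hypothetical entry such as "AB". A minimal sketch of an exact-match variant, assuming the same df as above and using illustrative names parts and res, compares the split pieces with %in% instead:

```r
df <- data.frame(subject = 1:4,
                 Location = c("A", "A/B", "B/C/D", "A/B/C/D"))

# Split each Location into its individual pieces
parts <- strsplit(as.character(df$Location), "/", fixed = TRUE)

# All distinct pieces become the new column names
u <- sort(unique(unlist(parts)))

# One row per subject: 1 if the piece is present, 0 otherwise
m <- do.call(rbind, lapply(parts, function(p) as.integer(u %in% p)))
colnames(m) <- u

res <- cbind(df["subject"], m)
res
```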
This is proving to be a monster for me, as I have zero experience with R scripts. I have a data frame with 57 columns and 30 rows of data.
Here is what I am trying to do:
1) Go to each column:
2) Count the number of times 2/3/4/5/6/7/8/9 consecutive values are less than -1
3) Print the result as a text file
4) Repeat step 2 and 3 for the second column and so on
I looked around, including on Stack Overflow, and found this related question:
check number of times consecutive value appear based on a certain criteria
This is one column of my data:
data<-c(-0.996,-1.111,-0.638,0.047,0.694,1.901,2.863,2.611,2.56,2.016,0.929,-0.153,-0.617,-0.143
0.199,0.556,0.353,-0.638,0.347,0.045,-0.829,-0.882,-1.143,-0.869,0.619,0.923,-0.474,0.227
0.394,0.789,1.962,1.132,0.1,-0.278,-0.303,-0.606,-0.705,-0.858,-0.723,-0.081,1.206,2.329
1.863,2.1,1.547,2.026,0.015,-0.441,-0.371,-0.304,-0.668,-0.953,-1.256,-1.185,-0.891,-0.569
0.485,0.421,-0.004,0.024,-0.39,-0.58,-1.178,-1.101,-0.882,0.01,0.052,-0.166,-1.703,-1.048
-0.718,-0.036,-0.561,-0.08,0.272,-0.041,-0.811,-0.929,-0.853,-1.047,0.431,0.576,0.642,1.62
2.324,1.251,1.384,0.195,-0.081,-0.335,-0.176,1.089,-0.602,-1.134,-1.356,-1.203,-0.795,-0.752
-0.692,-0.813,-1.172,-0.387,-0.079,-0.374,-0.157,0.263,0.313,0.975,2.298,1.71,0.229,-0.313
-0.779,-1.12,-1.102,-1.01,-0.86,-1.118,-1.211,-1.081,-1.156,-0.972)
When I run the following code:
for (col in 1:ncol(data)) {
runs <- rle(data[,col])
print(runs$lengths[which(runs$values < -1)])
}
It gives me this:
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
It has counted the number of values < -1, but not the runs. Is there something that I am doing wrong here?
(massive edit)
Fixed data vector (was missing commas):
data <- c(-0.996,-1.111,-0.638,0.047,0.694,1.901,2.863,2.611,2.56,2.016,0.929,-0.153,-0.617,-0.143,
0.199,0.556,0.353,-0.638,0.347,0.045,-0.829,-0.882,-1.143,-0.869,0.619,0.923,-0.474,0.227,
0.394,0.789,1.962,1.132,0.1,-0.278,-0.303,-0.606,-0.705,-0.858,-0.723,-0.081,1.206,2.329,
1.863,2.1,1.547,2.026,0.015,-0.441,-0.371,-0.304,-0.668,-0.953,-1.256,-1.185,-0.891,-0.569,
0.485,0.421,-0.004,0.024,-0.39,-0.58,-1.178,-1.101,-0.882,0.01,0.052,-0.166,-1.703,-1.048,
-0.718,-0.036,-0.561,-0.08,0.272,-0.041,-0.811,-0.929,-0.853,-1.047,0.431,0.576,0.642,1.62,
2.324,1.251,1.384,0.195,-0.081,-0.335,-0.176,1.089,-0.602,-1.134,-1.356,-1.203,-0.795,-0.752,
-0.692,-0.813,-1.172,-0.387,-0.079,-0.374,-0.157,0.263,0.313,0.975,2.298,1.71,0.229,-0.313,
-0.779,-1.12,-1.102,-1.01,-0.86,-1.118,-1.211,-1.081,-1.156,-0.972)
Doing data < -1 gives you a logical vector, and we can count runs of TRUE & FALSE:
runs <- rle(data < -1)
print(runs)
## Run Length Encoding
## lengths: int [1:21] 1 1 20 1 29 2 8 2 4 2 ...
## values : logi [1:21] FALSE TRUE FALSE TRUE FALSE TRUE ...
Then extract the length of only the TRUE runs:
print(runs$lengths[which(runs$values)])
## [1] 1 1 2 2 2 1 3 1 3 4
Finally, iterate over the columns of a data frame, applying the same rle() call to each one:
# make a data frame from sampled versions of data
set.seed(1492) # repeatable
df <- data.frame(V1=data,
V2=sample(data, length(data), replace=TRUE),
V3=sample(data, length(data), replace=TRUE),
V4=sample(data, length(data), replace=TRUE))
# do the extraction
for (col in 1:ncol(df)) {
runs <- rle(df[, col] < -1)
print(runs$lengths[which(runs$values)])
}
## [1] 1 1 2 2 2 1 3 1 3 4
## [1] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1
## [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
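To get from the run lengths to the counts asked for in the question (how many runs of length 2 through 9 there are, written to a text file), something like the following sketch could work. It assumes a "run of length k" means a maximal run of exactly k values below the threshold and that the data contain no NA values. The helper count_runs, the small vector x, and the output file name are all made up for illustration.

```r
# Small made-up vector standing in for one column of the real data
x <- c(-0.5, -1.2, -1.3, 0.4, -1.1, -1.5, -1.9, 0.2, -1.4, -1.6, 0.9, -2.0)

# Hypothetical helper: count maximal runs below threshold, by run length
count_runs <- function(v, threshold = -1, run_lengths = 2:9) {
  runs <- rle(v < threshold)
  true_len <- runs$lengths[runs$values]          # lengths of the TRUE runs only
  counts <- tabulate(true_len, nbins = max(run_lengths))[run_lengths]
  names(counts) <- run_lengths
  counts
}

counts <- count_runs(x)
counts

# Apply to every column of a data frame: rows = run length, columns = variables
df <- data.frame(V1 = x, V2 = rev(x))
out <- sapply(df, count_runs)

# Write the table to a text file (a temporary path is used here as a placeholder)
out_file <- file.path(tempdir(), "run_counts.txt")
write.table(out, file = out_file, quote = FALSE, sep = "\t")
```

If "at least k consecutive values" is what is needed instead, the counts could be accumulated from the longest run length downward, but that depends on how overlapping runs should be treated.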