i need your help, i have a data frame like this
int x y z
1 0 1 0
2 1 0 0
3 0 0 1
and the result that i need must be like this
int letter
1 y
2 x
3 z
my code is:
for (i in 1:nrow(samples))
for(j in 1:ncol(samples))
if(samples[i,][,j] == 1) print(c(i,names(samples[i,j])))
but it is not showing the second column and i need save in a new data.frame, any suggestion? thanks.
You can use max.col:
dat$newcol <- names(DF)[-1][max.col(DF[-1])]
This gives
int x y z newcol
1 1 0 1 0 y
2 2 1 0 0 x
3 3 0 0 1 z
I'm sure there are plenty of ways, but here's one:
samples <- read.table(text="int x y z
1 0 1 0
2 1 0 0
3 0 0 1",
header=TRUE)
# int x y z
#1 1 0 1 0
#2 2 1 0 0
#3 3 0 0 1
data.frame(
samples[1],
letter=colnames(samples[-1][apply(samples[-1],1,which.max)])
)
# int letter
#1 1 y
#2 2 x
#3 3 z
A solution to this similar question.
tdf <- data.frame(
A = c(1,1,0,0),
B = c(0,0,1,0),
C = c(0,0,0,1)
)
library(magrittr)
tdf %>%
lapply(sum) %>%
(function(x){
a <- c()
for(i in 1:length(x)){
a <- c(a, rep(names(x[i]), x[i]))
}
return(a)
})
Related
I have the following data frame in R:
Row number A B C D E F G H I J
1 1 1 0 0 1 0 0 1 1
2 1 0 0 0 1 0 0 1
3 1 0 0 0 1 0 0 1 1
I am trying to calculate the number of times the number changes between 1 and 0 excluding the Nulls
The result I am expecting is this
Row Number No of changes
---------- --------------
1 4
2 4
3 4
An explanation for row 1
In row 1, A has a null so we exclude that.
B and C have 1 which is our first set of values.
D and E have 0 which is our second set of values. Now Change = 1
F has our third set of values which is 1. Now Change = 1+1
G and H have 0 which is our third set of values. Now Change = 1+1+1
I and J have 1 which is our fourth set of values. Now Change = 1+1+1+1 =4
Here's a tidyverse approach.
I gather into longer format (from tidyr::pivot_longer), then add a helper column noting when we have a change from 0 to 1 or from 1 to 0, and then sum those by row.
library(tidyverse)
df %>%
# before tidyr 1.0, this would be gather(col, value, -1)
pivot_longer(-1, "col") %>%
group_by(Row.number) %>%
mutate(chg = value == 1 & lag(value) == 0 |
value == 0 & lag(value) == 1) %>%
summarize(no_chgs = sum(chg, na.rm = T))
# A tibble: 3 x 2
Row.number no_chgs
<int> <int>
1 1 4
2 2 4
3 3 4
Sample data:
df <- read.table(
header = T,
stringsAsFactors = F,
text = "'Row number' A B C D E F G H I J
1 NA 1 1 0 0 1 0 0 1 1
2 NA NA 1 0 0 0 1 0 0 1
3 NA 1 0 0 0 1 0 0 1 1")
Here's a data.table solution:
library(data.table)
dt <- as.data.table(df)
dt[,
no_change := max(rleid(na.omit(t(.SD)))) - 1,
by = RowNumber
]
dt
Alternatively, here's a base version:
apply(df[, -1],
1,
function(x) {
complete_case = complete.cases(x)
if (sum(complete_case) > 0) {
return(length(rle(x[complete_case])$lengths) - 1)
} else {
return (0)
}
}
)
I need to repeat a code 24 times (for 24 different participants), making sure that overall, for each Scene2 in each Trial and Route, I have the same number of 1 and 0 across the columns Random of each participant (i.e., Part.1, Part.2, Part.3, etc.) when the Target is equal to 0.
Here is the code I am using:
Scene2 = rep(c(1:10), times=9)
myDF2 <- data.frame(Scene2)
myDF2$Target <- rep(0,10, each=9)
myDF2$Target[myDF2$Scene2==7] <- 1
myDF2$Trial <- rep(c(1:9),each=10)
myDF2$Route <- rep(LETTERS[1:6], each=10, length=nrow(myDF2))
library(plyr)
myDF3 <- myDF2 %>% group_by(Trial, Route) %>% mutate(Random = ifelse(myDF2$Target==0,sample(c(rep(0,5),rep(1,5))),1)) %>% as.data.frame()
I need to obtain something like this:
Scene2 Target Trial Route Part.1 Part.2 Part.3 Part.4 … Part.24 Tot.1 Tot.0
1 0 1 A 0 1 1 0 0 12 12
2 0 1 A 1 0 1 0 0 12 12
3 0 1 A 1 0 0 0 0 12 12
4 0 1 A 0 1 0 1 0 12 12
5 0 1 A 1 0 1 1 0 12 12
6 0 1 A 1 0 0 0 1 12 12
7 1 1 A 1 1 1 1 1 24 0
8 0 1 A 0 0 1 1 1 12 12
9 0 1 A 0 1 1 1 1 12 12
10 0 1 A 0 1 0 0 1 12 12
How to achieve this? Any suggestion would be very much appreciated.
Since there's some conditional logic here that needs to meet particular specifications, I think this is easier to do with a function.
Scene2 = rep(c(1:10), times=9)
myDF2 <- data.frame(Scene2)
myDF2$Target <- rep(0,10, each=9)
myDF2$Target[myDF2$Scene2==7] <- 1
myDF2$Trial <- rep(c(1:9),each=10)
myDF2$Route <- rep(LETTERS[1:6], each=10, length=nrow(myDF2))
library(tidyverse)
fill_random_columns <- function(df, reps) {
# Start a loop with a counter
for (i in 1:reps) {
# Create a vector of 1s and 0s for filling rows
bag <- c(rep(0, 12), rep(1, 12))
# Build up conditional data frame of 1s and 0s
row_vector <- as.data.frame(t(sapply(df$Target, function(v) {
if (v == 1) return(rep(1, reps))
else (return(sample(bag, reps)))
})))
}
# Create column names
colnames <- lapply(1:reps, function(i) {paste0("Part.", i)})
# Name columns and sum up rows
row_vector <- row_vector %>%
`colnames<-`(colnames) %>%
mutate(Total = rowSums(.))
# Attach to original data frame
df <- bind_cols(df, row_vector)
return(df)
}
myDF3 <- myDF2 %>%
group_by(Trial, Route) %>%
fill_random_columns(., 24)
I have a sequence which looks like this
SEQENCE
1 A
2 B
3 B
4 C
5 A
Now from this sequence, I want to get the matrix like this where i the row and jth column element denotes how many times movement occurred from ith row node to jth column node
A B C
A 0 1 0
B 0 1 1
C 1 0 0
How Can I get this in R
1) Use table like this:
s <- DF[, 1]
table(tail(s, -1), head(s, -1))
giving:
A B C
A 0 0 1
B 1 1 0
C 0 1 0
2) or like this. Since embed does not work with factors we convert the factor to character,
s <- as.character(DF[, 1])
do.call(table, data.frame(embed(s, 2)))
giving:
X2
X1 A B C
A 0 0 1
B 1 1 0
C 0 1 0
3) xtabs also works:
s <- as.character(DF[, 1])
xtabs(data = data.frame(embed(s, 2)))
giving:
X2
X1 A B C
A 0 0 1
B 1 1 0
C 0 1 0
Note: The input DF in reproducible form is:
Lines <- " SEQENCE
1 A
2 B
3 B
4 C
5 A"
DF <- read.table(text = Lines, header = TRUE)
I am new to R.
I would like to transform a binary matrix like this:
example:
" 1874 1875 1876 1877 1878 .... 2009
F 1 0 0 0 0 ... 0
E 1 1 0 0 0 ... 0
D 1 1 0 0 0 ... 0
C 1 1 0 0 0 ... 0
B 1 1 0 0 0 ... 0
A 1 1 0 0 0 ... 0"
Since, columns names are years I would like to aggregate them in decades and obtain something like:
"1840-1849 1850-1859 1860-1869 .... 2000-2009
F 1 0 0 0 0 ... 0
E 1 1 0 0 0 ... 0
D 1 1 0 0 0 ... 0
C 1 1 0 0 0 ... 0
B 1 1 0 0 0 ... 0
A 1 1 0 0 0 ... 0"
I am used to python and do not know how to do this transformation without making loops!
Thanks, isabel
It is unclear what aggregation you want, but using the following dummy data
set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24
The following counts events in each 10-year period.
Get the years as a numeric variable
years <- as.numeric(names(df))
Next we need an indicator for the start of each decade
ind <- seq(from = signif(years[1], 3), to = signif(tail(years, 1), 3), by = 10)
We then apply over the indices of ind (1:(length(ind)-1)), select columns from df that are the current decade and count the 1s using rowSums.
tmp <- lapply(seq_along(ind[-1]),
function(i, inds, data) {
rowSums(data[, names(data) %in% inds[i]:(inds[i+1]-1)])
}, inds = ind, data = df)
Next we cbind the resulting vectors into a data frame and fix-up the column names:
out <- do.call(cbind.data.frame, tmp)
names(out) <- paste(head(ind, -1), tail(ind, -1) - 1, sep = "-")
out
This gives:
> out
1870-1879 1880-1889 1890-1899
1 4 5 6
2 4 6 6
3 2 5 5
4 5 5 7
5 3 3 7
6 5 5 4
If you want simply a binary matrix with a 1 indicating at least 1 event happened in that decade, then you can use:
tmp2 <- lapply(seq_along(ind[-1]),
function(i, inds, data) {
as.numeric(rowSums(data[, names(data) %in% inds[i]:(inds[i+1]-1)]) > 0)
}, inds = ind, data = df)
out2 <- do.call(cbind.data.frame, tmp2)
names(out2) <- paste(head(ind, -1), tail(ind, -1) - 1, sep = "-")
out2
which gives:
> out2
1870-1879 1880-1889 1890-1899
1 1 1 1
2 1 1 1
3 1 1 1
4 1 1 1
5 1 1 1
6 1 1 1
If you want a different aggregation, then modify the function applied in the lapply call to use something other than rowSums.
This is another option, using modular arithmetic to aggregate the columns.
# setup, borrowed from #GavinSimpson
set.seed(42)
df <- data.frame(matrix(sample(0:1, 6*25, replace = TRUE), ncol = 25))
names(df) <- 1874 + 0:24
result <- do.call(cbind,
by(t(df), as.numeric(names(df)) %/% 10 * 10, colSums))
# add -xxx9 to column names, for each decade
dimnames(result)[[2]] <- paste(colnames(result), as.numeric(colnames(result)) + 9, sep='-')
# 1870-1879 1880-1889 1890-1899
# V1 4 5 6
# V2 4 6 6
# V3 2 5 5
# V4 5 5 7
# V5 3 3 7
# V6 5 5 4
If you wanted to aggregate with something other than sum, replace the call to
colSums with something like function(cols) lapply(cols, f), where f is the aggregating
function, e.g., max.
I have a dataframe that looks like this:
step var1 score1 score2
1 a 0 0
2 b 1 1
3 d 1 1
4 e 0 0
5 g 0 0
1 b 1 1
2 a 1 0
3 d 1 0
4 e 0 1
5 f 1 1
1 g 0 1
2 d 1 1
etc.
Because I need to correlate variabeles a-g (their scores are in score1) with score2 in only step 5 I think i need to schange my dataset into this first:
a b c d e f g score2_step5
0 1 1 0 0 0
1 1 1 0 1 1
1 0
etc.
I am pretty sure that the Reshape package should be able to help me to do the job, but I haven't been able to make it work yet.
Can anyone help me? Many thanks in advance!
Here's another version. In case there is no step = 5, the value for score2_step = 0. Assuming your data.frame is df:
require(reshape2)
out <- do.call(rbind, lapply(seq(1, nrow(df), by=5), function(ix) {
iy <- min(ix+4, nrow(df))
df.b <- df[ix:iy, ]
tt <- dcast(df.b, 1 ~ var1, fill = 0, value.var = "score1", drop=F)
tt$score2_step5 <- 0
if (any(df.b$step == 5)) {
tt$score2_step5 <- df.b$score2[df.b$step == 5]
}
tt[,-1]
}))
> out
a b d e f g score2_step5
2 0 1 1 0 0 0 0
21 1 1 1 0 1 0 1
22 0 0 1 0 0 0 0
It looks like you want 7 correlations between the variables a-g and score2_step5--is that correct? First, you're going to need another variable. I'm assuming that step repeats continuously from 1 to 5; if not, this is going to be more complicated. I'm assuming your data is called df. I also prefer the newer reshape2 package, so I'm using that.
df$block <- rep(1:(nrow(df)/5),each=5)
df.molten <- melt(df,id.vars=c("var1", "step", "block"),measure.vars=c("score1"))
df2 <- dcast(df.molten, block ~ var1)
score2_step5 <- df$score2[df$step==5]
and then finally
cor(df2, score2_step5, use='pairwise')
There's an extra column (block) in df2 that you can get rid of or just ignore.
I added another row to your example data because my code doesn't work unless there is a step-5 observation in every block.
dat <- read.table(textConnection("
step var1 score1 score2
1 a 0 0
2 b 1 1
3 d 1 1
4 e 0 0
5 g 0 0
1 b 1 1
2 a 1 0
3 d 1 0
4 e 0 1
5 f 1 1
1 g 0 1
2 d 1 1
5 a 1 0"),header=TRUE)
Like #JonathanChristensen, I made another variable (I called it rep instead of block), and I made var1 into a factor (since there are no c values in the example data set given and I wanted a placeholder).
dat <- transform(dat,var1=factor(var1,levels=letters[1:7]),
rep=cumsum(step==1))
tapply makes the table of score1 values:
tab <- with(dat,tapply(score1,list(rep,var1),identity))
add the score2, step-5 values:
data.frame(tab,subset(dat,step==5,select=score2))