I need help with column-wise subtraction across N columns. Below are the columns of the input data frame.
input dataframe:
1 4 6 2
3 3 3 4
1 2 2 2
4 4 4 4
5 2 3 2
Expected Output:
1 3 2 -4
3 0 0 1
1 1 0 0
4 0 0 0
5 -3 1 -1
Similarly, there will be more columns, up to about 10.
I am able to write the code for two columns:
df$`B-A` <- df$B - df$A
df$`C-B` <- df$C - df$B
and so on, but this should be done in a loop, as there are 10 to 12 columns. Please help me.

Here is a vectorized way to do this:
cbind.data.frame(df[1], df[-1] - df[-ncol(df)])
which gives,
1 1 3 2 -4
2 3 0 0 1
3 1 1 0 0
4 4 0 0 0
5 5 -3 1 -1
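To see why the one-liner works, it may help to split it into steps. The sketch below assumes the columns are named A to D (the question does not give names) and labels the result columns the way the question's own attempt did; the names `later`, `earlier` and `diffs` are only illustrative.

```r
# Example data from the question; the column names A-D are assumed for illustration.
df <- data.frame(A = c(1, 3, 1, 4, 5), B = c(4, 3, 2, 4, 2),
                 C = c(6, 3, 2, 4, 3), D = c(2, 4, 2, 4, 2))

later   <- df[-1]         # every column except the first: B, C, D
earlier <- df[-ncol(df)]  # every column except the last:  A, B, C

# Element-wise subtraction pairs B with A, C with B, and D with C.
diffs <- later - earlier
names(diffs) <- paste(names(later), names(earlier), sep = "-")

# Keep the first column unchanged and append the differences.
result <- cbind(df[1], diffs)
print(result)
```

Because both pieces are data frames of the same shape, the subtraction is done column by column with no explicit loop, and it should scale to 10 or 12 columns unchanged.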

Here is a straightforward, step-by-step solution:
df <- data.frame(A=c(1,3,1,4,5), B=c(4,3,2,4,2), C=c(6,3,2,4,3), D=c(2,4,2,4,2))
Get the pattern:
cbind(df[1], df[2] - df[1], df[3] - df[2], df[4] - df[3]) # solved
Now generalize the pattern to any number of columns with sapply():
cbind(df[1], sapply(1:(ncol(df)-1), function(i) df[i+1] - df[i]))
1 1 3 2 -4
2 3 0 0 1
3 1 1 0 0
4 4 0 0 0
5 5 -3 1 -1

Using apply() you can also try this
cbind(df[1], t(apply(df, 1, diff)))
1 1 3 2 -4
2 3 0 0 1
3 1 1 0 0
4 4 0 0 0
5 5 -3 1 -1


Fill entire vector by a certain appearance of another vector

I have the following data:
players trial choice
1 1 1 1
2 1 2 0
3 1 3 0
4 2 1 0
5 2 2 0
6 2 3 0
7 3 1 0
8 3 2 1
9 3 3 0
Now I want to create a new vector:
each participant who has at least one choice of "1" gets the value "3", and "0" otherwise:
players trial choice win
1 1 1 1 3
2 1 2 0 3
3 1 3 0 3
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 3
8 3 2 1 3
9 3 3 0 3
In the simple example above, player 1 chose "1" in the first trial and player 3 in the second trial, so the value is "3" for all of their rows in the new vector.
Any ideas how to do it? Thanks!
A base R option using ave + ifelse
win <- ave(choice,players,FUN = function(x) ifelse(any(x==1),3,0))
players trial choice win
1 1 1 1 3
2 1 2 0 3
3 1 3 0 3
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 3
8 3 2 1 3
9 3 3 0 3
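For completeness, here is a self-contained sketch of the ave approach that rebuilds the example data first. The data frame name `gamematrix` is borrowed from the other answers; a plain `if`/`else` is used in place of `ifelse` because the test is a single TRUE/FALSE per player.

```r
# Rebuild the example data from the question.
gamematrix <- data.frame(players = rep(1:3, each = 3),
                         trial   = rep(1:3, times = 3),
                         choice  = c(1, 0, 0, 0, 0, 0, 0, 1, 0))

# ave() splits choice by player, applies the function to each group,
# and recycles the single returned value across that player's rows.
gamematrix$win <- with(gamematrix,
                       ave(choice, players,
                           FUN = function(x) if (any(x == 1)) 3 else 0))
print(gamematrix)
```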
If your criterion depends on the first two values of choice, you can try
win <- ave(choice,players,FUN = function(x) ifelse(all(head(x,2)==1),3,0))
which gives
players trial choice win
1 1 1 1 0
2 1 2 0 0
3 1 3 0 0
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 0
8 3 2 1 0
9 3 3 0 0
Try this dplyr approach:
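library(dplyr)
gamematrix <- gamematrix %>% group_by(players) %>%
  mutate(win = ifelse(any(choice == 1), 3, 0)) # assumed completion; this step is cut off in the original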
# A tibble: 9 x 4
# Groups: players [3]
players trial choice win
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 3
2 1 2 0 3
3 1 3 0 3
4 2 1 0 0
5 2 2 0 0
6 2 3 0 0
7 3 1 0 3
8 3 2 1 3
9 3 3 0 3
There is no reason for this data to be a data.frame. Keep it as a numeric matrix (the code below assumes one row per player and one column per trial). If you do so, you can do this in one line using only vectorized functions.
cbind(gamematrix, win = (rowSums(gamematrix == 1) > 0) * 3)
for your second case:
I would like it to be only for those players who had "choice=1" in the first N (e.g., first 2 trials)
cbind(gamematrix, win = (rowSums(gamematrix[,c(1,2)] == 1) > 0) * 3)
Vectorized solutions usually perform better than solutions that hide a loop (e.g. ave).
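For the long-format data shown in the question (one row per player and trial), the "first N trials" rule can also be written without a hidden loop. This is only a sketch: `win_first_n` is a hypothetical helper name, and it assumes the trial column numbers each player's trials from 1.

```r
# Rebuild the long-format example data from the question.
gamematrix <- data.frame(players = rep(1:3, each = 3),
                         trial   = rep(1:3, times = 3),
                         choice  = c(1, 0, 0, 0, 0, 0, 0, 1, 0))

# win_first_n is a hypothetical helper, not part of the original answers.
# A player scores 3 if they chose 1 in any of their first n trials.
win_first_n <- function(data, n) {
  early_hit <- data$choice == 1 & data$trial <= n   # rows that qualify
  winners   <- unique(data$players[early_hit])      # players with such a row
  3 * (data$players %in% winners)                   # 3 for winners, 0 otherwise
}

gamematrix$win <- win_first_n(gamematrix, n = 2)
print(gamematrix)
```

With `n = 2`, players 1 and 3 qualify; with `n = 1`, only player 1 does, since player 3's "1" comes in the second trial.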
An option with rowsum from base R
gamematrix$win <- with(gamematrix, 3 * players %in%
names(which(rowsum(choice, players)[,1] > 0)))
#[1] 3 3 3 0 0 0 3 3 3

How can I create a new variable which identifies rows where another variable changes sign?

I have a question regarding data preparation. I have the following data set (in long format; one row per measurement point, therefore several rows per person):
dd <- read.table(text=
"ID time
1 -4
1 -3
1 -2
1 -1
1 0
1 1
2 -3
2 -1
2 2
2 3
2 4
3 -3
3 -2
3 -1
4 -1
4 1
4 2
4 3
5 0
5 1
5 2
5 3
5 4", header=TRUE)
Now I would like to create a new variable that has a 1 in the row in which the sign of the time variable changes for the first time for that person, and a 0 in all other rows. If a person has only negative values of time, there should not be any 1 in the new variable. For a person who has only non-negative values of time, the first row should have a 1 in the new variable and all other rows should be coded 0. For my example above, the new data frame should look like this:
dd <- read.table(text=
"ID time new.var
1 -4 0
1 -3 0
1 -2 0
1 -1 0
1 0 1
1 1 0
2 -3 0
2 -1 0
2 2 1
2 3 0
2 4 0
3 -3 0
3 -2 0
3 -1 0
4 -1 0
4 1 1
4 2 0
4 3 0
5 0 1
5 1 0
5 2 0
5 3 0
5 4 0", header=TRUE)
Does anyone know how to do this? I thought about using dplyr and group_by, but I am pretty new to R and could not get it to work. Any help is much appreciated!
Creating new.var takes two different operations, so you need to do it in two steps. I'll break this into two separate mutate calls for simplicity, but you can put both of them into the same mutate.
First, we group by ID and then find the rows where the sign changes. We need to use time >= 0 instead of sign (as recommended in this answer: R identifying a row prior to a change in sign), because you want a sign change to be counted only when going between -1 and 0, not between 0 and 1:
dd2 <- dd %>%
group_by(ID) %>%
mutate(new.var = as.numeric((time >= 0) != (lag(time) >= 0)))
# A tibble: 23 x 3
# Groups: ID [5]
ID time new.var
<int> <int> <dbl>
1 1 -4 NA
2 1 -3 0
3 1 -2 0
4 1 -1 0
5 1 0 1
6 1 1 0
7 2 -3 NA
8 2 -1 0
9 2 2 1
10 2 3 0
# … with 13 more rows
Then we use case_when to modify the first row based on your desired rules. Due to the way lag works, the first row will always have NA (since there is no previous row to look at) which makes it a good way to pick out that first row to change it based on the time values in that group:
dd3 <- dd2 %>%
  mutate(new.var = case_when(
    !is.na(new.var) ~ new.var,
    all(time >= 0) ~ 1,
    TRUE ~ 0))
print(dd3, n = 100) # n = 100 because print truncates tibbles to 10 rows
# A tibble: 23 x 3
# Groups: ID [5]
ID time new.var
<int> <int> <dbl>
1 1 -4 0
2 1 -3 0
3 1 -2 0
4 1 -1 0
5 1 0 1
6 1 1 0
7 2 -3 0
8 2 -1 0
9 2 2 1
10 2 3 0
11 2 4 0
12 3 -3 0
13 3 -2 0
14 3 -1 0
15 4 -1 0
16 4 1 1
17 4 2 0
18 4 3 0
19 5 0 1
20 5 1 0
21 5 2 0
22 5 3 0
23 5 4 0
You can try this:
dd %>% left_join(dd %>% group_by(ID) %>% summarise(index=min(which(time>=0)))) %>%
group_by(ID) %>% mutate(new.var=ifelse(row_number(ID)==index,1,0)) %>% select(-index)-> DF
# A tibble: 23 x 3
# Groups: ID [5]
ID time new.var
<int> <int> <dbl>
1 1 -4 0
2 1 -3 0
3 1 -2 0
4 1 -1 0
5 1 0 1
6 1 1 0
7 2 -3 0
8 2 -1 0
9 2 2 1
10 2 3 0
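The idea behind this answer is that, as long as the rows are sorted by time within each ID, the row to flag is simply the first row with time >= 0. A base R sketch of the same idea, which also avoids the Inf that min(which(...)) produces for a person with no such row, could look like this; `first_nonneg` is a hypothetical helper name.

```r
# Rebuild the example data from the question.
dd <- data.frame(
  ID   = rep(1:5, times = c(6, 5, 3, 4, 5)),
  time = c(-4:1, -3, -1, 2, 3, 4, -3:-1, -1, 1, 2, 3, 0:4)
)

# first_nonneg is a hypothetical helper: it flags the first element >= 0
# and returns all zeros when there is none.
first_nonneg <- function(x) {
  flag <- integer(length(x))
  hit  <- which(x >= 0)
  if (length(hit) > 0) flag[hit[1]] <- 1L
  flag
}

# Apply the helper within each ID (assumes rows are sorted by time per ID).
dd$new.var <- ave(dd$time, dd$ID, FUN = first_nonneg)
print(dd)
```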
The following ave instruction does what the question asks for.
dd$new.var <- with(dd, ave(time, ID, FUN = function(x){
  y <- integer(length(x))
  # flag the first position whose sign differs from the first value (or that is zero)
  if(any(x >= 0)) y[which.max(x[1]*x <= 0)] <- 1L
  y
}))
dd
# ID time new.var
#1 1 -4 0
#2 1 -3 0
#3 1 -2 0
#4 1 -1 0
#5 1 0 1
#6 1 1 0
#7 2 -3 0
#8 2 -1 0
#9 2 2 1
#10 2 3 0
#11 2 4 0
#12 3 -3 0
#13 3 -2 0
#14 3 -1 0
#15 4 -1 0
#16 4 1 1
#17 4 2 0
#18 4 3 0
#19 5 0 1
#20 5 1 0
#21 5 2 0
#22 5 3 0
#23 5 4 0
If the expected output is renamed dd2 then
identical(dd, dd2)
#[1] TRUE

Count values in column by group R

I want to transform the following dataframe into a dataframe that adds a column of index numbers and counts the values in the rows. Like this:
A B C D E value A B C D E
1 2 3 4 4 0 2 2 0 1 1
1 4 4 2 1 => 1 3 0 0 0 2
1 2 2 2 0 2 0 2 2 2 1
0 0 2 0 1 3 0 0 1 1 0
0 0 4 3 2 4 0 1 2 1 1
I am pretty much a beginner in R and can't figure out how to do this.
Thanks in advance :)
You can do:
df <- read.table(header=TRUE, text=
"A B C D E
1 2 3 4 4
1 4 4 2 1
1 2 2 2 0
0 0 2 0 1
0 0 4 3 2")
sapply(df+1, tabulate, nbins=5)
# > sapply(df+1, tabulate, nbins=5)
# A B C D E
# [1,] 2 2 0 1 1
# [2,] 3 0 0 0 2
# [3,] 0 2 2 2 1
# [4,] 0 0 1 1 0
# [5,] 0 1 2 1 1
You may also want to correct the row names so that they show the counted values 0 to 4:
result <- sapply(df+1, tabulate, nbins=5)
rownames(result) <- (1:nrow(result))-1
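The `df+1` shift is needed because tabulate() only counts the positive integers 1 to nbins, so the value 0 would otherwise be dropped. An alternative sketch that avoids the shift and labels the rows directly is to count against an explicit set of factor levels; the range 0:4 is taken from the example data and would need adjusting for other inputs.

```r
df <- read.table(header = TRUE, text = "
A B C D E
1 2 3 4 4
1 4 4 2 1
1 2 2 2 0
0 0 2 0 1
0 0 4 3 2")

# Count each value 0..4 per column; values that never occur get a count of 0
# because they are still listed as factor levels.
counts <- sapply(df, function(col) table(factor(col, levels = 0:4)))
print(counts)
```

The row names of `counts` are the values themselves ("0" to "4"), so no separate renaming step is needed.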

Converting data to longitudinal data

Hi, I am having difficulty converting my data into longitudinal (long-format) data using the reshape package. I would be grateful if anyone could help me. Thank you!
Data is as follows:
m <- matrix(sample(c(0, 0:), 100, replace = TRUE), 10)
ID <- 1:10 # IDs 1 to 10, as in the table below
m <- cbind(ID, m)
d <- as.data.frame(m)
names(d)<-c('ID', 'litter1', 'litter2', 'litter3', 'litter4', 'litter5', 'litter6', 'litter7', 'litter8', 'litter9', 'litter10')
ID litter1 litter2 litter3 litter4 litter5 litter6 litter7 litter8 litter9 litter10
1 0 0 0 3 1 0 2 0 0 3
2 0 2 1 2 0 0 0 2 0 0
3 1 0 1 2 0 3 3 3 2 0
4 2 1 2 3 0 2 3 3 1 0
5 0 1 2 0 0 0 3 3 1 0
6 2 1 2 0 3 3 0 0 0 0
7 0 1 0 3 0 0 1 2 2 0
8 0 1 3 3 2 1 3 2 3 0
9 0 2 0 2 2 3 2 0 0 3
10 2 2 2 2 1 3 0 3 0 0
I wish to convert the above data into longitudinal data with the columns 'ID', 'littercategory' (the category of the litter, i.e. 1-10) and 'litternumber' (the number of pieces in that litter category):
ID littercategory litternumber
1 4 3
1 5 1
1 7 2
1 10 3
2 2 2
2 3 1
2 4 2
2 8 2
and so on.
Would really appreciate your help thank you!
You could do that as follows:
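library(reshape2) # melt() is also available in the older reshape package
d = melt(d, id.vars=c("ID"))
colnames(d) = c('ID','littercategory','litternumber')
# remove the text in the littercategory column, keep only the number.
d$littercategory = gsub('litter','',d$littercategory)
# keep only the rows with a non-zero count (the comma makes this a row subset)
d = d[d$litternumber!=0, ]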
ID littercategory litternumber
1 1 4
2 1 8
3 1 6
4 1 4
7 1 6
8 1 5
10 1 10
1 2 6
2 2 9
As you can see, only the ordering differs from the output you requested, but I'm sure you can fix that yourself. (If not, there are plenty of resources on how to do that.)
Hope this helps!
To get the desired output, you have to melt your data and keep only the values larger than 0.
result <- setDT(melt(d, "ID"))[value != 0][order(ID)]
# To get exact structure modify result
result[, .(ID,
littercategory = sub("litter", "", variable),
litternumber = value)]
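If you would rather not depend on a package, the same reshaping can be sketched in base R. The example below uses only the first two rows of the data shown in the question so that the result can be checked by eye; the variable names `cats` and `long` are illustrative.

```r
# First two rows of the example data from the question.
d <- data.frame(ID = 1:2,
                rbind(c(0, 0, 0, 3, 1, 0, 2, 0, 0, 3),
                      c(0, 2, 1, 2, 0, 0, 0, 2, 0, 0)))
names(d) <- c("ID", paste0("litter", 1:10))

# Category numbers taken from the column names (litter1 -> 1, ...).
cats <- as.integer(sub("litter", "", names(d)[-1]))

# Stack the litter columns on top of each other, column by column.
long <- data.frame(ID             = rep(d$ID, times = length(cats)),
                   littercategory = rep(cats, each = nrow(d)),
                   litternumber   = unlist(d[-1], use.names = FALSE))

# Drop the zero counts and sort by ID, then by category.
long <- long[long$litternumber != 0, ]
long <- long[order(long$ID, long$littercategory), ]
rownames(long) <- NULL
print(long)
```

This reproduces the rows listed in the question for IDs 1 and 2, in the requested order.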

building matrix out of a vector with the difference of each value

dataset2 <- data.frame(bird=c("A","B","C","D","E","F"), rank=c(1:6))
I have this example dataset and now I want to build a 6x6 matrix with the rank difference between each pair of birds. How can I do this?
Is this what you want?
m <- with(dataset2, outer(rank, rank, '-'))
rownames(m) <- colnames(m) <- dataset2$bird
# A B C D E F
# A 0 -1 -2 -3 -4 -5
# B 1 0 -1 -2 -3 -4
# C 2 1 0 -1 -2 -3
# D 3 2 1 0 -1 -2
# E 4 3 2 1 0 -1
# F 5 4 3 2 1 0
You might also want to do this afterwards:
m[upper.tri(m)] <- 0
To get (the first row and the last column, which are now all zero, are not shown):
# A B C D E
#B 1 0 0 0 0
#C 2 1 0 0 0
#D 3 2 1 0 0
#E 4 3 2 1 0
#F 5 4 3 2 1
This is kind of the definition of the distance matrix, no?
dist(dataset2, method="maximum")
2 1
3 2 1
4 3 2 1
5 4 3 2 1
The difference is that it returns only positive distances, so it may not suit the OP.
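The relationship between the two answers can be checked directly: the distance matrix is the absolute value of the signed difference matrix. The sketch below passes only the numeric rank column to dist(), which is an adjustment to the call above, so that the non-numeric bird column is not involved.

```r
dataset2 <- data.frame(bird = c("A", "B", "C", "D", "E", "F"), rank = c(1:6))

# Signed differences: m[i, j] is rank[i] - rank[j].
m <- with(dataset2, outer(rank, rank, "-"))
rownames(m) <- colnames(m) <- dataset2$bird

# Unsigned distances on the rank column only, expanded to a full matrix.
dmat <- as.matrix(dist(dataset2$rank))
dimnames(dmat) <- dimnames(m)

print(m)
print(all(abs(m) == dmat))
```

So outer() is the one to use when the direction of the difference matters, and dist() is enough when only its size does.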
