Reshape a data frame in R but not with aggregated functions [duplicate] - r

This question already has answers here:
Reshape three column data frame to matrix ("long" to "wide" format) [duplicate]
(6 answers)
Closed 7 years ago.
I'm trying to build a pivot table from the data frame below. "VisitID" is the unique ID of a user who visited a website, "PageName" is the page they visited, and "Order" is the position of that page in the visit sequence. For example, the first row of this data frame means "user 001 visited Homepage, which is the 1st page he/she visited".
VisitID PageName Order
001 Homepage 1
001 ContactUs 2
001 News 3
002 Homepage 1
002 Careers 2
002 News 3
The desired output should cast "VisitID" as rows and "Order" as columns, and fill the table with the "PageName":
1 2 3
001 Homepage ContactUs News
002 Homepage Careers News
I've thought about using reshape::cast for this, but I believe it only works when you give it an aggregation function. I might be wrong though. Thanks in advance to anyone who can offer help.

You don't need to aggregate. As long as there's only one row for each combination of columns in the casting formula, you'll get the value of value.var inserted in the output.
library(reshape2)
dcast(mydata, VisitID ~ Order, value.var="PageName")
Here's an example:
# Fake data
dat = data.frame(group1=rep(LETTERS[c(1,1:3)],each=2), group2=rep(letters[c(1,1:3)]),
values=1:8)
dat
group1 group2 values
1 A a 1
2 A a 2
3 A b 3
4 A c 4
5 B a 5
6 B a 6
7 C b 7
8 C c 8
Note that rows 1 and 2 have the same values of the group columns, as do rows 5 and 6. As a result, dcast aggregates by counting the number of values in each cell.
dcast(dat, group1 ~ group2, value.var="values")
Aggregation function missing: defaulting to length
group1 a b c
1 A 2 1 1
2 B 2 0 0
3 C 0 1 1
Now let's remove rows 1 and 5 to get rid of the duplicated group combinations. Since there's now only one value per cell, dcast returns the actual value, rather than a count of the number of values.
dcast(dat[-c(1,5),], group1 ~ group2, value.var="values")
group1 a b c
1 A 2 3 4
2 B 6 NA NA
3 C NA 7 8
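As a cross-check on the question's own data, here is a minimal base R sketch (an alternative to dcast, not part of the answer above). The object name mydata and its construction are assumptions reconstructed from the table in the question. anyDuplicated tests the "one row per combination" condition, and reshape does the long-to-wide step without any aggregation.

```r
# Assumed reconstruction of the question's data; VisitID is kept as text
# so the leading zeros survive
mydata <- data.frame(
  VisitID  = c("001", "001", "001", "002", "002", "002"),
  PageName = c("Homepage", "ContactUs", "News", "Homepage", "Careers", "News"),
  Order    = c(1, 2, 3, 1, 2, 3),
  stringsAsFactors = FALSE
)

# 0 means every VisitID/Order pair occurs once, so no aggregation is needed
n_dup <- anyDuplicated(mydata[c("VisitID", "Order")])

# Long to wide: one row per VisitID, one column per Order value
wide <- reshape(mydata, idvar = "VisitID", timevar = "Order", direction = "wide")

# reshape() names the new columns PageName.1, PageName.2, ...; strip the prefix
names(wide) <- sub("^PageName\\.", "", names(wide))
rownames(wide) <- NULL
print(wide)
```

If n_dup is non-zero, at least one VisitID/Order pair is repeated, which is the situation in which dcast falls back to counting with length.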

Related

R Group dataframe according to certain conditions and each group has the same number of each condition

My data frame has 324 different images, each with a unique ImageID. There are 3*3 = 9 conditions, and each image belongs to exactly one of them. For example, Image 1 belongs to condition 1A and Image 5 belongs to condition 2B. What I am trying to achieve is to assign the images randomly to 6 blocks so that every block contains the same number of images from each condition. Then, when the data frame is grouped by BlokNo, the images within each block should be presented in a random order. I also want to generate multiple presentation orders from the same data frame.
My data frame looks like this:
ImageID Category1 Category2 BlokNo
1 1 A
4 1 A
6 1 A
5 2 B
8 2 B
3 2 B
14 3 C
12 3 C
17 3 C
I would like my data to look like this:
ImageID Category1 Category2 BlokNo
1 1 A 2
4 1 A 1
6 1 A 3
5 2 B 3
8 2 B 2
3 2 B 1
14 3 C 1
12 3 C 3
17 3 C 2
Below is the code I tried. It does part of what I need, but since I have 3*3 = 9 conditions in total, I am wondering whether there is a quicker way to do it. Thank you in advance!
Cond1 <- df %>% filter(Category1 == 1 & Category2 == "A") # filter out one condition
Cond1$BlokNo <- sample(rep(1:6, each = ceiling(36/6))[1:36]) #randomly assign a number from 1:6 to each image in certain condition
Instead of filtering on each unique combination, do a group_by on the two 'Category' columns and take a sample of row_number() within each group:
library(dplyr)
df <- df %>%
  group_by(Category1, Category2) %>%
  mutate(BlockNo = sample(row_number())) %>%
  ungroup()
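Note that sample(row_number()) returns a permutation of 1..n within each group, so it yields valid block numbers only when the group size equals the number of blocks, as in the 3-row example. For the full data described in the question (36 images per condition, 6 blocks), a minimal base R sketch could look like the following. The stand-in data, the helper column img, and n_blocks are assumptions for illustration, and ave is used here in place of the dplyr pipeline.

```r
set.seed(42)  # only so the sketch is reproducible

# Hypothetical stand-in for the real data: 3 x 3 conditions, 36 images each (324 rows)
df <- expand.grid(
  img = 1:36, Category1 = 1:3, Category2 = c("A", "B", "C"),
  stringsAsFactors = FALSE
)
df$ImageID <- seq_len(nrow(df))
df$img <- NULL

n_blocks <- 6

# Within each condition, shuffle a vector that contains every block number
# equally often, then write it back in the original row positions
df$BlockNo <- ave(
  seq_len(nrow(df)), df$Category1, df$Category2,
  FUN = function(i) sample(rep(seq_len(n_blocks), length.out = length(i)))
)

# Each block x condition cell should hold 36 / 6 = 6 images
counts <- table(df$BlockNo, interaction(df$Category1, df$Category2))

# One random presentation order: blocks in sequence, images shuffled within each block
presentation <- df[order(df$BlockNo, runif(nrow(df))), ]
```

Re-running the last line without resetting the seed gives a different presentation order from the same block assignment.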

How to assign IDs for consecutive rows in R split by a given kind of row? [duplicate]

This question already has answers here:
Creation of a specific vector without loop or recursion in R
(2 answers)
Split data.frame by value
(2 answers)
Closed 4 years ago.
I have a dataframe whose rows represent people. For a given family, the first row has the value 1 in column A, and all following rows contain members of the same family until another row has the value 1 in column A. Then a new family starts.
I would like to assign IDs to all families in my dataset. In other words, I would like to take:
A
1
2
3
1
3
3
1
4
And turn it into:
A family_id
1 1
2 1
3 1
1 2
3 2
3 2
1 3
4 3
I'm playing with a dataframe of 3 million rows, so a simple for-loop solution I came up with falls short of necessary efficiency. Also, the family_id need not be sequential.
I'll take a dplyr solution.
data:
df <- data.frame(A = c(1:3,1,3,3,1,4))
code:
df$family_id <- cumsum(c(-1,diff(df$A)) < 0)
result:
# A family_id
#1 1 1
#2 2 1
#3 3 1
#4 1 2
#5 3 2
#6 3 2
#7 1 3
#8 4 3
Please note:
This solution starts a new group whenever a number is smaller than the previous one.
If it is certain that a new group always begins with a 1, then ronak's solution is perfect.
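The answer referred to above is not reproduced on this page. Under the stated assumption that every family starts with a 1, one possible sketch (not necessarily the one meant) is to count the 1s seen so far. The vector A2 below is a made-up case that shows where the two rules disagree.

```r
df <- data.frame(A = c(1:3, 1, 3, 3, 1, 4))

# Assumes every family starts with A == 1: count the 1s seen so far
df$family_id <- cumsum(df$A == 1)

# Hypothetical case where the rules differ: the value drops (3 -> 2)
# without a new family starting
A2 <- c(1, 3, 2, 1)
by_decrease <- cumsum(c(-1, diff(A2)) < 0)  # 1 1 2 3
by_ones     <- cumsum(A2 == 1)              # 1 1 1 2
```

Both versions are vectorised, so they should cope with a few million rows without an explicit loop.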

Sequence value in data frame column

I need some help writing R code.
I need to check whether a specific column in a data frame is in ascending order.
e.g.
df$id | df$order | df$any
3 1 a
4 2 a
7 3 b
1 4 b
2 6 a
9 5 a # select this row - out of sequence in df$order
8 7 a
I would like to select the rows that don't follow the ascending sequence. In the example above, that would be the row with df$id equal to 9, because in df$order the value 5 is found after the value 6.
Obs. 1: in df$order, the numbers range from 1 to N, where N is greater than 1.
Obs. 2: If possible I would like to use core libraries to solve the problem.
If anything is unclear, just ask in the comments.
Thanks in advance!
Using base R:
subset(df,c(0,diff(order))<0)
id order any
6 9 5 a
subset(df,c(0,diff(order))>=0)
id order any
1 3 1 a
2 4 2 a
3 7 3 b
4 1 4 b
5 2 6 a
7 8 7 a
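For a runnable version, the data frame can be rebuilt from the table in the question (an assumed reconstruction). The second selection is a variant that is not part of the answer: it flags rows that fall below the largest value seen so far, which differs from the diff rule when several rows in a row are out of place. For example, with order values 1, 2, 6, 4, 5 the diff rule flags only 4, while the cummax rule flags both 4 and 5.

```r
# Assumed reconstruction of the question's data
df <- data.frame(
  id    = c(3, 4, 7, 1, 2, 9, 8),
  order = c(1, 2, 3, 4, 6, 5, 7),
  any   = c("a", "a", "b", "b", "a", "a", "a"),
  stringsAsFactors = FALSE
)

# Rows whose value is lower than the row immediately before it (the answer's rule);
# inside subset(), `order` refers to the column, not the base function
out_of_seq <- subset(df, c(0, diff(order)) < 0)

# Variant (not from the answer): rows lower than the largest value seen so far
below_max <- df[df$order < cummax(df$order), ]
```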

Summing data for individuals over a series of rounds in R [duplicate]

This question already has answers here:
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 6 years ago.
I am currently working on my master's thesis and part of my data analysis is in R. I am completely new to it and so am learning as I go along.
The experiments we are running consist of individuals playing a token allocation game, over a series of rounds.
I need to change the current csv file in R so that each individual appears in one row, with ingroup, outgroup and self giving summed over the 40 rounds they played.
Currently, the data frame is as follows:
id roundno tokenstoingroup tokenstooutgroup tokenstoself
0001 1 1 0 0
0001 2 0 1 0
0002 1 0 0 1
etc...
There are many participants (over a thousand), and every round's allocation for each participant is entered.
My question is:
How do I sum this up so that the data frame looks more like this?
id totalrounds tokenstoingroup tokenstooutgroup tokenstoself
0001 40 25 13 2
0002 40 13 13 14
etc...
As I have said, I am totally new to this. I have tried to look online for aggregating and summing things up, but I have no idea where to start with something a bit more complex like this.
You can use the aggregate function with cbind. As an example, let's create a data frame:
test <- data.frame('id'=rep(c('A','B','C'),each=2),'C1'=rep(1,6),'C2'=1:6)
> test
id C1 C2
1 A 1 1
2 A 1 2
3 B 1 3
4 B 1 4
5 C 1 5
6 C 1 6
Then:
test <- aggregate(cbind(C1,C2)~id,data=test,sum)
> test
id C1 C2
1 A 2 3
2 B 2 7
3 C 2 11
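Alternatively, we can use summarise_each from dplyr (summarise_each and funs are deprecated in current dplyr releases in favour of across):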
library(dplyr)
df1 %>%
  group_by(id) %>%
  summarise_each(funs(sum), roundno, tokenstoingroup, tokenstooutgroup, tokenstoself)
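With dplyr 1.0.0 or later, the same grouping can be written with across. The sketch below uses a hypothetical three-row miniature of the question's data (df1 and its values are assumptions) and counts rounds with n(). Summing roundno itself would give the sum of the round numbers (820 for rounds 1 to 40) rather than the 40 the question asks for.

```r
library(dplyr)

# Hypothetical miniature of the question's data (ids kept as text)
df1 <- data.frame(
  id               = c("0001", "0001", "0002"),
  roundno          = c(1, 2, 1),
  tokenstoingroup  = c(1, 0, 0),
  tokenstooutgroup = c(0, 1, 0),
  tokenstoself     = c(0, 0, 1),
  stringsAsFactors = FALSE
)

totals <- df1 %>%
  group_by(id) %>%
  summarise(
    totalrounds = n(),                     # rounds played, not the sum of round numbers
    across(starts_with("tokensto"), sum),  # column-wise sums of the three token columns
    .groups = "drop"
  )
```

The result has one row per id with the columns id, totalrounds, tokenstoingroup, tokenstooutgroup and tokenstoself.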

Reshape data into long format, repeating range of ids for every variable [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I want to reshape my data into a long format, but I would like to repeat the entire range of ids for each variable in my data set, even for those ids on which the variable takes no value. At the moment I can get narrow data, but only with the ids for which each variable has a corresponding entry.
Suppose my data has 15 variables and 20 possible ids. I want to create a narrow form of this data that is 15*20 rows long (the range of ids, repeated for each variable): the values of variable1 are shown for id1, id2, id3, etc. until the end of the range of ids is reached, then variable2 is shown for id1, id2, id3, etc.
I am unsure of how to do this in R. I am currently using the reshape package.
You can use the rep function:
v1 <- 1:5
v2 <- 1:6
rep(v1, each = 6)
# 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3 4 4 4 4 4 4 5 5 5 5 5 5
rep(v2, 5)
#1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6 1 2 3 4 5 6
Yeah, this is hard to work with, but you're looking for the melt function I think...
library(reshape2)
melt(yourdata, id.vars = 'ID COLUMN')
This will return a 300 x 3 data set that looks like:
ID COLUMN variable value
1 col2 7
1 col3 8
.... .... ....
20 col14 99
20 col15 100
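If the data is already long and some id/variable combinations are simply missing, one way to get the full range of ids for every variable is to build the complete grid and join the data onto it. This is a base R sketch on made-up data; the names long, grid and full are assumptions for illustration.

```r
# Hypothetical long data: id 3 has no rows at all, and id 2 has no value for v2
long <- data.frame(
  id       = c(1L, 2L, 1L),
  variable = c("v1", "v1", "v2"),
  value    = c(7, 8, 9),
  stringsAsFactors = FALSE
)

# Every id paired with every variable (3 ids x 2 variables = 6 rows)
grid <- expand.grid(id = 1:3, variable = c("v1", "v2"), stringsAsFactors = FALSE)

# Left join onto the grid; combinations without data get NA
full <- merge(grid, long, by = c("id", "variable"), all.x = TRUE)

# Full id range for v1 first, then the full id range for v2
full <- full[order(full$variable, full$id), ]
```

With 20 ids and 15 variables, the same steps would give the 300-row layout described in the question.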
