Assume that I have a df similar to this:
# A tibble: 5 x 6
x1 x2 x3 y1 y2 y3
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 4 3 2 4 3 2
2 4 3 2 4 3 2
3 4 3 2 4 3 2
4 4 3 2 4 3 2
5 4 3 2 4 3 2
Is there a way to create a tibble with the columns x, y, z in a single pivot_longer command? Right now I'm using one pivot_longer call for each group of columns, but I'm sure there is an easier way to get it.
x y z
<dbl> <dbl> <dbl>
1 4 4 1
2 4 4 1
3 4 4 1
4 4 4 1
5 4 4 1
6 3 3 2
7 3 3 2
8 3 3 2
9 3 3 2
10 3 3 2
11 2 2 3
12 2 2 3
13 2 2 3
14 2 2 3
15 2 2 3
If you are using pivot_longer:
df %>% pivot_longer(everything(), names_to = c(".value", "z"), names_pattern = "(.)(.)")
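To make the role of ".value" more concrete, here is a minimal sketch on a reconstruction of the question's data (the tibble below is rebuilt by hand, and the stricter pattern, the integer conversion and the final sort are my additions, not part of the answer above). The first capture group becomes the output column name and the second fills z:

```r
library(tidyr)

# Hand-built copy of the question's data (an assumption for illustration)
df <- tibble::tibble(
  x1 = rep(4, 5), x2 = rep(3, 5), x3 = rep(2, 5),
  y1 = rep(4, 5), y2 = rep(3, 5), y3 = rep(2, 5)
)

long <- pivot_longer(
  df,
  everything(),
  names_to = c(".value", "z"),          # group 1 -> column name, group 2 -> z
  names_pattern = "([a-z]+)(\\d+)",     # letters, then digits
  names_transform = list(z = as.integer)
)

# pivot_longer keeps the original row order, so sort by z to get
# the layout shown in the question
long <- long[order(long$z), ]
```

names_transform needs a reasonably recent tidyr; without it z comes back as character.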
You could use reshape from base R:
reshape(inp, direction="long", varying=list(1:3, 4:6), sep="")
time x1 y1 id
1.1 1 4 4 1
2.1 1 4 4 2
3.1 1 4 4 3
4.1 1 4 4 4
5.1 1 4 4 5
1.2 2 3 3 1
2.2 2 3 3 2
3.2 2 3 3 3
4.2 2 3 3 4
5.2 2 3 3 5
1.3 3 2 2 1
2.3 3 2 2 2
3.3 3 2 2 3
4.3 3 2 2 4
5.3 3 2 2 5
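Two "tricks". One is sep="", which is what lets reshape split column names between the alphabetic part and the numeric suffix (that guessing applies when varying is a plain vector; with a list, as here, the default names come from the first column of each group, hence x1 and y1 above). The second is the use of a list argument to varying. If you want to drop the first column (time, which records which set of source columns each row came from; id is the one that identifies the row of origin), then use [-1]. You could also use the v.names vector to name the columns, viz: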
reshape(inp, direction="long", varying=list(1:3, 4:6), sep="", v.names=c("X","Y"))[-1]
X Y id
1.1 4 4 1
2.1 4 4 2
3.1 4 4 3
4.1 4 4 4
5.1 4 4 5
1.2 3 3 1
2.2 3 3 2
3.2 3 3 3
4.2 3 3 4
5.2 3 3 5
1.3 2 2 1
2.3 2 2 2
3.3 2 2 3
4.3 2 2 4
5.3 2 2 5
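For comparison, here is a small sketch of the case where sep="" does the name splitting by itself: varying is passed as a plain vector of column names instead of a list. The inp data frame is rebuilt by hand from the question, so treat it as an assumption.

```r
# Hand-built copy of the question's data (an assumption for illustration)
inp <- data.frame(
  x1 = rep(4, 5), x2 = rep(3, 5), x3 = rep(2, 5),
  y1 = rep(4, 5), y2 = rep(3, 5), y3 = rep(2, 5)
)

# With an atomic `varying` and sep = "", reshape splits "x1" into the
# variable name "x" and the time value 1, and so on for every column
long <- reshape(inp, direction = "long", varying = names(inp), sep = "")

head(long)
```

Here the value columns are named x and y without needing v.names, and time holds the numeric suffix.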
I want to combine V1 and V2 into a single column, keeping the matching ID number, using R. What's the simplest way to go about it?
Below is an example of how I want to combine my data. Hopefully this makes sense; if not, I can try to be clearer. I did try group by, but I don't know if that's the best way to go about it.
ID V1 V2
1 3 2
2 3 4
3 5 1
3 2 3
4 2 3
4 5 7
4 1 3
This is what I would like it to look like
ID V3
1 3
1 2
2 3
2 4
3 5
3 1
3 2
3 3
4 2
4 3
4 5
4 7
4 1
4 3
Try using pivot_longer with names_to = NULL to remove the unwanted column.
tidyr::pivot_longer(df, V1:V2, values_to = "V3", names_to = NULL)
Output:
# ID V3
# <int> <int>
# 1 1 3
# 2 1 2
# 3 2 3
# 4 2 4
# 5 3 5
# 6 3 1
# 7 3 2
# 8 3 3
# 9 4 2
# 10 4 3
# 11 4 5
# 12 4 7
# 13 4 1
# 14 4 3
You may try
library(dplyr)
reshape2::melt(df, "ID") %>% select(ID, value) %>% arrange(ID)
ID value
1 1 3
2 1 2
3 2 3
4 2 4
5 3 5
6 3 2
7 3 1
8 3 3
9 4 2
10 4 5
11 4 1
12 4 3
13 4 7
14 4 3
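Note that the melt output above is ordered by variable within each ID, so rows 6 and 7 differ from the requested layout. If the row-wise order matters, a base R sketch along these lines interleaves V1 and V2 per original row (the data frame is retyped from the question, and the out name is just for illustration):

```r
# Data retyped from the question
df <- data.frame(
  ID = c(1, 2, 3, 3, 4, 4, 4),
  V1 = c(3, 3, 5, 2, 2, 5, 1),
  V2 = c(2, 4, 1, 3, 3, 7, 3)
)

# t() turns each original row into a column, so c() reads the values
# row by row: V1, V2 of row 1, then V1, V2 of row 2, ...
out <- data.frame(
  ID = rep(df$ID, each = 2),
  V3 = c(t(df[c("V1", "V2")]))
)

out
```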
I need some help.
I have this df:
df <- data.frame(month = c(1,1,1,1,1,2,2,2,2,2),
day = c(1,2,3,4,5,1,2,3,4,5),
flow = c(2,5,7,8,5,4,6,7,9,2))
month day flow
1 1 1 2
2 1 2 5
3 1 3 7
4 1 4 8
5 1 5 5
6 2 1 4
7 2 2 6
8 2 3 7
9 2 4 9
10 2 5 2
But I want to know the day of the minimum flow per month:
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
The repetition is not a problem; I will use a pivot function afterwards.
Thanks, people!
We can use which.min to return the index of 'min'imum 'flow' per group and use that to get the corresponding 'day' to create the column with mutate
library(dplyr)
df <- df %>%
group_by(month) %>%
mutate(dayminflowofthemonth = day[which.min(flow)]) %>%
ungroup
-output
df
# A tibble: 10 x 4
# month day flow dayminflowofthemonth
# <dbl> <dbl> <dbl> <dbl>
# 1 1 1 2 1
# 2 1 2 5 1
# 3 1 3 7 1
# 4 1 4 8 1
# 5 1 5 5 1
# 6 2 1 4 5
# 7 2 2 6 5
# 8 2 3 7 5
# 9 2 4 9 5
#10 2 5 2 5
Another option using indexing inside dplyr pipeline:
library(dplyr)
#Code
newdf <- df %>% group_by(month) %>% mutate(Val=day[flow==min(flow)][1])
Output:
# A tibble: 10 x 4
# Groups: month [2]
month day flow Val
<dbl> <dbl> <dbl> <dbl>
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
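Both answers above return a single day per month even when the minimum flow occurs on several days. A tiny sketch with made-up vectors (not the question's data) shows what happens on ties:

```r
# Made-up example with a tie: the minimum flow 2 occurs on days 2 and 4
day  <- c(1, 2, 3, 4)
flow <- c(5, 2, 7, 2)

# which.min() returns the index of the first minimum only
first_min <- day[which.min(flow)]

# the comparison returns every tied day; adding [1] keeps the first
all_min <- day[flow == min(flow)]

first_min
all_min
```

So day[which.min(flow)] and day[flow == min(flow)][1] agree, and both pick the earliest tied day.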
Here is a base R option using ave
transform(
df,
dayminflowofthemonth = ave(day*(ave(flow,month,FUN = min)==flow),month,FUN = max)
)
which gives
month day flow dayminflowofthemonth
1 1 1 2 1
2 1 2 5 1
3 1 3 7 1
4 1 4 8 1
5 1 5 5 1
6 2 1 4 5
7 2 2 6 5
8 2 3 7 5
9 2 4 9 5
10 2 5 2 5
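The nested ave call is compact but hard to read, so here is the same idea unrolled into steps. This is only a restatement of the answer above with intermediate names (min_flow, is_min) that I made up:

```r
df <- data.frame(month = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                 day   = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
                 flow  = c(2, 5, 7, 8, 5, 4, 6, 7, 9, 2))

# minimum flow of each month, repeated on every row of that month
min_flow <- ave(df$flow, df$month, FUN = min)

# TRUE on the rows where the monthly minimum occurs
is_min <- df$flow == min_flow

# day * is_min is the day on minimum rows and 0 elsewhere,
# so the per-month max recovers the day of the minimum
df$dayminflowofthemonth <- ave(df$day * is_min, df$month, FUN = max)

df
```

One consequence of using max: if a month has ties, this picks the latest tied day, whereas which.min picks the earliest. It also relies on day being positive.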
One more base R approach:
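# look the result up by month label rather than by position, so this also
# works when the month values are not 1, 2, 3, ...
df$dayminflowofthemonth <- as.vector(by(
  df,
  df$month,
  function(x) x$day[which.min(x$flow)]
)[as.character(df$month)])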
I was wondering if people could please help me. This is probably a simple issue, but I can't get past it. If you look at my code, I have several variables of interest, Big0.7:Small0.9 and then AttractOne:AttractSix.
ptn Condition Big0.7 X_AttractONE Med0.7 AttractTWO Small0.7 AttractTHREE Big0.9 AttractFOUR Med0.9 AttractFIVE Small0.9 AttractSIX SECScore G.Health
1 1 1 2 5 2 5 1 5 2 5 1 5 2 5 72.8 6
2 2 2 3 4 2 5 3 4 2 5 1 2 1 4 79.8 6
3 3 2 2 4 3 4 3 4 2 4 1 3 2 4 48.8 7
4 4 2 5 1 1 4 1 4 1 4 1 3 1 4 55.4 5
5 5 2 3 4 3 4 3 4 2 4 1 4 2 4 61.3 6
6 6 1 2 4 2 4 2 4 2 4 2 4 2 4 45.4 6
What I'm trying to do is change this data from wide to long format, so that Big0.7:Big0.9 all occupy one column and AttractOne:AttractSix all occupy another.
I've managed to do this for Big0.7:Big0.9, but I can't fathom how to do it for both of them at the same time, perhaps because I don't quite understand the concept of names_sep or names_pattern.
maybe %>% pivot_longer(cols = c(Big0.7, Med0.7, Small0.7, Big0.9, Med0.9, Small0.9), names_to= "PreferenceRate", values_to = "count")
If someone could please help, I would be grateful. This is my first post, so apologies for any inadequacies, and, I also don't know how best to post my data so I hope this is also okay.
Liam
Rename the Big0.x, Med0.x and Small0.x columns so they follow the same prefix-plus-suffix scheme as the Attract columns, then use the special variable .value with names_pattern:
library(tidyr)
df %>%
rename('BMSONE'=Big0.7, 'AttractONE'=X_AttractONE, 'BMSTWO'=Med0.7,'BMSTHREE'=Small0.7,'BMSFOUR'=Big0.9,'BMSFIVE'=Med0.9,'BMSSIX'=Small0.9) %>%
pivot_longer(BMSONE:AttractSIX, names_to = c('.value','set'), names_pattern = "(BMS|Attract)(.*)")
# A tibble: 36 x 7
ptn Condition SECScore G.Health set BMS Attract
<int> <int> <dbl> <int> <chr> <int> <int>
1 1 1 72.8 6 ONE 2 5
2 1 1 72.8 6 TWO 2 5
3 1 1 72.8 6 THREE 1 5
4 1 1 72.8 6 FOUR 2 5
5 1 1 72.8 6 FIVE 1 5
6 1 1 72.8 6 SIX 2 5
7 2 2 79.8 6 ONE 3 4
8 2 2 79.8 6 TWO 2 5
9 2 2 79.8 6 THREE 3 4
10 2 2 79.8 6 FOUR 2 5
# … with 26 more rows
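Since the question mentions not understanding names_pattern, here is a small base R illustration of what the two capture groups in "(BMS|Attract)(.*)" pick out of each column name. It uses sub only to show the groups; it is not how tidyr works internally, and the parts data frame is just for display:

```r
# A few of the renamed column names from the answer above
nms <- c("BMSONE", "AttractONE", "BMSTWO", "AttractTWO")
pattern <- "(BMS|Attract)(.*)"

parts <- data.frame(
  name      = nms,
  value_col = sub(pattern, "\\1", nms),  # group 1: feeds ".value"
  set       = sub(pattern, "\\2", nms),  # group 2: feeds "set"
  stringsAsFactors = FALSE
)

parts
```

Group 1 decides which output column a value lands in (BMS or Attract), and group 2 becomes the value of set on that row, which is why BMSONE and AttractONE end up side by side in the row where set is "ONE".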
I have a large data set of raw responses and want to compare each element for subject 1 in group 1 with the corresponding element for subject 1 in group 2. The same comparison is needed between subject 2 in group 1 and subject 2 in group 2, between subject 3 in group 1 and subject 3 in group 2, and so on. What makes the problem more complex is that there are 100 groups, which form 50 pairs of groups.
The output needs to keep the original raw response where the two are the same. Where they differ, the raw response needs to be replaced with '9'.
I'm pretty sure I could do it with a for loop, but I'm wondering if there is anything better than a for loop in R, such as ifelse or apply?
A simplified version of my data looks like this:
df<-as.data.frame(matrix(sample(c(1:5),60,replace=T),nrow=12))
df$subject<-rep(1:3)
df$group<-rep(1:4, each=3)
Thanks for any help.
#Initialization of data
df<-as.data.frame(matrix(sample(c(1:5),60,replace=T),nrow=12))
df$subject<-rep(1:3)
df$group<-rep(1:4, each=3)
>df
V1 V2 V3 V4 V5 subject group
1 3 3 3 4 5 1 1
2 4 4 3 1 3 2 1
3 3 2 2 4 2 3 1
4 4 4 3 5 3 1 2
5 3 2 1 5 1 2 2
6 2 5 4 4 1 3 2
7 3 2 3 2 2 1 3
8 1 2 3 3 3 2 3
9 2 2 2 2 5 3 3
10 3 3 3 5 4 1 4
11 5 3 5 4 2 2 4
12 5 3 1 1 3 3 4
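# Processing without a for loop
# assumption: initial data is sorted by group (can be easily done)
# logical mask of the response columns (everything except group and subject)
columns <- !dimnames(df)[[2]] %in% c('group', 'subject')
subjects <- df[, 'subject']
tabl <- table(subjects)
rows <- order(subjects)      # row indices sorted by subject, then by group
rows2 <- cumsum(tabl)        # position of each subject's last row within rows
rows1 <- rows2 - tabl + 1    # position of each subject's first row within rows
# compare each row with the same subject's row in the previous group; mark differences with 9
df[rows[-rows1], columns][df[rows[-rows1], columns] != df[rows[-rows2], columns]] <- 9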
>df
V1 V2 V3 V4 V5 subject group
1 3 3 3 4 5 1 1
2 4 4 3 1 3 2 1
3 3 2 2 4 2 3 1
4 9 9 3 9 9 1 2
5 9 9 9 9 9 2 2
6 9 9 9 4 9 3 2
7 9 9 3 9 9 1 3
8 9 2 9 9 9 2 3
9 2 9 9 9 9 3 3
10 3 9 3 9 9 1 4
11 9 9 9 9 9 2 4
12 9 9 9 9 9 3 4
Below is what I did to get the output. Again, thanks to Stanislav
df<-as.data.frame(matrix(sample(c(1:5),60,replace=T),nrow=12))
df$subject<-rep(1:3)
df$group<-rep(1:4, each=3)
> df
V1 V2 V3 V4 V5 subject group
1 1 4 3 1 5 1 1
2 2 1 4 1 5 2 1
3 1 2 5 4 5 3 1
4 5 4 1 4 3 1 2
5 5 1 3 2 2 2 2
6 1 2 2 4 5 3 2
7 5 4 2 3 1 1 3
8 2 3 4 3 5 2 3
9 2 5 3 5 3 3 3
10 4 2 1 4 1 1 4
11 2 3 3 5 5 2 4
12 5 3 3 4 5 3 4
col<-!dimnames(df)[[2]] %in% c('subject','group')
n<-length(df[,1])
temp<-table(df$group)
n.sub<-temp[1]
temp<-seq(1,n,by=2*n.sub)
s1<-c(sapply(temp, function(x) seq.int(x, length.out=n.sub)))
temp<-seq(n.sub+1,n,by=2*n.sub)
s2<-c(sapply(temp, function(x) seq.int(x, length.out=n.sub)))
df[s2,col][df[s1,col]!=df[s2,col]]<-9
> df
V1 V2 V3 V4 V5 subject group
1 1 4 3 1 5 1 1
2 2 1 4 1 5 2 1
3 1 2 5 4 5 3 1
4 9 4 9 9 9 1 2
5 9 1 9 9 9 2 2
6 1 2 9 4 5 3 2
7 5 4 2 3 1 1 3
8 2 3 4 3 5 2 3
9 2 5 3 5 3 3 3
10 9 9 9 9 1 1 4
11 2 3 9 9 5 2 4
12 9 9 3 9 9 3 4
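The index vectors s1 and s2 above select the rows of the odd and even groups. A sketch of the same pairing with logical masks is below. It assumes the rows are sorted by group and then by subject, that every group contains the same subjects, and that groups are numbered so that (1, 2), (3, 4), ... form the pairs. The small data frame is made up so the result can be checked by eye.

```r
# Made-up data: 4 groups, 2 subjects, 2 response columns
df <- data.frame(
  V1 = c(1, 2, 1, 3, 5, 5, 4, 5),
  V2 = c(4, 5, 6, 5, 2, 2, 2, 2),
  subject = rep(1:2, times = 4),
  group = rep(1:4, each = 2)
)

resp <- setdiff(names(df), c("subject", "group"))  # response columns
odd  <- df$group %% 2 == 1   # first group of each pair
even <- !odd                 # second group of each pair

# TRUE where the second group's response differs from the first group's
mismatch <- df[odd, resp] != df[even, resp]

# overwrite the differing responses in the second group with 9
df[even, resp][mismatch] <- 9

df
```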
I have a table that looks like:
dat = data.frame(expand.grid(x = 1:10, y = 1:10),
z = sample(LETTERS[1:3], size = 100, replace = TRUE))
tabl <- with(dat, table(z, y))
tabl
y
z 1 2 3 4 5 6 7 8 9 10
A 5 3 1 1 3 6 3 7 2 4
B 4 5 3 6 5 1 3 1 4 4
C 1 2 6 3 2 3 4 2 4 2
Now how do I transform it into a data.frame that looks like
1 2 3 4 5 6 7 8 9 10
A 5 3 1 1 3 6 3 7 2 4
B 4 5 3 6 5 1 3 1 4 4
C 1 2 6 3 2 3 4 2 4 2
Here are a couple of options.
The reason as.data.frame(tabl) doesn't work is that it dispatches to the S3 method as.data.frame.table() which does something useful but different from what you want.
as.data.frame.matrix(tabl)
# 1 2 3 4 5 6 7 8 9 10
# A 5 4 3 1 1 3 3 2 6 2
# B 1 4 3 4 5 3 4 4 3 3
# C 4 2 4 5 4 4 3 4 1 5
## This will also work
as.data.frame(unclass(tabl))
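To see the difference between the two methods, here is a small sketch on a made-up table (the z and y vectors below are not the question's random data):

```r
# Made-up two-way table
tabl <- table(z = c("A", "A", "B", "C"), y = c(1, 2, 1, 2))

# as.data.frame() dispatches to as.data.frame.table():
# one row per (z, y) cell, with the count in a Freq column
long <- as.data.frame(tabl)

# as.data.frame.matrix() keeps the cross-tab layout:
# one row per level of z, one column per level of y
wide <- as.data.frame.matrix(tabl)

long
wide
```

The long form has columns z, y and Freq, while the wide form keeps A, B, C as row names and the y levels as column names.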