R - Link row values to create an ID column

I have the following data frame:
X1 X2 X3 X4 X5
a 1 4 d e
f 2 5 i j
k 3 6 n o
I would like to create an ID column based on row values such that:
X1 X2 X3 X4 X5 ID
a 1 4 d e a14de
f 2 5 i j f25ij
k 3 6 n o k36no
Is there a way to do so?
Some variables are character and some numeric.

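We can use do.call with paste0 to create the 'ID' column. do.call passes each column of df1 to paste0 as a separate argument, and paste0 concatenates them element-wise (numeric columns are converted to character), giving one string per row: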
df1$ID <- do.call(paste0, df1)
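A self-contained sketch of the same idea. The example data is rebuilt from the printed output in the question, so the column types are an assumption. The second call shows that do.call(paste, ...) also accepts a sep argument if a separator between values is wanted:

```r
# Example data from the question: a mix of character and numeric columns
df1 <- data.frame(X1 = c("a", "f", "k"), X2 = 1:3, X3 = 4:6,
                  X4 = c("d", "i", "n"), X5 = c("e", "j", "o"),
                  stringsAsFactors = FALSE)

# do.call hands each column to paste0 as a separate argument,
# so the values are concatenated row by row
df1$ID <- do.call(paste0, df1)
df1$ID
# [1] "a14de" "f25ij" "k36no"

# Same idea with a separator; leave out ID so it is not pasted in again
id_sep <- do.call(paste, c(df1[setdiff(names(df1), "ID")], sep = "_"))
id_sep
# [1] "a_1_4_d_e" "f_2_5_i_j" "k_3_6_n_o"
```

A row-wise apply(df1, 1, paste, collapse = "") looks similar, but it first converts the data frame with as.matrix, which formats numeric columns to a common width and can therefore insert padding spaces; working column-wise with do.call avoids that.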


if else: else portion not returning output

I am currently trying to cycle through a data frame of integers and characters and change one value of each row, conditionally. Rows that do not meet the conditions should simply be added, unchanged, to a new data frame alongside the modified rows.
I've done this before with no trouble, but I feel as though I have been staring at this too long without any enlightenment.
a<-data.frame(cbind(1,'a',2,'c',3,'d'), stringsAsFactors = F)
b<-data.frame(cbind(1,'a',2,'c',3,'g'), stringsAsFactors = F)
c<-data.frame(cbind(1,'f',4,'g',5,'h'), stringsAsFactors = F)
x<-rbind(a,b,c)
fun<-function(x){
  fin<-NULL
  for(i in 1:nrow(x)){
    v<-x[i+1,]
    if ((x[i,1]== v[i,1]) & (x[i,2]==v[i,2]) ){
      x[i,3]<-"f"
      fin<-rbind(fin, x[i,])
    }else {fin<-rbind(fin, x[i,]) }
    return(fin)
  }
}
fun(x)
X1 X2 X3 X4 X5 X6
1 1 a f c 3 d
The result I desire:
X1 X2 X3 X4 X5 X6
1 1 a f c 3 d
1 1 a 2 c 3 g
1 1 f 4 g 5 h
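Why the original function returns only one row: return(fin) sits inside the for loop, so the function exits at the end of the first iteration. Two more problems surface once that is fixed: v <- x[i+1, ] is a one-row data frame, so v[i, 1] is only valid when i is 1, and on the last row x[i+1, ] runs past the end, producing NA comparisons that make if() fail. A minimal corrected sketch, assuming the intent is to compare each row with the next one on X1 and X2 (the name fun_fixed is hypothetical):

```r
# Example data built the same way as in the question (all columns character)
x <- data.frame(rbind(c(1, "a", 2, "c", 3, "d"),
                      c(1, "a", 2, "c", 3, "g"),
                      c(1, "f", 4, "g", 5, "h")),
                stringsAsFactors = FALSE)

fun_fixed <- function(x) {
  # Compare each row with the next one; the last row has no successor
  for (i in seq_len(max(nrow(x) - 1, 0))) {
    if (x[i, 1] == x[i + 1, 1] && x[i, 2] == x[i + 1, 2]) {
      x[i, 3] <- "f"
    }
  }
  x  # return once, after the loop has visited every row
}

res <- fun_fixed(x)
res
```

Because rows are modified in place, there is no need to rebuild a separate fin data frame with rbind; unmatched rows are simply left as they were.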
An alternative using dplyr:
library(dplyr)
library(magrittr)
> z <- x %>% mutate(match = ifelse(( (lead(X1)==X1) & (lead(X2)==X2)),"YES","NO"))
> z %>% mutate(X3 = replace(X3, match=="YES", "f"))
X1 X2 X3 X4 X5 X6 match
1 1 a f c 3 d YES
2 1 a 2 c 3 g NO
3 1 f 4 g 5 h <NA>
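For comparison, a base R sketch of the same row-versus-next-row check without dplyr. Appending FALSE for the last row avoids the NA that lead() produces there; treating "no next row" as "no match" is an assumption about the intended behaviour:

```r
x <- data.frame(rbind(c(1, "a", 2, "c", 3, "d"),
                      c(1, "a", 2, "c", 3, "g"),
                      c(1, "f", 4, "g", 5, "h")),
                stringsAsFactors = FALSE)

n <- nrow(x)
# Row i is compared with row i + 1 by shifting the columns against each other;
# the final FALSE stands in for the last row, which has no successor
same_as_next <- c(x$X1[-n] == x$X1[-1] & x$X2[-n] == x$X2[-1], FALSE)

x$X3[same_as_next] <- "f"
x
```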

Repeat data frame with varying date column

How can I repeat a data frame with a varying date column at the end? If I apply one of the previously recommended approaches, every column, including the date, is repeated unchanged. For example:
df<-data.frame(x1=1:3, x2=c('z','g','h'), x3=rep(as.Date("2011-07-31"), 3))
n=2
do.call("rbind", replicate(n, df, simplify = FALSE))
x1 x2 x3
1 1 z 2011-07-31
2 2 g 2011-07-31
3 3 h 2011-07-31
4 1 z 2011-07-31
5 2 g 2011-07-31
6 3 h 2011-07-31
Whereas what I need is:
x1 x2 x3
1 1 z 2011-07-31
2 2 g 2011-07-31
3 3 h 2011-07-31
4 1 z 2011-08-01
5 2 g 2011-08-01
6 3 h 2011-08-01
> n=2
> df1 <- df[rep(1:nrow(df), n),]
> transform(df1, x3=ave(x3, x1, FUN=function(x) x + 1:length(x) - 1L))
x1 x2 x3
1 1 z 2011-07-31
2 2 g 2011-07-31
3 3 h 2011-07-31
1.1 1 z 2011-08-01
2.1 2 g 2011-08-01
3.1 3 h 2011-08-01
or
> library(dplyr)
> df1 <- df[rep(1:nrow(df), n),]
> df1 %>% group_by(x1,x2) %>% mutate(x3= x3 + 1:n() - 1L)
Here is another base R method that works for your example.
# save result
dat <- do.call("rbind", replicate(n, df, simplify = FALSE))
# update x3 variable
dat$x3 <- dat$x3 + cumsum(dat$x1 == 1) - 1
The logic is that we use a cumulative sum that increments every time x1 returns to its initial value (here 1). We subtract 1 from the result because the first block should stay unchanged. This relies on the value 1 appearing exactly once per block.
This returns:
dat
x1 x2 x3
1 1 z 2011-07-31
2 2 g 2011-07-31
3 3 h 2011-07-31
4 1 z 2011-08-01
5 2 g 2011-08-01
6 3 h 2011-08-01
Using transform, this can be written as:
transform(dat, x3 = x3 + cumsum(x1 == 1) - 1)
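To see what the cumsum(x1 == 1) - 1 term contributes, here it is on its own for the stacked x1 column of the example (n = 2 copies of 1:3):

```r
x1 <- rep(1:3, times = 2)      # x1 after stacking two copies of df
offset <- cumsum(x1 == 1) - 1  # day offset: 0 for the first block, 1 for the second
offset
# [1] 0 0 0 1 1 1
```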
As an alternative counting procedure, we could use seq_len together with rep like this:
# update x3 variable
dat$x3 <- dat$x3 + rep(seq_len(n)-1L, each=nrow(df))
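Another base R option, sketched here as an alternative rather than taken from the answers above, is to shift the dates while building each copy and only then stack the copies. It does not depend on the values in x1 or on the row order:

```r
df <- data.frame(x1 = 1:3, x2 = c("z", "g", "h"),
                 x3 = rep(as.Date("2011-07-31"), 3))
n <- 2

# Copy k (k = 0, ..., n - 1) gets its dates shifted by k days
blocks <- lapply(seq_len(n) - 1L, function(k) {
  block <- df
  block$x3 <- block$x3 + k
  block
})

# Stack the shifted copies and reset the row names to 1, 2, ...
dat <- do.call(rbind, blocks)
rownames(dat) <- NULL
dat
```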

R replacing a column from a data frame with a row from another data frame

I want to replace the first column of A with the first row of B. For example:
A <- data.frame(matrix("a", 4, 4), stringsAsFactors = FALSE)
> A
X1 X2 X3 X4
1 a a a a
2 a a a a
3 a a a a
4 a a a a
B <- data.frame(matrix("b", 4, 4), stringsAsFactors = FALSE)
> B
X1 X2 X3 X4
1 b b b b < Take this row
2 b b b b
3 b b b b
4 b b b b
I want A to become:
> A
X1 X2 X3 X4
1 b a a a
2 b a a a
3 b a a a
4 b a a a
^ column X1 should hold the values from the first row of B
I tried:
A[, 1] = B[1, ]
But I get the following warning message:
In `[<-.data.frame`(`*tmp*`, , 1, value = list(X1 = "b", X2 = "b", :
provided 4 variables to replace 1 variables
By default, R does not drop the dimension when there is just one row left (while it does when there is just one column).
From ?`[.data.frame`:
drop: logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left, but not to drop if only one row is left.
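You can see this by comparing:
A[, 1]
# [1] "a" "a" "a" "a"
The result is a vector,
and
B[1, ]
# X1 X2 X3 X4
#1 b b b b
the result is still a data.frame, i.e. a list of four columns, so the assignment receives four variables for a single column, which is exactly what the warning reports.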
You need to unlist the result:
A[, 1] = unlist(B[1, ])
A
# X1 X2 X3 X4
#1 b a a a
#2 b a a a
#3 b a a a
#4 b a a a
This should also work, without changing row / col names:
A[, 1] = t(B)[,1]
This should do it
A[, 1] = t(B[1, ])
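A self-contained sketch of the unlist approach, with use.names = FALSE so the result is a plain unnamed vector and a check that the lengths line up (the name row_vals is only for illustration):

```r
A <- data.frame(matrix("a", 4, 4), stringsAsFactors = FALSE)
B <- data.frame(matrix("b", 4, 4), stringsAsFactors = FALSE)

# B[1, ] is a one-row data.frame, i.e. a list of 4 length-1 columns;
# unlist() flattens it into a character vector of length 4
row_vals <- unlist(B[1, ], use.names = FALSE)

# The replacement only lines up because B has as many columns as A has rows
stopifnot(length(row_vals) == nrow(A))

A[[1]] <- row_vals
A
```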

How to "unmelt" data with reshape r

I have a data frame that I melted using the reshape package and would like to "unmelt".
Here is a toy example of the melted data (the real data frame is 500x100 or larger):
variable<-c(rep("X1",3),rep("X2",3),rep("X3",3))
value<-c(rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3),rep(rnorm(1,.5,.2),3))
dat <-data.frame(variable,value)
dat
variable value
1 X1 0.5285376
2 X1 0.5285376
3 X1 0.5285376
4 X2 0.1694908
5 X2 0.1694908
6 X2 0.1694908
7 X3 0.7446906
8 X3 0.7446906
9 X3 0.7446906
Each variable (X1, X2, X3) has values estimated at 3 different times (which in this toy example happen to be the same, but this is never the case).
I would like to get it (back) in the form of :
X1 X2 X3
1 0.5285376 0.1694908 0.7446906
2 0.5285376 0.1694908 0.7446906
3 0.5285376 0.1694908 0.7446906
Basically, I would like the variable column to be sorted on ID (X1, X2, etc.) and become the column headings. I have tried various permutations of cast, dcast, recast, etc. and can't seem to get the data into the format that I want. It was easy enough to melt data from the wide form to the long form (e.g. the dat dataset), but getting it back is proving difficult. Any ideas? I know this is relatively simple, but I am having a hard time conceptualizing how to do this in reshape or reshape2.
Thanks,
LP
I typically do this by creating an id column and then using dcast:
> dat
variable value
1 X1 0.4299397
2 X1 0.4299397
3 X1 0.4299397
4 X2 0.2531551
5 X2 0.2531551
6 X2 0.2531551
7 X3 0.3972119
8 X3 0.3972119
9 X3 0.3972119
> library(reshape2)
> dat$id <- rep(1:3,times = 3)
> dcast(data = dat,formula = id~variable,fun.aggregate = sum,value.var = "value")
id X1 X2 X3
1 1 0.4299397 0.2531551 0.3972119
2 2 0.4299397 0.2531551 0.3972119
3 3 0.4299397 0.2531551 0.3972119
Depending on how robust you need this to be, the following will correctly cast a varying number of occurrences per variable, with the rows in any order.
> variable<-c(rep("X1",5),rep("X2",4),rep("X3",3))
> value<-c(rep(rnorm(1,.5,.2),5),rep(rnorm(1,.5,.2),4),rep(rnorm(1,.5,.2),3))
> dat <-data.frame(variable,value)
> dat <- dat[order(rnorm(nrow(dat))),]
> dat
variable value
11 X3 1.0294454
8 X2 0.6147509
2 X1 0.3537012
7 X2 0.6147509
9 X2 0.6147509
5 X1 0.3537012
4 X1 0.3537012
12 X3 1.0294454
3 X1 0.3537012
1 X1 0.3537012
10 X3 1.0294454
6 X2 0.6147509
> dat$id = numeric(nrow(dat))
> for (i in 1:nrow(dat)){
+ dat_temp <- dat[1:i,]
+ dat[i,]$id <- nrow(dat_temp[dat_temp$variable == dat[i,]$variable,])
+ }
> library(reshape)
> cast(dat, id~variable, value = 'value')
id X1 X2 X3
1 1 0.3537012 0.6147509 1.029445
2 2 0.3537012 0.6147509 1.029445
3 3 0.3537012 0.6147509 1.029445
4 4 0.3537012 0.6147509 NA
5 5 0.3537012 NA NA
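The loop above recomputes each running count by subsetting dat[1:i, ], which is quadratic in the number of rows. A sketch of the same idea in base R, offered as an alternative: ave(..., FUN = seq_along) computes the running count per variable in one call, and base reshape() stands in for cast/dcast so no extra package is needed. set.seed and the shuffle only make the toy example reproducible:

```r
set.seed(1)  # only so the shuffled example is reproducible
variable <- c(rep("X1", 5), rep("X2", 4), rep("X3", 3))
value <- c(rep(rnorm(1, .5, .2), 5), rep(rnorm(1, .5, .2), 4), rep(rnorm(1, .5, .2), 3))
dat <- data.frame(variable, value, stringsAsFactors = FALSE)
dat <- dat[sample(nrow(dat)), ]

# Running count within each variable, in the current row order:
# the k-th X1 row gets id k, the k-th X2 row gets id k, and so on
dat$id <- ave(seq_along(dat$variable), dat$variable, FUN = seq_along)

# Long to wide: one row per id, one value.<variable> column per variable,
# with NA where a variable has fewer occurrences
wide <- reshape(dat, idvar = "id", timevar = "variable", direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide))
wide <- wide[order(wide$id), c("id", "X1", "X2", "X3")]
rownames(wide) <- NULL
wide
```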

Extract data from column aggregate function in R

I have a large database from which I have extracted a data value (x) using the aggregate function:
library(plotrix)
aggregate(mydataNC[,c(52)],by=list(patientNC, siteNC, supNC),max)
OUTPUT:
Each x value has a corresponding distance value located in a column named dist in this database.
What is the easiest way to extract the matching dist value and add it to the table?
I'd probably start with merge(). Here's a small reproducible example you can use to see what's going on and adapt to your data:
# generate bogus data and view it
x1 <- rep(c("A", "B", "C"), each = 4)
x2 <- rep(c("E", "E", "F", "F"), times = 3)
y1 <- rnorm(12)
y2 <- rnorm(12)
md <- data.frame(x1, x2, y1, y2)
> head(md)
x1 x2 y1 y2
1 A E -1.4603164 -0.9662473
2 A E -0.5247227 1.7970341
3 A F 0.8990502 1.7596285
4 A F -0.6791145 2.2900357
5 B E 1.2894863 0.1152571
6 B E -0.1981511 0.6388998
# aggregate by taking maximum of each unique (x1, x2) combination
md.agg <- with(md, aggregate(y1, by = list(x1, x2), FUN = max))
names(md.agg) <- c("x1", "x2", "y1")
> md.agg
x1 x2 y1
1 A E -0.5247227
2 B E 1.2894863
3 C E 0.9982510
4 A F 0.8990502
5 B F 2.5125956
6 C F -0.5916491
# merge on the shared columns (x1, x2, y1): only the rows whose y1 equals the group maximum match, so y2 comes along
md.final <- merge(md, md.agg)
> md.final
x1 x2 y1 y2
1 A E -0.5247227 1.7970341
2 A F 0.8990502 1.7596285
3 B E 1.2894863 0.1152571
4 B F 2.5125956 -0.2217510
5 C E 0.9982510 0.6813261
6 C F -0.5916491 1.0348518
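A base R alternative to the aggregate-then-merge route, sketched with its own random data (so the numbers will differ from those above): ave() repeats each group's maximum back onto every row of the group, and a logical filter keeps the rows that hold it. As with the merge, tied maxima would keep more than one row per group:

```r
set.seed(42)  # only for reproducibility of the bogus data
md <- data.frame(
  x1 = rep(c("A", "B", "C"), each = 4),
  x2 = rep(c("E", "E", "F", "F"), times = 3),
  y1 = rnorm(12),
  y2 = rnorm(12)
)

# Group-wise maximum of y1, aligned with the rows of md
grp_max <- ave(md$y1, md$x1, md$x2, FUN = max)

# Keep only the rows where y1 equals its group's maximum; y2 comes along
md.final <- md[md$y1 == grp_max, ]
md.final
```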
