Subtracting 1 column from multiple columns - r

df <- data.frame(a=1:3, b=4:6, c=7:9, d=10:12, e=13:15)
a b c d e
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
Is it possible to subtract 'column a' from all of the other columns without doing each calculation individually?
I have a dataset of 1001 columns and would like to know if it is possible to do so without doing 1000 calculations manually.
Many Thanks

Try this:
#Data
df <- data.frame(a=1:3, b=4:6, c=7:9, d=10:12, e=13:15)
#Isolate
df1 <- df[,1,drop=FALSE]
#Subtract
dfr <- cbind(df1,as.data.frame(apply(df[,-1],2,function(x) x-df1[[1]])))
names(dfr)<-names(df)
a b c d e
1 1 3 6 9 12
2 2 3 6 9 12
3 3 3 6 9 12
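The same result can be had without apply(). A minimal sketch, assuming every column is numeric: subtracting a vector from a data frame recycles the vector down each column, so df$a is removed from all remaining columns in one step. sweep() is shown as an equivalent, more explicit form; the names res and res2 are only for illustration.

```r
df <- data.frame(a=1:3, b=4:6, c=7:9, d=10:12, e=13:15)

# df$a has one value per row, so it is recycled down each remaining column
res <- cbind(df["a"], df[-1] - df$a)

# Equivalent with sweep(): MARGIN = 1 lines the statistic up with the rows
res2 <- cbind(df["a"], sweep(df[-1], 1, df$a, "-"))
```

With 1001 columns this stays a single expression, because nothing in it depends on the number of columns.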

Related

How to assign a value to a column based on a column index

Given a data frame, I would like to assign a calculated value to a cell based on a given column index.
df <- data.frame(a = c(2,4,7,3,5,3), b = c(8,3,8,2,6,1))
> df
a b
1 2 8
2 4 3
3 7 8
4 3 2
5 5 6
6 3 1
max <- apply(df, 1, which.max)
> max
[1] 2 1 2 1 2 1
addition <- apply(df, 1, sum)
> addition
[1] 10 7 15 5 11 4
Then some operation which I cannot figure out with the following result being assigned to df2
> df2
a b
1 2 10
2 7 3
3 7 15
4 5 2
5 5 11
6 4 1
I would highly appreciate your ideas and help. Thank you.
You can use cbind to access your selected columns for each row:
df2 = df
df2[cbind(1:nrow(df2),max)] = addition
df2
a b
1 2 10
2 7 3
3 7 15
4 5 2
5 5 11
6 4 1
Here, cbind returns a matrix of 2 columns and 6 rows that we use to subset the dataframe using matrix subsetting.
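To make the matrix subsetting step more concrete, here is a small sketch on the same data. The names col_max and idx are chosen for illustration (col_max avoids masking base::max). Each row of idx is a (row, column) pair that addresses exactly one cell.

```r
df <- data.frame(a = c(2,4,7,3,5,3), b = c(8,3,8,2,6,1))

# Column index of the row-wise maximum
col_max <- apply(df, 1, which.max)

# One (row, column) pair per row of the data frame
idx <- cbind(seq_len(nrow(df)), col_max)

# Reading cells this way needs a matrix, hence as.matrix()
picked <- as.matrix(df)[idx]

# Assigning through the same index overwrites only those cells
df2 <- df
df2[idx] <- rowSums(df)
```

Here picked holds the six row-wise maxima, and df2 matches the result shown above.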
You can also use vectorised ifelse directly:
with(df, cbind.data.frame(a = ifelse(a > b, a + b, a), b = ifelse(a > b, b, a + b)))
# a b
#1 2 10
#2 7 3
#3 7 15
#4 5 2
#5 5 11
#6 4 1

How to delete duplicates but keep most recent data in R

I have the following two data frames:
df1 = data.frame(names=c('a','b','c','c','d'),year=c(11,12,13,14,15), Times=c(1,1,3,5,6))
df2 = data.frame(names=c('a','e','e','c','c','d'),year=c(12,12,13,15,16,16), Times=c(2,2,4,6,7,7))
I would like to know how I could merge the above df but only keeping the most recent Times depending on the year. It should look like this:
Names Year Times
a 12 2
b 12 1
c 16 7
d 16 7
e 13 4
I'm guessing that you do not mean to merge these but rather to combine them by stacking. Your question is ambiguous, since the "duplication" could occur at the dataframe level or at the vector level. Your example does not display any duplication at the dataframe level but does at the vector level. The best way to describe the problem is that you want the last (or max) Times entry within each group of names values:
> df1
names year Times
1 a 11 1
2 b 12 1
3 c 13 3
4 c 14 5
5 d 15 6
> df2
names year Times
1 a 12 2
2 e 12 2
3 e 13 4
4 c 15 6
5 c 16 7
6 d 16 7
> dfr <- rbind(df1,df2)
> dfr <-dfr[order(dfr$Times),]
> dfr[!duplicated(dfr, fromLast=TRUE) , ]
names year Times
1 a 11 1
2 b 12 1
6 a 12 2
7 e 12 2
3 c 13 3
8 e 13 4
4 c 14 5
5 d 15 6
9 c 15 6
10 c 16 7
11 d 16 7
> dfr[!duplicated(dfr$names, fromLast=TRUE) , ]
names year Times
2 b 12 1
6 a 12 2
8 e 13 4
10 c 16 7
11 d 16 7
This uses base R functions; there are also newer packages (such as plyr) that many feel make the split-apply-combine process more intuitive.
df <- rbind(df1, df2)
do.call(rbind, lapply(split(df, df$names), function(x) x[which.max(x$year), ]))
## names year Times
## a a 12 2
## b b 12 1
## c c 16 7
## d d 16 7
## e e 13 4
We could also use aggregate:
df <- rbind(df1,df2)
aggregate(cbind(year, Times) ~ names, df, max)
# names year Times
# 1 a 12 2
# 2 b 12 1
# 3 c 16 7
# 4 d 16 7
# 5 e 13 4
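One caveat worth checking on real data: aggregate() with max takes the maximum of each column separately, so it returns the row with the latest year only when Times rises with year, as it does in this example. The sketch below uses a small hypothetical data frame (not from the question) where that assumption fails, and contrasts it with a row-wise selection.

```r
# Hypothetical data: for "a", the latest year does not hold the largest Times
df <- data.frame(names = c("a", "a", "b"),
                 year  = c(11, 12, 12),
                 Times = c(5, 2, 1))

# Column-wise maxima: pairs year 12 with Times 5, a row that does not exist
agg <- aggregate(cbind(year, Times) ~ names, df, max)

# Row-wise alternative: keep the complete row with the latest year per name
ord <- df[order(df$names, df$year), ]
latest <- ord[!duplicated(ord$names, fromLast = TRUE), ]
```

For "a", agg reports Times 5 while latest keeps the actual year-12 row with Times 2.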
In case you wanted to see a data.table solution,
# load library
library(data.table)
# bind by row and convert to data.table (by reference)
df <- setDT(rbind(df1, df2))
# get the result
df[order(names, year), .SD[.N], by=.(names)]
The output is as follows:
names year Times
1: a 12 2
2: b 12 1
3: c 16 7
4: d 16 7
5: e 13 4
The final line orders the row-bound data by names and year, and then chooses the last observation (.SD[.N]) for each name.

Combine two data frames of different dimensions into one data frame

I am having trouble combining two data frames of different dimensions, each with a very large number of rows. Let's say samples of my data frames are d and e, and the expected new data frame is de. I would like to pair every value in a row of d with every value in the same row of e, and collect those pairs in a new data frame (de). Any idea or help with solving my problem is really appreciated. Thanks
> d <- data.frame(v1 = c(1,3,5), v2 = c(2,4,6))
> d
v1 v2
1 1 2
2 3 4
3 5 6
> e <- data.frame(v1 = c(11, 14), v2 = c(12,15), v3=c(13,16))
> e
v1 v2 v3
1 11 12 13
2 14 15 16
> de <- data.frame(x = c(1,1,1,2,2,2,3,3,3,4,4,4), y = c(11,12,13,11,12,13,14,15,16,14,15,16))
> de
x y
1 1 11
2 1 12
3 1 13
4 2 11
5 2 12
6 2 13
7 3 14
8 3 15
9 3 16
10 4 14
11 4 15
12 4 16
One solution is to "melt" d and e into long format, then merge, then get rid of the extra columns. If you have very large datasets, data tables are much faster (no difference for this tiny dataset).
library(reshape2) # for melt(...)
library(data.table)
# add id column
d <- cbind(id=1:nrow(d),d)
e <- cbind(id=1:nrow(e),e)
# melt to long format
d.melt <- data.table(melt(d,id.vars="id"), key="id")
e.melt <- data.table(melt(e,id.vars="id"), key="id")
# data table join; keep only the two value columns
# (i.value is the value column of e.melt, the table passed as i)
result <- d.melt[e.melt, .(x=value, y=i.value), allow.cartesian=TRUE]
setkey(result,x,y)
result
x y
1: 1 11
2: 1 12
3: 1 13
4: 2 11
5: 2 12
6: 2 13
7: 3 14
8: 3 15
9: 3 16
10: 4 14
11: 4 15
12: 4 16
If your data are numeric, like they are in this example, this is pretty straightforward in base R too. Conceptually this is the same as @jlhoward's answer: get your data into a long format, and merge:
merge(cbind(id = rownames(d), stack(d)),
cbind(id = rownames(e), stack(e)),
by = "id")[c("values.x", "values.y")]
# values.x values.y
# 1 1 11
# 2 1 12
# 3 1 13
# 4 2 11
# 5 2 12
# 6 2 13
# 7 3 14
# 8 3 15
# 9 3 16
# 10 4 14
# 11 4 15
# 12 4 16
Or, with the "reshape2" package:
merge(melt(as.matrix(d)),
melt(as.matrix(e)),
by = "Var1")[c("value.x", "value.y")]
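For comparison, the row-wise pairing can also be sketched in base R without reshaping. This is only an illustration on the sample data; it loops over rows, so it is unlikely to be the fastest choice for very large inputs. The names rows and pairs are chosen here for the example.

```r
d <- data.frame(v1 = c(1,3,5), v2 = c(2,4,6))
e <- data.frame(v1 = c(11, 14), v2 = c(12,15), v3 = c(13,16))

# Only rows present in both data frames can be paired
rows <- seq_len(min(nrow(d), nrow(e)))

# For each row i, cross every value in d[i, ] with every value in e[i, ];
# y is listed first because expand.grid varies its first argument fastest
pairs <- lapply(rows, function(i) {
  expand.grid(y = unlist(e[i, ], use.names = FALSE),
              x = unlist(d[i, ], use.names = FALSE))[c("x", "y")]
})
de <- do.call(rbind, pairs)
rownames(de) <- NULL
```

The third row of d has no partner in e, so it is dropped, which matches the expected de in the question.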

Maintaining order in split-apply-combine problems [duplicate]

I have the following data frame
dd1 = data.frame(cond = c("D","A","C","B","A","B","D","C"), val = c(11,7,9,4,3,0,5,2))
dd1
cond val
1 D 11
2 A 7
3 C 9
4 B 4
5 A 3
6 B 0
7 D 5
8 C 2
and now need to compute cumulative sums within each level of cond. The result should look like this:
> dd2 = data.frame(cond = c("D","A","C","B","A","B","D","C"), val = c(11,7,9,4,3,0,5,2), cumsum=c(11,7,9,4,10,4,16,11))
> dd2
cond val cumsum
1 D 11 11
2 A 7 7
3 C 9 9
4 B 4 4
5 A 3 10
6 B 0 4
7 D 5 16
8 C 2 11
It is important to receive the result data frame in the same order as the input data frame because there are other variables bound to that.
I tried ddply(dd1, .(cond), summarize, cumsum = cumsum(val)) but it didn't produce the result I expected.
Thanks
Use ave instead.
dd1$cumsum <- ave(dd1$val, dd1$cond, FUN=cumsum)
If doing this by hand is an option, then split() and unsplit() with a suitable lapply() in between will do this for you.
dds <- split(dd1, dd1$cond)
dds <- lapply(dds, function(x) transform(x, cumsum = cumsum(x$val)))
unsplit(dds, dd1$cond)
The last line gives
> unsplit(dds, dd1$cond)
cond val cumsum
1 D 11 11
2 A 7 7
3 C 9 9
4 B 4 4
5 A 3 10
6 B 0 4
7 D 5 16
8 C 2 11
I separated the three steps, but these could be strung together or placed in a function if you are doing a lot of this.
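As a sketch of what such a function might look like: add_by_group() and its arguments are hypothetical names invented for this example, not part of any package. It wraps the same split(), lapply() and unsplit() steps.

```r
# Hypothetical helper: apply FUN to `value` within each level of `group`
# and return the data in its original row order
add_by_group <- function(data, group, value, FUN = cumsum, name = "cumsum") {
  pieces <- split(data, data[[group]])
  pieces <- lapply(pieces, function(x) {
    x[[name]] <- FUN(x[[value]])
    x
  })
  unsplit(pieces, data[[group]])
}

dd1 <- data.frame(cond = c("D","A","C","B","A","B","D","C"),
                  val = c(11,7,9,4,3,0,5,2))
res <- add_by_group(dd1, "cond", "val")
```

Because unsplit() puts the pieces back according to the original grouping vector, the rows of res come out in the same order as dd1.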
A data.table solution:
require(data.table)
dt <- data.table(dd1)
dt[, c.val := cumsum(val),by=cond]
> dt
# cond val c.val
# 1: D 11 11
# 2: A 7 7
# 3: C 9 9
# 4: B 4 4
# 5: A 3 10
# 6: B 0 4
# 7: D 5 16
# 8: C 2 11

Generate combination of data frame and vector

I know expand.grid creates all combinations of the given vectors. But is there a way to generate all combinations of a data frame and a vector, treating each row of the data frame as a single unit? For instance,
df <- data.frame(a = 1:3, b = 5:7)
c <- 9:10
how to create a new data frame that is the combination of df and c without expanding df:
df.c:
a b c
1 5 9
2 6 9
3 7 9
1 5 10
2 6 10
3 7 10
Thanks!
For me, the simplest way is merge(df, as.data.frame(c)). Because the two data frames share no column names, merge returns their Cartesian product:
a b c
1 1 5 9
2 2 6 9
3 3 7 9
4 1 5 10
5 2 6 10
6 3 7 10
This may not scale when your dataframe has more than two columns per row, but you can just use expand.grid on the first column and then merge the second column in.
df <- data.frame(a = 1:3, b = 5:7)
c <- 9:10
combined <- expand.grid(a=df$a, c=c)
combined <- merge(combined, df)
> combined[order(combined$c), ]
a c b
1 1 9 5
3 2 9 6
5 3 9 7
2 1 10 5
4 2 10 6
6 3 10 7
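If the data frame has many columns, one way around merging them back in is to expand over row indices instead of column values. A minimal base R sketch, with vals standing in for the vector c from the question and idx as an illustrative name:

```r
df <- data.frame(a = 1:3, b = 5:7)
vals <- 9:10

# Every combination of a row number of df and a value of vals;
# expand.grid varies the row index fastest
idx <- expand.grid(row = seq_len(nrow(df)), c = vals)

# Repeat whole rows of df by index and attach the matching value
df.c <- cbind(df[idx$row, , drop = FALSE], c = idx$c)
rownames(df.c) <- NULL
```

Since whole rows are repeated by index, this does not depend on how many columns df has or on any column being unique.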
You could also do something like this
do.call(rbind,lapply(9:10, function(x,d) data.frame(d, c=x), d=df))
# or using rbindlist as a fast alternative to do.call(rbind,list)
library(data.table)
rbindlist(lapply(9:10, function(x,d) data.frame(d, c=x), d=df))
or
rbindlist(Map(data.frame, c = 9:10, MoreArgs = list(a= 1:3,b=5:7)))
This question is really old but I found one more answer.
Use tidyr's expand_grid().
library(tidyr)
expand_grid(df, c)
# A tibble: 6 × 3
a b c
<int> <int> <int>
1 1 5 9
2 1 5 10
3 2 6 9
4 2 6 10
5 3 7 9
6 3 7 10
