Reshape data frame by vector - r

Lets say a function called textstat_frequency{package:quanteda}
gives us following data frame.
data.frame(xx=1:4,yy=5:8,foo=c("A","A","B","C"),stringsAsFactors=FALSE)
xx yy foo
1 1 5 A
2 2 6 A
3 3 7 B
4 4 8 C
What's the best way to shape the data.frame according the vector
c("B","A","C"). I have made an index with match or %in% but without any luck.

df = data.frame(xx=1:4,yy=5:8,foo=c("A","A","B","C"),stringsAsFactors=FALSE)
temp = factor(df$foo, levels = c("B", "A", "C"))
df = df[order(temp),]
df
# xx yy foo
#3 3 7 B
#1 1 5 A
#2 2 6 A
#4 4 8 C

Related

Simplest way to replace a list of values in a data frame with a list of new values

Say we have a data frame with a factor (Group) that is a grouping variable for a list of IDs:
set.seed(123)
data <- data.frame(Group = factor(sample(5,10, replace = T)),
ID = c(1:10))
In this example, the ID's belong to one of 5 Groups, labeled 1:5. We simply want to replace 1:5 with A:E. In other words, if Group == 1, we want to change it to A, if Group == 2, we want to change it to B, and so on. What is the simplest way to achieve this?
You may assign new labels= in a names list using factor once again.
data$Group1 <- factor(data$Group, labels=list("1"="A", "2"="B", "3"="C", "4"="D", "5"="E"))
## more succinct:
data$Group2 <- factor(data$Group, labels=setNames(list("A", "B", "C", "D", "E"), 1:5))
data
# Group ID Group1 Group2 Group3
# 1 3 1 C C C
# 2 3 2 C C C
# 3 2 3 B B B
# 4 2 4 B B B
# 5 3 5 C C C
# 6 5 6 E E E
# 7 4 7 D D D
# 8 1 8 A A A
# 9 2 9 B B B
# 10 3 10 C C C
This for general, if indeed capital letters are wanted see #RonakShah's solution.
You can use the built-in constant in R LETTERS :
data$new_group <- LETTERS[data$Group]
data
# Group ID new_group
#1 3 1 C
#2 3 2 C
#3 2 3 B
#4 2 4 B
#5 3 5 C
#6 5 6 E
#7 4 7 D
#8 1 8 A
#9 2 9 B
#10 3 10 C
Created a new column (new_group) here for comparison purposes. You can overwrite the same column if you wish to.

From axis values to coodinates pairs [duplicate]

I have two vectors of integers, say v1=c(1,2) and v2=c(3,4), I want to combine and obtain this as a result (as a data.frame, or matrix):
> combine(v1,v2) <--- doesn't exist
1 3
1 4
2 3
2 4
This is a basic case. What about a little bit more complicated - combine every row with every other row? E.g. imagine that we have two data.frames or matrices d1, and d2, and we want to combine them to obtain the following result:
d1
1 13
2 11
d2
3 12
4 10
> combine(d1,d2) <--- doesn't exist
1 13 3 12
1 13 4 10
2 11 3 12
2 11 4 10
How could I achieve this?
For the simple case of vectors there is expand.grid
v1 <- 1:2
v2 <- 3:4
expand.grid(v1, v2)
# Var1 Var2
#1 1 3
#2 2 3
#3 1 4
#4 2 4
I don't know of a function that will automatically do what you want to do for dataframes(See edit)
We could relatively easily accomplish this using expand.grid and cbind.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
expand.grid(df1, df2) # doesn't work so let's try something else
id <- expand.grid(seq(nrow(df1)), seq(nrow(df2)))
out <-cbind(df1[id[,1],], df2[id[,2],])
out
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#1.1 1 3 6 b
#2.1 2 4 6 b
Edit: As Joran points out in the comments merge does this for us for data frames.
df1 <- data.frame(a = 1:2, b=3:4)
df2 <- data.frame(cat = 5:6, dog = c("a","b"))
merge(df1, df2)
# a b cat dog
#1 1 3 5 a
#2 2 4 5 a
#3 1 3 6 b
#4 2 4 6 b

cumulative product in R across column

I have a dataframe in the following format
> x <- data.frame("a" = c(1,1),"b" = c(2,2),"c" = c(3,4))
> x
a b c
1 1 2 3
2 1 2 4
I'd like to add 3 new columns which is a cumulative product of the columns a b c, however I need a reverse cumulative product i.e. the output should be
row 1:
result_d = 1*2*3 = 6 , result_e = 2*3 = 6, result_f = 3
and similarly for row 2
The end result will be
a b c result_d result_e result_f
1 1 2 3 6 6 3
2 1 2 4 8 8 4
the column names do not matter this is just an example. Does anyone have any idea how to do this?
as per my comment, is it possible to do this on a subset of columns? e.g. only for columns b and c to return:
a b c results_e results_f
1 1 2 3 6 3
2 1 2 4 8 4
so that column "a" is effectively ignored?
One option is to loop through the rows and apply cumprod over the reverse of elements and then do the reverse
nm1 <- paste0("result_", c("d", "e", "f"))
x[nm1] <- t(apply(x, 1,
function(x) rev(cumprod(rev(x)))))
x
# a b c result_d result_e result_f
#1 1 2 3 6 6 3
#2 1 2 4 8 8 4
Or a vectorized option is rowCumprods
library(matrixStats)
x[nm1] <- rowCumprods(as.matrix(x[ncol(x):1]))[,ncol(x):1]
temp = data.frame(Reduce("*", x[NCOL(x):1], accumulate = TRUE))
setNames(cbind(x, temp[NCOL(temp):1]),
c(names(x), c("res_d", "res_e", "res_f")))
# a b c res_d res_e res_f
#1 1 2 3 6 6 3
#2 1 2 4 8 8 4

add column to existing column in r

How do I convert 2 columns from a data.frame onto 2 different columns?
I.E:
Data
A B C D
1 3 5 7
2 4 6 8
to
Data
A B
1 3
2 4
5 7
6 8
You can use rbind
rbind(df[,1:2], data.frame(A = df$C, B = df$D))
You can use a fast version of rbind, rbindlist from data.table:
library(data.table)
rbindlist(lapply(seq(1, ncol(df), 2), function(i) df[,i:(i+1)]))
Here is my solution but it requires to change names of the columns.
names(dat) <- c("A", "B", "A", "B")
merge(dat[1:2], dat[3:4], all = T)
A B
1 1 3
2 2 4
3 5 7
4 6 8
And here is another solution more easy.
dat[3:4, ] <- dat[ ,3:4]
dat <- dat[1:2]
dat
A B
1 1 3
2 2 4
3 5 7
4 6 8
For scalability, a solution that will halve any even size data frame and append the rows:
half <- function(df) {m <- as.matrix(df)
dim(m) <- c(nrow(df)*2,ncol(df)/2)
nd <- as.data.frame(m)
names(nd) <- names(df[(1:dim(nd)[2])]);nd}
half(Data)
A B
1 1 5
2 2 6
3 3 7
4 4 8

Condensing Data Frame in R

I just have a simple question, I really appreciate everyones input, you have been a great help to my project. I have an additional question about data frames in R.
I have data frame that looks similar to something like this:
C <- c("","","","","","","","A","B","D","A","B","D","A","B","D")
D <- c(NA,NA,NA,2,NA,NA,1,1,4,2,2,5,2,1,4,2)
G <- list(C=C,D=D)
T <- as.data.frame(G)
T
C D
1 NA
2 NA
3 NA
4 2
5 NA
6 NA
7 1
8 A 1
9 B 4
10 D 2
11 A 2
12 B 5
13 D 2
14 A 1
15 B 4
16 D 2
I would like to be able to condense all the repeat characters into one, and look similar to this:
J B C E
1 2 1
2 A 1 2 1
3 B 4 5 4
4 D 2 2 2
So of course, the data is all the same, it is just that it is condensed and new columns are formed to hold the data. I am sure there is an easy way to do it, but from the books I have looked through, I haven't seen anything for this!
EDIT I edited the example because it wasn't working with the answers so far. I wonder if the NA's, blanks, and unevenness from the blanks are contributing??
hereĀ“s a reshape solution:
require(reshape)
cast(T, C ~ ., function(x) x)
Changed T to df to avoid a bad habit. Returns a list, which my not be what you want but you can convert from there.
C <- c("A","B","D","A","B","D","A","B","D")
D <- c(1,4,2,2,5,2,1,4,2)
my.df <- data.frame(id=C,val=D)
ret <- function(x) x
by.df <- by(my.df$val,INDICES=my.df$id,ret)
This seems to get the results you are looking for. I'm assuming it's OK to remove the NA values since that matches the desired output you show.
T <- na.omit(T)
T$ind <- ave(1:nrow(T), T$C, FUN = seq_along)
reshape(T, direction = "wide", idvar = "C", timevar = "ind")
# C D.1 D.2 D.3
# 4 2 1 NA
# 8 A 1 2 1
# 9 B 4 5 4
# 10 D 2 2 2
library(reshape2)
dcast(T, C ~ ind, value.var = "D", fill = "")
# C 1 2 3
# 1 2 1
# 2 A 1 2 1
# 3 B 4 5 4
# 4 D 2 2 2

Resources