R combine nx4 into nx2 - r

I have a dataset that has 1 factors (4 levels). However each factor level and data is currently in its own column, with a factor level label at the top (Matrix of n by 4).
To do an anova I want to change this to a n by 2 with all the factor labels in column A and all the data in column B.
I could easily cut and paste this in Excel, then back into a csv- but assume there is a way to do this with cbind.
Sample data:
A B C D
2 4 6 8
3 5 7 9
What I require:
A 2
A 3
B 4
B 5
C 6
C 7
D 8
D 9

You should use stack:
stack(df) # where `df` is your data.frame

stack is better here but also:
library(reshape2)
melt(df)

Related

cbind named vectors in R by name

I have two named vectors similar to these ones:
x <- c(1:5)
names(x) <- c("a","b","c","d","e")
t <- c(6:10)
names(t) <- c("e","d","c","b","a")
I would like to combine them so to get the following outcome:
x t
a 1 10
b 2 9
c 3 8
d 4 7
e 5 6
Unfortunately when I run cbind(x,t) the result just combines them in the order they are disregarding the names of t and only keeping those of x. Giving the following result:
x t
a 1 6
b 2 7
c 3 8
d 4 9
e 5 10
I'm pretty sure there must be an easy solution, but I cannot find it. As this passage is part of a long and tedious loop (and the vectors I'm working with are much longer), it is important to have the least convoluted and quicker to compute options.
We can use the names of 'x' to change the order the 't' elements and cbind with 'x'
cbind(x, t = t[names(x)])
# x t
#a 1 10
#b 2 9
#c 3 8
#d 4 7
#e 5 6

populate Data Frame based on lookup data frame in R

How does one go about switching a data frame based on column names between to tables with a lookup table in between.
Orig
A B C
1 2 3
2 2 2
4 5 6
Ret
D E
7 8
8 9
2 4
lookup <- data.frame(Orig=c('A','B','C'),Ret=c('D','D','E'))
Orig Ret
1 A D
2 B D
3 C E
So that the final data frame would be
A B C
7 7 8
8 8 9
2 2 4
We can match the 'Orig' column in 'lookup' with the column names of 'Orig' to find the numeric index (although, it is in the same order, it could be different in other cases), get the corresponding 'Ret' elements based on that. We use that to subset the 'Ret' dataset and assign the output back to the original dataset. Here I made a copy of "Orig".
OrigN <- Orig
OrigN[] <- Ret[as.character(lookup$Ret[match(as.character(lookup$Orig),
colnames(Orig))])]
OrigN
# A B C
#1 7 7 8
#2 8 8 9
#3 2 2 4
NOTE: as.character was used as the columns in 'lookup' were factor class.
I believe that the following will work as well.
OrigN <- Orig
OrigN[, as.character(lookup$Orig)] <- Ret[, as.character(lookup$Ret)]
This method applies a column shuffle to Orig (actually a copy OrigN following #Akrun) and then fills these columns with the appropriately ordered columns of Ret using the lookup.

Extract data from data.frame based on coordinates in another data.frame

So here is what my problem is. I have a really big data.frame woth two columns, first one represents x coordinates (rows) and another one y coordinates (columns), for example:
x y
1 1
2 3
3 1
4 2
3 4
In another frame I have some data (numbers actually):
a b c d
8 7 8 1
1 2 3 4
5 4 7 8
7 8 9 7
1 5 2 3
I would like to add a third column in first data.frame with data from second data.frame based on coordinates from first data.frame. So the result should look like this:
x y z
1 1 8
2 3 3
3 1 5
4 2 8
3 4 8
Since my data.frames are really big the for loops are too slow. I think there is a way to do this with apply loop family, but I can't find how. Thanks in advance (and sorry for ugly message layout, this is my first post here and I don't know how to produce this nice layout with code and proper data.frames like in another questions).
This is a simple indexing question. No need in external packages or *apply loops, just do
df1$z <- df2[as.matrix(df1)]
df1
# x y z
# 1 1 1 8
# 2 2 3 3
# 3 3 1 5
# 4 4 2 8
# 5 3 4 8
A base R solution: (df1 and df2 are coordinates and numbers as data frames):
df1$z <- mapply(function(x,y) df2[x,y], df1$x, df1$y )
It works if the last y in the first data frame is corrected from 5 to 4.
I guess it was a typo since you don't have 5 columns in the second data drame.
Here's how I would do this.
First, use data.table for fast merging; then convert your data frames (I'll call them dt1 with coordinates and vals with values) to data.tables.
dt1<-data.table(dt)
vals<-data.table(vals)
Second, put vals into a new data.table with coordinates:
vals_dt<-data.table(x=rep(1:dim(vals)[1],dim(vals)[2]),
y=rep(1:dim(vals)[2],each=dim(vals)[1]),
z=matrix(vals,ncol=1)[,1],key=c("x","y"))
Now merge:
setkey(dt1,x,y)[vals_dt,z:=z]
You can also try the data.table package and update df1 by reference
library(data.table)
setDT(df1)[, z := df2[cbind(x, y)]][]
# x y z
# 1: 1 1 8
# 2: 2 3 3
# 3: 3 1 5
# 4: 4 2 8
# 5: 3 4 8

R ggplot2 number of rows of the same values in a column

I'm new to R and plotting in R. This might be a very simple question but here it is,
Suppose I have a data frame like this:
a b c d
1 5 6 7
2 3 5 7
1 4 6 2
2 3 5 NA
1 4 4 2
2 2 4 2
1 2 5 1
2 3 4 NA
Here a, b, c, d are column names. I want to plot a bar chart that has values in column d on the x axis, and the number of rows with that value on y axis. So 7 has 2 rows, 1 has 1 and 2 has 3. It's not important to include missing values in between(3, 4, 5, 6).
So the result would be something like a histogram. I know I can do counting on column d and then do the plotting but I feel there must be a better way to do this.
Here's an approach--if I understand your question, columns A, B, and C are immaterial to what you are doing, which is plotting frequencies of column D.
library(ggplot2)
library(reshape)
##get frequencies of col d
test.summary<-table(test$d)
## re-shape the data
test.summary.m<-melt(test.summary)
ggplot(test.summary.m,aes(x=as.factor(Var.1),y=value))+
geom_bar(stat='identity')

sort and number within levels of a factor in r

if i have the following data frame G:
z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4
I am trying to get:
z type x y
3 a 6 3
2 a 5 2
1 a 4 1
4 b 1 2
5 b 0.9 1
6 c 4 1
I.e. i want to sort the whole data frame within the levels of factor type based on vector x. Get the length of of each level a = 3 b=2 c=1 and then number in a decreasing fashion in a new vector y.
My starting place is currently with sort()
tapply(y, x, sort)
Would it be best to first try and use sapply to split everything first?
There are many ways to skin this cat. Here is one solution using base R and vectorized code in two steps (without any apply):
Sort the data using order and xtfrm
Use rle and sequence to genereate the sequence.
Replicate your data:
dat <- read.table(text="
z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4
", header=TRUE, stringsAsFactors=FALSE)
Two lines of code:
r <- dat[order(dat$type, -xtfrm(dat$x)), ]
r$y <- sequence(rle(r$type)$lengths)
Results in:
r
z type x y
3 3 a 6.0 1
2 2 a 5.0 2
1 1 a 4.0 3
4 4 b 1.0 1
5 5 b 0.9 2
6 6 c 4.0 1
The call to order is slightly complicated. Since you are sorting one column in ascending order and a second in descending order, use the helper function xtfrm. See ?xtfrm for details, but it is also described in ?order.
I like Andrie's better:
dat <- read.table(text="z type x
1 a 4
2 a 5
3 a 6
4 b 1
5 b 0.9
6 c 4", header=T)
Three lines of code:
dat <- dat[order(dat$type), ]
x <- by(dat, dat$type, nrow)
dat$y <- unlist(sapply(x, function(z) z:1))
I Edited my response to adapt for the comments Andrie mentioned. This works but if you went this route instead of Andrie's you're crazy.

Resources