Transpose with multiple variables and more than one metric in R

I was previously a SAS user. Since I don't have SAS anymore, I need to learn to use R for work.
The dataset has the following columns:
market date sitename impression clicks
I want to transpose it into:
market date sitename-impression sitename-clicks
I think in SAS I used to do:
Proc Transpose
by market date;
id sitename;
var impression clicks;
run;
I do have a book on R and googled a lot, but couldn't find the solution that works...
Would really appreciate if anyone can help.
Thanks in advance!!!

Let me start by saying welcome to Stack Overflow. Glad to have a new user. When you ask a question, it's helpful and encouraged to provide the code you're using and a reproducible data set that looks like the original. This is called a minimal reproducible example. To get a data set into a post you have several options; here are two: wrap the object name in dput() and paste what is displayed in the console, or just post the data frame directly. For the code, provide everything necessary to replicate your problem. I hope you find this helpful for future questions.
I may not fully understand but I think you want to transform, not transpose, the data.
dat <- data.frame(market=rnorm(10), date=rnorm(10), #let's create a data set
sitename=rnorm(10), impression=rnorm(10), clicks=rnorm(10))
dat #look at it (I pasted it below)
# > dat
# market date sitename impression clicks
# 1 -0.9593797 -0.08411994 1.6079129 -0.5204772 -0.31633966
# 2 -0.5088689 1.78799500 -0.2469315 1.3476964 -0.04344779
# 3 -0.1527465 0.81673996 1.7824969 -1.5531260 -1.28304384
# 4 -0.7026194 0.52072913 -0.1174356 0.5722210 -1.20474443
# 5 -0.4537490 -0.69139062 1.1124277 -0.2452974 -0.33025320
# 6 0.7466588 0.36318337 -0.4623319 -0.9036768 -0.65754302
# 7 0.8007612 2.59588554 0.1820732 0.4318629 -0.36308748
# 8 1.0781715 -1.01512734 0.2297475 0.9219439 -1.15687902
# 9 0.3731450 -0.19004572 0.5190749 -1.4020371 -0.97370295
# 10 0.7724259 1.76528303 0.5781786 -0.5490849 -0.83819036
#now to create the new columns (I think this is what you want)
#the easiest way is to use transform. See ?transform for more
dat.new <- transform(dat, sitename.clicks=sitename-clicks,
impression.clicks=impression-clicks)
dat.new #here's the new data set. Notice it has the new and old columns.
#To get rid of the old columns you can use indexing and specify the columns you want.
dat.new[, c(1:2, 6:7)]
#We could have also done:
dat.new[, c(1,2,6,7)]
#or said the columns not wanted with negative indexing:
dat.new[, -c(3:5)]
EDIT In looking at Brian's comments and the variables I would think that a long to wide transformation is what the poster desires. I would likely approach it using Wickham's reshape2 package as well, as this method is easier for me to work with and I imagine it would be easier for an R beginner as well. However, here is a base way to do the long to wide format using the same data set Brian provided:
wide <- reshape(DF, v.names=c("impression", "clicks"), idvar=c("market", "date"),
timevar="sitename", direction="wide")
reshape(wide) # calling reshape() on an already reshaped data frame converts it back to long format
The reshape function is very flexible but takes some getting used to before you can use it appropriately. I'm leaving my previous response up as well to keep the history of this post, though I now believe it is not the poster's intent. It serves as a reminder that a reproducible example is very helpful in providing clarity to your query.
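To make the reshape call above easier to follow, here is a minimal sketch on a tiny hand-made data set (the values are invented for illustration, not the poster's data). It shows which columns base reshape creates when there are two metrics per site.

```r
# Small long-format data set: one row per market/date/sitename (made-up values)
DF <- data.frame(
  market     = c("A", "A", "B", "B"),
  date       = as.Date("2012-02-28"),
  sitename   = c("a", "b", "a", "b"),
  impression = c(74, 11, 4, 74),
  clicks     = c(97, 71, 53, 9)
)

# v.names: the metrics to spread; idvar: the keys that stay as rows;
# timevar: the column whose values become part of the new column names
wide <- reshape(DF,
                v.names   = c("impression", "clicks"),
                idvar     = c("market", "date"),
                timevar   = "sitename",
                direction = "wide")

names(wide)  # one column per metric/site pair, joined with "."
wide
```

Each market/date pair ends up on a single row, with the site name appended to each metric name.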

Example data, as Tyler said, is important. I interpreted your question differently because I thought your data was different. I didn't take the - as a literal subtraction of numerics, but a combination of variables.
DF <- expand.grid(market = LETTERS[1:5],
date = Sys.Date()+(0:5),
sitename = letters[1:2])
n <- nrow(DF)
DF$impression <- sample(100, n, replace=TRUE)
DF$clicks <- sample(100, n, replace=TRUE)
I find the reshape2 package useful for these sorts of transpositions/transformations/rearrangements.
library("reshape2")
dcast(melt(DF, id.vars=c("market","date","sitename")),
market+date~sitename+variable)
gives
market date a_impression a_clicks b_impression b_clicks
1 A 2012-02-28 74 97 11 71
2 A 2012-02-29 34 30 88 35
3 A 2012-03-01 40 85 40 49
4 A 2012-03-02 46 12 99 20
5 A 2012-03-03 6 95 85 56
6 A 2012-03-04 61 61 42 64
7 B 2012-02-28 4 53 74 9
8 B 2012-02-29 43 27 92 59
9 B 2012-03-01 34 26 86 43
10 B 2012-03-02 81 47 84 35
11 B 2012-03-03 3 5 91 48
12 B 2012-03-04 19 26 99 21
13 C 2012-02-28 22 31 100 53
14 C 2012-02-29 40 83 95 27
15 C 2012-03-01 78 89 81 29
16 C 2012-03-02 57 55 79 87
17 C 2012-03-03 37 61 3 97
18 C 2012-03-04 83 61 41 77
19 D 2012-02-28 81 18 47 3
20 D 2012-02-29 90 100 17 83
21 D 2012-03-01 12 40 35 93
22 D 2012-03-02 85 14 63 67
23 D 2012-03-03 63 53 29 58
24 D 2012-03-04 40 79 56 70
25 E 2012-02-28 97 62 68 31
26 E 2012-02-29 24 84 17 63
27 E 2012-03-01 94 93 32 2
28 E 2012-03-02 6 26 86 26
29 E 2012-03-03 100 34 37 80
30 E 2012-03-04 89 87 72 11
The column names have a _ between them rather than a -, but you can change that if you want. I wouldn't recommend it, though, because then you will have problems later referencing the column since the - will be taken as subtraction (you would need to quote the name).
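As a small illustration of the quoting issue, the sketch below renames two columns to use a hyphen and then accesses them. The data frame and its values are made up for the example.

```r
# Two columns named the way dcast names them (values are made up)
wide <- data.frame(a_impression = c(74, 34), a_clicks = c(97, 30))

# Replace the first underscore in each name with a hyphen
hyphenated <- wide
names(hyphenated) <- sub("_", "-", names(hyphenated), fixed = TRUE)

# Without quoting, hyphenated$a-impression would be parsed as
# (hyphenated$a) - impression, so the name has to be quoted:
hyphenated$`a-impression`
hyphenated[["a-clicks"]]
```

With the underscore names, plain `wide$a_impression` works without any quoting, which is why keeping the underscore is usually less trouble.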

Related

How to change column names for mrset in R?

I am trying to create crosstabs. I have a data frame that contains multiple-select questions. I am importing the data frame from an SPSS file using the foreign and expss packages, and I am creating the multiple-select questions using the mrset function. Here's the demo code to make it clear.
Banner1 = w %>%
tab_cells(mrset(as.category( temp1,counted_value = "Checked"))) %>%
tab_cols(total(),mrset(as.category( temp2, counted_value = "Checked"))) %>%
tab_stat_cases(total_row_position = "none",label = "")
tab_pivot(Banner1)
The datatable imported looks like this
Total Q12_1 Q12_2 Q12_3 Q12_4 Q12_5
A B C D E F
Total Cases 803 34 18 14 38 37
Q13_1 64 11 7 8 9 7
Q13_2 12 54 54 43 13 12
Q13_3 67 54 23 21 6 4
Sorry about the alignment here. So this is the imported dataset.
Coming to the problem: as you can see, this dataset has question numbers as column labels rather than variable labels. For single-select questions everything works fine. Is there a function I can use to change the column names for mrset output dynamically?
The desired output should be something like this. For eg,
Total Apple Mango Banana Orange Grapes
A B C D E F
Total Cases 803 34 18 14 38 37
Apple 64 11 7 8 9 7
Mango 12 54 54 43 13 12
banana 67 54 23 21 6 4
Any help would be greatly appreciated.

melt array with multiple value variable

I have a multidimensional array a, and I want to format it into out. I used melt followed by dcast, but I wonder if there is a better way of doing it, with or without using library(reshape2)?
library(reshape2)
(a=array(1:3^4,c(3,3,3,3),dimnames=list("d1"=paste("d1",letters[1:3],sep="-"),
"d2"=paste("d2",letters[1:3],sep="-"),
"d3"=paste("d3",letters[1:3],sep="-"),
"d4"=paste("d4",letters[1:3],sep="-"))))
(out=dcast(melt(a,id.vars=c("d1","d2","d3")),d1+d2+d3~d4))
I am asking this question because
My solution feels somewhat repetitive because I am using melt followed by cast and specifying d1,d2,d3 two times. I wonder if there is a more straightforward way of doing things.
It would be good if there were a solution that is at least as compact and doesn't require loading an extra package.
So to reiterate, I will be happy with any of the following:
A more straightforward solution that requires reshape2
A more straightforward solution that doesn't require reshape2
An at least as compact solution that doesn't require reshape2
I'm assuming you won't just be copying and pasting code, but rather, either sharing a collection of scripts that could be sourced, or even creating a package of your functions.
Keeping that in mind, it's easy for you to recreate the function that I referred to in the comments.
Here's ftable(a):
ftable(a)
# d4 d4-a d4-b d4-c
# d1 d2 d3
# d1-a d2-a d3-a 1 28 55
# d3-b 10 37 64
# d3-c 19 46 73
# d2-b d3-a 4 31 58
# d3-b 13 40 67
# d3-c 22 49 76
# d2-c ......................
# ................................
And its attributes:
attributes(ftable(a))
# $dim
# [1] 27 3
#
# $class
# [1] "ftable"
#
# $row.vars
# $row.vars$d1
# [1] "d1-a" "d1-b" "d1-c"
#
# $row.vars$d2
# [1] "d2-a" "d2-b" "d2-c"
#
# $row.vars$d3
# [1] "d3-a" "d3-b" "d3-c"
#
#
# $col.vars
# $col.vars$d4
# [1] "d4-a" "d4-b" "d4-c"
You can use these attributes to create a function that looks like this:
ftable2df <- function (mydata) {
if (!inherits(mydata, "ftable")) mydata <- ftable(mydata) # inherits() is safe when class() has length > 1
dfrows <- rev(expand.grid(rev(attr(mydata, "row.vars"))))
dfcols <- as.data.frame.matrix(mydata)
names(dfcols) <- do.call(
paste, c(rev(expand.grid(rev(attr(mydata, "col.vars")))),
sep = "_"))
cbind(dfrows, dfcols)
}
ftable2df(a)
# d1 d2 d3 d4-a d4-b d4-c
# 1 d1-a d2-a d3-a 1 28 55
# 2 d1-a d2-a d3-b 10 37 64
# 3 d1-a d2-a d3-c 19 46 73
# 4 d1-a d2-b d3-a 4 31 58
# 5 d1-a d2-b d3-b 13 40 67
# 6 d1-a d2-b d3-c 22 49 76
# 7 d1-a d2-c d3-a 7 34 61
# 8 d1-a d2-c d3-b 16 43 70
# 9 d1-a d2-c d3-c 25 52 79
# 10 d1-b d2-a d3-a 2 29 56
# 11 d1-b d2-a d3-b 11 38 65
# 12 d1-b d2-a d3-c ............
# ................................
Update (non-base solution)
If you're not married to "reshape2" and are open to using a package as long as it's on CRAN, and if you are open to a solution that might be a little slower than melting and dcasting your data, you can also look at adply from "plyr".
library(plyr)
adply(a, 1:3)
An alternative that 1) is short 2) only uses base R
cbind(do.call(expand.grid, dimnames(a)[1:3]), apply(a, 4, identity))
# d1 d2 d3 d4-a d4-b d4-c
#1 d1-a d2-a d3-a 1 28 55
#2 d1-b d2-a d3-a 2 29 56
#3 d1-c d2-a d3-a 3 30 57
# etc
My original solution used reshape and was a bit goofy... I think this is preferable by a long way.
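The base R one-liner works because expand.grid varies its first factor fastest, which matches the column-major order in which R stores arrays. A minimal sketch on a smaller 2x2x2 array (my own toy example, not the array from the question) makes it easy to check by eye:

```r
# Toy 2 x 2 x 2 array with named dimensions
a <- array(1:8, c(2, 2, 2),
           dimnames = list(d1 = c("x", "y"),
                           d2 = c("p", "q"),
                           d3 = c("m", "n")))

# Keys: every d1/d2 combination, d1 varying fastest (same order as the array)
keys <- do.call(expand.grid, dimnames(a)[1:2])

# Values: one column per level of the last dimension,
# each column being the flattened d1 x d2 slice
vals <- apply(a, 3, identity)

out <- cbind(keys, vals)
out
```

Each row of out can be checked against the array directly, for example the row with d1 = "y" and d2 = "q" against a["y", "q", ].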

Looping through rows, creating and reusing multiple variables

I am building a streambed hydrology calculator in R using multiple tables from an Access database. I am having trouble automating and calculating the same set of indices for multiple sites. The following sample dataset describes my data structure:
> Thalweg
StationID AB0 AB1 AB2 AB3 AB4 AB5 BC1 BC2 BC3 BC4 Xdep_Vdep
1 1AAUA017.60 47 45 44 55 54 6 15 39 15 11 18.29
2 1AXKR000.77 30 27 24 19 20 18 9 12 21 13 6.46
3 2-BGU005.95 52 67 62 42 28 25 23 26 11 19 20.18
4 2-BLG011.41 66 85 77 83 63 35 10 70 95 90 67.64
5 2-CSR003.94 29 35 46 14 19 14 13 13 21 48 6.74
where each column represents certain field-measured parameters (i.e. depth of a reach section) and each row represents a different site.
I have successfully used the apply functions to simultaneously calculate simple functions on multiple rows:
> Xdepth <- apply(Thalweg[, 2:11], 1, mean) # Mean Depth
> Xdepth
1 2 3 4 5
33.1 19.3 35.5 67.4 25.2
and appending the results back to the proper station in a dataframe.
However, I am struggling when I want to calculate and save variables that are subsequently used for further calculations. I cannot seem to loop or apply the same function to multiple columns on a single row and complete the same calculations over the next row without mixing variables and data.
I want to do:
Residual_AB0 <- min(Xdep_Vdep, Thalweg$AB0)
Residual_AB1 <- min((Residual_AB0 + other_variables), Thalweg$AB1)
Residual_AB2 <- min((Residual_AB1 + other_variables), Thalweg$AB2)
Residual_AB3 <- min((Residual_AB2 + other_variables), Thalweg$AB3)
# etc.
Depth_AB0 <- (Thalweg$AB0 - Residual_AB0)
Depth_AB1 <- (Thalweg$AB1 - Residual_AB1)
Depth_AB2 <- (Thalweg$AB2 - Residual_AB2)
# etc.
I have tried and subsequently failed at for loops such as:
for (i in nrow(Thalweg)){
Residual_AB0 <- min(Xdep_Vdep, Thalweg$AB0)
Residual_AB1 <- min((Residual_AB0 + Stacks_Equation), Thalweg$AB1)
Residual_AB2 <- min((Residual_AB1 + Stacks_Equation), Thalweg$AB2)
Residual_AB3 <- min((Residual_AB2 + Stacks_Equation), Thalweg$AB3)
Residuals <- data.frame(Thalweg$StationID, Residual_AB0, Residual_AB1, Residual_AB2, Residual_AB3)
}
Is there a better way to approach looping through multiple lines of data when I need unique variables saved for each specific row that I am currently calculating? Thank you for any suggestions.
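Your exact problem is still a mystery to me...
but it looks like you want a double for loop:
for (i in 1:nrow(Thalweg)) { # outer loop: one site (row) at a time
residual <- Thalweg[i, "Xdep_Vdep"] # starting value for this site
for (j in 2:11) { # inner loop: the ten depth columns
residual <- min(residual, Thalweg[i, j])
}
}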
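The loop above overwrites residual on every pass, so nothing is kept per site. A minimal sketch of one way to store the chained residuals from the question is shown below. It assumes the recurrence Residual_k = min(Residual_(k-1) + increment, depth_k), and increment is a hypothetical constant standing in for the question's unspecified "other_variables"; only three sites and four depth columns of the sample data are used.

```r
# Subset of the sample data from the question
Thalweg <- data.frame(
  StationID = c("1AAUA017.60", "1AXKR000.77", "2-BGU005.95"),
  AB0 = c(47, 30, 52), AB1 = c(45, 27, 67),
  AB2 = c(44, 24, 62), AB3 = c(55, 19, 42),
  Xdep_Vdep = c(18.29, 6.46, 20.18)
)
depth_cols <- c("AB0", "AB1", "AB2", "AB3")
increment <- 1  # hypothetical placeholder for "other_variables"

depths <- as.matrix(Thalweg[, depth_cols])

# Pre-allocate one residual per site (row) and depth column
residuals <- matrix(NA_real_, nrow(depths), ncol(depths),
                    dimnames = list(Thalweg$StationID,
                                    paste0("Residual_", depth_cols)))

for (i in seq_len(nrow(depths))) {
  prev <- Thalweg$Xdep_Vdep[i]  # each site starts from its own Xdep_Vdep
  for (j in seq_len(ncol(depths))) {
    # first column compares Xdep_Vdep itself; later ones add the increment
    candidate <- if (j == 1) prev else prev + increment
    residuals[i, j] <- min(candidate, depths[i, j])
    prev <- residuals[i, j]  # carry this site's residual to the next column
  }
}

# Depth_ABk = ABk - Residual_ABk, computed for all sites at once
depth_minus_residual <- depths - residuals
```

Because prev is reset at the top of the outer loop, values from one site never leak into the next, which seems to be the mixing problem described in the question.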

Saving an output from R into excel format?

After running the predict function for glm I get output in the format below:
1 2 3 4 5 6 7 8 9 10 11 12
3.954947e-01 8.938624e-01 7.775473e-01 1.294646e-02 3.954947e-01 9.625746e-01 9.144256e-01 4.739872e-01 1.443219e-01 1.180850e-04 2.138978e-01 7.775473e-01
13 14 15 16 17 18 19 20 21 22 23 24
5.425436e-03 2.069844e-04 2.723969e-01 4.739872e-01 9.144256e-01 1.091998e-01 2.070056e-02 5.114936e-01 1.443219e-01 5.922029e-01 7.578099e-02 8.937642e-01
25 26 27 28 29 30 31 32 33 34 35 36
6.069970e-02 6.069970e-02 1.337947e-01 1.090992e-01 4.841467e-02 9.205547e-01 3.954947e-01 3.874915e-05 3.855242e-02 1.344839e-01 6.318574e-04 2.723969e-01
37 38 39 40 41 42 43 44 45 46 47 48
7.400276e-04 8.593199e-01 6.666800e-01 2.069844e-04 8.161623e-01 4.916555e-05 3.060374e-02 3.402079e-01 2.256598e-03 9.363767e-01 6.116082e-01 3.940969e-03
49 50 51 52 53 54 55 56 57 58 59 60
7.336723e-01 2.425257e-02 3.369967e-03 5.624262e-02 1.090992e-01 1.357630e-06 1.278169e-04 3.046189e-01 8.938624e-01 4.535894e-01 5.132348e-01 3.220426e-01
61 62 63 64 65 66 67 68 69 70 71 72
3.366492e-03 1.357630e-06 1.014721e-01 1.294646e-02 9.144256e-01 1.636988e-02 2.070056e-02 1.012835e-01 5.000274e-03 8.165247e-02 1.357630e-06 8.033850e-03
Is there any code with which I can get the complete output vertically or in an Excel format? Thank you in advance!
The simplest way is to write a character separated value file using a comma as the delimiter:
[Acknowledge Roland's comment] write.csv(data.frame(predict(yourGLM)), "file.csv")
Excel reads these automatically, especially if you save the file with a .csv extension.
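As a self-contained sketch of the CSV route, the example below fits a small logistic regression on the built-in mtcars data (my own choice of model, purely for illustration) and writes the predictions as one row per observation. The column names id and prediction and the temporary file path are assumptions for the example.

```r
# Illustrative model on built-in data; substitute your own glm object
fit <- glm(am ~ wt, data = mtcars, family = binomial)
pred <- predict(fit, type = "response")

# One row per observation: the vector names become an id column
out <- data.frame(id = names(pred), prediction = unname(pred))

# Write to a temporary file; use your own path, e.g. "file.csv"
csv_path <- tempfile(fileext = ".csv")
write.csv(out, csv_path, row.names = FALSE)

head(read.csv(csv_path))
```

Setting row.names = FALSE avoids a redundant unnamed first column, since the ids are already stored in their own column.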
If it's just a matter of viewing it vertically, first create the data:
# create test data
example(predict.glm)
pred <- predict(budworm.lg)
1) Separate R Window Use View to display it in a separate R window:
View(pred)
2) R Console to display it on the R console vertically:
data.frame(pred)
3) Browser to display it in the browser vertically:
library(R2HTML)
HTMLStart(); HTML(data.frame(pred)); w <- HTMLStop()
browseURL(w)
4) Excel to display it in Excel vertically using w we just computed:
shell(paste("start excel", w)) # shell() is available on Windows only

Barchart help in R

I am trying to set up a bar chart to compare control and experimental samples taken of specific compounds. The data set is known as 'hydrocarbon3' and contains the following information:
Exp. Contr.
c12 89 49
c17 79 30
c26 78 35
c42 63 3
pris 0.5 0.8
phy 0.5 0.9
nap 87 48
nap1 83 44
nap2 78 44
nap3 73 20
acen1 81 50
acen2 86 46
fluor 83 11
fluor1 68 13
fluor2 79 17
dibe 65 7
dibe1 67 6
dibe2 56 10
phen 82 13
phen1 70 12
phen2 65 15
phen3 53 14
fluro 62 9
pyren 48 11
pyren1 34 10
pyren2 19 8
chrys 22 3
chrys1 21 3
chrys2 21 3
When I create a bar chart with the formula:
barplot(as.matrix(hydrocarbon3),
main=c("Fig 1. Change in concentrations of different hydrocarbon compounds\nin sediments with and without the presence of bacteria after 21 days"),
beside=TRUE,
xlab="Oiled sediment samples collected at 21 days",
space=c(0,2),
ylab="% loss in concentration relative to day 0")
I receive this diagram. However, I need the control and experimental samples of each chemical to be next to each other to allow a more accurate comparison, rather than the experimental samples bunched on the left and the control samples bunched on the right. Is there a way to correct this in R?
Try transposing your matrix:
barplot(t(as.matrix(hydrocarbon3)), beside=TRUE)
Basically, barplot will plot things in the order they show up in the matrix, which, since a matrix is just a vector wrapped colwise, means barplot will plot all the values of the first column, then all those of the second column, etc.
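To see the effect of the transpose, here is a minimal sketch using only the first three compounds from the question. After t(), each column of the matrix is one compound, so with beside=TRUE the two bars in each group are that compound's experimental and control values.

```r
# First three rows of the question's data
hydrocarbon3 <- data.frame(Exp.   = c(89, 79, 78),
                           Contr. = c(49, 30, 35),
                           row.names = c("c12", "c17", "c26"))

m <- t(as.matrix(hydrocarbon3))  # 2 rows (Exp., Contr.) x 3 columns (compounds)

# One group of bars per column, i.e. per compound
bar_mids <- barplot(m, beside = TRUE, legend.text = rownames(m),
                    ylab = "% loss in concentration relative to day 0")
```

The legend labels come from the row names of the transposed matrix, and the group labels under the bars come from its column names.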
Check this question out: Barplot with 2 variables side by side
It uses ggplot2, so you'll have to use the following code before running it:
install.packages("ggplot2")
library(ggplot2)
Hopefully this works for you. Plus it looks a little nicer with ggplot2!
> df
row exp con
1 a 1 2
2 b 2 3
3 c 3 4
> barplot(rbind(df$exp,df$con),
+ beside = TRUE,names.arg=df$row)
produces:
