This question already has answers here:
Contingency table based on third variable (numeric)
(2 answers)
Closed 4 years ago.
I have edited the question after realising that my code was insufficient to explain the problem. Apologies.
I have a data frame with four columns:
purchaseId <- c("abc","xyz","def","ghi")
product <- c("a","b","c","a")
quantity <- c(1,2,2,1)
revenue <- c(500,1000,300,500)
t <- data.frame(purchaseId,product, quantity, revenue)
table(t$product,t$quantity)
Running this query
table(t$product,t$quantity)
returns a table indicating how many times each combination occurs:
    1 2
  a 2 0
  b 0 1
  c 0 1
What I would like to do is keep product as rows and quantity as columns (as shown above), but fill each cell with the summed revenue instead of the count.
The result should look like this:
       1    2
  a 1000    0
  b    0 1000
  c    0  300
This would allow me to create a table that I could export as a csv.
Could anyone help me any further?
edit - the code suggested below throws the following error on the actual data set of 140K rows:
Error: dims [product 21525] do not match the length of object [147805]
Other ideas?
Of course the example code above is a simplified version of the actual data I'm using, but the idea is the same.
Thank you in advance,
Kind regards.
# table() only counts rows; xtabs() sums revenue within each product/quantity cell
xtabs(revenue ~ product + quantity, data = t)
Using library(reshape2) or library(data.table):
dcast(t, product ~ quantity, value.var = "revenue", fun.aggregate = sum)
The syntax is fairly simple:
Pass the data frame you are recasting.
Set the "formula" of the resulting data frame: the LHS of ~ gives the rows of the pivot, the RHS gives the columns.
value.var names the column whose values fill the cells, and fun.aggregate = sum says to aggregate those values with the sum function.
Since you mentioned in your comments that you are familiar with Excel pivot tables, it's worth noting that dcast is a fairly comprehensive replacement, with additional flexibility.
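As a minimal sketch of the whole workflow, assuming the reshape2 package is installed: the data frame is called purchases here (an illustrative name, chosen because t also names base R's transpose function), and the output path is a temporary file used only as a placeholder.

```r
library(reshape2)

# Example data from the question; `purchases` is an illustrative name
purchases <- data.frame(
  purchaseId = c("abc", "xyz", "def", "ghi"),
  product    = c("a", "b", "c", "a"),
  quantity   = c(1, 2, 2, 1),
  revenue    = c(500, 1000, 300, 500)
)

# Rows = product, columns = quantity, cells = summed revenue
pivot <- dcast(purchases, product ~ quantity,
               value.var = "revenue", fun.aggregate = sum)

# Export the pivot; tempfile() is only a placeholder destination
out_file <- tempfile(fileext = ".csv")
write.csv(pivot, out_file, row.names = FALSE)
```

Combinations that never occur come out as 0, because the sum of an empty set of values is 0.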
This question already has answers here:
how to change gender factor into an numerical coding in r
(2 answers)
Closed 1 year ago.
Essentially, I have a table with several columns; the one of interest here is Gender:
Gender
Male
Female
I'd like to create a new column called Gender_num that sets all males to 0 and all females to 1. I tried something along the lines of if df['Gender'] == 'Male', 0, else 1, but R doesn't like that when the column holds more than one value. I know that you can use dplyr and the mutate function, but I'm very confused. How could I get the df to look something like this by generating a new column?
Gender  Gender_num
Male    0
Female  1
From one new user to another: try adding the code that generates a sample table next time, so people can build on work you've already done. Also, you might get down-voted for lack of research, as this is a common section in many intro texts. You can see an example chapter here.
Aside from that, let's say you have 10 observations of male/female.
library(dplyr)  # provides %>%, mutate(), if_else() and recode()
df <- tibble::tibble(x = 1:10)
gendf <- df %>% mutate(gender = sample(c("male", "female"), 10, replace = TRUE))
You can then run a mutate to add your numeric categorical variable.
gendf <- gendf %>% mutate(gender_dummy=if_else(gender=="female",1,0))
Note: since the original character variable has only two values, if_else() is the simplest way.
But you can use recode() too as you can add as many values as you like.
gendf <- gendf %>% mutate(gender_dummy2= recode(gender,"female" = 1, "male"=0))
You should get this resulting table
From there I would add value labels and call it a day.
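For the table in the question itself, the same idea can be sketched in base R without any packages. The if statement only tests a single value, which is why it fails on a whole column; a vectorised comparison handles every row at once. The four-row df below is a made-up stand-in for the real data.

```r
# Small stand-in for the table in the question (values are illustrative)
df <- data.frame(Gender = c("Male", "Female", "Female", "Male"))

# The comparison yields TRUE/FALSE per row; as.integer() turns that into 1/0
df$Gender_num <- as.integer(df$Gender == "Female")
```

With this coding, any value other than "Female" becomes 0 and a missing Gender stays NA, so it is worth checking the distinct values in the column first.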
This question already has answers here:
How to count the frequency of a string for each row in R
(4 answers)
Closed 4 years ago.
I have a data frame with 70 variables, and I want to create a new variable that counts, for each row, how many of the 70 variables take the value "mq".
I am looking for something like this:
[ID] [Var1] [Var2] [Count_mq]
1. mq mq 2
2. 1 mq 1
3. 1 7 0
I have found this solution:
count_row_if("mq",DT)
But it gives me a vector with those values for the whole data frame and it is quite slow to compute.
I would like to find a solution using the function apply() but I don't know how to achieve this.
Best.
You can use the 'apply' function to count a particular value in your existing data frame 'df':
df$count.MQ <- apply(df, 1, function(x) length(which(x=="mq")))
Here the second argument is 1 since you want to count for each row. You can read more about it from https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/apply
I assume the name of the data set is DT. I'm a bit confused about what you really want, but this is how I understand it: the data frame consists of 70 columns and a number of rows, some of which contain the value 'mq'.
If I got it right, please see the code below.
apply(DT, MARGIN = 1, function(x) sum(x == "mq", na.rm = TRUE))
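Since the question mentions speed, here is a vectorised sketch that avoids calling a function once per row. It relies on the fact that comparing a data frame with a single value gives a logical matrix, which rowSums() can add up. The three-row DT is a made-up stand-in based on the example table in the question.

```r
# Stand-in for the 70-column data frame (only two columns shown)
DT <- data.frame(
  Var1 = c("mq", "1", "1"),
  Var2 = c("mq", "mq", "7")
)

# DT == "mq" is a logical matrix; each TRUE counts as 1 in the row sum
DT$Count_mq <- rowSums(DT == "mq", na.rm = TRUE)
```

On wide data this is usually much faster than apply(), although the actual gain depends on the size and column types of the real data.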
This question already has answers here:
How to subset matrix to one column, maintain matrix data type, maintain row/column names?
(1 answer)
How do I extract a single column from a data.frame as a data.frame?
(3 answers)
Closed 5 years ago.
I am running a simulation model that creates a large data frame as its output, with each column corresponding to the time-series of a particular variable:
data5<-as.data.frame(simulation3$baseline)
Occasionally I want to look at subsets, especially particular columns, of this data frame in order to get an idea of the output. For this I am using the View-function like so
View(data5[1:100,1])
for instance, if I wish to see the first 100 rows of column 1. Alternatively, I also sometimes do something like this, using the names of the time series:
timeframe=1:100
toAnalyse=c("u","u_n","u_e","u_nw")
View(data5[timeframe,toAnalyse])
In either case, there is an annoying display problem when I am trying to view a single column on its own (as for instance with View(data5[1:100,1])), whereby what I get looks like this:
Example 1
As you can see, the top of the table which would usually contain the name of the variable in the dataset instead contains a string of all values that the variable takes. This problem does not appear if I select 2 or more columns:
Example 2
Does anyone know how to get rid of this issue? Is there some argument that I can feed to View to make sure that it behaves nicely when I ask it to just show a single column?
View(data5[1:100,1, drop=FALSE])
When you access a single column of a data frame, it is converted to a vector. drop=FALSE prevents that and retains the column name.
For instance:
> df
  n  s    b
1 2 aa TRUE
2 3 bb TRUE
3 5 cc TRUE
> df[, 1]
[1] 2 3 5
> df[, 1, drop=FALSE]
  n
1 2
2 3
3 5
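The difference can also be checked programmatically. This is a small sketch using the same three-row df as in the console output above; the variable names as_vector, as_frame and by_name are only illustrative.

```r
df <- data.frame(n = c(2, 3, 5),
                 s = c("aa", "bb", "cc"),
                 b = c(TRUE, TRUE, TRUE))

as_vector <- df[1:2, 1]                  # plain numeric vector, column name lost
as_frame  <- df[1:2, 1, drop = FALSE]    # one-column data frame, name kept
by_name   <- df[1:2, "n", drop = FALSE]  # same result, selecting by name

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
names(as_frame)           # "n"
```

Passing as_frame (or by_name) to View() gives it a column header to display, which a bare vector does not carry.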
This question already has an answer here:
Apply function conditionally
(1 answer)
Closed 8 years ago.
I would like to sum equal values in a given data set. Unfortunately, I do not really have a clue where to begin, especially which function to use.
Let's say I have a data frame like this:
count<- c(1,4,7,3,7,9,3,4,2,8)
clone<- c("aaa","aaa","aaa","bbb","aaa","aaa","aaa","bbb","ccc","aaa")
a<- c("d","e","k","v","o","s","a","y","q","f")
b<- c("g","e","j","v","i","q","a","x","l","p")
test<-data.frame(count,clone,a,b)
The problem is that there are many repeated values which need to be combined into one (all the "aaa" and the two "bbb").
So I would like to aggregate(?) all equal values in column "clone", summing up the "count" values and taking the values for "a" and "b" from the row with the highest count within each clone.
My final data set should look like:
count<- c(39,7,2)
clone<- c("aaa","bbb","ccc")
a<- c("s","y","q")
b<- c("q","x","l")
test<-data.frame(count,clone,a,b)
Do you have a suggestion as to which function I could use for that? Thanks a lot in advance.
EDIT: Sorry, I was too tired and forgot to put in the "a" and "b" columns, which makes quite a difference, since aggregating only by clone and count drops these two columns, which hold essential information that I need in my final data set.
Use aggregate
> aggregate(count~clone, FUN=sum, data=test)
clone count
1 aaa 39
2 bbb 7
3 ccc 2
Also see this answer for further alternatives.
This can be handled with tapply:
with(test, tapply(count, clone, sum))
# aaa bbb ccc
# 39 7 2
You can also do this with ddply from plyr
library(plyr)
ddply(test,.(clone),function(x) sum(x$count))
A dplyr solution:
library('dplyr')
summarize(group_by(test, clone), count = sum(count))
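The snippets above sum count per clone but drop the "a" and "b" columns that the edit asks for. One possible base R sketch that also keeps "a" and "b" from the row with the highest count in each clone is shown below; the intermediate names ordered, top and totals are illustrative, and ties for the highest count are broken by original row order.

```r
count <- c(1, 4, 7, 3, 7, 9, 3, 4, 2, 8)
clone <- c("aaa", "aaa", "aaa", "bbb", "aaa", "aaa", "aaa", "bbb", "ccc", "aaa")
a <- c("d", "e", "k", "v", "o", "s", "a", "y", "q", "f")
b <- c("g", "e", "j", "v", "i", "q", "a", "x", "l", "p")
test <- data.frame(count, clone, a, b)

# Sort so that the highest count comes first within each clone
ordered <- test[order(test$clone, -test$count), ]

# Keep the first (highest-count) row per clone, for the a and b values
top <- ordered[!duplicated(ordered$clone), c("clone", "a", "b")]

# Sum the counts per clone, then attach a and b
totals <- aggregate(count ~ clone, data = test, FUN = sum)
result <- merge(totals, top, by = "clone")
```

For the example data this gives the three rows described in the question: 39 with s/q for aaa, 7 with y/x for bbb, and 2 with q/l for ccc.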
I hope you won't find my question too silly. I did a lot of research, but I can't figure out how to solve this really annoying issue.
I have data for 6 participants (P) in an experiment, with 50 trials (T) per participant and 10 conditions (C). I'd like to create a data frame in R that holds these data.
This data frame should have 3 factors (P, T and C) and therefore a total of P*T*C rows. The difficulty for me is creating it, since I have the data for the 6 participants in 6 data frames of 100 obs (T) by 10 variables (C).
I'd like first to create the empty data set with these factors, and then copy in the values of the 6 data frames according to the factors P, T and C.
Any help would be greatly appreciated; I'm a novice in R.
Thank you.
OK; First we create one big dataframe for all participants:
# you'll have to fill in the proper names of the original data.frames
result<-rbind(dfrforparticipant1, dfrforparticipant2, dfrforparticipant3, dfrforparticipant4, dfrforparticipant5, dfrforparticipant6)
Next, we add a column for the participant ID:
numTrials<-50 #although 100 is also mentioned in your question
result$P<-as.factor(rep(1:6, each=numTrials))
Finally, we need to go from 'wide' format to 'long' format (I'm assuming your column names holding the results for each condition are called C1, C2 etc. ; I'm also assuming your original data.frames already held a column named T to denote the trial), like this (untested, since you did not provide example data):
orgcolnames<-paste("C", 1:10, sep="")
result2<-reshape(result, varying=list(orgcolnames), v.names="val", idvar=c("T","P"), timevar="C", times=seq_along(orgcolnames), direction="long")
What you want is now in result2.
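Since the steps above are untested, here is a self-contained sketch of the same approach on simulated data. Everything about the input is an assumption: the helper make_participant() is hypothetical and generates random values, the condition columns are assumed to be named C1 to C10, and each participant is assumed to have 50 trials with a trial column named T.

```r
set.seed(1)
num_trials <- 50
cond_names <- paste0("C", 1:10)

# Hypothetical stand-in for one participant's 50 x 10 data frame
make_participant <- function() {
  d <- as.data.frame(matrix(rnorm(num_trials * 10), nrow = num_trials))
  names(d) <- cond_names
  d$T <- seq_len(num_trials)  # trial number
  d
}
participants <- replicate(6, make_participant(), simplify = FALSE)

# Stack the six data frames and label each block with its participant
wide <- do.call(rbind, participants)
wide$P <- factor(rep(1:6, each = num_trials))

# Wide to long: one row per participant x trial x condition
long <- reshape(wide, varying = list(cond_names), v.names = "val",
                idvar = c("T", "P"), timevar = "C",
                times = seq_along(cond_names), direction = "long")
long$T <- factor(long$T)
long$C <- factor(long$C)
```

The result has 6 * 50 * 10 = 3000 rows, with P, T and C as factors and the measured value in val. Using do.call(rbind, ...) on a list avoids typing out six separate data frame names.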