I have a list object as shown below ->
> myaggregate
input$AgeAndGender input$CTR
1 Female_<18 0.030041698
2 Female_18-24 0.010918938
3 Female_25-34 0.009839806
4 Female_35-44 0.010193773
5 Female_45-54 0.009996056
6 Female_55-64 0.020024678
7 Female_65+ 0.030060728
8 Male_<18 0.028356698
9 Male_18-24 0.011031902
10 Male_25-34 0.010218562
11 Male_35-44 0.010168911
12 Male_45-54 0.010021256
13 Male_55-64 0.020191223
14 Male_65+ 0.029717747
I'm trying to plot a bar graph representing the CTR levels (Y axis) for each value in AgeAndGender (X axis).
When I attempt a simple plot, however, I run into the following issue ->
> ggplot(data= myaggregate,aes(x=input$AgeAndGender,y=input$CTR))+geom_bar()
Error in data.frame(x = c("Male_35-44", "Female_65+", "Male_25-34", "Female_45-54", :
arguments imply differing number of rows: 3378934, 14
I'm sure I'm missing something pretty basic. Any help is appreciated!
If you just want to plot the values, you need stat = "identity", as in the following example:
library(ggplot2)
AgeAndGender <- c("f1","f2","f3")
CTR <- c(.1,.15,.12)
myaggregate <- data.frame(AgeAndGender, CTR)
ggplot(data= myaggregate,aes(x=AgeAndGender, y=CTR)) + geom_bar(stat = "identity")
Which results in the following:
Your comment about your data being in a list concerns me. Try making myaggregate a data frame.
I was able to plot with something like what you are using, but it's a rather weird construction. Data frame columns do not generally have dollar signs in their names because $ is an infix function in R. I read in the data with read.table, and the dollar signs get converted to periods. I put back the column names as you have them with:
names(myaggregate) <- c('input$AgeAndGender', 'input$CTR')
And then you can get a rather messy barplot with:
ggplot(data= myaggregate,aes(x=`input$AgeAndGender`,y=`input$CTR`))+ geom_bar(stat = "identity")
With your original code, the unquoted name input$AgeAndGender is interpreted as the AgeAndGender column of a separate input data frame (presumably where the 3378934 rows in the error come from), not as a column of myaggregate. Ordinary quotes instead of backticks do not work either.
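A minimal sketch of the cleaner route hinted at above: build the aggregate with plain column names so that no backticks are needed. The small `input` data frame is a made-up stand-in for the asker's real data, and producing myaggregate with aggregate() is an assumption about how it was created.

```r
library(ggplot2)

# Made-up stand-in for the asker's `input` data frame (assumption)
input <- data.frame(
  AgeAndGender = c("Female_<18", "Female_<18", "Male_65+", "Male_65+"),
  CTR = c(0.03, 0.05, 0.02, 0.04)
)

# The formula interface gives plain column names: AgeAndGender and CTR
myaggregate <- aggregate(CTR ~ AgeAndGender, data = input, FUN = mean)

# geom_col() is shorthand for geom_bar(stat = "identity")
p <- ggplot(myaggregate, aes(x = AgeAndGender, y = CTR)) + geom_col()
print(p)
```

Because the columns are referred to by bare name inside aes(), ggplot looks them up in myaggregate rather than in some other object in the workspace.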
I have a data frame called InsectSprays that has two columns, count and spray. When I try to use split to create boxplots for each value of spray, I get the error shown below.
Can anyone explain the error for me? It's no doubt clear I'm new to R.
#
class(InsectSprays)
[1] "data.frame"
#
head(InsectSprays)
count spray
1 10 A
2 7 A
3 20 A
4 14 A
5 14 A
6 12 A
#
boxplot(split(x=InsectSprays,f=InsectSprays$spray))
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) :
'x' must be atomic
boxplot expects a basic ('atomic') object like a series of numbers 1:10 or a list of basic atomic objects list(1:10,2:11). Your split produces a list of data.frames which boxplot doesn't know how to handle. Luckily, boxplot can also take a formula if you want to get results per group, like:
boxplot(count ~ spray, data=InsectSprays)
If you were working with a different function that didn't have this possibility, you would need to loop over the split list. Possibly something like:
## divide the plot window into 3 columns/2 rows
par(mfrow=c(2,3))
## loop over each object and `boxplot` the `count` column
lapply(split(InsectSprays, InsectSprays$spray), \(x) boxplot(x$count) )
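As a small sketch of the point about atomic objects: if split is applied to the count column alone rather than to the whole data frame, the result is a list of plain numeric vectors, which boxplot accepts directly. The variable name counts_by_spray is only illustrative.

```r
# InsectSprays ships with base R (package datasets)
# split() on the numeric column alone returns a list of atomic vectors
counts_by_spray <- split(InsectSprays$count, InsectSprays$spray)

# A list of numeric vectors is drawn as one box per list element
boxplot(counts_by_spray, xlab = "spray", ylab = "count")
```

This should give the same picture as the formula version, just reached from the split side.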
I have a data frame named mydata. Here's a sample of the relevant columns:
Backlog.Item.Type Item.Created.To.Closed.Days Item.Created.To.Finished.Days
User Story 67 84
Task 14 17
Task 9 10
Epic 105 NA
User Story 56 59
Bug 5 NA
What I want to accomplish is the following: take the mean of the Item.Created.To.Closed.Days column and of the Item.Created.To.Finished.Days column, grouped by Backlog.Item.Type, and then plot both next to each other. To calculate the mean I use the following, which works:
mydata %>%
group_by(Backlog.Item.Type) %>%
summarise_at(vars(Item.Created.To.Closed.Days),
funs(mean(Item.Created.To.Closed.Days, na.rm = TRUE)))
For the plotting part, I have tried something like
mydata.long <- melt(mydata)
ggplot(mydata.long,
aes(Backlog.Item.Type, value, fill = variable)) +
geom_bar(stat = "identity", position = "dodge")
But I can't seem to get it to work. I should also note that I only want to plot the means for Backlog.Item.Type == 'User Story' and Backlog.Item.Type == 'Task' for both columns. Represented visually, this is what I want to accomplish:
Please excuse my horrible paint skills! I don't have any preference for colors or whatnot, I just need to get it done :D Thanks in advance, I hope I have been clear enough and formulated my question in an understandable manner!
Assuming the graph you provided covers your whole data set, and therefore does not correspond to the sample data shown, here is what you can do:
library(dplyr); library(reshape2); library(ggplot2)
# mean of both duration columns per item type
mydata <- mydata %>%
  group_by(Backlog.Item.Type) %>%
  summarise(Item.Created.To.Closed.Days = mean(Item.Created.To.Closed.Days, na.rm = TRUE),
            Item.Created.To.Finished.Days = mean(Item.Created.To.Finished.Days, na.rm = TRUE))
# drop item types with a missing mean, then reshape to long format
mydata <- mydata[complete.cases(mydata), ] %>% melt()
ggplot(mydata, aes(x = Backlog.Item.Type, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
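Since the question asks for only 'User Story' and 'Task', here is a hedged sketch that filters on those two types explicitly instead of relying on missing values to drop the others. The data frame is rebuilt from the six sample rows in the question; the names means and long are illustrative.

```r
library(dplyr)
library(reshape2)
library(ggplot2)

# The six sample rows from the question
mydata <- data.frame(
  Backlog.Item.Type = c("User Story", "Task", "Task", "Epic", "User Story", "Bug"),
  Item.Created.To.Closed.Days = c(67, 14, 9, 105, 56, 5),
  Item.Created.To.Finished.Days = c(84, 17, 10, NA, 59, NA)
)

# Keep only the two wanted types, then average each column per type
means <- mydata %>%
  filter(Backlog.Item.Type %in% c("User Story", "Task")) %>%
  group_by(Backlog.Item.Type) %>%
  summarise(
    Item.Created.To.Closed.Days = mean(Item.Created.To.Closed.Days, na.rm = TRUE),
    Item.Created.To.Finished.Days = mean(Item.Created.To.Finished.Days, na.rm = TRUE)
  )

# Long format: one row per (type, measure) pair
long <- melt(as.data.frame(means), id.vars = "Backlog.Item.Type")

p <- ggplot(long, aes(x = Backlog.Item.Type, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```

Passing id.vars explicitly avoids depending on melt guessing which column identifies the groups.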
I am trying to make a line graph in ggplot and I am having difficulty diagnosing my error. I've read through nearly all the similar threads, but have been unable to solve my issue.
I am trying to plot Japanese CPI. I downloaded the data online from FRED.
The str output looks like:
str(jpycpi)
data.frame: 179 obs. of 2 variables:
$ DATE : Factor w/ 179 levels "2000-04-01","2000-05-01",..: 1 2 3 4 5 6 7 8 9 10 ...
$ JPNCPIALLMINMEI: num 103 103 103 102 103 ...
My code to plot:
ggplot(jpycpi, aes(x=jpycpi$DATE, y=jpycpi$JPNCPIALLMINMEI)) + geom_line()
It gives me an error saying:
geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
I have tried the following and was able to plot it, but the x axis is distorted for some odd reason. That code is below:
ggplot(jpycpi, aes(x=jpycpi$DATE, y=jpycpi$JPNCPIALLMINMEI, group=1)) + geom_line()
The "Each group consists of only one observation" error message happens because your x aesthetic is a factor. ggplot takes that to mean that your independent variable is categorical, which doesn't make sense in conjunction with geom_line.
In this case, the right way to fix it is to convert that column of the data to a Date vector. ggplot understands how to use all of R's date/time classes as the x aesthetic.
Converting from a factor to a Date is a little tricky. A direct conversion,
jpycpi$DATE <- as.Date(jpycpi$DATE)
works in R version 3.3.1, but, if I remember correctly, would give nonsense results in older versions of the interpreter, because as.Date would look only at the ordinals of the factor levels, not at their labels. Instead, one should write
jpycpi$DATE <- as.Date(as.character(jpycpi$DATE))
Conversion from a factor to a character vector does look at the labels, so the subsequent conversion to a Date object will do the Right Thing.
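To make the difference between level codes and labels concrete, here is a tiny sketch with a two-element factor. The values are illustrative only, not taken from the FRED file.

```r
# Levels are sorted, so "2000-04-01" is level 1 and "2000-05-01" is level 2
f <- factor(c("2000-05-01", "2000-04-01"))

codes <- as.integer(f)             # 2 1: the level codes, not dates
dates <- as.Date(as.character(f))  # labels parsed as real dates

print(codes)
print(dates)
```

Going through as.character first means the result never depends on how a particular R version treats factors inside as.Date.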
You probably got a factor for $DATE in the first place because you used read.table or read.csv to load up the data set. The default behavior of these functions is to attempt to convert each column to a numeric vector, and failing that, to convert it to a factor. (See ?type.convert for the exact behavior.) If you're going to be importing lots of data with date columns, it's worth learning how to use the colClasses argument to read.table; this is more efficient and doesn't have gotchas like the above.
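A minimal sketch of the colClasses approach mentioned above. The CSV text is inline so the example runs on its own, and the three CPI values are made up for illustration; with the real download you would pass the file name instead of text =.

```r
library(ggplot2)

# Inline stand-in for the downloaded file; the numbers are made up
csv_text <- "DATE,JPNCPIALLMINMEI
2000-04-01,103.1
2000-05-01,103.2
2000-06-01,103.0"

# colClasses makes read.csv parse the first column as Date right away
jpycpi <- read.csv(text = csv_text, colClasses = c("Date", "numeric"))

# With a Date on x, geom_line() needs no group = 1 workaround
p <- ggplot(jpycpi, aes(x = DATE, y = JPNCPIALLMINMEI)) + geom_line()
print(p)
```

The "Date" class in colClasses expects dates that as.Date can parse by default, such as the YYYY-MM-DD format shown in the str output.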
I have a list of words coming straight from a file, one per line, that I import with read.csv, which produces a data.frame. What I need to do is compute and plot the number of occurrences of each of these words. That I can do easily, but the problem is that I have several hundred words, most of which occur just once or twice in the list, so I'm not interested in them.
EDIT https://gist.github.com/anonymous/404a321840936bf15dd2#file-wordlist-csv here is a sample wordlist that you can use to try. It isn't the same I used, I can't share that as it's actual data from actual experiments and I'm not allowed to share it. For all intents and purposes, this list is comparable.
A "simple"
df <- data.frame(table(words$word))
df[df$Freq > 2, ]
does the trick, I now have a list of the words that occur more than twice, as well as a hard headache as to why I have to go from a data.frame to an array and back to a data.frame just to do that, let alone the fact that I have to repeat the name of the data.frame in the actual selection string. Beats me completely.
The problem is that now the filtered data.frame is useless for charting. Suppose this is what I get after filtering
Var1 Freq
6 aspect 3
24 colour 7
41 differ 18
55 featur 7
58 function 19
81 look 4
82 make 3
85 mean 7
95 opposit 14
108 properti 3
109 purpos 6
112 relat 3
116 rhythm 4
118 shape 6
120 similar 5
123 sound 3
obviously if I just do a
plot(df[df$Freq > 2, ])
I get this
which obviously (obviously?) has all the original terms on the x axis, while the y axis only shows the filtered values. So the next logical step is to try and force R's hand
plot(x=df[df$Freq > 2, ]$Var1, y=df[df$Freq > 2, ]$Freq)
But clearly R knows best and already did that, because I get the exact same result. Using ggplot2 things get a little better
qplot(x=df[df$Freq > 2, ]$Var1, y=df[df$Freq > 2, ]$Freq)
(yay for consistency) but I'd like that to show an actual histogram, y'know, with bars, like the ones they teach in sixth grade, so if I ask that
qplot(x=df[df$Freq > 2, ]$Var1, y=df[df$Freq > 2, ]$Freq) + geom_bar()
I get
Error : Mapping a variable to y and also using stat="bin".
With stat="bin", it will attempt to set the y value to the count of cases in each group.
This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
If you want y to represent values in the data, use stat="identity".
See ?geom_bar for examples. (Defunct; last used in version 0.9.2)
so let us try the last suggestion, shall we?
qplot(df[df$Freq > 2, ]$Var1, stat='identity') + geom_bar()
fair enough, but where are my bars? So, back to basics
qplot(words$word) + geom_bar() # even if geom_bar() is probably unnecessary this time
gives me this
Am I crazy or [substitute a long list of ramblings and complaints about R]?
I generate some random data
set.seed(1)
df <- data.frame(Var1 = letters, Freq = sample(1: 8, 26, T))
Then I use dplyr::filter because it is very fast and easy.
library(ggplot2); library(dplyr)
qplot(data = filter(df, Freq > 2), Var1, Freq, geom= "bar", stat = "identity")
First of all, at least with plot(), there's no reason to force a data.frame. plot() understands table objects. You can do
plot(table(words$word))
# or
plot(table(words$word), type="p")
# or
barplot(table(words$word))
We can use Filter to keep only the entries we want; unfortunately, that drops the table class. But we can add it back with as.table. This looks like
plot(as.table(Filter(function(x) x>2, table(words$word))), type="p")
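One likely reason the filtered data frame in the question still shows every word on the x axis is that Var1 is a factor, and subsetting rows keeps the unused levels. A small sketch of dropping them with droplevels follows; the words data frame is a made-up stand-in for the real word list.

```r
# Made-up stand-in for the imported word list (assumption)
words <- data.frame(word = c("a", "a", "a", "b", "c", "c", "c", "c"))

df <- data.frame(table(words$word))       # columns Var1 (factor) and Freq
frequent <- droplevels(df[df$Freq > 2, ]) # remove levels with no rows left

# One bar per remaining word
barplot(frequent$Freq, names.arg = as.character(frequent$Var1))
```

After droplevels, plotting functions that draw one position per factor level only see the words that survived the filter.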
I have the following data in .csv format. I am new to ggplot2, and I am not able to plot it the way I want:
T L
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
I tried to plot a line graph using the following code
data<-read.csv("sample.csv",head=TRUE,sep=",")
ggplot(data,aes(T,L))+geom_line()
but I got the following image, which is not what I want
I want an image like the following
Can anybody help me?
You want to use a variable for the x-axis that has lots of duplicated values and expect the software to guess that the order you want those points plotted is given by the order they appear in the data set. This also means the values of the variable for the x-axis no longer correspond to the actual coordinates in the coordinate system you're plotting in, i.e., you want to map a value of "L=1" to different locations on the x-axis depending on where it appears in your data.
This type of fairly nonsensical thing does not work in ggplot2 out of the box. You have to define a separate variable that has a proper mapping to values on the x-axis ("id" in the code below) and then overwrite the labels with the values for "L".
The code below shows you how to do this, but it seems like a different graphical display would probably be better suited for this kind of data.
data <- as.data.frame(matrix(scan(text="
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
"), ncol=2, byrow=TRUE))
names(data) <- c("T", "L")
data$id <- 1:nrow(data)
ggplot(data,aes(x=id, y=T))+geom_line() + xlab("L") +
scale_x_continuous(breaks=data$id, labels=data$L)
You have an error in your code, try this:
ggplot(data,aes(x=L, y=T))+geom_line()
Default arguments for aes are:
aes(x, y, ...)
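To illustrate the positional defaults, here is a short sketch comparing the two calls. It uses three rows taken from the question's data; the names positional and named are only illustrative.

```r
library(ggplot2)

# Three rows taken from the question's data
data <- data.frame(T = c(141.5453333, 148.7116667, 140.5556667), L = c(1, 1, 2))

positional <- aes(T, L)     # first argument is x, second is y: T on x, L on y
named <- aes(x = L, y = T)  # explicit names: L on x, T on y

print(positional)
print(named)

p <- ggplot(data, aes(x = L, y = T)) + geom_line()
print(p)
```

Naming the aesthetics explicitly avoids having to remember the argument order.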