I cannot seem to figure out how to get a nice barplot that contains the data from two tables with different numbers of columns.
The tables in question are something like (snipped some data from the end):
> tab1
1 2 3 6 8 31
5872 1525 831 521 299 4
> tab2
1 2 3 4 22
7874 422 2 5 1
Note the column names and sizes are different. When I just do barplot() on one of these tables it comes out with the plot I'd like (showing the column names as the X-axis, frequencies on Y-axis). But, I would like these two side by side.
I've gotten as far as creating a data frame containing both variables as columns, with the different row names in the first column (using data.frame() and merge()), but when I plot this the X-axis seems to be all wrong. Attempting to reorder the columns gives me an error about lengths differing.
Code:
combined <- merge(data.frame(tab1), data.frame(tab2), by = c('Var1'), all=T)
barplot(t(combined[,2:3]), names.arg = combined[,1], beside=T)
This shows a plot, but not all labels are present and the value for position 26 is plotted after 33.
Is there any simple way to get this plot working? A ggplot2 solution would be nice.
You can put all your data in one data frame (as in this example):
df<-data.frame(group=rep(c("A","B"),times=c(2,3)),
values=c(23,56,345,6,7),xval=c(1,2,1,2,8))
group values xval
1 A 23 1
2 A 56 2
3 B 345 1
4 B 6 2
5 B 7 8
Then ggplot() with geom_bar() can be used to plot the data.
ggplot(df,aes(xval,values,fill=group))+
geom_bar(stat="identity",position="dodge")
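For the two tables in the question, a long data frame of this shape could be built roughly as follows. This is only a sketch: the counts are re-typed by hand as named vectors (the real tab1 and tab2 come from table() and are truncated in the question), and the helper name to_long is made up for illustration. Converting the names to integers before turning them into a factor is what keeps the x-axis in numeric rather than alphabetical order.

```r
library(ggplot2)

# Counts re-typed from the question (assumption: the real tables are longer)
tab1 <- c("1" = 5872, "2" = 1525, "3" = 831, "6" = 521, "8" = 299, "31" = 4)
tab2 <- c("1" = 7874, "2" = 422, "3" = 2, "4" = 5, "22" = 1)

# Hypothetical helper: one row per table entry, tagged with the table's label
to_long <- function(tab, label) {
  data.frame(group  = label,
             xval   = as.integer(names(tab)),  # numeric, so levels sort 1, 2, ..., 22, 31
             values = as.vector(tab))
}
df <- rbind(to_long(tab1, "tab1"), to_long(tab2, "tab2"))

# factor(xval) gives one evenly spaced slot per observed value
p <- ggplot(df, aes(factor(xval), values, fill = group)) +
  geom_bar(stat = "identity", position = "dodge")
print(p)
```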
I would like to add a column to my dataframe that contains categorical data based on numbers in another column. I found a similar question at Create categorical variable in R based on range, but the solution provided there didn't provide the solution that I need. Basically, I need a result like this:
x group
3 0-5
4 0-5
6 6-10
12 > 10
The solutions suggested using cut() and shingle(), and while those are useful for dividing the data based on ranges, they do not create the new categorical column that I need.
I have also tried using something like (please don't laugh)
data$group <- "0-5"==data[data$x>0 & data$x<5, ]
but that of course didn't work. Does anyone know how I might do this correctly?
Why didn't cut work? Did you not assign to a new column or something?
> data=data.frame(x=c(3,4,6,12))
> data$group = cut(data$x,c(0,5,10,15))
> data
x group
1 3 (0,5]
2 4 (0,5]
3 6 (5,10]
4 12 (10,15]
What you've created there is a factor object in a column of your data frame. The text displayed is the levels of the factor, and you can change them by assignment:
levels(data$group) = c("0-5","6-10",">10")
data
x group
1 3 0-5
2 4 0-5
3 6 6-10
4 12 >10
Read some basic R docs on factors and you'll get it.
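The two steps can also be combined by passing the labels straight to cut(). A small sketch, assuming the last bin should be open-ended (hence Inf as the upper break, which is not in the original answer):

```r
data <- data.frame(x = c(3, 4, 6, 12))

# Intervals are (0,5], (5,10], (10,Inf]; labels are given in the same order
data$group <- cut(data$x,
                  breaks = c(0, 5, 10, Inf),
                  labels = c("0-5", "6-10", ">10"))
print(data)
```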
I'm trying to create stacked barplots using a large dataset (80,000 lines).
POS C1 C2 SELF
1 1 0.546982 0.256896 0.196122
2 2 0.628456 0.263229 0.108315
3 3 0.652629 0.256041 0.091330
4 4 0.562783 0.264318 0.172898
5 5 0.562783 0.264318 0.172898
6 6 0.180272 0.571032 0.248696
... (80,000 rows)
I've imported my csv file using read.csv and converted it to a matrix using as.matrix(). I tried to plot it with barplot using:
barplot(as.matrix(C3_GO2_06))
But I got a plot like that:
I need each row in POS to be a sample, i.e. the column POS should be the x-axis, with the values associated with C1, C2 and SELF stacked... but I don't know how to do that. Is it necessary to convert it to a data frame?
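One possible approach in base graphics, sketched on the six rows shown above (typed in by hand here, since the csv itself is not available). barplot() draws one bar per matrix column and stacks the rows within it, so the idea is to drop POS from the matrix, transpose the rest, and pass POS as the bar labels. No conversion to a data frame is needed; read.csv() already returns one.

```r
# First six rows from the question, re-typed as a data frame
C3_GO2_06 <- data.frame(
  POS  = 1:6,
  C1   = c(0.546982, 0.628456, 0.652629, 0.562783, 0.562783, 0.180272),
  C2   = c(0.256896, 0.263229, 0.256041, 0.264318, 0.264318, 0.571032),
  SELF = c(0.196122, 0.108315, 0.091330, 0.172898, 0.172898, 0.248696)
)

# Rows of m become the stacked segments, columns become the bars
m <- t(as.matrix(C3_GO2_06[, c("C1", "C2", "SELF")]))

mids <- barplot(m, names.arg = C3_GO2_06$POS,
                legend.text = TRUE, xlab = "POS")
```

With 80,000 bars the individual labels will not be readable, so some thinning or aggregation of POS would probably be needed in practice.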
I am a novice R user, hence the question. I refer to the solution on creating stacked barplots from R programming: creating a stacked bar graph, with variable colors for each stacked bar.
My issue is slightly different. My data has 4 columns, and the last column is the sum of the first 3. I want to plot a bar chart showing 1) the summed total (i.e. the 4th column) as the height of each bar, and 2) each bar split by the relative contributions of the three columns.
I was hoping someone could help.
Regards,
Bernard
If I understood correctly, this may do the trick.
The following code works well for the example data frame df:
df <- read.table(header = TRUE, text = "
a b c sum
1 9 8 18
3 6 2 11
1 5 4 10
23 4 5 32
5 12 3 20
2 24 1 27
1 2 4 7
")
As you don't want to plot a count of observations but the actual values in your data frame, you need to use geom_bar(stat="identity") in ggplot2. Some data manipulation is necessary too. And you don't need a sum column: stacking the bars adds the values up for you.
library(reshape2) # provides melt()
library(ggplot2)
df <- df[, -ncol(df)]             # drop the last column (assumed to be the sum)
df$event <- seq_len(nrow(df))     # row index, so values from the same row end up in the same bar
df <- melt(df, id.vars = "event") # reshape to long format so ggplot can read it
px <- ggplot(df, aes(x = event, y = value, fill = variable)) +
  geom_bar(stat = "identity")
print(px)
This code generates the plot below.
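As a quick sanity check that stacking really reproduces the dropped sum column, the long data can be summed per bar. This is a sketch with the example values re-typed; the names wide, long and totals are chosen here for illustration only.

```r
library(reshape2)

# Example values from above, without the sum column
wide <- data.frame(a = c(1, 3, 1, 23, 5, 2, 1),
                   b = c(9, 6, 5, 4, 12, 24, 2),
                   c = c(8, 2, 4, 5, 3, 1, 4))
wide$event <- seq_len(nrow(wide))

long <- melt(wide, id.vars = "event")

# Total height of each stacked bar = sum of its segments
totals <- aggregate(value ~ event, data = long, FUN = sum)
print(totals$value)  # 18 11 10 32 20 27 7, matching the original sum column
```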
I am trying to plot multiple lines using ggplot2. My data is fitted into a data frame as follow:
> rs
time 1 2 3 4
1 200 17230622635 17280401147 17296993985 17313586822
2 400 22328386154 22456712709 22499488227 22542263745
3 600 28958840968 29186097622 29261849840 29337602058
4 800 40251281810 40650094691 40783032318 40915969945
5 1000 73705771414 74612829244 74915181854 75217534464
I would like to use the "time" column as the x value. Other columns are y values of points in different lines. In the data above, there are 4 lines, each line consists of 5 points. More specifically, the first line has points (200, 17230622635), (400, 22328386154), (600, 28958840968), etc. The second line has points (200, 17280401147), (400, 22456712709), etc. (If you need further explanation of the data format, see the P.S. at the end.)
To generate similar data, you could use the following code:
rs = data.frame(seq(200, 1000, by=200), runif(5), runif(5), runif(5))
names(rs)=c("time", 1:3)
I followed some examples on Stack Overflow and tried to use reshape2 and ggplot2 to make this plot.
I first melted the data into "long format":
library('reshape2')
library('ggplot2')
melted = melt(rs, id.vars="time")
Then I plotted the data using the following statement:
ggplot() + geom_line(data=melted, aes(x="time", y="value", group="variable"))
However, I got an empty graph with no points or lines.
Can anyone help me to see what's wrong with my procedure?
P.S.
About the data format:
You can imagine there are many students in a class and we have their scores on several quizzes. Each row contains one student's data: the first column is the quiz number, and the remaining columns are his/her scores. For each student, we want to plot a line showing how his/her scores change across quizzes; each point is the score of one quiz for a certain student. Since there are multiple students, we would like to draw multiple lines.
About the melted data:
Specific to the data I show above, the data I got from the melt() function is:
> melted
time variable value
1 200 1 17230622635
2 400 1 22328386154
3 600 1 28958840968
4 800 1 40251281810
5 1000 1 73705771414
6 200 2 17280401147
7 400 2 22456712709
8 600 2 29186097622
9 800 2 40650094691
10 1000 2 74612829244
11 200 3 17296993985
12 400 3 22499488227
13 600 3 29261849840
14 800 3 40783032318
15 1000 3 74915181854
16 200 4 17313586822
17 400 4 22542263745
18 600 4 29337602058
19 800 4 40915969945
20 1000 4 75217534464
Drop the quotes:
ggplot(data=melted, aes(x=time, y=value, group=variable)) + geom_line()
see: ggplot aesthetics
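Another option is to use aes_string(), which accepts the column names as quoted strings (note that it is deprecated in recent ggplot2 releases).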
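The quoted version fails because aes(x="time") maps x to the constant string "time" rather than to the column. If the column names really are only available as strings, recent ggplot2 versions also accept the .data pronoun inside aes(). A sketch on data generated the same way as in the question (set.seed and the variables xcol, ycol and gcol are additions for illustration):

```r
library(reshape2)
library(ggplot2)

set.seed(1)  # only to make the random example reproducible
rs <- data.frame(seq(200, 1000, by = 200), runif(5), runif(5), runif(5))
names(rs) <- c("time", 1:3)
melted <- melt(rs, id.vars = "time")

# Column names held in variables, looked up through .data
xcol <- "time"
ycol <- "value"
gcol <- "variable"

p <- ggplot(melted, aes(x = .data[[xcol]], y = .data[[ycol]],
                        group = .data[[gcol]], colour = .data[[gcol]])) +
  geom_line()
print(p)
```

Mapping colour as well as group is optional; it just makes the individual lines distinguishable.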
I'm in need of assistance... I'm using R to analyze some data... I have a frequency table called mytable... that I created like this:
mytable=table(cut(var1,12),cut(var2,12))
the table looks something like this:
1-2 2-3 3-4
1-3 2 1 2
3-6 0 1 4
6-9 7 1 8
except it is a 12 by 12 table.
I used boxplot.matrix(mytable), and the boxplot looks ok... with the 12 boxes corresponding to my 12 strata, but my boxplot has the frequency as the y-axis and I want the y-axis to be the values from var1. How can I do this?
I wanted to post a pic... but my rep wasn't high enough
Use boxplot() before you summarize your data:
boxplot(var1)
If you want to see the distribution per split, use the formula format:
boxplot(var1 ~ cut(var2, 12))
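A self-contained sketch of the formula version, using made-up data since var1 and var2 are not shown in the question (the distributions below are arbitrary assumptions):

```r
set.seed(42)                       # reproducible made-up data
var2 <- runif(200, min = 0, max = 12)
var1 <- 2 * var2 + rnorm(200)      # var1 loosely depends on var2

# One box per var2 stratum; the y-axis shows the raw var1 values
b <- boxplot(var1 ~ cut(var2, 12),
             xlab = "var2 (12 bins)", ylab = "var1", las = 2)

b$n  # number of observations that went into each box
```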