Creating a histogram with the number of occurrences at multiple places - r

I have a set of data that looks like this (just much bigger):
2 7
3 9
5 3
2 4
7 3
3 4
2 2
I would like to produce a histogram with a bar at 2 of height 7 + 4 + 2 = 13, a bar at 3 of height 9 + 4 = 13, and bars at 5 and 7, each of height 3.
I hope the question is not too dumb, but the tutorials I found did not discuss this problem. Thanks in advance for any help.

DF <- read.table(text="2 7
3 9
5 3
2 4
7 3
3 4
2 2")
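library(ggplot2)
# Sum V2 within each V1 value and draw one bar per value (`fun` replaced the older `fun.y` argument in ggplot2 3.3.0)
ggplot(DF, aes(x = V1, y = V2)) + stat_summary(fun = sum, geom = "bar")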

If you want to compute the aggregated sums first and plot them later (the ggplot solution does both in one step), start from DF:
> aggregate(V2~V1,data=DF,sum)
V1 V2
1 2 13
2 3 13
3 5 3
4 7 3

Other answers here probably already answer your question, but for the sake of completeness: if you do not want to depend on the ggplot2 package (I cannot really think of a reason for this, but you might), you could combine aggregate and barplot.
> ADF <- aggregate(DF$V2, by = list(V1=DF$V1), FUN = sum)
> barplot(ADF$x, names.arg=ADF$V1)
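As another base R option, xtabs() with a variable on the left-hand side of the formula sums that variable within each level of the right-hand side, so it yields the same per-value totals as aggregate() in one call. A minimal sketch that rebuilds the question's DF; note that barplot() treats the values 2, 3, 5 and 7 as categories, so there are no gaps at 4 or 6.

```r
# Rebuild the example data: V1 = position, V2 = count to add at that position
DF <- read.table(text = "2 7
3 9
5 3
2 4
7 3
3 4
2 2")

# xtabs() with V2 on the left-hand side sums V2 within each distinct value of V1
totals <- xtabs(V2 ~ V1, data = DF)
totals
# totals: 13 at 2, 13 at 3, 3 at 5, 3 at 7

# One bar per distinct V1 value, labelled by the table's names
barplot(totals)
```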

Related

Change the order of Edges in Network Graph

Is there any way to change the order of the edges in a network graph, using igraph, visNetwork, or even JS from within R?
For example, I would like a network to have all the arrows going "to", "from", and "to;from", all in order.
However, I have found nothing online about changing the order in which the edges are produced.
Any help is appreciated.
Using igraph, you could convert the graph into a data frame (one row per edge) and then sort it with dplyr::arrange():
set.seed(4321)
g <- igraph::as.directed(igraph::sample_gnp(10, 0.4))
df <- igraph::as_data_frame(g)
dplyr::arrange(df, from)
This should give you something like:
from to
1 1 4
2 1 5
3 1 6
4 1 7
5 1 8
6 1 10
7 2 4
8 2 8
9 2 9
10 2 10
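The sorted data frame only changes the order in which the edges are listed. If the graph object itself should store its edges in that order, one option is to rebuild it from the sorted edge list. A minimal sketch, assuming unnamed vertices (so from and to are numeric vertex IDs) and using a small hand-made graph instead of the random one above; graph_from_edgelist() creates the edges in row order.

```r
library(igraph)

# Small directed graph whose edges are deliberately out of order
g <- make_graph(c(3, 1,  2, 3,  1, 3,  1, 2), directed = TRUE)

# Edge list as a data frame with numeric "from" and "to" columns
df <- as_data_frame(g)

# Sort by source vertex, then by target vertex (base R; dplyr::arrange() works too)
df_sorted <- df[order(df$from, df$to), ]

# Rebuild the graph; edge IDs now follow the row order of df_sorted
g_sorted <- graph_from_edgelist(as.matrix(df_sorted), directed = TRUE)

as_data_frame(g_sorted)
```

If the graph has named vertices or edge attributes, graph_from_data_frame() on the sorted data frame is the analogous route, since it keeps any extra columns as edge attributes.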

How to create the vector 2:10 and 10:2 using the seq function

I need to use seq() to create the vector (2,3,4,5,6,7,8,9,10,9,8,7,6,5,4,3,2), but I'm stuck. I've done quite a bit of YouTube-ing and reading online, but can't find a specific enough solution.
Any help is appreciated, but please use the seq function when making recommendations.
Using the seq function, you need the following steps:
Step 1: generate the sequence from 2 to 10:
a <- seq(from = 2, to = 10)
Step 2: generate the sequence from 9 down to 2 (starting at 9 so that 10 is not repeated):
b <- seq(from = 9, to = 2)
Now, combine the two results:
data <- c(a, b)
The output should be:
> data
[1] 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2
Hope it works for you!
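If exactly one seq() call is wanted, the same vector can be produced by folding a symmetric sequence: seq(-8, 8) runs from -8 to 8, abs() turns it into 8, 7, ..., 0, ..., 7, 8, and subtracting that from 10 gives 2 up to 10 and back down to 2. A sketch wrapped in a hypothetical helper called up_down (not a base R function), assuming lo <= hi:

```r
# Hypothetical helper: lo, lo+1, ..., hi, ..., lo+1, lo built from a single seq() call
up_down <- function(lo, hi) {
  # Symmetric offsets from -(hi - lo) to (hi - lo); abs() folds them around 0
  hi - abs(seq(lo - hi, hi - lo))
}

up_down(2, 10)
# returns 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2
```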
Try this:
unlist(mapply(seq, c(2,9), c(10,2), c(1,-1), SIMPLIFY = FALSE))
# [1] 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2
According to help(":"),
[...] from:to is equivalent to seq(from, to), and generates a sequence from from to to in steps of 1 or -1.
This justifies offering
c(2:10, 9:2)
#[1] 2 3 4 5 6 7 8 9 10 9 8 7 6 5 4 3 2
as a solution. Here, the : operator does what seq() would do, even though seq() does not appear verbatim.

Extract data from data.frame based on coordinates in another data.frame

So here is my problem. I have a really big data.frame with two columns: the first holds x coordinates (rows) and the second holds y coordinates (columns), for example:
x y
1 1
2 3
3 1
4 2
3 4
In another data frame I have some values (all numbers):
a b c d
8 7 8 1
1 2 3 4
5 4 7 8
7 8 9 7
1 5 2 3
I would like to add a third column to the first data.frame, holding the value from the second data.frame at the coordinates given in the first. So the result should look like this:
x y z
1 1 8
2 3 3
3 1 5
4 2 8
3 4 8
Since my data frames are really big, for loops are too slow. I think there is a way to do this with the apply family of functions, but I can't figure out how. Thanks in advance (and sorry for the ugly layout; this is my first post here and I don't know how to format code and data frames nicely like other questions do).
This is a simple indexing question. There is no need for external packages or *apply loops; just do:
df1$z <- df2[as.matrix(df1)]
df1
# x y z
# 1 1 1 8
# 2 2 3 3
# 3 3 1 5
# 4 4 2 8
# 5 3 4 8
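The same lookup, self-contained on the question's example data. It relies on R's matrix indexing: indexing a matrix with a two-column matrix treats each row as a (row, column) pair. In this sketch df2 is converted to a plain matrix explicitly, which assumes all of its columns are numeric (otherwise as.matrix() would produce a character matrix).

```r
# Coordinates: x = row index, y = column index into df2
df1 <- data.frame(x = c(1, 2, 3, 4, 3), y = c(1, 3, 1, 2, 4))

# Values to look up (the question's second data frame)
df2 <- data.frame(a = c(8, 1, 5, 7, 1),
                  b = c(7, 2, 4, 8, 5),
                  c = c(8, 3, 7, 9, 2),
                  d = c(1, 4, 8, 7, 3))

# Convert df2 to a numeric matrix, then index it with (row, col) pairs
m <- as.matrix(df2)
df1$z <- m[cbind(df1$x, df1$y)]
df1
```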
A base R solution (df1 holds the coordinates and df2 the numbers, both as data frames):
df1$z <- mapply(function(x,y) df2[x,y], df1$x, df1$y )
This works if the last y in the first data frame is corrected from 5 to 4.
I guess that was a typo, since the second data frame does not have 5 columns.
Here's how I would do this.
First, load data.table for fast merging, then convert your data frames (I'll call them dt1 for the coordinates and vals for the values) to data.tables:
library(data.table)
dt1 <- data.table(dt1)
vals <- data.table(vals)
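Second, put vals into a new data.table in long format, with one row per (x, y) coordinate:
vals_dt <- data.table(x = rep(seq_len(nrow(vals)), ncol(vals)),
                      y = rep(seq_len(ncol(vals)), each = nrow(vals)),
                      z = unlist(vals, use.names = FALSE),  # values column by column, matching x and y
                      key = c("x", "y"))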
Now merge (note that setkey() sorts dt1 by x and y, so its row order changes):
setkey(dt1, x, y)[vals_dt, z := z]
You can also try the data.table package and update df1 by reference:
library(data.table)
setDT(df1)[, z := df2[cbind(x, y)]][]
# x y z
# 1: 1 1 8
# 2: 2 3 3
# 3: 3 1 5
# 4: 4 2 8
# 5: 3 4 8

Merge data frames for Cohen's kappa

I'm trying to analyze some data using R, but I'm not very familiar with R (yet) and I'm totally stuck.
What I am trying to do is manipulate my input data so I can use it to calculate Cohen's kappa.
The problem is that rater_2 has several ratings for some of the items, and I need to select one. If one of rater_2's ratings for an item matches rater_1's rating, that rating should be chosen; if not, any of the ratings can be used.
I tried
unique(merge(rater_1, rater_2, all.x=TRUE))
which brings me close, but if the ratings between the two raters diverge, only one is kept.
So, my question is, how do I get from
item rating_1
1 3
2 5
3 4
item rating_2
1 2
1 3
2 4
2 1
2 2
3 4
3 2
to
item rating_1 rating_2
1 3 3
2 5 4
3 4 4
?
There are some fancy ways to do this, but I thought it might be helpful to combine a few basic techniques. In general, your question should include an easy way to generate your data, like this:
# Create some sample data
set.seed(1)
id<-1:50
rater_1<-sample(1:5,50,replace=TRUE)
df1<-data.frame(id,rater_1)
id<-rep(1:50,each=2)
rater_2<-sample(1:5,100,replace=TRUE)
df2<-data.frame(id,rater_2)
Now, here is one simple technique for doing this.
# Merge together the data frames.
all.merged<-merge(df1,df2)
# id rater_1 rater_2
# 1 1 2 3
# 2 1 2 5
# 3 2 2 3
# 4 2 2 2
# 5 3 3 1
# 6 3 3 1
# Find the ones that are equal.
same.rating<-all.merged[all.merged$rater_2==all.merged$rater_1,]
# Consider id 44, sometimes they match twice.
# So remove duplicates.
same.rating<-same.rating[!duplicated(same.rating),]
# Find the ones that never matched.
not.same.rating<-all.merged[!(all.merged$id %in% same.rating$id),]
# Pick one. I chose to pick the maximum.
picked.rating<-aggregate(rater_2~id+rater_1,not.same.rating,max)
# Stick the two together.
result<-rbind(same.rating,picked.rating)
result<-result[order(result$id),] # Sort
# id rater_1 rater_2
# 27 1 2 5
# 4 2 2 2
# 33 3 3 1
# 44 4 5 3
# 281 5 2 4
# 11 6 5 5
A fancier way to do this would be:
same.or.random <- function(x) {
  # Rows of this item where the two raters agree
  matched <- which(x$rater_1 == x$rater_2)
  if (length(matched) > 0) x[matched[1], ]
  else x[sample(seq_len(nrow(x)), 1), ]
}
merged <- merge(df1, df2)
do.call(rbind, by(merged, merged$id, same.or.random))
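Once every item has exactly one rating per rater, Cohen's kappa is kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal rating frequencies. A base R sketch with a hypothetical helper cohen_kappa (unweighted kappa only); on the three items from the question, p_o = 2/3 and p_e = 1/3, giving kappa = 0.5.

```r
# Hypothetical helper: unweighted Cohen's kappa for two equal-length rating vectors
cohen_kappa <- function(r1, r2) {
  # Use the same categories for both raters so the table is square
  lev <- sort(union(r1, r2))
  tab <- table(factor(r1, levels = lev), factor(r2, levels = lev))
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n                      # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2  # chance agreement from the marginals
  (p_o - p_e) / (1 - p_e)
}

# Items 1-3 from the question: rating_1 = 3, 5, 4 and rating_2 = 3, 4, 4
cohen_kappa(c(3, 5, 4), c(3, 4, 4))
# [1] 0.5
```

With the result data frame from the first answer, the call would be cohen_kappa(result$rater_1, result$rater_2).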

command for expanding data in r

I have some data:
Length(cm) Frequency
1 5
2 2
3 3
4 5
Is there a way to expand these numbers in R without typing them out manually, so that I get a dataset like
1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
which I can then use to work out the standard error of the mean length? Thanks
You can use rep.
> l <- 1:4
> f <- c(5,2,3,5)
> rep(l,f)
[1] 1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
In addition to using rep to replicate the observations you could also use the wtd.mean and wtd.var functions in the Hmisc package to compute the weighted summaries without expanding (this will be better if the expanded vector would take up a large portion of memory).
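A base R sketch of the no-expansion idea: compute the weighted mean, the sample variance (with n - 1 in the denominator, as var() and sd() use) and the standard error directly from the lengths and frequencies. The last line compares the result with the expanded-vector calculation.

```r
# Lengths and how often each one occurs (from the question)
l <- 1:4
f <- c(5, 2, 3, 5)

n  <- sum(f)                        # total number of observations
m  <- sum(l * f) / n                # mean length
v  <- sum(f * (l - m)^2) / (n - 1)  # sample variance, same as var() on the expanded data
se <- sqrt(v / n)                   # standard error of the mean

# Same result as expanding with rep() first
all.equal(se, sd(rep(l, f)) / sqrt(n))
```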
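I recommend keeping the data in a data frame (called data here, with columns length and freq):
sd(rep(data$length, data$freq))
Divide this standard deviation by sqrt(sum(data$freq)) to get the standard error of the mean.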
