Creating many line plots using positions as a function of time


I am trying to make a single plot of the trajectory of many particles from a Brownian Motion experiment.
There are five measurements for each particle, a total of 10, for the x and y components of position.
I have the data in multiple data structures, as I am unaware of which is most useful for the end I aim to achieve.
1. All within a single data frame, with my 5 time measurements in x for the 16 particles measured, followed by the 16 for the y component.
2. In two separate data frames, one for the x-component and one for the y.
I have tried to use rbind to create a single array that I can use with geom_line(), but this gives me one single line in which each particle trajectory is connected to the next.
How could I go about making these separate lines, all within one x-y plane? Thanks.

The easiest way to achieve this is to have 3 columns, one for the common x component, one for the y, and one for the particle. To get this you'll need to convert your data to long format:
> df <- data.frame(t=c(1,2,3,4,5), x.1 = c(-1,1,3,4,5), x.2 = c(5,2,1,4,6))
> df
  t x.1 x.2
1 1  -1   5
2 2   1   2
3 3   3   1
4 4   4   4
5 5   5   6
> (df <- tidyr::gather(df, "particle", "y", -t))
   t particle  y
1  1      x.1 -1
2  2      x.1  1
3  3      x.1  3
4  4      x.1  4
5  5      x.1  5
6  1      x.2  5
7  2      x.2  2
8  3      x.2  1
9  4      x.2  4
10 5      x.2  6
Then, use the group parameter to geom_line to plot them separately:
library(ggplot2)
ggplot(df, aes(x = t, y = y)) + geom_line(aes(group = particle, color = particle))
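For the trajectory case in the question, where each particle has both an x and a y column, the same reshaping idea can be sketched with tidyr::pivot_longer(). The column names (x.1, y.1, ...) and the values below are invented for illustration, since the original data were not posted; adjust names_sep to match however your columns are actually named.

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one row per time step,
# one x column and one y column per particle
wide <- data.frame(
  t   = 1:5,
  x.1 = c(0, 1, 1, 2, 3),
  y.1 = c(0, 0, 1, 2, 2),
  x.2 = c(5, 4, 4, 3, 2),
  y.2 = c(1, 2, 3, 3, 4)
)

# Split each column name at the dot: the part before it ("x" or "y")
# becomes a value column, the part after it becomes the particle id
long <- pivot_longer(
  wide,
  cols      = -t,
  names_to  = c(".value", "particle"),
  names_sep = "\\."
)

# geom_path() joins points in row order (here: time order) within each group
p <- ggplot(long, aes(x = x, y = y, group = particle, colour = particle)) +
  geom_path()
# print(p) draws the plot
```

The result has one row per particle per time step, with columns t, particle, x and y, which is the shape both answers rely on.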

First, you need your data in this format:
library(data.table)
data <- data.table(particle = as.factor(rep(1:3, each = 5)),
                   x = sample(-10:10, 15, replace = TRUE),
                   y = sample(-10:10, 15, replace = TRUE))
data
    particle   x   y
 1:        1  -8  -4
 2:        1  -5  -2
 3:        1  -1  -5
 4:        1  -3   9
 5:        1   4  -7
 6:        2   2   1
 7:        2  -8 -10
 8:        2  -4  -8
 9:        2  -6  -4
10:        2  -8  -3
11:        3 -10  10
12:        3   6  -5
13:        3  -5  -6
14:        3  -6   8
15:        3   1  -4
That is, one column identifying the particle and two columns holding the position coordinates.
This link might help you changing your data: http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/
Then just plot, grouping by particle (using the color aesthetic):
library(ggplot2)
ggplot(data = data,
       aes(x = x, y = y, color = particle)) +
  geom_path(size = 3)
geom_path() connects the points in the order the rows appear, so if you want to change the order of the path, just add a time column and sort the data by that column.
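A minimal sketch of that last step, assuming a hypothetical time column whose rows arrive out of order (the data here are randomly generated, not the asker's):

```r
library(ggplot2)

set.seed(1)
# Hypothetical trajectories: 3 particles, 5 time steps each, rows shuffled in time
traj <- data.frame(
  particle = factor(rep(1:3, each = 5)),
  time     = rep(c(3, 1, 5, 2, 4), times = 3),
  x        = sample(-10:10, 15, replace = TRUE),
  y        = sample(-10:10, 15, replace = TRUE)
)

# Sort by particle, then by time, so geom_path() walks each trajectory chronologically
traj <- traj[order(traj$particle, traj$time), ]

p <- ggplot(traj, aes(x = x, y = y, colour = particle)) +
  geom_path()
# print(p) draws the plot
```

geom_line() would instead sort by the x value, which is not what you want for a trajectory in the x-y plane.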

Related

How do you randomly assign data into equal sized control and treatment groups in R?

set.seed(31)
resample(1:534, 90, replace = FALSE)
df.orig <- read.csv("project1data.csv")
df.groups <- filter(df.orig, participate == "y")
str(df.groups)
I have randomly selected 90 house numbers from 534 and recorded in an Excel sheet whether or not each household was willing to participate in the study. I then filtered out the people who did not want to participate. How do I now randomly assign the participants to two equally sized groups (control and treatment)?

You haven't provided data or code that runs, so I'll generate some data to show the idea:
set.seed(31)
# Create dataset with three variables
# Participate are the ones that we wish to include in the study.
# You have those in your excel file.
fakedata <- data.frame(houseid=1:534,
size=rbinom(534, size=5, prob=.5),
participate=sample(c("y", "n"), size=534, replace=TRUE))
which produces
head(fakedata)
houseid size participate
1 1 3 y
2 2 4 n
3 3 2 n
4 4 2 y
5 5 4 y
6 6 2 n
Now we can use tidyverse to generate a random permutation of cases/controls. First we create a vector of the correct length (using rep with length) and then we shuffle them using sample.
library("tidyverse")
fakedata %>% # Take data
filter(participate=="y") %>%
mutate(group=sample(rep(c("Case", "Ctrl"), length=n())))
This gives
houseid size participate group
1 1 3 y Case
2 4 2 y Case
3 5 4 y Ctrl
4 7 4 y Case
5 8 1 y Case
6 9 4 y Ctrl
7 13 3 y Case
8 16 1 y Ctrl
.
.
.
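The same rep-then-shuffle idea can be sketched in base R without the tidyverse. The house ids below are hypothetical stand-ins for the filtered participants, and the labels "control" and "treatment" are simply the names used in the question.

```r
set.seed(31)

# Hypothetical participants left after filtering on participate == "y"
participants <- data.frame(houseid = c(1, 4, 5, 7, 8, 9, 13, 16, 21, 22))
n <- nrow(participants)

# Build a balanced vector of labels, then shuffle it
labels <- rep(c("control", "treatment"), length.out = n)
participants$group <- sample(labels)

# With an even n the groups are exactly equal; with an odd n they differ by one
table(participants$group)
```

Because the label vector is balanced before shuffling, the group sizes are fixed in advance and only the assignment of houses to groups is random.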

Using Diff() in R for multiple columns

I would like to calculate the first order difference for many columns in a data frame without naming them explicitly. It works well with one column with this code:
set.seed(1)
Data <- data.frame(
X = sample(1:10),
Y = sample(1:10),
Z = sample(1:10))
Newdata <- as.data.frame(diff(Data$X, lag = 1))
How do I calculate the same for a lot of columns, e.g. [2:200], in a data frame?

I think this does what you want:
as.data.frame(lapply(Data, diff, lag=1))
## X Y Z
## 1 1 -1 -8
## 2 1 4 4
## 3 2 4 -5
## 4 -5 -5 8
## 5 6 2 -1
## 6 1 1 -1
## 7 -3 -4 -2
## 8 4 -3 -2
## 9 -9 8 1
Since data frames are internally lists, we can lapply over the columns. You can use Data[1:2] instead of Data to just do the first two columns, or any valid column indexing.
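As a small worked example of the column-subset variant, here is a sketch with hand-picked values (not the random data above) so the differences are easy to check by eye:

```r
# Small data frame with known values
Data <- data.frame(
  X = c(1, 4, 9, 16),
  Y = c(2, 3, 5, 8),
  Z = c(10, 7, 3, -2)
)

# First-order differences for columns 2 and 3 only
diffs <- as.data.frame(lapply(Data[2:3], diff, lag = 1))
diffs
# Y holds 1, 2, 3 and Z holds -3, -4, -5
```

For the range in the question, Data[2:200] would be used in place of Data[2:3]. The result has one row fewer than the input, because the first row has nothing to be differenced against.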

Custom Axis range with persp in R

I am trying to plot 3D graphs, but I need a custom range for each axis. For example, the X axis has values [1,2,3,4,5,6,7],
while Y and Z both have values [1,2,3,4,5,6,7,8,9,10].
The idea is that I want to test if X influences Y and Z. Y and Z are dependent variables.
I tried the following code, and the result is below:
persp(mat, col = heat.colors(20) ,phi = 30, theta = -30, scale = TRUE)
mat is a matrix in the following format:
V1 V2 V3
1 1 1.000000
2 1 1.709133
4 1 3.278188
8 1 5.082078
16 1 5.753403
32 1 5.778228
64 1 5.783567
1 2 1.000000
2 2 1.789333
4 2 3.478188
8 2 5.182078
16 2 5.853403
32 2 5.877228
64 2 5.908357
...... V2 will have same format till 10
But I still couldn't set custom ranges for X, Y and Z. Any idea how to customize them, or whether there is another way to do this in R?
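One possible approach, offered as a sketch rather than a definitive answer: persp() expects z as a grid (a matrix with one row per x value and one column per y value) and accepts xlim, ylim and zlim. Passing the three-column matrix directly makes persp() treat it as the z grid itself. The data below are invented on a 7 x 10 grid to match the ranges described; the real mat would need the same reshaping.

```r
# Hypothetical long-format data: V1 = x (1..7), V2 = y (1..10), V3 = z
long <- expand.grid(V1 = 1:7, V2 = 1:10)
long$V3 <- log1p(long$V1) + 0.1 * long$V2   # made-up surface values

# Order so V1 varies fastest, then fill column-wise:
# rows of z correspond to V1 (x), columns to V2 (y)
long <- long[order(long$V2, long$V1), ]
xs <- sort(unique(long$V1))
ys <- sort(unique(long$V2))
z  <- matrix(long$V3, nrow = length(xs), ncol = length(ys))

persp(x = xs, y = ys, z = z,
      xlim = c(1, 7), ylim = c(1, 10), zlim = c(0, 10),
      ticktype = "detailed",              # numeric tick labels on the axes
      xlab = "X", ylab = "Y", zlab = "Z",
      col = "lightblue", phi = 30, theta = -30)
```

persp() requires x and y to be in increasing order, which is why the unique values are sorted before building the grid.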

How to get member of clusters from R's hclust/heatmap.2

I have the following code that performs hierarchical
clustering and plots the result as a heatmap.
library(gplots)
set.seed(538)
# generate data
y <- matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# the actual data is much larger than the above
# perform hierarchical clustering and plot heatmap
test <- heatmap.2(y)
This produces a heatmap with dendrograms (image not shown).
What I want to do is get the cluster members at each level of the hierarchy in the plot,
yielding:
Clust 1: g3-g2-g4
Clust 2: g2-g4
Clust 3: g4-g7
etc
Cluster last: g1-g2-g3-g4-g5-g6-g7-g8-g9-g10
Is there a way to do it?

I did have the answer, after all! @zkurtz identified the problem: the data I was using were different from the data you were using. I added a set.seed(538) statement to your code to stabilize the data.
Use the following code to create a matrix of cluster membership for the dendrogram of the rows:
cutree(as.hclust(test$rowDendrogram), 1:dim(y)[1])
This will give you:
    1 2 3 4 5 6 7 8 9 10
g1  1 1 1 1 1 1 1 1 1  1
g2  1 2 2 2 2 2 2 2 2  2
g3  1 2 2 3 3 3 3 3 3  3
g4  1 2 2 2 2 2 2 2 2  4
g5  1 1 1 1 1 1 1 4 4  5
g6  1 2 3 4 4 4 4 5 5  6
g7  1 2 2 2 2 5 5 6 6  7
g8  1 2 3 4 5 6 6 7 7  8
g9  1 2 3 4 4 4 7 8 8  9
g10 1 2 3 4 5 6 6 7 9 10
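To turn a column of that matrix into lists of member names, one option is split(). The sketch below uses only base R and calls hclust(dist(y)) directly, on the assumption that heatmap.2 was run with its default distance and clustering functions; the choice of 3 clusters is arbitrary.

```r
set.seed(538)
y <- matrix(rnorm(50), 10, 5,
            dimnames = list(paste("g", 1:10, sep = ""), paste("t", 1:5, sep = "")))

# Cluster the rows with the default Euclidean distance and complete linkage
hc <- hclust(dist(y))

# One column per number of clusters (1 to 10), one row per gene
membership <- cutree(hc, k = 1:nrow(y))

# Member names for the 3-cluster cut; columns are named by their k value
members_k3 <- split(rownames(y), membership[, "3"])
members_k3
```

Repeating the split() call over every column, for example with lapply(colnames(membership), ...), gives the members at each level of the hierarchy.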

This solution requires computing the cluster structure using a different package:
# Generate data
y = matrix(rnorm(50), 10, 5, dimnames=list(paste("g", 1:10, sep=""), paste("t", 1:5, sep="")))
# The new package:
library(nnclust)
# Create the links between all pairs of points with
# squared euclidean distance less than threshold
links = nncluster(y, threshold = 2, fill = 1, give.up =1)
# Assign a cluster number to each point
clusters=clusterMember(links, outlier = FALSE)
# Display the points that are "alone" in their own cluster:
nas = which(is.na(clusters))
print(rownames(y)[nas])
# For each cluster (with at least two points), display the included points.
# Keep clusters the same length as rownames(y) so the indices stay aligned;
# which() skips the NA entries.
for(i in 1:max(clusters, na.rm = TRUE)) print(rownames(y)[which(clusters == i)])
Obviously you would want to revise this into a function of some kind to be more user friendly. In particular, this gives the clusters at only one level of the dendrogram. To get the clusters at other levels, you would have to play with the threshold parameter.

Why doesn't qplot plot lines in multiple series for this data file?

It's my first day learning R and ggplot. I've followed some tutorials and would like to make plots like the one generated by the following command:
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
It looks like the figure on this page:
http://www.r-bloggers.com/quick-introduction-to-ggplot2/
I had a handmade test data file I created, which looks like this:
site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7
but when I try to read and plot it with:
test <- read.table('test.data')
qplot(temp, humidity, data = test, color=site, geom = c("point", "line"))
the lines on the plot aren't separate series, but link together:
http://imgur.com/weRaX
What am I doing wrong?
Thanks.

You need to tell ggplot2 how to group the data into separate lines. It's not a mind reader! ;)
dat <- read.table(text = " site temp humidity
1 1 1 3
2 1 2 4.5
3 1 12 8
4 1 14 10
5 2 1 5
6 2 3 9
7 2 4 6
8 2 8 7",sep = "",header = TRUE)
qplot(temp, humidity, data = dat, group = site,color=site, geom = c("point", "line"))
Note that you probably also wanted to do color = factor(site) in order to force a discrete color scale, rather than a continuous one.
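For reference, a sketch of the same plot written with ggplot() rather than qplot(), combining the group and factor(site) suggestions. The data frame is typed in directly from the test file shown in the question.

```r
library(ggplot2)

# The test data from the question, entered by hand
dat <- data.frame(
  site     = rep(1:2, each = 4),
  temp     = c(1, 2, 12, 14, 1, 3, 4, 8),
  humidity = c(3, 4.5, 8, 10, 5, 9, 6, 7)
)

# group = site gives one line per site;
# factor(site) forces a discrete colour scale instead of a gradient
p <- ggplot(dat, aes(x = temp, y = humidity,
                     group = site, colour = factor(site))) +
  geom_point() +
  geom_line()
# print(p) draws the plot
```

Mapping colour to a factor is enough on its own to split the lines, since ggplot2 groups by discrete variables by default; the explicit group makes the intent clearer.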
