I am very new to R and am currently playing around with it. I've run into an issue with plotting the following data frame, which I imported from a CSV.
studentname dateofbirth GSF3A3U FJÖ1UF05AU EÐLI2GR05BT FOR3D3U FOR3L3DU FOR4A3U ROB2B3U STÆR3FV05ET USA1012 WIN3B3DU userid
1 Ada Gauidóttir 13.8.1997 8.3 8.0 4.0 6.8 8.5 8.1 4.0 5.9 9.0 9.4 1
2 Gjaflaug Amildóttir 14.6.1998 6.0 6.6 6.2 8.9 4.7 9.4 8.5 8.1 4.3 5.3 2
3 Unndís Jónasdóttir 2.11.1998 8.7 7.8 6.9 10.0 7.0 10.0 9.3 5.4 7.2 5.8 3
4 Sigjón Elfráðurson 14.10.1996 9.3 8.9 6.2 8.1 9.7 5.5 6.8 9.0 6.9 4.2 4
5 Þórbjörg Rökkvidóttir 12.10.2000 4.9 6.9 5.2 6.9 5.3 5.5 5.6 4.8 8.9 9.2 5
6 Richard Hlérson 3.2.2000 9.4 7.7 8.4 6.1 6.4 9.6 4.9 7.2 9.3 7.0 6
7 Tala Arnalddóttir 18.8.1997 7.9 7.1 6.9 6.0 9.3 5.4 8.1 6.8 5.8 6.7 7
8 Petrína Estefandóttir 24.9.1994 9.6 4.9 5.0 8.4 7.9 8.7 5.5 10.0 4.0 9.5 8
9 Tanja Finnlaugurdóttir 11.7.1993 6.7 6.5 6.9 8.3 6.3 9.6 9.1 4.2 9.6 4.7 9
10 Elly Amosdóttir 6.7.2001 4.8 7.0 4.3 9.5 7.1 4.2 6.6 5.3 9.0 4.4 10
I am trying to plot this data so that the course names are on the X-axis and the grades in each course's column are displayed on the Y-axis.
If I use the code below:
students[,3:12, drop=FALSE]
The result below is exactly the data I want to plot, but how do I draw a scatterplot of it?
GSF3A3U FJÖ1UF05AU EÐLI2GR05BT FOR3D3U FOR3L3DU FOR4A3U ROB2B3U STÆR3FV05ET USA1012 WIN3B3DU
1 8.3 8.0 4.0 6.8 8.5 8.1 4.0 5.9 9.0 9.4
2 6.0 6.6 6.2 8.9 4.7 9.4 8.5 8.1 4.3 5.3
3 8.7 7.8 6.9 10.0 7.0 10.0 9.3 5.4 7.2 5.8
4 9.3 8.9 6.2 8.1 9.7 5.5 6.8 9.0 6.9 4.2
5 4.9 6.9 5.2 6.9 5.3 5.5 5.6 4.8 8.9 9.2
6 9.4 7.7 8.4 6.1 6.4 9.6 4.9 7.2 9.3 7.0
7 7.9 7.1 6.9 6.0 9.3 5.4 8.1 6.8 5.8 6.7
8 9.6 4.9 5.0 8.4 7.9 8.7 5.5 10.0 4.0 9.5
9 6.7 6.5 6.9 8.3 6.3 9.6 9.1 4.2 9.6 4.7
10 4.8 7.0 4.3 9.5 7.1 4.2 6.6 5.3 9.0 4.4
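One possible approach, sketched here on a small hypothetical subset of the grade columns (three courses, three students, standing in for students[,3:12]): reshape the wide table to long form with stack() and draw one point per grade with stripchart(). The object names grades and long are assumptions made for this illustration only.

```r
# Hypothetical subset standing in for students[, 3:12]
grades <- data.frame(
  GSF3A3U = c(8.3, 6.0, 8.7),
  FOR3D3U = c(6.8, 8.9, 10.0),
  USA1012 = c(9.0, 4.3, 7.2)
)

# stack() turns the wide table into long form:
# 'values' holds the grades, 'ind' holds the course name as a factor
long <- stack(grades)

# One point per grade, courses along the X-axis, grades on the Y-axis
stripchart(values ~ ind, data = long, vertical = TRUE, pch = 16,
           ylab = "Grade")
```

With all ten courses the axis labels may overlap, so rotating or shortening them might be needed.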
I have a dataframe with time in 10 min intervals.
date time h150 h200 h250 h500 h750 h1000 h1250 h1500
1 2018-06-01 07:40:00 7.2 8.0 7.8 7.9 7.8 7.8 7.9 7.9
2 2018-06-01 07:50:00 7.3 8.3 8.1 8.3 8.1 8.2 8.3 8.1
3 2018-06-01 08:00:00 7.5 9.0 8.3 8.4 8.2 8.2 8.5 8.3
4 2018-06-01 08:10:00 7.4 7.5 6.7 6.3 6.1 6.0 6.0 7.2
5 2018-06-01 08:20:00 7.4 5.9 5.7 5.6 5.4 5.4 5.3 5.3
6 2018-06-01 08:30:00 7.5 5.7 5.7 5.6 5.5 5.4 5.3 5.3
7 2018-06-01 08:40:00 7.5 5.7 5.7 5.6 5.5 5.4 5.3 5.3
8 2018-06-01 08:50:00 7.5 5.6 5.7 5.6 5.6 5.5 5.3 5.3
9 2018-06-01 09:00:00 7.4 5.6 5.7 5.6 5.6 5.5 5.3 5.3
10 2018-06-01 09:10:00 7.4 5.6 5.6 5.6 5.6 5.4 5.3 5.3
11 2018-06-01 09:20:00 7.4 5.6 5.6 5.6 5.5 5.5 5.4 5.3
12 2018-06-01 09:30:00 7.4 5.6 5.6 5.6 5.5 5.5 5.4 5.3
I only want to keep rows with full hours (i.e. 15:00:00).
How can I do this?
Thanks!
Perhaps this helps
library(dplyr)
library(stringr)
df1 %>%
filter(str_detect(time, ":00:00$"))
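For comparison, the same filter can be sketched in base R with grepl(). This assumes the time column is stored as character (or factor) in the form HH:MM:SS; the small df1 below is a hypothetical cut-down version of the data in the question.

```r
# Hypothetical cut-down version of the data frame from the question
df1 <- data.frame(
  date = "2018-06-01",
  time = c("07:40:00", "07:50:00", "08:00:00", "08:10:00", "09:00:00"),
  h150 = c(7.2, 7.3, 7.5, 7.4, 7.4),
  stringsAsFactors = FALSE
)

# ":00:00$" matches only values whose minutes and seconds are both zero
full_hours <- df1[grepl(":00:00$", df1$time), ]
full_hours$time
# [1] "08:00:00" "09:00:00"
```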
I have two lists of different sizes. One list (named trees) is composed of phylogenetic trees (class phylo) and the second list (named data_values) is composed of numeric values.
The tip names of each phylogenetic tree in the list trees match the names of elements in the list of values. But the list data_values contains more elements than any single tree has tips.
library(phytools)
library(ape)
#original tree:
tree_original = rtree(12, tip.label = paste0("species", LETTERS[1:12]))
##list of trees:
nodes = 14:23
trees = lapply(nodes,extract.clade,phy=tree_original)
names(trees) <- paste0("", 14:23)
data_values <- list()
for (i in 1:17) { data_values[[paste0('species', LETTERS[i])]] <- round(rnorm(10, 5, 4), 1) }
I would like to match both lists (trees and data_values) using species as an index, so that I get a data frame for each tree (see example below). I can do this operation for each tree of the list trees individually but, as my list of species is much bigger than this example, I would like to know if I can do the operation below for the whole list of trees at once instead of running it tree by tree, like this:
library(plyr)  # provides llply() and ldply()
tree14 = data_values[match(trees$`14`$tip.label, names(data_values))]
tree14 = llply(tree14, function(x) sapply(x, as.numeric))
tree14_df = ldply(tree14, .fun=identity)  # I will need each result as a data.frame
.id 1 2 3 4 5 6 7 8 9 10
1 speciesE -0.5 3.4 2.0 5.3 3.7 8.2 3.5 -2.0 3.1 10.2
2 speciesL 6.8 4.3 7.1 5.5 4.9 2.5 0.3 -3.8 4.1 6.4
3 speciesA 2.5 2.5 9.6 10.6 2.2 7.1 4.1 4.4 6.0 6.7
4 speciesI -3.5 7.2 6.8 2.8 7.5 8.9 13.4 13.1 1.8 5.5
5 speciesC 4.3 2.2 10.0 7.4 4.4 8.3 -0.7 3.6 9.2 6.3
6 speciesH 6.3 6.1 2.2 4.6 7.4 7.3 2.9 0.6 3.0 5.2
7 speciesB 8.3 1.7 -0.1 4.5 9.4 -0.2 7.5 1.4 -0.3 4.6
8 speciesD 6.2 5.8 6.6 1.1 5.4 11.1 -1.1 0.0 7.9 0.4
9 speciesG 3.5 2.8 1.4 11.6 -2.8 11.0 3.5 2.8 3.1 4.8
10 speciesK 0.9 4.9 5.4 2.7 -0.7 5.1 18.3 4.9 2.5 -0.7
tree15 = data_values[match(trees$`15`$tip.label, names(data_values))]
tree15 = llply(tree15, function(x) sapply(x, as.numeric))
tree15_df = ldply(tree15, .fun=identity)
.id 1 2 3 4 5 6 7 8 9 10
1 speciesE -0.5 3.4 2.0 5.3 3.7 8.2 3.5 -2.0 3.1 10.2
2 speciesL 6.8 4.3 7.1 5.5 4.9 2.5 0.3 -3.8 4.1 6.4
3 speciesA 2.5 2.5 9.6 10.6 2.2 7.1 4.1 4.4 6.0 6.7
4 speciesI -3.5 7.2 6.8 2.8 7.5 8.9 13.4 13.1 1.8 5.5
5 speciesC 4.3 2.2 10.0 7.4 4.4 8.3 -0.7 3.6 9.2 6.3
6 speciesH 6.3 6.1 2.2 4.6 7.4 7.3 2.9 0.6 3.0 5.2
7 speciesB 8.3 1.7 -0.1 4.5 9.4 -0.2 7.5 1.4 -0.3 4.6
... this operation goes until tree23
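A minimal sketch of how the per-tree steps above might be applied to the whole list with lapply(). To keep it runnable without ape, the trees here are hypothetical stand-ins that only carry a tip.label element (which real phylo objects also have); tree_dfs is a name invented for this example.

```r
# Hypothetical stand-ins for phylo objects: only tip.label is used below
trees <- list(
  `14` = list(tip.label = c("speciesE", "speciesL", "speciesA")),
  `15` = list(tip.label = c("speciesL", "speciesA"))
)

set.seed(1)
data_values <- list()
for (i in 1:12) {
  data_values[[paste0("species", LETTERS[i])]] <- round(rnorm(10, 5, 4), 1)
}

# For every tree: pick the matching elements (in tip order), then bind
# the numeric vectors row-wise into one data frame per tree
tree_dfs <- lapply(trees, function(tr) {
  vals <- data_values[tr$tip.label]
  data.frame(.id = names(vals), do.call(rbind, vals), row.names = NULL)
})

names(tree_dfs)    # same names as trees
tree_dfs[["14"]]   # one data.frame per tree
```

The value columns come out named X1 to X10 rather than 1 to 10, and the sketch assumes every tip label is present in names(data_values).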
I have a question about this simple code. I cannot understand why it produces two different plots.
boxplot(split(iris$Sepal.Length, iris$Species))
boxplot(iris[,1,1],iris[,1,2],iris[,1,3])
The answer can be seen by looking at the data passed to boxplot in each case.
The code split(iris$Sepal.Length, iris$Species) produces this result:
$setosa
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0
[27] 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0
$versicolor
[1] 7.0 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6
[27] 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7
$virginica
[1] 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
[27] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9
These are three different variables, one per species, so the first plot shows three different boxes.
When splitting you create new variables according to Species.
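For the second code, boxplot(iris[,1,1],iris[,1,2],iris[,1,3]), all three arguments are the same variable. When indexing a data frame, the third argument is taken as drop, not as a species selector, so iris[,1,1], iris[,1,2] and iris[,1,3] each return the whole first column (Sepal.Length):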
iris[,1,1]
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0
[27] 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4
[53] 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7
[79] 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3
[105] 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2
[131] 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9
iris[,1,2]
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0
[27] 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4
[53] 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7
[79] 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3
[105] 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2
[131] 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9
iris[,1,3]
[1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0
[27] 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4
[53] 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1 6.3 6.1 6.4 6.6 6.8 6.7
[79] 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3
[105] 6.5 7.6 4.9 7.3 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2 6.2 6.1 6.4 7.2
[131] 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8 6.7 6.7 6.3 6.5 6.2 5.9
That is why the three boxes in the second plot are identical.
If you want to compare the first three variables in a boxplot, you could use boxplot(iris[,1],iris[,2],iris[,3]), where iris[,1] is the first variable in iris, and so on.
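The points above can be checked directly in the console. The sketch below uses only the built-in iris data; the formula call at the end is one common way to get a box per species, and by_species is just a name chosen for this example.

```r
# split() gives one vector per species
by_species <- split(iris$Sepal.Length, iris$Species)
lengths(by_species)   # 50 values in each of the three groups

# The third index is treated as `drop`, so these all return Sepal.Length
identical(iris[, 1, 2], iris$Sepal.Length)   # TRUE
identical(iris[, 1, 3], iris[, 1, 1])        # TRUE

# Formula interface: one box per species
boxplot(Sepal.Length ~ Species, data = iris)
```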
I was expecting points but got bars instead when I ran:
plot(data$v3,data$v2)
my data
V2 V3
2 -2.0 2.7
3 0.5 3.9
4 1.3 4.5
5 5.7 6.0
6 10.4 8.7
7 3.4 2.7
8 7.6 3.2
9 4.1 5.6
10 5.0 9.2
11 8.5 11.7
12 12.3 6.8
13 16.1 13.0
14 13.2 11.9
15 8.8 8.6
16 7.9 6.1
17 1.1 4.9
18 3.0 1.0
19 4.5 7.2
20 2.7 2.7
21 7.6 7.6
I tried searching but from my understanding the function is supposed to give points, not bars. How do I fix this?
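One possible cause, offered as an assumption since the import code is not shown: the column names V2 and V3 and the row numbers starting at 2 suggest the file was read without a header, which usually leaves the columns as factors or character strings, and plot() does not draw a scatterplot for factors. Note also that column names are case-sensitive, so data$v3 is not the same as data$V3. A sketch of the conversion, on a hypothetical four-row version of the data:

```r
# Hypothetical reconstruction: numbers stored as factors after import
data <- data.frame(
  V2 = factor(c("-2.0", "0.5", "1.3", "5.7")),
  V3 = factor(c("2.7", "3.9", "4.5", "6.0"))
)

# Convert via character; as.numeric() on a factor alone returns level codes
data$V2 <- as.numeric(as.character(data$V2))
data$V3 <- as.numeric(as.character(data$V3))

# With numeric columns, plot() draws points
plot(data$V3, data$V2)
```

Running str(data) on the real data frame should show whether the columns are factors or numbers.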
Does anyone know any way to accomplish the following in a vectorized format?
Rather than subtracting member-wise test1 from test, I would like to subtract every element of test1 from every element of test. So, rather than:
test = c(1:10)
test1 = seq(0.1, 1, 0.1)
test - test1
[1] 0.9 1.8 2.7 3.6 4.5 5.4 6.3 7.2 8.1 9.0
I want:
test2=vector("list")
for(i in 1:length(test)){
test2[[i]] = test[i] - test1
}
test2
[[1]]
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
[[2]]
[1] 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1.0
[[3]]
[1] 2.9 2.8 2.7 2.6 2.5 2.4 2.3 2.2 2.1 2.0
[[4]]
[1] 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2 3.1 3.0
[[5]]
[1] 4.9 4.8 4.7 4.6 4.5 4.4 4.3 4.2 4.1 4.0
[[6]]
[1] 5.9 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0
[[7]]
[1] 6.9 6.8 6.7 6.6 6.5 6.4 6.3 6.2 6.1 6.0
[[8]]
[1] 7.9 7.8 7.7 7.6 7.5 7.4 7.3 7.2 7.1 7.0
[[9]]
[1] 8.9 8.8 8.7 8.6 8.5 8.4 8.3 8.2 8.1 8.0
[[10]]
[1] 9.9 9.8 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9.0
Even for vectors of uneven length?
outer(test, test1, `-`)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0
# [2,] 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1
# [3,] 2.9 2.8 2.7 2.6 2.5 2.4 2.3 2.2 2.1 2
# [4,] 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2 3.1 3
# [5,] 4.9 4.8 4.7 4.6 4.5 4.4 4.3 4.2 4.1 4
# [6,] 5.9 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5
# [7,] 6.9 6.8 6.7 6.6 6.5 6.4 6.3 6.2 6.1 6
# [8,] 7.9 7.8 7.7 7.6 7.5 7.4 7.3 7.2 7.1 7
# [9,] 8.9 8.8 8.7 8.6 8.5 8.4 8.3 8.2 8.1 8
# [10,] 9.9 9.8 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9
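On the question of uneven lengths: outer() does not need the two vectors to match, and the matrix can be turned back into a list of rows if that shape is preferred. A small sketch with hypothetical vectors a and b (names chosen only for this example):

```r
a <- 1:3            # length 3
b <- c(0.1, 0.2)    # length 2

# Rows follow a, columns follow b
m <- outer(a, b, `-`)
dim(m)
# [1] 3 2

# One list element per element of a, as in the loop from the question
rows <- split(m, row(m))
rows[["1"]]
# [1] 0.9 0.8
```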
This does the trick:
lapply(test,function(x) x - test1)
A vectorized approach giving you the desired list:
x = rep(test, each=length(test1))  # repeat each element of test once per element of test1
split(x- test1, x)
#$`1`
# [1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
#$`2`
# [1] 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1.0
#$`3`
# [1] 2.9 2.8 2.7 2.6 2.5 2.4 2.3 2.2 2.1 2.0
#$`4`
# [1] 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2 3.1 3.0
#$`5`
# [1] 4.9 4.8 4.7 4.6 4.5 4.4 4.3 4.2 4.1 4.0
#$`6`
# [1] 5.9 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0
#$`7`
# [1] 6.9 6.8 6.7 6.6 6.5 6.4 6.3 6.2 6.1 6.0
#$`8`
# [1] 7.9 7.8 7.7 7.6 7.5 7.4 7.3 7.2 7.1 7.0
#$`9`
# [1] 8.9 8.8 8.7 8.6 8.5 8.4 8.3 8.2 8.1 8.0
#$`10`
# [1] 9.9 9.8 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9.0
> test = c(1:10)
> test1 = seq(0.1, 1, 0.1)
> lapply(test,function(e) e-test1)
[[1]]
[1] 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0
[[2]]
[1] 1.9 1.8 1.7 1.6 1.5 1.4 1.3 1.2 1.1 1.0
[[3]]
[1] 2.9 2.8 2.7 2.6 2.5 2.4 2.3 2.2 2.1 2.0
[[4]]
[1] 3.9 3.8 3.7 3.6 3.5 3.4 3.3 3.2 3.1 3.0
[[5]]
[1] 4.9 4.8 4.7 4.6 4.5 4.4 4.3 4.2 4.1 4.0
[[6]]
[1] 5.9 5.8 5.7 5.6 5.5 5.4 5.3 5.2 5.1 5.0
[[7]]
[1] 6.9 6.8 6.7 6.6 6.5 6.4 6.3 6.2 6.1 6.0
[[8]]
[1] 7.9 7.8 7.7 7.6 7.5 7.4 7.3 7.2 7.1 7.0
[[9]]
[1] 8.9 8.8 8.7 8.6 8.5 8.4 8.3 8.2 8.1 8.0
[[10]]
[1] 9.9 9.8 9.7 9.6 9.5 9.4 9.3 9.2 9.1 9.0