I am working with readHTMLTable and am having difficulty performing calculations on the columns: when I convert a column with as.numeric, the values are replaced by their ranks.
Can anyone help?
a=readHTMLTable("http://www.nhl.com/ice/standings.htm?season=20132014&type=LEA",which=3,trim=F)
> a[,5]
[1] 54 54 52 52 51 51 46 46 46 46 43 45 42 43 39 40 38 37 38 35 37 37 38 36 36 34 35 29 29 21
Levels: 21 29 34 35 36 37 38 39 40 42 43 45 46 51 52 54
> a[,5]=as.numeric(a[,5])
> a[,5]
[1] 16 16 15 15 14 14 13 13 13 13 11 12 10 11 8 9 7 6 7 4 6 6 7 5 5 3 4 2 2 1
I would like to be able to apply functions to the values of a[,5], not the ranks. For example, mean(a[,5]) should be (54+54+52...+21)/30, not
mean(a[,5])
[1] 8.933333
The problem is that the column is a factor: as.numeric() on a factor returns the internal integer level codes, not the labels. See this post.
The canonical way to handle the problem would be as.numeric(levels(a[,5]))[a[,5]]
However, the method I often use is as.numeric(as.character(a[,5])) because it's easier to remember.
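A minimal sketch of both conversions on a small factor. The vector f is a hypothetical stand-in for a[,5], not the actual scraped column:

```r
# Hypothetical stand-in for a[,5]: a factor whose labels are numbers.
f <- factor(c("54", "54", "52", "21"))

codes  <- as.numeric(f)                 # internal level codes, not the labels
vals_a <- as.numeric(levels(f))[f]      # convert the levels once, then index by code
vals_b <- as.numeric(as.character(f))   # convert every element via its label

codes         # 3 3 2 1
vals_a        # 54 54 52 21
mean(vals_a)  # 45.25
```

Both forms give the same values; the first converts only the distinct levels, which may matter on long columns with few levels.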
Good evening,
I need to solve a location problem in R and I'm stuck in one of the first steps.
From a .txt file I need to create a distance matrix using the euclidean method.
datos <- file.choose()
servidores <- read.table(datos)
servidores
From which I obtain the following information:
X50 shows the total number of servers.
X5 shows the number of hubs required.
X120 shows the total capacity.
The first column shows the x coordinate.
The second column shows the y coordinate.
The third column shows the requirements of the node.
X50 X5 X120
1 2 62 3
2 80 25 14
3 36 88 1
4 57 23 14
5 33 17 19
6 76 43 2
7 77 85 14
8 94 6 6
9 89 11 7
10 59 72 6
11 39 82 10
12 87 24 18
13 44 76 3
14 2 83 6
15 19 43 20
16 5 27 4
17 58 72 14
18 14 50 11
19 43 18 19
20 87 7 15
21 11 56 15
22 31 16 4
23 51 94 13
24 55 13 13
25 84 57 5
26 12 2 16
27 53 33 3
28 53 10 7
29 33 32 14
30 69 67 17
31 43 5 3
32 10 75 3
33 8 26 12
34 3 1 14
35 96 22 20
36 6 48 13
37 59 22 10
38 66 69 9
39 22 50 6
40 75 21 18
41 4 81 7
42 41 97 20
43 92 34 9
44 12 64 1
45 60 84 8
46 35 100 5
47 38 2 1
48 9 9 7
49 54 59 9
50 1 58 2
I tried to use the dist() function:
distance_matrix <-dist(servidores,method = "euclidean",diag = TRUE,upper = TRUE)
but since x and y are in different columns, I am not sure what to do to get a 50x50 matrix with all the distances.
Does anybody know how I could create such a matrix?
Many thanks in advance.
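One possible approach, sketched under the assumption that the first two columns are x and y coordinates and that the third column should not enter the distance. The data frame below is a hypothetical subset (the first four rows of the table above) with made-up column names; on the real data the same idea would be servidores[, 1:2]:

```r
# Hypothetical subset: the first four rows of the table above.
servidores <- data.frame(x = c(2, 80, 36, 57),
                         y = c(62, 25, 88, 23),
                         demand = c(3, 14, 1, 14))

# Use only the coordinate columns, so demand does not enter the distance.
d <- dist(servidores[, c("x", "y")], method = "euclidean")
distance_matrix <- as.matrix(d)   # full symmetric matrix with a zero diagonal

dim(distance_matrix)   # 4 4 here; 50 50 for the full data
```

dist() treats each row as one point, so having x and y in separate columns is what it expects; as.matrix() turns the compact "dist" object into the square matrix.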
Currently, I read in a graph from an edge list as follows:
>> require(igraph) # i have igraph 1.1.0
>> g1 <- read_graph(graphname, format='ncol')
>> V(g1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 38 40 42 44 46 47 48 49 50 52 56 57 58
[50] 59 60 61 62 63 64 65 67 68 41 69 43 53 37 39 45 51 54 55 66 70
As you can see, the vertex ordering is completely wrong, even though the vertices have an incredibly basic naming convention (they are all just integers). This is a real problem, because the ordering used by igraph's get.adjacency function (which returns a 70x70 matrix) depends on the ordering of the vertices in V(g1). When I compare against some g2 with the same set of vertices, its vertices come back in a similarly nonsensical ordering (yet distinct from the one here), so the adjacency matrices across my sample of graphs are inconsistent even though every graph has the same vertex labels. Is there a way to reorder the vertices in my graph so that the resulting adjacency matrices have sensible orderings?
EDIT: note I have already tried permuting the vertices with the permute.vertices function:
>> gtest <- permute.vertices(g1, as.numeric(V(g1))) # permute vertex ids by the ordering returned by V()
>> V(gtest) # too bad it doesn't work...
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 38 40 42 44 46 47 48 49 50 52 56 57 58
[50] 59 60 61 62 63 64 65 67 68 41 69 43 53 37 39 45 51 54 55 66 70
I managed to get it working when I instead read my graph in as:
>> g1 <- read_graph(graphname, format='ncol', predef=1:70)
>> V(g1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
[50] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70
But this seems a bit ludicrous if this really is the only way to do it. Does anybody have any other suggestions?
Thanks!
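One alternative, sketched in base R only: reorder the adjacency matrix itself rather than the graph. This assumes the matrix carries the vertex names as its row and column names; the small matrix A below is a hypothetical stand-in for the 70x70 result. (The permute.vertices attempt above likely changed nothing because as.numeric(V(g1)) yields the internal ids 1..70 rather than the names, which would make it an identity permutation.)

```r
# Hypothetical 4-vertex adjacency matrix whose rows/columns are in file order.
labels <- c("1", "3", "4", "2")
A <- matrix(0, 4, 4, dimnames = list(labels, labels))
A["1", "3"] <- A["3", "1"] <- 1
A["4", "2"] <- A["2", "4"] <- 1

# Sort by the numeric value of the labels and apply it to rows and columns alike.
ord <- order(as.numeric(rownames(A)))
A_sorted <- A[ord, ord]

rownames(A_sorted)   # "1" "2" "3" "4"
```

Applying the same ord to rows and columns keeps every edge attached to the same pair of labels, so matrices from different graphs become directly comparable.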
Let's say, I created two vectors like:
Ncla = 10
CC.1 = seq(2,((Ncla *Ncla)-Ncla),(Ncla+1))
CC.2 = seq(Ncla,((Ncla *Ncla)-Ncla),(Ncla))
and, I tried to create the following sequence:
#[1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26
# 27 28 29 30 35 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
using the statement:
for(i in 1:(Ncla-1)) A.1[i]={c(seq(CC.1[i],CC.2[i],length = 1))}
but it doesn't work.
Any help is greatly appreciated.
Try
unlist(Map(seq, CC.1, CC.2))
# [1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26 27 28 29 30 35
#[26] 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
Or
unlist(sapply(seq_along(CC.1), function(i) seq(CC.1[i], CC.2[i])))
Or
A.1 <- list()
for(i in seq_along(CC.1)) A.1[[i]] <- seq(CC.1[i], CC.2[i])
unlist(A.1)
# [1] 2 3 4 5 6 7 8 9 10 13 14 15 16 17 18 19 20 24 25 26 27 28 29 30 35
#[26] 36 37 38 39 40 46 47 48 49 50 57 58 59 60 68 69 70 79 80 90
test<-NULL
for(i in 1:(Ncla-1)) {
A.1=c(seq(CC.1[i],CC.2[i],1))
test<-c(test,A.1)
}
test
Your mistake: You were not saving your results.
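A loop-free variant, offered as a sketch rather than as part of the answers above: compute the length of each run, repeat each start that many times, and add a 0-based offset from sequence().

```r
Ncla <- 10
CC.1 <- seq(2, (Ncla * Ncla) - Ncla, Ncla + 1)   # run starts
CC.2 <- seq(Ncla, (Ncla * Ncla) - Ncla, Ncla)    # run ends

n <- CC.2 - CC.1 + 1                     # length of each run
A.1 <- rep(CC.1, n) + sequence(n) - 1    # start of each run plus a 0-based offset

length(A.1)   # 45
```

This assumes every element of CC.2 is at least the matching element of CC.1, which holds for the vectors in the question.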
I have a sequence below with a pattern that does not change. I can create a vector of the values missing from this pattern (below), but I can't figure out how to produce the sequence itself as a vector. How would one create a vector that holds the sequence below but stops at a specific row (the last row in the pattern) instead of 51? Thanks
bad <- seq(1,51,by=3)
2,3,5,6,8,9,11,12,14,15,17,18,20,21,23,24,26,27,29,30,32,33,35,36,38,39,41,42,44,45,47,48,50,51
The most straightforward way I can think of is to use "recycling" of a logical vector:
(1:51)[c(FALSE, TRUE, TRUE)]
# [1] 2 3 5 6 8 9 11 12 14 15 17 18 20 21 23 24 26 27 29 30 32 33 35 36 38 39
# [27] 41 42 44 45 47 48 50 51
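To stop somewhere other than 51, the same recycling trick works with the end point in a variable. A small sketch, where n is a hypothetical stopping point:

```r
n <- 51   # hypothetical stopping point; any positive integer works
# The 3-element mask is recycled along 1:n: drop the 1st, keep the 2nd and 3rd of each triple.
keep <- (1:n)[c(FALSE, TRUE, TRUE)]

head(keep)   # 2 3 5 6 8 9
```

Logical indices are recycled silently, so n does not have to be a multiple of 3.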
> bad <- 2:51
> bad[!bad %% 3 == 1]
[1] 2 3 5 6 8 9 11 12 14 15 17 18 20 21 23 24 26 27 29 30 32 33 35 36 38 39 41 42 44 45 47 48 50 51
cumsum(rep(c(2,1), 51/3))
probably inefficient though.
a = 2:51
b = seq(1, 51, by=3)
setdiff(a,b)
I am building an App using shiny and openair to analyze wind data.
Right now the data needs to be “cleaned” before uploading by the user.
I am interested in doing this automatically.
Some of the data is empty and some of it is not numeric, so it is not possible to build a wind rose.
I want to:
1. Estimate how much of the data is not numeric
2. Cut it out and leave only numeric data
Here is an example of the data:
The "NO2.mg" column is read as a factor rather than int because it does not consist only of numbers.
OK, here is a reproducible example:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
no2
[1] 5 4 c1 54 c5 1 2 3 4 5 6 7 8 9 10 11 12 13 14
[20] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[39] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
52 Levels: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 ... c5
> as.numeric(no2)
[1] 45 34 51 46 52 1 12 23 34 45 47 48 49 50 2 3 4 5 6
[20] 7 8 9 10 11 13 14 15 16 17 18 19 20 21 22 24 25 26 27
[39] 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44
Worst R haiku ever:
Some of the data is empty,
some of is not numeric,
so it is not possible to build a wind rose.
To convert a factor to numeric, you need to convert to character first:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
no2_num <- as.numeric(as.character(no2))
#Warning message:
# NAs introduced by coercion
no2_clean <- na.omit(no2_num) #remove NAs resulting from the bad data
# [1] 5 4 54 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
# [40] 37 38 39 40 41 42 43 44 45 46 47 48 49
# attr(,"na.action")
# [1] 3 5
# attr(,"class")
# [1] "omit"
length(attr(no2_clean,"na.action"))/length(no2)*100
#[1] 3.703704
OK, this is how I did it. I am sure someone has a better way, and I'd love it if you shared it with me.
This is my data:
no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
To count the "bad" data:
sum(is.na((as.numeric(as.vector(no2)))))
And to estimate the percentage of bad data:
sum(is.na((as.numeric(as.vector(no2)))))/length(no2)*100
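Since the goal is to do this automatically inside an app, the two steps could be wrapped in one function. A minimal sketch; clean_numeric is a hypothetical helper name, not something from shiny or openair:

```r
# Hypothetical helper: returns the numeric values and the share of entries that failed to parse.
clean_numeric <- function(x) {
  num <- suppressWarnings(as.numeric(as.character(x)))  # non-numeric labels become NA
  bad <- is.na(num)
  list(values = num[!bad], pct_bad = 100 * mean(bad))
}

no2 <- factor(c(5, 4, "c1", 54, "c5", seq(2:50)))
res <- clean_numeric(no2)
res$pct_bad          # 3.703704
length(res$values)   # 52
```

Entries that are empty or already NA are counted as bad as well, since they also come out as NA after the conversion.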