I am trying to make a map using R with ggplot2, dplyr and ggswissmaps.
My problem is the following: some points have exactly the same coordinates (long/V1; lat/V2), and despite several attempts I always get the same plot, with only one point visible instead of several.
I have already tried the jitter function. Any help, or a link to a tutorial, would be appreciated.
library(ggswissmaps)
library(dplyr)
dd<-shp_df[[5]]
df<-ggplot() +
geom_polygon(data=dd, aes( x = long, y = lat, group = group),color="white") +
theme_white_f()
df <- ggplot() +geom_polygon( data = df.merge ,
aes(x = long, y = lat, group = group),
color="white", fill="grey92" )
pp<-df+ geom_point(data=merge_data,aes(x=V1, y=V2,colour=FB), size=3, alpha=0.8) +
theme_white_f()
pp
pp+geom_jitter()
Points in center of each region
cnames <- aggregate(cbind(df.merge$long, df.merge$lat) ~ df.merge$V2, data=df.merge,
FUN=function(x)mean(range(x)))
cnames$Kanton<-cnames$`df.merge$V2`
merge_data<-right_join(cnames,tbl_canton_separate,by=c("Kanton"="Cantons"))
head(df.merge)
long lat order hole piece group id KTNR GRNR AREA_HA X_MIN X_MAX Y_MIN Y_MAX
1 692429 281173 1 FALSE 1 0.1 0 1 4 172895 669244 716900 223896 283343
2 692993 280860 2 FALSE 1 0.1 0 1 4 172895 669244 716900 223896 283343
3 693163 280421 3 FALSE 1 0.1 0 1 4 172895 669244 716900 223896 283343
4 693048 280201 4 FALSE 1 0.1 0 1 4 172895 669244 716900 223896 283343
5 693243 279410 5 FALSE 1 0.1 0 1 4 172895 669244 716900 223896 283343
6 693606 278826 6 FALSE 1 0.1 0 1 4 172895 669244 716900 223896 283343
X_CNTR Y_CNTR Z_MIN Z_MAX Z_AVG Z_MED V1 V2 V3
1 691800 252000 331 1290 533 504 1 ZH Zürich
2 691800 252000 331 1290 533 504 1 ZH Zürich
3 691800 252000 331 1290 533 504 1 ZH Zürich
4 691800 252000 331 1290 533 504 1 ZH Zürich
5 691800 252000 331 1290 533 504 1 ZH Zürich
6 691800 252000 331 1290 533 504 1 ZH Zürich
head(merge_data[,c(1:5,8)])
df.merge$V2 V1 V2 Kanton Numéro FB
1 AG 648725 247936 AG PGV01.002 II
2 AG 648725 247936 AG PGV01.005 I
3 AG 648725 247936 AG PGV01.044 II
4 AG 648725 247936 AG PGV01.047 II
5 AG 648725 247936 AG PGV01.071 IV
6 AG 648725 247936 AG PGV02.015 IV
Increasing the spread of the points with explicit width and height values works:
df + geom_jitter(data = merge_data, aes(x=V1, y=V2, colour = FB),
size=2, width = 10000, height = 10000, alpha = 0.5) + theme_white_f()
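To see what a jitter of this size does to records that share one coordinate pair, here is a minimal sketch in base R. It uses `jitter()` as a stand-in for `geom_jitter()`, and the `pts` data frame is a hypothetical three-row excerpt shaped like `merge_data`. In `geom_jitter()`, `width` and `height` are applied in both the positive and negative direction, which `amount` mimics here.

```r
set.seed(1)  # make the random offsets reproducible

# Hypothetical stand-in for merge_data: three records at the same canton centre
pts <- data.frame(
  V1 = rep(648725, 3),
  V2 = rep(247936, 3),
  FB = c("II", "I", "IV")
)

# Add a uniform random offset in [-10000, 10000] to each coordinate
pts$V1_j <- jitter(pts$V1, amount = 10000)
pts$V2_j <- jitter(pts$V2, amount = 10000)

pts
```

After jittering, the three rows no longer overlap, so each one would be drawn as its own point.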
When I run
table(df$strategy.x)
0 1 2 3
70 514 223 209
table(df$strategy.y)
0 1 2 3
729 24 7 4
I want to create a variable with both of these combined. I tried this
df <- df %>%
mutate(nstrategy1 = ifelse(strategy.x==1| strategy.y==1 , 1, 0))
table(df$nstrategy1)
0 1
399 519
I expected to get 514 + 24 = 538, but I got 519 instead.
df <- df %>% mutate(nstrategy2 = ifelse(strategy.x==2| strategy.y==2 , 1, 0))
table(df$nstrategy2)
0 1
578 228
Similarly, I am supposed to get 223 + 7 = 230, but I got 228 instead
Is there a good way to merge both strategy.x and strategy.y and end up with a table like the following with 4 categories?
0 1 2 3
799 538 230 213
table(mtcars$am) # 13 1's
table(mtcars$vs) # 14 1's
mtcars$ones = ifelse(mtcars$am == 1 | mtcars$vs == 1, 1, 0)
table(mtcars$ones) # 20 1's < 13 + 14 = 27
Why does it show only 20 1's instead of 27? Because there are 7 + 6 + 7 = 20 cars with a 1 in am, in vs, or in both. There are 13 with am==1 (6+7) and 14 with vs==1 (7+7). Seven cars are in the bottom-right cell because they have 1's in both variables, and adding 13 + 14 counts those seven twice.
table(mtcars$am, mtcars$vs)
# 0 1
# 0 12 7
# 1 6 7
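The counting argument above is inclusion-exclusion: the number of rows with a 1 in either column equals the two single counts minus the overlap. A short check on the same mtcars columns (the variable names are only for illustration):

```r
am1 <- mtcars$am == 1
vs1 <- mtcars$vs == 1

either <- sum(am1 | vs1)   # rows with a 1 in am, vs, or both
both   <- sum(am1 & vs1)   # rows with a 1 in both columns

# |A or B| = |A| + |B| - |A and B|
sum(am1) + sum(vs1) - both
either
```

Both of the last two expressions give the same count, which is why the `ifelse(... | ...)` total is smaller than the plain sum of the two tables.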
The simplest way to get the sum of the two results would be by adding the two table objects:
table(mtcars$am) + table(mtcars$vs)
# 0 1
# 37 27
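Adding two tables only lines up correctly when both have the same categories in the same order. A minimal sketch for the four-category case in the question, assuming hypothetical toy vectors in place of `df$strategy.x` and `df$strategy.y`, fixes the levels first so that a category missing from one column still gets a zero:

```r
# Hypothetical toy data standing in for df$strategy.x and df$strategy.y
strategy_x <- c(0, 1, 1, 2, 3, 1)
strategy_y <- c(0, 0, 1, 0, 2, 0)

lv <- 0:3  # the four categories, fixed so both tables have the same shape

combined <- table(factor(strategy_x, levels = lv)) +
  table(factor(strategy_y, levels = lv))
combined

# Equivalent: stack both columns and tabulate once
stacked <- table(factor(c(strategy_x, strategy_y), levels = lv))
stacked
```

Note that `table()` drops `NA` values by default, so rows that are missing in one column contribute only through the other.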
This seems like a basic question, but I may simply not know how to word it well enough to find the answer by searching.
Here is a sample of the data:
id2 sbp1 dbp1 age1 sbp2 dbp2 sex bmi1 bmi2 smoke drink exercise
1 1 134.5 89.5 40 146 84 2 21.74685 22.19658 1 0 1
2 4 128.5 89.5 48 125 70 1 24.61942 22.29476 1 0 0
3 5 105.5 64.5 42 121 80 2 22.15103 26.90204 1 0 0
4 8 116.5 79.5 39 107 72 2 21.08032 27.64403 0 0 1
5 9 106.5 73.5 26 132 81 2 21.26762 29.16131 0 0 0
6 10 120.5 81.5 34 130 85 1 24.91663 26.89427 1 1 0
I have this code here for a function I am making:
linreg.ols<- function(indat, dv, p1, p2, p3){
data<- read.csv(file= indat, header=T)
data[1:5,]
y<- data$dv
x <- as.matrix(data.frame(x0=rep(1,nrow(data)), x1=data$p1, x2=data$p2,
x3=data$p3))
inv<- solve(t(x)%*%x)
xy<- t(x)%*%y
betah<- inv%*%xy
print("Value of beta hat")
betah
}
And when I run my code with this line:
linreg.ols("bp.csv",sbp1,smoke,drink,exercise)
I get the following error:
Error in data.frame(x0 = rep(1, nrow(data)), x1 = data$p1, x2 = data$p2, :
arguments imply differing number of rows: 75, 0
I have a feeling that it's because of how I am extracting the p1, p2, and p3 columns on the line where I create the x variable.
EDIT: changed to y<-data$dv
EDIT: added on part of the sample. Also, I tried:
x <- as.matrix(data.frame(1,data[,c("p1","p2","p3")]))
But that returned the error:
Error in `[.data.frame`(data, , c("p1", "p2", "p3")) : undefined columns selected
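Both errors have the same cause: `data$dv` and `data$p1` look for columns literally named "dv" and "p1" and return NULL, and `c("p1","p2","p3")` selects literal names too. One possible fix, sketched below under stated assumptions, is to pass the column names as strings and extract them with `[[`. To stay self-contained, this hypothetical `linreg_ols()` takes a data frame instead of reading a CSV, and it uses the built-in mtcars data rather than bp.csv.

```r
# Hypothetical variant of linreg.ols: column names are passed as strings
linreg_ols <- function(data, dv, p1, p2, p3) {
  y <- data[[dv]]                      # [[ ]] looks up the name stored in dv
  x <- cbind(x0 = 1,                   # intercept column, recycled to full length
             x1 = data[[p1]],
             x2 = data[[p2]],
             x3 = data[[p3]])
  # Solve the normal equations (X'X) b = X'y
  solve(t(x) %*% x, t(x) %*% y)
}

betah <- linreg_ols(mtcars, "mpg", "wt", "hp", "qsec")
betah

# Cross-check against lm()
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coef(fit)
```

With the original file, the call would then look like `linreg_ols(read.csv("bp.csv"), "sbp1", "smoke", "drink", "exercise")`.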
I have been trying to plot some data on a map, with the circle for each observation sized according to a scale.
However, the plot produced does not reflect the scale.
See the following.
This is the code which I have tried.
newmap <- get_map(location = c(lon = 82.5,lat = 24),zoom=4, color="bw")
ggmap(newmap, extent = "normal") +
geom_point(aes(x = lon, y = lat, colour = scale , size = scale), data = final_data)
I got the code from the following post.
My data looks like this.
> final_data
lon lat disab scale
1 74.79728 34.083671 27832 1
2 74.87226 31.633979 28119 1
3 75.85728 30.900965 34830 1
4 77.31779 28.408912 33579 1
5 77.10249 28.704059 228427 6
6 75.78727 26.912434 74541 2
7 73.02431 26.238947 24898 1
8 75.86475 25.213816 20843 1
9 77.70641 28.984462 27864 1
10 77.45376 28.669156 84458 2
11 78.00807 27.176670 54382 2
12 80.94617 26.846694 77684 2
13 80.33187 26.449923 81988 2
14 81.84631 25.435801 37750 1
15 82.97391 25.317645 39408 2
16 85.13756 25.594095 68869 2
17 86.95240 23.673945 24627 1
18 88.36390 22.572646 342319 8
19 86.43039 23.795653 28865 1
20 86.20288 22.804566 20766 1
21 85.30956 23.344100 22957 1
22 81.28492 21.190449 22061 1
23 81.62964 21.251384 25868 1
24 78.18283 26.218287 18434 1
25 75.85773 22.719569 56279 2
26 77.41262 23.259933 73219 2
27 79.98641 23.181467 32597 1
28 72.57136 23.022505 188917 5
29 70.80216 22.303894 20219 1
30 73.18122 22.307159 47587 2
31 72.83106 21.170240 55055 2
32 75.34331 19.876165 36205 1
33 79.08815 21.145800 63969 2
34 73.78980 19.997453 26572 1
35 72.83973 19.391928 37382 1
36 72.81771 19.003050 484688 11
37 73.85674 18.520430 127858 3
38 78.48667 17.385044 294072 7
39 80.64802 16.506174 40592 2
40 NA NA 53865 2
41 77.61586 12.941483 251561 6
42 75.37037 11.874477 33907 1
43 75.78041 11.258753 51981 2
44 76.07400 11.073182 31863 1
45 76.21443 10.527642 38573 2
46 76.26730 9.931233 41432 2
47 76.61414 8.893212 23403 1
48 76.93664 8.524139 39024 2
49 80.27072 13.082680 163428 4
50 78.70467 10.790483 14489 1
51 78.11978 9.925201 19890 1
52 76.95583 11.016844 32794 1
It would be a ton of help if someone could help me figure out the problem.
Thanks in advance.
It does not work because there is no myscale column in your data frame, final_data.
Change myscale to scale:
newmap <- get_map(location = c(lon = 82.5,lat = 24),zoom=4, color="bw")
ggmap(newmap, extent = "normal") +
geom_point(aes(x = lon, y = lat, colour = scale , size = scale), data = final_data)
I have some data which I have collected.
It consists of Vertices and then Triangles which I have made using a meshing software.
I am able to use R with
trimesh(triangles, vertices)
to make a nice mesh plot.
But I can't figure out how to use RGL to make an interactive plot that I can view, and I can't work out how to colour the faces of the mesh based on another value in the data frame.
Here are the vertices in a data frame; x, y and z are the coordinates of the nodes/points (nn).
'data.frame': 23796 obs. of 7 variables:
$ nn : int 0 1 2 3 4 5 6 7 8 9 ...
$ x : num 39.5 70.8 49 83.5 -16 ...
$ y : num 28.2 -2.97 -25.67 -9.1 -39.75 ...
$ z: num 160 158 109 121 188 ...
$ uni: num 3.87 6.64 5.02 4.48 1.91 ...
$ bi : num 0.749 0.784 1.045 0.935 0.733 ...
nn x y z uni bi
0 39.527 28.202 160.219 3.86942 0.74871
1 70.804 -2.966 157.578 6.64361 0.78373
2 48.982 -25.674 109.022 5.02491 1.0451
3 83.514 -9.096 120.988 4.47977 0.9348
4 -16.04 -39.749 188.467 1.90873 0.73286
5 74.526 -3.096 174.347 8.4263 0.70594
6 54.93 -56.347 151.496 7.53334 2.17128
7 56.936 -20.131 186.177 7.16118 1.44875
8 -14.627 -47.1 162.185 2.13939 0.70887
9 38.207 -59.201 147.993 5.83457 4.32971
10 50.645 -32.04 110.418 5.3741 1.14543
The triangles for the vertices are
'data.frame': 47602 obs. of 7 variables:
$ X : int 3435 3161 18424 13600 1564 21598 21283 1171 51 9331 ...
$ Y : int 19658 17204 17467 19721 10099 19018 11341 2723 15729 5851 ...
$ Z : int 2764 9466 16955 2669 10091 21205 18399 20833 15865 9106 ...
X Y Z
3435 19658 2764
3161 17204 9466
18424 17467 16955
13600 19721 2669
1564 10099 10091
21598 19018 21205
21283 11341 18399
1171 2723 20833
51 15729 15865
9331 5851 9106
310 3513 9121
5651 11928 15468
8594 2295 6852
22725 22636 11114
I need to build the same mesh that trimesh gives me, but with RGL, and I need to colour the faces of the mesh based on the value of uni, where <0.5 is red, 0.5-1.5 is orange and >1.5 is green.
It looks something like this in trimesh, but how do I do it in RGL for R, with the colouring based on the value of uni in the first data table?
Here is an example, starting with two dataframes.
> library(rgl)
> vertices
x y z
1 1 -1 1
2 1 -1 -1
3 1 1 -1
4 1 1 1
5 -1 -1 1
6 -1 -1 -1
7 -1 1 -1
8 -1 1 1
> triangles
T1 T2 T3 T4 T5 T6 T7 T8 T9 T10 T11 T12
1 5 5 1 1 2 2 6 6 8 8 1 1
2 1 4 2 3 6 7 5 8 4 3 6 5
3 4 8 3 4 7 3 8 7 3 7 2 6
You need matrices to deal with tmesh3d. A row of 1's must be added to the table of vertices.
verts <- rbind(t(as.matrix(vertices)),1)
trgls <- as.matrix(triangles)
tmesh <- tmesh3d(verts, trgls)
Now you can plot the mesh:
wire3d(tmesh)
About colors, you have to associate one color to each triangle:
tmesh$material <- list(color=rainbow(ncol(trgls)))
wire3d(tmesh)
shade3d(tmesh)
UPDATE 2019-03-09
The newest version of rgl (0.100.18) allows different interpretation of the material colors.
You can assign a color to each face:
vertices <- as.matrix(vertices)
triangles <- as.matrix(triangles)
mesh1 <- tmesh3d(
vertices = t(vertices),
indices = triangles,
homogeneous = FALSE,
material = list(color = rainbow(ncol(triangles)))
)
shade3d(mesh1, meshColor = "faces")
or assign a color to each vertex:
mesh2 <- tmesh3d(
vertices = t(vertices),
indices = triangles,
homogeneous = FALSE,
material = list(color = rainbow(nrow(vertices)))
)
shade3d(mesh2, meshColor = "vertices")
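The examples above colour by `rainbow()`. For the colouring asked about in the question (red below 0.5, orange from 0.5 to 1.5, green above 1.5), one possible approach is to derive one colour per triangle from the vertex values. The sketch below uses a hypothetical toy mesh and assumes a face's value is the mean of `uni` over its three vertices. That averaging rule is an assumption, not something stated in the question. It also assumes 1-based vertex indices. If the triangle table refers to the 0-based `nn` ids, add 1 to it first.

```r
# Hypothetical toy mesh: 5 vertices with a per-vertex value `uni`
vertices <- data.frame(
  x = c(0, 1, 0, 1, 2), y = c(0, 0, 1, 1, 1), z = c(0, 0, 0, 1, 1),
  uni = c(0.2, 0.4, 0.3, 2.5, 3.0)
)
# One triangle per row, 1-based vertex indices
triangles <- data.frame(X = c(1, 2, 2), Y = c(2, 4, 4), Z = c(3, 3, 5))

idx <- t(as.matrix(triangles))  # 3 x n_triangles, the layout tmesh3d expects

# Assumption: a face's value is the mean of uni over its three vertices
face_uni <- colMeans(matrix(vertices$uni[idx], nrow = 3))

# Map the face values to the three colour bands from the question
face_col <- as.character(cut(face_uni,
                             breaks = c(-Inf, 0.5, 1.5, Inf),
                             labels = c("red", "orange", "green")))
face_col

# With rgl installed, following the per-face pattern shown above:
# mesh <- rgl::tmesh3d(vertices = t(as.matrix(vertices[, c("x", "y", "z")])),
#                      indices = idx, homogeneous = FALSE,
#                      material = list(color = face_col))
# rgl::shade3d(mesh, meshColor = "faces")
```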
In previous versions of R I could combine factor levels that didn't have a "significant" threshold of volume using the following little function:
whittle = function(data, cutoff_val){
#convert to a data frame
tab = as.data.frame.table(table(data))
#returns vector of indices where value is below cutoff_val
idx = which(tab$Freq < cutoff_val)
levels(data)[idx] = "Other"
return(data)
}
This takes in a factor vector, looks for levels that don't appear "often enough" and combines all of those levels into one "Other" factor level. An example of this is as follows:
> sort(table(data$State))
05 27 35 40 54 84 9 AP AU BE BI DI G GP GU GZ HN HR JA JM KE KU L LD LI MH NA
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
OU P PL RM SR TB TP TW U VD VI VS WS X ZH 47 BL BS DL M MB NB RP TU 11 DU KA
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 3 3 3
BW ND NS WY AK SD 13 QC 01 BC MT AB HE ID J NO LN NM ON NE VT UT IA MS AO AR ME
4 4 4 4 5 5 6 6 7 7 7 8 8 8 9 10 11 17 23 26 26 30 31 31 38 40 44
OR KS HI NV WI OK KY IN WV AL CO WA MN NH MO SC LA TN AZ IL NC MI GA OH ** CT DE
45 47 48 57 57 64 106 108 112 113 120 125 131 131 135 138 198 200 233 492 511 579 645 646 840 873 1432
RI DC TX MA FL VA MD CA NJ PA NY
1782 2513 6992 7027 10527 11016 11836 12221 15485 16359 34045
Now when I use whittle it returns me the following message:
> delete = whittle(data$State, 1000)
Warning message:
In `levels<-`(`*tmp*`, value = c("Other", "Other", "Other", "Other", :
duplicated levels in factors are deprecated
How can I modify my function so that it has the same effect but doesn't rely on these "deprecated" duplicated factor levels? Should I convert to character, tabulate, replace the rare values with the string "Other", and convert back to a factor?
I've always found it easiest (less typing and less headache) to convert to character and back for these sorts of operations. Keeping with your as.data.frame.table and using replace to do the replacement of the low-frequency levels:
whittle <- function(data, cutoff_val) {
tab = as.data.frame.table(table(data))
factor(replace(as.character(data), data %in% tab$data[tab$Freq < cutoff_val], "Other"))
}
Testing on some sample data:
state <- factor(c("MD", "MD", "MD", "VA", "TX"))
whittle(state, 2)
# [1] MD MD MD Other Other
# Levels: MD Other
I think this version should work. The levels<- function allows you to collapse levels by assigning a list (see ?levels).
whittle <- function(data, cutoff_val){
tab <- table(data)
shouldmerge <- tab < cutoff_val
tokeep <- names(tab)[!shouldmerge]
tomerge <- names(tab)[shouldmerge]
nv <- c(as.list(setNames(tokeep,tokeep)), list("Other"=tomerge))
levels(data)<-nv
return(data)
}
And we test it with
set.seed(15)
x<-factor(c(sample(letters[1:10], 100, replace=T), sample(letters[11:13], 10, replace=T)))
table(x)
# x
# a b c d e f g h i j k l m
# 5 11 8 8 7 5 13 14 14 15 2 3 5
y <- whittle(x, 9)
table(y)
# y
# b g h i j Other
# 11 13 14 14 15 43
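The list form of `levels<-` that this version relies on can be seen in isolation with a small factor (the data here is made up for illustration). Each list name becomes a new level, and the character vector under it lists the old levels that are merged into it:

```r
f <- factor(c("a", "b", "c", "a", "d"))

# Keep "a" as is; merge "b", "c" and "d" into a single "Other" level
levels(f) <- list(a = "a", Other = c("b", "c", "d"))

f
table(f)
```

Because every new level name appears only once in the list, no duplicated levels are created and the deprecation warning does not occur.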
It's worth adding to this answer that the new forcats package contains the fct_lump() function which is dedicated to this.
Using @MrFlick's data:
x <- factor(c(sample(letters[1:10], 100, replace=T),
sample(letters[11:13], 10, replace=T)))
library(forcats)
library(magrittr) ## for %>% ; could also load dplyr
fct_lump(x, n=5) %>% table
# b g h i j Other
#11 13 14 14 15 43
The n argument specifies the number of most common values to preserve.
Here's another way of doing it by replacing all the items below the threshold with the first and then renaming that level to Other.
whittle <- function(x, thresh) {
belowThresh <- names(which(table(x) < thresh))
x[x %in% belowThresh] <- belowThresh[1]
levels(x)[levels(x) == belowThresh[1]] <- "Other"
factor(x)
}