Plotting great circles from a subset in R

I have a data frame that, after some processing (geocoding, for example), has the following structure:
'data.frame': 13 obs. of 5 variables:
$ id : int 1 2 3 4 5 6 7 8 9 10 ...
$ ciudad : Factor w/ 10 levels "Auch","Barcelona",..: 8 4 5 3 2 7 9 10 6 6 ...
$ proyecto: int 1 1 1 1 1 1 1 1 2 2 ...
$ lon : num -1.131 0.564 -9.139 0.627 2.173 ...
$ lat : num 38 44.2 38.7 44.5 41.4 ...
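Every project (proyecto) has a list of cities, and I need to connect the first of them radially with the others in the same project. This is what I have done so far:
library(stringi) # stri_trans_totitle()
library(ggmap) # geocode() (assumed to come from ggmap)
library(maps) # map()
# Capitalizing first letters
municipios <- read.csv("ciudades.csv", header=TRUE, sep=";")
# assign the result back, otherwise the capitalized names are discarded
municipios$ciudad <- stri_trans_totitle(as.character(municipios$ciudad))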
write.csv(municipios, file = "municipios.csv")
# Obtaining latitude & longitude
lonlat <- geocode(as.character(municipios$ciudad))
municipios_lonlat <- cbind(municipios, lonlat)
write.csv(municipios_lonlat, file = "municipios_lonlat.csv")
str(municipios_lonlat)
# Plotting a simple map
xlim <- c(-13.08, 8.68)
ylim <- c(34.87, 49.50)
map("world", col="#191919", fill=TRUE, bg="#000000", lwd=0.05, xlim=xlim, ylim=ylim)
# Plotting cities
symbols(municipios_lonlat$lon, municipios_lonlat$lat, bg="#e2373f", fg="#ffffff", lwd=0.5, circles=rep(1, length(municipios_lonlat$lon)), inches=0.05, add=TRUE)
# Subsetting, splitting & connecting
uniq <- unique(municipios_lonlat$proyecto)
for (i in seq_along(uniq)) {
  data_1 <- subset(municipios_lonlat, proyecto == uniq[i])
  # connect the first city of the project to each of the others;
  # a separate index j avoids overwriting the outer loop variable i
  for (j in seq_len(nrow(data_1))[-1]) {
    lngs <- c(data_1$lon[1], data_1$lon[j])
    lats <- c(data_1$lat[1], data_1$lat[j])
    lines(lngs, lats, col="#e2373f", lwd=2)
  }
}
But it does not look quite real, so I need to use great circles to improve the resulting map. I know I have to use the geosphere library with a loop similar to the one above, but the things I tried did not work. Could you please help me? You are my only hope, Obi-Wan Kenobis.
Note: here you can download my data.
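A minimal sketch of the great-circle version of that loop, assuming the geosphere package is installed. The `cities` data frame is a hypothetical stand-in for municipios_lonlat: the coordinates are the first five shown in the str() output, and the project assignment is made up for illustration. gcIntermediate() returns a matrix of intermediate lon/lat points along the great circle between two points, which lines() can draw directly.

```r
library(geosphere)  # provides gcIntermediate()

# Hypothetical stand-in for municipios_lonlat (project numbers are invented)
cities <- data.frame(
  proyecto = c(1, 1, 1, 2, 2),
  lon      = c(-1.131, 0.564, -9.139, 0.627, 2.173),
  lat      = c(38.0, 44.2, 38.7, 44.5, 41.4)
)

# For each project, build one arc from its first city to each other city
arcs <- list()
for (p in unique(cities$proyecto)) {
  grp <- cities[cities$proyecto == p, ]
  if (nrow(grp) < 2) next                 # nothing to connect
  hub <- c(grp$lon[1], grp$lat[1])
  for (j in 2:nrow(grp)) {
    arcs[[length(arcs) + 1]] <- gcIntermediate(
      hub, c(grp$lon[j], grp$lat[j]),
      n = 50, addStartEnd = TRUE          # 50 interior points plus both ends
    )
  }
}

# Draw the cities and the arcs (on the real map, call lines() after map())
plot(cities$lon, cities$lat, pch = 19, col = "#e2373f", xlab = "lon", ylab = "lat")
for (arc in arcs) lines(arc, col = "#e2373f", lwd = 2)
```

Over distances as short as those inside the map window above, the arcs will be only slightly curved; the difference from straight lines grows with distance.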

Related

GGPLOT: Printing Stacked Bar Chart & Line to File

I know that it might not look like it from this question, but I've actually been programming for over 20 years, but I'm new to R. I'm trying to move away from Excel and to automate creation of about 100 charts I currently do in Excel by hand. I've asked two previous questions about this: here and here. Those solutions work for those toy examples, but when I try the exact same code on my own full program, they behave very differently and I'm completely befuddled as to why. When I run the program below, the testplot.png file is just a plot of the line, without the stacked bar chart.
So here is my (full) code as cut down as I can make it. If anyone wants to critique my programming, go ahead. I know that the comments are light, but that's to try to shorten it for this post. Also, this does actually download the USDA PSD database which is about 20MB compressed and is 170MB uncompressed...sorry but I would love someone's help on this!
Edit, here are str() outputs of both 'full' data and 'toy' data. The toy data works, the full data doesn't.
> str(melteddata)
Classes ‘data.table’ and 'data.frame': 18 obs. of 3 variables:
$ Year : int 1 2 3 4 5 6 1 2 3 4 ...
$ variable: Factor w/ 3 levels "stocks","exports",..: 1 1 1 1 1 1 2 2 2 2 ...
$ Qty : num 2 4 3 2 4 3 4 8 6 4 ...
- attr(*, ".internal.selfref")=<externalptr>
> str(SoySUHist)
Classes ‘data.table’ and 'data.frame': 159 obs. of 3 variables:
$ Year : int 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 ...
$ variable: Factor w/ 3 levels "Stocks","DomCons",..: 1 1 1 1 1 1 1 1 1 1 ...
$ Qty : num 0.0297 0.0356 0.0901 0.1663 0.3268 ...
- attr(*, ".internal.selfref")=<externalptr>
> str(linedata)
Classes ‘data.table’ and 'data.frame': 6 obs. of 2 variables:
$ Year: int 1 2 3 4 5 6
$ Qty : num 15 16 15 16 15 16
- attr(*, ".internal.selfref")=<externalptr>
> str(SoyProd)
Classes ‘data.table’ and 'data.frame': 53 obs. of 2 variables:
$ Year: int 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 ...
$ Qty : num 701 846 928 976 1107 ...
- attr(*, ".internal.selfref")=<externalptr>
>
library(data.table)
library(ggplot2)
library(ggthemes)
library(plyr)
toyplot <- function(plotdata,linedata){
plotCExp <- ggplot(plotdata) +
geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity") +
geom_line(data=linedata, aes(x=Year,y=Qty)) # <---- comment out this line & the stack plot works
ggsave(plotCExp,filename = "ggsavetest.png", width=7, height=5, units="in")
}
convertto <- function(value,crop,unit='BU'){
if (unit=='BU' & ( crop=='WHEAT' | crop=='SOYBEANS')){
value = value * 36.7437
}
return(value)
}
# =====================================
# Download Data (Warning...large download!)
# =====================================
system("curl https://apps.fas.usda.gov/psdonline/download/psd_alldata_csv.zip | funzip > DATA/psd.csv")
tmp <- fread("DATA/psd.csv")
PSD = data.table(tmp)
rm(tmp)
setkey(PSD,Country_Code,Commodity_Code,Attribute_ID)
tmp=unique(PSD[,.(Commodity_Description,Attribute_Description,Commodity_Code,Attribute_ID)])
tmp[order(Commodity_Description)]
names(PSD)[names(PSD) == "Market_Year"] = "Year"
names(PSD)[names(PSD) == "Value"] = "Qty"
PSDCmdtyAtt = unique(PSD[,.(Commodity_Code,Attribute_ID)])
# Soybean Production, Consumption, Stocks/Use
SoyStocks = PSD[list("US",2222000,176),.(Year,Qty)] # Ending Stocks
SoyExp = PSD[list("US",2222000,88),.(Year,Qty)] # Exports
SoyProd = PSD[list("US",2222000,28),.(Year,Qty)] # Total Production
SoyDmCons = PSD[list("US",2222000,125),.(Year,Qty)] # Total Dom Consumption
SoyStocks$Qty = convertto(SoyStocks$Qty,"SOYBEANS","BU")/1000
SoyExp$Qty = convertto(SoyExp$Qty,"SOYBEANS","BU")/1000
SoyProd$Qty = convertto(SoyProd$Qty,"SOYBEANS","BU")/1000
SoyDmCons$Qty = convertto(SoyDmCons$Qty,"SOYBEANS","BU")/1000
# Stocks/Use
SoySUPlot <- SoyExp
names(SoySUPlot)[names(SoySUPlot) == "Qty"] = "Exports"
SoySUPlot$DomCons = SoyDmCons$Qty
SoySUPlot$Stocks = SoyStocks$Qty
SoySUHist <- melt(SoySUPlot,id.vars="Year")
SoySUHist$Qty = SoySUHist$value/1000
SoySUHist$value <- NULL
SoySUPlot$StocksUse = 100*SoySUPlot$Stocks/(SoySUPlot$DomCons+SoySUPlot$Exports)
SoySUPlot$Production = SoyProd$Qty/1000
SoySUHist$variable <- factor(SoySUHist$variable, levels = rev(levels(SoySUHist$variable)))
SoySUHist = arrange(SoySUHist,variable)
toyplot(SoySUHist,SoyProd)
All right, I'm feeling generous. Your example code contains a lot of fluff that should not be in a minimal reproducible example and your system call is not portable, but I had a look anyway.
The good news: Your code works as expected.
Let's plot only the bars:
ggplot(SoySUHist) +
geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity")
Now only the lines:
ggplot(SoySUHist) +
geom_line(data=SoyProd, aes(x=Year,y=Qty))
Now compare the scales of the y-axes. If you plot both together, the bars get plotted, but they are so small that you can't see them. You need to rescale:
ggplot(SoySUHist) +
geom_bar(aes(x=Year,y=Qty,factor=variable,fill=variable), stat="identity") +
geom_line(data=SoyProd, aes(x=Year,y=Qty/1000))
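The scale mismatch can also be checked numerically before plotting. This is only a sketch with made-up toy numbers (`bars` and `line` are hypothetical stand-ins for SoySUHist and SoyProd): it compares the tallest stacked bar with the largest line value.

```r
# Hypothetical toy data in the same long format as SoySUHist / SoyProd
bars <- data.frame(
  Year     = rep(2001:2003, times = 2),
  variable = rep(c("Stocks", "Exports"), each = 3),
  Qty      = c(0.20, 0.30, 0.25, 1.00, 1.10, 1.20)
)
line <- data.frame(Year = 2001:2003, Qty = c(2500, 2700, 2900))

# Height of each stacked bar = sum of its segments per year
stack_top <- tapply(bars$Qty, bars$Year, sum)

# How many times larger the line is than the tallest bar
ratio <- max(line$Qty) / max(stack_top)
print(ratio)
```

A ratio in the hundreds or thousands means the bars will be squashed against the x-axis once both layers share one y-axis, which is the symptom described above.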

merge data frames "not a slot in class data.frame"

I use the book "A practical guide to geostatistical mapping" from T. Hengl, which also offers the code to reproduce the results. Unfortunately, loads of the code contained is deprecated or even defunct. I was able to restore most of the code, but now I'm stuck with something seemingly simple: merging two data frames. My error:
Error in (function (cl, name, valueClass) : ‘data’ is not a slot in class “data.frame”
Here the code to reproduce that error:
library(gstat)
library(rgdal)
library(sp)
# load the data:
data(meuse)
coordinates(meuse) <- ~x+y
proj4string(meuse) <- CRS("+init=epsg:28992")
download.file("http://spatial-analyst.net/book/system/files/meuse.zip", destfile=paste(getwd(), "meuse.zip", sep="/"))
grid.list <- c("ahn.asc", "dist.asc", "ffreq.asc", "soil.asc")
# unzip the maps in a loop:
for(j in grid.list){
fname <- unzip("meuse.zip", file=j)
print(fname)
file.copy(fname, paste("./", j, sep=""), overwrite=FALSE)
}
# load grids to R:
meuse.grid <- readGDAL(grid.list[1])
# fix the layer name:
names(meuse.grid)[1] <- sub(".asc", "", grid.list[1])
for(i in grid.list[-1]) {
meuse.grid@data[sub(".asc", "", i[1])] <- readGDAL(paste(i))$band1
}
names(meuse.grid)
proj4string(meuse.grid) <- CRS("+init=epsg:28992")
meuse.ov <- over(meuse, meuse.grid)
str(meuse.ov)
meuse.data <- meuse[c("zinc", "lime")]@data
str(meuse.data)
meuse.ov@data <- merge(meuse.ov, meuse.data)
This is really confusing, as both data frames (meuse.ov and meuse.data) seem identical in their structure:
> str(meuse.ov)
'data.frame': 155 obs. of 4 variables:
$ ahn : int 3214 3402 3277 3563 3406 3355 3428 3476 3522 3525 ...
$ dist : num 0.00136 0.01222 0.10303 0.19009 0.27709 ...
$ ffreq: int 1 1 1 1 1 1 1 1 1 1 ...
$ soil : int 1 1 1 2 2 2 2 1 1 2 ...
and
> str(meuse.data)
'data.frame': 155 obs. of 2 variables:
$ zinc: num 1022 1141 640 257 269 ...
$ lime: Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
I tried to resolve this by looking things up on Stack Overflow, but nothing worked. The (non-working) legacy code in the book suggested this, in case it helps to understand what I am after:
meuse.ov <- overlay(meuse.grid, meuse)
meuse.ov@data <- cbind(meuse.ov@data, meuse[c("zinc", "lime")]@data)
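A minimal sketch of where the error message comes from, assuming (as the str() output above indicates) that over() returned a plain data.frame rather than a Spatial*DataFrame object. The two small data frames are made-up stand-ins for meuse.ov and meuse.data. A plain data.frame has no `data` slot, so assigning to one fails, while column-binding the two frames directly works.

```r
# Hypothetical stand-ins for meuse.ov and meuse.data
ov    <- data.frame(ahn = c(3214L, 3402L), dist = c(0.00136, 0.01222))
extra <- data.frame(zinc = c(1022, 1141), lime = factor(c("1", "1")))

# Assigning to a slot of a plain data.frame raises the "not a slot" error
err <- tryCatch({ ov@data <- extra; NULL },
                error = function(e) conditionMessage(e))
print(err)

# Binding columns directly, with no slot access on either side
combined <- cbind(ov, extra)
str(combined)
```

cbind() assumes the rows of both frames are already in the same order, which is what the legacy line relied on. merge() on two frames with no column names in common returns every combination of rows, so it is probably not what is wanted here.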

Scatterplots in R using lattice and cloud, how to determine colors by factors?

I am still struggling with R plots and colors -- some results are as I expected, some not.
I have a 2-million-point data set generated by a simulation process. It has several variables, but I am interested in three of them and in a factor that describes the class of each data point.
Here is a short snippet of code that reads the points and get some basic statistics on it:
library(lattice)
library(plyr)
myData <- read.table("dados - b1000 n10000 var 0,2 - MAX40.txt",
col.names=c("Class","Thet1Thet2","Thet3Thet2","Thet3Thet1",
"K12","K23","delta","w_1","w_2","w_3"))
count (myData$Class)
That gives me
## x freq
## 1 A 8030
## 2 B 17247
## 3 C 4999
## 4 D 16495
## 5 E 1949884
## 6 N 3345
(the input file is quite large, cannot add it as a link)
I want to see these points in a scatterplot matrix, so I use the code
colors=c("red","green","blue","cyan","magenta","yellow")
# Let's try with a very small dot size, see if we can visualize the inners of the cube.
cloud(myData$delta ~ myData$K12 + myData$K23, xlab="K12", ylab="K23", zlab="delta",
cex=0.001,main="All Classes",col.point = colors[myData$Class])
Here is the result. As expected, points from class E are the vast majority, so I cannot see points of other classes. The problem is that I expected the points to be plotted in magenta (classes are A, B, C, D, E, N; colors are red, green, blue, cyan, magenta, yellow).
When I do the plot class by class it works as expected, see two examples:
data <- subset(myData, Class=="A")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,main="Class A",
col.point = colors[data$Class])
gives this:
And this snippet of code
data <- subset(myData, Class=="E")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,main="Class E",
col.point = colors[data$Class])
gives this:
This also seems as expected: a plot of points of all classes except E.
data <- subset(myData, Class!="E")
cloud(data$delta ~ data$K12 + data$K23, xlab="K12", ylab="K23", zlab="delta",pch=20,
cex=0.01,main="All Classes (except E)",col.point = colors[data$Class])
The question is: why are the points in the first plot blue instead of magenta?
This question is somewhat similar to Color gradient for elevation data in a XYZ plot with R and Lattice, but now I am using factors to determine colors on the scatterplot.
I've also read Changing default colours of a lattice plot by factor -- grouping plots by a factor (using the parameter groups.factor=myData$Class) does not solve my problem, plots are still in blue but separated by class.
Edited to add more information: this fake data set can be used for tests.
num <- 10
data <- as.data.frame(
cbind(
x=rep(seq(1,num), each=num*num),
y=rep(seq(1,num), each=num),
z=rep(seq(1,num))
))
# This is ugly but works!
data$Class[data$z==1]<-'A'
data$Class[data$z==2]<-'A'
data$Class[data$z==3]<-'B'
data$Class[data$z==4]<-'B'
data$Class[data$z==5]<-'C'
data$Class[data$z==6]<-'C'
data$Class[data$z==7]<-'D'
data$Class[data$z==8]<-'D'
data$Class[data$z==9]<-'E'
data$Class[data$z==10]<-'E'
str(data)
When I plot it with
colors=c("red","green","blue","cyan","magenta","yellow")
cloud(data$z ~ data$x + data$y, xlab="X", ylab="Y", zlab="Z",main="All Classes",
col.point = colors[data$Class])
I get the plot below. All points are in blue.
JeremyCG found the problem. Here is the complete code that works, for future reference.
library(lattice)
num <- 10
data <- as.data.frame(
cbind(
x=rep(seq(1,num), each=num*num),
y=rep(seq(1,num), each=num),
z=rep(seq(1,num))
))
data$Class[data$z==1]<-'A'
data$Class[data$z==2]<-'A'
data$Class[data$z==3]<-'B'
data$Class[data$z==4]<-'B'
data$Class[data$z==5]<-'C'
data$Class[data$z==6]<-'C'
data$Class[data$z==7]<-'D'
data$Class[data$z==8]<-'D'
data$Class[data$z==9]<-'E'
data$Class[data$z==10]<-'E'
str(data)
That showed the issue:
## 'data.frame': 1000 obs. of 4 variables:
## $ x : int 1 1 1 1 1 1 1 1 1 1 ...
## $ y : int 1 1 1 1 1 1 1 1 1 1 ...
## $ z : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Class: chr "A" "A" "B" "B" ...
Class must be a factor. This solved it:
data$Class <- as.factor(data$Class)
str(data)
## 'data.frame': 1000 obs. of 4 variables:
## $ x : int 1 1 1 1 1 1 1 1 1 1 ...
## $ y : int 1 1 1 1 1 1 1 1 1 1 ...
## $ z : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Class: Factor w/ 5 levels "A","B","C","D",..: 1 1 2 2 3 3 4 4 5 5 ...
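A small sketch of why the conversion matters, using a made-up four-element class vector. Indexing an unnamed color vector with character strings looks the strings up as names and finds none, so every color comes back NA. Indexing with a factor uses its integer codes, so each class picks the color at its level position.

```r
colors <- c("red", "green", "blue", "cyan", "magenta", "yellow")

# Character classes: looked up as names, but colors has no names
cls_chr <- c("A", "A", "B", "E")
by_chr  <- colors[cls_chr]
print(by_chr)

# Factor classes: indexed by integer level codes (A=1, B=2, ..., E=5)
cls_fac <- factor(cls_chr, levels = c("A", "B", "C", "D", "E"))
by_fac  <- colors[cls_fac]
print(by_fac)
```

With the character version, the plotting call receives only NA colors, and lattice then appears to fall back to its default, which is consistent with the all-blue plots described above.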
Then plot it:
colors=c("red","green","blue","cyan","magenta","yellow")
cloud(data$z ~ data$x + data$y, xlab="X", ylab="Y", zlab="Z",
pch=20,main="All Classes",col = colors[data$Class])
Here is the result:
Thanks @jeremycg!

Histogram of Weekdays by Year R

I have a .csv file that I have loaded into R using the following basic command:
lace <- read.csv("lace for R.csv")
It pulls in my data just fine. Here is the str of the data:
str(lace)
'data.frame': 2054 obs. of 20 variables:
$ Admission.Day : Factor w/ 872 levels "1/1/2013","1/10/2011",..: 231 238 238 50 59 64 64 64 67 67 ...
$ Year : int 2010 2010 2010 2011 2011 2011 2011 2011 2011 2011 ...
$ Month : int 12 12 12 1 1 1 1 1 1 1 ...
$ Day : int 28 30 30 3 4 6 6 6 7 7 ...
$ DayOfWeekNumber : int 3 5 5 2 3 5 5 5 6 6 ...
$ Day.of.Week : Factor w/ 7 levels "Friday","Monday",..: 6 5 5 2 6 5 5 5 1 1 ...
What I am trying to do is create three (3) different histograms and then plot them all together on one. I want to create a histogram for each year, where the x axis or labels will be the days of the week starting with Sunday and ending on Saturday.
Firstly how would I go about creating a histogram out of Factors, which the days of the week are in?
Secondly how do I create a histogram for the days of the week for a given year?
I have tried using the following post here but cannot get it working. I use the Admission.Day as the variable and get an error message:
dat <- as.Date(lace$Admission.Day)
Error in charToDate(x) : character string is not in a standard unambiguous format
Thank you,
Expanding on the comment above: the problem seems to be with importing the dates rather than with making the histogram. Assuming there is an Excel workbook "lace for R.xlsx" with a sheet "lace":
## Not tested...
library(XLConnect)
myData <- "lace for R.xlsx" # NOTE: need path also...
wb <- loadWorkbook(myData)
lace <- readWorksheet(wb, sheet="lace")
lace$Admission.Day <- as.Date(lace$Admission.Day)
should provide dates that work with all the R date functions. Also, the lubridate package provides a number of functions that are more intuitive to use than format(...).
Then, as an example:
library(lubridate) # for year(...) and wday(...)
library(ggplot2)
# random dates around Jun 1, across 5 years...
set.seed(123)
lace <- data.frame(date=as.Date(rnorm(1000,sd=50)+365*(0:4),origin="2008/6/1"))
lace$year <- factor(year(lace$date))
lace$dow <- wday(lace$date, label=TRUE)
# This creates the histograms...
ggplot(lace) +
geom_bar(aes(x=dow, fill=year)) + # dow is discrete, so count with geom_bar; fill color by year
facet_grid(~year) + # facet by year
theme(axis.text.x=element_text(angle=90)) # to rotate weekday names...
Produces this:
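If the data stay in the CSV, the dates can also be parsed in base R by giving as.Date() an explicit format. This is a sketch under one assumption: that Admission.Day is month/day/year, which matches the Year, Month and Day columns in the str() output but is not confirmed by the question. The five dates below are made up in that style.

```r
# Made-up sample in the assumed month/day/year style of Admission.Day
adm <- factor(c("12/28/2010", "12/30/2010", "1/3/2011", "1/4/2011", "1/6/2011"))

# Convert factor -> character -> Date with an explicit format
d <- as.Date(as.character(adm), format = "%m/%d/%Y")

# Year and weekday; "%w" gives 0 (Sunday) to 6 (Saturday) in any locale
yr  <- format(d, "%Y")
dow <- factor(as.integer(format(d, "%w")), levels = 0:6,
              labels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))

# Counts of admissions per weekday, one row per year
counts <- table(year = yr, dow = dow)
print(counts)

# Grouped bars: one group per weekday, one bar per year
barplot(counts, beside = TRUE, legend.text = rownames(counts))
```

Fixing the factor levels from Sunday to Saturday keeps the x-axis in calendar order instead of the alphabetical order seen in the Day.of.Week column.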

appending new data to specific elements in lists in r

Please correct me if my terminology is wrong, because on this question I'm not quite sure what I'm dealing with regarding elements, objects, lists... I just know it's not a data frame.
Using the example from prepksel {adehabitatHS}, I am trying to modify my own data to fit into their package. Running this command on their example data creates an object (?) called x, which is a list with 3 sections (elements?) in it.
The example data code:
library(adehabitatHS)
data(puechabonsp)
locs <- puechabonsp$relocs
map <- puechabonsp$map
pc <- mcp(locs[,"Name"])
hr <- hr.rast(pc, map)
cp <- count.points(locs[,"Name"], map)
x <- prepksel(map, hr, cp)
Looking at the structure of x, it is a list of 3 elements called tab, weight, and factor:
str(x)
List of 3
$ tab :'data.frame': 191 obs. of 4 variables:
..$ Elevation : num [1:191] 141 140 170 160 152 121 104 102 106 103 ...
..$ Aspect : num [1:191] 4 4 4 1 1 1 1 1 4 4 ...
..$ Slope : num [1:191] 20.9 18 17 24 23.9 ...
..$ Herbaceous: num [1:191] 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 0.2 ...
$ weight: num [1:191] 1 1 1 1 1 2 2 4 0 1 ...
$ factor: Factor w/ 4 levels "Brock","Calou",..: 1 1 1 1 1 1 1 1 1 1 ...
For my data, I will create multiple "x" lists and want to merge the data within each segment. So, I have created an "x" for the years 2007, 2008 and 2009. Now I want to append the "tab" element of 08 to 07, then 09 to 07/08, and do the same for the "weight" and "factor" elements of this list "x". How do you bind that data? I thought about using unlist on each segment of the list, appending, joining the yearly data for each segment, and then rejoining the three segments back into one list. But this was cumbersome and seemed rather inefficient.
I know this is not how it will work, but in my head this is what I should be doing:
newlist<-append(x07$tab, x08$tab, x09$tab)
newlist<-append(x07$weight, x08$weight, x09$weight)
newlist<-append(x07$factor, x08$factor, x09$factor)
maybe rbind? do.call("rbind", lapply(....uh...stuck
append works for vectors and lists, but won't give the output you want for data frames. The elements in your lists (and they are lists) are of different types, so combine each one separately. Something like
tocomb <- list(x07,x08,x09)
newlist <- list(
tab = do.call("rbind",lapply(tocomb,function(x) x$tab)),
weight = c(lapply(tocomb,function(x) x$weight),recursive=TRUE),
factor = c(lapply(tocomb,function(x) x$factor),recursive=TRUE)
)
You may need to be careful with factors if they have different levels - something like as.character on the factors before converting them back with as.factor.
This isn't tested, so some assembly may be required. I'm not an R wizard, and this may not be the best answer.
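A self-contained sketch of the same idea on made-up data, so it can be run without adehabitatHS. The helper `make_x()` and the values it fills in are hypothetical; only the tab / weight / factor layout follows the str(x) output in the question. The factor element goes through as.character() first, as suggested above, so differing level sets are merged by label.

```r
# Hypothetical helper: builds a list shaped like the prepksel() output
make_x <- function(n, animals) {
  list(
    tab    = data.frame(Elevation = seq_len(n) * 10, Slope = seq_len(n) / 2),
    weight = rep(1, n),
    factor = factor(animals)
  )
}

x07 <- make_x(2, c("Brock", "Calou"))
x08 <- make_x(3, c("Calou", "Chou", "Jean"))
x09 <- make_x(1, "Brock")

tocomb <- list(x07, x08, x09)
newlist <- list(
  # stack the data frames row-wise
  tab    = do.call(rbind, lapply(tocomb, function(x) x$tab)),
  # concatenate the numeric weights
  weight = unlist(lapply(tocomb, function(x) x$weight)),
  # combine labels as character, then rebuild one factor
  factor = factor(unlist(lapply(tocomb, function(x) as.character(x$factor))))
)
str(newlist)
```

Whether prepksel-based functions downstream accept a list assembled this way is not something this sketch checks.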
