I have a similar question to u/Ananas here: Sentinel3 OLCI (chl) Average of netcdf files on Python
I am running into similar problems, in that I cannot seem to extract the necessary information from the .nc files and then merge them to create a time series. In my case, I am trying to do this in R. My current code, which I followed and customised from https://www.youtube.com/watch?v=jWRszWCVWLc&t=1504s, returns an error:
Error in `[<-.data.frame`(`*tmp*`, variable, value = c(0, 0, 0, 0, 0, :
replacement has 1927 rows, data has 2202561
Maybe I am going about it the wrong way from the start, and R's capabilities with .nc files are not suited for this? Any suggestions are welcome.
Here is my code:
library(RNetCDF) # provides open.nc(), var.get.nc() and att.get.nc()
extract_variable_from_netcdf<- function(nc,variable){
tryCatch(
{
result<-var.get.nc(nc,variable)
return(result)
},
error=function(cond){
message(paste(variable,"attribute not found"))
message("Here is the original error message")
message(cond)
}
)
}
extract_global_attribute_from_netcdf<- function(nc,global_attribute){
tryCatch(
{
result<-att.get.nc(nc,"NC_GLOBAL",global_attribute)
return(result)
},
error=function(cond){
message(paste(global_attribute,"attribute not found"))
message("Here is the original error message")
message(cond)
}
)
}
folder<- "path to folder"
files<- list.files(folder, pattern= ".nc", full.names = TRUE)
variables<- c("conc_chl", "iop_bpart","lat", "lon") #variables I need to extract
global_attrs<- c("start_date", "stop_date")
headers<-c(global_attrs,variables)
df<-data.frame(matrix(ncol=length(headers), nrow=0))
colnames(df)<- headers
for(file in files) {
nc<- open.nc(file)
chl<- var.get.nc(nc, "conc_chl")
num_chl<- length(chl)
newdf<- data.frame(matrix(ncol=length(headers), nrow=num_chl))
colnames(newdf)<- headers
for (global_attribute in global_attrs) {
newdf[global_attribute]<-extract_global_attribute_from_netcdf(nc,global_attribute)
}
for (variable in variables) {
newdf[variable]<-extract_variable_from_netcdf(nc,variable)
}
df<-merge(df,newdf,all=TRUE)
}
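The numbers in the error message hint at a shape mismatch: 2202561 = 1927 × 1143, so conc_chl looks like a 2-D grid, while the value being assigned has only 1927 rows. The following is a minimal sketch of one way to flatten such a grid into rows, assuming lon and lat are 1-D coordinate vectors and the variable is a lon × lat matrix (an assumption; in some products lat and lon are themselves 2-D). The small arrays and the date are made up and stand in for the var.get.nc() and att.get.nc() results.

```r
# Made-up stand-ins for var.get.nc(nc, "lon"), "lat" and "conc_chl"
lon <- c(10.0, 10.5, 11.0)          # 3 longitudes
lat <- c(54.0, 54.5)                # 2 latitudes
chl <- matrix(1:6, nrow = length(lon), ncol = length(lat))  # lon x lat grid

# expand.grid() varies its first argument fastest, which matches the
# column-major order that as.vector() uses to flatten the matrix
grid <- expand.grid(lon = lon, lat = lat)
grid$conc_chl <- as.vector(chl)

# Scalar metadata (e.g. a start_date attribute) is recycled to every row
grid$start_date <- "2020-06-01"     # hypothetical value

print(grid)
```

With one such data frame per file, rbind() may be a simpler way to stack them than merge(), since every file yields the same columns.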
The way I have used ".nc" files with satellite data in R is to read them in with the "raster" library as a raster object.
library(raster)
r <- raster("your_file.nc")
plot(r) # quick plot to see if everything is as it should be
I read in my time series with a loop, and I also used a function found somewhere on this site to convert the raster into a sensible R data frame.
# Stack Overflow function to convert the loaded raster to a data frame
gplot_data <- function(x, maxpixels = 50000) {
x <- raster::sampleRegular(x, maxpixels, asRaster = TRUE)
coords <- raster::xyFromCell(x, seq_len(raster::ncell(x)))
## Extract values
dat <- utils::stack(as.data.frame(raster::getValues(x)))
names(dat) <- c('value', 'variable')
dat <- dplyr::as_tibble(data.frame(coords, dat)) # as.tbl() is deprecated in current dplyr
if (!is.null(levels(x))) {
dat <- dplyr::left_join(dat, levels(x)[[1]],
by = c("value" = "ID"))
}
dat
}
Read in one file at a time, convert it with the function, and return a data frame:
files <- list.files(folder, pattern = "\\.nc$", full.names = TRUE) # match only names ending in .nc
fun <- function(i) {
#read in one file at a time
r <- raster(files[i])
#convert to normal data frame
temp <- gplot_data(r)
temp #output
}
dat <- plyr::rbind.fill(lapply(seq_along(files), fun)) # bind the results of all iterations
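For a time series it usually helps to record which file each row came from, since the combined data frame otherwise loses that link. A minimal sketch of the idea, where the file names are hypothetical and read_one() is a made-up stand-in for gplot_data(raster(f)):

```r
# Hypothetical file names; in practice these come from list.files()
files <- c("chl_20200601.nc", "chl_20200602.nc")

# Stand-in for gplot_data(raster(f)): returns a small data frame per file
read_one <- function(f) {
  data.frame(x = c(1, 2), y = c(1, 1), value = c(0.3, 0.4))
}

per_file <- lapply(files, function(f) {
  temp <- read_one(f)
  temp$source_file <- basename(f)   # remember where each row came from
  temp
})

# Stack the per-file data frames into one
dat <- do.call(rbind, per_file)
print(dat)
```

If the file names contain a date, it can then be parsed out of source_file and used as the time axis.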
Here is a plot using ggplot2 and ggforce.
library(ggplot2)
ggplot() +
geom_tile(data = dat,
aes(x = x, y = y, fill = value))
Alternatively, if you do not know the contents of your file, the following, from the "ncdf4" package, will help you inspect it. https://towardsdatascience.com/how-to-crack-open-netcdf-files-in-r-and-extract-data-as-time-series-24107b70dcd
library(ncdf4)
our_nc_data <- nc_open("/your_file.nc")
print(our_nc_data)
# look for the variable names and assign them to vectors that can be bound together in dataframes
lat <- ncvar_get(our_nc_data, "lat") #names of latitude column
lon <- ncvar_get(our_nc_data, "lon") #name of longitude column
time <- ncvar_get(our_nc_data, "time") #the time was called time
tunits <- ncatt_get(our_nc_data, "time", "units")# check units
lswt_array <- ncvar_get(our_nc_data, "analysed_sst") #select the relevant variable, this is temperature named "analysed_sst"
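The time values read this way are usually plain numbers counted from the reference point named in the units attribute. A minimal sketch of turning them into dates, assuming the units string has the form "seconds since <timestamp>" (check tunits$value for your own file); the values below are made up and stand in for the ncvar_get() and ncatt_get() results:

```r
# Hypothetical stand-ins for ncvar_get(nc, "time") and
# ncatt_get(nc, "time", "units")$value
time_raw <- c(0, 86400, 172800)
units_string <- "seconds since 1981-01-01 00:00:00"

# Drop the "seconds since " prefix to obtain the reference timestamp
origin <- sub("^seconds since ", "", units_string)

# Convert the offsets to date-times in UTC
timestamps <- as.POSIXct(time_raw, origin = origin, tz = "UTC")
print(format(timestamps, "%Y-%m-%d"))
```

If the units are days or hours instead, the raw values would need to be multiplied by 86400 or 3600 before the conversion.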
I ran the code below in R:
CLCLT_Homes <- file.choose(new = TRUE)
origAddress <- read.csv(CLCLT_Homes, header = TRUE, stringsAsFactors = FALSE)
geocoded <- data.frame(stringsAsFactors = FALSE)
for (i in 1:nrow(origAddress))
{
result <- geocode(origAddress$Address[i], output = "latlona", source = "google")
origAddress$lon[1] <- as.numeric(result[1])
origAddress$lat[1] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
write.csv(origAddress, "where I put the file.csv", row.names = FALSE)
and when I went to look at the file, it had created columns for long and lat for every address, but each address had the exact same longitude and latitude (oddly, except for the address at the very top; it had its own coordinates while all others had different coordinates that matched). Did I forget to include something in the code? Is it only reading the first two lines correctly and then not rotating?
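The symptom described is consistent with the loop writing lon and lat to index [1] instead of [i]. On the first pass the column does not exist yet, so the single value is recycled to every row; every later pass overwrites only the first row. A minimal sketch of the corrected loop, using a made-up fake_geocode() in place of ggmap's geocode() so that it runs without an API key:

```r
# Hypothetical stand-in for ggmap::geocode(); returns made-up coordinates
fake_geocode <- function(address) {
  data.frame(lon = nchar(address), lat = -nchar(address))
}

origAddress <- data.frame(Address = c("1 Main St", "22 Oak Avenue", "333 Pine Rd"),
                          stringsAsFactors = FALSE)

# Pre-allocate the result columns so nothing gets recycled
origAddress$lon <- NA_real_
origAddress$lat <- NA_real_

for (i in seq_len(nrow(origAddress))) {
  result <- fake_geocode(origAddress$Address[i])
  origAddress$lon[i] <- result$lon   # index with [i], not [1]
  origAddress$lat[i] <- result$lat
}

print(origAddress)
```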
I am new to R. I found a script online which is used to batch geocode a list of addresses.
http://www.storybench.org/geocode-csv-addresses-r/
However, I keep getting this error message: 'Error: is.character(location) is not TRUE'. Does anyone have any ideas on how to resolve the issue?
# Geocoding script for large list of addresses.
# Finbar Gillen 25/07/2018
#load up the ggmap library
install.packages('ggmap')
library(ggmap)
# Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)
# Read in the CSV data and store it in a variable
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
# Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- geocode(origAddress$addresses[i], output =
"latlona", source = "google")
origAddress$lon[i] <- as.numeric(result[1])
origAddress$lat[i] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
# Write a CSV file containing origAddress to the working directory
write.csv(origAddress, "geocoded.csv", row.names=FALSE)
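In the line after # Print("Working..."), use the name of the address column in your own input file/data frame instead of 'addresses'. If that column does not exist, origAddress$addresses is NULL, which is why geocode complains that is.character(location) is not TRUE: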
result <- geocode(origAddress$addresses[i], output =
"latlona", source = "google")
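A quick way to confirm this is to check the column names before the loop. The sketch below uses a made-up data frame whose header is "Address"; address_col is a hypothetical variable that you would set to your own header.

```r
origAddress <- data.frame(Address = c("1 Main St", "22 Oak Avenue"),
                          stringsAsFactors = FALSE)

# A column that does not exist comes back as NULL, which is not character
print(is.null(origAddress$addresses))
print(is.character(origAddress$addresses))

# Hypothetical: set this to the header used in your own CSV
address_col <- "Address"
stopifnot(address_col %in% names(origAddress))

# Extract the column by name; this is a character vector
addresses <- origAddress[[address_col]]
print(addresses)
```

Running names(origAddress) right after read.csv() shows the exact headers R is using, which may differ from the CSV if they contained spaces.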
I have collected location data for different users from Twitter, and I am trying to plot it on a map in R. The problem is that users have given invalid or incorrect addresses, which causes the geocode function to fail. How can I avoid this failure? Is there any way to check for this error case and not proceed? For example, the user location data looks something like this in a file geocode9.csv.
available locations,
Buffalo,
New York,
thsjf,
Washington, USA
Michigan,
nkjnt,
basketball,
ejhrbvw
library(ggmap)
fileToLoad <- file.choose(new = TRUE)
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress))
{
result <- geocode(origAddress$available_locations[i], output = "latlona", source = "google")
origAddress$lon[i] <- as.numeric(result[1])
origAddress$lat[i] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
write.csv(origAddress, "geocoded.csv", row.names=FALSE)
When the code runs through "thsjf" of the locations list, it throws an error. How can I get past this error? I want something like,
if(false){ # do not run geocode function}
I'm not sure how to geocode those addresses if they are actually wrong. How would the machine even figure it out if it was wrong? I think you need to get the addresses corrected, and THEN geocode everything. Here is some sample code.
#load ggmap
library(ggmap)
startTime <- Sys.time()
# Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)
# Read in the CSV data and store it in a variable
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
# Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
origAddress$lon[i] <- as.numeric(result[1])
origAddress$lat[i] <- as.numeric(result[2])
origAddress$geoAddress[i] <- as.character(result[3])
}
# Write a CSV file containing origAddress to the working directory
write.csv(origAddress, "geocoded.csv", row.names=FALSE)
endTime <- Sys.time()
processingTime <- endTime - startTime
processingTime
Check this for more info.
http://www.storybench.org/geocode-csv-addresses-r/
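If correcting the addresses first is not practical, one way to keep the loop going is to wrap each call in tryCatch() and leave NA for anything that fails. This is a minimal sketch rather than the ggmap API: fake_geocode() is a made-up stand-in that throws an error for unknown places, and the coordinates are rough illustrative values. With the real geocode() you would also need a registered Google API key in recent ggmap versions.

```r
# Hypothetical stand-in for ggmap::geocode() that fails on unknown places
fake_geocode <- function(location) {
  known <- data.frame(place = c("Buffalo", "New York"),
                      lon = c(-78.88, -74.01),
                      lat = c(42.89, 40.71))
  hit <- known[known$place == location, ]
  if (nrow(hit) == 0) stop("no result for ", location)
  hit
}

locations <- c("Buffalo", "thsjf", "New York")
out <- data.frame(location = locations, lon = NA_real_, lat = NA_real_)

for (i in seq_along(locations)) {
  # Turn an error into NULL instead of stopping the loop
  result <- tryCatch(fake_geocode(locations[i]),
                     error = function(e) NULL)
  if (is.null(result)) next   # leave NA for this row and move on
  out$lon[i] <- result$lon
  out$lat[i] <- result$lat
}

print(out)
```

Rows that still hold NA afterwards can be filtered out before plotting or reviewed by hand.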
I am trying to use ggmap to get the fields in administrative_area_level_3 from the google maps api. The single call returns the correct data. The below code returns a '1' for every entry from administrative_area_level_3.
# Geocoding a csv column of "addresses" in R
#load ggmap
library(ggmap)
# Select the file from the file chooser
fileToLoad <- file.choose(new = TRUE)
# Read in the CSV data and store it in a variable
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)
# Initialize the data frame
geocoded <- data.frame(stringsAsFactors = FALSE)
# Loop through the addresses to get the latitude and longitude of each address and add it to the
# origAddress data frame in new columns lat and lon
for(i in 1:nrow(origAddress))
{
# Print("Working...")
result <- geocode(origAddress$addresses[i], output = "more", source = "google")
origAddress$lon[i] <- as.character(result[1])
origAddress$lat[i] <- as.character(result[2])
origAddress$geoAddress[i] <- as.character(result[5])
origAddress$district[i] <- as.character(result[13])
}
# Write a CSV file containing origAddress to the working directory
write.csv(origAddress, "geocoded.csv", row.names=FALSE)
I modified these lines to get those fields:
origAddress$geoAddress[i] <- as.character(result[5])
origAddress$district[i] <- as.character(result[13])
When I run this I get the correct administrative_area_level_3
adr <- geocode("35880 WIDENER VALLEY RD Glade Spring VA", output = "more", source = "google")
Here is my CSV:
ID,addresses
1,35880 WIDENER VALLEY RD Glade Spring VA
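One possible cause of the '1' values, offered as an assumption rather than a confirmed diagnosis: if the column in the geocode result is a factor, then as.character(result[13]) converts a one-column data frame and returns the factor's integer code instead of its label. The sketch below reproduces that behaviour with a made-up one-row data frame; the column name district is hypothetical.

```r
# Made-up one-row result with a factor column, mimicking what a
# data frame built with stringsAsFactors = TRUE would contain
result <- data.frame(lon = -81.77,
                     district = "Glade Spring",
                     stringsAsFactors = TRUE)

# Single brackets keep a data frame, so the factor's integer code is returned
code_as_text <- as.character(result[2])

# Double brackets (or $) extract the factor itself, so the label is returned
label_as_text <- as.character(result[[2]])

print(code_as_text)
print(label_as_text)
```

If this is what is happening, replacing result[13] with result[[13]], or selecting the column by name, should return the text. Running str(result) on a single call shows whether the columns are factors.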