I am trying to follow the Landsat 8 example at https://cran.r-project.org/web/packages/water/vignettes/Landsat8.html. I get stuck at the read.WSdata step, where I get the error:
Error in data.frame(date = unique(WSdata$date), radiation_sum = tapply(WSdata$radiation, :
arguments imply differing number of rows: 1, 0
I am using my own data, NOT the data provided in the example.
My CSV file is organized exactly like the example dataset ("INTA.csv"). The only difference I have noticed between the datasets is that mine has a record every 15 minutes, while the example dataset has one every hour.
Here is my code.
rm(list = ls())
library(water)
aoi <- createAoi(topleft = c(385387, 4776577),
bottomright = c(414825, 4749526), EPSG = 32612)
raw_data_folder <- system.file("rossfrk072616", package = "water")
image <- loadImage(path = raw_data_folder, aoi = aoi, sat = "L8")
image.SR <- loadImageSR(path = raw_data_folder, aoi = aoi)
plot(image)
plot(image.SR)
csvfile <- system.file("rossfrk072616", "FTHI_L8_1.csv", package = "water")
I am also assuming we should use the original MTL file and NOT the surface reflectance MTL file (which, when downloaded from ESPA, has the same name as the original MTL file). Is that correct?
MTLfile <- system.file("rossfrk072616",
"LC08_L1TP_039030_20160726_20170221_01_T1_MTL.txt", package = "water")
WeatherStation <- read.WSdata(WSdata = csvfile, datetime.format = "%Y/%m/%d %H:%M",
columns = c("datetime", "temp", "RH", "pp", "radiation", "wind"),
lat = 43.07138, long = -112.4311, elev = 1354.5, height = 2.5,
MTL = MTLfile)
After I run read.WSdata I get the error:
Error in data.frame(date = unique(WSdata$date), radiation_sum = tapply(WSdata$radiation, :
arguments imply differing number of rows: 1, 0
For some reason I was not able to get the code from the website to work with my dataset. However, I was able to read my weather station data with the following code:
WeatherStation <- read.WSdata(WSdata = csvfile, date.format = "%d/%m/%Y",
lat = 43.07138, long = -112.4311, elev = 1354.5, height = 2.5,
MTL = MTLfile)
This error is related to a mismatch in the date format. In your first attempt the date format was set to '%Y/%m/%d', while your file stores dates as day/month/year ('%d/%m/%Y').
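The next example shows why the format string matters. It is written in Python, whose strptime() uses the same % directives as R's strptime(). The sample value "26/07/2016" is hypothetical: it stands in for a day/month/year date of the kind read with date.format = "%d/%m/%Y". In R, a mismatched format silently yields NA instead of raising an error. Plausibly every date then becomes NA, so unique() returns a single NA row while the per-day radiation sums come out empty. That would match the "1, 0" in the message.

```python
from datetime import datetime

def parses(text, fmt):
    """Return True if `text` can be parsed with the strptime format `fmt`."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

sample = "26/07/2016"  # hypothetical day/month/year value
print(parses(sample, "%Y/%m/%d"))  # False: year-first format does not match
print(parses(sample, "%d/%m/%Y"))  # True: day-first format matches
```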
Also, you can directly specify the file in the read.WSdata() function, e.g.:
WeatherStation <- read.WSdata(WSdata = 'FTHI_L8_1.csv', date.format = "%d/%m/%Y",
lat=43.07138, long= -112.4311, elev=1354.5, height= 2.5,
MTL = MTLfile)
I used my own data and it worked well:
library(water)
aoi <- createAoi(topleft = c(810927, 2134059), bottomright = c( 272751,1985845),
EPSG = 32616)
plot(aoi)
csvfile <- system.file("extdata", "datos.csv", package="water")
MTLfile <- system.file("extdata", "L8.MTL.txt", package="water")
ws<- read.WSdata(WSdata = csvfile, date.format = "%d/%m/%Y", time.format="%H:%M:%S", cf=
c(1,1,1),lat=18.094, long= -89.462, elev=279, height= 2, MTL =
MTLfile, columns=c("date" = 1, "time" = 2, "radiation" = 3,"wind" = 4,
"RH" = 5, "temp" = 6, "rain" = 7))
I'm trying to run an event study for multiple firms and multiple events. The input is a simple data frame with tickers in one column and event dates in the other; it is extracted from WRDS and then loaded into R. The rest of the code should run as is. To try my code, you can use data(SplitDates) from the package instead of Announcements.
library(eventstudies)
library(estudy2)
library(tidyquant)
library(quantmod)
#load announcement dates
Announcements = read.csv2(file ="Dates_short_test.csv", sep=";", header=FALSE)
str(Announcements)
colnames(Announcements) = c("name", "when")
Announcements$when = as.Date(Announcements$when)
#get stock prices
tickers = c("IBM", "MSFT")
getSymbols(tickers,
from = "2009-01-01",
to = "2016-01-15")
ClosePrices <- do.call(merge, lapply(tickers, function(x) Cl(get(x))))
#get S&P500
SPX <- getSymbols("^GSPC",auto.assign = FALSE, from = "2009-01-01",
to = "2016-01-15")
#Only keep closing price
SPX1 = SPX[,4]
#Returns SPX
SPX_Ret = (exp(diff(log(SPX1))) - 1) *100
SPX_Ret1 = SPX_Ret[-1]
#Returns Stocks
Stock_Returns = (exp(diff(log(ClosePrices))) - 1) *100
Stock_Returns1 = Stock_Returns[-1]
#create zoo variables as required in the manual of the package
Stock_Returns2 = as.zoo(Stock_Returns1)
SPX_Ret2 = as.zoo(SPX_Ret1)  # market model needs market returns, not prices
#conduct event study
Event_study_results<- eventstudy(firm.returns = Stock_Returns2,
event.list = Announcements,
event.window = 5,
type = "marketModel",
to.remap = TRUE,
remap = "cumsum",
inference = TRUE,
inference.strategy = "bootstrap",
model.args = list(market.returns=SPX_Ret2))
What it returns is
Error in rval[i, j, drop = drop., ...] :
subscript out of bounds
I know it's probably a stupid mistake but I can't find it myself.
Thanks!
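Two things seem worth checking. Both are sketched below in Python with hypothetical data.

First, (exp(diff(log(p))) - 1) * 100 is just the simple percentage return p_t / p_{t-1} - 1, scaled by 100. The market series passed through model.args should therefore be built from SPX_Ret1 (returns) rather than SPX1 (prices).

Second, a "subscript out of bounds" error often comes from looking up a column name that does not exist. quantmod's Cl() columns are named like "IBM.Close". If eventstudy() matches event.list$name against colnames(firm.returns), tickers such as "IBM" would not be found. That matching behaviour is an assumption, so check the eventstudies manual.

missing_names() is a hypothetical helper, not part of either package.

```python
import math

def pct_returns(prices):
    """Simple returns in percent, computed as in the question: (exp(diff(log(p))) - 1) * 100."""
    return [(math.exp(math.log(b) - math.log(a)) - 1) * 100
            for a, b in zip(prices, prices[1:])]

def missing_names(event_names, return_columns):
    """Hypothetical helper: event names that have no matching return-series column."""
    available = set(return_columns)
    return sorted({name for name in event_names if name not in available})

# Hypothetical closing prices: +2% then -2%
closes = [100.0, 102.0, 99.96]
print([round(r, 6) for r in pct_returns(closes)])

# Column names as produced by merging quantmod's Cl() series
print(missing_names(["IBM", "MSFT", "IBM"], ["IBM.Close", "MSFT.Close"]))
```

If the second check reports missing names, renaming the return columns (or the event names) so they match would be the next thing to try.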
I want to create a time series in a netCDF file with 3 dimensions (lon, lat, time [unlimited]). The time series should be built from other netCDF files, each of which has only one time point (for example, 17856).
I know how to create the new netcdf-file, how to extract the data from the netcdf-file as a 2D array and the time for the data.
My problem is:
How do I put each 2D array into the netCDF file with its correct time? How do the start and count arguments of the ncvar_put() function work?
I use the ncdf4 package and read the tutorial at:
http://geog.uoregon.edu/bartlein/courses/geog490/week04-netCDF.html#create-and-write-a-netcdf-file and searched for an answer, but I still don't understand it. I'm still inexperienced with netCDF files.
Example of my problem:
library(ncdf4)
# data from other netcdf file
values = array(data = c(1:9)/10, dim = c(3,3))
values_2 = array(data = c(9:25)/10, dim = c(3,3))
time = 25
time_2 = 23
# set parameters
lon = 1:3
lat = 1:3
# define dimensions
# Longitude
londim = ncdim_def(name = "longitude", units = "degrees", vals = as.double(lon),
longname = "longitude")
# Latitude
latdim = ncdim_def(name = "latitude", units = "degrees", vals = as.double(lat),
longname = "latitude")
# Time
timedim = ncdim_def(name = "time", units ="days since 1582-10-15 00:00", vals = as.double(1),
unlim = TRUE, calendar = "gregorian")
# define variables
B01 = ncvar_def(name = "B01",
units ="percent",
list(londim,latdim,timedim),
missval = NA,
prec="double")
# create netcdf
nc_test = nc_create("test.nc", list(B01), force_v4 = TRUE)
# Add values
### Here is something missing: how do I add the timestamp?
ncvar_put(nc_test, "B01", values, start=c(1,1,1), count=c(-1,-1,1))
ncvar_put(nc_test, "B01", values_2, start=c(1,1,2), count=c(-1,-1,1))
When I read the data back I get the 3 x 3 x 2 array, but the time steps are not correct because I didn't add them. How do I do this?
I would like to get the 3 x 3 x 2 array and, when I read the time variable, get the right times in the correct order.
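The ncdf4 documentation describes start as 1-based indices and a count of -1 as "all entries along that dimension". Both vectors follow the variable's dimension order, which here is (longitude, latitude, time).

The Python sketch below translates such a start/count pair into 0-based slices so you can see which block is written. It assumes that -1 means "from start to the end of that dimension", and the helper name is hypothetical.

The same time index is where that file's time value (25 or 23 in the example) would go in the time coordinate variable. If you want the times in ascending order, write the files sorted by their time value.

```python
def start_count_to_slices(start, count, shape):
    """Translate ncdf4-style start/count (1-based, -1 = rest of the dimension)
    into 0-based Python slices. All three sequences use the same dimension order."""
    if not (len(start) == len(count) == len(shape)):
        raise ValueError("start, count and shape need one entry per dimension")
    slices = []
    for s, c, n in zip(start, count, shape):
        first = s - 1                          # ncdf4 start indices are 1-based
        length = n - first if c == -1 else c   # -1: everything from start onwards
        slices.append(slice(first, first + length))
    return tuple(slices)

# Second file: full 3 x 3 lon/lat grid, one step at time index 2 (1-based)
s = start_count_to_slices((1, 1, 2), (-1, -1, 1), (3, 3, 2))
print(s)

# The matching time value goes to the same position of the time coordinate
times = [None, None]
times[s[2]] = [23]
print(times)
```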
I add time to a netCDF file using a different approach (Python's netCDF4 library). Here is sample code for reference:
from datetime import datetime
from datetime import timedelta
from netCDF4 import date2num
from netCDF4 import Dataset
import os
# generate time for netCDF with 1 hour interval
utc_now = datetime.utcnow()
time_list = [utc_now + timedelta(hours=1*step) for step in range(6)]
trans_time = date2num(time_list, units="hours since 0001-01-01 00:00:00.0", calendar="gregorian")
with Dataset(os.getcwd() + "/sample.nc", "w") as sample:
    # Create dimension for sample.nc
    time = sample.createDimension("time", None)
    lat = sample.createDimension("lat", 3)  # 3 is the latitude size in this sample
    lon = sample.createDimension("lon", 3)
    # Create variable for sample.nc
    time = sample.createVariable("time", "f8", ("time",))
    lat = sample.createVariable("lat", "f4", ("lat",))
    lon = sample.createVariable("lon", "f4", ("lon",))
    time[:] = trans_time
    variable_with_time = sample.createVariable("variable_with_time", "f4", ("time", "lat", "lon"))
    for key, value in sample.variables.items():
        print(key)
        print(value)
        print("*"*70)
Output:
time
<class 'netCDF4._netCDF4.Variable'>
float64 time(time)
unlimited dimensions: time
current shape = (6,)
filling on, default _FillValue of 9.969209968386869e+36 used
**********************************************************************
lat
<class 'netCDF4._netCDF4.Variable'>
float32 lat(lat)
unlimited dimensions:
current shape = (3,)
filling on, default _FillValue of 9.969209968386869e+36 used
**********************************************************************
lon
<class 'netCDF4._netCDF4.Variable'>
float32 lon(lon)
unlimited dimensions:
current shape = (3,)
filling on, default _FillValue of 9.969209968386869e+36 used
**********************************************************************
variable_with_time
<class 'netCDF4._netCDF4.Variable'>
float32 variable_with_time(time, lat, lon)
unlimited dimensions: time
current shape = (6, 3, 3)
filling on, default _FillValue of 9.969209968386869e+36 used
**********************************************************************
Note that time is placed as the first dimension. For details, see the documentation I referenced.
I am trying to read in a number of datasets (approx. 300), all with names similar to the following. I am not loading them all at the same time, but I want a general solution where I only change a few values at the beginning of the R file:
E:/Data/Academic/Year1/External/beer/beer_drug_1114_1165
E:/Data/Academic/Year1/External/beer/beer_groc_1114_1165
E:/Data/Academic/Year1/External/beer/beer_PANEL_DR_1114_1165.dat
E:/Data/Academic/Year1/External/beer/beer_PANEL_GR_1114_1165.dat
E:/Data/Academic/Year1/External/beer/beer_PANEL_MA_1114_1165
E:/Data/Academic/Year1/External/beer/Delivery_Stores
The only parts that change are:
Year1 in E:/Data/Academic/Year1/External
beer in beer/beer_drug_1114_1165
1114_1165 at the end, and the file extensions
So I am trying different combinations of paste0() to rebuild the file paths.
I have something like the following, which isn't working well:
file <- "E:/IRI Data/Academic Dataset External/Year1/External/"
product <- "/beer"
weeks <- "_1114_1165"
paste0(file, product, product, weeks)
But I would also like to change /Year1/ in the middle of the path.
The full read calls:
drug <- read.table("E:/Data/Academic/Year1/External/beer/beer_drug_1114_1165", header = TRUE)
groc <- read.table("E:/Data/Academic/Year1/External/beer/beer_groc_1114_1165", header = TRUE)
PANEL_DR <- read.delim("E:/Data/Academic/Year1/External/beer/beer_PANEL_DR_1114_1165.dat", header = TRUE)
PANEL_GR <- read.delim("E:/Data/Academic/Year1/External/beer/beer_PANEL_GR_1114_1165.dat", header = TRUE)
PANEL_MA <- read.delim("E:/Data/Academic/Year1/External/beer/beer_PANEL_MA_1114_1165.dat", header = TRUE)
Delivery_Stores <- read.fwf("E:/Data/Academic/Year1/External/beer/Delivery_Stores",
widths = c(7, 3, 9, 21, 5, 4, 5, 9))
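One way to structure this is to keep the changing parts (year folder, product, week range) in variables and build every path from them. Below is a sketch in Python with a hypothetical build_paths() helper. In R the same pattern would use paste0() or file.path(), and each resulting path would be passed to the read.table(), read.delim() or read.fwf() calls above.

The root folder is taken from the example paths, and PANEL_MA uses the .dat extension as in its read.delim() call.

```python
def build_paths(year, product, weeks, root="E:/Data/Academic"):
    """Hypothetical helper: build all file paths for one year/product/week range."""
    base = f"{root}/{year}/External/{product}"
    return {
        "drug": f"{base}/{product}_drug_{weeks}",
        "groc": f"{base}/{product}_groc_{weeks}",
        "PANEL_DR": f"{base}/{product}_PANEL_DR_{weeks}.dat",
        "PANEL_GR": f"{base}/{product}_PANEL_GR_{weeks}.dat",
        "PANEL_MA": f"{base}/{product}_PANEL_MA_{weeks}.dat",
        "Delivery_Stores": f"{base}/Delivery_Stores",
    }

# Only these three values change between datasets
paths = build_paths(year="Year1", product="beer", weeks="1114_1165")
for name, path in paths.items():
    print(name, path)
```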
I have this workflow:
I selected two columns (Day and Temperature) from my file using the 'Column Filter' node and connected it to an 'R Plot' node that I configured, but I get this:
The Day column is not used as the X axis; the Row ID is used instead. The Y axis is fine.
This is my code in R plot:
# Library
library(qcc)
library(readr)
library(Rserve)
Rserve(args = "--vanilla")
# Data column filter from CSV file imported
Test <- kIn
#Background color
qcc.options(bg.margin = "white", bg.figure = "gray95")
#R graph ranges of a continuous process variable
qcc(data = Test,
type = "R",
sizes = 5,
title = "Sample R Chart Title",
digits = 2,
plot = TRUE)
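For context, this is what an R chart plots under the textbook definition. Each point is the range (max minus min) of one subgroup. The centre line is the mean range R-bar, and the limits are D3 * R-bar and D4 * R-bar, which for subgroups of 5 are 0 and 2.114.

The x axis is the subgroup index. qcc documents that each row of a data frame is treated as one sample, which is why row names end up as axis labels.

The Python sketch below uses hypothetical temperatures and hard-codes the n = 5 constants. It illustrates the computation and is not qcc's implementation.

```python
# Textbook R-chart constants for subgroups of size 5
D3_N5 = 0.0
D4_N5 = 2.114

def r_chart(subgroups):
    """Return per-subgroup ranges, the centre line R-bar and (LCL, UCL).

    Only subgroups of size 5 are supported, matching the constants above.
    """
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("this sketch only handles subgroups of size 5")
    ranges = [max(g) - min(g) for g in subgroups]
    r_bar = sum(ranges) / len(ranges)
    return ranges, r_bar, (D3_N5 * r_bar, D4_N5 * r_bar)

# Hypothetical temperatures, grouped 5 at a time; each subgroup is one chart point
temps = [20.0, 22.0, 19.0, 21.0, 23.0, 24.0, 22.0, 25.0, 21.0, 23.0]
groups = [temps[i:i + 5] for i in range(0, len(temps), 5)]
ranges, center, (lcl, ucl) = r_chart(groups)
print(ranges, center, lcl, ucl)
```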
Here is my try (using KNIME's R, not the community contribution):
#install.packages("qcc")
library(qcc)
data <- knime.in
#Change the names to use Day instead of row keys
row.names(data) <- data$Day
#Using the updated data
plot(qcc(data = data,
type = "R",
sizes = 5,
title = "Sample R Chart Title",
digits = 2,
plot = TRUE))
With results like:
If you want to select the column for the X axis, just change the row.names assignment. (It can also come from knime.flow.in in case the column name is coming from a flow variable, but as I understand it is not the case for you.)
I need help using the input of the function below as the input for another R file.
Hotel <- function(hotel) {
require(data.table)
dat <- read.csv("demo.csv", header = TRUE)
dat$Date <- as.Date(paste0(format(strptime(as.character(dat$Date),
"%m/%d/%y"),
"%Y/%m"),"/1"))
library(data.table)
table <- setDT(dat)[, list(Revenue = sum(Revenues),
Hours = sum(Hours),
Index = mean(Index)),
by = list(Hotel, Date)]
answer <- na.omit(table[table$Hotel == hotel, ])
if (nrow(answer) == 0) {
stop("invalid hotel")
}
return(answer)
}
I call it as Hotel("Hotel Name").
Here's the other R file, which uses the hotel name I passed above:
#Reads the dataframe from the Hotel Function
star <- (Hotel("Hotel Name"))
#Calculates the Revpolu and Index
Revpolu <- star$Revenue / star$Hours
Index <- star$Index
png(filename = "~/Desktop/result.png", width = 480, height= 480)
plot(Index, Revpolu, main = "Hotel Name", col = "green", pch = 20)
testing <- cor.test(Index, Revpolu)
write.table(testing[["p.value"]], file = "output.csv", sep = ";", row.names = FALSE, col.names = FALSE)
dev.off()
I would like to automate this part instead of copying the input from the first file and storing it as a variable. If it's easier, all of this could become one function.
Also, instead of passing one hotel name to the function, is it possible to make the first file read all the hotel names (they are identified as row names in the .csv file) and pass them to the second file?
Since your example is not reproducible and your code has some bugs (using the column "Rooms" which is not produced by your function), I can't give you a tested answer, but here's how you can structure your code to produce the statistics you want for all hotels without having to copy and paste hotel names:
library(data.table)
# Use fread instead of read.csv, it's faster
dat <- fread("demo.csv", header = TRUE)
dat[, Date := as.Date(paste0(format(strptime(as.character(Date), "%m/%d/%y"), "%Y/%m"),"/1"))
table <- dat[, list(
Revenue = sum(Revenues),
Hours = sum(Hours),
Index = mean(Index)
), by = list(Hotel, Date)]
# You might want to consider using na.rm=TRUE in cor.test instead of
# using na.omit, but I kept it here to keep the result similar.
table <- na.omit(table)
# Calculate Revpolu inside the data.table
table[, Revpolu := Revenue / Hours]
# You can compute a p-value for all hotels using a group by
testing <- table[, list(p.value = cor.test(Index, Revpolu)[["p.value"]]), by=Hotel]
write.table(testing, file = "output.csv", sep = ";", row.names = FALSE, col.names = FALSE)
# You can get individual plots for each hotel with a for loop
hotels <- unique(table$Hotel)
for (h in hotels) {
png(filename = paste0("~/Desktop/result_", h, ".png"), width = 480, height = 480)  # one file per hotel
plot(table[Hotel == h, Index], table[Hotel == h, Revpolu], main = h, col = "green", pch = 20)
dev.off()
}