Time series import from Excel and date manipulation in R

I have 2 columns of a time series in a .csv Excel file, "date" and "widgets".
I import the file into R using:
things <- read.csv("C:things.csv")
str(things)
'data.frame': 280 obs. of 2 variables:
$ date: Factor w/ 280 levels "2012-09-12","2012-09-13",..: 1 2 3 4 5 6 7 8 9 10 ...
$ widgets : int 5 10 15 20 30 35 40 50 55 60 65 70 75 80 85 90 95 100 ...
How do I convert the factor things$date into either xts or Time Series format?
For instance, when I run:
hist(things)
Error in hist.default(things) : 'x' must be numeric

Try reading it in as a zoo object and then converting:
Lines <- "date,widgets
2012-09-12,5
2012-09-13,10
"
library(zoo)
# replace first argument with: file="C:things.csv"
z <- read.zoo(text = Lines, header = TRUE, sep = ",")
x <- as.xts(z)
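Alternatively, if you prefer to stay closer to base R, a minimal sketch (assuming the dates parse with the default "%Y-%m-%d" format and the file was read into things as above) is to convert the factor column directly and build the xts object from it:
library(xts)
things$date <- as.Date(things$date)                 # factor -> Date
x2 <- xts(things$widgets, order.by = things$date)   # numeric series indexed by date
hist(coredata(x2))                                  # hist() needs the numeric values, not the whole object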

Related

Trying to tidy data but get errors: can't convert .x[[i]] empty character vector to a function & must extract column with a single valid subscript

I am trying to tidy a dataset where I measured exposure at two different stations over time (in seconds). I have a data frame whose first column is Second, followed by SiteA_Number (the number of particles at Site A), SiteA_Diam (particle diameter at Site A), SiteA_LDSA (LDSA at Site A), and the same three measurements for Site B as three more columns (SiteB_Number, SiteB_Diam, SiteB_LDSA).
I would like to transform my dataset so it has a column for Second; columns for Number, Diameter, and LDSA; and a separate column for the station (SiteA or SiteB). That way I can plot Number (y axis) over time in seconds (x axis) and fill by site.
The structure of each column is as follows:
'data.frame': 1800 obs. of 7 variables:
$ Second: num 1 2 3 4 5 6 7 8 9 10 ...
$ SiteA_Number : int 16673 19891 20370 17513 18185 18982 18362 17579 16605 15590 ...
$ SiteA_Diam : int 41 39 38 42 41 39 40 42 44 45 ...
$ SiteA_LDSA : num 36.1 40.4 40.7 38.6 38.8 ...
$ SiteB_Number: int 15554 16745 17719 16494 15811 15331 16053 16196 15733 15521 ...
$ SiteB_Diam : int 40 39 37 40 42 44 42 42 42 43 ...
$ SiteB_LDSA : num 33 33.8 34.3 34.2 35.2 ...
I tried using pivot_longer to create a station column and then corresponding columns for the number, diameter, and LDSA:
MergedLDSA %>%
  pivot_longer(-Second,
               names_to = c("Station", ".value"),
               names_sep = "_",
               names_transform = list(
                 Number = as.integer,
                 Diameter = as.integer,
                 LDSA = as.integer,
                 Station = as.character())
  )
But I get the error message:
Error in `map()`:
! Can't convert `.x[[i]]`, an empty character vector, to a function.
I then tried using the separate() function:
MergedLDSA %>%
  separate(c(SiteA_Number, SiteA_Diam, SiteA_LDSA, SiteB_Number, SiteB_Diam, SiteB_LDSA),
           into = c("Station", ".value"), sep = "_")
But I get the error message:
Error:
! Must extract column with a single valid subscript.
x Subscript `var` has size 6 but must be size 1.
I'm fairly new to coding and this is my first time tidying real data. I don't understand the errors and can't figure out how to tidy my data the way I'd like.
Any help would be greatly appreciated! :)
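A minimal sketch of the reshape described above (assuming the data frame MergedLDSA from the question): with names_to = c("Station", ".value") and names_sep = "_", pivot_longer already splits each SiteX_Measure column name into the Station value and the output column, and it keeps the original column types, so the names_transform argument can simply be dropped. The error above most likely comes from Station = as.character(), which calls the function (yielding an empty character vector) instead of passing it.
library(dplyr)
library(tidyr)

long <- MergedLDSA %>%
  pivot_longer(
    cols      = -Second,
    names_to  = c("Station", ".value"),   # "SiteA_Number" -> Station = "SiteA", value column "Number"
    names_sep = "_"
  )
# long now has the columns: Second, Station, Number, Diam, LDSA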

Why is the R ncdf4 package transposing my data?

I'm reading .nc data in R with ncdf4 and RNetCDF. The NetCDF metadata says there are 144 lons and 73 lats, which should lead to 144 columns and 73 rows, right?
However, the data I get in R seems to be transposed with 144 rows and 73 columns.
Could you please tell me what is wrong? Thanks.
library(ncdf4)
a <- tempfile()
download.file(url = "ftp://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis2.derived/pressure/uwnd.mon.mean.nc", destfile = a)
nc <- nc_open(a)
uwnd <- ncvar_get(nc = nc, varid = "uwnd")
dim(uwnd)
## [1] 144 73 17 494
umed <- (uwnd[ , , 10, 421] + uwnd[ , , 10, 422] + uwnd[ , , 10, 423])/3
nrow(umed)
## [1] 144
ncol(umed)
## [1] 73
It looks like you are having two problems.
The first is expecting the data in R to have the same row/column layout that the netCDF metadata suggests. In the file, longitude is the first dimension and latitude the second, so ncvar_get() returns an array with 144 rows (lon) and 73 columns (lat); the multi-dimensional array structure of the netCDF needs some reshaping in R, just as it does in Python, before it can be manipulated as a 2-dimensional data frame (see: http://geog.uoregon.edu/bartlein/courses/geog490/week04-netCDF.html).
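If all you need is the familiar lat-by-lon orientation for a single slice, a small sketch related to this first issue (using the uwnd array already read above) is to transpose the 2-D slice with t():
slice <- uwnd[ , , 10, 421]   # one level/time slice, dimensions lon x lat (144 x 73)
slice_t <- t(slice)           # transposed: lat x lon (73 x 144)
dim(slice_t)
## [1]  73 144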
The second one is that you are using values instead of indices when subsetting the data.
umed <- (uwnd[ , , 10, 421] + uwnd[ , , 10, 422] + uwnd[ , , 10, 423])/3
The solution I see is to start by creating the indices of the dimensions you want to subset. In this example I am subsetting the 10-millibar pressure level, longitudes between 230 and 300, and latitudes between 25 and 40.
nc <- nc_open("uwnd.mon.mean.nc")
LonIdx <- which(nc$dim$lon$vals > 230 & nc$dim$lon$vals < 300)
## [1] 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113
## 114 115 116 117 118 119 120
LatIdx <- which(nc$dim$lat$vals > 25 & nc$dim$lat$vals < 40)
## [1] 22 23 24 25 26
LevIdx <- which(nc$dim$level$vals == 10)
## [1] 17
Then you need to apply the indices over each dimension except time, which I assume you don't want to subset. Subsetting lon and lat is important because R keeps everything in memory, so reading the whole range would consume a significant amount of RAM.
lat <- ncvar_get(nc,"lat")[LatIdx]
lon <- ncvar_get(nc,"lon")[LonIdx]
lev <- ncvar_get(nc,"level")[LevIdx]
time <- ncvar_get(nc,"time")
After that you can read the variable you were looking for, uwnd (Monthly U-wind on Pressure Levels), and finish by closing the netCDF file with nc_close(nc).
uwnd <- ncvar_get(nc,"uwnd")[LonIdx,LatIdx,LevIdx,]
nc_close(nc)
Finally, you can expand the grid over all four dimensions: longitude, latitude, pressure level, and time.
uwndf <- data.frame(as.matrix(cbind(expand.grid(lon,lat,lev,time))),c(uwnd))
names(uwndf) <- c("lon","lat","level","time","U-wind")
Bind it into a data frame with the U-wind variable, then convert the netCDF time variable into an R time object:
uwndf$time_final <- convertDateNcdf2R(uwndf$time, units = "hours",
                                      origin = as.POSIXct("1800-01-01", tz = "UTC"),
                                      time.format = "%Y-%m-%d %Z %H:%M:%S")
In the end you will have the data frame you are looking for, covering Jan 1979 to March 2020.
max(uwndf$time_final)
## [1] "2020-03-01 UTC"
min(uwndf$time_final)
## [1] "1979-01-01 UTC"
head(uwndf)
## lon lat level time U-wind time_final
## 1 232.5 37.5 10 1569072 3.289998 1979-01-01
## 2 235.0 37.5 10 1569072 5.209998 1979-01-01
## 3 237.5 37.5 10 1569072 7.409998 1979-01-01
## 4 240.0 37.5 10 1569072 9.749998 1979-01-01
## 5 242.5 37.5 10 1569072 12.009998 1979-01-01
## 6 245.0 37.5 10 1569072 14.089998 1979-01-01
I hope this is useful! Cheers!
Note: to convert the netCDF time variable into an R time object, make sure you have the ncdf.tools package installed (it provides convertDateNcdf2R).
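If you would rather not add that dependency, a hedged base-R sketch of the same conversion (assuming the file's time units really are hours since 1800-01-01, as the origin used above implies):
uwndf$time_final <- as.POSIXct(uwndf$time * 3600,            # hours -> seconds since origin
                               origin = "1800-01-01", tz = "UTC")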

How can I improve my code to convert a factor column into lists within a data.frame?

I want to convert a factor column into lists within a data.frame.
I managed it with the code below, but I feel this is not the right way.
How can I improve the code below?
The data I'm dealing with is the result of association rules (from the arules package); the text is in Japanese.
Here are 3 rows of the column "rules":
rules
{道路構造=交差点_交差点付近,昼間12時間平均旅行速度=20~30km/h,歩道設置率=100%,バス優先.専用レーンの有無=なし} => {事故類型=車両相互_追突}
{道路構造=交差点_交差点付近,昼間12時間平均旅行速度=20~30km/h,バス優先.専用レーンの有無=なし} => {事故類型=車両相互_追突}
{道路構造=交差点_交差点付近,歩道設置率=100%,バス優先.専用レーンの有無=なし,代表沿道状況=人口集中地区(商業地域を除く)} => {事故類型=車両相互_追突}
And str(data)
'data.frame': 50 obs. of 5 variables:
$ rules : Factor w/ 50 levels "{道路構造=交差点_交差点付近,バス優先.専用レーンの有無=なし,指定最高速度=50} => {事故類型=車両相互_追突}",..: 9 8 35 38 10 31 11 25 3 7 ...
$ support : Factor w/ 48 levels "0.050295052",..: 5 14 5 10 24 1 30 13 15 18 ...
$ confidence: Factor w/ 50 levels "0.555131629",..: 50 49 48 47 46 45 44 43 42 41 ...
$ lift : Factor w/ 50 levels "1.894879112",..: 50 49 48 47 46 45 44 43 42 41 ...
$ count : Factor w/ 48 levels "1013","1250",..: 9 18 9 14 28 5 34 17 19 22 ...
# convert factor to character
data %>% mutate_if(is.factor, as.character) -> data

# delete the RHS in rules (the part after '=>')
data$rules <- strsplit(data$rules, " =>")
for (i in 1:length(data$rules)) {
  data$rules[[i]] <- data$rules[[i]][[-2]]
}

# delete "{" and "}"
data$rules <- as.character(data$rules)
data$rules <- strsplit(data$rules, "[{]")
for (i in 1:length(data$rules)) {
  data$rules[[i]] <- data$rules[[i]][[-1]]
}
data$rules <- as.character(data$rules)
data$rules <- strsplit(data$rules, "[}]")

# split character to list (length(data$rules[[1]]) -> 4)
data$rules <- as.character(data$rules)
data$rules <- strsplit(data$rules, ",")
The output should be like this:
[[1]]
[1] "道路構造=交差点_交差点付近" "昼間12時間平均旅行速度=20~30km/h" "歩道設置率=100%" "バス優先.専用レーンの有無=なし"
[[2]]
[1] "道路構造=交差点_交差点付近" "昼間12時間平均旅行速度=20~30km/h" "バス優先.専用レーンの有無=なし"
[[3]]
[1] "道路構造=交差点_交差点付近" "歩道設置率=100%" "バス優先.専用レーンの有無=なし"
[4] "代表沿道状況=人口集中地区(商業地域を除く)"
My code does work; I just feel it's not elegant or efficient. Could you improve it, or show the right way to do this?
We can use str_extract
library(stringr)
library(dplyr)
out <- data %>%
  mutate(rules = trimws(str_extract(rules, "(?<=\\{)[^}]+")))
out$rules
#[1] "道路構造=交差点_交差点付近,昼間12時間平均旅行速度=20~30km/h,歩道設置率=100%,バス優先.専用レーンの有無=なし"
#[2] "道路構造=交差点_交差点付近,昼間12時間平均旅行速度=20~30km/h,バス優先.専用レーンの有無=なし"
#[3] "道路構造=交差点_交差点付近,歩道設置率=100%,バス優先.専用レーンの有無=なし,代表沿道状況=人口集中地区(商業地域を除く)"
If we want to split 'rules' on "," and create a list column:
out$rules <- str_split(out$rules, ",")
data
data <- structure(list(rules = c("{道路構造=交差点_交差点付近,昼間12時間平均旅行速度=20~30km/h,歩道設置率=100%,バス優先.専用レーンの有無=なし} => {事故類型=車両相互_追突}",
"{道路構造=交差点_交差点付近,昼間12時間平均旅行速度=20~30km/h,バス優先.専用レーンの有無=なし} => {事故類型=車両相互_追突}",
"{道路構造=交差点_交差点付近,歩道設置率=100%,バス優先.専用レーンの有無=なし,代表沿道状況=人口集中地区(商業地域を除く)} => {事故類型=車両相互_追突}"
)), class = "data.frame", row.names = c(NA, -3L))
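For comparison, a base R sketch of the same extraction that needs no extra packages (assuming the same data object as above): strip everything outside the first {...} with sub(), then split on commas.
data$rules <- strsplit(sub("^\\{([^}]+)\\}.*$", "\\1", data$rules), ",")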

Writing functions: Creating data processing functions with R software [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
Hello fellow R users!
I'm a beginner with R and would appreciate some help with a data processing function. I have three different .csv files named x2013, x2014, and x2015, each with the same 6 columns for its respective year (shown in the image in the original post). I started typing these commands:
filenames <- list.files()
library(plyr)
install.packages("plyr")
import.list <- adply(filenames, 1, read.csv)
Really, all I want is to summarize all the calls from the three sources (csv files). Any kind of help would be appreciated. Thank you for assisting me!
If you want to combine the results of read.csv into one data.frame, you can use the following approach with do.call and rbind, provided the csv files all have the same number of columns. The code below takes all csv files from the project home directory and concatenates them into one data.frame:
# simulation of 3 data.frames with 6 columns and 10 rows
df1 <- as.data.frame(matrix(1:(10 * 6), ncol = 6))
df2 <- df1 * 2
df3 <- df1 * 3
write.csv(df1, "X2012.csv")
write.csv(df2, "X2013.csv")
write.csv(df3, "X2014.csv")
# Load all csv files from home directory
filenames <- list.files(".", pattern = "csv$")
import.list<- lapply(filenames, read.csv)
# concatenate list of data.frames into one data.frame
df_res <- do.call(rbind, import.list)
str(df_res)
The output is a data.frame with 30 rows and 7 columns (the extra X column holds the row names that write.csv added; pass row.names = FALSE to write.csv to avoid it):
'data.frame': 30 obs. of 7 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ V1: int 1 2 3 4 5 6 7 8 9 10 ...
$ V2: int 11 12 13 14 15 16 17 18 19 20 ...
$ V3: int 21 22 23 24 25 26 27 28 29 30 ...
$ V4: int 31 32 33 34 35 36 37 38 39 40 ...
$ V5: int 41 42 43 44 45 46 47 48 49 50 ...
$ V6: int 51 52 53 54 55 56 57 58 59 60 ...
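If you also want to keep track of which file each row came from, a hedged alternative sketch using dplyr (the Source column name is just an illustration):
library(dplyr)
names(import.list) <- filenames
df_res <- bind_rows(import.list, .id = "Source")   # adds a Source column holding the file name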

Observations becoming NA when ordering levels of factors in R with ordered()

I have a longitudinal data frame p that contains 5 variables and looks like this:
> head(p)
date.1 County.x providers beds price
1 Jan/2011 essex 258 5545 251593.4
2 Jan/2011 greater manchester 108 3259 152987.7
3 Jan/2011 kent 301 7191 231985.7
4 Jan/2011 tyne and wear 103 2649 143196.6
5 Jan/2011 west midlands 262 6819 149323.9
6 Jan/2012 essex 2 27 231398.5
The structure of my variables is the following:
'data.frame': 259 obs. of 5 variables:
$ date.1 : Factor w/ 66 levels "Apr/2011","Apr/2012",..: 23 23 23 23 23 24 24 24 25 25 ...
$ County.x : Factor w/ 73 levels "avon","bedfordshire",..: 22 24 32 65 67 22 32 67 22 32 ...
$ providers: int 258 108 301 103 262 2 9 2 1 1 ...
$ beds : int 5545 3259 7191 2649 6819 27 185 24 70 13 ...
$ price : num 251593 152988 231986 143197 149324 ...
I want to order date.1 chronologically. Prior to applying ordered(), this variable does not contain any NA observations.
> summary(is.na(p$date.1))
Mode FALSE NA's
logical 259 0
However, once I apply my function for ordering the levels corresponding to date.1:
p$date.1 = with(p, ordered(date.1, levels = c("Jun/2010", "Jul/2010",
"Aug/2010", "Sep/2010", "Oct/2010", "Nov/2010", "Dec/2010", "Jan/2011", "Feb/2011",
"Mar/2011","Apr/2011", "May/2011", "Jun/2011", "Jul/2011", "Aug/2011", "Sep/2011",
"Oct/2011", "Nov/2011", "Dec/2011" ,"Jan/2012", "Feb/2012" ,"Mar/2012" ,"Apr/2012",
"May/2012", "Jun/2012", "Jul/2012", "Aug/2012", "Sep/2012", "Oct/2012", "Nov/2012",
"Dec/2012", "Jan/2013", "Feb/2013", "Mar/2013", "Apr/2013", "May/2013",
"Jun/2013", "Jul/2013", "Aug/2013", "Sep/2013", "Oct/2013", "Nov/2013",
"Dec/2013", "Jan/2014",
"Feb/2014", "Mar/2014", "Apr/2014", "May/2014", "Jun/2014", "Jul/2014" ,"Aug/2014",
"Sep/2014", "Oct/2014", "Nov/2014", "Dec/2014", "Jan/2015", "Feb/2015", "Mar/2015",
"Apr/2015","May/2015", "Jun/2015" ,"Jul/2015" ,"Aug/2015", "Sep/2015", "Oct/2015",
"Nov/2015")))
It seems I lose some observations.
> summary(is.na(p$date.1))
Mode FALSE TRUE NA's
logical 250 9 0
Has anyone come across this problem when using ordered()? Alternatively, is there another way to order my observations chronologically?
It is possible that some of your p$date.1 values don't match any of the levels (any unmatched value becomes NA). Try this ord.mon as the levels.
ord.mon <- do.call(paste, c(expand.grid(month.abb, 2010:2015), sep = "/"))
Then, you can try this to see if there's any mismatch between the two.
p$date.1 %in% ord.mon
Lastly, you can also sort the data frame after transforming the date.1 column into Date (note that you have to prepend an actual day of the month first):
p <- p[order(as.Date(paste0("01/", p$date.1), "%d/%b/%Y")), ]
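Putting the pieces together, a short sketch (assuming p is the data frame from the question): first check which values would fail to match, then re-level with the complete set of months.
setdiff(unique(as.character(p$date.1)), ord.mon)   # values that would become NA
p$date.1 <- ordered(p$date.1, levels = ord.mon)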
