Loop over number preceded by underscore symbol in R - r

I apologize in advance but I did not find what I need in previous topic-related posts.
Suppose that I have the following data. "bchain" is a dataframe of 2192 observations. The column "Date" contains dates from 2011/01/01 to 2016/12/31. The column "Value" contains daily exchange rates.
> bchain
        Date    Value
1 2011-01-01 0.299998
2 2011-01-02 0.299996
3 2011-01-03 0.299998
4 2011-01-04 0.299899
5 2011-01-05 0.298998
6 2011-01-06 0.299000
7 2011-01-07 0.322000
8 2011-01-08 0.322898
. .......... ........
What I want to do is visualize the exchange rates year by year in separate plots and save the six graphs to my desktop using a "for" loop. Consider the following simple pseudo-code, which I built around the content of this post:
https://www.r-bloggers.com/automatically-save-your-plots-to-a-folder/
PSEUDO-CODE:
Date_2011=bchain[1:365,1]
Date_2012=bchain[366:731,1]
Date_2013=bchain[732:1096,1]
Date_2014=bchain[1097:1461,1]
Date_2015=bchain[1462:1826,1]
Date_2016=bchain[1827:2192,1]
bchain_2011=bchain[1:365,2]
bchain_2012=bchain[366:731,2]
bchain_2013=bchain[732:1096,2]
bchain_2014=bchain[1097:1461,2]
bchain_2015=bchain[1462:1826,2]
bchain_2016=bchain[1827:2192,2]
years=2011:2016
for(i in years){
mypath = file.path("C:/Users/toshiba1/Desktop",paste("myplot_", years[i], ".jpg", sep = ""))
jpeg(file=mypath)
mytitle = paste("my title is", years[i])
plot(Date_[i],bchain_[i], main = mytitle)
dev.off()
}
Then I get the following error message: object "Date_" not found. I suspect that the problem is that the above loop does not recognize the numbers which come after the underscore sign. So, any suggestion?
Thank you in advance.

Here is another approach that avoids the need to make the year-specific data frames. I used the lubridate package to extract the year from the date values, built a data.frame for that year, and plotted those data. As @Konrad also pointed out, the way in which you call some of the objects is giving you issues. I cleaned up some of those in your paste statements below.
library(lubridate)
# Create toy data to plot
bchain <- data.frame(Date = seq.Date(from = as.Date("2011-01-01"), to = as.Date("2016-12-31"),
by = 1),
Value = runif(2192, 0, 1))
years <- 2011:2016
for(i in years){
# Create dataset of just data to plot
bchain_plot <- bchain[year(bchain$Date) == i, ]
# Edited file name w/i jpeg call and fixed paste statement
jpeg(filename=paste0("C:/Users/toshiba1/Desktop/myplot_", i, ".jpg"))
# Plot data w/ title included in plot call
plot(bchain_plot$Date, bchain_plot$Value, main = paste("my title is", i))
dev.off()
}

You should refer to your objects properly. One approach is to use get, along these lines:
# Now plot data number i
x <- get(paste("Date", i, sep = "_"))
# Plot
plot(x)
or simply by nesting:
plot(get(paste("Date", i, sep = "_")))
To test it, type Date_[i] in the R console and see what happens. Are you getting the object you want to pass to the plot function? Arrive at the desired object via get or any other mechanism that suits you, and then pass it to the plotting function.
I reckon that you want to iterate through your objects, so you need i, not [i]: the loop variable already holds the year itself (2011, 2012, ...), so years[i] does not return a year. Type [i] in the R console and see what happens.
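Putting this together, here is a minimal sketch of the original loop rewritten with get. The toy data and the use of tempdir() as the output folder are assumptions for illustration; substitute your own bchain and path. Only two of the six years are shown to keep it short.

```r
# Toy data standing in for the real bchain (assumption for illustration)
bchain <- data.frame(
  Date  = seq.Date(as.Date("2011-01-01"), as.Date("2016-12-31"), by = "day"),
  Value = runif(2192)
)

# Year-specific objects, named as in the question
Date_2011   <- bchain[1:365, 1]
bchain_2011 <- bchain[1:365, 2]
Date_2012   <- bchain[366:731, 1]
bchain_2012 <- bchain[366:731, 2]

out_dir <- tempdir()   # assumption: replace with your own folder
saved <- character(0)  # paths of the files written

for (i in 2011:2012) {
  # i is already the year, so build the object name from i directly
  x <- get(paste("Date", i, sep = "_"))
  y <- get(paste("bchain", i, sep = "_"))

  mypath <- file.path(out_dir, paste0("myplot_", i, ".jpg"))
  jpeg(filename = mypath)
  plot(x, y, main = paste("my title is", i))
  dev.off()
  saved <- c(saved, mypath)
}
```

Looking objects up by name works, but keeping the yearly pieces in a list (or subsetting inside the loop, as in the other answer) is usually easier to maintain.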

Related

Reading ZIP file of machine-written data won't "plot" in RStudio

Summary: Despite a complicated lead-up, the solution was very simple: In order to plot a row of a dataframe as a line instead of a lattice, I needed to transpose the data in order to invert from x obs of y variables to y obs of x variables.
I am using RStudio on a Windows 10 computer.
I am using scientific equipment to write measurements to a csv file. Then I ZIP several files and read to R using read.csv. However, the data frame behaves strangely. Commands "length" and "dim" disagree and the "plot" function throws errors. Because I can create simulated data that doesn't throw the errors, I think the problem is either in how the machine wrote the data or in my loading and processing of the data.
Two ZIP files are located in my stackoverflow repository (with "Monterey Jack" in the name):
https://github.com/baprisbrey/stackoverflow
Here is my code for reading and processing them:
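# Packages used below (str_remove comes from stringr, %>% from magrittr)
library(stringr)
library(magrittr)
# Unzip the folders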
unZIP <- function(folder){
orig.directory <- getwd()
setwd(folder)
zipped.folders <- list.files(pattern = ".*zip")
for (i in zipped.folders){
unzip(i)}
setwd(orig.directory)
}
folder <- "C:/Users/user/Documents/StackOverflow"
unZIP(folder)
# Load the data into a list of lists
pullData <- function(folder){
orig.directory <- getwd()
setwd(folder)
#zipped.folders <- list.files(pattern = ".*zip")
#unzipped.folders <- list.files(folder)[!(list.files(folder) %in% zipped.folders)]
unzipped.folders <- list.dirs(folder)[-1] # Removing itself as the first directory.
oData <- vector(mode = "list", length = length(unzipped.folders))
names(oData) <- str_remove(unzipped.folders, paste(folder,"/",sep=""))
for (i in unzipped.folders) {
filenames <- list.files(i, pattern = "*.csv")
#setwd(paste(folder, i, sep="/"))
setwd(i)
files <- lapply(filenames, read.csv, skip = 5, header = TRUE, fileEncoding = "UTF-16LE") #Note unusual encoding
oData[[str_remove(i, paste(folder,"/",sep=""))]] <- vector(mode="list", length = length(files))
oData[[str_remove(i, paste(folder,"/",sep=""))]] <- files
}
setwd(orig.directory)
return(oData)
}
theData <- pullData(folder) #Load the data into a list of lists
# Process the data into frames
bigFrame <- function(bigList) {
#where bigList is theData is the result of pullData
#initialize the holding list of frames per set
preList <- vector(mode="list", length = length(bigList))
names(preList) <- names(bigList)
# process the data
for (i in 1:length(bigList)){
step1 <- lapply(bigList[[i]], t) # transpose each data
step2 <- do.call(rbind, step1) # roll it up into its own matrix #original error that wasn't reproduced: It showed length(step2) = 24048 when i = 1 and dim(step2) = 48 501. Any comments on why?
firstRow <- step2[1,] #holding onto the first row to become the names
step3 <- as.data.frame(step2) # turn it into a frame
step4 <- step3[grepl("µA", rownames(step3)),] # Get rid of all those excess name rows
rownames(step4) <- 1:(nrow(step4)) # change the row names to rowID's
colnames(step4) <- firstRow # change the column names to the first row steps
step4$ID <- rep(names(bigList[i]),nrow(step4)) # Add an I.D. column
step4$Class[grepl("pos",tolower(step4$ID))] <- "Yes" # Add "Yes" class
step4$Class[grepl("neg",tolower(step4$ID))] <- "No" # Add "No" class
preList[[i]] <- step4
}
# bigFrame <- do.call(rbind, preList) #Failed due to different number of measurements (rows that become columns) across all the data sets
# return(bigFrame)
return(preList) # Works!
}
frameList <- bigFrame(theData)
monterey <- rbind(frameList[[1]],frameList[[2]])
# Odd behaviors
dim(monterey) #48 503
length(monterey) #503 #This is not reproducing my original error of length = 24048
rowOne <- monterey[1,1:(ncol(monterey)-2)]
plot(rowOne) #Error in plot.new() : figure margins too large
#describe the data
quantile(rowOne, seq(0, 1, length.out = 11) )
quantile(rowOne, seq(0, 1, length.out = 11) ) %>% plot #produces undesired lattice plot
# simulate the data
doppelganger <- sample(1:20461,501,replace = TRUE)
names(doppelganger) <- names(rowOne)
# describe the data
plot(doppelganger) #Successful scatterplot. (With my non-random data, I want a line where the numbers in colnames are along the x-axis)
quantile(doppelganger, seq(0, 1, length.out = 11) ) #the random distribution is mildly different
quantile(doppelganger, seq(0, 1, length.out = 11) ) %>% plot # a simple line of dots as desired
# investigating structure
str(rowOne) # results in a dataframe of 1 observation of 501 variables. This is a correct interpretation.
str(as.data.frame(doppelganger)) # results in 501 observations of 1 variable. This is not a correct interpretation but creates the plot that I want.
How do I convert the rowOne to plot like doppelganger?
It looks like one of my errors is not reproducing, where calls to "dim" and "length" apparently disagree.
However, I'm confused as to why the "plot" function is producing a lattice plot on my processed data and a line of dots on my simulated data.
What I would like is to plot each row of data as a line. (Next, and out of the scope of this question, is I would like to classify the data with adaboost. My concern is that if "plot" behaves strangely then the classifier won't work.)
Any tips or suggestions or explanations or advice would be greatly appreciated.
Edit: Investigating the structure with ("str") of the two examples explains the difference between plots. I guess my modified question is, how do I switch between the two structures to enable plotting a line (like doppelganger) instead of a lattice (like rowOne)?
I am answering my own question.
I am setting aside the part about the discrepancy between "length" and "dim" since I can't provide a reproducible example. However, I'm happy to leave it up for comment.
The answer is that in order to produce my plot, I simply have to transpose the row as follows:
rowOne %>% t() %>% as.data.frame() %>% plot
This inverts the structure from one observation of 501 variables to 501 obs of one variable as follows:
rowOne %>% t() %>% as.data.frame() %>% str()
#'data.frame': 501 obs. of 1 variable:
# $ 1: num 8712 8712 8712 8712 8712 ...
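If the goal is a line rather than a series of points, one variation on the same idea is to flatten the one-row data frame into a numeric vector and pass type = "l" to plot. The sketch below uses made-up data in place of rowOne and assumes the column names are numeric step labels, as the question describes.

```r
# Made-up stand-in for rowOne: a data frame with 1 observation of 501 variables
set.seed(1)
rowOne <- as.data.frame(t(sample(1:20461, 501, replace = TRUE)))
names(rowOne) <- 1:501  # assumption: column names are numeric step labels

# unlist() turns the single row into a named numeric vector of length 501
y <- unlist(rowOne)
x <- as.numeric(names(rowOne))

# One value per x position, drawn as a line
plot(x, y, type = "l", xlab = "step", ylab = "measurement")
```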
Because of the unusual encoding I used, and the strange "length" result, I failed to see a simple solution to my "plot" problem.
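On the "length" versus "dim" discrepancy: one plausible explanation, which cannot be confirmed without the original session, is that length was called on the matrix step2 rather than on a data frame. For a matrix, length counts every cell, and 48 * 501 = 24048; for a data frame, length counts columns. A small sketch with a zero-filled matrix of the same shape:

```r
# A 48 x 501 matrix, the same shape reported for step2
m <- matrix(0, nrow = 48, ncol = 501)
dim(m)
length(m)   # number of cells: 48 * 501

# The same data as a data frame
df <- as.data.frame(m)
dim(df)
length(df)  # number of columns
```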

String Conversion- For loop on categorical variables

Hi all I am a novice to R and appreciate your hints on this case.
I've been struggling to convert the variables (objects) in my dataframe to strings and plot them using a for loop, as detailed below.
COUNTRY: China Belgium ...
COMPANY: XXX Inc. YYY Inc. ...
Here, COUNTRY and COMPANY are categorical variables.
I've used toString() as well as as.character() to convert the variable name to a string so I can specify the plot name, but I can't seem to get it to work. I need the 4 variables listed in the code below in a for loop for 2 purposes:
as a string for naming the plot
for use in barplot()
but neither the string conversion nor the for loop is working as I intended.
Could somebody assist me with the proper command for this purpose?
Your help is greatly appreciated...
Kind regards,
CODE
Frequency_COUNTRY <- table(COUNTRY)#Get Frequency for COUNTRY
Relative_Frequency_COUNTRY <- table(COUNTRY) / length(COUNTRY)#Get Relative
#Frequency (Percentage %) for Variable COUNTRY
Frequency_COMPANY <- table(COMPANY) #Get Frequency and Relative Frequency for COMPANY
Relative_Frequency_COMPANY <- table(COMPANY) / length(COMPANY)
Categorical_Variable_List = c(Frequency_COUNTRY,
Relative_Frequency_COUNTRY ,
Frequency_COMPANY,
Relative_Frequency_COMPANY) # Get list of 4 variables above
for (Categorical_Variable in Categorical_Variable_List){#Plot 4 variables using a for loop
A = toString(Categorical_Variable) #Trying to convert non-string variable name to string
plotName <- paste("BarChart_", A, sep = "_")# Specify plot name, e.g. BarChart_Frequency_COUNTRY
png(file = plotName)#Create png file
barplot(Categorical_Variable) #use barplot() to make graph
dev.off() # Switch off dev
}
Your code is treating Categorical_Variable_List as if it were a named list of categorical variables. It is neither.
The following code corrects those errors and draws the 4 barplots in a single 2 x 2 figure. To save each plot to its own file instead, uncomment the png and dev.off lines and remove the two calls to par, one before and the other after the for loop.
I will make up a dataset, to test the code.
set.seed(1234)
n <- 20
COUNTRY <- sample(LETTERS[1:5], n, TRUE)
COMPANY <- sample(letters[1:4], n, TRUE)
Frequency_COUNTRY <- table(COUNTRY) # Get Frequency for COUNTRY
Relative_Frequency_COUNTRY <- table(COUNTRY) / length(COUNTRY)#Get Relative
# Frequency (Percentage %) for Variable COUNTRY
Frequency_COMPANY <- table(COMPANY) # Get Frequency and Relative Frequency for COMPANY
Relative_Frequency_COMPANY <- table(COMPANY) / length(COMPANY)
Variable_List <- list(Frequency_COUNTRY = Frequency_COUNTRY,
Relative_Frequency_COUNTRY = Relative_Frequency_COUNTRY,
Frequency_COMPANY = Frequency_COMPANY,
Relative_Frequency_COMPANY = Relative_Frequency_COMPANY) # Get list of 4 variables above
Variable_Name <- names(Variable_List)
old_par <- par(mfrow = c(2, 2))
for (i in seq_along(Variable_List)){ # Plot 4 variables using a for loop
plotName <- paste("BarChart", Variable_Name[[i]], sep = "_") # Specify plot name
print(plotName) # for debugging only
#png(file = plotName) # Create png file
barplot(Variable_List[[i]]) # use barplot() to make graph
#dev.off() # Switch off dev
}
par(old_par)
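To see why the original loop fails, and how a named list can be used to write one file per plot, here is a small sketch. c() flattens the tables into a single numeric vector, so the loop visits individual counts, whereas list() keeps each table intact together with its name. The output folder (tempdir()) and the added ".png" extension are assumptions for illustration.

```r
set.seed(1234)
COUNTRY <- sample(LETTERS[1:5], 20, TRUE)
COMPANY <- sample(letters[1:4], 20, TRUE)

# c() concatenates the tables into one long numeric vector, not a list
flat <- c(table(COUNTRY), table(COMPANY))
is.list(flat)

# list() keeps each table as a separate, named element
Variable_List <- list(Frequency_COUNTRY = table(COUNTRY),
                      Frequency_COMPANY = table(COMPANY))

out_dir <- tempdir()   # assumption: replace with your own folder
files <- character(0)  # paths of the files written

for (nm in names(Variable_List)) {
  plotName <- file.path(out_dir, paste0("BarChart_", nm, ".png"))
  png(filename = plotName)                 # open one file per plot
  barplot(Variable_List[[nm]], main = nm)  # the name doubles as the title
  dev.off()
  files <- c(files, plotName)
}
```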

Moving window over zoo time series in R

I'm running into issues while applying a moving window function to a time series dataset. I've imported daily streamflow data (date and value) into a zoo object, as approximated by the following:
library(zoo)
df <- data.frame(sf = c("2001-04-01", "2001-04-02", "2001-04-03", "2001-04-04",
"2001-04-05", "2001-04-06", "2001-04-07", "2001-06-01",
"2001-06-02", "2001-06-03", "2001-06-04", "2001-06-05",
"2001-06-06"),
cfs = abs(rnorm(13)))
zoodf <- read.zoo(df, format = "%Y-%m-%d")
Since I want to calculate the 3-day moving minimum for each month I've defined a function using rollapply:
f.3daylow <- function(x){rollapply(x, 3, FUN=min, align = "center")}
I then use aggregate:
aggregate(zoodf, by=as.yearmon, FUN=f.3daylow)
This promptly returns an error message:
Error in zoo(df, ix[!is.na(ix)]) :
“x” : attempt to define invalid zoo object
The problem appears to be that there is an unequal number of data points in each month, since using the same dataframe with an additional date for June results in a correct response. Any suggestions for how to deal with this would be appreciated!
Ok, you might be thinking of something like this then. It pastes the results for each month into one data point, so that it can be returned in the aggregate function. Otherwise you may also have a look at ?aggregate.zoo for some more precise data manipulations.
f.3daylow <- function(x){paste(rollapply(x, 3, FUN=min,
align = "center"), collapse=", ")}
data <- aggregate(zoodf, by=as.yearmon, FUN=f.3daylow)
This returns the following. The rolling-window results for each month are collapsed into a single data point, so to analyse them you will eventually have to split them apart again; for that reason this approach is not recommended.
Apr 2001
0.124581285281643, 0.124581285281643, 0.124581285281643,
0.342222172241979, 0.518874882033892
June 2001
0.454158221843514, 0.454158221843514, 0.656966528249837,
0.513613009234435
Eventually you can cut it up again via strsplit(data[1],", "), but see Convert comma separated entry to columns for more details.
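An alternative that avoids pasting the values into strings is to keep the per-month results in a list. The sketch below deliberately uses base R only (split plus embed) rather than zoo, so it is a swapped-in technique and not the aggregate.zoo route; roll_min3 is a hypothetical helper name, and the seed is added only to make the toy data repeatable.

```r
dates <- as.Date(c("2001-04-01", "2001-04-02", "2001-04-03", "2001-04-04",
                   "2001-04-05", "2001-04-06", "2001-04-07", "2001-06-01",
                   "2001-06-02", "2001-06-03", "2001-06-04", "2001-06-05",
                   "2001-06-06"))
set.seed(1)
cfs <- abs(rnorm(13))

# Rolling minimum over a window of 3; returns length(x) - 2 values
roll_min3 <- function(x) {
  if (length(x) < 3) return(numeric(0))
  # each row of embed(x, 3) holds one window of 3 consecutive values
  apply(embed(x, 3), 1, min)
}

# Split the values by year-month, then roll within each month
by_month <- split(cfs, format(dates, "%Y-%m"))
lows <- lapply(by_month, roll_min3)
lows
```

Because the months have different numbers of days, the result is a list with vectors of different lengths, which is exactly the shape a single zoo object cannot hold.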

How to convert rows

I have uploaded a data set called "Obtained Dataset". It usually has 16 header rows containing numeric and character entries; some other files of a similar nature have fewer than 16. Each of these rows is the header of one column of the data, which starts from the 17th row onwards in this specific file.
Obtained dataset & Required Dataset
In the data block, the 1st column is the x-axis, the 2nd column is the y-axis and the 3rd column is depth (these are standard for all the files in the database); the 4th column is GR 1 LIN, the 5th column is CAL 1 LIN, and so forth, as given in the first 16 rows of the file.
Now I want R code that converts it into the format shown in the required data set. Also, if a different data set has fewer than 16 lines of names, say GR 1 LIN and RHOB 1 LIN are missing, I want it to still create those columns, filled with NA for rows 1:nrow.
Currently i have managed to export this file to excel and manually clean the data and rename the columns correspondingly and then save it as csv and then read.csv("filename") etc but it is simply not possible to do this for 400 files.
Any advice how to proceed will be of great help.
I have noticed that you have probably posted this question again, and in a different format. This is a public forum, and people are happy to help. However, it's your job to simplify life of others, and you are requested to put in some effort. Here is some advice on that.
Having said that, here is some code I have written to help you out.
Step0: Creating your first data set:
sink("test.txt") # This will `sink` all the output to the file "test.txt"
# Lets start with some dummy data
cat("1\n")
cat("DOO\n")
cat(c(sample(letters,10),"\n"))
cat(c(sample(letters,10),"\n"))
cat(c(sample(letters,10),"\n"))
cat(c(sample(letters,10),"\n"))
# Now a 10 x 16 dummy data matrix:
cat(paste(apply(matrix(sample(160),10),1,paste,collapse = "\t"),collapse = "\n"))
cat("\n")
sink() # This will stop `sink`ing.
I have created some dummy data in first 6 lines, and followed by a 10 x 16 data matrix.
Note: In principle you should have provided something like this, or a copy of your dataset. This would help other people help you.
Step1: Now we need to read the file, and we want to skip the first 6 rows with undesired info:
(temp <- read.table(file="test.txt", sep ="\t", skip = 6))
Step2: Data clean up:
We need a vector with names of the 16 columns in our data:
namesVec <- letters[1:16]
Now we assign these names to our data.frame:
names(temp) <- namesVec
temp
Looks good!
Step3: Save the data:
write.table(temp,file="test-clean.txt",row.names = FALSE,sep = "\t",quote = FALSE)
Check if the solution is working. If it is working, then move to the next step; otherwise make the necessary changes.
Step4: Automating:
First we need to create a list of all the 400 files.
The easiest way (to explain also) is copy the 400 files in a directory, and then set that as working directory (using setwd).
Now first we'll create a vector with all file names:
fileNameList <- dir()
Once this is done, we'll need a function to repeat steps 1 through 3:
convertFiles <- function(fileName) {
temp <- read.table(file=fileName, sep ="\t", skip = 6)
names(temp) <- namesVec
# Prefix the input file name with "clean-" so each file gets its own output
write.table(temp,file=paste("clean",fileName,sep="-"),row.names = FALSE,sep = "\t",quote = FALSE)
}
Now we simply need to apply this function on all the files we have:
sapply(fileNameList,convertFiles)
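The question also asks for missing columns to be created and filled with NA. A minimal sketch of that step follows; expectedNames and addMissingColumns are hypothetical names introduced here, and the column set is only an example built from the names mentioned in the question.

```r
# Hypothetical full set of column names; adjust to your files
expectedNames <- c("x", "y", "depth", "GR 1 LIN", "CAL 1 LIN", "RHOB 1 LIN")

# A toy file that lacks "GR 1 LIN" and "RHOB 1 LIN"
temp <- data.frame(x = 1:3, y = 4:6, depth = 7:9,
                   `CAL 1 LIN` = c(0.1, 0.2, 0.3),
                   check.names = FALSE)

addMissingColumns <- function(df, allNames) {
  for (nm in setdiff(allNames, names(df))) {
    df[[nm]] <- NA              # recycled to nrow(df)
  }
  df[, allNames, drop = FALSE]  # put the columns in the standard order
}

clean <- addMissingColumns(temp, expectedNames)
clean
```

A call like this could go inside convertFiles, right after the names are assigned and before write.table.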
Hope this helps!

R: changing column names for improved documentation

I have two csv files. One containing measurements at several points and one containing the description of the single points. It has about a 100 different points and 10000's of measurements but for simplification let's assume there are only two points and measurements.
data.csv:
point1,point2,date
25,80,11.06.2013
26,70,10.06.2013
description.csv:
point,name,description
point1,tempA,Temperature in room A
point2,humidA,Humidity in room A
Now I read both of the csv's into dataframes. Then I change the column names in the dataframe to make it more readable.
options(stringsAsFactors=F)
DataSource <- read.csv("data.csv")
DataDescription <- read.csv("description.csv")
for (name.source in names(DataSource))
{
count = 1
for (name.target in DataDescription$point)
{
if (name.source == name.target)
{
names(DataSource)[names(DataSource)==name.source] <- DataDescription[count,'name']
}
count = count + 1
}
}
So, my questions now are: Is there a way to do this without the loops? And would you change the names for readability as I did or not? If not, why?
The trick with replacements is sometimes to match the indexing on both sides of the assignment:
names(DataSource)[match(DataDescription$point, names(DataSource))] <-
DataDescription$name[match(DataDescription$point, names(DataSource))]
#> DataSource
  tempA humidA       date
1    25     80 11.06.2013
2    26     70 10.06.2013
Earlier effort :
names(DataSource)[match(DataDescription$point, names(DataSource))] <-
gsub(" ", "_", DataDescription$description)[
match(DataDescription$point, names(DataSource))]
#> DataSource
  Temperature_in_room_A Humidity_in_room_A       date
1                    25                 80 11.06.2013
2                    26                 70 10.06.2013
Notice that I did not put non-syntactic names on that dataframe. To do so would have been a disservice. Anando Mahto's comment is well considered. I would not want to do this unless it were at the very end of data-processing or a side excursion on the way to a plotting effort. In that case I might not substitute the underscores. In the case where you wanted plotting labels there might be a further need for insertion of "\n" to fold the text within space constraints.
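Note that the indexing above lines up only when the described points appear in the same order and position in both files. A sketch of an order-independent variant matches in the other direction, looking each column name up in the description. The shuffled toy data below is made up for illustration.

```r
# Columns deliberately in a different order from the description
DataSource <- data.frame(point2 = c(80, 70), point1 = c(25, 26),
                         date = c("11.06.2013", "10.06.2013"),
                         stringsAsFactors = FALSE)
DataDescription <- data.frame(point = c("point1", "point2"),
                              name = c("tempA", "humidA"),
                              stringsAsFactors = FALSE)

# For each column, find its row in the description (NA if not described)
idx <- match(names(DataSource), DataDescription$point)
found <- !is.na(idx)

# Rename only the columns that have a description entry
names(DataSource)[found] <- DataDescription$name[idx[found]]
names(DataSource)
```

Columns without a description entry, such as date, keep their original names.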
OK, I ordered the columns in the first one and the rows in the second one to work around the problem of needing the points in the same order. Now the description only needs to have the same points as the data source. Here is my final code:
# set options to get strings right
options(stringsAsFactors=F)
# read in original data
DataOriginal <- read.csv("data.csv", sep = ";")
DataDescriptionOriginal <- read.csv("description.csv", sep = ";")
# sort the data
DataOrdered <- DataOriginal[,order(names(DataOriginal))]
DataDescriptionOrdered <- DataDescriptionOriginal[order(DataDescriptionOriginal$points),]
# copy data into final dataframe and replace names
Data <- DataOrdered
names(Data)[match(DataDescriptionOrdered$points, names(Data))] <- gsub(" ", "_", DataDescriptionOrdered$description)[match(DataDescriptionOrdered$points, names(Data))]
Thx a lot to everyone contributing to find a good solution for me!
