Loop in R to create and save series of ggplot2 plots with specified names - r

I have a data frame in R with a POSIXct variable sessionstarttime. Each row is identified by an integer ID variable for a specified location. The number of rows is different for each location. I plot the overall graph simply by:
myplot <- ggplot(bigMAC, aes(x = sessionstarttime)) + geom_freqpoly()
Is it possible to create a loop that will create and save such plot for each location separately?
Preferably with a file name the same as value of ID variable?
And preferably with the same time scale for each plot?

Not entirely sure what you're asking but you can do one of two things.
a) You can save each individual plot in a loop with a unique name based on ID, like so:
ggsave(filename = paste("myplot", ID, ".png", sep = ""), plot = myplot)
Here ID is the unique identifier. Change the extension from .png to whatever you like (eps, pdf, etc.).
b) Just assign each plot to an element of a list, then write that list to disk using save.
That would make it very easy to load and access any individual plot at a later time.
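A minimal sketch of both options, assuming a small made-up stand-in for bigMAC with columns ID and sessionstarttime (the data, the output directory, and the plot size are all hypothetical). It fixes the x-axis limits to the range of the whole data set so every plot shares one time scale, names each file after its ID, and also keeps the plots in a list that is written with save(). PDF is used here only because that device is available everywhere; .png works the same way.

```r
library(ggplot2)

# Hypothetical stand-in for the question's bigMAC data frame
set.seed(1)
bigMAC <- data.frame(
  ID = rep(c(101L, 102L, 103L), times = c(40, 25, 60)),
  sessionstarttime = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") +
    runif(125, min = 0, max = 7 * 24 * 3600)
)

out_dir  <- tempdir()                        # assumed output location
x_limits <- range(bigMAC$sessionstarttime)   # one time scale for every plot
plots    <- list()

for (id in unique(bigMAC$ID)) {
  p <- ggplot(bigMAC[bigMAC$ID == id, ], aes(x = sessionstarttime)) +
    geom_freqpoly(bins = 30) +
    scale_x_datetime(limits = x_limits)

  plots[[as.character(id)]] <- p             # option (b): keep the plot in a list
  ggsave(file.path(out_dir, paste0(id, ".pdf")),   # option (a): file named after the ID
         plot = p, width = 6, height = 4)
}

# Option (b): write the whole list to disk; load() restores it later
save(plots, file = file.path(out_dir, "plots.RData"))
```

After load(), an individual plot can be retrieved with plots[["101"]].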

I am not sure if I get what you want to do. From what I can guess, I suggest writing a simple function that saves the plot, and then using lapply(yourdata, yourfunction, ...). Since lapply works on lists, it's not necessary that the number of rows is equal.
HTH
use something like this in your function:
ggsave(filename,scale=1.5)
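One way this suggestion could look, as a sketch under assumptions: split() turns the data frame into a list with one element per ID, and lapply() calls a hypothetical helper, save_location_plot(), on each piece. The helper name, the sample data, and the output directory are invented for illustration.

```r
library(ggplot2)

# Hypothetical helper: plots one location's rows and saves the result
save_location_plot <- function(df, out_dir, x_limits) {
  p <- ggplot(df, aes(x = sessionstarttime)) +
    geom_freqpoly(bins = 30) +
    scale_x_datetime(limits = x_limits)
  path <- file.path(out_dir, paste0("myplot", df$ID[1], ".pdf"))
  ggsave(path, plot = p, scale = 1.5, width = 6, height = 4)
  path                                        # return the file name
}

# Made-up data: two locations with different numbers of rows
set.seed(2)
bigMAC <- data.frame(
  ID = rep(c(1L, 2L), times = c(30, 50)),
  sessionstarttime = as.POSIXct("2020-01-01 00:00:00", tz = "UTC") +
    runif(80, min = 0, max = 3 * 24 * 3600)
)

saved <- lapply(split(bigMAC, bigMAC$ID), save_location_plot,
                out_dir = tempdir(),
                x_limits = range(bigMAC$sessionstarttime))
```

The pieces produced by split() can have any number of rows, which is why unequal group sizes are not a problem here.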

Related

Efficient way of extracting names of a large number of variables in R

It could be a very easy question, given that I am very unfamiliar with R. I know normally one can use deparse(substitute(.)) to extract the name of a variable. However, if I have a long list of variables (let's say it's built without names), how can I extract the name of each variable efficiently? I was thinking about using loops, but the deparse(substitute(.)) method would obviously generate the 'general' variable name we used to denote every item.
Sample code:
countries <- list(austria,belgium,czech,denmark,france,germany,italy,luxemberg,netherlands,poland,swiss)
Suppose I want countryNames to equal list("austria","belgium",...,"swiss"); how should I code that? I tried generating the list using countries <- list(countryA = countryA, countryB = countryB, ...), but it was extremely tedious, and in some cases I might only have an unnamed input list from elsewhere.
countries only holds the values of the individual objects (austria, belgium, etc.). To access the names, you need to create a named list when building countries, which can be done like this:
countries <- list(austria = austria,belgium = belgium....)
However, if this is very tedious you can use tibble::lst which creates the names automatically without explicitly mentioning them.
countries <- tibble::lst(austria,belgium....)
In both cases you can access the names using names(countries).
If the country objects are the only ones loaded in the global environment, we can do this easily with ls and mget, which return a named list of values:
countries <- mget(ls())
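A small base-R sketch of the same idea, assuming three made-up country objects. Passing an explicit character vector to mget() avoids picking up unrelated objects that ls() would also return. For a list that arrives unnamed, the original variable names are not recoverable from the list itself, so they have to be supplied separately and attached with names<-.

```r
# Made-up objects standing in for the real country data
austria <- c(1, 2)
belgium <- c(3, 4)
czech   <- c(5, 6)

country_names <- c("austria", "belgium", "czech")

# mget() looks each name up and returns a list named after the objects
countries <- mget(country_names)
names(countries)

# An unnamed list from elsewhere carries no names; attach them by hand
unnamed <- list(austria, belgium, czech)
names(unnamed) <- country_names
```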

Updating a File in R by adding a column/vector

Is there any way that I can update an existing .csv file by adding a column/vector that I have scraped from the web? I have a web scraper that pulls COVID-19 data, and I am trying to create a file that has positive cases in columns, where each column is the list of cases for a day in each county (x-axis is counties, y-axis is date). I have toyed around with many different ideas at this point and seem to have hit a roadblock. I'm fairly new to R, so any ideas would be appreciated!
Packages I am Currently Using/Planning to Use:
library(tidyverse)
library(funModeling)
library(Hmisc)
library(rvest)
library(ggplot2)
CODE:
#writing the original file
positive <- data.frame(Counties= counties_list, "06/12/2020"= positive_data)
positive[is.na(positive)]= 0
positive = positive[-c(76),]
write.csv(positive, "C:/Users/Nathan May/Desktop/Research Files (ABI)/Covid/Data For Shiny/Positive/Positive Data.csv")
#creating the new vector and updating the existing file with it
datap <- read.csv("C:/Users/Nathan May/Desktop/Research Files (ABI)/Covid/Data For Shiny/Positive/Positive Data.csv")
positive_data = positive_data[-c(76),]
datap$DATE <- positive_data
NOTE: The end goal is to create a Shiny app that displays bar charts for positives, recoveries, and deaths by day in each county. This is the data wrangling portion.
First things first, if you are going to use the tidyverse, use tibble instead of data.frame. Tibbles are the Tidyverse version of data frames.
Next, be aware of the structure of your data frame. The way you create your data.frame now (and later, probably, your tibble), you get a variable "Counties" and one additional variable for each day. That means you will have to add a column every day. Note that this is the opposite of what you described: moving along the x-axis (across columns) moves through dates, while moving along the y-axis (down rows) moves through counties. That's possible, but I think a bit unconventional. You might want to initialize your data frame with one column for each county and an additional variable called "date". Then, whenever you get new data, you can add a row to your data frame instead of a column (so you're "adding a new case" instead of "adding a new variable").
To actually add the row you will have to load the data as you do in your code, create the new row (or column, if you insist) and then "glue" it to the rest of the data.
Depending on how your data looks, you can create a single-row data frame using tibble_row(), with the same counties as variable names as in your main data frame, and then glue them together with add_row(datap, your_new_row). Alternatively, if you want to add the row using only position and not column names, you can have the new row as a vector and use rbind() instead of add_row().
If you persist with the "one variable per date" approach, there are column equivalents (add_column() and cbind()) for both of these functions.
Hope this helps, Cheers
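A minimal base-R sketch of the "one row per day" layout, using a temporary file and three invented county names (Adams, Baker, Clark) with made-up counts. It reads the existing file, binds one new row for the new day, and writes the file back. With tibbles, add_row() would take the place of rbind() here.

```r
csv_path <- file.path(tempdir(), "positive_long.csv")   # assumed file location

# First day: create the file with a date column plus one column per county
day1 <- data.frame(date = "2020-06-12", Adams = 5L, Baker = 0L, Clark = 12L)
write.csv(day1, csv_path, row.names = FALSE)

# A later day: load the file, append the new row, and write it back
datap   <- read.csv(csv_path)
new_row <- data.frame(date = "2020-06-13", Adams = 7L, Baker = 1L, Clark = 12L)
datap   <- rbind(datap, new_row)
write.csv(datap, csv_path, row.names = FALSE)
```

Writing with row.names = FALSE keeps write.csv() from adding an extra index column each time the file is rewritten.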

Efficient way to review formulas that generate named objects in R

If I have a named object (in my case a named plot) in R, is there an efficient way to double check the formula that generated it? As of now I am scrolling back through the console, but I'm hoping that there is a more efficient way.
For example, at the start of my project I input
Boxplot <- ggplot(plotting input) + geom_boxplot(plotting input)
Now I can call Boxplot by name to plot it, but I want to be able to efficiently review my ggplot input. Is there a tool to do this?
For your example, you can see the elements of Boxplot using:
names(Boxplot)
So you can see, for example, the input data using:
Boxplot$data
Or the parameters and type of the plot using:
Boxplot$layers
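For illustration, a short sketch using the built-in mtcars data as a stand-in for the question's plotting input (the choice of data and aesthetics is an assumption):

```r
library(ggplot2)

Boxplot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) + geom_boxplot()

head(Boxplot$data)   # the data frame that was passed to ggplot()
Boxplot$mapping      # the aesthetics given in aes()
Boxplot$layers       # one entry per geom added with +
```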

What's the easiest way to ignore one row of data when creating a histogram in R?

I have this csv with 4000+ entries and I am trying to create a histogram of one of the variables. Because of the way the data was collected, there was a possibility that if data was uncollectable for that entry, it was coded as a period (.). I still want to create a histogram and just ignore that specific entry.
What would be the best or easiest way to go about this?
I tried making it so that the histogram would only use the data for every entry except the one with the period by doing
newlist <- data1$var[1:3722]+data1$var[3724:4282]
where 3723 is the entry with the period, but R said that + is not meaningful for factors. I'm not sure if I went about this the right way, my intention was to create a vector or list or table conjoining those two subsets above into one bigger list called newlist.
Your problem is deeper than you realize. When R read in the data and saw the lone ., it interpreted that column as a factor (categorical variable).
You need to either convert the factor back to a numeric variable (this is FAQ 7.10) or reread the data, forcing that column to be read as numeric. If you are using read.table or one of the functions that call read.table, you can set the colClasses argument to specify a numeric column (together with na.strings = "." so that the period is read as a missing value rather than causing an error).
Once the column is a numeric variable, a negative subscript or !is.na will work (and some functions automatically ignore missing values).
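Both routes can be sketched on a tiny made-up column. Note that since R 4.0.0 read.csv() no longer converts strings to factors by default, so on a current installation the column would arrive as character rather than factor; the conversion idea is the same.

```r
# Route 1: reread the data, declaring "." as the missing-value marker
csv_text <- "var\n1.5\n2.0\n.\n3.5\n"            # made-up stand-in for the CSV
data1 <- read.csv(text = csv_text, na.strings = ".")
is.numeric(data1$var)                            # the column is now numeric

# Route 2: convert an existing factor (FAQ 7.10): go through character first
f <- factor(c("1.5", "2.0", ".", "3.5"))
x <- as.numeric(as.character(f))                 # "." becomes NA, with a warning

# Drop the missing entry explicitly before plotting
h <- hist(x[!is.na(x)])
```

Calling as.numeric(f) directly would return the internal level codes rather than the values, which is why the detour through as.character() matters.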

CSV file to Histogram in R

I'm a total newbie with R, and I'm trying to create a histogram (with value and frequency as the axes) from a csv file (just one row of values). Any idea how I can do this?
I'm also an R newbie, and I ran into the same thing. I made two separate mistakes, actually, so I'll describe them both here.
Mistake 1: Passing a frequency table to hist(). Originally I was trying to pass a frequency table to hist() instead of passing in the raw data. One way to fix this is to use the rep() ("replicate") function to explode your frequency table back into a raw dataset, as described here:
Creating a histogram using aggregated data
Simple R (histogram) from counted csv file
Instead of that, though, I just decided to read in my original dataset instead of the frequency table.
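For reference, the rep() approach from Mistake 1 might look like this on a small invented frequency table (the column names value and count are assumptions):

```r
# Made-up frequency table: each value and how often it occurred
freq <- data.frame(value = c(1, 2, 3, 4), count = c(2, 5, 3, 1))

# Repeat each value 'count' times to rebuild the raw observations
raw <- rep(freq$value, times = freq$count)

# Breaks centred on the integer values so each value gets its own bar
h <- hist(raw, breaks = seq(0.5, 4.5, by = 1))
```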
Mistake 2: Wrong data type. My raw data CSV file contains two columns: hostname and bookings (idea is to count the number of bookings each host generated during some given time period). I read it into a table.
> tbl <- read.csv('bookingsdata.csv')
Then when I tried to generate a histogram off the second column, I did this:
> hist(tbl[2])
This gave me the "'x' must be numeric" error you mention in a comment. (tbl[2] returns a one-column data frame rather than a numeric vector, and hist() needs the vector.)
This fixed it:
> hist(tbl$bookings)
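The difference between the two calls can be checked directly. A small sketch with made-up booking data in place of bookingsdata.csv:

```r
tbl <- read.csv(text = "hostname,bookings\nhostA,3\nhostB,7\nhostC,5")

class(tbl[2])          # single brackets keep the data frame wrapper
class(tbl[[2]])        # double brackets extract the column itself
class(tbl$bookings)    # same column, selected by name

h <- hist(tbl[[2]])    # works: a plain numeric vector
```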
You should really start by reading some of the basic R manuals...
CRAN offers a lot of them (look into the Manuals and Contributed sections)
In any case:
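setwd("path/to/csv/file")
myvalues <- read.csv("filename.csv", header = FALSE) # header = FALSE: the file is a single row of values, not a header line
hist(as.numeric(unlist(myvalues[1, ])), breaks = 100) # hist() needs a numeric vector; 100 breaks is only an example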
See the manual pages for those functions for more help (accessible through ?read.table, ?read.csv and ?hist).
To plot the histogram, the values passed as x must be of class numeric. Here x seems to be of some other class.
Run the following command and see:
sapply(myvalues[1,],class)
