Plot() only one column of data - r

I use the following code in R to read a CSV file of stock prices.
library(quantmod)
#column headings ("open","high","low","close","volume","adj.")
fmt <- '%Y-%m-%d'
SPY <- read.zoo("~/Stocks/csv/SPY.csv",header=TRUE,sep=',',tz='',format=fmt,index=0:1)
plot(SPY['open'])
I can successfully use plot(SPY) to plot all columns.
How would I select just one column by name, for example plot just the "open" column? I've tried a bunch of things such as plot(SPY['open']) but can't figure it out.
Could somebody help? Many thanks!

Try:
plot(SPY[, 'open'])
The square brackets method of selecting a subset requires two expressions: first, one describing the rows, and second, one describing the columns. These two expressions are separated by a comma. When you want to include all the rows, just leave a blank before the comma, and specify the name of the column you want.
Your code, with only one expression, treats 'open' as a row, not a column. The result is probably a strip chart, a one-dimensional graph, instead of the plot you were expecting.
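As a minimal sketch of the row/column indexing described above, assuming the zoo package is installed: the object name SPY_demo and its prices are made up for illustration and stand in for the data read from SPY.csv.

```r
library(zoo)

# Toy price data standing in for SPY.csv (values are invented for illustration)
dates <- as.Date("2020-01-02") + 0:2
prices <- cbind(open = c(100, 101, 102), close = c(101, 102, 103))
SPY_demo <- zoo(prices, order.by = dates)

# Blank before the comma keeps all rows; "open" after it picks that column
open_only <- SPY_demo[, "open"]

# Plots a single time series instead of one panel per column
plot(open_only, ylab = "open")
```

The same pattern should carry over to the real SPY object as long as the column is actually named "open" after import.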

Related

How to skip empty rows while reading multiple tabs in R?

I am trying to read an excel file with multiple tabs. For that, I use the code provided here.
The problem is that each tab has a different number of empty rows before the actual data begins. For example, the first tab has two empty rows, the second tab has three empty rows, and so on.
Normally, I would use the parameter skip in the read_excel function to indicate the number of empty lines to skip. But how do I do that for multiple tabs with different numbers of rows to skip?
Perhaps the easiest solution would be to read each tab as it is and then remove the empty rows, e.g. yourdata <- yourdata[!is.na(yourdata$columname),]. This works if you don't expect any NAs in a particular column, such as id. If you have data gaps everywhere, you can test for rows that are NA in all of several columns; let me know if that's what you need.
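A rough sketch of that idea, assuming each tab has already been read into a data frame and collected in a list. The list sheets, its contents, and the helper names drop_by_key and drop_all_na are hypothetical stand-ins, not part of the original code.

```r
# Hypothetical stand-in for the list of tabs read from the Excel file;
# the leading all-NA rows mimic the empty rows before the data starts.
sheets <- list(
  tab1 = data.frame(id = c(NA, NA, 1, 2), value = c(NA, NA, 10, 20)),
  tab2 = data.frame(id = c(NA, NA, NA, 3), value = c(NA, NA, NA, 30))
)

# Variant 1: drop rows where a column that should never be NA (here id) is NA
drop_by_key <- function(df) df[!is.na(df$id), , drop = FALSE]

# Variant 2: drop only rows where every column is NA
drop_all_na <- function(df) df[rowSums(!is.na(df)) > 0, , drop = FALSE]

cleaned_by_key <- lapply(sheets, drop_by_key)
cleaned_all_na <- lapply(sheets, drop_all_na)
```

Because the filtering happens after reading, it does not matter that each tab has a different number of empty rows.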

Adding new column with data value in R

I want to add a column (say ForestAreaPerPopn) holding the ratio of forest area to the resident population (represented by the variable Total below). The data contains the following variables and their values.
How can I add a column named ForestAreaPerPopn to the table ForestAreaPerPop (shown below) so that it contains the ratio of forest area to Total?
Too long for a comment.
You have a couple of problems. First, your column names have spaces and other special characters. This is allowed but creates all kinds of problems later. I suggest you do something like:
colnames(ForestAreaPerPop) <- gsub(' |\\(|\\)', '_', colnames(ForestAreaPerPop))
This replaces any spaces and left or right parentheses in the column names with '_'.
Then, something like:
ForestAreaPerPop$n <- with(ForestAreaPerPop, Forest_Area_in_ha/Total)
should give you what you want.
Some advice: long table names and column names may seem like a good idea, but you will live to regret it. Make them short but meaningful (easier said than done).
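The two steps above can be tried end to end on a toy table. This is only a sketch: the original data is not shown, so the column name "Forest Area in ha", the District column, and all values are assumptions made up for illustration.

```r
# Made-up table; check.names = FALSE keeps the space-containing column name
ForestAreaPerPop <- data.frame(
  District = c("A", "B"),
  `Forest Area in ha` = c(1200, 450),
  Total = c(30000, 9000),
  check.names = FALSE
)

# Replace spaces and parentheses in the column names with underscores
colnames(ForestAreaPerPop) <- gsub(" |\\(|\\)", "_", colnames(ForestAreaPerPop))

# Add the ratio column under the name asked for in the question
ForestAreaPerPop$ForestAreaPerPopn <- with(ForestAreaPerPop, Forest_Area_in_ha / Total)

print(ForestAreaPerPop)
```

If the real column name contains parentheses, the cleaned name will differ (for example, it may end in an underscore), so check colnames() after the gsub step before computing the ratio.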

Why does R think my imported vector of characters are numbers?

This is probably a basic question, but why does R treat my vector, which has a bunch of words in it, as numbers when I try to use it for column names?
I imported a data set and it turns out the first row of data are the column headers that I want. The column headers that came with the data set are wrong ones. So I want to replace the column names. I figured this should be easy.
So what I did was I extracted the first row of data into a new object:
names <- data[1,]
Then I deleted the first row of data:
data <- data[-1,]
Then I tried to rename the column headers with the "names" object:
colnames(data) <- names
However, when I do this, instead of changing my column names to the words within the names object, it turns it into a bunch of numbers. I have no idea where these numbers come from.
Thanks
You need to actually show us the data, and the read.csv()/read.table() command you used to import.
If R thinks your numeric column is string, it sounds like that's because it wrongly includes the column name, i.e. you omitted header=TRUE in your read.csv()/read.table() import.
But show us your actual data and commands used.
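Without the real file, the following is only a sketch under assumptions: the inline text raw stands in for a file whose first line holds the wrong headers and whose second line holds the intended ones. It shows two possible routes: fixing the import itself, or converting the first row to plain character strings before using it as column names.

```r
# Made-up file contents: line 1 = wrong headers, line 2 = the headers we want
raw <- "V1,V2\nname,score\nalice,90\nbob,85"

# Option 1: skip the wrong header line at import time
fixed <- read.csv(text = raw, skip = 1, header = TRUE)

# Option 2: data already imported with the wrong headers
data <- read.csv(text = raw, header = TRUE)
# Convert each cell of the first row to a string, so that factor columns
# yield their labels rather than their internal integer codes
new_names <- unname(vapply(data[1, ], as.character, character(1)))
data <- data[-1, ]
colnames(data) <- new_names
```

Option 1 also lets R infer column types from the real data (score becomes numeric), whereas with Option 2 the columns keep whatever type they were given at import.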

data frame names as word in quotation mark

I have a two-part question; both parts concern working with data frame names:
I want to concatenate the names of two dfs with a separator, for example df1 and df2 become "df1_&_df2".
I want R to read a data frame's name as a character string in quotation marks: my df is called df1, and in certain parts of my code I want it to be "df1".
For the first part I tried paste, but it pasted the entire contents of both dfs, and names gives the column names.
For the second part, being able to make R treat a df name as a quoted word is very handy in code for more complex charts: I simply pass dfs into the code and R builds the chart title from the name. I understand there is a simple workaround: I can create a list of names manually, list=c("df1", "df2"), and then use the function get wherever I need the content of a data frame instead of its name, but that seems a little inconvenient in the long run. Is there any function in R whose output is just the df name? Something like GiveMeName(df) whose output is "df"? (I wrote this in normal font intentionally, so no one would think it is a real function.)
For #1, you'll have to give a use case for me to understand your goal.
For #2, you can use deparse(substitute(df1)). Here's an example:
plot_and_title <- function(df1) {
data_name <- deparse(substitute(df1))
plot(df1[[1]], df1[[2]], main = data_name)
}
plot_and_title(mtcars)
Adding on to the answer by @Nathan Werth, you can concatenate names using:
paste(deparse(substitute(df1)), deparse(substitute(df2)), sep="_&_")
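If the concatenation is needed in several places, it could be wrapped in a small helper. This is a sketch; name_pair is a hypothetical function name, not something provided by R.

```r
# Hypothetical helper: returns the names of its two arguments joined by sep
name_pair <- function(a, b, sep = "_&_") {
  paste(deparse(substitute(a)), deparse(substitute(b)), sep = sep)
}

df1 <- data.frame(x = 1)
df2 <- data.frame(x = 2)

joined <- name_pair(df1, df2)
print(joined)  # "df1_&_df2"
```

Note that substitute captures the expression exactly as written in the call, so this works when the data frames are passed by name directly, not when they are passed through another function's argument.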

What's the easiest way to ignore one row of data when creating a histogram in R?

I have this csv with 4000+ entries and I am trying to create a histogram of one of the variables. Because of the way the data was collected, there was a possibility that if data was uncollectable for that entry, it was coded as a period (.). I still want to create a histogram and just ignore that specific entry.
What would be the best or easiest way to go about this?
I tried making it so that the histogram would only use the data for every entry except the one with the period by doing
newlist <- data1$var[1:3722]+data1$var[3724:4282]
where 3723 is the entry with the period, but R said that + is not meaningful for factors. I'm not sure if I went about this the right way, my intention was to create a vector or list or table conjoining those two subsets above into one bigger list called newlist.
Your problem is deeper than you realize. When R read in the data and saw the lone . it interpreted that column as a factor (categorical variable).
You need to either convert the factor back to a numeric variable (this is FAQ 7.10) or reread the data, forcing that column to be read as numeric. If you are using read.table or one of the functions that call read.table, you can set the colClasses argument to specify a numeric column.
Once the column of data is a numeric variable then a negative subscript or !is.na will work (or some functions will automatically ignore the missing value).
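A small sketch of both routes on made-up data. The na.strings argument used in the second route is an addition not mentioned above; it is a standard read.table/read.csv argument that marks the period as missing at import time.

```r
# Route 1: the column is already a factor because of the lone "."
data1 <- data.frame(var = factor(c("1.5", "2.0", ".", "3.5")))
# Convert labels (not internal codes) to numbers; "." becomes NA with a warning
var_num <- suppressWarnings(as.numeric(as.character(data1$var)))
hist(var_num[!is.na(var_num)], main = "var without the missing entry")

# Route 2: reread the data, treating "." as a missing value
raw <- "id,var\n1,1.5\n2,.\n3,3.5"   # stand-in for the real CSV file
reread <- read.csv(text = raw, na.strings = ".")
hist(reread$var, main = "var after rereading")  # hist() drops NA values itself
```

Either way there is no need to know the position of the bad entry, so the same code keeps working if more periods turn up in the data.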
