Adding new column with data value in R - r

I want to add a column (say ForestAreaPerPopn) holding the ratio of forest area to the resident population (represented by the variable Total below). The data contains the following variables and their values.
How can I add a column named ForestAreaPerPopn to the table ForestAreaPerPop (shown below) so that it contains the ratio of forest area to Total?

Too long for a comment.
You have a couple of problems. First, your column names have spaces and other special characters. This is allowed but creates all kinds of problems later. I suggest you do something like:
colnames(ForestAreaPerPop) <- gsub(' |\\(|\\)', '_', colnames(ForestAreaPerPop))
This replaces every space and every left or right parenthesis in the column names with '_'.
Then, something like:
ForestAreaPerPop$n <- with(ForestAreaPerPop, Forest_Area_in_ha/Total)
should give you what you want.
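Here is a minimal sketch of both steps on made-up data. The column name "Forest Area (in ha)" and all the values are assumptions, since the original table is not shown. The pattern here differs slightly from the one above: it collapses runs of spaces and parentheses into a single '_' and then trims a trailing '_', so that the cleaned name comes out as Forest_Area_in_ha.

```r
# Hypothetical stand-in for the asker's table; names and values are invented.
ForestAreaPerPop <- data.frame(
  District = c("A", "B"),
  `Forest Area (in ha)` = c(1200, 450),
  Total = c(60000, 9000),
  check.names = FALSE  # keep the awkward column name exactly as typed
)

# Collapse runs of spaces/parentheses into one underscore, then drop a trailing one.
clean <- gsub("[ ()]+", "_", colnames(ForestAreaPerPop))
colnames(ForestAreaPerPop) <- sub("_+$", "", clean)

# Add the ratio column; with() lets us refer to columns by bare name.
ForestAreaPerPop$ForestAreaPerPopn <- with(ForestAreaPerPop, Forest_Area_in_ha / Total)

print(ForestAreaPerPop)
```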
Some advice: long table names and column names may seem like a good idea, but you will live to regret it. Make them short but meaningful (easier said than done).

Related

How can I make two key columns from the different part of the column names in R?

I am going to do repeated measures ANOVA on my data, but to this point, my data is wide. Two independent (categorical) variables are spread across single responsive variable.
See the image: https://imgur.com/1eTWSIM
I want to create two categorical variables that take values from the different parts of the columns (circled on the screenshot). Subject numbers should be kept as a category. So after using gather() function, the data should look something like this:
https://imgur.com/SGM2N69
I've seen in a tutorial (that I can't find anymore) that you can create two columns from a single function, using different parts of the colnames (using "_" as a separator), but I can't exactly remember how it was done.
Any help would be appreciated, and please ask if anything is not clear in my explanation.
I have solved the problem by using 'gather()' function first and then 'separate()' to separate it into two new columns. So I guess, if you want to make two key columns, first you have to make a single column containing both values and later separate it into two.
At least that is how I did it.
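The two-step approach described above might look roughly like this. It assumes the tidyr package is installed, and the column names (low_pre, high_post, and so on) and values are invented for illustration, because the original data is only available as a screenshot.

```r
library(tidyr)

# Hypothetical wide data: one row per subject, one column per
# combination of two factors, joined by "_".
wide <- data.frame(
  subject   = 1:2,
  low_pre   = c(5.1, 4.8),
  low_post  = c(5.6, 5.0),
  high_pre  = c(6.2, 5.9),
  high_post = c(7.0, 6.4)
)

# Step 1: stack every column except subject into key/value pairs.
long <- gather(wide, key = "condition", value = "response", -subject)

# Step 2: split the combined key into two categorical columns at "_".
long <- separate(long, condition, into = c("dose", "time"), sep = "_")

print(long)
```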

Why does R think my imported vector of characters are numbers?

This is probably a basic question, but why does R treat my vector, which has a bunch of words in it, as numbers when I try to use it as column names?
I imported a data set and it turns out the first row of data are the column headers that I want. The column headers that came with the data set are wrong ones. So I want to replace the column names. I figured this should be easy.
So what I did was I extracted the first row of data into a new object:
names <- data[1,]
Then I deleted the first row of data:
data <- data[-1,]
Then I tried to rename the column headers with the "names" object:
colnames(data) <- names
However, when I do this, instead of changing my column names to the words within the names object, it turns it into a bunch of numbers. I have no idea where these numbers come from.
Thanks
You need to actually show us the data, and the read.csv()/read.table() command you used to import.
If R thinks your numeric column is a string, it sounds like that's because the column wrongly includes the column name, i.e. you omitted header=TRUE in your read.csv()/read.table() import.
But show us your actual data and commands used.
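As an illustration of the header point, here is a small sketch using inline text in place of a file. The data is invented and the asker's real file may behave differently.

```r
# Invented two-column CSV whose first line holds the column names.
csv_text <- "ticker,price\nSPY,410.5\nQQQ,330.2"

# First line treated as the header: price is read as a number.
with_header <- read.csv(text = csv_text, header = TRUE)

# First line treated as data: the word "price" lands in the column,
# so the whole column can no longer be numeric.
without_header <- read.csv(text = csv_text, header = FALSE)

str(with_header)
str(without_header)
```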

Plot() only one column of data

I use the following code in R to read a CSV file of stock prices.
library(quantmod)
#column headings ("open","high","low","close","volume","adj.")
fmt <- '%Y-%m-%d'
SPY <- read.zoo("~/Stocks/csv/SPY.csv",header=TRUE,sep=',',tz='',format=fmt,index=0:1)
plot(SPY['open'])
I can successfully use plot(SPY) to plot all columns.
How would I select just one column by name, for example plot just the "open" column? I've tried a bunch of things such as plot(SPY['open']) but can't figure it out.
Could somebody help? Many thanks!
Try:
plot(SPY[,'open'])
The square brackets method of selecting a subset requires two expressions: first, one describing the rows, and second, one describing the columns. These two expressions are separated by a comma. When you want to include all the rows, just leave a blank before the comma, and specify the name of the column you want.
Your code, with only one expression, treats 'open' as a row, not a column. The result is probably a strip chart, a one-dimensional graph, instead of the plot you were expecting.
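The rows-comma-columns form can be tried on a small stand-in. This sketch uses a plain base R matrix with invented prices rather than the zoo object from the question, so it only illustrates the indexing syntax, not how SPY itself behaves.

```r
# Invented prices; a plain matrix stands in for the zoo object.
prices <- cbind(
  open  = c(10, 11, 12),
  close = c(10.5, 11.5, 12.5)
)

# Blank before the comma means "all rows"; the name after it picks the column.
open_only <- prices[, "open"]

plot(open_only, type = "l", ylab = "open")
```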

What's the easiest way to ignore one row of data when creating a histogram in R?

I have this csv with 4000+ entries and I am trying to create a histogram of one of the variables. Because of the way the data was collected, there was a possibility that if data was uncollectable for that entry, it was coded as a period (.). I still want to create a histogram and just ignore that specific entry.
What would be the best or easiest way to go about this?
I tried making it so that the histogram would only use the data for every entry except the one with the period by doing
newlist <- data1$var[1:3722]+data1$var[3724:4282]
where 3723 is the entry with the period, but R said that + is not meaningful for factors. I'm not sure if I went about this the right way, my intention was to create a vector or list or table conjoining those two subsets above into one bigger list called newlist.
Your problem is deeper than you realize. When R read in the data and saw the lone . it interpreted that column as a factor (categorical variable).
You need to either convert the factor back to a numeric variable (this is FAQ 7.10) or reread the data, forcing that column to be read as numeric. If you are using read.table, or one of the functions that calls read.table, you can set the colClasses argument to specify a numeric column.
Once the column of data is a numeric variable then a negative subscript or !is.na will work (or some functions will automatically ignore the missing value).
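A minimal sketch of the conversion route, on an invented four-value factor rather than the real data1$var:

```r
# Invented stand-in for a column that was read as a factor because of a lone ".".
var <- factor(c("1.5", "2.0", ".", "3.5"))

# Go through character first; as.numeric() on the factor itself would
# return the internal level codes. The "." becomes NA (with a warning).
num <- suppressWarnings(as.numeric(as.character(var)))

# Drop the missing entry without hard-coding its position.
clean <- num[!is.na(num)]

hist(clean)
```

This avoids the index arithmetic from the question, since the missing entry is found by value rather than by row number.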

Cleansing an excel spreadsheet with whitespace cells

I'm looking for advice about how to cleanse an excel spreadsheet using R.
http://www.abs.gov.au/AUSSTATS/abs#.nsf/DetailsPage/5506.02012-13?OpenDocument
Gathering the years by tidyr::gather is simple enough. The difficulty is the subgroups. The groups are defined by whitespace. Each amount of whitespace is a subgroup.
My question is how to assign each row to its group, so that the table is tidy form.
My initial instinct was to look where there is a line of NAs in the spreadsheet and use na.locf to fill them, but that method cannot distinguish between subgroups followed by groups without subgroups. Is there a way to count the amount of whitespace visible before the cells in the linked excel spreadsheet?
On the particular sheet you are talking about, there aren't any leading characters - the indentation is just the formatting applied to the cell, in much the same way as you might apply a font to a cell.
The only way to count the indents in the formatting is to create a macro. Here's a user-defined function that will work:
Public Function inds(r As Excel.Range) As Integer
inds = r.Cells(1, 1).IndentLevel
End Function
You would then just count the indents with =inds(a3)
Looks like you might be trying to prepare the data for a pivot table (there might be better options). However to count the leading spaces, simple formula:
=len(a3)-len(trim(a3))+1
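If the labels did carry literal leading spaces after import (an assumption; the first answer says this particular sheet uses cell formatting instead), the count could also be done on the R side. The labels below are invented.

```r
# Invented row labels with literal leading spaces marking the nesting depth.
labels <- c("Total", "  Wages", "    Overtime", "  Rent")

# Leading spaces = full length minus the length once leading spaces are stripped.
indent <- nchar(labels) - nchar(sub("^ +", "", labels))

print(data.frame(label = trimws(labels), indent = indent))
```

Unlike the TRIM-based formula, this counts only leading spaces and ignores trailing or doubled internal ones.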
