I am new to R and am reading 19 variables (Import_1 to Import_19) from a CSV file:
x <- as.data.frame(final_data[, 15:33])
When I summarize one variable, I get the following output (it may be a character variable):
Import_1
EXTREMELY IMPORTANT-10 :177
09 :176
08 : 89
07 : 45
06 : 15
05 : 6
04 : 3
03 : 3
02 : 3
NOT AT ALL IMPORTANT-01 : 2
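Now I need to convert these 19 variables into numeric values from 1 to 10 so that I can run a regression. How can I do that?
You can convert variables to numeric using the functions as.numeric, as.double and as.integer. Note that if a column was read in as a factor, as.numeric() returns the internal level codes rather than the labels, so convert it with as.character() first. See this Introduction to R DataTypes to get you started.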
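A minimal sketch of the character-first approach, assuming the levels look exactly like the summary above (a two-digit number at the end, sometimes preceded by a text label). The two-column data frame and the helper name to_score are hypothetical stand-ins for the real Import_1 to Import_19 columns:

```r
# Hypothetical stand-in for two of the Import_* columns, read in as factors
x <- data.frame(
  Import_1 = factor(c("EXTREMELY IMPORTANT-10", "09", "08", "NOT AT ALL IMPORTANT-01")),
  Import_2 = factor(c("05", "EXTREMELY IMPORTANT-10", "03", "02"))
)

# Drop every non-digit character, then turn what is left into an integer.
# as.character() matters: as.numeric() on a factor gives level codes, not labels.
to_score <- function(col) as.integer(gsub("[^0-9]", "", as.character(col)))

# x[] keeps x as a data frame while replacing each column
x[] <- lapply(x, to_score)
str(x)
```

On the real data the same x[] <- lapply(x, to_score) line should convert all 19 columns at once. Stripping every non-digit assumes the labels contain no other digits.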
Related
So I have this dataset where the respondent's age was an open-ended question, and the responses sometimes look like this:
Age:
23
45
36 years
27
33yo
...
I would like to keep the numeric data without introducing NAs (and then having to filter them out), and I wondered whether there is a way to turn it into this:
Age:
23
45
36
27
33
...
...by restricting the number of characters in each element of the vector and then converting them to numeric.
I believe there is a simple line for this. I just somehow couldn't find it.
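Two one-line options, sketched on the sample responses shown above. The first follows the idea of keeping only the first two characters; the second strips every non-digit character instead, which does not assume a fixed length:

```r
# Sample responses to the open-ended age question
age <- c("23", "45", "36 years", "27", "33yo")

# Option 1: keep the first two characters, then convert.
# Assumes two-digit ages; "101" would become 10 and "9yo" would become NA.
age_substr <- as.numeric(substr(age, 1, 2))

# Option 2: remove every character that is not a digit, then convert.
age_digits <- as.numeric(gsub("[^0-9]", "", age))

age_digits
```

Option 2 would still misbehave on answers containing two numbers (for example "30 or 35", which becomes 3035), so a quick check of the unique raw values is worthwhile.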
I have been using R for several years, but could really use some help with this; time series data is not my norm. For some background, this data comes from I-buttons, which record temperature and were planted in patches of different sizes across a landscape. My data generally looks like this:
Ibutton_new.csv:
Date Edge1 Edge2 Edge3...
2012-7-16 25 24 24.5
2012-7-16 24 23 23
2012-7-16 23.5 22.5 22.5
2012-7-16 27.5 24.5 24.5
2012-7-16 27 27.5 26.5
2012-7-16 27 26.5 27
2012-7-17 26 25 25
2012-7-17 25 25 25
2012-7-17 24 23 23
2012-7-17 24 23 23
2012-7-17 28 29 27.5
2012-7-17 28 28 28
etc., for a year
Step 1: I convert my data into an xts object:
library(zoo)
library(xts)
library(lubridate)  # mdy() comes from lubridate; use ymd() if dates are stored as 2012-7-16
x <- read.csv("Ibutton_new.csv")
x$Date <- mdy(x$Date)
x.xts <- xts(x[, -1], order.by = x[, 1])
class(x.xts)
[1] "xts" "zoo"
str(x.xts)
An ‘xts’ object on 2012-07-16/2013-06-22 containing:
Data: num [1:2048, 1:114] 25 24 23.5 27.5 27 27 26 25 24 28 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:114] "edge_1" "edge_2" "edge_3" "edge_4" ...
Indexed by objects of class: [Date] TZ: UTC
xts Attributes:
NULL
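Since the sample above shows dates as year-month-day text, here is a small sketch of Step 1 using base R's as.Date() with an explicit format, assuming that layout. The three-row data frame is made up for illustration:

```r
library(xts)  # also attaches zoo

# Hypothetical rows in the layout shown above (Date stored as year-month-day text)
x <- data.frame(Date  = c("2012-7-16", "2012-7-16", "2012-7-17"),
                Edge1 = c(25, 24, 26),
                Edge2 = c(24, 23, 25))

# "%Y-%m-%d" also accepts months and days without a leading zero;
# a month-day-year parser such as lubridate::mdy() would not match this layout.
x$Date <- as.Date(x$Date, format = "%Y-%m-%d")

# Use every column except Date as data, indexed by the parsed dates
x.xts <- xts(x[, -1], order.by = x$Date)
str(x.xts)
```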
Step 2: Make a graph
windows()
plot(x.xts,col=color1,main="Soil Temperature (C) Across Whole
Site",lwd=2,ylim=c(-10,50),cex.axis=1.5)
Okay, in general I am pretty happy with this, except for the x axis, which is just ugly. I don't know why it puts time stamps next to the date values. I would like it to have tick marks and labels by month only, so I tried this:
plot(x.xts,col=color1,main="Soil Temperature (C) Across Whole
Site",lwd=2,ylim=c(-10,50),cex.axis=1.5,major.ticks="months",grid.ticks.on="months")
So grid lines and tick marks looked fine, but the labels were still the same. Then I tried this:
plot(x.xts,xaxt="n",col=color1,main="Soil Temperature (C) Across Whole
Site",lwd=2,ylim=c(-10,50),cex.axis=1.5)
ticks <- axTicksByTime(x.xts,"months",format.labels="%b-%Y")
axis(1,at = .index(x.xts)[ticks], labels = names(ticks),mgp=c(0,0.5,0))
And, hilariously, got this: [image of the resulting plot not available]
So I am so close, but not quite there. Any suggestions? Normally I would just import the plot into PowerPoint and edit it there (I have an add-on that can export high-quality pictures), but my computer is in a repair shop right now. Also, I hate feeling like R got the best of me. I am sure there is an easy solution, or something silly I did that I am not seeing; possibly a formatting issue in Step 1? It took me forever just to make a proper xts object. Again, I don't normally work with this type of data or these packages. Thanks in advance!
Using axis() just adds a formatted axis on top of the existing plot, much as lines() adds lines to it.
The solution is to include the formatting you want in the original call to plot(), like so:
plot(x.xts,xaxt="n",col=color1,main="Soil Temperature (C) Across Whole
Site",lwd=2,ylim=c(-10,50),cex.axis=1.5, format.labels="%b-%Y")
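A sketch that requests monthly ticks, monthly grid lines and month labels in a single plot() call, assuming xts 0.10.0 or later, where major.ticks, grid.ticks.on and format.labels are plot.xts() arguments. The sine-wave temperatures are synthetic, not the I-button data:

```r
library(xts)

# Synthetic daily series for two loggers over the same period as the question
dates <- seq(as.Date("2012-07-16"), as.Date("2013-06-22"), by = "day")
set.seed(1)
season <- 10 * sin(seq_along(dates) / 58)
temps <- cbind(edge_1 = 20 + season + rnorm(length(dates)),
               edge_2 = 19 + season + rnorm(length(dates)))
x.xts <- xts(temps, order.by = dates)

# Ticks, grid lines and labels all by month, formatted as e.g. "Jul-2012"
p <- plot(x.xts, main = "Soil Temperature (C) Across Whole Site",
          major.ticks = "months", grid.ticks.on = "months",
          format.labels = "%b-%Y", ylim = c(-10, 50))
print(p)  # draws the plot on the current device
```

Because the labels come from plot.xts() itself, there is no need for xaxt = "n" followed by a separate axis() call.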
I have an xts object called 'usagexts' with dates from 01 Oct 15 to 31 Mar 18. I want to create 3 subsets of this object for the periods 01 Oct 15 to 31 Mar 16, 01 Oct 16 to 31 Mar 17 and 01 Oct 17 to 31 Mar 18, without hardcoding the dates, as these will change over time.
The object looks like this:
dateperiod,usageval
2015-10-01,21542
2015-10-02,21572
2015-10-03,21342
...
...
2018-03-31,20942
I have another data frame called 'periodvalues' that looks like this:
startdate,enddate,periodtext
2015-10-01,2016-03-31,1510_1603
2016-10-01,2017-03-31,1610_1703
2017-10-01,2018-03-31,1710_1803
I want to create 3 xts objects like this:
usagexts_1510_1603 -> xts object containing usage details for the relevant period
usagexts_1610_1703 -> xts object containing usage details for the relevant period
usagexts_1710_1803 -> xts object containing usage details for the relevant period
I only got as far as creating a list of length 3 containing the periodtext values from the data frame above. I then tried to specify the start and end of the period through variables, using the "objectname fromdate/todate" structure, but it didn't work. Something like this:
usagexts_1610_1703 <- usagexts[var1/var2]
The name on the LHS came from the list, and the variables on the RHS came from variable definitions made earlier.
The expected results are:
usagexts_1510_1603 <- usagexts["2015-10-01/2016-03-31"]
usagexts_1610_1703 <- usagexts["2016-10-01/2017-03-31"]
usagexts_1710_1803 <- usagexts["2017-10-01/2018-03-31"]
Any help with this would be greatly appreciated.
Best regards
Deepak
If var1 and var2 are variables holding the start and end dates, the filter string can be built with paste():
usagexts[paste(var1, var2, sep="/")]
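Extending the paste() idea to every row of periodvalues, here is a sketch that builds one range string per row and keeps the subsets in a named list. The daily series is a hypothetical stand-in for usagexts, and periodvalues is rebuilt from the sample rows in the question:

```r
library(xts)

# Hypothetical daily usage series standing in for 'usagexts'
days <- seq(as.Date("2015-10-01"), as.Date("2018-03-31"), by = "day")
usagexts <- xts(seq_along(days), order.by = days)
colnames(usagexts) <- "usageval"

periodvalues <- data.frame(
  startdate  = c("2015-10-01", "2016-10-01", "2017-10-01"),
  enddate    = c("2016-03-31", "2017-03-31", "2018-03-31"),
  periodtext = c("1510_1603", "1610_1703", "1710_1803"),
  stringsAsFactors = FALSE
)

# One ISO-8601 range string per row, e.g. "2015-10-01/2016-03-31",
# used directly as the xts subsetting index
usage_by_period <- lapply(seq_len(nrow(periodvalues)), function(i) {
  usagexts[paste(periodvalues$startdate[i], periodvalues$enddate[i], sep = "/")]
})
names(usage_by_period) <- paste0("usagexts_", periodvalues$periodtext)

# If separate objects are really needed in the workspace:
# list2env(usage_by_period, envir = globalenv())
```

Keeping the subsets in one list usually makes later steps easier (for example lapply() over all periods) than creating variables named at run time.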
I'm asking how to transform a categorical variable into a quantitative one so that I can make a boxplot.
My commands are:
wiser_perc<-read.csv("Perca_fluviatilis.csv",header=T, sep=";")
attach(wiser_perc)
summary(wiser_perc)
Country
Sweden :156
Germany: 73
France : 67
Norway : 19
Estonia: 8
(Other):7
Diversity
1,66E+00: 8
1,28E+00: 6
1,64E+00: 5
1,76E+00: 5
2,01E+00: 5
2,36E+00: 5
(Other):299
boxplot(Diversity~Country, data=wiser_perc,boxwex=0.7,cex.axis=0.8,ylab="Size diversity")
Error in boxplot.default(split(mf[[response]], mf[-response]), ...) :
adding class "factor" to an invalid object
I don't know how to change the variable "Diversity" into a quantitative variable, and I'm stuck on this problem.
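You don't want to use read.csv() here; use read.csv2() instead. The latter is designed to be "used in countries that use a comma as decimal point and a semicolon as field separator". With read.csv(), values such as 1,66E+00 cannot be parsed as numbers, so Diversity is read as a factor, which is what boxplot() is complaining about. Using read.csv2() means you don't need to worry about fixing the mess caused by read.csv().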
Have a look at: http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
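A small sketch of both routes, assuming the file uses semicolons and comma decimals as the summary suggests. The three data rows are made up in the same format as Perca_fluviatilis.csv:

```r
# A few rows in the same format as Perca_fluviatilis.csv (hypothetical values)
csv_text <- "Country;Diversity
Sweden;1,66E+00
Germany;1,28E+00
France;2,01E+00"

# read.csv2() defaults to sep = ";" and dec = ",", so Diversity comes in as numeric
wiser_perc <- read.csv2(text = csv_text)
str(wiser_perc)

# If a column was already read as text or factor, swap the comma for a dot first
diversity_chr <- c("1,66E+00", "1,28E+00", "2,01E+00")
diversity_num <- as.numeric(sub(",", ".", diversity_chr, fixed = TRUE))

# With a numeric response the formula interface of boxplot() works
boxplot(Diversity ~ Country, data = wiser_perc, ylab = "Size diversity")
```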
I'm trying to read a CSV file into R that has date values in some of the column headings.
As an example, the data file looks something like this:
ID Type 1/1/2001 2/1/2001 3/1/2001 4/1/2011
A Supply 25 35 45 55
B Demand 26 35 41 22
C Supply 25 35 44 85
D Supply 24 39 45 75
D Demand 26 35 41 22
...and my read.csv() call looks like this:
dat10 <- read.csv("c:/data.csv", header = TRUE, sep = ",", as.is = TRUE)  # "c:\data.csv" fails: "\d" is not a valid escape in R
read.csv() works fine, except that it modifies the names of the date columns as follows:
X1.1.2001 X2.1.2001 X3.1.2001 X4.1.2001
Is there a way to prevent this, or an easy way to correct it afterwards?
Set check.names = FALSE. But be aware that 1/1/2001 et al. are syntactically invalid names, so they may cause you some headaches (for example, dat10$`1/1/2001` needs backticks).
You can always change the column names using the colnames function. For example,
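# Turn names such as "X1.1.2001" back into "1/1/2001"; other names are left unchanged
colnames(dat10) <- gsub("^X([0-9]+)\\.([0-9]+)\\.([0-9]+)$", "\\1/\\2/\\3", colnames(dat10))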
However, having slashes in your column names isn't a particularly good idea. You can always change them just before you print out the table or when you create a graph.
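A short sketch comparing the default names with check.names = FALSE. It assumes the file is comma-separated (as the sep = "," argument implies) and uses the first rows and columns from the question as inline text:

```r
# First rows and columns of data.csv from the question, as comma-separated text
csv_text <- "ID,Type,1/1/2001,2/1/2001,3/1/2001
A,Supply,25,35,45
B,Demand,26,35,41"

# Default: make.names() turns "/" into "." and prepends "X" to names starting with a digit
default_names <- names(read.csv(text = csv_text))
# "ID" "Type" "X1.1.2001" "X2.1.2001" "X3.1.2001"

# check.names = FALSE keeps the headers exactly as written in the file
dat10 <- read.csv(text = csv_text, check.names = FALSE)
names(dat10)

# Non-syntactic names need backticks with $, or a string with [[ ]]
dat10$`1/1/2001`
dat10[["2/1/2001"]]
```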