Substituting values in the date field (string) within a dataframe - r

It must be a very easy task, but I can't find the right line of code for this:
Data frame (df) has several columns (Date is the first one, containing string object), and around 200 rows.
Date V1
1 01/01/2011 5
2 02/01/2011 4
3 03/01/2011 2
...
200 05/09/2011
needs to become this (current year):
Date V1
1 01/01/2013 5
2 02/01/2013 4
3 03/01/2013 2
...
200 05/09/2013
Thanks!

df$Date <- sub('11$','13',df$Date)
should work.
But beware: naming a variable Date is a bad idea because Date is also the name of a built-in class in R.
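A minimal, self-contained sketch of the same substitution, assuming the dates are character strings in dd/mm/yyyy form (the sample data frame below is made up for illustration). Matching the slash plus the full year is just a slightly more explicit pattern than '11$':

```r
# Hypothetical sample data mirroring the layout in the question
df <- data.frame(Date = c("01/01/2011", "02/01/2011", "11/11/2011"),
                 V1 = c(5, 4, 2),
                 stringsAsFactors = FALSE)

# Replace only the trailing four-digit year; "$" anchors the match at the end
df$Date <- sub("/2011$", "/2013", df$Date)

# Optional: parse into real Date objects if date arithmetic is needed later
parsed <- as.Date(df$Date, format = "%d/%m/%Y")
```

The third row shows that an 11 in the day or month position is left alone.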

Related

Can I select the rows in my data set that have the same value in two of the columns?

I have a data set with 40 columns and 2000 rows. The values in two of the columns are important: I want to select the rows that have the same value in both of these columns.
A small sample of my data looks like this:
2 3 4 5 6 3 23 32
4 3 4 1 0 5 6 43
4 4 3 22 1 2 23
Suppose I want to select the rows that have the same value in the first and third columns. In this sample, that means the second row should be stored in a new data set.
I take from your comments that you have numbers stored as factors in that data frame. Factors store integer codes internally, and those codes need not match the displayed labels. So when the console output shows the factor level to be 4, it is not necessarily a 4 in the internal representation. In general, two different factors are not compatible with each other unless they have the same level set. To see the 'internal representation' of your first column, use as.numeric(df[[1]]).
Now to the solution of your problem. You first have to convert the factors in columns 1 and 3 (or all columns) into numeric values using the factor levels, then compare the two columns.
## converting factor levels to numeric values
df[[1]] <- as.numeric(levels(df[[1]]))[df[[1]]]
df[[3]] <- as.numeric(levels(df[[3]]))[df[[3]]]
## filter data
df[df[[1]] == df[[3]], ]
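To see why the conversion through levels() matters, here is a small sketch on made-up data (the column names a, b and c are placeholders, not taken from the question):

```r
# Made-up data: numbers stored as factors, as in the situation described
df <- data.frame(a = factor(c("2", "4", "4")),
                 b = factor(c("3", "3", "4")),
                 c = factor(c("4", "4", "3")))

# as.numeric() on a factor returns the internal integer codes, not the labels
codes <- as.numeric(df$a)

# Going through levels() recovers the numbers the labels represent
df$a <- as.numeric(levels(df$a))[df$a]
df$c <- as.numeric(levels(df$c))[df$c]

# Keep the rows where the first and third columns agree
same <- df[df$a == df$c, ]
```

Here codes holds 1, 2, 2 even though the labels are 2, 4, 4, which is exactly the mismatch that makes a direct comparison of the factor columns unreliable.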

Convert/Maintain Dates in Header from Excel to R

I have an Excel spreadsheet with dates in the header. I want to import the spreadsheet into R, presumably using the read.xlsx() function. However, the dates are converted to a string of the internal value from Excel with an "X" in front. I am hoping to keep the dates as a Date class, or convert the strings to a Date. I understand I could use as.Date() if the date were in a recognizable format, or given as the number of days from a specified origin, but it has the "X" prefix.
Thank you very much for the help.
Eg.
the excel spreadsheet "Practice"
Sample 09-Jul 10-Jul 11-Jul
1 3 10 2
2 5 0
3 1 0 0
then in R:
practice<-read.xlsx("Practice.xlsx")
Sample X42925 X42926 X42927
1 1 3 10 2
2 2 5 0 NA
3 3 1 0 0
practice2=gather(practice,Date,value,-Sample,na.rm=TRUE)
Sample Date value
1 1 X42925 3
2 2 X42925 5
3 3 X42925 1
4 1 X42926 10
5 2 X42926 0
6 3 X42926 0
7 1 X42927 2
9 3 X42927 0
practice2$Date=as.Date(practice2$Date)
Error in charToDate(x) :
character string is not in a standard unambiguous format
The value X42925 is an Excel serial date: the number after the X counts days in Excel's 1900 date system, so 42925 corresponds to 9 July 2017. We can convert these serial dates to R dates using as.Date with an appropriate origin, which for this date system is 1899-12-30.
You should be able to convert your Date column using something like the following. This assumes the dates, prefixed by X, were read in as text.
dates <- as.numeric(substr(practice2$Date, 2, nchar(practice2$Date)))
practice2$Date <- as.Date(dates, origin = '1899-12-30')
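A standalone sketch of the conversion, using the three header values from the question; sub("^X", "", x) is used here as an alternative way of stripping the prefix:

```r
# Column names as produced by the import in the question
x <- c("X42925", "X42926", "X42927")

# Drop the leading "X" and convert the remainder to numbers
serial <- as.numeric(sub("^X", "", x))

# Excel's 1900 date system lines up with R when the origin is 1899-12-30
dates <- as.Date(serial, origin = "1899-12-30")

# Day-month labels in the style of the spreadsheet header (locale dependent)
labels <- format(dates, "%d-%b")
```

The resulting dates run from 2017-07-09 to 2017-07-11, which agrees with the 09-Jul to 11-Jul header in the spreadsheet.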
Try saving the Excel file as a .csv instead. That converts your data and dates into plain text, which should import into R with no problem.
Alternatively, try one of the methods here:
http://www.milanor.net/blog/read-excel-files-from-r/
Good luck!

Adding a variable value to a specific record in R

I am working with a large data frame (see example below) where a value is missing in the year variable. I assume that the missing value is 2000 and I would like to add it. I would rather not add the value by hand; is there another way?
dataID dataOrigin year breedSummary breedFCI SNP sex age postcode
1 H00-0012 IVPZ-APPX 2000 1018 3 7 1 12 7000
4 H00-0022 IVPZ-APPX NA 1217 1 5 3 9 7514
Many thanks!
Assuming column dataID has unique values, this can be done simply with:
data[data$dataID == 'H00-0022',]$year = 2000
Alternatively:
data$year[which(data$dataID == 'H00-0022')] <- 2000
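As a slightly more defensive variant, the assignment can be restricted to the case where the year really is missing. This is only a sketch on a made-up two-row extract of the data, with just the two relevant columns:

```r
# Hypothetical two-row extract of the data frame from the question
data <- data.frame(dataID = c("H00-0012", "H00-0022"),
                   year = c(2000, NA),
                   stringsAsFactors = FALSE)

# Target the record by ID, and only overwrite it if the year is actually missing
idx <- data$dataID == "H00-0022" & is.na(data$year)
data$year[idx] <- 2000
```

Because idx is FALSE everywhere else, existing year values are never touched, even if the ID turns out not to be unique.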

In R, how can I combine two columns within a data.frame

I'm working with some data that looks like this:
AB 123 4 5 3 2 1
AB 234 4 2 7 4 3
...
The row id is actually the combination of the first two columns, so I would like to be able to reference row AB123 or AB234. However, since they are in two columns, I figured the easiest way to do this would be to merge columns 1 and 2 somehow and then convert it to a table with column 1 specified as the row names. Does anyone know how I can do this? Is there an easier way? Thanks.
row.names(df)<-paste(df[,1],df[,2],sep="")
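A short sketch of how this plays out, on made-up data with placeholder column names V1 to V4. Note that row names must be unique, so this assumes the combination of the two columns never repeats:

```r
# Made-up data in the shape described in the question
df <- data.frame(V1 = c("AB", "AB"), V2 = c(123, 234),
                 V3 = c(4, 4), V4 = c(5, 2),
                 stringsAsFactors = FALSE)

# paste0() is paste() with sep = ""; the combined keys must be unique
rownames(df) <- paste0(df$V1, df$V2)

# Rows can now be addressed by the combined ID
row <- df["AB234", ]
```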

how to select matrix element in R?

Reading the data the following way
data<-read.csv("userStats.csv", sep=",", header=F)
I tried to select an element at the specific position.
The example of the data (first five rows) is the following (V2 is the date and V3 is the day of week):
V1 V2
1 00002781A2ADA816CDB0D138146BD63323CCDAB2 2010-09-04
2 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-04
3 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-07
4 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-08
5 00002D2354C7080C0868CB0E18C46157CA9F0FD4 2010-09-17
V3 V4 V5 V6 V7 V8 V9
1 Saturday 2 2 615 1 1 47
2 Saturday 2 2 77 1 1 43
3 Tuesday 1 3 201 1 1 117
4 Wednesday 1 1 44 1 1 74
5 Friday 1 1 3 1 1 18
I tried to divide the 6th column by the 9th column in the first row as follows:
data[1,6]/data[1,9]
but it returned NA with a warning:
[1] NA
Warning message:
In Ops.factor(data[1, 6], data[1, 9]) : / not meaningful for factors
Then I tried to select just one element
> data[2,9]
[1] 43
11685 Levels: 0 1 2 3 ... 55311
but I don't know what these Levels are or what causes the problem. Does anyone know how to select an element at a specific position, data[row, column]?
Thank you!
My favorite tool to check variable class is str().
What you have there is a data frame and at least one of the columns you're trying to work with is a factor. See Dirk's answer on how to change classes of a column.
The command
data[1,6]/data[1,9]
selects the value in the first row of the sixth column and divides it by the value in the first row of the ninth column. Is this what you want? If you want to use values from the entire column (and not just the first row), you would write
data[6] / data[9]
or
data[, 6] / data[, 9]
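Both forms select the same columns of a data.frame; the difference is that data[6] returns a one-column data frame, while data[, 6] returns a plain vector.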
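The difference between the two indexing forms can be seen on a small numeric stand-in for the data (two columns only, so they are addressed by name rather than by positions 6 and 9):

```r
# Small numeric stand-in for the data frame in the question
data <- data.frame(V6 = c(615, 77, 201), V9 = c(47, 43, 117))

# Single element by position: data[row, column]
one <- data[1, "V6"] / data[1, "V9"]

# Whole columns: [ ] keeps a data frame, [, ] drops to a plain vector
as_df  <- data["V6"] / data["V9"]
as_vec <- data[, "V6"] / data[, "V9"]
```

The numbers in as_df and as_vec are the same; only the container differs.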
The standard modeling data structure in R is a data.frame.
The data.frame objects can hold various types: numeric, character, factor, ...
Now, when reading data via read.csv() et al, you can get bitten by the default value of the stringsAsFactors option (TRUE in R versions before 4.0.0). I presume that at least one row in your data had text in that column, so R decided to encode it as a factor and presto! you can no longer do direct mathematical operations on the column.
In short, do summary(data) and/or a sweep of class() over all the columns. Convert as necessary, or set the stringsAsFactors argument to FALSE, or both.
Once your data is numeric, you can divide, slice, dice, ... as you please.
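A sketch of that workflow, using a small inline stand-in for userStats.csv. The "n/a" entry is an assumed example of the kind of stray text that turns a numeric column into text; the real file may contain something different:

```r
# Inline stand-in for the CSV file; "n/a" is an assumed stray text value
csv <- "id1,2010-09-04,Saturday,615,47\nid2,2010-09-04,Saturday,77,n/a\nid3,2010-09-07,Tuesday,201,117"

# Read strings as plain character vectors rather than factors
data <- read.csv(text = csv, header = FALSE, stringsAsFactors = FALSE)

# Sweep class() over all columns to spot the one that is not numeric
classes <- sapply(data, class)

# Convert the affected column; entries that are not numbers become NA
data$V5 <- suppressWarnings(as.numeric(data$V5))

# Arithmetic now works, with NA where the input was not a number
ratio <- data$V4 / data$V5
```

Keeping the column as character first and converting explicitly makes it easy to find the offending rows with which(is.na(data$V5)).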
