I wish to use a variable to specify a particular cell in a csv file. I can use:
emp1 <- read.csv("C:/Database/data/emp1.csv",as.is=TRUE)
numberofemployee <- 1
> emp1["1", "X.name"]
[1] "ALEX"
but if I use:
> emp1["numberofemployee", "X.name"]
[1] NA
I assume R is looking for numberofemployee as a column header.
How do I get it to see it as an integer so I can specify my cells?
csv file
#name,mon,tue,wed,thu,fri
ALEX,98,95,73,88,18
BRAD,66,25,72,8,32
JOHN,22,41,78,43,36
The problem is that you pass strings to []. A string index is matched against the row and column names. A data frame read with read.csv() gets the default row names "1", "2", "3", ..., so the string "1" simply matches the name of the first row; that is why emp1["1", "X.name"] works. When you write "numberofemployee" in quotes, R looks for a row named numberofemployee, finds none, and returns NA. If you want to use the content of numberofemployee, omit the quotes. R will then interpret it as an R object and use its value (here the number 1) as a row position:
emp1[numberofemployee, "X.name"]
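A minimal reproduction, assuming the three data rows from the question and reading them from an inline string with text= instead of the file path, shows the difference between indexing by position and by row name:

```r
# The CSV from the question, read from a string instead of C:/Database/data/emp1.csv
emp1 <- read.csv(text = "#name,mon,tue,wed,thu,fri
ALEX,98,95,73,88,18
BRAD,66,25,72,8,32
JOHN,22,41,78,43,36", as.is = TRUE)

numberofemployee <- 1

names(emp1)[1]                      # "X.name": check.names turns "#name" into a syntactic name
rownames(emp1)                      # "1" "2" "3": default row names are character strings

emp1[numberofemployee, "X.name"]    # numeric index, row position 1: "ALEX"
emp1["1", "X.name"]                 # character index, matched against row names: "ALEX"
emp1["numberofemployee", "X.name"]  # no row is named "numberofemployee": NA
emp1[as.character(numberofemployee), "X.name"]  # also works, via the row name "1"
```

Indexing by position is usually the safer choice: after sorting or subsetting, the row names no longer run 1, 2, 3, so a lookup by the name "1" may return a different row than position 1.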
I am pretty new to R and I wonder if I can replace NA values (which look like strings) with blanks, i.e. nothing.
It is easy when the entire table is character, but my table contains doubles as well, so when I try to run
f <- as.data.frame(replace(df, is.na(df), ""))
or
df[is.na(df)] <- ""
Neither works.
The error is:
Assigned data `values` must be compatible with existing data.
Error occurred for column `ID`.
Can't convert <character> to <double>.
I understand why, but I really need the IDs, like every other cell in the table (character, double or other), to stay in the table with a blank instead of NA. The table is later connected to a BI tool, and for the sake of clarity I can't present "NA", just a blank.
If your column is of type double (numbers), you can't replace NAs (R's internal representation of missing values) with a character string. And "" IS a character string: it looks empty, but it is not.
So you need to choose: convert your whole column to type character, or leave the missing values as NA.
EDIT:
If you really want to convert your numeric column to character, you can just use as.character(MYCOLUMN). But I think what you really want is to tell your export function how to treat NAs, which is easy, e.g. write.csv(df, na = ""). Also check the help page with ?write.csv.
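A short sketch of both options, using a small hypothetical data frame df with a double ID column. The wording of the error in the question suggests df is a tibble, which refuses to coerce a double column to character; a plain data.frame typically coerces the column silently instead, so converting explicitly first keeps the behaviour predictable either way.

```r
# Hypothetical example data: a double ID column and a character column, both with NAs
df <- data.frame(ID = c(1, NA, 3), name = c("a", "b", NA), stringsAsFactors = FALSE)

# Option 1 (usually preferable): keep the types and write NA as an empty field on export
out <- tempfile(fileext = ".csv")
write.csv(df, out, na = "", row.names = FALSE)
lines <- readLines(out)
lines          # the missing ID in the second data row is written as an empty field

# Option 2: convert every column to character first, so "" is a valid value everywhere
df_chr <- df
df_chr[] <- lapply(df_chr, as.character)  # [] keeps df_chr a data frame
df_chr[is.na(df_chr)] <- ""
str(df_chr)    # ID is now character: "1" "" "3"
```

Option 1 leaves df untouched, so later numeric work on ID still works; the blanks exist only in the exported file that the BI tool reads.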
I have a problem selecting columns of a data frame in a for loop. I'm new to R, so it's very possible that I missed something obvious, but I did not find anything that works for me.
I have a file with 20 climatic variables measured over 60 years in 399 different places.
I have one row per day, and my columns are the 20 climatic variables for each place (with a number at the end of each name identifying the place where the measurement was taken).
It looks like this:
Temperature_1 Rain_1 .....Temperature_399 Rain_399
Date 1
Date 2
...
I want to select the 20 columns corresponding to one place, run some calculations on the variables, put the results in an empty 3D array I have created, then do the same for the next place until the last one.
My problem is that I don't know how to select the right columns automatically. I also have trouble writing the results into the array.
I tried to select the columns corresponding to one place using the numbers at the end of the variable names, but I don't think the condition can be changed automatically.
I also tried to use the positions of the columns, but I'm not doing it properly.
This is my code :
#creation of an empty array
Indice_clim=array(NA,dim = c(60,8,399),dimnames=list(c(1959:2018),c("Huglin","CNI","HD","VHD","SHS","DoF","FreqLF","SLF"),c(1:399)))
#selection of the columns corresponding to the first place using "end with"
maille=select(donnees_SAFRAN,c(1:4),ends_with(".1",ignore.case = FALSE))
# another try using the columns position which I know is really badly done
for (j in seq(from=5, to=7984,by=20)){
paste0("maille",j-4)=select(donnees_SAFRAN,c(1:4),c(j:j+19))
}
#and the calculation on the selected columns, the "i loop" is working.
for(i in 1959:2018){temp=c(maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9)%>%summarise(sum(((T_moy.1-10)+(T_max.1-10))/2)*1.03),
maille%>%filter(an==i,mois==9)%>%summarise(mean(T_min.1)),
maille%>%filter(an==i)%>%summarise(sum(T_max.1>=30)),
maille%>%filter(an==i)%>%summarise(sum(T_max.1>=35)),
maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9,T_moy.1>=28)%>%summarise(sum(T_moy.1-28)),
maille%>%filter(an==i)%>%summarise(sum(T_min.1<=0)),
maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9)%>%summarise(sum(T_min.1<=0)),
maille%>%filter(an==i,mois==4|mois==5|mois==6|mois==7|mois==8|mois==9,T_moy.1<2)%>%summarise(sum(abs(2-T_moy.1))))
Indice_clim[[i-1958,,]]=as.numeric(temp)}
I would like to create a loop or something to do my calculation on each place and write the result in my array.
If you have any idea, I would very much appreciate it !
You can use the grep() function to look for each of the locations 1, 2, ..., 399 in the column names. If your big dataframe containing all the data is called df, then you could do this:
for (i in 1:399) {
selected_indices <- grep(paste0('_', i, '$'), colnames(df))
# do calculations on the selected columns
df[, selected_indices]
}
The for loop will automatically run through each location i from 1 through 399. The paste0() function concatenates '_' with the variable i and the dollar sign $ to create strings like "_1$", "_2$", ..., "_399$", which are then searched for using the grep() function in the column names of df. The '$' is used to specify that you want the patterns _1, _2, ... to appear at the end of the column names (it is a regular expression special character).
The grep() function uses these regular expressions to return the column indices required for each location. You can then extract the relevant portion of df and do whatever calculations you want.
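A fuller sketch of the loop, under stated assumptions: small synthetic data (3 places, 3 years, two of the eight indices) stands in for donnees_SAFRAN, and the column names follow the T_max.1 style used in the question's code rather than the _1 style of the header sketch, so the pattern escapes the dot ("\\.") instead of matching "_". It selects each place's columns with grep(), strips the place suffix so the same formulas work for every place, and writes each year's results into the year x index x place array with single brackets, res[year, , place]. The names dat and res are hypothetical. With dplyr, select(dat, an, mois, ends_with(paste0(".", p))) should pick the same columns. The last lines show why c(j:j+19) in the positional attempt selects a single column: ':' binds more tightly than '+'.

```r
set.seed(1)
n_places <- 3          # stands in for the 399 places
years    <- 2000:2002  # stands in for 1959:2018

# Synthetic daily data: one row per day with year (an), month (mois) and,
# for each place p, the columns T_moy.p and T_max.p
days <- expand.grid(day = 1:30, mois = 1:12, an = years)
dat  <- data.frame(an = days$an, mois = days$mois)
for (p in seq_len(n_places)) {
  dat[[paste0("T_moy.", p)]] <- rnorm(nrow(dat), mean = 15, sd = 8)
  dat[[paste0("T_max.", p)]] <- rnorm(nrow(dat), mean = 22, sd = 8)
}

# Empty result array: year x index x place (two of the eight indices, for brevity)
res <- array(NA_real_, dim = c(length(years), 2, n_places),
             dimnames = list(years, c("HD", "VHD"), seq_len(n_places)))

for (p in seq_len(n_places)) {
  suffix <- paste0("\\.", p, "$")        # e.g. "\\.1$": a literal ".1" at the end
  cols   <- grep(suffix, names(dat))     # T_moy.1, T_max.1 but not T_moy.11
  place  <- dat[, c("an", "mois", names(dat)[cols])]
  names(place) <- sub(suffix, "", names(place))  # T_max.1 -> T_max

  for (y in years) {
    d <- place[place$an == y, ]
    # Single brackets: row = year, all indices, slice = place
    res[as.character(y), , p] <- c(sum(d$T_max >= 30),  # HD
                                   sum(d$T_max >= 35))  # VHD
  }
}

res[, , 1]   # results for place 1

# Aside on the positional attempt: ':' binds more tightly than '+'
j <- 5
length(j:j + 19)     # 1, because this is (j:j) + 19
length(j:(j + 19))   # 20 column positions
```

Stripping the suffix before the calculations means the formulas are written once (T_max rather than T_max.1), which is what lets the same code run for every place.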
I am trying to load data.csv in R using
S<-read.csv(file="data.csv")
Since it is a single column of numbers (I believe tab-delimited) without a header, I was hoping for S to be a vector. But S displays as
X40.87
1 40.69
2 40.94
... ...
(The numbers 40.87, 40.69, ... are my numbers.)
To access the third number, I need to invoke S[2,1]. Why not S[3]?
Use scan()
S <- scan("file.csv")
S[3]
# 40.94
Alternatively, as billinkc said, you can use read.csv("file.csv", header=FALSE) or just read.table("file.csv"), since the delimiter doesn't matter in a file with a single column.
Since your CSV has no header, you need to say so when you read the file; otherwise R uses the first row as the column name.
Thus with input file like
40.87
40.69
40.94
I open it the same way you did, adding header=FALSE:
> s <- read.csv(file="~/Documents/r/data.txt",header=FALSE)
> s
V1
1 40.87
2 40.69
3 40.94
References
read.table {utils}
If you really just want a vector, subset the 1-column data frame:
read.csv(file="data.csv", header=FALSE)[,1]
This works because of the drop argument, which defaults to TRUE and drops any dimension of extent one (in this case, the single column), so you get a plain vector back.
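To compare the three answers side by side, here is a sketch that reads the three numbers from the question from an inline string (text=) instead of data.csv:

```r
txt <- "40.87\n40.69\n40.94\n"   # the numbers from the question, one per line

# Default header = TRUE: the first number is used as the column name
S <- read.csv(text = txt)
names(S)   # "X40.87"
nrow(S)    # 2, so the third number is S[2, 1]

# header = FALSE keeps all three values; [, 1] drops the result to a plain vector
v <- read.csv(text = txt, header = FALSE)[, 1]
v[3]       # 40.94

# scan() returns a numeric vector directly
s <- scan(text = txt)
s[3]       # 40.94
```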
I have a column of gene symbols that I have retrieved directly from a database, and some of the rows contain two or more symbols which are comma separated (see example below).
SLC6A13
ATP5J2-PTCD1,BUD31,PTCD1
ACOT7
BUD31,PDAP1
TTC26
I would like to remove the commas, and place the separated symbols into new rows like so:
SLC6A13
ATP5J2-PTCD1
BUD31
PTCD1
ACOT7
BUD31
PDAP1
TTC26
I haven't been able to find a straightforward way to do this in R. Does anyone have any suggestions?
Use scan() with sep="," to read every symbol into one character vector; you can then put that vector into a matrix or a data.frame:
vec <- scan(text="SLC6A13
ATP5J2-PTCD1,BUD31,PTCD1
ACOT7
BUD31,PDAP1
TTC26", what=character(), sep=",")
Read 8 items
vec
[1] "SLC6A13" "ATP5J2-PTCD1" "BUD31" "PTCD1" "ACOT7" "BUD31" "PDAP1"
[8] "TTC26"
Perhaps:
as.matrix(vec)
(The scan function can also read from files. The text parameter was a relatively recent addition at the time of writing, but it saves typing file=textConnection("...").)
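If the symbols are already in a data frame column (the names genes, symbol, long and source_row below are hypothetical), strsplit() on the column plus unlist() gives the same result without going through text, and rep() with lengths() keeps track of which original row each symbol came from:

```r
genes <- data.frame(
  symbol = c("SLC6A13", "ATP5J2-PTCD1,BUD31,PTCD1", "ACOT7", "BUD31,PDAP1", "TTC26"),
  stringsAsFactors = FALSE
)

# One character vector per original row, split on literal commas
parts <- strsplit(genes$symbol, ",", fixed = TRUE)

long <- data.frame(
  source_row = rep(seq_along(parts), lengths(parts)),  # original row of each symbol
  symbol     = unlist(parts),
  stringsAsFactors = FALSE
)
long
```

If you already use the tidyverse, tidyr's separate_rows() does the same split into rows for a data frame column in one call.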
Another option is to use readLines and strsplit, where txt is a single string holding the column shown in the question:
unlist(strsplit(readLines(textConnection(txt)),','))
"SLC6A13" "ATP5J2-PTCD1" "BUD31" "PTCD1" "ACOT7"
"BUD31" "PDAP1" "TTC26"
Is it possible to write values of different datatypes to a file in R? Currently, I am using a simple vector as follows:
> vect = c (1,2, "string")
> vect
[1] "1" "2" "string"
> write.table(vect, file="/home/sampleuser/sample.txt", append= FALSE, sep= "|")
However, since vect is now a character vector, every value in the file is quoted:
"x"
"1"|"1"
"2"|"2"
"3"|"string"
Is it possible to keep entries 1 and 2 as numeric values instead of strings? My expected result is:
"x"
"1"|1
"2"|2
"3"|"string"
Also, I assume the values "1", "2" and "3" on the left are vector indexes? And I don't understand why the first line is "x".
I wonder if simply removing all the quotes from the output file will solve your problem? That's easy: Add quote=FALSE to your write.table() call.
write.table(vect, file="/home/sampleuser/sample.txt",
append=FALSE, sep="|", quote=FALSE)
x
1|1
2|2
3|string
You can also get rid of the column and row names if you like. The separator character then no longer appears, because you have a one-column table:
write.table(vect, file="/home/sampleuser/sample.txt", append=FALSE, sep="|",
quote=FALSE, row.names=FALSE, col.names=FALSE)
1
2
string
For vectors and matrices, R requires every element to have the same data type. When you mix types, R coerces all the data in the vector/matrix to a common type, converting more specific types into less specific ones (for example, logical to numeric to character). In this case, every item in your vector can be represented as a character string, so R automatically coerces the numeric parts of the vector to character.
As @Dason said, you're better off using a list if this isn't what you want.
Alternatively, you can use a data.frame, which lets you store different data types in different columns (internally, R stores data.frames as lists, so it makes sense that this is another option).
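A short sketch of the list and data.frame routes, with hypothetical values. It also explains the question's output: write.table() first turns a plain vector into a one-column data frame, which is why the header is "x" (the name of write.table's first argument) and why the left-hand "1", "2", "3" appear; they are row names, which here coincide with the vector positions.

```r
vect <- c(1, 2, "string")
class(vect)                  # "character": c() coerced the numbers to strings

lst <- list(1, 2, "string")  # a list keeps each element's own type
sapply(lst, class)           # "numeric" "numeric" "character"

# A data frame holds one type per column; write.table() quotes only the character column
df <- data.frame(num = c(1, 2), txt = c("a", "string"), stringsAsFactors = FALSE)
out <- tempfile(fileext = ".txt")
write.table(df, out, sep = "|", row.names = FALSE)
lines <- readLines(out)
lines                        # file contents: "num"|"txt", then 1|"a", then 2|"string"
```

Numbers and strings can coexist in the file this way, but only in separate columns; within a single column, R still uses one type.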