Basic R question on manipulating data frames

I have a data frame with several columns, and the rows have names.
I want to calculate a value for each row (col1/col2) and create a new data frame that keeps the original row names. If I just do something like data$col1/data$col2, I get a vector with the results but lose the row names.
I know it's very basic, but I'm quite new to R.

It would help to read ?"[.data.frame" to understand what's going on. Specifically:
Note that there is no ‘data.frame’ method for ‘$’, so ‘x$name’ uses the default method which treats ‘x’ as a list.
You will see that the row names are lost if you convert a data.frame to a list (using Joris' example data):
> as.list(Data)
$col1
[1] -0.2179939 -2.6050843 1.6980104 -0.9712305 1.6953474 0.4422874
[7] -0.5012775 0.2073210 1.0453705 -0.2883248
$col2
[1] -1.3623349 0.4535634 0.3502413 -0.1521901 -0.1032828 -0.9296857
[7] 1.4608866 1.1377755 0.2424622 -0.7814709
My suggestion would be to avoid using $ if you want to keep row names. Use this instead:
> Data["col1"]/Data["col2"]
col1
a 0.1600149
b -5.7435947
c 4.8481157
d 6.3816918
e -16.4146120
f -0.4757387
g -0.3431324
h 0.1822161
i 4.3114785
j 0.3689514
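To make the difference concrete, here is a minimal sketch comparing the two forms side by side. The data are random stand-ins in the spirit of the example above, and set.seed is only there so the run is repeatable; the object names by_dollar and by_bracket are made up for illustration.

```r
# Hypothetical example data with named rows
set.seed(1)
Data <- data.frame(col1 = rnorm(10), col2 = rnorm(10), row.names = letters[1:10])

# `$` extracts bare vectors, so the division returns an unnamed numeric vector
by_dollar <- Data$col1 / Data$col2

# Single-bracket indexing returns one-column data frames, so row names survive
by_bracket <- Data["col1"] / Data["col2"]

names(by_dollar)      # NULL
rownames(by_bracket)  # "a" "b" ... "j"
```

The result column is still called col1; if that is misleading, something like names(by_bracket) <- "ratio" renames it.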

Use the function names() to add the names:
Data <- data.frame(col1=rnorm(10),col2=rnorm(10),row.names=letters[1:10])
x <- Data$col1/Data$col2
names(x) <- row.names(Data)
This solution gives a named vector. To get a data frame instead (solution from Marek):
NewFrame <- data.frame(x=Data$col1/Data$col2,row.names=row.names(Data))

A simple and neat alternative is to store row.names(Data) as a regular column of the data frame, so the names stay attached to the data through any further manipulation.
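A minimal sketch of the row-names-as-a-column idea, assuming a small made-up data frame. The column names id and ratio, and the object Result, are illustrative choices and not from the original answer.

```r
# Hypothetical data with named rows
Data <- data.frame(col1 = c(1, 4, 9), col2 = c(2, 2, 3),
                   row.names = c("a", "b", "c"))

# Copy the row names into an ordinary column so they cannot be dropped
Data$id <- row.names(Data)

# The computed values stay aligned with id because they live in the same rows
Data$ratio <- Data$col1 / Data$col2

# Keep only the identifier and the result
Result <- Data[, c("id", "ratio")]
Result
```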

Related

Dynamically change part of variable name in R

I am trying to automatise some post-hoc analysis, but I will try to explain myself with a metaphor that I believe will illustrate what I am trying to do.
Suppose I have a list of strings in two lists, in the first one I have a list of names and in the other a list of adjectives:
list1 <- c("apt", "farm", "basement", "lodge")
list2 <- c("tiny", "noisy")
Let's suppose also I have a data frame with a bunch of data that I have named something like this as they are the results of some previous linear analysis.
> head(df)
qt[apt_tiny,Intercept] qt[apt_noisy,Intercept] qt[farm_tiny,Intercept]
1 4.196321 -0.4477012 -1.0822793
2 3.231220 -0.4237787 -1.1433449
3 2.304687 -0.3149331 -0.9245896
4 2.768691 -0.1537728 -0.9925387
5 3.771648 -0.1109647 -0.9298861
6 3.370368 -0.2579591 -1.0849262
and so on...
Now, what I am trying to do is make some automatic operations where the strings in the previous lists dynamically change as they go in a for loop. I have made a list with all the distinct combinations and called it distinct. Now I am trying to do something like this:
for (i in 1:nrow(distinct)){
var1[[i]] <- list1[[i]]
var2[[i]] <- list2[[i]]
#this being the insertable name part for the rest of the variables and parts of variable,
#i'll put it inside %var[[i]]% for the sake of the explanation.
%var1[[i]]%_%var2[[i]]%_INT <- df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]`+ df$`qt[%var1[[i]]%,Intercept]`
}
The difficult thing for me here is %var1[[i]]% is at the same time inside a variable and as the name of a column inside a data frame.
Any help would be much appreciated.
You cannot use $ to extract a column whose name is held in a character variable, so df$`qt[%var1[[i]]%_%var2[[i]]%,Intercept]` will not work.
Create the name of the column using sprintf and use [[ to extract it. For example, to construct "qt[apt_tiny,Intercept]" as the column name you can do:
i <- 1
sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])
#[1] "qt[apt_tiny,Intercept]"
Now use [[ to extract that column from df:
df[[sprintf('qt[%s_%s,Intercept]', list1[i], list2[i])]]
You can do the same for other columns.
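Putting the pieces together, the whole loop might look roughly like this. This is a sketch under assumptions: the toy df below is invented, it assumes the per-name columns such as qt[apt,Intercept] exist as the question implies, and it stores results in a named list (results) instead of creating variables with generated names, which is a swapped-in design choice rather than what the question literally asked for.

```r
list1 <- c("apt", "farm")
list2 <- c("tiny", "noisy")

# Toy data frame; check.names = FALSE keeps the bracketed column names intact
df <- data.frame(
  `qt[apt_tiny,Intercept]`   = c(1, 2),
  `qt[apt_noisy,Intercept]`  = c(3, 4),
  `qt[farm_tiny,Intercept]`  = c(5, 6),
  `qt[farm_noisy,Intercept]` = c(7, 8),
  `qt[apt,Intercept]`        = c(10, 20),
  `qt[farm,Intercept]`       = c(100, 200),
  check.names = FALSE
)

# All name/adjective combinations, one per row
combos <- expand.grid(place = list1, adj = list2, stringsAsFactors = FALSE)

results <- list()
for (i in seq_len(nrow(combos))) {
  pair_col <- sprintf("qt[%s_%s,Intercept]", combos$place[i], combos$adj[i])
  base_col <- sprintf("qt[%s,Intercept]", combos$place[i])
  out_name <- sprintf("%s_%s_INT", combos$place[i], combos$adj[i])
  # [[ accepts a character string, so the built names can be used directly
  results[[out_name]] <- df[[pair_col]] + df[[base_col]]
}

results$apt_tiny_INT
```

Each result is then available by name, for example results[["farm_noisy_INT"]], without needing assign() or get().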

Comparing character lists in R

I have two lists of characters that I read in from Excel files.
One is a very long list of all bird species that have been documented in a region (allBirds) and another is a list of species that were recently seen in a sample location (sampleBirds), which is much shorter. I want to write a section of code that will compare the lists and tell me which sampleBirds show up in the allBirds list. Both are lists of characters.
I have tried:
# load the xlsx files
Full_table <- read_excel("Full_table.xlsx")
Pathogen_table <- read_excel("pathogens.xlsx")
# read the species column into a new data frame
species <-c(as.data.frame(Full_table[,7], drop=FALSE))
pathogens <- c(as.data.frame(Pathogen_table[,3], drop=FALSE))
intersect(pathogens, species)
intersect(species, pathogens)
but intersect returns lists of length 0, which I know cannot be right. Any suggestions?
Maybe you can try match() function or "==".
You need to run the intersect on the individual columns that are stored in the list:
> a <- c(data.frame(c1=as.factor(c('a', 'q'))))
> b <- c(data.frame(c1=as.factor(c('w', 'a'))))
> intersect(a,b)
list()
> intersect(a$c1,b$c1)
[1] "a"
This will probably do in your case:
intersect(Full_table[[7]], Pathogen_table[[3]])
Or, if you insist on keeping the objects you already created:
intersect(pathogens[[1]], species[[1]])
where [[1]] extracts the first (and only) column as a plain vector. Note that by using c(as.data.frame(... you are converting the data.frame to a regular list, which is why intersect on the whole objects comes back empty. I'd go with only as.data.frame(....
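Once both species lists are plain character vectors, the comparison itself is short. A minimal sketch with made-up species names standing in for the Excel columns; common, seen, and missing are illustrative names.

```r
# Hypothetical stand-ins for the two columns read from Excel
allBirds    <- c("robin", "sparrow", "heron", "kestrel", "wren")
sampleBirds <- c("wren", "dodo", "robin")

# Species that appear in both vectors
common <- intersect(sampleBirds, allBirds)

# One TRUE/FALSE per sampled species: is it in the regional list?
seen <- sampleBirds %in% allBirds

# Sampled species that are not in the regional list
missing <- sampleBirds[!seen]

common
missing
```

%in% is handy when the position of each match matters, while intersect only reports the shared values.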

Use lists/dataframes as items in for-loops in R

I am quite sure this is basic stuff, but I just can't find the answer by googling. So my problem:
I want to use a for-loop on a list of lists or data frames. But when you use list[i], you get all the values in the data frame instead of the data frame itself. Can anyone point out to me how to code this properly?
Example of the code:
a<-data.frame(seq(1:3),seq(3:1))
b<-data.frame(seq(1:3),seq(3:1))
l<-c(a,b)
Then l[1] returns:
> l[1]
$seq.1.3..
[1] 1 2 3
And I want it to just return: a
You can use the list function:
a<-data.frame(1:3,1:3)
b<-data.frame(3:1,3:1)
l<-list(a,b)
And access its elements with double brackets [[:
l[[1]]
l[[2]]
P.S.: seq(1:3) and seq(3:1) output the same value, so I used 1:3 and 3:1. :)
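For the for-loop the question asks about, a sketch along these lines should work. The column names x and y and the list names a and b are assumptions added so the output is easier to read.

```r
a <- data.frame(x = 1:3, y = 3:1)
b <- data.frame(x = 4:6, y = 6:4)

# list() keeps each data frame whole; c() would splice their columns together
l <- list(a = a, b = b)

# Loop over the names and pull each data frame out with [[
for (nm in names(l)) {
  d <- l[[nm]]
  cat(nm, ":", nrow(d), "rows, sum of x =", sum(d$x), "\n")
}

is.data.frame(l[[1]])  # TRUE
is.data.frame(l[1])    # FALSE: single brackets return a list of length one

# The same idea without an explicit loop
row_counts <- sapply(l, nrow)
```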

R equivalent to the MATLAB structure?

Is there an R type equivalent to the Matlab structure type?
I have a few named vectors and I try to store them in a data frame. Ideally, I would simply access one element of an object and it would return the named vectors (like a structure in Matlab). I feel that using a data frame is not the right thing to do since it can store the values of the named vectors but not the names when they differ from one vector to the other.
More generally, is it possible to store a bunch of different objects in a single one in R?
Edit: As Joran said I think that list does the job.
l = list()
l$vec1 = namedVector1
l$vec2 = namedVector2
...
If I have a list of names
name1 = 'vec1'
name2 = 'vec2'
is there any way for the interpreter to understand that when I use a variable name like name1, I am not referring to the variable name but to its content? I have tried get(name1) but it does not work.
I could still be wrong about what you're trying to do, but I think this is the best you're going to get in terms of accessing each list element by name:
l <- list(a= 1:3,b = 1:10)
> ind <- "a"
> l[[ind]]
[1] 1 2 3
Namely, you're going to have to use [[ explicitly.
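A small sketch of a list used like a MATLAB structure, with element names held in variables. The contents of namedVector1 and namedVector2 are invented here. As far as I can tell, get(name1) fails because get() looks for a variable called vec1 in the environment, not for an element of the list.

```r
# Hypothetical named vectors with different names and lengths
namedVector1 <- c(alpha = 1, beta = 2)
namedVector2 <- c(x = 10, y = 20, z = 30)

l <- list()
name1 <- "vec1"
name2 <- "vec2"

# [[ works for assignment as well as extraction when the name is in a variable
l[[name1]] <- namedVector1
l[[name2]] <- namedVector2

names(l)             # "vec1" "vec2"
names(l[[name2]])    # each vector keeps its own names: "x" "y" "z"
l[[name2]][["y"]]    # 20
```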

How to get a .csv file into R?

I have this .csv file:
ID,GRADES,GPA,Teacher,State
3,"C",2,"Teacher3","MA"
1,"A",4,"Teacher1","California"
And what I want to do is read in the file using the R statistical software and read in the Header into some kind of list or array (I'm new to R and have been looking for how to do this, but so far have had no luck).
Here's some pseudocode of what I want to do:
inputfile=read.csv("C:/somedirectory")
for eachitem in row1:{
add eachitem to list
}
Then I want to be able to use those names to call on each vertical column so that I can perform calculations.
I've been scouring Google for an hour trying to find out how to do this, but there is not much out there on dealing with headers specifically.
Thanks for your help!
You mention that you will call on each vertical column so that you can perform calculations. I assume that you just want to examine each single variable. This can be done through the following.
df <- read.csv("myRandomFile.csv", header=TRUE)
df$ID
df$GRADES
df$GPA
It might also be helpful to assign a column to its own variable:
var3 <- df$GPA
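For the part of the question about getting the header into a list, names(df) already returns it as a character vector. A minimal sketch, reading the sample data from an inline string via the text argument so it runs without a file on disk; csv_text and header are illustrative names.

```r
# The sample data from the question, inlined so no file is needed
csv_text <- 'ID,GRADES,GPA,Teacher,State
3,"C",2,"Teacher3","MA"
1,"A",4,"Teacher1","California"'

df <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# The header row, as a character vector
header <- names(df)

# Use each name to reach its column; [[ takes the name as a string
for (col in header) {
  cat(col, ":", class(df[[col]]), "\n")
}

mean(df[["GPA"]])  # 3
```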
You need read.csv("C:/somedirectory/some/file.csv") and in general it doesn't hurt to actually look at the help page including its example section at the bottom.
As Dirk said, the function you are after is 'read.csv' or one of the other read.table variants. Given your sample data above, I think you will want to do something like this:
setwd("c:/random/directory")
df <- read.csv("myRandomFile.csv", header=TRUE)
All we did in the above was set the directory to where your .csv file is and then read the .csv into a dataframe named df. You can check that the data loaded properly by checking the structure of the object with:
str(df)
Assuming the data loaded properly, you can then go on to perform any number of statistical methods with the data in your data frame. I think summary(df) would be a good place to start. Learning how to use the help in R will be immensely useful, and a quick read through the help on CRAN will save you lots of time in the future: http://cran.r-project.org/
You can use
df <- read.csv("filename.csv", header=TRUE)
# To loop each column
for (i in 1:ncol(df))
{
dosomething(df[,i])
}
# To loop each row
for (i in 1:nrow(df))
{
dosomething(df[i,])
}
Also, you may want to have a look at the apply function (type ?apply or help(apply)) if you want to use the same function on each row/column.
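As a rough illustration of the apply route, assuming an all-numeric toy data frame (apply converts a data frame to a matrix first, so mixed column types would be coerced to character); the column Score is invented for the example.

```r
# Hypothetical all-numeric data
df <- data.frame(ID = c(3, 1), GPA = c(2, 4), Score = c(70, 90))

# One value per column: sapply walks the columns of a data frame
col_means <- sapply(df, mean)

# One value per row: MARGIN = 1 means "by row"
row_sums <- apply(df, 1, sum)

col_means
row_sums
```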
Please check this out if it helps you
df<-read.csv("F:/test.csv",header=FALSE,nrows=1)
df
V1 V2 V3 V4 V5
1 ID GRADES GPA Teacher State
a<-c(df)
a[1]
$V1
[1] ID
Levels: ID
a[2]
$V2
[1] GRADES
Levels: GRADES
a[3]
$V3
[1] GPA
Levels: GPA
a[4]
$V4
[1] Teacher
Levels: Teacher
a[5]
$V5
[1] State
Levels: State
Since you say you want to access by position once your data is read in, you should know about R's subsetting/indexing functions.
The easiest is
df[row,column]
#example
df[1:5,] #rows 1:5, all columns
df[,5] #all rows, column 5.
Other methods are here. I personally use the dplyr package for intuitive data manipulation (not by position).
