Load file to create a dataframe in R

I have the file data/spatial/tissue_pos.csv and want to load it and create a dataframe object called spatial.data with the two columns x and y holding the x and y coordinates of the data. I am new to R and do not know how this is done. I am given some hints:
The second-to-last and last columns in tissue_pos.csv represent the x and y coordinates respectively. You can ignore all other columns except the barcodes of course.
There are more entries in the file than there are spots in the expression data, the entries are also differently sorted, make sure to adjust for this and that the rownames and colnames match once you've loaded the data.
To find common elements in two sets of vectors, there's a nifty command called intersect.
But I am still unsure of how to do this. All help is truly appreciated!
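A minimal sketch of how the hints could be put together, not a definitive answer. It assumes the file has no header row and that the barcodes sit in the first column; the inline CSV text and the expr.barcodes vector are made-up stand-ins for the real file and for the spot names of the expression data.

```r
# Stand-in for: tissue <- read.csv("data/spatial/tissue_pos.csv", header = FALSE)
# (assumption: no header row, barcodes in the first column)
csv_text <- "AAAC,1,0,0,10,20
AAAG,1,0,1,30,40
AAAT,0,1,0,50,60
AACC,1,1,1,70,80"
tissue <- read.csv(text = csv_text, header = FALSE, stringsAsFactors = FALSE)

# Keep only the second-to-last and last columns, with barcodes as row names
n <- ncol(tissue)
spatial.data <- data.frame(x = tissue[[n - 1]],
                           y = tissue[[n]],
                           row.names = tissue[[1]])

# Hypothetical spot names from the expression data (fewer, and sorted differently)
expr.barcodes <- c("AACC", "AAAC", "AAAT")

# intersect() keeps the order of its first argument, so indexing by the
# result both drops the extra entries and reorders the rows to match
common <- intersect(expr.barcodes, rownames(spatial.data))
spatial.data <- spatial.data[common, , drop = FALSE]
```

If the expression data contains spots that are missing from the file, it would need to be subset with the same common vector so that both objects line up.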

Related

Updating a File in R by adding a column/vector

Is there any way that I can update an existing .csv file by adding a column/vector that I have scraped from the web? I have a webscraper that pulls COVID-19 data and I am trying to create a file that has positive cases in columns, where each column is the list of cases for a day in each county (x-axis is counties, y-axis is date). I have toyed around with many different ideas at this point and seem to have hit a roadblock. I'm fairly new to R so any ideas would be appreciated!
Packages I am Currently Using/Planning to Use:
library(tidyverse)
library(funModeling)
library(Hmisc)
library(rvest)
library(ggplot2)
CODE:
#writing the original file
positive <- data.frame(Counties= counties_list, "06/12/2020"= positive_data)
positive[is.na(positive)]= 0
positive = positive[-c(76),]
write.csv(positive, "C:/Users/Nathan May/Desktop/Research Files (ABI)/Covid/Data For Shiny/Positive/Positive Data.csv")
#creating the new vector and updating the existing file with it
datap <- read.csv("C:/Users/Nathan May/Desktop/Research Files (ABI)/Covid/Data For Shiny/Positive/Positive Data.csv")
positive_data = positive_data[-c(76),]
datap$DATE <- positive_data
NOTE: The end goal is to create a ShinyApp that displays bar charts for positives, recoveries, and deaths by day in each county. This is the data wrangling portion.
First things first, if you are going to use the tidyverse, use tibble instead of data.frame. Tibbles are the Tidyverse version of data frames.
Next, be aware of the structure of your data frame. The way you create your data.frame now (and later probably your tibble), you get a variable "Counties" and one additional variable for each day. That means you will have to add columns as time passes. This is the opposite of what you described: moving along the x-axis (along columns) moves along dates, while moving along the y-axis (along rows) moves along counties. It's possible, but I think a bit unconventional. You might want to initialize your data frame with one column for each county and an additional variable called "date". Then whenever you get new data you can add a row to your data frame instead of a column (so you're "adding a new case" instead of "adding a new variable").
To actually add the row you will have to load the data as you do in your code, create the new row (or column, if you insist) and then "glue" it to the rest of the data.
Depending on what your data looks like, you can create a single-row data frame using tibble_row() with the same counties as variable names as in your main data frame, and then glue them together with add_row(datap, your_new_row). Alternatively, if you want to add the row only by position and not by column names, you can have the new row as a vector and use rbind() instead of add_row().
If you persist with the "one variable per date" approach, there are column equivalents (add_column and cbind) for both of these functions.
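To illustrate the "one row per date" layout, here is a minimal sketch in base R. It uses rbind() with a one-row data frame, which is a base R stand-in for add_row() rather than the tibble function itself. The county names and counts are made up for illustration.

```r
# Hypothetical table: one row per date, one column per county
datap <- data.frame(date = as.Date("2020-06-12"), Adams = 3, Brown = 5)

# New day's data as a one-row data frame (column order deliberately different)
new_row <- data.frame(date = as.Date("2020-06-13"), Brown = 6, Adams = 4)

# rbind() on data frames matches columns by name, not by position,
# so the different column order in new_row does not matter
datap <- rbind(datap, new_row)
```

After this step the table can be written back with write.csv(); passing row.names = FALSE avoids an extra index column showing up when the file is read again.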
Hope this helps, Cheers

Adding a column from a dataframe to a SpatialPolygon Dataframe

I've been trying to add a column of numerical data from a dataframe to a SpatialPolygon dataframe but every attempt leads to the latter dataframe being converted to a standard dataframe similar to the former. I needed to add the column so that I can create a choropleth map with the column's variable as the focus. Obviously the standard dataframe is no good since I'm trying to create a map using tmap.
This is how I've been trying to add the column (where shapefilecomb is the spatial dataframe and wardturnout is the variable containing the column in question):
shapefilecomb <- c(wardturnout)
Adding a column into the data slot of a SpatialPolygonsDataFrame with the assignment operator, shapefilecomb$wardturnout <- wardturnout, works, but it is not the safest way to do the job. It relies only on position (the first data item goes to the first polygon, the second to the second, and so on). It can get messy.
It is best reserved for calculated fields - the shapefile$valuepercapita <- shapefile$value / shapefile$population kind of assignment.
For data from external sources it is a much better idea to assign values by key. The function append_data from the tmap package does this very nicely, and gives you a message not only when an error occurs, but also a confirmation when all data was matched perfectly (which I found a nice touch when working with large sets of imperfect data).
outShape <- append_data(srcShape, frmData, key.shp = "KOD_LAU1", key.data = "LAU1")
Edit (as of 9/2019): This answer seems to be still going strong... The world has changed though.
tmap::append_data() has been moved to tmaptools::append_data() and is by now deprecated
sf has replaced sp as the go-to package in spatial data in R
In the sf world spatial data are stored in modified data.frames, and the most appropriate way to assign data items by key is one of the *_join() functions from dplyr - either dplyr::left_join() to be on the safe side, or dplyr::inner_join() if filtering on both sides is actually the desired behavior.
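For readers who want to see what assigning by key means without installing anything, here is a small base R sketch using match(). This is a swapped-in technique, not the append_data() or dplyr::left_join() call from the answer. The plain data frame shape stands in for the attribute table of a spatial object, and all values are made up; only the key column names are borrowed from the append_data() example.

```r
# Stand-in for the attribute table of the spatial object (one row per polygon)
shape <- data.frame(KOD_LAU1 = c("A", "B", "C"), stringsAsFactors = FALSE)

# External data: different order, and no entry for polygon "B"
turnout <- data.frame(LAU1 = c("C", "A"),
                      turnout = c(0.61, 0.48),
                      stringsAsFactors = FALSE)

# For each polygon key, find the row of the external data with the same key
idx <- match(shape$KOD_LAU1, turnout$LAU1)

# Assign by key: unmatched polygons get NA instead of a wrong value
shape$turnout <- turnout$turnout[idx]
```

Because this only adds a column with the $ operator, the class of the object is left alone, and the unmatched keys are easy to inspect with shape$KOD_LAU1[is.na(idx)].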

How to calculate combination of Data frame in R

I am a beginner in R.
I imported a csv file. This file only contains one column with 50 characters, but R classifies it as a dataframe. I need all possible combinations within elements of this column. I think I need to work with a vector not with a data frame, how can I do it?
Thank you!
Actually your data frame already contains the vector you need. You can call it with
dataframe$column_name
The text before the $ operator specifies your data frame, and the text after it is your vector, which is a column in your data frame. So when you run your calculations you can just write
some_function(dataframe$column_name)
In your specific case with a single column, it may be simplest to turn the data frame into a plain vector. But when you start manipulating your data, you'll likely store more vectors of variables. You'll want to keep those vectors organized within data frames.
Do you mean unlist?
You can use it to change a data frame into a vector, then you can use combn to get the combinations.
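A short sketch of both suggestions, assuming a one-column data frame. The names df and item are made up, and combinations of size 2 are used only as an example, since combn() needs the size as its second argument.

```r
# Hypothetical one-column data frame, as it might come out of read.csv()
df <- data.frame(item = c("a", "b", "c", "d"), stringsAsFactors = FALSE)

# Two ways to get the column as a plain vector
items <- df$item
items_unlisted <- unlist(df, use.names = FALSE)

# All combinations of size 2; each column of the result is one combination
pairs <- combn(items, 2)
```

With 50 elements the number of combinations grows quickly with the chosen size, so it is worth checking choose(50, m) before calling combn().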

What's the easiest way to ignore one row of data when creating a histogram in R?

I have this csv with 4000+ entries and I am trying to create a histogram of one of the variables. Because of the way the data was collected, there was a possibility that if data was uncollectable for that entry, it was coded as a period (.). I still want to create a histogram and just ignore that specific entry.
What would be the best or easiest way to go about this?
I tried making it so that the histogram would only use the data for every entry except the one with the period by doing
newlist <- data1$var[1:3722]+data1$var[3724:4282]
where 3723 is the entry with the period, but R said that + is not meaningful for factors. I'm not sure if I went about this the right way, my intention was to create a vector or list or table conjoining those two subsets above into one bigger list called newlist.
Your problem is deeper than you realize. When R read in the data and saw the lone . it interpreted that column as a factor (categorical variable).
You need to either convert the factor back to a numeric variable (this is FAQ 7.10) or reread the data, forcing R to read that column as numeric. If you are using read.table or one of the functions that calls read.table, you can set the colClasses argument to specify a numeric column.
Once the column of data is a numeric variable then a negative subscript or !is.na will work (or some functions will automatically ignore the missing value).
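Both routes can be sketched as follows. The inline CSV text and the column names are made up. The rereading route here uses the na.strings argument of read.csv() to turn the period into NA, which is a different argument from the colClasses one mentioned above. Note also that recent R versions read such a column as character rather than factor by default, so the factor is built explicitly here.

```r
# Route 1: convert an existing factor back to numbers (the R FAQ 7.10 idiom).
# The "." cannot be parsed, so it becomes NA (with a warning, suppressed here).
f <- factor(c("2.5", ".", "4.0", "3.5"))
x <- suppressWarnings(as.numeric(as.character(f)))

# Route 2: reread the data and declare "." as the missing-value code
csv_text <- "id,var\n1,2.5\n2,.\n3,4.0\n4,3.5"
data1 <- read.csv(text = csv_text, na.strings = ".")

# Drop the missing entry with !is.na() and build the histogram.
# plot = FALSE only computes the bins; remove it to draw the plot.
clean <- data1$var[!is.na(data1$var)]
h <- hist(clean, plot = FALSE)
```

Joining two subsets, as attempted in the question, would be done with c(a, b) rather than a + b, but filtering on !is.na() avoids having to know the position of the bad entry at all.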

Changing hundreds of column names simultaneously in R

I have a data frame with hundreds of columns whose names I want to change. I'm very new to R, so it's rather easy to think through the logic of this, but I simply can't find a relevant example online.
The closest I could sort of get was this:
projectFileAllCombinedNames <- for (i in 1:200){names(projectFileAllCombined)[i+1] <-variableNames[i]}
Basically, starting at the second column of projectFileAllCombined, I want to loop through the columns in the dataframe and assign them the data values in the second data frame. I was able to change one column name manually with this code:
colnames(projectFileAllCombined)[2]<-"newColumnName"
but I can't possibly do that for hundreds of columns. I've spent multiple hours on this and can't crack it with any number of Google searches on "change multiple columns in r" or "change column names in r". The best I can find online is examples where people change a few columns with a c() function and I get how that works, but that still seems to require typing out all the column names as parameters to the function, unless there is a way to just pass the "variableNames" file into that c() function, but I don't know of one.
Will
colnames(projectFileAllCombined)[-1] <- variableNames
not suffice?
This assumes the ordering of columns in projectFileAllCombined is the same as the ordering of the new variable names in variableNames, and that
length(variableNames) == (ncol(projectFileAllCombined) - 1)
The key point here is that the replacement function 'colnames<-'() is vectorised and can replace any number of column names in a single call if passed a vector of replacement values.
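A small self-contained sketch of that one-line replacement. The data frame contents and the new names are made up; only the object names are taken from the question.

```r
# Hypothetical data frame: the first column keeps its name, the rest are renamed
projectFileAllCombined <- data.frame(id = 1:2,
                                     V1 = c(0.1, 0.2),
                                     V2 = c(5, 6),
                                     V3 = c("a", "b"))

# Hypothetical vector of new names, in the same order as columns 2..n
variableNames <- c("height", "weight", "group")

# Guard against a length mismatch before replacing anything
stopifnot(length(variableNames) == ncol(projectFileAllCombined) - 1)

# Replace every column name except the first in a single vectorised call
colnames(projectFileAllCombined)[-1] <- variableNames
```

If variableNames is itself a data frame read from a file, the column holding the names would have to be pulled out first, for example with variableNames[[1]].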
