I have a dataset and I want to add several columns to it in just one step. How can I do that (for example, with the dplyr package)? For example:
R<-seq(1:5)
B<-c(15,16,17,18,19)
f<-c(20,21,22,23,24)
S<-data.frame(R,B,f)
Now I want to add several columns to the data set such as
g=B/f
h=round(B/f*f,3)
d=B+f/f
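One possible way to do this with dplyr is a single mutate() call. This is only a sketch: the formulas are copied as written in the question, and the result name S2 is made up for illustration. Note that R's operator precedence means B/f*f is (B/f)*f and B+f/f is B+(f/f).

```r
library(dplyr)

# Same data as in the question, built in one call.
S <- data.frame(
  R = 1:5,
  B = c(15, 16, 17, 18, 19),
  f = c(20, 21, 22, 23, 24)
)

# mutate() adds all three columns in one step.
# S2 is a hypothetical name for the result; S itself is left unchanged.
S2 <- S %>%
  mutate(
    g = B / f,
    h = round(B / f * f, 3),  # evaluates as (B / f) * f
    d = B + f / f             # evaluates as B + (f / f), i.e. B + 1
  )
```

If the intended formulas were B/(f*f) or (B+f)/f, the parentheses would have to be added explicitly.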
I want to compare the week() of the year of two parallel date columns from two different years. I'm using the ggparcoord function and looking for a way to convert the dates in the two columns to the week count of the specific date. I do not want to modify the table itself.
my code is:
ggparcoord(data, columns = 38:39)
and I'm looking for something like ggparcoord(data, columns = week(38):week(39)), that actually works.
In addition, if anyone knows how, I would be happy to learn how to use the ggparcoord with column name instead of column number.
Thanks!
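A possible approach, sketched under assumptions: since ggparcoord() takes a data frame, you can pass it a temporary copy in which the two date columns are replaced by week numbers, leaving the original table untouched. The column names date_2019 and date_2020 and the helper week_of_year() are invented for this sketch. The helper counts weeks in 7-day blocks from 1 January, which is how lubridate::week() is documented to count.

```r
# Hypothetical stand-in for the real table: two date columns from two years.
data <- data.frame(
  id        = 1:4,
  date_2019 = as.Date(c("2019-01-03", "2019-02-14", "2019-06-30", "2019-12-25")),
  date_2020 = as.Date(c("2020-01-10", "2020-03-01", "2020-07-04", "2020-12-31"))
)

# Hypothetical helper: week of year in 7-day blocks starting on 1 January.
week_of_year <- function(x) as.POSIXlt(x)$yday %/% 7 + 1

# Temporary copy used only for plotting; `data` itself is not changed.
plot_data <- transform(
  data,
  date_2019 = week_of_year(date_2019),
  date_2020 = week_of_year(date_2020)
)

# The `columns` argument is documented to accept names as well as indices.
# The plot is only built if GGally is installed.
if (requireNamespace("GGally", quietly = TRUE)) {
  p <- GGally::ggparcoord(plot_data, columns = c("date_2019", "date_2020"))
}
```

Passing the names as a character vector also covers the second part of the question, using column names instead of column numbers.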
Is there any way that I can update an existing .csv file by adding a column/vector that I have scraped from the web? I have a web scraper that pulls COVID-19 data, and I am trying to create a file that has positive cases in columns, where each column is the list of cases for a day in each county (x-axis is counties, y-axis is date). I have toyed around with many different ideas at this point and seem to have hit a roadblock. I'm fairly new to R, so any ideas would be appreciated!
Packages I am Currently Using/Planning to Use:
library(tidyverse)
library(funModeling)
library(Hmisc)
library(rvest)
library(ggplot2)
CODE:
#writing the original file
positive <- data.frame(Counties= counties_list, "06/12/2020"= positive_data)
positive[is.na(positive)]= 0
positive = positive[-c(76),]
write.csv(positive, "C:/Users/Nathan May/Desktop/Research Files (ABI)/Covid/Data For Shiny/Positive/Positive Data.csv")
#creating the new vector and updating the existing file with it
datap <- read.csv("C:/Users/Nathan May/Desktop/Research Files (ABI)/Covid/Data For Shiny/Positive/Positive Data.csv")
positive_data = positive_data[-c(76),]
datap$DATE <- positive_data
NOTE: The end goal is to create a Shiny app that displays bar charts for positives, recoveries, and deaths by day in each county. This is the data wrangling portion.
First things first, if you are going to use the tidyverse, use tibble instead of data.frame. Tibbles are the Tidyverse version of data frames.
Next, be aware of the structure of your data frame. The way you create your data.frame now (and later probably your tibble), you get a variable "Counties" and one additional variable for each day. That means you will have to add a column each time a new day arrives. This is the opposite of what you described: moving along the x-axis (across columns) moves through dates, while moving along the y-axis (down rows) moves through counties. That is possible, but I think a bit unconventional. You might want to initialize your data frame with one column for each county and an additional variable called "date". Then, whenever you get new data, you can add a row to your data frame instead of a column (so you're "adding a new case" instead of "adding a new variable").
To actually add the row you will have to load the data as you do in your code, create the new row (or column, if you insist) and then "glue" it to the rest of the data.
Depending on how your data looks, you can create a single-row data frame using tibble_row() with the same counties as variable names as in your main data frame, and then glue them together with add_row(datap, your_new_row). Alternatively, if you want to add the row only by position and not by column name, you can have the new row as a vector and use rbind() instead of add_row().
If you persist with the "one variable per date" approach, there are column equivalents (add_column() and cbind()) for both these functions.
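As a rough sketch of the row-per-date layout: read the file, append one row, write it back. Everything specific here is an assumption made for illustration: the county names Adams and Brown, the counts, and the temporary file standing in for your real path. The new row is passed to add_row() as name-value pairs, which is the documented form.

```r
library(tibble)

# Hypothetical path; a temp file stands in for the real CSV location.
csv_path <- tempfile(fileext = ".csv")

# First day: one row, one column per (made-up) county plus a date column.
datap <- tibble(date = "2020-06-12", Adams = 12, Brown = 7)
write.csv(datap, csv_path, row.names = FALSE)

# Later: load the existing file, append the new day, and save it again.
datap <- as_tibble(read.csv(csv_path, stringsAsFactors = FALSE))
datap <- add_row(datap, date = "2020-06-13", Adams = 15, Brown = 7)
write.csv(datap, csv_path, row.names = FALSE)
```

Writing with row.names = FALSE keeps read.csv() from picking up an extra index column on the next read.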
Hope this helps, Cheers
I am going to do a repeated measures ANOVA on my data, but at this point my data is in wide format. Two independent (categorical) variables are spread across the columns of a single response variable.
See the image: https://imgur.com/1eTWSIM
I want to create two categorical variables that take values from the different parts of the columns (circled on the screenshot). Subject numbers should be kept as a category. So after using gather() function, the data should look something like this:
https://imgur.com/SGM2N69
I've seen in a tutorial (that I can't find anymore) that you can create two columns from a single function, using different parts of the colnames (using "_" as a separator), but I can't exactly remember how it was done.
Any help would be appreciated, and please ask if anything is not clear in my explanation.
I have solved the problem by using 'gather()' function first and then 'separate()' to separate it into two new columns. So I guess, if you want to make two key columns, first you have to make a single column containing both values and later separate it into two.
At least that is how I did it.
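For anyone landing here later, a minimal sketch of that two-step approach. The column names (low_pre, high_post, and so on) and the values are invented, since the real ones are only visible in the screenshots. The one-call version at the end uses pivot_longer() with names_sep, which may be what the tutorial showed, though that is a guess.

```r
library(tidyr)

# Hypothetical wide data: column names encode two factors joined by "_".
wide <- data.frame(
  subject   = 1:3,
  low_pre   = c(5.1, 4.8, 6.0),
  low_post  = c(5.9, 5.2, 6.4),
  high_pre  = c(5.0, 4.9, 6.1),
  high_post = c(7.2, 6.8, 7.9)
)

# Step 1: stack every column except subject into one key column.
long <- gather(wide, key = "condition", value = "score", -subject)

# Step 2: split the key column into two categorical variables at "_".
long <- separate(long, condition, into = c("dose", "time"), sep = "_")

# Same reshaping in a single call.
long_one_step <- pivot_longer(
  wide,
  cols      = -subject,
  names_to  = c("dose", "time"),
  names_sep = "_",
  values_to = "score"
)
```

Both results have one row per subject and condition; only the row order differs.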
I want to update my dataframe by adding a value to several of its columns.
My dataframe has columns named A-Z.
I created a vector specifying which columns I want to update.
upd<-c("a","b","c","d")
And I used simple R syntax to update them:
df<-df[,names(df)%in%upd]+5
But now I only have columns a, b, c and d.
Is there a way to keep all the columns?
Thanks
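One way that seems to fit, sketched on a small made-up data frame: assign back into the selected columns instead of overwriting the whole object. The line in the question replaces df with the subset, which is why the other columns disappear.

```r
# Hypothetical small data frame; column e stands in for the untouched columns.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9, d = 10:12, e = 13:15)
upd <- c("a", "b", "c", "d")

# Assigning into df[upd] changes only those columns and keeps the rest.
df[upd] <- df[upd] + 5
```

With dplyr, mutate(df, across(all_of(upd), ~ .x + 5)) should give the same result.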
I want to check to see if a dataframe has a certain number of columns, and then conditionally modify the dataframe based on the number of columns it has.
If I try
ifelse(ncol(Table1)=="desired number of columns",Table1[,ColumnsSelected],Table1[,ColumnsSelected2])
I get Table 1 back, but only one column, and it looks weird. Is there a way that I can change this to make it so that I can return a dataframe with the desired columns based on the number of columns in the dataframe?
I have roughly 700 tables that have 2 types of formatting that I would prefer not to have to individually scan to reformat.
Please advise
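A likely cause, offered as a sketch and not a definitive diagnosis: ifelse() is vectorised and returns a result shaped like its test, which here has length one, so only a fragment of the table comes back. A plain if/else returns the whole data frame. The function pick_columns(), its arguments, and the example tables below are all invented for illustration.

```r
# Hypothetical helper: choose a column set based on the number of columns.
pick_columns <- function(tbl, n_expected, cols_if_match, cols_otherwise) {
  if (ncol(tbl) == n_expected) {
    tbl[, cols_if_match, drop = FALSE]   # drop = FALSE keeps a data frame
  } else {
    tbl[, cols_otherwise, drop = FALSE]
  }
}

# Two made-up tables standing in for the two formats.
tables <- list(
  three_cols = data.frame(x = 1:2, y = 3:4, z = 5:6),
  two_cols   = data.frame(x = 1:2, y = 3:4)
)

# lapply() applies the same rule to every table in the list.
cleaned <- lapply(
  tables, pick_columns,
  n_expected     = 3,
  cols_if_match  = c("x", "z"),
  cols_otherwise = "x"
)
```

With roughly 700 tables, reading them into a list first and running lapply() once avoids handling each table by hand.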