Dividing columns in R - r

I have the following dataset and I want to divide the "Dose" and "Count" columns by 100. I want my new dataset to include the "Visit" column as well.
However, using the code below, my new dataset does not have the "Visit" column.
What if I want to divide only the "Dose" column by 100?
I uploaded my dataset into R and called it data.
data_new=data[,2:ncol(data)]/100

The reason you're not preserving the untouched columns in your new data frame is because you're only assigning the selected columns on which you've applied the vectorized division operation back to what you expect (at least based on how you're naming it) is a data frame.
data$dose <- data$dose / 100

Related

mutate function in R

I am trying to add a column from one dataframe to another. The data is long repeated measures data, with each ID having two rows. Both my main dataset (d) and my secondary dataset (d2) use the same column (ID) to link cases to participants. However, when I use mutate like this d <- mutate(d, x = d2$x) the column binds to the dataframe but the values are not tied to the ID. This means that data gets mixed up between participants.
Is there a way to make sure that the values are referenced by ID when I add the column?

Is there a R methodology to select the columns from a dataframe that are listed in a separate array

I have a dataframe with over 100 columns. Post implementation of certain conditions, I need a subset of the dataframe with the columns that are listed in a separate array.
The array has 50 entries with 2 columns. The first column has the selected variable names and the second column has some associated values.
I wish to build a new data frame with just the variables mentioned in the the first column of the separate array. Could you please point me as to how to proceed?
Try this:
library(dplyr)
iris <- iris %>% select(contains(dataframe_with_names$names))
In R you can use square brackets [rows, columns] to select specific rows or specific columns. (Leaving either blank selects all).
If you had a vector of column names you wanted to keep called important_columns you could select only those columns with:
myData[,important_columns]
In your case the vector of column names is actually a column in your array. So you select that column and use it as your vector:
myData[, array$names]

How to copy multiple columns to a new dataframe in R

I have a data set (df2) with 400 columns and thousands of rows. The columns all have different names but all have either 'typeP' or 'typeR' at the end of their names. They are not ordered sequentially (eg. P,P,P,P,R,R,R,R) but randomly (P,P,R,R,R,P,R,P etc). I want to create a new data frame with just those columns whose names have 'type P' in their names.
I'm very new to R and so far I have only managed to find the positions of those columns using: grep("typeP",colnames(df2)). Any help would be appreciated!
After we get the index, we can use that to subset the initial dataset
df3 <- df2[grep("typeP",colnames(df2))]

Extracted data frame selection still contains entries from full data frame set

I have a data frame (originally from a CSV file) with the columns NAME and YEAR. I have extracted a sample from this data frame of the first ten entries like so:
sample<-df(1:10,)
I want to know the frequency of the values in the NAME column so I input the following:
as.data.frame(table(sample$NAME))
This counts the frequency in the sample correctly but also includes every name from the original data frame in the 'Var1' column (all with a Freq of 0).
The same thing happens if I use unique(sample$NAME) as well: it lists the names from the sample along with all of the names from the original data frame as well.
What am I doing wrong?
This could be a case of unused level in the 'NAME' factor column. We can use droplevels or call factor again to remove those unused levels.
as.data.frame(table(droplevels(sample$NAME)))
Or
as.data.frame(table(factor(sample$NAME)))

Changing values of multiple column elements for dataframe in R

I'm trying to update a bunch of columns by adding and subtracting SD to each value of the column. The SD is for the given column.
The below is the reproducible code that I came up with, but I feel this is not the most efficient way to do it. Could someone suggest me a better way to do this?
Essentially, there are 20 rows and 9 columns.I just need two separate dataframes one that has values for each column adjusted by adding SD of that column and the other by subtracting SD from each value of the column.
##Example
##data frame containing 9 columns and 20 rows
Hi<-data.frame(replicate(9,sample(0:20,20,rep=TRUE)))
##Standard Deviation calcualted for each row and stored in an object - i don't what this objcet is -vector, list, dataframe ?
Hi_SD<-apply(Hi,2,sd)
#data frame converted to matrix to allow addition of SD to each value
Hi_Matrix<-as.matrix(Hi,rownames.force=FALSE)
#a new object created that will store values(original+1SD) for each variable
Hi_SDValues<-NULL
#variable re-created -contains sum of first column of matrix and first element of list. I have only done this for 2 columns for the purposes of this example. however, all columns would need to be recreated
Hi_SDValues$X1<-Hi_Matrix[,1]+Hi_SD[1]
Hi_SDValues$X2<-Hi_Matrix[,2]+Hi_SD[2]
#convert the object back to a dataframe
Hi_SDValues<-as.data.frame(Hi_SDValues)
##Repeat for one SD less
Hi_SDValues_Less<-NULL
Hi_SDValues_Less$X1<-Hi_Matrix[,1]-Hi_SD[1]
Hi_SDValues_Less$X2<-Hi_Matrix[,2]-Hi_SD[2]
Hi_SDValues_Less<-as.data.frame(Hi_SDValues_Less)
This is a job for sweep (type ?sweep in R for the documentation)
Hi <- data.frame(replicate(9,sample(0:20,20,rep=TRUE)))
Hi_SD <- apply(Hi,2,sd)
Hi_SD_subtracted <- sweep(Hi, 2, Hi_SD)
You don't need to convert the dataframe to a matrix in order to add the SD
Hi<-data.frame(replicate(9,sample(0:20,20,rep=TRUE)))
Hi_SD<-apply(Hi,2,sd) # Hi_SD is a named numeric vector
Hi_SDValues<-Hi # Creating a new dataframe that we will add the SDs to
# Loop through all columns (there are many ways to do this)
for (i in 1:9){
Hi_SDValues[,i]<-Hi_SDValues[,i]+Hi_SD[i]
}
# Do pretty much the same thing for the next dataframe
Hi_SDValues_Less <- Hi
for (i in 1:9){
Hi_SDValues[,i]<-Hi_SDValues[,i]-Hi_SD[i]
}

Resources