Change name of the output column - r

As a part of the Recommender systems course at Coursera, I am doing assignments in R (https://github.com/eponkratova/projects-recommender-system/blob/master/recommender_knit.Rmd) and so far I got a N result.
Is there a more elegant way to rename the column (variable renamed_mean_1), ideally in the same step where I calculate the column averages (variable dataset_mean_1)?
install.packages('gsheet', repos="http://cran.rstudio.com/")
library('gsheet')
url <- 'https://docs.google.com/spreadsheets/d/1XDBRCYFTxsw27AivxJ5pWxDHN0WA6GqSP46PVe2BCQ4/edit?usp=sharing'
dataset <- gsheet2tbl(url)
dataset_mean_1 <- data.frame(colMeans(dataset, na.rm = TRUE))
install.packages('plyr', repos="https://cran.r-project.org")
library('plyr')
renamed_mean_1 <- rename(dataset_mean_1,c('colMeans.dataset..na.rm...TRUE.'='Mean'))
ordered_mean_1 <- head(renamed_mean_1[order(-renamed_mean_1$Mean),,drop=FALSE],n=4)
I don't have much experience with R, and for this reason, my code is a bit bulky.
Could you please help me?

Try this:
dataset_mean_1 <- data.frame(colMeans(dataset, na.rm = TRUE))
colnames(dataset_mean_1) <- "renamed_mean_1"
Or do it in a single call by naming the column when you build the data frame:
dataset_mean_1 <- data.frame(renamed_mean_1 = colMeans(dataset, na.rm = TRUE))
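For reference, here is a minimal sketch of the whole pipeline from the question with the column named at creation time, so the plyr rename step is not needed. The small `dataset` below is made-up stand-in data (the real one comes from the Google Sheet), and the column name `Mean` is taken from the question's own rename call.

```r
# Hypothetical stand-in for the ratings table loaded with gsheet2tbl()
dataset <- data.frame(
  movie_a = c(4, 5, NA),
  movie_b = c(2, 3, 4),
  movie_c = c(5, 5, 4),
  movie_d = c(1, NA, 2),
  movie_e = c(2, 2, 2)
)

# Name the column while building the data frame; no rename() afterwards
dataset_mean_1 <- data.frame(Mean = colMeans(dataset, na.rm = TRUE))

# Sort by descending mean and keep the top 4 rows (drop = FALSE keeps a data frame)
ordered_mean_1 <- head(dataset_mean_1[order(-dataset_mean_1$Mean), , drop = FALSE], n = 4)
print(ordered_mean_1)
```

Because the name is supplied as the argument name inside data.frame(), the auto-generated name `colMeans.dataset..na.rm...TRUE.` never appears.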

Related

Smarter way to include Row Names into multiple formulas

I have multiple data sets to call on and present in a few places. The code works fine, but it's a laborious task to copy/paste each variable and mutation to get the result. There must be an easier way to combine the row names with the variable in some sort of loop!
At the moment I am simply copy/pasting the same statement and changing the name to the next row.
row1Mapped <- sum(cnx$row1Connect =="Mapped", na.rm = TRUE)
row2Mapped <- sum(cnx$row2Connect =="Mapped", na.rm = TRUE)
row3Mapped <- sum(cnx$row3Connect =="Mapped", na.rm = TRUE)
row4Mapped <- sum(cnx$row4Connect =="Mapped", na.rm = TRUE)
main <- main %>%
mutate(Mapped = ifelse(Bank == "row1", row1Mapped,
ifelse(Bank == "row2", row2Mapped,
ifelse(Bank == "row3", row3Mapped, NA))))
Everything works; however, I would like it to be more efficient!
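One possible way to avoid the copy/paste, sketched in base R: build the column names from the bank names, count the "Mapped" entries in one sapply() call, and look the counts up by `Bank`. The `cnx` and `main` data below are invented for illustration, and the `rowNConnect` naming pattern is assumed from the question. Unlike the nested ifelse() in the question, this sketch also covers row4.

```r
# Hypothetical stand-in data following the question's naming pattern
cnx <- data.frame(
  row1Connect = c("Mapped", "Mapped", NA),
  row2Connect = c("Mapped", "Unmapped", "Unmapped"),
  row3Connect = c("Mapped", "Mapped", "Mapped"),
  row4Connect = c(NA, "Unmapped", "Mapped"),
  stringsAsFactors = FALSE
)
main <- data.frame(Bank = c("row1", "row2", "row3", "row5"),
                   stringsAsFactors = FALSE)

banks <- paste0("row", 1:4)

# One named count per bank instead of one hand-written variable per bank
mapped_counts <- sapply(banks, function(b) {
  sum(cnx[[paste0(b, "Connect")]] == "Mapped", na.rm = TRUE)
})

# Look up each Bank by name; banks without a count (here "row5") become NA
main$Mapped <- unname(mapped_counts[as.character(main$Bank)])
print(main)
```

The named vector acts as a lookup table, so adding a bank only means extending `banks`.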

Print row from table in R

I'm new to R, and I'm trying to display several columns from a row in the console when a condition is fulfilled. I've searched the internet and couldn't find a proper solution. So far I've tried R's equivalent of a where clause, with little success.
Here's my script.
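# Coordinates
northing <- 398380.16
easting <- 6873865.89
# A space inside an R string needs no backslash; "\ " is an unrecognized escape
filePath <- '/media/jgm/Toshiba HDD/SatelliteData/data/'
file <- 'MOD09GQ_2006075.csv'
mydata <- read.table(paste(filePath, file, sep = ""), header = TRUE, sep = ",")
# Squared offsets from the target point, then the Euclidean distance
mydata$'(x-northing)²' <- (mydata$x - northing)^2
mydata$'(y-easting)²' <- (mydata$y - easting)^2
mydata$'DISTANCE' <- sqrt(mydata$`(x-northing)²` + mydata$`(y-easting)²`)
# Refer to the column by name rather than by position 10
minDistance <- min(mydata$DISTANCE, na.rm = TRUE)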
I want to display in console the value of the columns sur_refl_b01, sur_refl_b02, NDVI and NDVI_SCALED when the value of the column DISTANCE is minDistance.
Hope this table output helps.
Welcome to SO. Try something like:
print(mydata[which(mydata$'DISTANCE'==minDistance),4:7])
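A variation on the same idea, as a sketch: which.min() returns the index of the smallest distance directly, and selecting the columns by name avoids relying on them sitting in positions 4 to 7. The data frame below is invented; only the column names and the two coordinates come from the question.

```r
# Hypothetical stand-in for the CSV contents
mydata <- data.frame(
  x = c(398000, 398380, 399000),
  y = c(6873000, 6873866, 6874500),
  sur_refl_b01 = c(0.10, 0.12, 0.15),
  sur_refl_b02 = c(0.30, 0.35, 0.40),
  NDVI = c(0.50, 0.49, 0.45),
  NDVI_SCALED = c(150, 149, 145)
)
northing <- 398380.16
easting <- 6873865.89

# Euclidean distance from every row to the target point
mydata$DISTANCE <- sqrt((mydata$x - northing)^2 + (mydata$y - easting)^2)

# which.min() skips NA values and returns the first row with the minimum
wanted <- c("sur_refl_b01", "sur_refl_b02", "NDVI", "NDVI_SCALED")
nearest <- mydata[which.min(mydata$DISTANCE), wanted]
print(nearest)
```

Note that which.min() returns only the first match, whereas the `==` comparison above returns every row tied for the minimum.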

using lapply to expand lists

I am trying to expand out some input data that comes in character format. In the below example --
jv17 <- list(4.2017,5.2017,6.2017,7.2017,8.2017,9.2017,10.2017)
jv18 <- list(4.2018,5.2018,6.2018,7.2018,8.2018,9.2018,10.2018)
mylist1 <- list("jv17")
mylist2 <- list("jv17", "jv18")
This does what I want:
eval(parse(text = mylist1))
This does not:
eval(parse(text = mylist2))
The output I am looking for:
list(jv17,jv18)
I feel very confident this could be done with lapply but I'm not as good as I should be with it. Here is my best guess but I get an error that is not intuitive (at least to me).
lapply(x = mylist2, FUN = eval(parse(text = .)))
Any help is greatly appreciated, thank you.
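A minimal sketch of one way to get that output without eval(parse()): get() looks an object up by its name, so lapply() can apply it to each name in the list. The `env` variable is a helper introduced here so that the lookup environment is explicit. As far as I can tell, the attempt above fails because lapply()'s first argument is `X` (upper case) and `.` is not defined there, and eval() on a two-element expression returns only the last value.

```r
jv17 <- list(4.2017, 5.2017, 6.2017, 7.2017, 8.2017, 9.2017, 10.2017)
jv18 <- list(4.2018, 5.2018, 6.2018, 7.2018, 8.2018, 9.2018, 10.2018)
mylist2 <- list("jv17", "jv18")

# Environment in which jv17 and jv18 were defined
env <- environment()

# Replace each name with the object it refers to
expanded <- lapply(mylist2, get, envir = env)

# Same structure as list(jv17, jv18)
str(expanded, max.level = 1)
```

If a named result is preferred, mget(unlist(mylist2), envir = env) returns the same objects with the variable names attached.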

How to correct R syntax for summing two fields?

In a dbf I create a new field xyz, then try to sum the existing item1 and item2 fields, store the sum in xyz, and write the result to a new dbf, but it does not work. Everything works without the for loop. I hope someone can help. Thank you.
library(foreign)
setwd("C:/temp")
dbfdata <- read.dbf("sldu_500ka.dbf", as.is = TRUE)
dbfdata$xyz <- 1:nrow(dbfdata)
for(i in 1:nrow(dbfdata)) {
row <- dbfdata[i,]
dbfdata$xyz <- dbfdata$item1 + dbfdata$item2
}
write.dbf(dbfdata, "sldu_500k1.dbf")
I'm not sure whether I understand you correctly, but
library(foreign)
setwd("C:/temp")
dbfdata <- read.dbf("sldu_500ka.dbf", as.is = TRUE)
dbfdata$xyz <- dbfdata$item1 + dbfdata$item2
write.dbf(dbfdata, "sldu_500k1.dbf")
should do the job. Instead of looping over all rows, you can add the two columns in a single vectorized operation.
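To illustrate the vectorized sum on something runnable without the dbf file, here is a small sketch with an invented data frame. The missing value and the rowSums() variant are assumptions added for illustration; the question does not say whether item1 or item2 contain NA.

```r
# Hypothetical stand-in for the table returned by read.dbf()
dbfdata <- data.frame(item1 = c(1, 2, NA), item2 = c(10, 20, 30))

# Element-wise sum of the two columns; an NA in either column gives NA
dbfdata$xyz <- dbfdata$item1 + dbfdata$item2

# Alternative if missing values should be treated as zero
dbfdata$xyz_na_rm <- rowSums(dbfdata[, c("item1", "item2")], na.rm = TRUE)
print(dbfdata)
```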

R, Getting the top in every category from a data frame?

I have the following data frame
id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33
Note, they are already sorted by value within each (id,category). What I would like to be able to do is to get the top from each (id,category) and make a string, followed by the second in each (id,category) and so on. So for the above example it would look like
A,D,G,B,E,C,F
Is there a way to do it easily in R? Or am I better off relying on a Perl script to do it?
Thanks much in advance
This appears to work, but I'm certain we could simplify it somewhat, particularly if you are able to relax your ordering requirements:
library(plyr)
d <- read.table(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33",sep = ',',header = TRUE)
d <- ddply(d,.(category),transform,r = seq_along(category))
d <- arrange(d,id)
> paste(d$id[order(d$r)],collapse = ",")
[1] "A,D,G,B,E,C,F"
This version is probably more robust to ordering, and avoids plyr:
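# sequence() numbers the rows within each run of equal categories; unlike
# unlist(sapply(..., seq_len)), it also works when all runs have the same length
d$r <- sequence(rle(d$category)$lengths)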
d$s <- 1:nrow(d)
with(d,paste(id[order(r,s)],collapse = ","))
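Another base R sketch, in case the rows of a category are not guaranteed to be contiguous: ave() computes the within-category rank per group, so it does not depend on runs the way rle() does. It still assumes the rows are already sorted by value within each category, as stated in the question.

```r
d <- read.table(text = "id,category,value
A,21,0.89
B,21,0.73
C,21,0.61
D,12,0.95
E,12,0.58
F,12,0.44
G,23,0.33", sep = ",", header = TRUE, stringsAsFactors = FALSE)

# Position of each row within its category (1 = top of that category)
d$r <- ave(seq_along(d$category), d$category, FUN = seq_along)

# Sort by within-category position, breaking ties by original row order
result <- paste(d$id[order(d$r, seq_len(nrow(d)))], collapse = ",")
print(result)
```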
