This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 7 years ago.
I have a dataframe which looks like this -
>df
words num
his 92
his 91
there 91
you 90
who 90
come 89
you 70
Now I want to aggregate the repeated words and sum their num column, so my final output would look like this:
words num
his 183
there 91
you 160
who 90
come 89
How can I do this in R? I know it is something simple, but I am not able to figure it out.
Convert to a data.table and sum by group:
library(data.table)
setDT(df)
df[, sum(num), by=words]
words V1
1: his 183
2: there 91
3: you 160
4: who 90
5: come 89
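For comparison, the same grouped sum can be sketched in base R with aggregate(), which needs no extra package. The data frame below is rebuilt from the sample in the question. Note that aggregate() sorts the groups alphabetically, so the row order differs from the data.table output.

```r
# Sample data rebuilt from the question
df <- data.frame(
  words = c("his", "his", "there", "you", "who", "come", "you"),
  num   = c(92, 91, 91, 90, 90, 89, 70)
)

# Sum num within each distinct value of words
totals <- aggregate(num ~ words, data = df, FUN = sum)
print(totals)
```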
Related
I am trying to create a limit order book and in one of the functions I want to return a list that sums the column 'size' for the ask dataframe and the bid dataframe in the limit order book.
The output should be...
$ask
oid price size
8 a 105 100
7 o 104 292
6 r 102 194
5 k 99 71
4 q 98 166
3 m 98 88
2 j 97 132
1 n 96 375
$bid
oid price size
1 b 95 100
2 l 95 29
3 p 94 87
4 s 91 102
Total volume: 318 1418
Where the input is...
oid,side,price,size
a,S,105,100
b,B,95,100
I have a function book.total_volumes <- function(book, path) { ... } that should return total volumes.
I tried to use aggregate but struggled with the fact that it is both ask and bid in the limit order book.
I appreciate any help; I am clearly a complete beginner. Only here to learn :)
If there is anything more I can add to make this question clearer, feel free to leave a comment!
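One possible starting point, as a minimal sketch only: it assumes the book is a list holding two data frames named ask and bid, each with a size column, as in the output shown above. The function name total_volumes is hypothetical and the path argument from the question is left out, since its role is not described.

```r
# Assumed structure: a list with $ask and $bid data frames (values from the question)
book <- list(
  ask = data.frame(
    oid   = c("a", "o", "r", "k", "q", "m", "j", "n"),
    price = c(105, 104, 102, 99, 98, 98, 97, 96),
    size  = c(100, 292, 194, 71, 166, 88, 132, 375)
  ),
  bid = data.frame(
    oid   = c("b", "l", "p", "s"),
    price = c(95, 95, 94, 91),
    size  = c(100, 29, 87, 102)
  )
)

# Hypothetical helper: sum the size column of each side separately
total_volumes <- function(book) {
  list(bid = sum(book$bid$size), ask = sum(book$ask$size))
}

vol <- total_volumes(book)
cat("Total volume:", vol$bid, vol$ask, "\n")
```

Because the two sides are already separate data frames, no grouping with aggregate is needed here; a plain sum per side is enough.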
This question already has answers here:
Replacing values from a column using a condition in R
(2 answers)
Closed 4 years ago.
Here is my code
> nutrients <- read.csv("nutrients.csv", header = TRUE, sep = ",")
> plot(nutrients)
> head(nutrients)
crop Nutrient.dens N..tons.acre. P2O5 K2O sum.nut
1 broccoli 340.0 210 245 100 555
2 carrot 458.0 70 250 50 370
3 cauliflower 315.0 25 35 80 140
4 letuce 318.5 165 150 90 405
5 onion 109.0 120 30 150 300
6 tomato 186.0 175 85 275 535
> df_nutrients<- as.data.frame(nutrients)
> df_nutrients<- df_nutrients[1,1=="broc"]
I am sure this is easy, and I've tried searching everywhere I can think of, but I cannot find the answer. I just need to change that one value to "broc". Is there a specific function I need, or something else?
If crop is a character type, then a simple subset assignment should work:
nutrients$crop[nutrients$crop == "broccoli"] <- "broc"
If crop is a factor, then use this:
levels(nutrients$crop)[levels(nutrients$crop) == "broccoli"] <- "broc"
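To see why the factor case needs different handling, here is a small sketch on a made-up three-element factor (the vector crop is illustrative, not the asker's data). Assigning a string that is not an existing level produces NA, while renaming the level works. You can check which case applies with is.factor(nutrients$crop).

```r
crop <- factor(c("broccoli", "carrot", "cauliflower"))

# Direct assignment of a value that is not an existing level gives NA
# (R also emits an "invalid factor level" warning, suppressed here)
attempt <- crop
suppressWarnings(attempt[attempt == "broccoli"] <- "broc")
print(attempt)

# Renaming the level itself changes every matching element
levels(crop)[levels(crop) == "broccoli"] <- "broc"
print(crop)
```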
This question already has answers here:
Summarizing multiple columns with dplyr? [duplicate]
(5 answers)
Closed 5 years ago.
I have a large dataframe in RStudio (15,000 rows, 300 columns) and it's a mess. It looks somewhat like this:
ID Exam1 Exam2 Exam3..... Exam299
1 75 76 99 100
2 25 25 25 25
2 22 20 22 22
2 25 25 20 22
2 20 20 25 23
3 79 88 92 96
For each individual student ID I want to add all the individual columns so each student only has 1 row associated with him/her. It should look like this:
ID Exam1 Exam2 Exam3 Exam299
1 75 76 99 100
2 92 90 92 92
3 79 88 92 96
Everything I've tried sums only one column at a time and/or combines entries without summing them:
aggregate(ID~Exam1, data=df, c)
You can use this:
df.sum <- aggregate(. ~ ID, data=df, FUN=sum)
You can also use the data.table package:
library(data.table)
dt <- data.table(df)
dt.sum <- dt[, lapply(.SD, sum), by = ID]
I think you can use the dplyr package for this too, but I don't have the solution off the top of my head.
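As a quick check, the aggregate() call can be run on the sample rows from the question. Only the four exam columns shown there are reproduced; the remaining columns are assumed to behave the same way. One caveat: with the formula interface, aggregate() drops rows containing NA by default, so incomplete rows would be silently excluded.

```r
# Sample rows from the question (four of the exam columns)
df <- data.frame(
  ID      = c(1, 2, 2, 2, 2, 3),
  Exam1   = c(75, 25, 22, 25, 20, 79),
  Exam2   = c(76, 25, 20, 25, 20, 88),
  Exam3   = c(99, 25, 22, 20, 25, 92),
  Exam299 = c(100, 25, 22, 22, 23, 96)
)

# "." means every column other than ID; each is summed within ID
df.sum <- aggregate(. ~ ID, data = df, FUN = sum)
print(df.sum)
```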
This question already has answers here:
How to join (merge) data frames (inner, outer, left, right)
(13 answers)
Closed 6 years ago.
Here is my data frame "crime":
District Premise Weapon
313 99 99
316 NA
314 20 99
312 13 40
312 9 99
I have a separate list of what all the codes mean. For example, 99 in premise means "residence", 20 means "Street". 99 in Weapon means "hands", 40 means "blunt object".
From another post on Stack Overflow, I was able to use the following code for my purpose:
crime$Premise[crime$Premise == 13] <- "House"
This worked, but I realized I have 30 different codes in Premise and Weapon. There has to be a more efficient way than copying and pasting the line above many times and replacing the integer and the string each time.
*Note: 99 means one thing under Premise and something else under Weapon.
What is the best way to write this, so I can replace all the numbers with corresponding codes? Thank you in advance!
If you don't want to build an index vector, you can use recode from the dplyr package:
library(dplyr)
x<-data.frame(district=c(313,316,314),premise=c(99,NA,20),Weapon=c(99,"",99))
district premise Weapon
1 313 99 99
2 316 NA
3 314 20 99
x$premise<-recode(x$premise,"99"="residence","20"="Street")
district premise Weapon
1 313 residence 99
2 316 <NA>
3 314 Street 99
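The index-vector alternative mentioned above can be sketched in base R with one named lookup vector per column, which scales to 30 codes without repeating the assignment line. The labels below are only the ones given in the question; the blank cell in row 316 is assumed to be a missing Weapon value. Codes without an entry in the lookup (such as 9 here) come out as NA.

```r
# Data frame rebuilt from the question (blank Weapon cell assumed to be NA)
crime <- data.frame(
  District = c(313, 316, 314, 312, 312),
  Premise  = c(99, NA, 20, 13, 9),
  Weapon   = c(99, NA, 99, 40, 99)
)

# One lookup per column, because 99 means different things in each
premise_labels <- c("99" = "residence", "20" = "Street", "13" = "House")
weapon_labels  <- c("99" = "hands", "40" = "blunt object")

# Index the lookup by the code converted to character; unknown codes give NA
crime$Premise <- unname(premise_labels[as.character(crime$Premise)])
crime$Weapon  <- unname(weapon_labels[as.character(crime$Weapon)])
print(crime)
```

If the code list already lives in a separate table, merging on the code column is another option and avoids typing the lookup by hand.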
This question already has an answer here:
Delete "" from csv values and change column names when writing to a CSV
(1 answer)
Closed 5 years ago.
I have the following data in a file called "data.txt":
pid 1 2 4 15 18 20
1_at 100 200 89 189 299 788
2_at 8 78 33 89 90 99
3_xt 300 45 53 234 89 34
4_dx 49 34 88 8 9 15
The data is separated by tabs.
Now I want to extract some columns from that table, based on the contents of a CSV file called "vector.csv", which contains the following:
18,1,4,20
So I want to end up with a modified, tab-separated file "datamod.txt" that looks like this:
pid 18 1 4 20
1_at 299 100 89 788
2_at 90 8 33 99
3_xt 89 300 53 34
4_dx 9 49 88 15
With some help, I have written the following code:
fileName="vector.csv"
con=file(fileName,open="r")
controlfile<-readLines(con)
controls<-controlfile[1]
controlins<-controlfile[2]
test<-paste("pid",controlins,sep=",")
test2<-c(strsplit(test,","))
test3<-c(do.call("rbind",test2))
df<-read.table("data.txt",header=T,check.names=F)
CC <- sapply(df, class)
CC[!names(CC) %in% test3] <- "NULL"
df <- read.table("data.txt", header=T, colClasses=CC,check.names=F)
df<-df[,test3]
write.table(df,"datamod.txt",row.names=FALSE,sep="\t")
The problem is that my resulting file has the following format:
"pid" "18" "1" "4" "20"
"1_at" 299 100 89 788
"2_at" 90 8 33 99
"3_xt" 89 300 53 34
"4_dx" 9 49 88 15
My question is how to avoid the quotation marks ("") that appear in my saved file, so that the data looks the way I want.
Any help?
Thanks
To quote from the help file for write.table:
quote
a logical value (TRUE or FALSE) or a numeric vector. If TRUE,
any character or factor columns will be surrounded by double quotes.
If a numeric vector, its elements are taken as the indices of columns
to quote. In both cases, row and column names are quoted if they are
written. If FALSE, nothing is quoted.
Therefore
write.table(df,"datamod.txt",row.names=FALSE,sep="\t", quote = FALSE)
should work nicely.
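A small self-contained check of this, using two rows and two value columns from the question's table and writing to a temporary file rather than "datamod.txt" (the temporary path is just for illustration):

```r
# A slice of the table from the question; check.names = FALSE keeps "18" and "1" as column names
df <- data.frame(
  pid  = c("1_at", "2_at"),
  `18` = c(299, 90),
  `1`  = c(100, 8),
  check.names = FALSE
)

out <- tempfile(fileext = ".txt")
write.table(df, out, row.names = FALSE, sep = "\t", quote = FALSE)

# Neither the header nor the pid column is quoted
written <- readLines(out)
print(written)
```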