Merge records in data frame in R [duplicate] - r

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Convert data from long format to wide format with multiple measure columns
(6 answers)
Closed 6 years ago.
I have the following example data set:
data.frame(SEX=c("M","F","M","F"),COMPLAINT=c("headache","headache", "dizziness", "dizziness"),
reports=c(5,4,9,12), users = c(1250,3460,2500,1850))
SEX COMPLAINT reports users
1 M headache 5 1250
2 F headache 4 3460
3 M dizziness 9 2500
4 F dizziness 12 1850
My question is how to merge rows 1 and 2 , and 3 and 4 so that my data frame is as follows:
COMPLAINT reports_male reports_female users_male users_female
1 headache 5 4 1250 3460
2 dizziness 9 12 2500 1850
Anyone got a quick solution that I can use for a (much) larger dataset?

We can use the dcast from data.table which can take multiple value.var columns and is quite efficient on big datasets
library(data.table)
dcast(setDT(df1), COMPLAINT ~ SEX, value.var = c("reports", "users"))
# COMPLAINT reports_F reports_M users_F users_M
#1: dizziness 12 9 1850 2500
#2: headache 4 5 3460 1250

As seen in How to reshape data from long to wide format?, we can use library(reshape2) and then
reshape(df, idvar = "COMPLAINT", timevar = "SEX", direction = "wide").
COMPLAINT reports.M users.M reports.F users.F
1 headache 5 1250 4 3460
3 dizziness 9 2500 12 1850

Related

Performing the colsum based on row values [duplicate]

This question already has answers here:
Calculate the mean by group
(9 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 5 years ago.
Hi I have 3 data set with contains the items and counts. I need to add the all data sets and combine the count based on the item names. He is my input.
Df1 <- data.frame(items =c("Cookies", "Candys","Toys","Games"), Counts = c( 10,20,30,5))
Df2 <- data.frame(items =c( "Candys","Cookies","Toys"), Counts = c( 5,21,20))
Df3 <- data.frame(items =c( "Playdows","Gummies","Candys"), Counts = c(10,15,20))
Df_all <- rbind(Df1,Df2,Df3)
Df_all
items Counts
1 Cookies 10
2 Candys 20
3 Toys 30
4 Games 5
5 Candys 5
6 Cookies 21
7 Toys 20
8 Playdows 10
9 Gummies 15
10 Candys 20
I need to combine the columns based on the item values. Delete the Row after adding the values. My output should be
items Counts
1 Cookies 31
2 Candys 45
3 Toys 50
4 Games 5
5 Playdows 10
6 Gummies 15
Could you help in getting this output in r.
use dplyr:
library(dplyr)
result<-Df_all%>%group_by(items)%>%summarize(sum(Counts))
> result
# A tibble: 6 x 2
items `sum(Counts)`
<fct> <dbl>
1 Candys 45.0
2 Cookies 31.0
3 Games 5.00
4 Toys 50.0
5 Gummies 15.0
6 Playdows 10.0
You can use tapply
tapply(Df_all$Counts, Df_all$items, FUN=sum)
what returns
Candys Cookies Games Toys Gummies Playdows
45 31 5 50 15 10

Transpose column and group dataframe [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
Closed 5 years ago.
I'm trying to change a dataframe in R to group multiple rows by a measurement. The table has a location (km), a size (mm) a count of things in that size bin, a site and year. I want to take the sizes, make a column from each one (2, 4 and 6 in this example), and place the corresponding count into each the row for that location, site and year.
It seems like a combination of transposing and grouping, but I can't figure out a way to accomplish this in R. I've looked at t(), dcast() and aggregate(), but those aren't really close at all.
So I would go from something like this:
df <- data.frame(km=c(rep(32,3),rep(50,3)), mm=rep(c(2,4,6),2), count=sample(1:25,6), site=rep("A", 6), year=rep(2013, 6))
km mm count site year
1 32 2 18 A 2013
2 32 4 2 A 2013
3 32 6 12 A 2013
4 50 2 3 A 2013
5 50 4 17 A 2013
6 50 6 21 A 2013
To this:
km site year mm_2 mm_4 mm_6
1 32 A 2013 18 2 12
2 50 A 2013 3 17 21
Edit: I tried the solution in a suggested duplicate, but I did not work for me, not really sure why. The answer below worked better.
As suggested in the comment above, we can use the sep argument in spread:
library(tidyr)
spread(df, mm, count, sep = "_")
km site year mm_2 mm_4 mm_6
1 32 A 2013 4 20 1
2 50 A 2013 15 14 22
As you mentioned dcast(), here is a method using it.
set.seed(1)
df <- data.frame(km=c(rep(32,3),rep(50,3)),
mm=rep(c(2,4,6),2),
count=sample(1:25,6),
site=rep("A", 6),
year=rep(2013, 6))
library(reshape2)
dcast(df, ... ~ mm, value.var="count")
# km site year 2 4 6
# 1 32 A 2013 13 10 20
# 2 50 A 2013 3 17 1
And if you want a bit of a challenge you can try the base function reshape().
df2 <- reshape(df, v.names="count", idvar="km", timevar="mm", ids="mm", direction="wide")
colnames(df2) <- sub("count.", "mm_", colnames(df2))
df2
# km site year mm_2 mm_4 mm_6
# 1 32 A 2013 13 10 20
# 4 50 A 2013 3 17 1

Should I use for loop? OR apply? [duplicate]

This question already has answers here:
Split dataframe by levels of a factor and name dataframes by those levels
(3 answers)
Closed 5 years ago.
this is my first post.
I have this dataframe of the Nhl draft.
What I would like to do is to use some sort of recursive function to create 10 objects.
So, I want to create these 10 objects by subsetting the Nhl dataframe by Year.
Here are the first 6 rows of the data set (nhl_draft)
Year Overall Team
1 2000 1 New York Islanders
2 2000 2 Atlanta Thrashers
3 2000 3 Minnesota Wild
4 2000 4 Columbus Blue Jackets
5 2000 5 New York Islanders
6 2000 6 Nashville Predators
Player PS
1 Rick DiPietro 49.3
2 Dany Heatley 95.2
3 Marian Gaborik 103.6
4 Rostislav Klesla 34.5
5 Raffi Torres 28.4
6 Scott Hartnell 74.5
I want to create 10 objects by subsetting out the Years, 2000 ~ 2009.
I tried,
for (i in 2000:2009) {
nhl_draft.i <- subset(nhl_draft, Year == "i")
}
BUT this doesn't do anything. What's the problem with this for-loop? Can you suggest any other ways?
Please tell me if this is confusing after all, this is my first post......
The following code may fix your error.
# Create an empty list
nhl_list <- list()
for (i in 2000:2009) {
# Subset the data frame based on Year
nhl_draft_temp <- subset(nhl_draft, Year == i)
# Assign the subset to the list
nhl_list[[as.character(i)]] <- nhl_draft_temp
}
But you can consider split, which is more concise.
nhl_list <- split(nhl_draft, f = nhl_draft$Year)

Pivot table from column to row in excel or R [duplicate]

This question already has answers here:
Reshape data from wide to long? [duplicate]
(3 answers)
Closed 9 years ago.
I have a table with header like this
Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
I want to pivot this table as
Id time x y
What is the best way to do this in Excel or R?
Something like this using base R reshape:
Get some data first
test <- read.table(text="Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
a 1 2 3 4 5 6 7 8 9 10
b 10 20 30 40 50 60 70 80 90 100",header=TRUE)
Then reshape:
reshape(
test,
idvar="Id",
varying=list(2:6,7:11),
direction="long",
v.names=c("x","y"),
times=seq(1960,2000,10)
)
Or let reshape guess the names automatically based on the . separator:
reshape(
test,
idvar="Id",
varying=-1,
direction="long",
sep="."
)
Resulting in:
Id time x y
a.1960 a 1960 1 6
b.1960 b 1960 10 60
a.1970 a 1970 2 7
b.1970 b 1970 20 70
a.1980 a 1980 3 8
b.1980 b 1980 30 80
a.1990 a 1990 4 9
b.1990 b 1990 40 90
a.2000 a 2000 5 10
b.2000 b 2000 50 100

Reshape data from wide to long? [duplicate]

This question already has answers here:
Reshaping data.frame from wide to long format
(8 answers)
Closed 6 years ago.
How do I reshape this wide data: (from a csv file)
Name Code Indicator 1960 1961 1962
Into this long format?
Name Code Indicator Year
the reshape2 package does this nicely with the function melt.
yourdata_melted <- melt(yourdata, id.vars=c('Name', 'Code', 'Indicator'), variable.name='Year')
This will add a column of value that you can drop. yourdata_melted$value <- NULL
And just because I like to continue my campaign for using base R functions:
Test data:
test <- data.frame(matrix(1:12,nrow=2))
names(test) <- c("name","code","indicator","1960","1961","1962")
test
name code indicator 1960 1961 1962
1 1 3 5 7 9 11
2 2 4 6 8 10 12
Now reshape it!
reshape(
test,
idvar=c("name","code","indicator"),
varying=c("1960","1961","1962"),
timevar="year",
v.names="value",
times=c("1960","1961","1962"),
direction="long"
)
# name code indicator year value
#1.3.5.1960 1 3 5 1960 7
#2.4.6.1960 2 4 6 1960 8
#1.3.5.1961 1 3 5 1961 9
#2.4.6.1961 2 4 6 1961 10
#1.3.5.1962 1 3 5 1962 11
#2.4.6.1962 2 4 6 1962 12
With tidyr
gather(test, "time", "value", 4:6)
Data
test <- data.frame(matrix(1:12,nrow=2))
names(test) <- c("name","code","indicator","1960","1961","1962")

Resources