How can I transpose a dataset in R? [duplicate]

I have a dataset A, shown below. How can I transform dataset A into dataset B? Dataset A contains over 10,000 observations in my file. Is there an easy way to do it?
Dataset A:
Line 1:AB 12 23
Line 2:AB 34 56
Line 3:CD 78 90
Line 4:EF 13 45
Dataset B:
Line 1:AB 12 23 34 56
Line 2:CD 78 90 NA NA
Line 3:EF 13 45 NA NA

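Try this, using cSplit from splitstackshape:
library(splitstackshape)
library(dplyr)
# Collapse the value columns of each row into one comma-separated string
DatA['new'] <- apply(DatA[, -1], 1, paste, collapse = ",")
# Concatenate the strings of all rows that share the same Alphabet
DatA <- DatA %>% group_by(Alphabet) %>% summarise(new = paste(new, collapse = ','))
# Split the combined string back into one column per value
cSplit(DatA, 2, drop = TRUE, sep = ',')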
Alphabet new_1 new_2 new_3 new_4
1: AB 12 23 34 56
2: CD 78 90 NA NA
3: EF 13 45 NA NA
Data input
DatA <- data.frame(Alphabet = c("AB", "AB", "CD", "EF"),
Value1 = c(12, 34, 78, 13), Value2 = c(23, 56, 90, 45), stringsAsFactors = FALSE)
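If you would rather avoid extra packages, the same reshaping can be sketched in base R. This is only an illustration under the assumption that each group's rows should be laid out one after another, as in Dataset B; the names vals, width, padded and DatB are made up for the example.

```r
DatA <- data.frame(Alphabet = c("AB", "AB", "CD", "EF"),
                   Value1 = c(12, 34, 78, 13),
                   Value2 = c(23, 56, 90, 45),
                   stringsAsFactors = FALSE)

# Flatten the rows of each group, row by row, into a single vector
vals <- lapply(split(DatA[, -1], DatA$Alphabet),
               function(g) as.vector(t(as.matrix(g))))

# Pad every vector with NA up to the length of the longest group
width <- max(lengths(vals))
padded <- t(sapply(vals, function(v) c(v, rep(NA, width - length(v)))))

# Assemble the wide result (hypothetical name DatB)
DatB <- data.frame(Alphabet = rownames(padded), padded, row.names = NULL)
names(DatB)[-1] <- paste0("new_", seq_len(width))
DatB
```

Groups with fewer rows than the largest group end up with NA in the trailing columns.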


Transforming big dataframe in more sensible form [duplicate]

The data frame consists of 3 columns: wine_id, taste_group, and an evaluated matching score for each group:
wine_id taste_group score
22 tree_fruit 87
22 citrus_fruit 98
22 tropical_fruit 17
22 earth 8
22 microbio 6
22 oak 7
22 vegetal 1
How can I make a separate column for each taste_group and list the scores in rows?
Like this:
wine_id tree_fruit citrus_fruit tropical_fruit earth microbio oak vegetal
22 87 98 17 8 6 7 1
There are 13 taste groups overall and more than 6,000 wines.
If a wine doesn't have a score for a taste_group, that cell should take the value 0.
I used
length(unique(tastes$Group))
length(unique(tastes$Wine_Id))
in R to check these basic counts.
How do I get to the desired format?
Assuming your dataframe is named tastes, you'll want something like:
library(tidyr)
tastes %>%
# Get into desired wide format
pivot_wider(names_from = taste_group, values_from = score, values_fill = 0)
In R, this is called long-to-wide reshaping. You can also use dcast to do that.
library(data.table)
dt <- fread("
wine_id taste_group score
22 tree_fruit 87
22 citrus_fruit 98
22 tropical_fruit 17
22 earth 8
22 microbio 6
22 oak 7
22 vegetal 1
")
dcast(dt, wine_id ~ taste_group, value.var = "score")
#wine_id citrus_fruit earth microbio oak tree_fruit tropical_fruit vegetal
# <int> <int> <int> <int> <int> <int> <int> <int>
# 22 98 8 6 7 87 17 1
Consider reshape:
wide_df <- reshape(
my_data,
timevar="taste_group",
v.names = "score",
idvar = "wine_id",
direction = "wide"
)
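Note that reshape leaves NA where a wine has no score for a group, and it prefixes the new columns with "score.". A rough sketch of handling both, assuming the data frame is called tastes; the second wine (id 23) is invented here purely to show a missing combination.

```r
# Small example; wine 23 is hypothetical and has only an "oak" score
tastes <- data.frame(
  wine_id = c(22, 22, 22, 23),
  taste_group = c("tree_fruit", "citrus_fruit", "oak", "oak"),
  score = c(87, 98, 7, 40)
)

wide_df <- reshape(
  tastes,
  timevar = "taste_group",
  v.names = "score",
  idvar = "wine_id",
  direction = "wide"
)

# Missing wine/group combinations come back as NA; the question asks for 0
wide_df[is.na(wide_df)] <- 0

# Drop the "score." prefix that reshape adds to the new columns
names(wide_df) <- sub("^score\\.", "", names(wide_df))
wide_df
```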

Find lowest value in three columns in R [duplicate]

I have a dataframe with 400 people who each have three predicted values (so 400 rows, 3 columns). Now I need a function that writes me the lowest of these three values into a variable, so that every person has the best prediction in a fourth column. I can't find any possibility, so I would be very thankful for your help!
Imagine you had 3 columns named Score1, Score2, and Score3. You might use apply as follows:
data$MinScore <- apply(data[,c("Score1","Score2","Score3")],1,min)
head(data)
Person Score1 Score2 Score3 MinScore
1 Person1 11 90 73 11
2 Person2 60 85 76 60
3 Person3 20 16 36 16
4 Person4 95 87 66 66
5 Person5 99 81 20 20
6 Person6 42 79 80 42
Sample Data
data <- data.frame(Person = paste0("Person", 1:400), Score1 = sample(1:100, 400, replace = TRUE), Score2 = sample(1:100, 400, replace = TRUE), Score3 = sample(1:100, 400, replace = TRUE))
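As an alternative to apply, base R's pmin computes element-wise minima across vectors and avoids looping over rows. A small sketch using the six rows printed above, with the same assumed column names Score1 to Score3:

```r
# The six rows shown in the head() output above
data <- data.frame(
  Person = paste0("Person", 1:6),
  Score1 = c(11, 60, 20, 95, 99, 42),
  Score2 = c(90, 85, 16, 87, 81, 79),
  Score3 = c(73, 76, 36, 66, 20, 80)
)

# pmin compares the vectors position by position
data$MinScore <- pmin(data$Score1, data$Score2, data$Score3)

# Same result when the columns are selected by name
data$MinScore2 <- do.call(pmin, data[c("Score1", "Score2", "Score3")])
data
```

If some scores can be missing, pmin also accepts na.rm = TRUE.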

How to refer to the grouped data frame itself in the function in ddply()

With ddply() it is possible to apply a function to each group of a data frame defined by certain variables, but how do I pass the grouped data frame itself as the argument of the function?
Take min() as an EXAMPLE:
What I have:
> BodyWeight
Treatment day1 day2 day3
1 a 32 33 36
2 a 35 35 26
3 a 33 38 46
4 b 23 24 25
5 b 22 16 34
6 b 36 35 37
7 c 45 45 39
8 c 29 26 12
9 c 43 27 36
What I want:
Treatment min
1 a 26
2 b 16
3 c 12
What I did and what I got:
> ddply(BodyWeight, .(Treatment), summarize, min= min(BodyWeight[,-1]))
Treatment min
1 a 12
2 b 12
3 c 12
The min() is just an example; general solutions are desired.
What you want is to summarize by Treatment and day. The issue is that you have days in multiple columns. You need to convert your data from the wide format it's in (multiple columns) into a long format (key-value pairs).
library(tidyr)
library(plyr)
bw_long <- gather(BodyWeight, day, value, day1:day3)
ddply(bw_long, .(Treatment, day), summarize, min = min(value))
p.s. Check out the successor to plyr, dplyr
We can use data.table. Convert the 'data.frame' to 'data.table' (setDT(BodyWeight)), grouped by 'Treatment', unlist the Subset of Data.table (.SD) and get the min value.
library(data.table)
setDT(BodyWeight)[, .(min = min(unlist(.SD))) , by = Treatment]
# Treatment min
#1: a 26
#2: b 16
#3: c 12
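The general idea, giving the function the whole sub data frame of each group, can also be sketched in base R with split. This is just one possible way; pieces, group_min and result are names made up for the example. With plyr, the analogous form would be passing an anonymous function of the piece instead of summarize, as shown in the comment.

```r
BodyWeight <- data.frame(
  Treatment = rep(c("a", "b", "c"), each = 3),
  day1 = c(32, 35, 33, 23, 22, 36, 45, 29, 43),
  day2 = c(33, 35, 38, 24, 16, 35, 45, 26, 27),
  day3 = c(36, 26, 46, 25, 34, 37, 39, 12, 36)
)

# One sub data frame of day columns per Treatment
pieces <- split(BodyWeight[, -1], BodyWeight$Treatment)

# The function receives the whole piece, so any summary can go here
group_min <- sapply(pieces, function(d) min(unlist(d)))

result <- data.frame(Treatment = names(group_min), min = unname(group_min))
result

# Analogous plyr call (not run here):
# ddply(BodyWeight, .(Treatment), function(d) data.frame(min = min(d[, -1])))
```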

R Split a column into multiple column by pattern [closed]

I want to separate the digits and character in a column of a dataframe d.df:
col1
ab 12 14 56
xb 23 234 2342 2
ad 23 45
Expected output:
col1 col2
ab 12 14 56
xb 23 234 2342 2
ad 23 45
I recognize it will be something similar to this, but I'm not sure about the separators
t <- as.data.frame(str_match(d$col1,"^(.*)"))
I tried many methods and the output was:
col1 col2
a b 12 14 56
x b 23 234 2342 2
a d 23 45
You can use separate from tidyr.
library(tidyr)
d.df %>% separate(col1, c("col1", "col2"), sep="(?<=[a-z]{2} )")
# col1 col2
# 1 ab 12 14 56
# 2 xb 23 234 2342 2
# 3 ad 23 45
The regex, "(?<=[a-z]{2} )", is a look-behind, meaning "split at the position in the string after two lower case characters followed by a space". tidyr seems to have a limit on the length of look-behinds, so {2} is used to specify the number of letters.
Here is an option with data.table.
library(data.table)#v1.9.5+
setnames(setDT(df1)[, tstrsplit(col1,
'(?<=[^0-9]) (?=[0-9])', perl=TRUE)], paste0('col', 1:2))[]
# col1 col2
#1: ab 12 14 56
#2: xb 23 234 2342 2
#3: ad 23 45
We convert the 'data.frame' to 'data.table' (setDT(df1)). Using tstrsplit from the devel version of 'data.table', split at the space in 'col1' by matching the space after a letter and before a numeric part. We use regex lookarounds ((?<=[^0-9]) and (?=[0-9])) for matching.
The approach here will vary significantly depending on whether this is actually what your strings look like or just an example. If they are always two letters followed by numbers, you can use substring:
> df <- data.frame(col1 = c("ab 12 14 56", "xb 23 234 2342 2", "ad 23 45"))
>
> df$col1.1 <- sapply(df$col1, substring, 0, 2)
>
> df$col1.2 <- sapply(df$col1, substring, 3)
>
> df
col1 col1.1 col1.2
1 ab 12 14 56 ab 12 14 56
2 xb 23 234 2342 2 xb 23 234 2342 2
3 ad 23 45 ad 23 45
If the length and positions of the strings change, regex might be better suited. Using a base R approach, you can extract only the numbers or letters (keeping the white spaces):
> df <- data.frame(col1 = c("ab 12 14 56", "xb 23 234 2342 2", "ad 23 45"))
> df$col1.1 <- sapply(regmatches(df$col1, gregexpr("[a-zA-Z]", df$col1)), paste, collapse = "")
> df$col1.2 <- sapply(regmatches(df$col1, gregexpr("[0-9]\\s*", df$col1)), paste, collapse = "")
> df
col1 col1.1 col1.2
1 ab 12 14 56 ab 12 14 56
2 xb 23 234 2342 2 xb 23 234 2342 2
3 ad 23 45 ad 23 45
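Another base R possibility is to capture both parts in one pass with regexec. A minimal sketch, assuming every string starts with a run of letters followed by whitespace and then the numbers; parts and out are illustrative names only.

```r
d.df <- data.frame(col1 = c("ab 12 14 56", "xb 23 234 2342 2", "ad 23 45"),
                   stringsAsFactors = FALSE)

# Group 1: leading letters; group 2: everything after the first whitespace run
parts <- regmatches(d.df$col1,
                    regexec("^([A-Za-z]+)\\s+(.*)$", d.df$col1))

# Each list element is c(full match, group 1, group 2)
out <- data.frame(col1 = sapply(parts, `[`, 2),
                  col2 = sapply(parts, `[`, 3),
                  stringsAsFactors = FALSE)
out
```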

Pivot table from column to row in excel or R [duplicate]

I have a table with header like this
Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
I want to pivot this table as
Id time x y
What is the best way to do this in Excel or R?
Something like this using base R reshape:
Get some data first
test <- read.table(text="Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
a 1 2 3 4 5 6 7 8 9 10
b 10 20 30 40 50 60 70 80 90 100",header=TRUE)
Then reshape:
reshape(
test,
idvar="Id",
varying=list(2:6,7:11),
direction="long",
v.names=c("x","y"),
times=seq(1960,2000,10)
)
Or let reshape guess the names automatically based on the . separator:
reshape(
test,
idvar="Id",
varying=-1,
direction="long",
sep="."
)
Resulting in:
Id time x y
a.1960 a 1960 1 6
b.1960 b 1960 10 60
a.1970 a 1970 2 7
b.1970 b 1970 20 70
a.1980 a 1980 3 8
b.1980 b 1980 30 80
a.1990 a 1990 4 9
b.1990 b 1990 40 90
a.2000 a 2000 5 10
b.2000 b 2000 50 100
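The result comes back ordered by time and with composite row names such as a.1960. If you prefer it grouped by Id with plain row numbers, something along these lines should work; long is just an illustrative name.

```r
test <- read.table(text = "Id x.1960 x.1970 x.1980 x.1990 x.2000 y.1960 y.1970 y.1980 y.1990 y.2000
a 1 2 3 4 5 6 7 8 9 10
b 10 20 30 40 50 60 70 80 90 100", header = TRUE)

long <- reshape(
  test,
  idvar = "Id",
  varying = -1,
  direction = "long",
  sep = "."
)

# Sort by Id, then by time, and drop the composite row names
long <- long[order(long$Id, long$time), ]
rownames(long) <- NULL
long
```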
