Last Name, First Name to First Name Last Name in R

I have a set of names in "Last, First" format:
Name Pos Team Week.x Year.x GID.x h.a.x Oppt.x Week1Points DK.salary.x Week.y Year.y GID.y
1 Abdullah, Ameer RB det 1 2015 2995 a sdg 19.4 4000 2 2015 2995
2 Adams, Davante WR gnb 1 2015 5263 a chi 9.9 4400 2 2015 5263
3 Agholor, Nelson WR phi 1 2015 5378 a atl 1.5 5700 2 2015 5378
4 Aiken, Kamar WR bal 1 2015 5275 a den 0.9 3300 2 2015 5275
5 Ajirotutu, Seyi WR phi 1 2015 3877 a atl 0.0 3000 NA NA NA
6 Allen, Dwayne TE ind 1 2015 4551 a buf 10.7 3400 2 2015 4551
Those are just the first 6 lines. I would like to flip the names to "First Last" order. Here is what I tried:
> splitnames <- strsplit(DKPoints$Name, split = ",")
This splits the name variable, but leaves leading whitespace on the first names, so to remove it I tried:
> str_trim(splitnames)
But the results did not come out right. Here is what they look like:
[1] "c(\"Abdullah\", \" Ameer\")" "c(\"Adams\", \" Davante\")"
[3] "c(\"Agholor\", \" Nelson\")" "c(\"Aiken\", \" Kamar\")"
[5] "c(\"Ajirotutu\", \" Seyi\")" "c(\"Allen\", \" Dwayne\")"
I would like the Name column of the data frame to look like this:
Ameer Abdullah
Davante Adams
Nelson Agholor
Kamar Aiken
Any advice would be much appreciated. Thanks

sub("(\\w+),\\s(\\w+)", "\\2 \\1", df$Name)
(\\w+) matches each name, ,\\s matches ", " (a comma followed by a space), and \\2 \\1 returns the two captured names in reverse order.
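Note that \\w+ only matches letters, digits and underscores, so names containing a period, hyphen, apostrophe or space are swapped only partially. A minimal sketch of a more permissive pattern, assuming every entry contains exactly one comma between last and first name (the sample names are made up for illustration):

```r
# Illustrative names; the last two contain characters that \\w+ does not match
x <- c("Abdullah, Ameer", "St. Brown, Amon-Ra", "O'Leary, Nick")

# Original pattern: only word characters on either side of ", "
sub("(\\w+),\\s(\\w+)", "\\2 \\1", x)
# returns c("Ameer Abdullah", "St. Amon Brown-Ra", "O'Nick Leary")

# More permissive pattern: everything before the comma is the last name,
# everything after it (minus leading spaces) is the first name
flipped <- sub("^([^,]+),\\s*(.+)$", "\\2 \\1", x)
flipped
# returns c("Ameer Abdullah", "Amon-Ra St. Brown", "Nick O'Leary")
```

Entries without any comma are returned unchanged, because sub() leaves non-matching strings alone.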

Assuming all names are "Lastname, firstname" you could do something like this:
names <- c("A, B","C, D","E, F")
newnames <- sapply(strsplit(names, split=", "),function(x)
{paste(rev(x),collapse=" ")})
> newnames
[1] "B A" "D C" "F E"
It splits each name on ", " and then pastes things back together in reverse order.
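The c(...) strings in the question appear to come from applying str_trim() to the whole list returned by strsplit(): the list is coerced to character, which deparses each element. Trimming inside each list element avoids that. A minimal sketch along the same lines, using base R's trimws() in place of str_trim():

```r
# Sample names in "Last, First" format (from the question)
x <- c("Abdullah, Ameer", "Adams, Davante", "Agholor, Nelson")

# strsplit() returns a list: one character vector per name
parts <- strsplit(x, split = ",")

# Trim each piece inside the list, reverse it and paste it back together;
# vapply() guarantees a plain character vector with one entry per name
flipped <- vapply(parts,
                  function(p) paste(rev(trimws(p)), collapse = " "),
                  character(1))
flipped
# returns c("Ameer Abdullah", "Davante Adams", "Nelson Agholor")
```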
Edit: this is probably fine for small datasets, but the other solutions are much faster. Microbenchmark results for 100,000 names:
Unit: milliseconds
expr min lq mean median uq max neval cld
heroka 1103.0419 1242.6418 1276.7765 1274.6746 1311.1218 1557.8579 50 c
lyzander 149.4466 177.0036 206.4558 191.1249 218.1756 345.7960 50 b
johannes 142.7585 144.5943 151.0078 146.0602 147.1980 284.2589 50 a

One way, using str_split_fixed() from stringr:
library(stringr)
#split Name into two columns
splits <- str_split_fixed(df$Name, ", ", 2)
#now merge these two columns the other way round
df$Name <- paste(splits[,2], splits[,1], sep = ' ')
Output:
Name Pos Team Week.x Year.x GID.x h.a.x Oppt.x Week1Points DK.salary.x Week.y Year.y GID.y
1 Ameer Abdullah RB det 1 2015 2995 a sdg 19.4 4000 2 2015 2995
2 Davante Adams WR gnb 1 2015 5263 a chi 9.9 4400 2 2015 5263
3 Nelson Agholor WR phi 1 2015 5378 a atl 1.5 5700 2 2015 5378
4 Kamar Aiken WR bal 1 2015 5275 a den 0.9 3300 2 2015 5275
5 Seyi Ajirotutu WR phi 1 2015 3877 a atl 0.0 3000 NA NA NA
6 Dwayne Allen TE ind 1 2015 4551 a buf 10.7 3400 2 2015 4551

Try this one:
df$Name2 <- paste(gsub("^.+,\\s*", "", df$Name), gsub(",.+$", "", df$Name), sep = " ")
where df is your data frame.

Related

Split numeric variables by decimals in R

I have a data frame with a column of numeric values that represent prices.
ID Total
1124 12.34
1232 12.01
1235 13.10
I want to split the column Total at "." and create two new columns with the euro and cent amounts, like this:
ID Total Euro Cent
1124 12.34 12 34
1232 12.01 12 01
1235 13.10 13 10
1225 13.00 13 00
The Euro and Cent columns should also be numeric.
I tried:
df[c('Euro', 'Cent')] <- str_split_fixed(df$Total, "(\\.)", 2)
But I get two new character columns that look like this:
ID Total Euro Cent
1124 12.34 12 34
1232 12.01 12 01
1235 13.10 13 1
1225 13.00 13
If I convert the character columns (Euro and Cent) to numeric like this:
as.numeric(df$Euro)
the 00 cent value turns into NA and the 10 cent value turns into 1 cent.
Any help is welcome.
Two methods:
If class(dat$Total) is numeric, you can do this:
dat <- transform(dat, Euro = Total %/% 1, Cent = 100 * (Total %% 1))
dat
# ID Total Euro Cent
# 1 1124 12.34 12 34
# 2 1232 12.01 12 1
# 3 1235 13.10 13 10
%/% is the integer-division operator, %% the modulus operator.
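One caveat with the %% approach: prices such as 12.34 cannot be stored exactly in binary floating point, so 100 * (Total %% 1) gives a value slightly below 34 that merely prints as 34. Rounding to whole cents avoids surprises if the column is later converted to integer. A minimal sketch with the question's data; since numeric columns cannot keep leading zeros, the "01"/"00" display goes into a separate, hypothetical CentLabel column:

```r
# Prices from the question, stored as numeric
dat <- data.frame(ID = c(1124, 1232, 1235, 1225),
                  Total = c(12.34, 12.01, 13.10, 13.00))

# 12.34 is stored as 12.3399999..., so truncating the scaled fraction gives 33
as.integer(100 * (dat$Total[1] %% 1))

# Rounding to the nearest whole cent removes the representation error
dat <- transform(dat,
                 Euro = Total %/% 1,
                 Cent = round(100 * (Total %% 1)))

# Character column for display only, zero-padded to two digits
dat$CentLabel <- sprintf("%02d", as.integer(dat$Cent))
dat
```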
If class(dat$Total) is character, then
dat <- transform(dat, Euro = sub("\\..*", "", Total), Cent = sub(".*\\.", "", Total))
dat
# ID Total Euro Cent
# 1 1124 12.34 12 34
# 2 1232 12.01 12 01
# 3 1235 13.10 13 10
The two new columns are still character, so you may want one of two further steps:
Remove leading zeros and keep them as character:
dat[,c("Euro", "Cent")] <- lapply(dat[,c("Euro", "Cent")], sub, pattern = "^0+", replacement = "")
dat
# ID Total Euro Cent
# 1 1124 12.34 12 34
# 2 1232 12.01 12 1
# 3 1235 13.10 13 10
Convert to numbers:
dat[,c("Euro", "Cent")] <- lapply(dat[,c("Euro", "Cent")], as.numeric)
dat
# ID Total Euro Cent
# 1 1124 12.34 12 34
# 2 1232 12.01 12 1
# 3 1235 13.10 13 10
(You can also use as.integer if you know both columns will always be such.)
Just use standard numeric functions:
df$Euro <- floor(df$Total)
df$Cent <- round(df$Total %% 1 * 100)

Merge different datasets

I need to merge two different datasets into one, but they have different classes. How can I do this? rbind doesn't work; any ideas?
nycounties <- rgdal::readOGR("https://raw.githubusercontent.com/openpolis/geojson-italy/master/geojson/limits_IT_provinces.geojson")
city <- c("Novara", "Milano","Torino","Bari")
dimension <- c("150000", "5000000","30000","460000")
df <- cbind(city, dimension)
total <- rbind(nycounties,df)
Are you looking for something like this?
nycounties@data = data.frame(nycounties@data,
df[match(nycounties@data[, "prov_name"],
df[, "city"]),])
Output
nycounties@data[!is.na(nycounties@data$dimension),]
prov_name prov_istat_code_num prov_acr reg_name reg_istat_code reg_istat_code_num prov_istat_code city dimension
0 Torino 1 TO Piemonte 01 1 001 Torino 30000
2 Novara 3 NO Piemonte 01 1 003 Novara 150000
12 Milano 15 MI Lombardia 03 3 015 Milano 5000000
81 Bari 72 BA Puglia 16 16 072 Bari 460000
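The same match() idea can be tried without rgdal. A minimal sketch on plain data frames, where prov is a hypothetical stand-in for nycounties@data (a few columns, illustrative rows) and extra holds the question's city data as a data frame so that dimension can be numeric:

```r
# Hypothetical stand-in for nycounties@data, in the order of the polygons
prov <- data.frame(prov_name = c("Torino", "Vercelli", "Novara", "Milano", "Bari"),
                   prov_acr  = c("TO", "VC", "NO", "MI", "BA"),
                   stringsAsFactors = FALSE)

# Extra attributes from the question, as a data frame (not a character matrix)
extra <- data.frame(city = c("Novara", "Milano", "Torino", "Bari"),
                    dimension = c(150000, 5000000, 30000, 460000),
                    stringsAsFactors = FALSE)

# For each province, match() gives the row of `extra` with the same name
# (NA when there is none), so the original row order is kept
idx <- match(prov$prov_name, extra$city)
prov$dimension <- extra$dimension[idx]
prov
```

match() is preferable to merge() here because merge() sorts the result by the key column by default, which would break the row correspondence between @data and the polygons of a spatial object.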

How can I change row and column indexes of a dataframe in R?

I have a data frame in R with three columns, Product_Name (book title), Year and Units (number of units sold in that year), which looks like this:
Product_Name Year Units
A Modest Proposal 2011 10000
A Modest Proposal 2012 11000
A Modest Proposal 2013 12000
A Modest Proposal 2014 13000
Animal Farm 2011 8000
Animal Farm 2012 9000
Animal Farm 2013 11000
Animal Farm 2014 15000
Catch 22 2011 1000
Catch 22 2012 2000
Catch 22 2013 3000
Catch 22 2014 4000
....
I intend to build an R Shiny dashboard with this data, using the year as a drop-down menu option, so I want the data frame in the following format:
A Modest Proposal Animal Farm Catch 22
2011 10000 8000 1000
2012 11000 9000 2000
2013 12000 11000 3000
2014 13000 15000 4000
or the other way round, with Product Names as row indexes and Years as column indexes; either way works.
How can I do this in R?
Your general issue is transforming long data to wide data. For this, you can use data.table's dcast function (amongst many others):
library(data.table)
dt = data.table(
Name = c(rep('A', 4), rep('B', 4), rep('C', 4)),
Year = c(rep(2011:2014, 3)),
Units = rnorm(12)
)
> dt
Name Year Units
1: A 2011 -0.26861318
2: A 2012 0.27194732
3: A 2013 -0.39331361
4: A 2014 0.58200101
5: B 2011 0.09885381
6: B 2012 -0.13786098
7: B 2013 0.03778400
8: B 2014 0.02576433
9: C 2011 -0.86682584
10: C 2012 -1.34319590
11: C 2013 0.10012673
12: C 2014 -0.42956207
> dcast(dt, Year ~ Name, value.var = 'Units')
Year A B C
1: 2011 -0.2686132 0.09885381 -0.8668258
2: 2012 0.2719473 -0.13786098 -1.3431959
3: 2013 -0.3933136 0.03778400 0.1001267
4: 2014 0.5820010 0.02576433 -0.4295621
Next time, it is easier if you provide a reproducible example, so that the people helping you do not have to recreate your data structure by hand :)
You can use pivot_wider from the tidyr package. I assume your data is stored in df; dplyr is loaded as well for the %>% pipe.
library(tidyr)
library(dplyr)
df %>%
pivot_wider(names_from = Product_Name, values_from = Units)
Assuming your data frame is ordered by Product_Name and then by Year, try the following (first I generate artificial data similar to yours):
Col_1 <- sort(rep(LETTERS[1:3], 4))
Col_2 <- rep(2011:2014, 3)
# artificial data
resp <- ceiling(rnorm(12, 5000, 500))
uu <- data.frame(Col_1, Col_2, resp)
uu
# output is
Col_1 Col_2 resp
1 A 2011 5297
2 A 2012 4963
3 A 2013 4369
4 A 2014 4278
5 B 2011 4721
6 B 2012 5021
7 B 2013 4118
8 B 2014 5262
9 C 2011 4601
10 C 2012 5013
11 C 2013 5707
12 C 2014 5637
>
> # Here starts
> output <- aggregate(uu$resp, list(uu$Col_1), function(x) {x})
> output
Group.1 x.1 x.2 x.3 x.4
1 A 5297 4963 4369 4278
2 B 4721 5021 4118 5262
3 C 4601 5013 5707 5637
>
output2 <- output [, -1]
colnames(output2) <- levels(as.factor(uu$Col_2))
rownames(output2) <- levels(as.factor(uu$Col_1))
# transpose the matrix
> t(output2)
A B C
2011 5297 4721 4601
2012 4963 5021 5013
2013 4369 4118 5707
2014 4278 5262 5637
> # or convert to data.frame
> as.data.frame(t(output2))
A B C
2011 5297 4721 4601
2012 4963 5021 5013
2013 4369 4118 5707
2014 4278 5262 5637
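The aggregate() approach relies on the rows being sorted. A base-R alternative that uses the Year and Product_Name values directly is tapply(); a minimal sketch on the data from the question, assuming each (Year, Product_Name) pair appears at most once (duplicates would be added together by sum(), and missing pairs show up as NA):

```r
# Book sales from the question, in long format
sales <- data.frame(
  Product_Name = rep(c("A Modest Proposal", "Animal Farm", "Catch 22"), each = 4),
  Year  = rep(2011:2014, times = 3),
  Units = c(10000, 11000, 12000, 13000,
             8000,  9000, 11000, 15000,
             1000,  2000,  3000,  4000),
  stringsAsFactors = FALSE
)

# One cell per (Year, Product_Name) pair: years as rows, titles as columns.
# The input row order does not matter.
wide <- tapply(sales$Units, list(sales$Year, sales$Product_Name), sum)
wide

# Transpose if you prefer titles as rows and years as columns
t(wide)
```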

Calculation with apply

I have a table with five columns: Year, Revenue, Pensions, Income and Wages. With this table I did the calculation below:
library(dplyr)
#DATA
TEST<-data.frame(
Year= c(2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021),
Revenue =c(8634,5798,6022,6002,6266,6478,6732,7224,6956,6968,7098,7620,7642,8203,9856,20328,22364,22222,23250,25250,26250,27250),
Pensions =c(8734,5798,7011,7002,7177,7478,7731,7114,7957,7978,7098,7710,7742,8203,9857,10328,11374,12211,13150,15150,17150,17150),
Income =c(8834,5898,6033,6002,6366,6488,6833,8334,6956,6968,8098,8630,8642,8203,9856,30328,33364,32233,33350,35350,36350,38350),
Wages =c(8834,5598,8044,8002,8488,8458,8534,5444,8958,8988,5098,5840,5842,8203,9858,40328,44384,42244,43450,45450,48450,45450)
)
#FUNCTION
fun1 <- function(x){ ((x - lag(x))/lag(x))*100}
#CALCULATION
ESTIMATION_0<-mutate(TEST,
Nominal_growth_Revenue=fun1(Revenue),
Nominal_growth_Pensions=fun1(Pensions),
Nominal_growth_Income=fun1(Income),
Nominal_growth_Wages=fun1(Wages)
)
However, I would like to simplify this code and do the calculation with apply() (or something similar). I wrote four lines for this calculation, but I would like to do it in one. Can anybody help me with this?
Assuming you have a character vector with the relevant columns:
cols <- c("Revenue", "Pensions", "Income", "Wages")
Use apply():
TEST[paste0('Nominal_growth_', cols)] <- apply(TEST[cols], 2, fun1)
or data.table:
library(data.table)
setDT(TEST)
TEST[, (paste0('Nominal_growth_', cols)) := lapply(.SD, fun1), .SDcols = cols]
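fun1() only needs dplyr for lag(). A minimal base-R sketch of the same one-line assignment, using a hypothetical helper pct_growth() that computes the percentage change from the previous row, applied to a shortened copy of TEST:

```r
# Shortened copy of the question's TEST data (first four years, two columns)
TEST <- data.frame(Year    = 2000:2003,
                   Revenue = c(8634, 5798, 6022, 6002),
                   Wages   = c(8834, 5598, 8044, 8002))

# Base-R equivalent of fun1(): percentage change from the previous row,
# with NA for the first row (hypothetical helper name)
pct_growth <- function(x) c(NA, diff(x) / head(x, -1) * 100)

# One line creates all the growth columns
cols <- c("Revenue", "Wages")
TEST[paste0("Nominal_growth_", cols)] <- lapply(TEST[cols], pct_growth)
TEST
```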
You could do this:
vars_names <- paste0("Nominal_growth_", names(select(TEST, -Year)))
TEST %>%
bind_cols( (TEST %>% mutate_at(vars(-Year), ~fun1(.x))) %>% select(-Year) %>% setNames(vars_names) )
Year Revenue Pensions Income Wages Nominal_growth_Revenue Nominal_growth_Pensions Nominal_growth_Income Nominal_growth_Wages
1 2000 8634 8734 8834 8834 NA NA NA NA
2 2001 5798 5798 5898 5598 -32.8468844 -33.6157545 -33.2352275 -36.63119765
3 2002 6022 7011 6033 8044 3.8634012 20.9210072 2.2889115 43.69417649
4 2003 6002 7002 6002 8002 -0.3321156 -0.1283697 -0.5138405 -0.52212829
5 2004 6266 7177 6366 8488 4.3985338 2.4992859 6.0646451 6.07348163
6 2005 6478 7478 6488 8458 3.3833387 4.1939529 1.9164310 -0.35344015
7 2006 6732 7731 6833 8534 3.9209633 3.3832576 5.3175092 0.89855758
8 2007 7224 7114 8334 5444 7.3083779 -7.9808563 21.9669252 -36.20810874
9 2008 6956 7957 6956 8958 -3.7098560 11.8498735 -16.5346772 64.54812638
10 2009 6968 7978 6968 8988 0.1725129 0.2639186 0.1725129 0.33489618
11 2010 7098 7098 8098 5098 1.8656716 -11.0303334 16.2169920 -43.27992879
12 2011 7620 7710 8630 5840 7.3541843 8.6221471 6.5695233 14.55472734
13 2012 7642 7742 8642 5842 0.2887139 0.4150454 0.1390498 0.03424658
14 2013 8203 8203 8203 8203 7.3410102 5.9545337 -5.0798426 40.41424170
15 2014 9856 9857 9856 9858 20.1511642 20.1633549 20.1511642 20.17554553
16 2015 20328 10328 30328 40328 106.2500000 4.7783301 207.7110390 309.08906472
17 2016 22364 11374 33364 44384 10.0157418 10.1278079 10.0105513 10.05752827
18 2017 22222 12211 32233 42244 -0.6349490 7.3588887 -3.3898813 -4.82155732
19 2018 23250 13150 33350 43450 4.6260463 7.6897879 3.4653926 2.85484329
20 2019 25250 15150 35350 45450 8.6021505 15.2091255 5.9970015 4.60299194
21 2020 26250 17150 36350 48450 3.9603960 13.2013201 2.8288543 6.60066007
22 2021 27250 17150 38350 45450 3.8095238 0.0000000 5.5020633 -6.19195046

How to control number of decimal digits in write.table() output?

When working with data (e.g., in a data.frame), you can control the number of displayed digits with
options(digits=3)
and then print the data.frame:
ttf.all
When I need to paste the data into Excel like this:
write.table(ttf.all, 'clipboard', sep='\t',row.names=F)
the digits option is ignored and the numbers are not rounded.
Here is the nicely printed output:
> ttf.all
year V1.x.x V1.y.x ratio1 V1.x.y V1.y.y ratioR V1.x.x V1.y.x ratioAL V1.x.y V1.y.y ratioRL
1 2006 227 645 35.2 67 645 10.4 150 645 23.3 53 645 8.22
2 2007 639 1645 38.8 292 1645 17.8 384 1645 23.3 137 1645 8.33
3 2008 1531 3150 48.6 982 3150 31.2 755 3150 24.0 235 3150 7.46
4 2009 1625 3467 46.9 1026 3467 29.6 779 3467 22.5 222 3467 6.40
But what ends up in Excel (via the clipboard) is not rounded. How can I control this in write.table()?
You can use the function format() as in:
write.table(format(ttf.all, digits=2), 'clipboard', sep='\t',row.names=F)
format() is a generic function that has methods for many classes, including data.frames. Unlike round(), it won't throw an error if your dataframe is not all numeric. For more details on the formatting options, see the help file via ?format
Here is a solution for a data frame with mixed character and numeric columns: mutate_if() selects the numeric columns and applies round() to them.
# install.packages('dplyr', dependencies = TRUE)
library(dplyr)
df <- read.table(text = "id year V1.x.x V1.y.x ratio1
a 2006 227.11111 645.11111 35.22222
b 2007 639.11111 1645.11111 38.22222
c 2008 1531.11111 3150.11111 48.22222
d 2009 1625.11111 3467.11111 46.22222",
header = TRUE, stringsAsFactors = FALSE)
df %>%
mutate_if(is.numeric, round, digits = 2)
#> id year V1.x.x V1.y.x ratio1
#> 1 a 2006 227.11 645.11 35.22
#> 2 b 2007 639.11 1645.11 38.22
#> 3 c 2008 1531.11 3150.11 48.22
#> 4 d 2009 1625.11 3467.11 46.22
### dplyr v1.0.0+
df %>%
mutate(across(where(is.numeric), ~ round(., digits = 2)))
#> id year V1.x.x V1.y.x ratio1
#> 1 a 2006 227.11 645.11 35.22
#> 2 b 2007 639.11 1645.11 38.22
#> 3 c 2008 1531.11 3150.11 48.22
#> 4 d 2009 1625.11 3467.11 46.22
Created on 2019-03-17 by the reprex package (v0.2.1.9000)
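A base-R version of the same idea (round only the numeric columns, then write), without dplyr. This sketch writes to a temporary file so it can run on any platform, since the 'clipboard' connection used in the question is Windows-specific for writing; quote = FALSE stops write.table() from wrapping the id values in double quotes:

```r
# Mixed character/numeric data, similar to the example above
df <- data.frame(id = c("a", "b"),
                 ratio1 = c(35.22222, 38.22222),
                 stringsAsFactors = FALSE)

# Round only the numeric columns; character columns are left untouched
is_num <- vapply(df, is.numeric, logical(1))
df[is_num] <- lapply(df[is_num], round, digits = 2)

# Write tab-separated text; on Windows the file could be 'clipboard' instead
out <- tempfile(fileext = ".tsv")
write.table(df, out, sep = "\t", row.names = FALSE, quote = FALSE)
readLines(out)
```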
