Metafor Package (R) - Adding Multiple Columns - r

I currently have a data frame called final, shown below:
final
Gms AvgPts max min Team salary position playing.status value Players
(dbl) (dbl) (dbl) (dbl) (fctr) (dbl) (fctr) (fctr) (dbl) (chr)
1 2 87.00 113 61 STK 4300 FWD Start 20.23 Tim Membrey, STK, FWD, 4300
2 4 75.50 124 42 STK 4300 MID Start 17.56 Blake Acres, STK, MID, 4300
3 7 77.43 119 50 STK 5500 RU Start 14.08 Tom Hickey, STK, RU, 5500
4 6 87.00 110 54 WCE 6200 RU Interchange 14.03 Scott Lycett, WCE, RU, 6200
5 5 71.40 89 39 STK 5200 FWD Interchange 13.73 Jack Sinclair, STK, FWD, 5200
6 3 73.33 83 68 WCE 5400 MID Start 13.58 Mark Hutchings, WCE, MID, 5400
7 7 98.71 127 83 STK 7400 MID Interchange 13.34 Sebastian Ross, STK, MID, 7400
8 7 79.14 99 53 WCE 6100 DEF Start 12.97 Jeremy McGovern, WCE, DEF, 6100
9 7 108.29 198 67 WCE 8500 FWD Start 12.74 Josh J. Kennedy, WCE, FWD, 8500
10 6 121.00 150 57 STK 9500 FWD Start 12.74 Nick Riewoldt, STK, FWD, 9500
11 6 79.17 101 59 STK 6400 MID Start 12.37 Luke Dunstan, STK, MID, 6400
12 7 84.86 104 60 STK 7000 DEF Start 12.12 Shane Savage, STK, DEF, 7000
13 7 82.14 100 45 WCE 6900 FWD Start 11.90 Jack Darling, WCE, FWD, 6900
14 7 95.29 138 76 WCE 8100 RU Start 11.76 Nic Naitanui, WCE, RU, 8100
15 7 87.43 135 53 WCE 7500 FWD Start 11.66 Mark LeCras, WCE, FWD, 7500
16 7 74.29 92 34 WCE 6400 DEF Start 11.61 Brad Sheppard, WCE, DEF, 6400
I use the following lines of code to produce a forest plot as shown below:
forest(x = final$AvgPts, ci.lb = final$min, ci.ub = final$max,
       slab = final$Players, ilab = final$value,
       ilab.xpos = max(final$max) + 10, ilab.pos = 4, yaxs = "i",
       alim = c(min(final$min) - 5, max(final$max) + 5), steps = 4,
       xlim = c(min(final$min) - 200, 2 * (max(final$max) + 5)),
       xlab = "Moneyball Points Spread", efac = 0.75 - .0014 * k, cex = 0.75,
       mgp = c(1, 1, 0), refline = mean(final$AvgPts), digits = 1,
       col = "dark blue", pch = 19,
       main = paste("2016 Moneyball Summary - pos =",
                    paste(as.character(pos), collapse = ", "),
                    "\n(avg >=", points, "-- value >=", val, "-- TOG >=", time, ")"))
text(min(final$min)-200, (nrow(final) + 1.5), "Player",pos=4,cex=0.75)
text(max(final$max)+10, (nrow(final) + 1.5), "Value",pos=4,cex=0.75)
text(2*(max(final$max)+5), (nrow(final) + 1.5), "Average[min,max]",pos=2,cex=0.75)
What I want to be able to do is add more columns than just the Value column (ilab = final$value).
Ideally I would like a solution which would be able to fit more than just one additional column as I intend to build on the final data frame with more information.
Also, is it possible to add extra columns before AND after the plot line section?

The ilab argument can also take an entire matrix or data frame (and ilab.pos and ilab.xpos should then be vectors). See help(forest.rma) for examples. And yes, if you adjust xlim so there is sufficient space to the left and the right of the points and CI lines, you can place the information to the left and right as well (just use ilab.xpos to specify where you want the various columns placed).
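A minimal sketch of the matrix form of ilab, assuming the metafor package is installed. It uses four rows of the data frame above; the x positions in ilab.xpos and the limits in xlim and alim are guesses for this small subset and would need tuning for the full data.

```r
library(metafor)

# Four rows copied from the 'final' data frame in the question
final <- data.frame(
  Players = c("Tim Membrey", "Blake Acres", "Tom Hickey", "Scott Lycett"),
  Gms     = c(2, 4, 7, 6),
  AvgPts  = c(87.00, 75.50, 77.43, 87.00),
  max     = c(113, 124, 119, 110),
  min     = c(61, 42, 50, 54),
  salary  = c(4300, 4300, 5500, 6200),
  value   = c(20.23, 17.56, 14.08, 14.03)
)

# One matrix column per extra column in the plot
ilab <- cbind(final$Gms, final$salary, final$value)

# One x position per matrix column: two left of the CI lines, one right
# (illustrative values, chosen to fit inside xlim below)
ilab.xpos <- c(-60, -10, 140)

forest(x = final$AvgPts, ci.lb = final$min, ci.ub = final$max,
       slab = final$Players,
       ilab = ilab, ilab.xpos = ilab.xpos, ilab.pos = c(4, 4, 4),
       xlim = c(-250, 300),   # wide enough for columns on both sides
       alim = c(40, 130),
       refline = mean(final$AvgPts), digits = 1, cex = 0.75, pch = 19,
       xlab = "Moneyball Points Spread")

# Column headers, placed the same way as in the question
text(ilab.xpos, nrow(final) + 1.5, c("Gms", "Salary", "Value"),
     pos = 4, cex = 0.75)
```

Adding another column then only means adding a column to the matrix and one more entry to ilab.xpos (and to the header labels).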

Related

Reading unknown file type with strange entries into R

I am completely new at this and here, so please have mercy.
I want to open an ASCII data file in R.
After several different attempts, I have tried df=read.csv("C:MyDirectory" ,header=FALSE, sep="").
This has produced a table with several variables, but some rows clearly contain the wrong information, some cells are blank, some contain NA values.
Any ideas what has gone wrong? I got the file from an official Spanish research institute:
http://www.cis.es/cis/opencm/ES/2_bancodatos/estudios/listaTematico.jsp?tema=1&todos=si
Under BARÓMETRO DE OCTUBRE 2017, there is a small link on the right entitled "fichero de datos", which lets you download the data after you provide some information. The file giving me trouble is DA3191. If anyone could take the trouble to help me with this, it would be awesome. Thank you.
Part 1
This looks like a fixed width format, so you need read.fwf instead of read.csv and friends. I made a screen shot of an almost random place of that file: my hypothesis is that the 99's and 98's etc are missing data codes, so the first 99 marked in yellow would belong to the same column with 4, 2, 0, etc, and the immediately following 99 (not marked) is in the same column with 0, 5, 7, etc.
Part 2
And then look at the file ES3191 -- this looks like SPSS code (pardon my French!) containing the rules about reading in the data file. You can probably figure out the width of each column and what's in there from that file:
DATA LIST FILE= 'DA3191'
/ESTU 1-4 CUES 5-9 CCAA 10-11 PROV 12-13 MUN 14-16 TAMUNI 17 CAPITAL 18 DISTR 19-20 SECCION 21-23
ENTREV 24-27 P0 28 P0A 29-31 P1 32 P2 33 P3 34 P4 35 P5 36 P6 37 P701 38-39 P702 40-41 P703 42-43
P801 44-45 P802 46-47 P803 48-49 P901 50-51 P902 52-53 P903 54-55 P904 56-57 P905 58-59 P906 60-61
P907 62-63 P1001 64 P1002 65 P1003 66 P1101 67 P1102 68 P1103 69 P1104 70 P1201 71 P1202 72
P1203 73 P1204 74 P1205 75 P1206 76 P1207 77 P1208 78 P1209 79 P13 80-81 P13A 82-83 P1401 84-85
P1402 86-87 P1403 88-89 P1404 90-91 P1405 92-93 P1406 94-95 P1407 96-97 P1408 98-99 P1409 100-101
P1410 102-103 P1411 104-105 P1412 106-107 P1413 108-109 P1414 110-111 P1415 112-113 P1416 114-115
I'm not an SPSS expert but I would guess that what it is trying to tell us is that
columns 1-4 contain the variable "ESTU"
columns 5-9 contain the variable "CUES"
etc
For read.fwf you have to calculate each variable's "width" i.e. 4 characters for ESTU (if my reading was right) 5 characters for CUES etc.
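As a small illustration of that width arithmetic, here is a sketch that uses only the first four variables from the listing above. The two data lines are made up for the example and are not taken from DA3191.

```r
# Column ranges copied from the DATA LIST statement (first four variables only)
spec <- c(ESTU = "1-4", CUES = "5-9", CCAA = "10-11", PROV = "12-13")

# Width of each field = end - start + 1
bounds <- strsplit(spec, "-")
starts <- as.numeric(sapply(bounds, `[`, 1))
ends   <- as.numeric(sapply(bounds, `[`, 2))
widths <- ends - starts + 1
widths  # 4 5 2 2

# Two made-up records in that layout, written to a temporary file
tmp <- tempfile()
writeLines(c("3191    116 1",
             "3191    216 1"), tmp)

# read.fwf cuts every line at the given widths
dat <- read.fwf(tmp, widths = widths, col.names = names(spec))
dat
```

If the layout had gaps between fields, read.fwf accepts negative widths to skip that many characters.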
Part 3
Using the guesses above, I used the following code to read in your data, and it looks like it works:
# this is copy/pasted SPSS code from file "ES3191"
txt <- "ESTU 1-4 CUES 5-9 CCAA 10-11 PROV 12-13 MUN 14-16 TAMUNI 17 CAPITAL 18 DISTR 19-20 SECCION 21-23
ENTREV 24-27 P0 28 P0A 29-31 P1 32 P2 33 P3 34 P4 35 P5 36 P6 37 P701 38-39 P702 40-41 P703 42-43
P801 44-45 P802 46-47 P803 48-49 P901 50-51 P902 52-53 P903 54-55 P904 56-57 P905 58-59 P906 60-61
P907 62-63 P1001 64 P1002 65 P1003 66 P1101 67 P1102 68 P1103 69 P1104 70 P1201 71 P1202 72
P1203 73 P1204 74 P1205 75 P1206 76 P1207 77 P1208 78 P1209 79 P13 80-81 P13A 82-83 P1401 84-85
P1402 86-87 P1403 88-89 P1404 90-91 P1405 92-93 P1406 94-95 P1407 96-97 P1408 98-99 P1409 100-101
P1410 102-103 P1411 104-105 P1412 106-107 P1413 108-109 P1414 110-111 P1415 112-113 P1416 114-115
P1501 116-117 P1502 118-119 P1503 120-121 P1504 122-123 P1505 124-125 P1506 126-127 P1507 128-129
P1508 130-131 P1509 132-133 P1510 134-135 P1511 136-137 P1512 138-139 P1513 140-141 P1514 142-143
P1515 144-145 P1516 146-147 P16 148 P17 149 P1801 150-151 P1802 152-153 P1803 154-155 P1804 156-157
P1805 158-159 P1806 160-161 P1807 162-163 P1808 164-165 P1809 166-167 P1810 168-169 P1811 170-171
P1812 172-173 P1813 174-175 P19 176 P20 177 P21 178-179 P22 180-181 P23 182-183 P2401 184-185
P2402 186-187 P2403 188-189 P2404 190-191 P2405 192-193 P2406 194-195 P2407 196-197 P2408 198-199
P2409 200-201 P2410 202-203 P2411 204-205 P2412 206-207 P2413 208-209 P2414 210-211 P2415 212-213
P2416 214-215 P25 216 P26 217 P27 218 P27A 219-220 P28 221-222 P29 223 P30 224-225 P31 226 P31A 227-228
P32 229 P32A 230 P33 231 P34 232 P35 233 P35A 234 P36 235 P37 236 P37A 237 P37B 238 P38 239-241
P39 242 P39A 243 P40 244-246 P41 247-248 P42 249-250 P43 251 P43A 252 P43B 253 P44 254 P4501 255
P4502 256 P4503 257 P4504 258 P4601 259-261(A) P4602 262-264(A) P4603 265-267(A) P4604 268-270(A)
P4605 271-273(A) P4701 274-276(A) P4702 277-279(A) P4703 280-282(A) P4704 283-285(A) P4705 286-288(A)
P48 289 P49 290 P50 291 P51 292 I1 293-295 I2 296-298 I3 299-301 I4 302-304 I5 305-307 I6 308-310
I7 311-313 I8 314-316 I9 317-319 E101 320-321 E102 322-323 E103 324-325 E2 326 E3 327-329 E4 330
C1 331 C1A 332-333 C2 334 C2A 335 C2B 336-337 C3 338 C4 339-340 P21R 341-342 P22R 343-344 VOTOSIMG 345-346
P27AR 347-348 RECUERDO 349-350 ESTUDIOS 351 OCUMAR11 352-353 RAMA09 354 CONDICION11 355-356
ESTATUS 357 "
# making a 2-column matrix (name = left column, position = right column)
m <- matrix(scan(text=txt, what=""), ncol=2, byrow=TRUE)
m <- as.data.frame(m, stringsAsFactors=FALSE)
names(m) <- c("Var", "Pos")
pos <- sub("(A)", "", m$Pos, fixed = TRUE) # some entries carry an '(A)' suffix (presumably SPSS's marker for a string field); strip it
pos <- strsplit(pos, "-")
starts <- as.numeric(sapply(pos, head, 1)) # first element: start column
ends <- as.numeric(sapply(pos, tail, 1)) # last element: end column (same as start for one-column fields)
w <- ends - starts + 1 # field width in characters
MyData <- read.fwf("R/MD3191/DA3191", widths = w)
names(MyData) <- m$Var
head(MyData)
# ESTU CUES CCAA PROV MUN TAMUNI CAPITAL DISTR SECCION ENTREV P0 P0A P1 P2 P3 P4 P5 P6
# 1 3191 1 16 1 59 5 1 0 0 0 1 0 3 2 2 5 1 2
# 2 3191 2 16 1 59 5 1 0 0 0 1 0 4 2 3 5 2 3
# 3 3191 3 16 1 59 5 1 0 0 0 1 0 4 2 2 4 2 2

Change the axes' scale in a plot without creating a new variable

I have a dataset like below (this is only the first 20 rows and the first 3 columns of data):
row fitted measured
1 1866 1950
2 2489 2500
3 1486 1530
4 1682 1720
5 1393 1402
6 2524 2645
7 2676 2789
8 3200 3400
9 1455 1456
10 1685 1765
11 2587 2597
12 3040 3050
13 2767 2769
14 3300 3310
15 4001 4050
16 1918 2001
17 2889 2907
18 2063 2150
19 1591 1640
20 3578 3601
I plotted this data
plot(data$measured~data$fitted, ylab = expression("Measured Length (" * mu ~ "m)"),
xlab = expression("NIR Fitted Length (" * mu ~ "m)"), cex.lab=1.5, cex.axis=1.5)
and got the following:
As you can see, the axis scales are in micrometers; I need the axes to be in millimeters.
How can I plot the data with the axes in millimeters WITHOUT creating a new variable?
Like this:
If I create a new variable, I will have to change the 2000 lines of code I have already written, and that's not a road I want to go down! :|
Thanks much :)
I used @bdemarest's method for the plot and @IukeA's method for abline:
plot(y=data$measured/1000,x=data$fitted/1000, ylab = expression("Measured Length (mm)"),
xlab = expression("NIR Fitted Length (mm)"), cex.lab=1.5, cex.axis=1.5)
a = lm(I(data$measured/1000)~I(data$fitted/1000), data=data)
abline(a)
Here is the final plot;
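Another option, not taken from the answers above, is to leave the data in micrometers and relabel only the tick marks. A minimal sketch using the first five rows of the data (the data frame is called dat here purely for illustration):

```r
# First five rows of the example data, still in micrometers
dat <- data.frame(fitted   = c(1866, 2489, 1486, 1682, 1393),
                  measured = c(1950, 2500, 1530, 1720, 1402))

# Draw the plot without axes, then add axes whose labels are divided by 1000
plot(measured ~ fitted, data = dat, xaxt = "n", yaxt = "n",
     xlab = "NIR Fitted Length (mm)", ylab = "Measured Length (mm)",
     cex.lab = 1.5)

x_ticks <- axTicks(1)  # tick positions R would have used, in micrometers
y_ticks <- axTicks(2)
axis(1, at = x_ticks, labels = x_ticks / 1000, cex.axis = 1.5)
axis(2, at = y_ticks, labels = y_ticks / 1000, cex.axis = 1.5)

# The regression line can stay in the original units as well
abline(lm(measured ~ fitted, data = dat))
```

Because the plotted coordinates are unchanged, anything drawn afterwards (points, lines, abline) keeps using the original micrometer values.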

Store values in a cell dataframe

I am trying to store data in multiple cells of a data frame, but my code only stores it in the last cell (the last element of dd). Please see my output below.
Can somebody please correct me? I cannot figure out what I am doing wrong.
Thanks in advance,
MyData <- read.csv(file="Pat_AR_035.csv", header=TRUE, sep=",")
dd <- unique(MyData$POLICY_NUM)
for (j in length(dd)) {
myDF <- data.frame(i=1:length(dd), m=I(vector('list', length(dd))))
myDF$m[[j]] <- data.frame(j,MyData[which(MyData$POLICY_NUM==dd[j] & MyData$ACRES), ],ncol(MyData),nrow(MyData))
}
[[60]]
NULL
[[61]]
NULL
[[62]]
NULL
[[63]]
j OBJECTID DIVISION POLICY_SYM POLICY_NUM YIELD_ID LINE_ID RH_CLU_ID ACRES PLANT_DATE ACRE_TYPE CLU_DETERM STATE COUNTY FARM_SERIA TRACT
1646 63 1646 8 MP 754033 3 20 39565604 8.56 5/3/2014 PL A 3 35 109 852
1647 63 1647 8 MP 754033 1 10 39565605 30.07 4/19/2014 PL A 3 35 109 852
1648 63 1648 8 MP 754033 1 10 39565606 56.59 4/19/2014 PL A 3 35 109 852
CLU_NUMBER FIELD_ACRE RMA_CLU_ID UPDATE_DAT Percent_Ar RHCLUID Field1 OBJECTID_1 DIVISION_1 STATE_1 COUNTY_1
1646 3 8.56 F68E591A-ECC2-470B-A012-201C3BB20D7F 9/21/2014 63.4990 39565604 1646 1646 8 3 35
1647 1 30.07 eb04cfc0-e78b-415f-b447-9595c81ef09e 9/21/2014 100.0000 39565605 1647 1647 8 3 35
1648 2 56.59 5922d604-e31c-4b9d-b846-9f38e2d18abe 9/21/2014 92.1442 39565606 1648 1648 8 3 35
POLICY_N_1 YIELD_ID_1 RH_CLU_ID_ short_dist coords_x1 coords_x2 optional SHAPE_Leng SHAPE_Area ncol.MyData. nrow.MyData.
1646 754033 3 39565604 5.110837 516747.8 -221751.4 TRUE 831.3702 34634.73 35 1757
1647 754033 1 39565605 5.606284 515932.1 -221702.0 TRUE 1469.4800 121611.46 35 1757
1648 754033 1 39565606 5.325399 516380.1 -221640.9 TRUE 1982.8757 228832.22 35 1757
for (j in length(dd))
This doesn’t iterate over dd — it iterates over a single number: the length of dd. Not much of an iteration. You probably meant to write the following or something similar:
for (j in seq_along(dd))
However, there are more issues with your code. For instance, the myDF variable is continuously overwritten inside your loop, which probably isn’t what you intended at all. Instead, you should probably create objects in an lapply statement and forego the loop.
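A minimal sketch of the lapply approach, using a small made-up data frame in place of Pat_AR_035.csv (the values are hypothetical; only the column names POLICY_NUM and ACRES come from the question):

```r
# Hypothetical stand-in for the CSV in the question
MyData <- data.frame(POLICY_NUM = c(754033, 754033, 754040, 754051),
                     ACRES      = c(8.56, 30.07, 0, 12.5))

dd <- unique(MyData$POLICY_NUM)

length(dd)     # a single number, so for (j in length(dd)) runs once
seq_along(dd)  # the indices 1, 2, 3, one per policy

# One data frame per policy, built in a single pass.
# ACRES != 0 mirrors the bare '& MyData$ACRES' condition in the original loop.
rows_by_policy <- lapply(dd, function(p) {
  MyData[MyData$POLICY_NUM == p & MyData$ACRES != 0, ]
})
names(rows_by_policy) <- dd

rows_by_policy[["754033"]]
```

Nothing is overwritten between iterations here, because each list element is created by its own function call.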

Selecting pairs of odd even values in R

I have a large dataset as follows:
head(humic)
SUERC.No GU.Number d13.C Age.(BP) error Batch.Number AMS.USED Year Type Sampletype
400 32691 535 -28 3382.981 34.74480 1 S3 2011 2 ha
401 32701 536 -28 3375.263 34.86087 1 S3 2011 2 ha
402 32711 537 -28 3308.103 34.83100 1 S3 2011 2 ha
403 32721 538 -28 3368.721 31.58641 1 S3 2011 2 ha
404 32731 539 -28 3368.604 34.72326 1 S3 2011 2 ha
405 32741 540 -28 3314.713 32.83147 1 S3 2011 2 ha
tail(humic)
SUERC.No GU.Number d13.C Age.(BP) error Batch.Number AMS.USED Year Type Sampletype
5445 70880 3962 -28.4 3390.458 29.12815 34 S4 2016 2 ha
5446 70890 3963 -28.5 3358.861 37.14896 34 S4 2016 2 ha
5447 70900 3964 -28.5 3363.626 26.71573 34 S4 2016 2 ha
5448 70910 3965 -28.5 3408.907 26.69665 34 S4 2016 2 ha
5449 70920 3966 -28.5 3348.463 29.01492 34 S4 2016 2 ha
5450 70930 3967 -28.4 3375.247 26.78261 34 S4 2016 2 ha
I am looking to create a variable that identifies odd-even pairs based on the variable GU.Number. Paired numbers identify duplicates of the same object, which have the same d13.C values.
For example,
535 - 536
537 - 538
3963-3964
3965-3966 are pairs.
Note that GU.Number is not a complete sequence; some numbers are missing.
even.rows <- which(!(humic$GU.Number %% 2))
has.pair <- rep(0,nrow(humic))
for(i in even.rows){
has.pair[i] <- max((humic$GU.Number[i] + c(1,-1)) %in% humic$GU.Number)
}
# add as column of data
humic$has.pair <- has.pair
The has.pair column will be 1 if the GU.Number is even and there exists an odd GU.Number one less or one greater than the given GU.Number. Otherwise it will be 0. As a one-liner:
humic$has.pair <- sapply(1:nrow(humic),
function(x) with(humic,(!(GU.Number[x] %% 2))*max((GU.Number[x] + c(1,-1)) %in% GU.Number)))
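The same flag can also be computed without a loop or sapply. The sketch below uses a handful of made-up GU.Number values; the pair.id column is an extra that is not part of the answer above, and it assumes every pair is an odd number followed by the next even number, as in the examples in the question.

```r
# Made-up GU.Number values with some gaps
humic <- data.frame(GU.Number = c(535, 536, 537, 538, 540, 3963, 3964, 3966))

n <- humic$GU.Number
is_even       <- n %% 2 == 0
has_neighbour <- (n + 1) %in% n | (n - 1) %in% n

# 1 if the number is even and n - 1 or n + 1 is present, else 0
humic$has.pair <- as.integer(is_even & has_neighbour)

# Assumption: pairs are (odd, odd + 1), so both members share ceiling(n / 2)
humic$pair.id <- ceiling(n / 2)

humic
```

For these values has.pair is 0 1 0 1 0 0 1 0, matching what the loop version produces on the same input.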

How to sum a variable and extract its related variables in the result

I am new to R and to Stack Overflow, and I need assistance sorting and extracting information from a data frame I created. I need to find which IATA and NAME received the most commission. The result should print: 3301, you pay, 12. I can subset each IATA individually, but that is a long process. What is the best function in R to sort this information and print the result?
IATA NAME TICKET_NUM PAX FARE TAX COMM NET
3300 pay more 700 john cohen 10 1.1 2 8
3300 pay more 701 james levy 11 1.2 2 9
3300 pay more 702 jonathan arbel 12 1.2 3 9
3300 pay more 703 gil matan 9 1.0 2 7
3301 you pay 704 ron natan 19 2.0 6 9
3301 you pay 705 don horvitz 18 2.0 6 9
3302 pay by ticket 706 lutter kaplan 9 1.2 0 9
3303 enjoy 707 lutter omega 12 1.2 0 12
3303 enjoy 708 graig daniel 14 1.3 1 13
3303 enjoy 730 orly rotenberg 15 1.0 1 14
3303 enjoy 731 yohan bach 12 1.0 1 11
This seems to return what you requested (using Jeremy's code for the second part):
comm <- read.table(text = '
IATA NAME TICKET_NUM PAX FARE TAX COMM NET
3300 pay.more 700 john.cohen 10 1.1 2 8
3300 pay.more 701 james.levy 11 1.2 2 9
3300 pay.more 702 jonathan.arbel 12 1.2 3 9
3300 pay.more 703 gil.matan 9 1.0 2 7
3301 you.pay 704 ron.natan 19 2.0 6 9
3301 you.pay 705 don.horvitz 18 2.0 6 9
3302 pay.by.ticket 706 lutter.kaplan 9 1.2 0 9
3303 enjoy 707 lutter.omega 12 1.2 0 12
3303 enjoy 708 graig.daniel 14 1.3 1 13
3303 enjoy 730 orly.rotenberg 15 1.0 1 14
3303 enjoy 731 yohan.bach 12 1.0 1 11
', header=TRUE, stringsAsFactors = FALSE)
comm2 <- with(comm, aggregate(COMM ~ IATA + NAME, FUN = function(x) sum(x, na.rm = TRUE)))
comm2
max_comm <- comm2[comm2$COMM == max(comm2$COMM),]
max_comm
IATA NAME COMM
4 3301 you.pay 12
Here is an explanation of the first statement:
The with function identifies the data set to use (here comm). The function aggregate is a general function for performing operations on groups. You want to operate on COMM by IATA and NAME. You write that: COMM ~ IATA + NAME. Next you specify the desired function to perform on COMM (here sum). You do that with FUN = function(x) sum(x). In case there are any missing observations in COMM I added na.rm = TRUE within the sum(x) function.
Call that table comm
max_comm <- comm[comm$COMM == max(comm$COMM),]
or just sort it and look at the head
head(comm[order(-comm$COMM),])
Edit: If you want to sum by IATA first then use data.table
library(data.table)
comm2 <- data.table(comm)
sum_comm <- comm2[, list(COMM_SUM=sum(COMM)), by = c("IATA","NAME")]
data.table has an unusual syntax; you could also try dplyr, which is supposed to be roughly as good as data.table now.
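For comparison, a sketch of the same grouped sum in dplyr, assuming dplyr 1.0.0 or later is installed. Only the IATA, NAME and COMM columns from the question are reproduced, and the column name COMM_SUM follows the data.table example.

```r
library(dplyr)

# IATA, NAME and COMM columns from the question
comm <- data.frame(
  IATA = c(3300, 3300, 3300, 3300, 3301, 3301, 3302, 3303, 3303, 3303, 3303),
  NAME = c(rep("pay more", 4), rep("you pay", 2), "pay by ticket",
           rep("enjoy", 4)),
  COMM = c(2, 2, 3, 2, 6, 6, 0, 0, 1, 1, 1)
)

max_comm <- comm %>%
  group_by(IATA, NAME) %>%                                          # one group per agency
  summarise(COMM_SUM = sum(COMM, na.rm = TRUE), .groups = "drop") %>%
  filter(COMM_SUM == max(COMM_SUM))                                 # keep the top total

max_comm
```

Filtering on the maximum, rather than taking the first sorted row, keeps every agency in the result if several tie for the highest commission.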
