I am working on a Shiny app and want to convert an entire data set to numeric form. I use the code below to load a file from my local PC. What can I change so that the whole data set is converted to numeric while it is being read in?
datami <- reactive({
  file1 <- input$file
  if (is.null(file1)) { return() }
  read.csv(file = file1$datapath, sep = input$sep, header = input$header, stringsAsFactors = input$stringAsFactors)
})
output$table <- renderPrint({
  if (is.null(datami())) { return() }
  str(datami())
})
tabsetPanel(tabPanel("Data", div(h5("Data", style = "color:red")), verbatimTextOutput("table")))
Depending on how you want to deal with lower/uppercase letters (if you have them in your data) we could do one of the following:
MRE:
letter_variable <- c(letters, LETTERS)
Same numeric value for upper and lower case letters:
letter_variable_as_numeric1 <- as.numeric(factor(toupper(letter_variable), levels = LETTERS))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
[22] 22 23 24 25 26 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
[43] 17 18 19 20 21 22 23 24 25 26
Different numeric value for upper and lower case letters:
letter_variable_as_numeric2 <- as.numeric(factor(letter_variable, levels = c(letters, LETTERS)))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
[22] 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
[43] 43 44 45 46 47 48 49 50 51 52
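Applied to a whole data set, as asked in the question, the same idea could look like the sketch below. The helper name `to_numeric_df` and the sample data frame are made up for illustration. Columns that contain text are mapped to the integer codes of their distinct values, which is only one possible choice.

```r
# Hypothetical helper: coerce every column of a data frame to numeric.
to_numeric_df <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) return(col)
    as_chr <- as.character(col)
    parsed <- suppressWarnings(as.numeric(as_chr))
    # Keep the parsed numbers if every non-missing entry was a valid number
    if (all(is.na(parsed) == is.na(as_chr))) return(parsed)
    # Otherwise fall back to integer codes of the distinct values
    as.numeric(factor(as_chr))
  })
  df
}

# Made-up example data: one column of number-like text, one of labels
example_df <- data.frame(a = c("1.5", "2", "3"),
                         b = c("x", "y", "x"),
                         stringsAsFactors = TRUE)
str(to_numeric_df(example_df))
```

Inside the reactive, the result of read.csv() could be passed through such a helper before it is returned.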
I have a very big dataset which is a data frame containing Date/Time in 1 column and closing price in the next.
I am using the following code:
read.zoo(df,tz="GMT",format = "%d.%m.%Y %H:%M")
but this is shown:
Error in read.zoo(df, tz = "GMT", format = "%d.%m.%Y %H:%M") :
index has 28290 bad entries at data rows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42...
What should I do?
The image is part of my dataframe
I want to convert this variable into numeric, as you can see:
> class(DATA$estimate)
[1] "factor"
> head(DATA$estimate)
[1] 0,253001909 0,006235543 0,005285019 0,009080499 6,580140903 0,603060006
57 Levels: 0,000263863 0,000634365 0,004405696 0,005285019 0,006235543 0,009080499 0,009700147 0,018568434 0,253001909 ... 7,790580873
>
But when I want to convert, look what I have got
> DATA$estimate<-as.numeric(DATA$estimate)
> DATA$estimate
[1] 9 5 4 6 51 12 3 53 11 8 1 7 15 27 30 29 28 31 21 23 22 39 38 37 33 26 34 52 57 50 24 18 20 10 2 55 54 56 36 32 35 44 46
[44] 48 19 25 16 43 41 40 49 42 47 14 17 13 45
It's not numeric and I don't understand how the program gives these numbers!
data:
fac <- factor(c("0,253001909" ,"0,006235543" ,"0,005285019" ,"0,009080499" ,"6,580140903" ,"0,603060006"))
I convert to character, then turn the "," into ".", then convert to numeric.
as.numeric(sub(",",".",as.character(fac)))
In your case it's:
DATA$estimate<-as.numeric(sub(",",".",as.character(DATA$estimate)))
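The numbers in the question are not random: as.numeric() applied directly to a factor returns the internal integer codes, that is, the position of each value in levels(). A small sketch with made-up values shows the difference:

```r
# Made-up values, stored as a factor
f <- factor(c("10", "5", "20"))

levels(f)                     # levels are sorted as text: "10" "20" "5"
as.numeric(f)                 # positions in levels(f), not the values
as.numeric(as.character(f))   # the actual values
as.numeric(levels(f))[f]      # same values, converting each level only once
```

This is why the conversion has to go through as.character() (or levels()) first.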
You can also scan() your factor variable and specify "," as the decimal separator:
fac <- factor(c("0,253001909" ,"0,006235543" ,"0,005285019" ,"0,009080499" ,
"6,580140903" ,"0,603060006"))
scan(text = as.character(fac), dec = ",")
#output
[1] 0.253001909 0.006235543 0.005285019 0.009080499 6.580140903
[6] 0.603060006
I have the following data frame with the name dataValues:
dates hours
1 2015-10-12 1
5 2015-10-12 5
9 2015-10-12 9
11 2015-10-12 11
14 2015-10-12 14
15 2015-10-12 15
17 2015-10-12 17
19 2015-10-12 19
22 2015-10-12 22
23 2015-10-12 23
24 2015-10-12 24
27 2015-10-13 3
29 2015-10-13 5
33 2015-10-13 9
36 2015-10-13 12
37 2015-10-13 13
38 2015-10-13 14
40 2015-10-13 16
42 2015-10-13 18
44 2015-10-13 20
45 2015-10-13 21
46 2015-10-13 22
47 2015-10-13 23
49 2015-10-14 1
54 2015-10-14 6
56 2015-10-14 8
59 2015-10-14 11
60 2015-10-14 12
61 2015-10-14 13
63 2015-10-14 15
64 2015-10-14 16
66 2015-10-14 18
69 2015-10-14 21
71 2015-10-14 23
72 2015-10-14 24
I have preprocessed this data frame to get all hours on a certain day, which is variable totallist and has output:
[[1]]
[1] 1 5 9 11 14 15 17 19 22 23 24
[[2]]
[1] 3 5 9 12 13 14 16 18 20 21 22 23
[[3]]
[1] 1 6 8 11 12 13 15 16 18 21 23 24
The code I used for this is the following:
uniqueDates <- unique(dataValues$dates)
totallist <- {}
for(date in uniqueDates){
  templist <- {}
  for(i in 1:length(dataValues$dates)){
    if(dataValues$dates[i]==date){
      # collect the hours that belong to the current date
      templist <- append(templist,dataValues$hours[i])
    }
  }
  totallist <- append(totallist,list(templist))
}
For the example in this question (with 3 days) it works fine and the result is what I want, but if I use this on a large dataset (which has about 260 days), it takes about 6 to 7 minutes to finish.
Is there a more efficient way to do what I want?
Try any of these:
# 1
with(unique(dataValues), split(hours, dates))
# 1a - variation of last solution
with(dataValues, lapply(split(hours, dates), unique))
# 2
unstack(unique(dataValues), hours ~ dates)
# 2a - variation of last solution
lapply(unstack(dataValues, hours ~ dates), unique)
Note that if the data values are known to be unique already, as is the case in the sample data shown in the question, then unique(dataValues) in #1 and #2 could be replaced with just dataValues.
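To see what split() returns, here is a minimal sketch on a made-up five-row version of dataValues (the values are chosen for illustration only):

```r
# Made-up miniature of the data frame from the question
dataValues <- data.frame(
  dates = as.Date(c("2015-10-12", "2015-10-12",
                    "2015-10-13", "2015-10-13", "2015-10-13")),
  hours = c(1, 5, 3, 5, 9)
)

# One list element per date, named after the date
hours_by_date <- split(dataValues$hours, dataValues$dates)
hours_by_date[["2015-10-13"]]

# Drop the names to get the same shape as totallist in the question
unname(hours_by_date)
```

split() groups the values in a single vectorised pass, so it avoids the nested loop over dates and rows.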
I believe you would be better off using the tapply function. I've created a simpler data frame just to show what it does:
df <- data.frame(dates=rep(c("2015-01-02","2015-01-03","2015-01-04"),10),hours=trunc(runif(30,1,10)))
tapply(df$hours,df$dates,unique)
Output:
$`2015-01-02`
[1] 2 8 6 1 5
$`2015-01-03`
[1] 7 5 2 3
$`2015-01-04`
[1] 1 2 6 5 8 4 9
This shouldn't be too hard, but I always have issues when trying to run calculations on a column in a data frame that depend on the value of another column in the same data frame. Here is my data.frame:
stream reach length.km length.m total.sa pools.sa
1 Stream Reach_Code 109 109 1 1
2 Brooks BRK_001 17 14 108 13
3 Brooks BRK_002 15 12 99 9
4 Brooks BRK_003 24 21 94 95
5 Brooks BRK_004 32 29 97 33
6 Brooks BRK_005 27 24 92 79
7 Brooks BRK_006 26 23 95 6
8 Brooks BRK_007 16 13 77 15
9 Brooks BRK_008 29 26 84 26
10 Brooks BRK_009 18 15 87 46
11 Brooks BRK_010 23 20 88 47
12 Brooks BRK_011 22 19 91 40
13 Brooks BRK_012 30 27 98 37
14 Brooks BRK_013 25 22 93 29
19 Buncombe_Hollow BNH_0001 7 4 75 65
20 Buncombe_Hollow BNH_0002 8 5 66 21
21 Buncombe_Hollow BNH_0003 9 6 68 53
22 Buncombe_Hollow BNH_0004 19 16 81 11
23 Buncombe_Hollow BNH_0005 6 3 65 27
24 Buncombe_Hollow BNH_0006 13 10 63 23
25 Buncombe_Hollow BNH_0007 12 9 71 57
I would like to calculate the mean of a column (let's say length.m) where stream is Brooks, and then do the same thing where stream is Buncombe_Hollow. I actually have 17 different stream names, and plan on calculating the mean of some column for each stream. I will then store these means as a vector and bind them to another vector of the stream names, so the end result is something like this:
stream truevalue
1 Brooks 0.9440620
2 Siouxon 0.5858527
3 Speelyai 0.5839844
Thanks!
Try using aggregate:
# Generate some data to use
someDf <- data.frame(stream = rep(c("Brooks", "Buncombe_Hollow"), each = 10),
length.m = rpois(20, 4))
# Calculate the means with aggregate
with(someDf, aggregate(list(truevalue = length.m), list(stream = stream), mean))
The reason for the "list" bits is to explicitly name the columns in the (data frame) output.
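aggregate() also has a formula interface, which some find easier to read. A minimal sketch on made-up numbers (the values below are illustrative, not taken from the question's data):

```r
# Made-up data: two streams with three reaches each
someDf <- data.frame(stream = rep(c("Brooks", "Buncombe_Hollow"), each = 3),
                     length.m = c(14, 12, 22, 4, 5, 6))

# Mean of length.m for each stream
means <- aggregate(length.m ~ stream, data = someDf, FUN = mean)

# Rename the value column to match the desired output
names(means)[2] <- "truevalue"
means
```

With this interface the output columns take their names from the formula, so the value column is renamed afterwards.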
Start using the dplyr package. It makes such calculations quick as well as very easy to write:
library(dplyr)
result <- data %>% group_by(stream) %>% summarize(truevalue = mean(length.m))
I am conducting a network meta-analysis in R with two packages, gemtc and rjags. However, when I type
Model <- mtc.model(network, linearModel="fixed")
R always returns:
Error in `[.data.frame`(data, sel1 | sel2, columns, drop = FALSE) :
  undefined columns selected
In addition: Warning messages:
1: In mtc.model(network, linearModel = "fixed") :
  Likelihood can not be inferred. Defaulting to normal.
2: In mtc.model(network, linearModel = "fixed") :
  Link can not be inferred. Defaulting to identity
How can I fix this problem? Thanks!
I am attaching my code and data here:
SAE <- read.csv(file.choose(),head=T, sep=",")
head(SAE)
network <- mtc.network(data.ab=SAE)
summary(network)
plot(network)
model.fe <- mtc.model (network, linearModel="fixed")
plot(model.fe)
summary(model.fe)
cat(model.fe$code)
model.fe$data
# run this model
result.fe <- mtc.run(model.fe, n.adapt=0, n.iter=50)
plot(result.fe)
gelman.diag(result.fe)
result.fe <- mtc.run(model.fe, n.adapt=1000, n.iter=5000)
plot(result.fe)
gelman.diag(result.fe)
The following is my data, SAE:
study treatment responder sample.size
1 1 3 0 76
2 1 30 2 72
3 2 3 99 1389
4 2 23 132 1383
5 3 1 6 352
6 3 30 2 178
7 4 2 6 106
8 4 30 3 95
9 5 3 49 393
10 5 25 18 198
11 6 1 20 65
12 6 22 10 26
13 7 1 1 76
14 7 30 3 76
15 8 3 7 441
16 8 26 1 220
17 9 2 1 47
18 9 30 0 41
19 10 3 10 156
20 10 30 9 150
21 11 1 4 85
22 11 25 5 85
23 11 30 4 84
24 12 3 6 152
25 12 30 5 160
26 13 18 4 158
27 13 21 8 158
28 14 1 3 110
29 14 30 2 111
30 15 3 3 83
31 15 30 1 92
32 16 1 3 124
33 16 22 6 123
34 16 30 4 125
35 17 3 236 1553
36 17 23 254 1546
37 18 6 5 398
38 18 7 6 403
39 19 1 64 588
40 19 22 73 584
How about reading the manual ?mtc.model. It clearly states the following:
Required columns [responders, sampleSize]
So your responder variable should be responders and your sample.size variable should be sampleSize.
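Renaming the two columns can be done in base R. A minimal sketch on the first two rows of the data from the question:

```r
# First two rows of SAE, as posted in the question
SAE <- data.frame(study = c(1, 1), treatment = c(3, 30),
                  responder = c(0, 2), sample.size = c(76, 72))

# Rename the columns to the names stated in the manual
names(SAE)[names(SAE) == "responder"] <- "responders"
names(SAE)[names(SAE) == "sample.size"] <- "sampleSize"
names(SAE)
```

The same two assignments work on the full data frame right after read.csv().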
Next, your plot(network) should help you see that some comparisons cannot be made. In your data there are disconnected subgroups of trials: treatments 18 and 21 were compared only with each other (study 13), and the same holds for treatments 6 and 7 (study 18). Therefore you can only do a pairwise meta-analysis within each of those pairs, or a network meta-analysis of the rest.
network <- mtc.network(data.ab=SAE[!SAE$treatment %in% c(6, 7, 18, 21), ])
model.fe <- mtc.model(network, linearModel="fixed")
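Which treatments hang together can also be checked directly from the data, without relying on the plot. The sketch below uses base R only; `treatment_components` is a hypothetical helper written for this illustration (it is not part of gemtc), and the toy data are four studies copied from the question.

```r
# Hypothetical helper: give every treatment the label of the connected
# group it belongs to (treatments linked through shared studies).
treatment_components <- function(study, treatment) {
  comp <- sort(unique(treatment))
  names(comp) <- comp
  repeat {
    changed <- FALSE
    for (arms in split(treatment, study)) {
      key <- as.character(arms)
      low <- min(comp[key])
      # All arms of one study must end up with the same (smallest) label
      if (any(comp[key] != low)) {
        comp[key] <- low
        changed <- TRUE
      }
    }
    if (!changed) break
  }
  comp
}

# Studies 1, 3, 13 and 18 from the question
study     <- c(1, 1, 3, 3, 13, 13, 18, 18)
treatment <- c(3, 30, 1, 30, 18, 21, 6, 7)

comps <- treatment_components(study, treatment)
split(names(comps), comps)   # one list element per connected group
```

Treatments that share a group can be analysed in one network; any group that is separate from the main one has to be dropped or analysed on its own.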