ifelse dplyr showing wrong output - r

I want to create a new column that takes the minimum of three columns and then adds to or subtracts from it, depending on a condition.
I have the following data frame called df:
a b c
1 0.60 0.27 0.14
2 0.48 0.32 0.21
3 0.42 0.24 0.35
4 0.28 0.33 0.41
5 0.52 0.28 0.22
6 0.34 0.30 0.37
7 0.38 0.28 0.35
8 0.34 0.28 0.40
9 0.53 0.26 0.22
10 0.17 0.27 0.58
11 0.34 0.35 0.33
12 0.19 0.27 0.56
13 0.56 0.29 0.17
14 0.55 0.28 0.19
15 0.29 0.24 0.48
16 0.23 0.31 0.47
17 0.40 0.32 0.28
18 0.50 0.27 0.24
19 0.45 0.28 0.27
20 0.68 0.26 0.05
21 0.40 0.32 0.28
22 0.23 0.26 0.50
23 0.46 0.33 0.20
24 0.46 0.24 0.28
25 0.44 0.24 0.31
26 0.46 0.26 0.27
27 0.30 0.29 0.40
28 0.45 0.20 0.34
29 0.53 0.27 0.20
30 0.33 0.34 0.33
31 0.20 0.26 0.55
32 0.65 0.29 0.06
33 0.45 0.24 0.32
34 0.30 0.26 0.45
35 0.20 0.36 0.45
36 0.38 0.16 0.38
Every row must sum to 1, but as you can see, only some of them satisfy that condition.
df_total <- rowSums(df[c("a", "b", "c")])
print(df_total)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
1.01 1.01 1.01 1.02 1.02 1.01 1.01 1.02 1.01 1.02 1.02 1.02 1.02 1.02 1.01 1.01 1.00 1.01 1.00
20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
0.99 1.00 0.99 0.99 0.98 0.99 0.99 0.99 0.99 1.00 1.00 1.01 1.00 1.01 1.01 1.01 0.92
So, for example, in row 36 of df, I need to add to the lowest value (which is 0.16) the amount that makes a, b and c sum to 1.
I guess there's an easier way to do this, but this is the code I have so far and it doesn't work. Why?
df_total <- rowSums(df[c("a", "b", "c")])
df_for_sum <- df_total[df_total > 1] - 1 #The ones which are above 1
df_for_minus <- -(df_total[df_total < 1]) + 1 #The ones which are below 1
equal_to_100 <- df_total[df_total == 1] #The ones which are ok
df <- df %>%
mutate(d = ifelse(rowSums(df[c("a","b","c")]) > 1,
apply(df[rowSums(df[c("a","b","c")]) > 1,], 1, min) - df_for_sum,
ifelse(rowSums(df[c("a","b","c")]) < 1,
apply(df[rowSums(df[c("a","b","c")]) < 1,], 1, min) + df_for_minus,
ifelse(rowSums(df[c("a","b","c")]) == 1,
apply(df[rowSums(df[c("a","b","c")]) == 1,], 1, min), ""))))
And this is the output:
a b c d
1 0.60 0.27 0.14 0.13
2 0.48 0.32 0.21 0.2
3 0.42 0.24 0.35 0.23
4 0.28 0.33 0.41 0.26
5 0.52 0.28 0.22 0.2
6 0.34 0.30 0.37 0.29
7 0.38 0.28 0.35 0.27
8 0.34 0.28 0.40 0.26
9 0.53 0.26 0.22 0.21
10 0.17 0.27 0.58 0.15
11 0.34 0.35 0.33 0.31
12 0.19 0.27 0.56 0.17
13 0.56 0.29 0.17 0.15
14 0.55 0.28 0.19 0.17
15 0.29 0.24 0.48 0.23
16 0.23 0.31 0.47 0.22
17 0.40 0.32 0.28 0.33 #From here til the end it's wrong!
18 0.50 0.27 0.24 0.19
19 0.45 0.28 0.27 0.28
20 0.68 0.26 0.05 0.24
21 0.40 0.32 0.28 0.28
22 0.23 0.26 0.50 0.26
23 0.46 0.33 0.20 0.25
24 0.46 0.24 0.28 0.27
25 0.44 0.24 0.31 0.3
26 0.46 0.26 0.27 0.21
27 0.30 0.29 0.40 0.24
28 0.45 0.20 0.34 0.0599999999999999
29 0.53 0.27 0.20 0.33
30 0.33 0.34 0.33 0.06
31 0.20 0.26 0.55 0.15
32 0.65 0.29 0.06 0.27
33 0.45 0.24 0.32 0.17
34 0.30 0.26 0.45 0.15
35 0.20 0.36 0.45 0.17
36 0.38 0.16 0.38 0.24
Any thoughts? Any easier way?

You want to calculate the excess difference first:
diff <- 1 - rowSums(df)
then add that to the minimum:
df$d <- apply(df, 1, min) + diff
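A minimal sketch of the same idea on three rows taken from the question's data. It assumes pmin (a vectorized row-wise minimum) as a stand-in for apply(df, 1, min), and the name excess is chosen here for illustration so that base R's diff function is not masked:

```r
# Three rows from the question: one sums to 1.01, one to 1.00, one to 0.92
df <- data.frame(
  a = c(0.60, 0.40, 0.38),
  b = c(0.27, 0.32, 0.16),
  c = c(0.14, 0.28, 0.38)
)

# Amount each row is short of (positive) or over (negative) a total of 1
excess <- 1 - rowSums(df[c("a", "b", "c")])

# Row-wise minimum of the three columns, adjusted by that amount
df$d <- pmin(df$a, df$b, df$c) + excess

print(df)
```

Because both pmin and rowSums return one value per row, no ifelse is needed: a negative excess subtracts, a positive one adds, and zero leaves the minimum unchanged.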

Here's how to do that without ifelse in dplyr:
library(dplyr)
df2 <- df1 %>%
  mutate(difference = 1 - rowSums(.)) %>%
  rowwise() %>%
  mutate(d = min(c(a, b, c)) + difference)
df2
a b c difference d
(dbl) (dbl) (dbl) (dbl) (dbl)
1 0.60 0.27 0.14 -0.01 0.13
2 0.48 0.32 0.21 -0.01 0.20
3 0.42 0.24 0.35 -0.01 0.23
4 0.28 0.33 0.41 -0.02 0.26
5 0.52 0.28 0.22 -0.02 0.20
6 0.34 0.30 0.37 -0.01 0.29
7 0.38 0.28 0.35 -0.01 0.27
8 0.34 0.28 0.40 -0.02 0.26
9 0.53 0.26 0.22 -0.01 0.21
10 0.17 0.27 0.58 -0.02 0.15
11 0.34 0.35 0.33 -0.02 0.31
12 0.19 0.27 0.56 -0.02 0.17
13 0.56 0.29 0.17 -0.02 0.15
14 0.55 0.28 0.19 -0.02 0.17
15 0.29 0.24 0.48 -0.01 0.23
16 0.23 0.31 0.47 -0.01 0.22
17 0.40 0.32 0.28 0.00 0.28
18 0.50 0.27 0.24 -0.01 0.23
19 0.45 0.28 0.27 0.00 0.27
20 0.68 0.26 0.05 0.01 0.06
21 0.40 0.32 0.28 0.00 0.28
22 0.23 0.26 0.50 0.01 0.24
23 0.46 0.33 0.20 0.01 0.21
24 0.46 0.24 0.28 0.02 0.26
25 0.44 0.24 0.31 0.01 0.25
26 0.46 0.26 0.27 0.01 0.27
27 0.30 0.29 0.40 0.01 0.30
28 0.45 0.20 0.34 0.01 0.21
29 0.53 0.27 0.20 0.00 0.20
30 0.33 0.34 0.33 0.00 0.33
31 0.20 0.26 0.55 -0.01 0.19
32 0.65 0.29 0.06 0.00 0.06
33 0.45 0.24 0.32 -0.01 0.23
34 0.30 0.26 0.45 -0.01 0.25
35 0.20 0.36 0.45 -0.01 0.19
36 0.38 0.16 0.38 0.08 0.24
Data:
df1 <-read.table(text="a b c
0.6 0.27 0.14
0.48 0.32 0.21
0.42 0.24 0.35
0.28 0.33 0.41
0.52 0.28 0.22
0.34 0.3 0.37
0.38 0.28 0.35
0.34 0.28 0.4
0.53 0.26 0.22
0.17 0.27 0.58
0.34 0.35 0.33
0.19 0.27 0.56
0.56 0.29 0.17
0.55 0.28 0.19
0.29 0.24 0.48
0.23 0.31 0.47
0.4 0.32 0.28
0.5 0.27 0.24
0.45 0.28 0.27
0.68 0.26 0.05
0.4 0.32 0.28
0.23 0.26 0.5
0.46 0.33 0.2
0.46 0.24 0.28
0.44 0.24 0.31
0.46 0.26 0.27
0.3 0.29 0.4
0.45 0.2 0.34
0.53 0.27 0.2
0.33 0.34 0.33
0.2 0.26 0.55
0.65 0.29 0.06
0.45 0.24 0.32
0.3 0.26 0.45
0.2 0.36 0.45
0.38 0.16 0.38",header=TRUE,stringsAsFactors=FALSE)

Related

Julia plot applies which function for colors?

When using the Plots.plot function in the case below, apparently the matrix m is taken as the colors, but also is not shown on every point. There seems to be a function applied to the values before it is displayed.
The matrix m has no value at 0 nor 1, but the image shows a lot of white areas... It seems to work out some kind of levels...
How can I find out which function is used?
For example I would like to be able to use the data after that transformation.
# I define a matrix of 21 x 21 pixels
m = Float32[0.22 0.24 0.24 0.26 0.3 0.33 0.33 0.36 0.42 0.4 0.38 0.39 0.42 0.44 0.49 0.53 0.54 0.55 0.56 0.56 0.56; 0.23 0.24 0.25 0.29 0.32 0.36 0.39 0.41 0.44 0.42 0.41 0.44 0.45 0.46 0.54 0.59 0.61 0.61 0.59 0.58 0.58; 0.26 0.26 0.27 0.33 0.36 0.4 0.41 0.44 0.48 0.49 0.46 0.48 0.46 0.48 0.54 0.56 0.58 0.62 0.6 0.6 0.59; 0.27 0.28 0.32 0.36 0.4 0.41 0.44 0.46 0.47 0.47 0.46 0.48 0.46 0.5 0.54 0.57 0.56 0.61 0.6 0.58 0.57; 0.19 0.2 0.25 0.32 0.39 0.42 0.47 0.47 0.47 0.46 0.45 0.47 0.46 0.5 0.56 0.58 0.57 0.62 0.6 0.59 0.58; 0.2 0.2 0.24 0.32 0.34 0.36 0.39 0.42 0.47 0.48 0.46 0.47 0.45 0.49 0.56 0.57 0.6 0.63 0.58 0.59 0.59; 0.21 0.2 0.27 0.34 0.35 0.35 0.36 0.37 0.39 0.45 0.46 0.47 0.45 0.48 0.56 0.62 0.62 0.61 0.58 0.58 0.58; 0.23 0.24 0.31 0.35 0.36 0.38 0.37 0.38 0.38 0.4 0.44 0.45 0.47 0.47 0.55 0.64 0.58 0.58 0.58 0.58 0.57; 0.22 0.28 0.35 0.36 0.37 0.38 0.39 0.39 0.4 0.42 0.43 0.43 0.45 0.47 0.53 0.57 0.56 0.56 0.57 0.57 0.57; 0.21 0.28 0.34 0.36 0.37 0.38 0.39 0.41 0.42 0.42 0.44 0.45 0.45 0.49 0.51 0.54 0.56 0.56 0.56 0.56 0.56; 0.22 0.27 0.31 0.32 0.34 0.37 0.39 0.39 0.39 0.43 0.46 0.46 0.49 0.51 0.51 0.54 0.55 0.56 0.56 0.55 0.55; 0.23 0.27 0.3 0.31 0.33 0.35 0.38 0.38 0.39 0.44 0.46 0.49 0.51 0.5 0.52 0.53 0.55 0.56 0.56 0.56 0.56; 0.23 0.27 0.31 0.34 0.36 0.36 0.38 0.4 0.42 0.45 0.45 0.49 0.51 0.52 0.55 0.55 0.56 0.58 0.58 0.57 0.58; 0.26 0.32 0.35 0.36 0.36 0.36 0.4 0.44 0.47 0.47 0.47 0.49 0.51 0.54 0.56 0.58 0.58 0.58 0.59 0.58 0.57; 0.3 0.33 0.35 0.35 0.36 0.37 0.4 0.47 0.5 0.47 0.46 0.47 0.5 0.53 0.56 0.58 0.6 0.6 0.61 0.6 0.59; 0.31 0.35 0.36 0.36 0.37 0.4 0.44 0.48 0.49 0.46 0.45 0.45 0.5 0.55 0.58 0.6 0.62 0.62 0.61 0.6 0.58; 0.33 0.39 0.41 0.39 0.38 0.44 0.47 0.49 0.5 0.47 0.46 0.45 0.47 0.52 0.54 0.56 0.57 0.58 0.56 0.56 0.55; 0.33 0.4 0.42 0.45 0.46 0.48 0.49 0.5 0.51 0.48 0.46 0.45 0.46 0.49 0.54 0.56 0.56 0.59 0.6 0.59 0.57; 0.37 0.41 0.43 0.47 0.5 0.52 0.49 0.48 0.51 0.49 0.47 0.47 0.47 0.49 0.58 0.61 0.62 0.62 0.62 0.61 0.6; 0.4 0.43 0.47 0.49 0.49 0.51 0.49 0.48 0.51 0.49 0.48 0.49 0.46 0.51 0.59 0.59 0.6 0.6 0.59 0.59 0.59; 0.42 0.47 0.5 0.5 0.49 0.48 0.49 0.49 0.51 0.49 0.5 0.49 0.49 0.55 0.58 0.56 0.57 0.61 0.6 0.58 0.58]
using Plots
plot(1:21, 1:21, m)
It looks like plot called with two vectors and a matrix produces a contour plot - so you'll get the same output from contour(1:21, 1:21, m).
If you want to get rid of the whitespace, use the fill = true keyword:
julia> plot(1:21, 1:21, m, fill = true)

Create data frame from EFA output in R

I am working on EFA and would like to customize my tables. There is a function, print.psych, that suppresses factor loadings below a certain value to make the table easier to read. When I run this function, it prints this data and the summary stats to the console (in an .Rmd document, it produces console text and a separate data frame of the factor loadings with loadings suppressed). However, if I attempt to save this as an object, it does not keep this data.
Here is an example:
library(psych)
bfi_data=bfi
bfi_data=bfi_data[complete.cases(bfi_data),]
bfi_cor <- cor(bfi_data)
factors_data <- fa(r = bfi_cor, nfactors = 6)
print.psych(factors_data, cut = .32, sort = TRUE)
In an R script, it produces this:
item MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
N2 17 0.83 0.654 0.35 1.0
N1 16 0.82 0.666 0.33 1.1
N3 18 0.69 0.549 0.45 1.1
N5 20 0.47 0.376 0.62 2.2
N4 19 0.44 0.43 0.506 0.49 2.4
C4 9 -0.67 0.555 0.45 1.3
C2 7 0.66 0.475 0.53 1.4
C5 10 -0.56 0.433 0.57 1.4
C3 8 0.56 0.317 0.68 1.1
C1 6 0.54 0.344 0.66 1.3
In R Markdown, it produces this:
How can I save that data.frame as an object?
Looking at the str of the object, it doesn't look like what you want is built in. An ugly way would be to use capture.output and convert the character vector to a data frame using string manipulation. Alternatively, since the data is being displayed, it must be present somewhere in the object itself. I found vectors of the same length that can be combined to form the data frame.
loadings <- unclass(factors_data$loadings)
h2 <- factors_data$communalities
#There is also factors_data$communality which has same values
u2 <- factors_data$uniquenesses
com <- factors_data$complexity
data <- cbind(loadings, h2, u2, com)
data
This returns :
# MR2 MR3 MR1 MR5 MR4 MR6 h2 u2 com
#A1 0.11 0.07 -0.07 -0.56 -0.01 0.35 0.38 0.62 1.85
#A2 0.03 0.09 -0.08 0.64 0.01 -0.06 0.47 0.53 1.09
#A3 -0.04 0.04 -0.10 0.60 0.07 0.16 0.51 0.49 1.26
#A4 -0.07 0.19 -0.07 0.41 -0.13 0.13 0.29 0.71 2.05
#A5 -0.17 0.01 -0.16 0.47 0.10 0.22 0.47 0.53 2.11
#C1 0.05 0.54 0.08 -0.02 0.19 0.05 0.34 0.66 1.32
#C2 0.09 0.66 0.17 0.06 0.08 0.16 0.47 0.53 1.36
#C3 0.00 0.56 0.07 0.07 -0.04 0.05 0.32 0.68 1.09
#C4 0.07 -0.67 0.10 -0.01 0.02 0.25 0.55 0.45 1.35
#C5 0.15 -0.56 0.17 0.02 0.10 0.01 0.43 0.57 1.41
#E1 -0.14 0.09 0.61 -0.14 -0.08 0.09 0.41 0.59 1.34
#E2 0.06 -0.03 0.68 -0.07 -0.08 -0.01 0.56 0.44 1.07
#E3 0.02 0.01 -0.32 0.17 0.38 0.28 0.51 0.49 3.28
#E4 -0.07 0.03 -0.49 0.25 0.00 0.31 0.56 0.44 2.26
#E5 0.16 0.27 -0.39 0.07 0.24 0.04 0.41 0.59 3.01
#N1 0.82 -0.01 -0.09 -0.09 -0.03 0.02 0.67 0.33 1.05
#N2 0.83 0.02 -0.07 -0.07 0.01 -0.07 0.65 0.35 1.04
#N3 0.69 -0.03 0.13 0.09 0.02 0.06 0.55 0.45 1.12
#N4 0.44 -0.14 0.43 0.09 0.10 0.01 0.51 0.49 2.41
#N5 0.47 -0.01 0.21 0.21 -0.17 0.09 0.38 0.62 2.23
#O1 -0.05 0.07 -0.01 -0.04 0.57 0.09 0.36 0.64 1.11
#O2 0.12 -0.09 0.01 0.12 -0.43 0.28 0.30 0.70 2.20
#O3 0.01 0.00 -0.10 0.05 0.65 0.04 0.48 0.52 1.06
#O4 0.10 -0.05 0.34 0.15 0.37 -0.04 0.24 0.76 2.55
#O5 0.04 -0.04 -0.02 -0.01 -0.50 0.30 0.33 0.67 1.67
#gender 0.20 0.09 -0.12 0.33 -0.21 -0.15 0.18 0.82 3.58
#education -0.03 0.01 0.05 0.11 0.12 -0.22 0.07 0.93 2.17
#age -0.06 0.07 -0.02 0.16 0.03 -0.26 0.10 0.90 2.05
Ronak Shaw answered my question above, and I used his answer to create the following function, which nearly reproduces the print.psych data frame of fa.sort output:
library(psych)
library(dplyr)
library(tibble)

fa_table <- function(x, cut) {
  # get sorted loadings as a plain numeric matrix
  loadings <- fa.sort(x)$loadings %>% unclass() %>% round(3)
  # blank out loadings whose absolute value is below the cutoff
  # (this turns the matrix into a character matrix)
  loadings[abs(loadings) < cut] <- ""
  # get additional info
  add_info <- cbind(x$communalities,
                    x$uniquenesses,
                    x$complexity) %>%
    as.data.frame() %>%
    rename("commonality" = V1,
           "uniqueness" = V2,
           "complexity" = V3) %>%
    rownames_to_column("item")
  # build table
  loadings %>%
    as.data.frame() %>%
    rownames_to_column("item") %>%
    left_join(add_info, by = "item") %>%
    mutate(across(where(is.numeric), ~ round(.x, 3)))
}

extract entry of a vector or Matrix in R

I have this matrix:
> dist
1 2 3 4 5 6 7
1 0.00 0.52 0.34 0.37 0.37 0.52 0.54
2 0.52 0.00 0.77 0.57 0.57 0.00 0.56
3 0.34 0.77 0.00 0.29 0.29 0.77 0.53
4 0.37 0.57 0.29 0.00 0.00 0.57 0.32
5 0.37 0.57 0.29 0.00 0.00 0.57 0.32
6 0.52 0.00 0.77 0.57 0.57 0.00 0.56
7 0.54 0.56 0.53 0.32 0.32 0.56 0.00
I want to extract row 1, but only from the third position onward:
0.34 0.37 0.37 0.52 0.54
I tried dis[1,>=3], but it gives an error.
You can give a sequence for the columns to select with [:
dist[1,3:7]
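As a hedged illustration of that indexing, here is a small sketch on a two-row matrix built from the first two rows of the question's data (the name m is made up for this example). It also shows two variants that do not hard-code the last column: 3:ncol(m), and negative indices that drop the first two columns:

```r
# First two rows of the question's matrix, stored without dimnames
m <- matrix(c(0.00, 0.52, 0.34, 0.37, 0.37, 0.52, 0.54,
              0.52, 0.00, 0.77, 0.57, 0.57, 0.00, 0.56),
            nrow = 2, byrow = TRUE)

# Row 1, columns 3 through the last column
from_third <- m[1, 3:ncol(m)]

# Same selection, expressed as "everything except columns 1 and 2"
drop_first_two <- m[1, -(1:2)]

print(from_third)
```

Using ncol(m) or negative indices keeps the code working if the matrix later gains or loses columns.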

Reading a file in R columns are separated with multiple space

I'm trying to load a file whose columns are separated by spaces, but the number of spaces between columns varies. Because of this, R treats every space as a column separator and produces extra empty columns. Is there a way to load the data without this problem?
Example Data :
AAT_ECOLI 0.49 0.29 0.48 0.50 0.56 0.24 0.35 cp
ACEA_ECOLI 0.07 0.40 0.48 0.50 0.54 0.35 0.44 cp
ACEK_ECOLI 0.56 0.40 0.48 0.50 0.49 0.37 0.46 cp
ACKA_ECOLI 0.59 0.49 0.48 0.50 0.52 0.45 0.36 cp
As you can see, there are three spaces between the first and second columns, and two spaces between the second and third columns.
I'm using this code to load the data:
xxx <- read.csv("../Datasets/Ecoli/ecoli.data", header=FALSE,sep=" ")
I tried a separator of three spaces and other variations, but none of them worked.
Original data file : https://drive.google.com/file/d/0B_XEmkrWR-hCMXVySVI2bU5waGs/view?usp=sharing
Thank you
read.table works perfectly on your downloaded data set. No arguments other than file are necessary (unless you don't want factors). I tend to reserve read.csv for files that are actually comma-separated.
df <- read.table("Downloads/ecoli.data")
str(df)
# 'data.frame': 336 obs. of 9 variables:
# $ V1: Factor w/ 336 levels "AAS_ECOLI","AAT_ECOLI",..: 2 3 4 5 6 8 9 12 ...
# $ V2: num 0.49 0.07 0.56 0.59 0.23 0.67 0.29 0.21 0.2 0.42 ...
# $ V3: num 0.29 0.4 0.4 0.49 0.32 0.39 0.28 0.34 0.44 0.4 ...
# $ V4: num 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 0.48 ...
# $ V5: num 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 ...
# $ V6: num 0.56 0.54 0.49 0.52 0.55 0.36 0.44 0.51 0.46 0.56 ...
# $ V7: num 0.24 0.35 0.37 0.45 0.25 0.38 0.23 0.28 0.51 0.18 ...
# $ V8: num 0.35 0.44 0.46 0.36 0.35 0.46 0.34 0.39 0.57 0.3 ...
# $ V9: Factor w/ 8 levels "cp","im","imL",..: 1 1 1 1 1 1 1 1 1 1 ...
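A minimal sketch of why this works, assuming two sample rows typed inline with uneven spacing instead of the downloaded file (sample_text and ecoli are names made up for this example). The default sep = "" in read.table treats any run of whitespace as a single separator:

```r
# Two rows with a varying number of spaces between fields
sample_text <- "AAT_ECOLI   0.49  0.29  0.48  0.50  0.56  0.24  0.35   cp
ACEA_ECOLI  0.07  0.40  0.48  0.50  0.54  0.35  0.44   cp"

# Default sep = "" splits on one or more whitespace characters
ecoli <- read.table(text = sample_text, header = FALSE,
                    stringsAsFactors = FALSE)

dim(ecoli)  # 2 rows, 9 columns
str(ecoli)
```

Passing stringsAsFactors = FALSE keeps the first and last columns as character vectors rather than factors.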
You need to set strip.white=T and sep='' :
xxx <- read.csv("c:\\r_stack_overflow\\test.csv", header=FALSE, strip.white=T, sep='')
> xxx
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 AAT_ECOLI 0.49 0.29 0.48 0.5 0.56 0.24 0.35 cp
2 ACEA_ECOLI 0.07 0.40 0.48 0.5 0.54 0.35 0.44 cp
3 ACEK_ECOLI 0.56 0.40 0.48 0.5 0.49 0.37 0.46 cp
4 ACKA_ECOLI 0.59 0.49 0.48 0.5 0.52 0.45 0.36 cp
> dim(xxx)
[1] 4 9
And it works!
UPDATE:
It works perfectly with your data too:
xxx <- read.csv("c:\\r_stack_overflow\\ecoli.data", header=FALSE, strip.white=T, sep='')
Output:
> xxx
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 AAT_ECOLI 0.49 0.29 0.48 0.5 0.56 0.24 0.35 cp
2 ACEA_ECOLI 0.07 0.40 0.48 0.5 0.54 0.35 0.44 cp
3 ACEK_ECOLI 0.56 0.40 0.48 0.5 0.49 0.37 0.46 cp
4 ACKA_ECOLI 0.59 0.49 0.48 0.5 0.52 0.45 0.36 cp
5 ADI_ECOLI 0.23 0.32 0.48 0.5 0.55 0.25 0.35 cp
6 ALKH_ECOLI 0.67 0.39 0.48 0.5 0.36 0.38 0.46 cp
7 AMPD_ECOLI 0.29 0.28 0.48 0.5 0.44 0.23 0.34 cp
8 AMY2_ECOLI 0.21 0.34 0.48 0.5 0.51 0.28 0.39 cp
9 APT_ECOLI 0.20 0.44 0.48 0.5 0.46 0.51 0.57 cp
10 ARAC_ECOLI 0.42 0.40 0.48 0.5 0.56 0.18 0.30 cp
11 ASG1_ECOLI 0.42 0.24 0.48 0.5 0.57 0.27 0.37 cp
12 BTUR_ECOLI 0.25 0.48 0.48 0.5 0.44 0.17 0.29 cp
13 CAFA_ECOLI 0.39 0.32 0.48 0.5 0.46 0.24 0.35 cp
14 CAIB_ECOLI 0.51 0.50 0.48 0.5 0.46 0.32 0.35 cp
15 CFA_ECOLI 0.22 0.43 0.48 0.5 0.48 0.16 0.28 cp
16 CHEA_ECOLI 0.25 0.40 0.48 0.5 0.46 0.44 0.52 cp
17 CHEB_ECOLI 0.34 0.45 0.48 0.5 0.38 0.24 0.35 cp
18 CHEW_ECOLI 0.44 0.27 0.48 0.5 0.55 0.52 0.58 cp
19 CHEY_ECOLI 0.23 0.40 0.48 0.5 0.39 0.28 0.38 cp
20 CHEZ_ECOLI 0.41 0.57 0.48 0.5 0.39 0.21 0.32 cp
21 CRL_ECOLI 0.40 0.45 0.48 0.5 0.38 0.22 0.00 cp
22 CSPA_ECOLI 0.31 0.23 0.48 0.5 0.73 0.05 0.14 cp
23 CYNR_ECOLI 0.51 0.54 0.48 0.5 0.41 0.34 0.43 cp
24 CYPB_ECOLI 0.30 0.16 0.48 0.5 0.56 0.11 0.23 cp
25 CYPC_ECOLI 0.36 0.39 0.48 0.5 0.48 0.22 0.23 cp
26 CYSB_ECOLI 0.29 0.37 0.48 0.5 0.48 0.44 0.52 cp
27 CYSE_ECOLI 0.25 0.40 0.48 0.5 0.47 0.33 0.42 cp
28 DAPD_ECOLI 0.21 0.51 0.48 0.5 0.50 0.32 0.41 cp
29 DCP_ECOLI 0.43 0.37 0.48 0.5 0.53 0.35 0.44 cp
30 DDLA_ECOLI 0.43 0.39 0.48 0.5 0.47 0.31 0.41 cp
31 DDLB_ECOLI 0.53 0.38 0.48 0.5 0.44 0.26 0.36 cp
32 DEOC_ECOLI 0.34 0.33 0.48 0.5 0.38 0.35 0.44 cp
33 DLDH_ECOLI 0.56 0.51 0.48 0.5 0.34 0.37 0.46 cp
34 EFG_ECOLI 0.40 0.29 0.48 0.5 0.42 0.35 0.44 cp
35 EFTS_ECOLI 0.24 0.35 0.48 0.5 0.31 0.19 0.31 cp
36 EFTU_ECOLI 0.36 0.54 0.48 0.5 0.41 0.38 0.46 cp
37 ENO_ECOLI 0.29 0.52 0.48 0.5 0.42 0.29 0.39 cp
38 FABB_ECOLI 0.65 0.47 0.48 0.5 0.59 0.30 0.40 cp
39 FES_ECOLI 0.32 0.42 0.48 0.5 0.35 0.28 0.38 cp
40 G3P1_ECOLI 0.38 0.46 0.48 0.5 0.48 0.22 0.29 cp
41 G3P2_ECOLI 0.33 0.45 0.48 0.5 0.52 0.32 0.41 cp
42 G6PI_ECOLI 0.30 0.37 0.48 0.5 0.59 0.41 0.49 cp
43 GCVA_ECOLI 0.40 0.50 0.48 0.5 0.45 0.39 0.47 cp
44 GLNA_ECOLI 0.28 0.38 0.48 0.5 0.50 0.33 0.42 cp
45 GLPD_ECOLI 0.61 0.45 0.48 0.5 0.48 0.35 0.41 cp
46 GLYA_ECOLI 0.17 0.38 0.48 0.5 0.45 0.42 0.50 cp
47 GSHR_ECOLI 0.44 0.35 0.48 0.5 0.55 0.55 0.61 cp
48 GT_ECOLI 0.43 0.40 0.48 0.5 0.39 0.28 0.39 cp
49 HEM6_ECOLI 0.42 0.35 0.48 0.5 0.58 0.15 0.27 cp
50 HEMN_ECOLI 0.23 0.33 0.48 0.5 0.43 0.33 0.43 cp
51 HPRT_ECOLI 0.37 0.52 0.48 0.5 0.42 0.42 0.36 cp
52 IF1_ECOLI 0.29 0.30 0.48 0.5 0.45 0.03 0.17 cp
53 IF2_ECOLI 0.22 0.36 0.48 0.5 0.35 0.39 0.47 cp
54 ILVY_ECOLI 0.23 0.58 0.48 0.5 0.37 0.53 0.59 cp
55 IPYR_ECOLI 0.47 0.47 0.48 0.5 0.22 0.16 0.26 cp
56 KAD_ECOLI 0.54 0.47 0.48 0.5 0.28 0.33 0.42 cp
57 KDSA_ECOLI 0.51 0.37 0.48 0.5 0.35 0.36 0.45 cp
58 LEU3_ECOLI 0.40 0.35 0.48 0.5 0.45 0.33 0.42 cp
59 LON_ECOLI 0.44 0.34 0.48 0.5 0.30 0.33 0.43 cp
60 LPLA_ECOLI 0.42 0.38 0.48 0.5 0.54 0.34 0.43 cp
61 LYSR_ECOLI 0.44 0.56 0.48 0.5 0.50 0.46 0.54 cp
62 MALQ_ECOLI 0.52 0.36 0.48 0.5 0.41 0.28 0.38 cp
63 MALZ_ECOLI 0.36 0.41 0.48 0.5 0.48 0.47 0.54 cp
64 MASY_ECOLI 0.18 0.30 0.48 0.5 0.46 0.24 0.35 cp
65 METB_ECOLI 0.47 0.29 0.48 0.5 0.51 0.33 0.43 cp
66 METC_ECOLI 0.24 0.43 0.48 0.5 0.54 0.52 0.59 cp
67 METK_ECOLI 0.25 0.37 0.48 0.5 0.41 0.33 0.42 cp
And dimensions:
> dim(xxx)
[1] 336 9
There's probably a better way, but I believe this should work:
file_df <- scan('data.txt', what = list("","","","","","","","",""))
df <- data.frame(matrix(unlist(file_df), nrow=4))

R: Toggle <select> options, submit action and web scrape HTML from site with no <form> tag

How can I get the data from a website that offers multiple options, such as the stock ticker and the beginning and end of the period I want data for?
The code that generates this data comes from this line:
<td><input name="button" type="button" class="boton" id="button" value="Buscar" onclick="getInf_Cotizaciones('SIDERC1',document.getElementById('anoIni').value+document.getElementById('mesIni').value+'01',document.getElementById('anoFin').value+document.getElementById('mesFin').value+'01')" /></td>
However, the data doesn't appear in the HTML source code. How can I get R to download this data?
If you use "Developer Mode" on any modern browser and sort the "timeline" view of the "network resources" (they all have this) by "start time", you'd see that site submits the following URL:
http://www.bvl.com.pe/jsp/cotizacion.jsp?fec_inicio=20140901&fec_fin=20141001&nemonico=SIDERC1
when posting data based on the <select> box choices. You can, then, use the rvest package to grab the resultant table:
library(rvest)
pg <- read_html("http://www.bvl.com.pe/jsp/cotizacion.jsp?fec_inicio=20140901&fec_fin=20141001&nemonico=SIDERC1")
pg %>% html_table()
## [[1]]
## Precio fecha actual NA NA NA NA NA NA NA Precios fecha anterior NA
## 1 Fecha cotización Apertura Cierre Máxima Mínima Promedio CantidadNegociada MontoNegociado (S/.) Fechaanterior Cierreanterior
## 2 01/10/2014 0.33 0.32 0.33 0.32 0.32 193,148.00 62,707.36 30/09/2014 0.32
## 3 30/09/2014 0.33 0.32 0.33 0.32 0.33 542,761.00 177,545.23 29/09/2014 0.34
## 4 29/09/2014 0.34 0.34 0.34 0.34 0.34 42,738.00 14,530.92 26/09/2014 0.34
## 5 26/09/2014 0.34 0.34 0.34 0.34 0.34 139,829.00 47,503.57 25/09/2014 0.35
## 6 25/09/2014 0.35 0.35 0.35 0.35 0.35 56,100.00 19,635.00 23/09/2014 0.35
## 7 24/09/2014 23/09/2014 0.35
## 8 23/09/2014 0.35 0.35 0.35 0.35 0.35 79,800.00 27,900.00 19/09/2014 0.35
## 9 22/09/2014 19/09/2014 0.35
## 10 19/09/2014 0.35 0.35 0.35 0.35 0.35 73,655.00 25,592.70 18/09/2014 0.35
## 11 18/09/2014 0.35 0.35 0.35 0.35 0.35 50,000.00 17,500.00 17/09/2014 0.35
## 12 17/09/2014 0.35 0.35 0.35 0.35 0.35 94,000.00 32,900.00 16/09/2014 0.36
## 13 16/09/2014 0.36 0.36 0.36 0.36 0.36 49,582.00 17,666.87 15/09/2014 0.35
## 14 15/09/2014 0.35 0.35 0.35 0.35 0.35 63,900.00 22,365.00 12/09/2014 0.35
## 15 12/09/2014 0.35 0.35 0.35 0.35 0.35 100,000.00 35,000.00 11/09/2014 0.36
## 16 11/09/2014 0.36 0.36 0.36 0.36 0.36 79,680.00 28,684.80 10/09/2014 0.36
## 17 10/09/2014 0.36 0.36 0.36 0.36 0.36 136,169.00 49,020.84 09/09/2014 0.36
## 18 09/09/2014 0.35 0.36 0.36 0.35 0.36 420,200.00 151,074.07 08/09/2014 0.35
## 19 08/09/2014 0.35 0.35 0.35 0.35 0.35 90,344.00 31,620.40 05/09/2014 0.34
## 20 05/09/2014 0.34 0.34 0.34 0.34 0.34 212,500.00 72,250.00 04/09/2014 0.33
## 21 04/09/2014 0.33 0.33 0.33 0.33 0.33 12,500.00 4,125.00 03/09/2014 0.34
## 22 03/09/2014 0.33 0.34 0.34 0.33 0.33 186,000.00 61,970.00 02/09/2014 0.33
## 23 02/09/2014 0.34 0.33 0.34 0.33 0.34 221,613.00 74,654.42 28/08/2014 0.35
## 24 01/09/2014 28/08/2014 0.35
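To request other tickers or periods, the URL can be assembled from its three query parameters. The sketch below is only an illustration: build_quote_url is a hypothetical helper, and it assumes the site keeps accepting the fec_inicio, fec_fin and nemonico parameters with dates in YYYYMMDD form, as in the URL above:

```r
# Hypothetical helper: builds the quote URL from a ticker and two dates
build_quote_url <- function(ticker, from, to) {
  sprintf(
    "http://www.bvl.com.pe/jsp/cotizacion.jsp?fec_inicio=%s&fec_fin=%s&nemonico=%s",
    format(from, "%Y%m%d"),  # start date as YYYYMMDD
    format(to, "%Y%m%d"),    # end date as YYYYMMDD
    ticker
  )
}

url <- build_quote_url("SIDERC1", as.Date("2014-09-01"), as.Date("2014-10-01"))
print(url)
```

The resulting string can then be passed to the same rvest calls used above.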
