R: Overlapping ggplots

I have a quick question about R. I am trying to make a layered histogram from some data I am pulling out of files, but I am having a hard time getting ggplot to work with me. I keep getting the error below, and I have been looking around for an answer but haven't found much.
Error: ggplot2 doesn't know how to deal with data of class uneval
Execution halted
Here is a brief look at my program so far.
library("ggplot2")
ex <- '/home/Data/run1.DOC'
ex2 <- '/home/Data/run2.DOC'
...
ex<- read.table(ex,header=TRUE)
ex2<- read.table(ex2,header=TRUE)
...
colnames(ex) <- c(1:18)
colnames(ex2) <- c(1:18)
...
Ex <- c(ex$'14')
Ex2 <- c(ex2$'14')
...
ggplot()+
geom_histogram(data = Ex, fill = "red", alpha = 0.2) +
geom_histogram(data = Ex2, fill = "blue", alpha = 0.2)
And my data in the files looks a bit like this:
head(ex,10)
1 2 3 4 5 6 7 8 9 10 11 12
1 1:28 400 0.42 400 0.42 1 1 2 41.8 0 0.0 0.0
2 1:96 5599 39.99 5599 39.99 34 42 50 100.0 100 100.0 100.0
3 1:53 334 0.63 334 0.63 1 2 2 62.1 0 0.0 0.0
4 1:27 6932 49.51 6932 49.51 48 52 57 100.0 100 100.0 100.0
5 1:36 27562 124.15 27562 124.15 97 123 157 100.0 100 100.0 100.0
6 1:14 2340 16.71 2340 16.71 13 17 21 100.0 100 100.0 95.7
7 1:96 8202 49.71 8202 49.71 23 43 80 100.0 100 100.0 100.0
8 1:34 3950 28.21 3950 28.21 22 33 36 100.0 100 100.0 100.0
9 1:60 5563 24.62 5563 24.62 11 24 41 100.0 100 96.5 75.2
10 1:06 1646 8.11 1646 8.11 7 8 13 100.0 100 87.2 32.0
13 14 15 16 17 18
1 0.0 0.0 0.0 0.0 0.0 0.0
2 93.6 82.9 57.9 24.3 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0 0.0
4 100.0 97.1 87.1 57.1 0.0 0.0
5 100.0 100.0 100.0 100.0 88.3 71.2
6 40.0 0.0 0.0 0.0 0.0 0.0
7 81.2 66.7 54.5 47.9 29.1 0.0
8 76.4 55.7 0.0 0.0 0.0 0.0
9 57.5 35.4 26.5 4.4 0.0 0.0
10 0.0 0.0 0.0 0.0 0.0 0.0
But much larger. This means that ex and ex2 will be percentages from 0 to 100. The colnames line changes column headings like %_above_30 into something R likes better, so each column is just numbered.
Does anyone see the problem here? I am not really getting it.
Thanks!!

Maybe try combining the two data frames into one and supplying that to a single geom_histogram:
#maybe reshape it something like this (base reshape or the
#reshape package may be a better tool)
dat <- data.frame(rbind(ex, ex2),
                  colvar = factor(c(rep("ex", nrow(ex)), rep("ex2", nrow(ex2)))))
ggplot(data = dat, aes(x = X14, fill = colvar)) +   # X14 is column "14" after data.frame() fixes the names
  geom_histogram(position = "identity", alpha = 0.2)
This is untested as your code isn't reproducible (please see this link on how to make a reproducible example).
Here's the idea I'm talking about with a reproducible example:
library(ggplot2)
path = "http://www-stat.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.data"
saheart <- read.table(path, sep=",", head=T, row.names=1)
fmla <- "chd ~ sbp + tobacco + ldl + adiposity + famhist + typea + obesity"
model <- glm(fmla, data=saheart, family=binomial(link="logit"),
             na.action=na.exclude)
dframe <- data.frame(chd=as.factor(saheart$chd),
                     prediction=predict(model, type="response"))
ggplot(dframe, aes(x=prediction, fill=chd)) +
  geom_histogram(position="identity", binwidth=0.05, alpha=0.5)
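For the original two-file setup, here is a minimal self-contained sketch of the same idea, with simulated percentages standing in for the column-14 values (the names run1/run2 are placeholders):
library(ggplot2)
# simulated stand-ins for the two runs' column-14 percentages (0-100)
Ex  <- runif(500, 0, 100)
Ex2 <- rbeta(500, 2, 5) * 100
dat <- data.frame(value  = c(Ex, Ex2),
                  source = factor(c(rep("run1", length(Ex)),
                                    rep("run2", length(Ex2)))))
ggplot(dat, aes(x = value, fill = source)) +
  geom_histogram(position = "identity", alpha = 0.2, binwidth = 5)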

Related

How to sample data non-randomly

I have a weather dataset, and my data is date-dependent.
I want to predict the temperature from 07 May 2008 until 18 May 2008 (which is maybe a total of 10-15 observations); my data size is around 200.
I will be using decision tree/RF and SVM & NN to make my prediction.
I've never handled data like this, so I'm not sure how to sample non-random data.
I want to split the data into 80% train data and 20% test data, but I want to take the samples in the original order, not randomly. Is that possible?
install.packages("rattle")
install.packages("RGtk2")
library("rattle")
seed <- 42
set.seed(seed)
fname <- system.file("csv", "weather.csv", package = "rattle")
dataset <- read.csv(fname, encoding = "UTF-8")
dataset <- dataset[1:200,]
dataset <- dataset[order(dataset$Date),]
set.seed(321)
sample_data = sample(nrow(dataset), nrow(dataset)*.8)
test<-dataset[sample_data,] # this gets the random 80% of rows
train<-dataset[-sample_data,] # and this the remaining 20%
output
> head(dataset)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
1 2007-11-01 Canberra 8.0 24.3 0.0 3.4 6.3 NW 30
2 2007-11-02 Canberra 14.0 26.9 3.6 4.4 9.7 ENE 39
3 2007-11-03 Canberra 13.7 23.4 3.6 5.8 3.3 NW 85
4 2007-11-04 Canberra 13.3 15.5 39.8 7.2 9.1 NW 54
5 2007-11-05 Canberra 7.6 16.1 2.8 5.6 10.6 SSE 50
6 2007-11-06 Canberra 6.2 16.9 0.0 5.8 8.2 SE 44
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
1 SW NW 6 20 68 29 1019.7
2 E W 4 17 80 36 1012.4
3 N NNE 6 6 82 69 1009.5
4 WNW W 30 24 62 56 1005.5
5 SSE ESE 20 28 68 49 1018.3
6 SE E 20 24 70 57 1023.8
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
1 1015.0 7 7 14.4 23.6 No 3.6 Yes
2 1008.4 5 3 17.5 25.7 Yes 3.6 Yes
3 1007.2 8 7 15.4 20.2 Yes 39.8 Yes
4 1007.0 2 7 13.5 14.1 Yes 2.8 Yes
5 1018.5 7 7 11.1 15.4 Yes 0.0 No
6 1021.7 7 5 10.9 14.8 No 0.2 No
> head(test)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
182 2008-04-30 Canberra -1.8 14.8 0.0 1.4 7.0 N 28
77 2008-01-16 Canberra 17.9 33.2 0.0 10.4 8.4 N 59
88 2008-01-27 Canberra 13.2 31.3 0.0 6.6 11.6 WSW 46
58 2007-12-28 Canberra 15.1 28.3 14.4 8.8 13.2 NNW 28
96 2008-02-04 Canberra 18.2 22.6 1.8 8.0 0.0 ENE 33
126 2008-03-05 Canberra 12.0 27.6 0.0 6.0 11.0 E 46
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
182 E N 2 19 80 40 1024.2
77 N NNE 15 20 58 62 1008.5
88 N WNW 4 26 71 28 1013.1
58 NNW NW 6 13 73 44 1016.8
96 SSE ENE 7 13 92 76 1014.4
126 SSE WSW 7 6 69 35 1025.5
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
182 1020.5 1 7 5.3 13.9 No 0.0 No
77 1006.1 6 7 24.5 23.5 No 4.8 Yes
88 1009.5 1 4 19.7 30.7 No 0.0 No
58 1013.4 1 5 18.3 27.4 Yes 0.0 No
96 1011.5 8 8 18.5 22.1 Yes 9.0 Yes
126 1022.2 1 1 15.7 26.2 No 0.0 No
> head(train)
Date Location MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed
7 2007-11-07 Canberra 6.1 18.2 0.2 4.2 8.4 SE 43
9 2007-11-09 Canberra 8.8 19.5 0.0 4.0 4.1 S 48
11 2007-11-11 Canberra 9.1 25.2 0.0 4.2 11.9 N 30
16 2007-11-16 Canberra 12.4 32.1 0.0 8.4 11.1 E 46
22 2007-11-22 Canberra 16.4 19.4 0.4 9.2 0.0 E 26
25 2007-11-25 Canberra 15.4 28.4 0.0 4.4 8.1 ENE 33
WindDir9am WindDir3pm WindSpeed9am WindSpeed3pm Humidity9am Humidity3pm Pressure9am
7 SE ESE 19 26 63 47 1024.6
9 E ENE 19 17 70 48 1026.1
11 SE NW 6 9 74 34 1024.4
16 SE WSW 7 9 70 22 1017.9
22 ENE E 6 11 88 72 1010.7
25 SSE NE 9 15 85 31 1022.4
Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday RISK_MM RainTomorrow
7 1022.2 4 6 12.4 17.3 No 0.0 No
9 1022.7 7 7 14.1 18.9 No 16.2 Yes
11 1021.1 1 2 14.6 24.0 No 0.2 No
16 1012.8 0 3 19.1 30.7 No 0.0 No
22 1008.9 8 8 16.5 18.3 No 25.8 Yes
25 1018.6 8 2 16.8 27.3 No 0.0 No
I use mtcars as an example. One option for splitting your data non-randomly into train and test sets is to first compute a sample size based on the number of rows in your data. After that you can use split() to cut the data at the 80% mark, using the following code:
smp_size <- floor(0.80 * nrow(mtcars))
split <- split(mtcars, rep(1:2, each = smp_size))
With the following code you can turn the split into train and test sets:
train <- split$`1`
test <- split$`2`
Let's check the number of rows:
> nrow(train)
[1] 25
> nrow(test)
[1] 7
Now the data is split into train and test sets without losing its order.
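Applied to the weather data from the question (assuming dataset is already ordered by Date, as in your code), a non-random 80/20 split that keeps the original order could look like this:
# keep the first 80% of the date-ordered rows for training, the rest for testing
n_train <- floor(0.80 * nrow(dataset))
train   <- dataset[seq_len(n_train), ]
test    <- dataset[seq(n_train + 1, nrow(dataset)), ]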

How to manipulate .xlsx data using nested for loops in R?

This is what the sample looks like:
Year Categories January February March April May June July August September October November December
1990 A 4564.0 465465.0 12 468 4884.0 12788.00 4218.00 -58445.86 -90643.00 -122840.1 -155037.29 -187234.4286
1990 B 6487.0 421214.0 878 2112 421283.0 56456.00 54654.00 515.00 212.00 515.0 212.00 515.0000
1990 C 42862.0 512.0 484 48 515.0 212.00 515.00 137858.33 48.00 137858.3 48.00 465.0000
1990 D 15.0 -169222.7 90 456 137858.3 48.00 465.00 135673.83 778.00 135673.8 778.00 12.0000
1990 E 19164.0 -401699.2 -304 246 135673.8 778.00 12.00 133489.33 57.00 133489.3 57.00 478.0000
1991 A 21436.8 -634175.7 -698 36 133489.3 57.00 478.00 131304.83 3.00 131304.8 3.00 331.3333
1991 B 23709.6 -866652.2 -1092 -174 131304.8 3.00 -8210.60 129120.33 30425.33 129120.3 -11463.57 337.8333
1992 A 32800.8 -1796558.2 -2668 -1014 122566.8 -27597.89 -29087.86 292051.00 82253.33 331147.5 -12728.17 363.8333
1992 B 35073.6 -2029034.7 -3062 -1224 120382.3 -32976.00 -34307.17 321333.47 95210.33 367329.4 -14420.56 370.3333
1992 C 37346.4 -2261511.2 -3456 -1434 118197.8 -38354.11 -39526.49 350615.94 108167.33 403511.2 -16112.96 376.8333
Hey guys, first of all sorry for the format of the sample data. I am new to Stack Overflow and R. I have a problem with manipulating some data in R, and I want to use nested for-loops to do it.
What I want is exactly this:
I want to keep the A, B, C, D, E categories fixed in a "Category" column and put the monthly data next to them, like January 1990, February 1990, ..., December 1990, January 1991, February 1991, ..., December 1991. These months should sit side by side as columns. Since there is no data for certain categories in some years, the months corresponding to those categories should get a value of 0. How should I write a nested for loop for this?
Thank you in advance for your help.
Actually I tried to write a loop like this, but I can't get the if-else statements right inside the loops. This code was written for my actual, much larger data; I created a sample data set to upload here.
require(xlsx)
data = read.xlsx("sample-data.xlsx", sheetName = "Sheet1")
library("tidyverse")
cat_data <- as.data.frame(data$Category)
unique_cat <- as.data.frame(unique(cat_data))
names(unique_cat)[names(unique_cat) == "data$Category"] <- "Category"
month_data <- subset(data, select = -c(1:4))
year_data <- subset(data, select = -c(2:16))
data %>%
for (i in 1:length(data)){
for (j in 1:length(cat_data)){
for (k in 1:length(month_data)){
print(c(data)[i,j,k])
}
}
}
This is the expected output:
Categories Jan.90 Feb.90 Mar.90 Apr.90 May.90 June.90 July.90
1 A 4564 465465.0 12 468 4884.0 12788 4218
2 B 6487 421214.0 878 2112 421283.0 56456 54654
3 C 42862 512.0 484 48 515.0 212 515
4 D 15 -169222.7 90 456 137858.3 48 465
5 E 19164 -401699.2 -304 246 135673.8 778 12
Aug.90 Sep.90 Oct.90 Nov.90 Dec.90 Jan.91 Feb.91
1 -58445.86 -90643 -122840.1 -155037.3 -187234.4 21436.8 -634175.7
2 515.00 212 515.0 212.0 515.0 23709.6 -866652.2
3 137858.33 48 137858.3 48.0 465.0 0.0 0.0
4 135673.83 778 135673.8 778.0 12.0 0.0 0.0
5 133489.33 57 133489.3 57.0 478.0 0.0 0.0
Mar.91 Apr.91 May.91 June.91 July.91 Aug.91 Sep.91 Oct.91
1 -698 36 133489.3 57 478.0 131304.8 3.00 131304.8
2 -1092 -174 131304.8 3 -8210.6 129120.3 30425.33 129120.3
3 0 0 0.0 0 0.0 0.0 0.00 0.0
4 0 0 0.0 0 0.0 0.0 0.00 0.0
5 0 0 0.0 0 0.0 0.0 0.00 0.0
Nov.91 Dec.91 Jan.92 Feb.92 Mar.92 Apr.92 May.92
1 3.00 331.3333 32800.8 -1796558 -2668 -1014 122566.8
2 -11463.57 337.8333 35073.6 -2029035 -3062 -1224 120382.3
3 0.00 0.0000 37346.4 -2261511 -3456 -1434 118197.8
4 0.00 0.0000 0.0 0 0 0 0.0
5 0.00 0.0000 0.0 0 0 0 0.0
June.92 July.92 Aug.92 Sep.92 Oct.92 Nov.92 Dec.92
1 -27597.89 -29087.86 292051.0 82253.33 331147.5 -12728.17 363.8333
2 -32976.00 -34307.17 321333.5 95210.33 367329.4 -14420.56 370.3333
3 -38354.11 -39526.49 350615.9 108167.33 403511.2 -16112.96 376.8333
4 0.00 0.00 0.0 0.00 0.0 0.00 0.0000
5 0.00 0.00 0.0 0.00 0.0 0.00 0.0000
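For reference, the reshape described above can be done without nested loops. This is only a hedged tidyverse sketch: it assumes the columns are named Year, Categories, January, ..., December as in the printed sample, and it labels the new columns as Month.Year rather than the abbreviated Jan.90 style shown in the expected output.
library(tidyverse)
# one row per category, one column per month/year, missing combinations filled with 0
result <- data %>%
  pivot_longer(January:December, names_to = "Month", values_to = "Value") %>%
  mutate(col = paste0(Month, ".", Year)) %>%
  select(Categories, col, Value) %>%
  pivot_wider(names_from = col, values_from = Value, values_fill = 0)
The values_fill = 0 argument is what puts a 0 in the months of years where a category has no data.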

Tabulize function in R

I want to extract the table on page 112 of this pdf document:
http://publications.credit-suisse.com/tasks/render/file/index.cfm?fileid=432759CA-0A73-57F6-04C67EF7EE506040
# report 2017
url_location <-"http://publications.credit-suisse.com/tasks/render/file/index.cfm?fileid=432759CA-0A73-57F6-04C67EF7EE506040"
out <- extract_tables(url_location, pages = 112)
I have tried these tutorials (link1, link2) about the 'tabulizer' package, but I largely failed. There are some difficult aspects that I am not experienced enough to handle in R.
Can someone suggest something and help me with that?
Installation
devtools::install_github("ropensci/tabulizer")
# load package
library(tabulizer)
Java deps — while getting easier to deal with — aren't necessary when the tables are this clean. Just a bit of string wrangling will get you what you need:
library(pdftools)
library(stringi)
library(tidyverse)
# read it with pdftools
book <- pdf_text("global-wealth-databook.pdf")
# go to the page
lines <- stri_split_lines(book[[113]])[[1]]
# remove footer
lines <- discard(lines, stri_detect_fixed, "Credit Suisse")
# find line before start of table
start <- last(which(stri_detect_regex(lines, "^[[:space:]]+")))+1
# find line after table
end <- last(which(lines == ""))-1
# smush into something read.table/read.csv can read
tab <- paste0(stri_replace_all_regex(lines[start:end], "[[:space:]][[:space:]]+", "\t"), collapse="\n")
#read it
read.csv(text=tab, header=FALSE, sep="\t", stringsAsFactors = FALSE)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
## 1 Egypt 56,036 3,168 324 98.1 1.7 0.2 0.0 100.0 91.7
## 2 El Salvador 3,957 14,443 6,906 66.0 32.8 1.2 0.0 100.0 65.7
## 3 Equatorial Guinea 670 8,044 2,616 87.0 12.2 0.7 0.1 100.0 77.3
## 4 Eritrea 2,401 3,607 2,036 94.5 5.4 0.1 100.0 57.1 NA
## 5 Estonia 1,040 43,158 27,522 22.5 72.2 5.1 0.2 100.0 56.4
## 6 Ethiopia 49,168 153 103 100.0 0.0 100.0 43.4 NA NA
## 7 Fiji 568 6,309 3,059 85.0 14.6 0.4 0.0 100.0 68.2
## 8 Finland 4,312 159,098 57,850 30.8 33.8 33.5 1.9 100.0 76.7
## 9 France 49,239 263,399 119,720 25.3 21.4 49.3 4.0 100.0 70.2
## 10 Gabon 1,098 15,168 7,367 62.0 36.5 1.5 0.0 100.0 68.4
## 11 Gambia 904 898 347 99.2 0.7 0.0 100.0 72.4 NA
## 12 Georgia 2,950 19,430 9,874 50.7 47.6 1.6 0.1 100.0 66.8
## 13 Germany 67,244 203,946 47,091 29.5 33.7 33.9 2.9 100.0 79.1
## 14 Ghana 14,574 809 411 99.5 0.5 0.0 100.0 66.1 NA
## 15 Greece 9,020 111,684 54,665 20.7 52.9 25.4 1.0 100.0 67.7
## 16 Grenada 70 17,523 4,625 74.0 24.3 1.5 0.2 100.0 81.5
## 17 Guinea 5,896 814 374 99.4 0.6 0.0 100.0 69.7 NA
## 18 Guinea-Bissau 884 477 243 99.8 0.2 100.0 65.6 NA NA
## 19 Guyana 467 5,345 2,510 89.0 10.7 0.3 0.0 100.0 67.2
## 20 Haiti 6,172 2,879 894 96.2 3.6 0.2 0.0 100.0 76.9
## 21 Hong Kong 6,172 193,248 46,079 26.3 50.9 20.9 1.9 100.0 85.1
## 22 Hungary 7,846 39,813 30,111 11.8 83.4 4.8 0.0 100.0 45.3
## 23 Iceland 245 587,649 444,999 13.0 72.0 15.0 100.0 46.7 NA
## 24 India 834,608 5,976 1,295 92.3 7.2 0.5 0.0 100.0 83.0
## 25 Indonesia 167,559 11,001 1,914 81.9 17.0 1.1 0.1 100.0 83.7
## 26 Iran 56,306 3,831 1,856 94.1 5.7 0.2 0.0 100.0 67.3
## 27 Ireland 3,434 248,466 84,592 31.2 22.7 42.3 3.6 100.0 81.3
## 28 Israel 5,315 198,406 78,244 22.3 38.7 36.7 2.3 100.0 74.2
## 29 Italy 48,544 223,572 124,636 21.3 22.0 54.1 2.7 100.0 66.0
## 30 Jamaica 1,962 9,485 3,717 79.0 20.2 0.8 0.0 100.0 74.3
## 31 Japan 105,228 225,057 123,724 7.9 35.7 53.9 2.6 100.0 60.9
## 32 Jordan 5,212 13,099 6,014 65.7 33.1 1.2 0.0 100.0 76.1
## 33 Kazakhstan 12,011 4,441 334 97.6 2.1 0.3 0.0 100.0 92.6
## 34 Kenya 23,732 1,809 662 97.4 2.5 0.1 0.0 100.0 77.2
## 35 Korea 41,007 160,609 67,934 20.0 40.5 37.8 1.7 100.0 70.0
## 36 Kuwait 2,996 97,304 37,788 30.3 48.3 20.4 1.0 100.0 76.9
## 37 Kyrgyzstan 3,611 4,689 2,472 92.7 7.0 0.2 0.0 100.0 62.9
## 38 Laos 3,849 5,662 1,382 94.6 4.7 0.7 0.0 100.0 84.9
## 39 Latvia 1,577 27,631 17,828 29.0 68.6 2.2 0.1 100.0 53.6
## 40 Lebanon 4,085 24,161 6,452 69.0 28.5 2.3 0.2 100.0 82.0
## 41 Lesotho 1,184 3,163 945 95.9 3.8 0.3 0.0 100.0 79.8
## 42 Liberia 2,211 2,193 959 97.3 2.6 0.1 0.0 100.0 71.6
## 43 Libya 4,007 45,103 24,510 29.6 61.1 9.2 0.2 100.0 59.9
## 44 Lithuania 2,316 27,507 17,931 27.3 70.4 2.1 0.1 100.0 51.6
## 45 Luxembourg 450 313,687 167,664 17.0 20.0 58.8 4.2 100.0 68.1
## 46 Macedonia 1,607 9,044 5,698 77.0 22.5 0.5 0.0 100.0 56.4
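To make that result a bit easier to work with, you could give it names and convert the comma-formatted numbers. This is only a sketch; the name "country" is a placeholder I picked, not something taken from the PDF:
tab_df <- read.csv(text = tab, header = FALSE, sep = "\t", stringsAsFactors = FALSE)
names(tab_df)[1] <- "country"   # placeholder name for the first column
# values like "56,036" arrive as character; strip the commas and convert
tab_df[-1] <- lapply(tab_df[-1], function(x) as.numeric(gsub(",", "", x)))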
UPDATE
This is more generic but you'll still have to do some manual cleanup. I think you would even if you used Tabula.
library(pdftools)
library(stringi)
library(tidyverse)
# read it with pdftools
book <- pdf_text("~/Downloads/global-wealth-databook.pdf")
transcribe_page <- function(book, pg) {
  # go to the page
  lines <- stri_split_lines(book[[pg]])[[1]]
  # remove footer
  lines <- discard(lines, stri_detect_fixed, "Credit Suisse")
  # find line before start of table
  start <- last(which(stri_detect_regex(lines, "^[[:space:]]+"))) + 1
  # find line after table
  end <- last(which(lines == "")) - 1
  # get the target rows
  rows <- lines[start:end]
  # map out where data values are
  stri_replace_first_regex(rows, "([[:alpha:]]) ([[:alpha:]])", "$1_$2") %>%
    stri_replace_all_regex("[^[:blank:]]", "X") %>%
    map(~rle(strsplit(.x, "")[[1]])) -> pos
  # compute the number of data fields
  nfields <- ceiling(max(map_int(pos, ~length(.x$lengths))) / 2)
  # do our best to get them into columns
  data_frame(rec = rows) %>%
    separate(rec, into=sprintf("X%s", 1:nfields), sep="[[:space:]]{2,}", fill="left") %>%
    print(n=length(rows))
}
transcribe_page(book, 112)
transcribe_page(book, 113)
transcribe_page(book, 114)
transcribe_page(book, 115)
Take a look at the outputs for ^^. They aren't in terrible shape and some of the cleanup can be programmatic.
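If you want several of those pages stacked into one data frame, a quick follow-up sketch (the page range is just the one used above; column counts can differ per page, and bind_rows() via map_dfr() fills the gaps with NA):
# purrr is loaded via tidyverse above
all_pages <- map_dfr(112:115, ~ transcribe_page(book, .x))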

making a fully labelled scatter plot using R

Can someone help me and show me how I could make a fully labelled scatter plot for two variables, showing the axis labels with units (such as "cm") and including a chart title? For example, how would I make a fully labelled scatter plot with all of the above features for age and height, using the following data in R?
Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes
With base graphics:
df <- read.table(header=T, sep=" ", text="
Yes Distance Age Height Coning
1 21.4 18 3.3 Yes
2 13.9 17 3.4 Yes
3 23.9 16 2.9 Yes
4 8.7 18 3.6 No
5 241.8 6 0.7 No
6 44.5 17 1.3 Yes
7 30.0 15 2.5 Yes
8 32.3 16 1.8 Yes
9 31.4 17 5.0 No
10 32.8 13 1.6 No
11 53.3 12 2.0 No
12 54.3 6 0.9 No
13 96.3 11 2.6 No
14 133.6 4 0.6 No
15 32.1 15 2.3 No
16 57.9 12 2.4 Yes
17 30.8 17 1.8 No
18 59.9 7 0.8 No
19 42.7 15 2.0 Yes
20 20.6 18 1.7 Yes
21 62.0 8 1.3 No
22 53.1 7 1.6 No
23 28.9 16 2.2 Yes
24 177.4 5 1.1 No
25 24.8 14 1.5 Yes
26 75.3 14 2.3 Yes
27 51.6 7 1.4 No
28 36.1 9 1.1 No
29 116.1 6 1.1 No
30 28.1 16 2.5 Yes
31 8.7 19 2.2 Yes
32 105.1 6 0.8 No
33 46.0 15 3.0 Yes
34 102.6 7 1.2 No
35 15.8 15 2.2 No
36 60.0 7 1.3 No
37 96.4 13 2.6 No
38 24.2 14 1.7 No
39 14.5 15 2.4 No
40 36.6 14 1.5 No
41 65.7 5 0.6 No
42 116.3 7 1.6 No
43 113.6 8 1.0 No
44 16.7 15 4.3 Yes
45 66.0 7 1.0 No
46 60.7 7 1.0 No
47 90.6 7 0.7 No
48 91.3 7 1.3 No
49 14.4 18 3.1 Yes
50 72.8 14 3.0 Yes")
attach(df)
lab <- sprintf("%.1fcm, %dyr", Height, Age)
plot(Age ~ Height, main="The Title", pch=20, xlab="Height in cm", ylab="Age in years")
text(y=Age, x=Height, labels=lab, cex=.7, col=rgb(0,0,0,.5), pos=4)
detach(df)
And with the help of wordcloud::textplot():
if (!require(wordcloud)) {
  install.packages("wordcloud")
  library(wordcloud)
}
# df is detached above, so refer to its columns explicitly here
plot(Age ~ Height, data=df, main="The Title", pch=20, xlab="Height in cm", ylab="Age in years", type="n")
textplot(y=df$Age, x=df$Height, words=lab, cex=.5, new=F, show.lines=T)
You can use the ggplot2 library. Example -
library(ggplot2)
ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))+
geom_point() +
geom_text()
What that code snippet is doing is taking the 'mtcars' dataset, assigning the x variable as the wt column, the y variable as the mpg column, and the labels as the rownames. geom_point adds a scatterplot based on the above x,y, and geom_text places the labels at the x,y coordinates.
Check out the help entry on geom_text to see the formatting options.
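Applied to the question's data (reusing the df built in the base-graphics answer above), a ggplot2 version with a chart title and units on both axes might look like this:
library(ggplot2)
ggplot(df, aes(x = Height, y = Age, colour = Coning)) +
  geom_point() +
  labs(title = "Age versus height",
       x = "Height (cm)",
       y = "Age (years)")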
Examples taken from ggplot2 documentation, page 98
p <- ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))
p + geom_text()
# Change size of the label
p + geom_text(size=10)
p <- p + geom_point()
# Set aesthetics to fixed value
p + geom_text()
p + geom_point() + geom_text(hjust=0, vjust=0)
p + geom_point() + geom_text(angle = 45)
# Add aesthetic mappings
p + geom_text(aes(colour=factor(cyl)))
p + geom_text(aes(colour=factor(cyl))) + scale_colour_discrete(l=40)
p + geom_text(aes(size=wt))
p + geom_text(aes(size=wt)) + scale_size(range=c(3,6))
# You can display expressions by setting parse = TRUE. The
# details of the display are described in ?plotmath, but note that
# geom_text uses strings, not expressions.
p + geom_text(aes(label = paste(wt, "^(", cyl, ")", sep = "")),
parse = TRUE)
# Add an annotation not from a variable source
c <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
c + geom_text(data = NULL, x = 5, y = 30, label = "plot mpg vs. wt")
# Or, you can use annotate
c + annotate("text", label = "plot mpg vs. wt", x = 2, y = 15, size = 8, colour = "red")
# Use qplot instead
qplot(wt, mpg, data = mtcars, label = rownames(mtcars),
geom=c("point", "text"))
qplot(wt, mpg, data = mtcars, label = rownames(mtcars), size = wt) +
geom_text(colour = "red")
# You can specify family, fontface and lineheight
p <- ggplot(mtcars, aes(x=wt, y=mpg, label=rownames(mtcars)))
p + geom_text(fontface=3)
p + geom_text(aes(fontface=am+1))
p + geom_text(aes(family=c("serif", "mono")[am+1]))

similar to excel vlookup

Hi,
I have a 10-year, 5-minute resolution data set of dust concentration, and separately a 15-year data set with daily resolution of the synoptic classification.
How can I combine these two datasets? They are not the same length or resolution.
Here is a sample of the data:
> head(synoptic)
date synoptic
1 01/01/1995 8
2 02/01/1995 7
3 03/01/1995 7
4 04/01/1995 20
5 05/01/1995 1
6 06/01/1995 1
>
head(beit.shemesh)
X........................ StWd SHT PRE GSR RH Temp WD WS PM10 CO O3
1 NA 64 19.8 0 -2.9 37 15.2 61 2.2 241 0.9 40.6
2 NA 37 20.1 0 1.1 38 15.2 344 2.1 241 0.9 40.3
3 NA 36 20.2 0 0.7 39 15.1 32 1.9 241 0.9 39.4
4 NA 52 20.1 0 0.9 40 14.9 20 2.1 241 0.9 38.7
5 NA 42 19.0 0 0.9 40 14.6 11 2.0 241 0.9 38.7
6 NA 75 19.9 0 0.2 40 14.5 341 1.3 241 0.9 39.1
No2 Nox No SO2 date
1 1.4 2.9 1.5 1.6 31/12/2000 24:00
2 1.7 3.1 1.4 0.9 01/01/2001 00:05
3 2.1 3.5 1.4 1.2 01/01/2001 00:10
4 2.7 4.2 1.5 1.3 01/01/2001 00:15
5 2.3 3.8 1.5 1.4 01/01/2001 00:20
6 2.8 4.3 1.5 1.3 01/01/2001 00:25
Any ideas?
Make an extra column containing the date, and then merge. To do this, you have to generate a variable bearing the same name in each data frame, hence you first need some renaming. Also make sure that the merge column you use has the same type in both data frames:
beit.shemesh$datetime <- beit.shemesh$date
beit.shemesh$date <- as.Date(beit.shemesh$datetime,format="%d/%m/%Y")
synoptic$date <- as.Date(synoptic$date,format="%d/%m/%Y")
merge(synoptic, beit.shemesh,by="date",all.y=TRUE)
Using all.y=TRUE keeps the beit.shemesh dataset intact. If you also want empty rows for all non-matching rows in synoptic, you could use all=TRUE instead.
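A quick way to sanity-check the result (column names as they appear in the question's printout):
combined <- merge(synoptic, beit.shemesh, by = "date", all.y = TRUE)
head(combined[, c("date", "synoptic", "PM10")])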
