windRose with Package openair - r

I am trying to plot a wind rose using the openair package. I have used simple code and I am getting the error "subscript out of bounds". I couldn't figure out what the error means. Here, Obs_WR is my data, and ws and wd are the column names for wind speed and wind direction respectively.
windRose(Obs_WR, ws="ws", wd="wd")
Error in mydata[[wd]] : subscript out of bounds
Part of the data is as follows:
> Obs_WR
ws wd
[1,] 3.715714 0.1250627
[2,] 4.491868 351.4789611
[3,] 5.312253 346.4029396
[4,] 6.047143 349.8645344
[5,] 6.071389 329.2137482
.... ........ ...........
[38,] 16.589769 274.0356269
[39,] 8.065556 273.2977654
[40,] 7.953387 130.6359338
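The printout above uses row labels of the form [1,], which is how R prints a matrix rather than a data frame. If Obs_WR is indeed a matrix (an assumption based only on that printout), then mydata[[wd]] would fail, because looking up a column by name with [[ ]] does not work on a matrix. A minimal sketch, using a stand-in matrix built from the first three rows shown above:

```r
# Stand-in for Obs_WR, built as a matrix (assumed from the [1,] row labels)
Obs_WR <- cbind(
  ws = c(3.715714, 4.491868, 5.312253),
  wd = c(0.1250627, 351.4789611, 346.4029396)
)

# Reproduce the failure: name-based [[ ]] lookup errors on a matrix
res <- try(Obs_WR[["wd"]], silent = TRUE)
inherits(res, "try-error")

# Convert to a data frame so that columns can be looked up by name
Obs_WR_df <- as.data.frame(Obs_WR)
Obs_WR_df[["wd"]]

# With openair installed, the original call would then take the data frame
# (left commented out here, since this sketch uses base R only):
# openair::windRose(Obs_WR_df, ws = "ws", wd = "wd")
```

If the real object is already a data frame, this conversion changes nothing and the cause lies elsewhere, for example in the column names.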

Related

Quanteda with topicmodels: removed stopwords appear in results (Chinese)

My code:
library(quanteda)
library(topicmodels)
# Some raw text as a vector
postText <- c("普京 称 俄罗斯 未 乌克兰 施压 来自 头 条 新闻", "长期 电脑 前进 食 致癌 环球网 报道 乌克兰 学者 认为 电脑 前进 食 会 引发 癌症 等 病症 电磁 辐射 作用 电脑 旁 水 食物 会 逐渐 变质 有害 物质 累积 尽管 人体 短期 内 会 感到 适 会 渐渐 引发 出 癌症 阿尔茨海默 式 症 帕金森 症 等 兔子", "全 木 手表 乌克兰 木匠 瓦列里·达内维奇 木头 制作 手表 共计 154 手工 零部件 唯一 一个 非 木制 零件 金属 弹簧 驱动 指针 运行 其他 零部件 材料 取自 桦树 苹果树 杏树 坚果树 竹子 黄杨树 愈疮木 非洲 红木 总共 耗时 7 打造 手表 不仅 能够 正常 运行 天 时间 误差 保持 5 分钟 之内 ")
# Create a corpus of the posts
postCorpus <- corpus(postText)
# Make a dfm, removing numbers and punctuation
myDocTermMat <- dfm(postCorpus, stem = FALSE, removeNumbers = TRUE, removeTwitter = TRUE, removePunct = TRUE)
# Estimate a LDA Topic Model
if (require(topicmodels)) {
myLDAfit <- LDA(convert(myDocTermMat, to = "topicmodels"), k = 2)
}
terms(myLDAfit, 11)
The code works and I see a result. Here is an example of the output:
Topic 1 Topic 2
[1,] "木" "会"
[2,] "手表" "电脑"
[3,] "零" "乌克兰"
[4,] "部件" "前进"
[5,] "运行" "食"
[6,] "乌克兰" "引发"
[7,] "内" "癌症"
[8,] "全" "等"
[9,] "木匠" "症"
[10,] "瓦" "普"
[11,] "列" "京"
Here is the problem. All of my posts have been segmented (necessary pre-processing step for Chinese) and had stop words removed. Nonetheless, the topic model returns topics containing single-character stop terms that have already been removed. If I open the raw .txt files and do ctrl-f for a given single-character stop word, no results are returned. But those terms show up in the returned topics from the R code, perhaps because the individual characters occur as part of other multi-character words. E.g. 就 is a preposition treated as a stop word, but 成就 means "success."
Related to this, certain terms are split. For example, one of the events I am examining contains references to Russian president Putin ("普京"). In the topic model results, however, I see separate term entries for "普" and "京" and no entries for "普京". (See lines 10 and 11 in output topic 2, compared to the first word in the raw text.)
Is there an additional tokenization step occurring here?
Edit: Modified to make reproducible. For some reason it wouldn't let me post until I also deleted my introductory paragraph.
Here's a workaround that uses a faster but "dumber" word tokeniser, which simply splits on whitespace ("\\s"):
# fails
features(dfm(postText, verbose = FALSE))
## [1] "普" "京" "称" "俄罗斯" "未" "乌克兰" "施压" "来自" "头" "条" "新闻"
# works
features(dfm(postText, what = "fasterword", verbose = FALSE))
## [1] "普京" "称" "俄罗斯" "未" "乌克兰" "施压" "来自" "头" "条" "新闻"
So add what = "fasterword" to the dfm() call and you will get this as a result, where Putin ("普京") is not split.
terms(myLDAfit, 11)
## Topic 1 Topic 2
## [1,] "会" "手表"
## [2,] "电脑" "零部件"
## [3,] "乌克兰" "运行"
## [4,] "前进" "乌克兰"
## [5,] "食" "全"
## [6,] "引发" "木"
## [7,] "癌症" "木匠"
## [8,] "等" "瓦列里达内维奇"
## [9,] "症" "木头"
## [10,] "普京" "制作"
## [11,] "称" "共计"
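To illustrate why whitespace splitting keeps pre-segmented words intact, here is a base-R sketch. It is not quanteda's implementation, only the same idea applied with strsplit() to the first post from the question:

```r
# First post from the question, already segmented with spaces
post <- "普京 称 俄罗斯 未 乌克兰 施压 来自 头 条 新闻"

# Split on runs of whitespace only; no dictionary or boundary analysis
tokens <- strsplit(post, "\\s+")[[1]]
tokens

# The two-character name survives as a single token
"普京" %in% tokens
```

Because the text was segmented beforehand, the spaces already carry the word boundaries, so nothing smarter than a whitespace split is needed.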
This is an interesting case where quanteda's default tokeniser, built on stringi's definition of text boundaries (see stri_split_boundaries), does not work in the default setting. It might work after experimentation with locale, but these are not currently options that can be passed to quanteda::tokenize(), which dfm() calls.
Please file this as an issue at https://github.com/kbenoit/quanteda/issues and I'll try to get working on a better solution using the "smarter" word tokeniser.

What are the equivalents of MCA variables coordinates and supplementary variables coordinates in mjca?

I would like to use mjca (package 'ca') on my data in order to estimate explained variation more realistically for the dimensions. The problem is that I would like to extract the coordinates of the active and supplementary variables in order to edit them in a data frame. However, the names of the variables and the dimensions are not given in the output of mjca. In MCA (package 'FactoMineR') the output is given as follows:
> mca$var$coord
Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
a 0.620468268 0.011534137 -0.542655702 0.47922448 0.15548571
cl 1.231043177 4.591555841 -0.323929172 0.19597918 -0.41446395
np -0.347646238 0.003735466 -0.006099464 0.02238883 0.16510343
num 0.417635652 -0.351884061 -0.760499677 0.60590774 -0.35647256
pr 0.945109906 -0.227098798 3.411969743 2.70823750 -0.64981046
vp 0.809895398 -0.303805822 0.048900811 -0.50023568 -0.53191069
EMB_no 0.396034450 -0.046768029 -0.058069978 0.05448188 0.06326411
EM_yes -1.009887848 0.119258474 0.148078445 -0.13892880 -0.16132349
ca -0.345163332 -0.088791765 -0.222907122 0.16679404 -0.12407031
to 0.375618920 0.096626332 0.242575397 -0.18151117 0.13501769
ART_no -0.006456155 0.021963298 0.049258256 -0.05919682 -0.07539649
ART_yes 0.044475732 -0.151302718 -0.339334655 0.40780032 0.51939806
> mca$quali.sup$coord
Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
ipva -0.1508708 0.04768873 -0.0233159 0.08795449 0.01645747
isv 0.6731160 -0.21276510 0.1040248 -0.39241234 -0.07342562
Is there a way to extract and paste the names of the variables to the coordinates in mjca? In mjca the output is not easily interpretable:
> mjca$colcoord
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] -1.14877950 -0.03284730 1.85933139 1.71512222 0.59783898 2.33329527 -1.96334559
[2,] -2.27924173 -13.07598521 1.10989653 0.70140040 -1.59360435 -0.53239520 0.59206669
[3,] 0.64365721 -0.01063798 0.02089893 0.08012859 0.63481889 -0.18685503 0.16878106
[4,] -0.77324063 1.00210711 2.60574231 2.16851574 -1.37062878 1.00967151 -7.97223158
[5,] -1.74984434 0.64673950 -11.69062156 9.69265663 -2.49850629 -0.05490287 -0.25510472
[6,] -1.49949849 0.86518831 -0.16755157 -1.79032033 -2.04518437 -0.20949668 0.65583755
[7,] -0.73324662 0.13318755 0.19896839 0.19498813 0.24324906 0.05337135 0.15195814
[8,] 1.86977887 -0.33962824 -0.50736940 -0.49721974 -0.62028511 -0.13609695 -0.38749326
[9,] 0.63906018 0.25286414 0.76375906 0.59694816 -0.47704750 -0.12352216 -0.44654719
[10,] -0.69544784 -0.27517568 -0.83114957 -0.64962006 0.51913993 0.13442117 0.48594841
[11,] 0.01195339 -0.06254781 -0.16877630 -0.21186268 -0.28989778 -0.27854165 -0.39280823
[12,] -0.08234556 0.43088491 1.16268119 1.45949845 1.99707361 1.91884250 2.70601223
[13,] -0.76304235 0.48124458 1.23544015 1.02426084 0.57083939 1.79253772 -0.83528819
[14,] -1.64355033 -5.87863606 0.76112230 0.59773026 -0.22298799 0.40707187 -0.23302386
[15,] -0.40920789 0.55803109 1.06737902 0.94054386 0.10598994 -0.60665661 -1.11993976
I think I should first extract the coordinates like this:
coord.mjca<-as.data.frame(mjca$colcoord)
row.names(coord.mjca)<-mjca$levelnames
colnames(coord.mjca)<-c("Dim 1", "Dim 2", "Dim 3", "Dim 4", "Dim 5", "Dim 6", "Dim 7")
Do you think I should do it like this?
Thank you for your help!
In the ca package, you can use cacoord() for this.
For example:
cacoord(mca, type='rowprincipal', rows=T)
cacoord(mca, type='symmetric', cols=T)
I hope this helps.
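The labelling approach proposed in the question can also be sketched in base R. The matrix and level names below are hypothetical stand-ins (three levels, two dimensions); with a real mjca result they would come from its colcoord and levelnames components, as in the question:

```r
# Hypothetical stand-in for an unlabelled coordinate matrix (3 levels, 2 dims)
colcoord <- matrix(
  c(-1.14877950, -2.27924173, 0.64365721,
    -0.03284730, -13.07598521, -0.01063798),
  ncol = 2
)

# Hypothetical level names, invented for this illustration
level_names <- c("cat:a", "cat:cl", "cat:np")

coord_df <- as.data.frame(colcoord)
rownames(coord_df) <- level_names
# Build "Dim 1", "Dim 2", ... from the actual number of columns
colnames(coord_df) <- paste("Dim", seq_len(ncol(coord_df)))
coord_df
```

Generating the column names from ncol() avoids hard-coding seven labels, so the same lines keep working if a different number of dimensions is retained.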

Assigning variable names within a function in R

I am currently working on a dataset in R that is assigned to the global environment by a function of i. Due to the nature of my work I am unable to disclose the dataset, so let's use an example.
DATA
[,1] [,2] [,3] [,4] [,5]
[1,] 32320 27442 29275 45921 162306
[2,] 38506 29326 33290 45641 175386
[3,] 42805 30974 33797 47110 198358
[4,] 42107 34690 47224 62893 272305
[5,] 54448 39739 58548 69470 316550
[6,] 53358 48463 63793 79180 372685
Where DATA(i) is a function and the above is an output for a certain i
I want to assign variable names based on i such as:-
names(i)<-c(a(i),b(i),c(i),d(i),e(i))
For argument's sake, let's say that the value of names for this specific i is
c("a","b","c","d","e")
I hope that it will produce the following:-
a b c d e
[1,] 32320 27442 29275 45921 162306
[2,] 38506 29326 33290 45641 175386
[3,] 42805 30974 33797 47110 198358
[4,] 42107 34690 47224 62893 272305
[5,] 54448 39739 58548 69470 316550
[6,] 53358 48463 63793 79180 372685
This is the code I currently use:-
VarName<-function(i){
colnames(DATA(i))<<-names(i)
}
However, this produces an error message when I run it: "Error in colnames(DATA(i)) <- names(i)) :
target of assignment expands to non-language object", which, judging from my input, does not seem to be true. Is there another way to do this?
Sorry for the basic questions. I'm fairly new to programming.
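One possible way around this: replacement functions such as colnames<- need a variable as their target, because R rewrites x's assignment as x <- `colnames<-`(x, value), and the result of a call like DATA(i) is not a variable. A minimal sketch follows, in which DATA() and col_names_for() are hypothetical stand-ins for the undisclosed functions, and the function returns the labelled matrix instead of using <<-:

```r
# Hypothetical stand-ins: DATA() and col_names_for() are illustrative only
DATA <- function(i) {
  matrix(c(32320, 38506, 27442, 29326, 29275, 33290,
           45921, 45641, 162306, 175386), nrow = 2)
}
col_names_for <- function(i) c("a", "b", "c", "d", "e")

named_data <- function(i) {
  out <- DATA(i)                      # store the result in a local variable
  colnames(out) <- col_names_for(i)   # the assignment target is now a plain name
  out                                 # return the labelled matrix
}

result <- named_data(1)
result
```

Returning the value and assigning it at the call site (result <- named_data(1)) keeps the function free of side effects on the global environment.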

How to select a value from a table in R

I have the following data, called fit.2.sim:
An object of class "sim"
Slot "coef":
fit.2.sim
[,1] [,2]
[1,] -1.806363 5.148728
[2,] -3.599123 5.183769
[3,] 4.192562 4.855095
[4,] 2.658218 4.967007
[5,] -2.304084 5.220325
[6,] -1.010406 5.071663
[7,] 2.601671 5.129750
[8,] 5.977764 4.757826
[9,] 3.873432 4.932319
[10,] 1.281331 5.138091
Slot "sigma":
[1] 8.285497 10.659971 9.568340 8.649106 8.611894 9.041444 8.316859 7.990499 8.985450
[10] 7.947142
The command I have been using, to no avail unfortunately is:
fit.2.sim$coef[i,j]
i,j being the respective rows and columns. The error I get is:
"Error in fit.2.sim$coef : $ operator not defined for this S4 class"
Could you please tell me if there is another way to make this work?
S4 classes use @, not $, to access slots, so you probably wanted
fit.2.sim@coef[i,j]
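As a self-contained illustration of slot access, here is a sketch with a hypothetical S4 class named simDemo. It only mimics the two slots printed in the question and is not the class of the real fit.2.sim object:

```r
library(methods)

# Hypothetical class for illustration; not the class of the real object
setClass("simDemo", representation(coef = "matrix", sigma = "numeric"))

sim_obj <- new(
  "simDemo",
  coef = matrix(c(-1.806363, -3.599123, 4.192562,
                  5.148728, 5.183769, 4.855095), ncol = 2),
  sigma = c(8.285497, 10.659971, 9.568340)
)

sim_obj@coef[3, 2]           # row 3, column 2 of the coef slot
slot(sim_obj, "coef")[3, 2]  # same value, using slot() instead of @
slotNames(sim_obj)           # lists the slots available on the object
```

slotNames() is handy when it is unclear which slots an unfamiliar S4 object exposes.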

tkplot in igraph within R

Here is my code with the corresponding output
> tkplot(g.2,vertex.label=nodes,
+ canvas.width=700,
+ canvas.height=700)
[1] 6
> ?tkplot
Warning message:
In rm(list = cmd, envir = .tkplot.env) : object 'tkp.6' not found
I get this error no matter what command I run after building and viewing my plot.
This may be obvious, but I can't get at the data from the plot.
> tkp.6.getcoords
Error: object 'tkp.6.getcoords' not found
Any thoughts? On Windows 2007 Pro.
R is a functional programming language. tkplot is a bit odd (for R users anyway) in that it returns numeric handles to its creations. Try this instead:
tkplot.getcoords(6)
When I run the example on the tkplot page, I then get this from tkplot.getcoords(1) since it was my first igraph plot:
> tkplot.getcoords(1)
[,1] [,2]
[1,] 334.49319 33.82983
[2,] 362.43837 286.10754
[3,] 410.61862 324.98319
[4,] 148.00673 370.91116
[5,] 195.69191 20.00000
[6,] 29.49197 430.00000
[7,] 20.00000 155.05409
[8,] 388.51103 62.61010
[9,] 430.00000 133.44695
[10,] 312.76239 168.90260
