IT Hit WebDAV: PowerPoint and Excel unlocking before the document opens

I have implemented the IT Hit WebDAV server on our company website. Looking at the logs, I can see that it unlocks a PowerPoint presentation twice: once just before it opens, and then again when MS PowerPoint is closed.
Can I stop this from happening, so that when you open a PowerPoint document it is unlocked only when the document is closed, as with MS Word?
When I open a Word document it is only unlocked once, when MS Word is closed.
As you can see from the logs below, there are fewer requests with MS Word than with MS PowerPoint. For both documents I followed the same process:
Open
Edit
Save
Close
The reason I would like it to unlock only once is so that I can run some custom security code that should only execute when the user has finished using the document.
Microsoft Word
[29] [OPTIONS] /DAV/
[45] [HEAD] /DAV/437f144e-c42a-4e8d-97b2-45fa3d1f0a71/Document.docx
[99] [OPTIONS] /DAV/
[79] [LOCK] /DAV/437f144e-c42a-4e8d-97b2-45fa3d1f0a71/Document.docx
[99] [GET] /DAV/437f144e-c42a-4e8d-97b2-45fa3d1f0a71/Document.docx
[54] [PROPFIND] /DAV/437f144e-c42a-4e8d-97b2-45fa3d1f0a71/Document.docx
[74] [LOCK] /DAV/437f144e-c42a-4e8d-97b2-45fa3d1f0a71/Document.docx
[94] [PUT] /DAV/437f144e-c42a-4e8d-97b2-45fa3d1f0a71/Document.docx
[94] [UNLOCK] /DAV/437f144e-c42a-4e8d-97b2-45fa3d1f0a71/Document.docx
PowerPoint
[89] [OPTIONS] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/
[86] [HEAD] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[89] [OPTIONS] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/
[86] [LOCK] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[89] [GET] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[97] [PROPFIND] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[65] [HEAD] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[68] [UNLOCK] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[97] [OPTIONS] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/
[86] [HEAD] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[97] [GET] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[100] [PROPFIND] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[68] [HEAD] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[86] [LOCK] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[89] [GET] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[68] [PROPFIND] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[97] [HEAD] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[59] [LOCK] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[59] [PUT] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx
[86] [UNLOCK] /DAV/c763764d-3ba2-46f1-abee-07fa33241309/PowerPoint.pptx

I guess PowerPoint displayed the yellow "Protected View" ribbon at the top with the "Enable Editing" button, while Word did not.
This is probably because your Word document was empty (0 bytes), while the PowerPoint document was not.
Try the following and you will get two locks with the Word file as well:
Create the Word file in your local file system, edit it and save it, so it is not 0 bytes.
Upload it to your WebDAV server.
Open it for editing.
Another possible reason: you had already opened the Word document on this computer in the past (so "Protected View" did not activate), while the PowerPoint document was opened for the first time.
MS Office 2013 locks the document when it is opened. If it activates "Protected View", the document is unlocked immediately after opening. If you then click "Enable Editing", the document is locked again. It is unlocked when the user closes the document or when the lock token expires.
In general there should be no problem with the document being locked and unlocked multiple times; the LOCK/UNLOCK requests always come in pairs, as in your log.
Please also note that a lock is requested for a limited period of time. If MS Office needs to hold the lock for longer, it prolongs the lock; in this case the server will call ILock.RefreshLock.

Related

How to make Julia PkgServer.jl work offline

I work mostly on offline machines and really want to begin migrating from Python to Julia. Currently the biggest problem I face is how to set up a package server on my own network, which does not have access to the internet. I can copy files to the offline computer/network, and I want to be able to cache a good percentage of the Julia package ecosystem and copy it to my network, so that I and others can install packages as needed.
I have experimented with PkgServer.jl by using the deployment docker-compose script they provide, then installing a long list of packages so that the PkgServer instance would cache everything. Next I took the PkgServer machine offline and attempted to install packages from it. This worked well; however, when I restarted the docker container the server was running in, everything fell apart quickly.
It seems that maybe the PkgServer needs to be able to talk to the storage server at least once before being able to serve packages. I tried changing JULIA_PKG_SERVER_STORAGE_SERVERS from "https://us-east.storage.juliahub.com,https://kr.storage.juliahub.com" to "", but that failed miserably.
Can someone please point me in the right direction?
TIA
It looks like the PkgServer is actually trying to contact the registry before it starts. I don't know enough about the registry machinery to know whether there is a way to hack this to look locally or just ignore it.
pkgserver_1 | ERROR: LoadError: DNSError: kr.storage.juliahub.com, temporary failure (EAI_AGAIN)
pkgserver_1 | Stacktrace:
pkgserver_1 | [1] getalladdrinfo(::String) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Sockets/src/addrinfo.jl:112
pkgserver_1 | [2] getalladdrinfo at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.5/Sockets/src/addrinfo.jl:121 [inlined]
pkgserver_1 | [3] getconnection(::Type{TCPSocket}, ::SubString{String}, ::String; keepalive::Bool, connect_timeout::Int64, kw::Base.Iterators.Pairs{Symbol,Union{Nothing, Bool},NTuple{4,Symbol},NamedTuple{(:require_ssl_verification, :iofunction, :reached_redirect_limit, :status_exception),Tuple{Bool,Nothing,Bool,Bool}}}) at /depot/packages/HTTP/IAI92/src/ConnectionPool.jl:630
pkgserver_1 | [4] #getconnection#29 at /depot/packages/HTTP/IAI92/src/ConnectionPool.jl:682 [inlined]
pkgserver_1 | [5] newconnection(::HTTP.ConnectionPool.Pod, ::Type{T} where T, ::SubString{String}, ::SubString{String}, ::Int64, ::Bool, ::Int64; kw::Base.Iterators.Pairs{Symbol,Union{Nothing, Bool},Tuple{Symbol,Symbol,Symbol},NamedTuple{(:iofunction, :reached_redirect_limit, :status_exception),Tuple{Nothing,Bool,Bool}}}) at /depot/packages/HTTP/IAI92/src/ConnectionPool.jl:597
pkgserver_1 | [6] getconnection(::Type{HTTP.ConnectionPool.Transaction{MbedTLS.SSLContext}}, ::SubString{String}, ::SubString{String}; connection_limit::Int64, pipeline_limit::Int64, idle_timeout::Int64, reuse_limit::Int64, require_ssl_verification::Bool, kw::Base.Iterators.Pairs{Symbol,Union{Nothing, Bool},Tuple{Symbol,Symbol,Symbol},NamedTuple{(:iofunction, :reached_redirect_limit, :status_exception),Tuple{Nothing,Bool,Bool}}}) at /depot/packages/HTTP/IAI92/src/ConnectionPool.jl:541
pkgserver_1 | [7] request(::Type{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}, ::HTTP.URIs.URI, ::HTTP.Messages.Request, ::Array{UInt8,1}; proxy::Nothing, socket_type::Type{T} where T, reuse_limit::Int64, kw::Base.Iterators.Pairs{Symbol,Union{Nothing, Bool},Tuple{Symbol,Symbol,Symbol},NamedTuple{(:iofunction, :reached_redirect_limit, :status_exception),Tuple{Nothing,Bool,Bool}}}) at /depot/packages/HTTP/IAI92/src/ConnectionRequest.jl:73
pkgserver_1 | [8] (::Base.var"#56#58"{Base.var"#56#57#59"{ExponentialBackOff,HTTP.RetryRequest.var"#2#3"{Bool,HTTP.Messages.Request},typeof(HTTP.request)}})(::Type{T} where T, ::Vararg{Any,N} where N; kwargs::Base.Iterators.Pairs{Symbol,Union{Nothing, Bool},Tuple{Symbol,Symbol,Symbol},NamedTuple{(:iofunction, :reached_redirect_limit, :status_exception),Tuple{Nothing,Bool,Bool}}}) at ./error.jl:301
pkgserver_1 | [9] #request#1 at /depot/packages/HTTP/IAI92/src/RetryRequest.jl:44 [inlined]
pkgserver_1 | [10] request(::Type{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1}; http_version::VersionNumber, target::String, parent::Nothing, iofunction::Nothing, kw::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol,Symbol},NamedTuple{(:reached_redirect_limit, :status_exception),Tuple{Bool,Bool}}}) at /depot/packages/HTTP/IAI92/src/MessageRequest.jl:51
pkgserver_1 | [11] request(::Type{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1}; kw::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol,Symbol},NamedTuple{(:reached_redirect_limit, :status_exception),Tuple{Bool,Bool}}}) at /depot/packages/HTTP/IAI92/src/BasicAuthRequest.jl:28
pkgserver_1 | [12] request(::Type{HTTP.RedirectRequest.RedirectLayer{HTTP.BasicAuthRequest.BasicAuthLayer{HTTP.MessageRequest.MessageLayer{HTTP.RetryRequest.RetryLayer{HTTP.ConnectionRequest.ConnectionPoolLayer{HTTP.StreamRequest.StreamLayer{Union{}}}}}}}}, ::String, ::HTTP.URIs.URI, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1}; redirect_limit::Int64, forwardheaders::Bool, kw::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:status_exception,),Tuple{Bool}}}) at /depot/packages/HTTP/IAI92/src/RedirectRequest.jl:24
pkgserver_1 | [13] request(::String, ::String, ::Array{Pair{SubString{String},SubString{String}},1}, ::Array{UInt8,1}; headers::Array{Pair{SubString{String},SubString{String}},1}, body::Array{UInt8,1}, query::Nothing, kw::Base.Iterators.Pairs{Symbol,Bool,Tuple{Symbol},NamedTuple{(:status_exception,),Tuple{Bool}}}) at /depot/packages/HTTP/IAI92/src/HTTP.jl:314
pkgserver_1 | [14] #get#12 at /depot/packages/HTTP/IAI92/src/HTTP.jl:391 [inlined]
pkgserver_1 | [15] get_registries(::String) at /app/src/resource.jl:21
pkgserver_1 | [16] update_registries() at /app/src/resource.jl:130
pkgserver_1 | [17] start(; kwargs::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:listen_addr, :storage_root, :storage_servers),Tuple{Sockets.InetAddr{IPv4},String,Array{SubString{String},1}}}}) at /app/src/PkgServer.jl:88
pkgserver_1 | [18] top-level scope at /app/bin/run_server.jl:43
pkgserver_1 | [19] include(::Function, ::Module, ::String) at ./Base.jl:380
pkgserver_1 | [20] include(::Module, ::String) at ./Base.jl:368
pkgserver_1 | [21] exec_options(::Base.JLOptions) at ./client.jl:296
pkgserver_1 | [22] _start() at ./client.jl:506
pkgserver_1 | in expression starting at /app/bin/run_server.jl:43
This might be helpful, but I'm not sure yet how to get started with it:
LocalRegistry.jl
Here is a solution that seems to work, based on LocalPackageServer.
Preliminary steps
Install all required packages. You can either put them in your default environment (e.g. #v1.5) or in a dedicated project.
LocalRegistry
LocalPackageServer
In order to use LocalPackageServer, we'll need to set up a local registry, even though we won't really use it (but it can still be handy if you also have to serve local packages).
Something like this should create an empty local registry as local-registry.git in the current folder:
# Create an empty (bare) git repository to host the registry
run(`git init --bare local-registry.git`)
# Populate the repository with a new, empty local registry
using LocalRegistry
Base.Filesystem.mktempdir() do dir
    create_registry(joinpath(dir, "local-registry"),
                    abspath("local-registry.git"),
                    description = "(unused) local registry",
                    push = true)
end
Step 1a - run the local package server (online)
A script like the following should run a local package server listening on http://localhost:8000.
#################
# run_server.jl #
#################
using LocalPackageServer
config = LocalPackageServer.Config(Dict(
    # Server parameters
    "host" => "127.0.0.1",
    "port" => 8000,
    "pkg_server" => "https://pkg.julialang.org",

    # This is where persistent data will be stored
    # (I use the current directory here; adjust to your constraints)
    "local_registry" => abspath("local-registry.git"), # in accordance with the preliminary step above
    "cache_dir" => abspath("cache"),
    "git_clones_dir" => abspath("data"),
))

# The tricky part: arrange for the server to never update its registries
# when it is offline
if get(ENV, "LOCAL_PKG_SERVER_OFFLINE", "0") == "1"
    @info "Running offline => no registry updates"
    config.min_time_between_registry_updates = typemax(Int)
end

# Start the server
LocalPackageServer.start(config)
Use this script to run the server online first:
shell$ julia --project run_server.jl
Step 1b - cache some packages (online)
Configure a Julia process to use your local server, and install the packages you want to cache:
# Take care to specify the "http://" protocol
# (otherwise https might be used by the client, and the server won't answer)
shell$ JULIA_PKG_SERVER=http://localhost:8000 julia
julia> using Pkg
julia> pkg"add Example"
[...]
At this point, the server should be caching things.
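To sanity-check that the cache is being populated, you can look at the cache directory configured above and, assuming the server exposes the standard Pkg protocol /registries endpoint (an assumption on my part), query it directly:
shell$ ls cache
shell$ curl http://localhost:8000/registries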
Step 2 - run the package server (offline)
When you're offline, simply restart the server, ensuring it won't try to update the registries:
shell$ LOCAL_PKG_SERVER_OFFLINE=1 julia --project run_server.jl
As before, set JULIA_PKG_SERVER as needed for Julia clients to use the local server; they should now be able to install packages, provided that the project environments resolve to the exact same dependency versions that were cached.
(You might want to resolve and instantiate your project environments online, and then transfer the relevant manifests to offline systems: this might help guarantee the consistency between the package server cache and what clients ask for.)
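For example, on an offline client whose project Manifest was resolved online and copied over (the project path below is just a placeholder), installing the pinned versions should then work against the local server:
shell$ JULIA_PKG_SERVER=http://localhost:8000 julia --project=/path/to/project -e 'using Pkg; Pkg.instantiate()'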

Deploy ASP.NET Application to AWS

I have a problem deploying my application to AWS. I have never done this part of the job before, and when I deploy from Visual Studio I get this error:
2018-07-03 12:23:53,788 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Loading hosted file 'flags/us.png' from local path 'C:\Users\denni\AppData\Local/AWSToolkit/downloadedfiles/flags/us.png'
2018-07-03 12:27:27,008 [53] INFO Amazon.AWSToolkit.MobileAnalytics.AMAServiceCallHandler - Reponse from AMAClient.PutEvents(request) meta data: Amazon.Runtime.ResponseMetadata, response HttpStatusCode: Amazon.Runtime.ResponseMetadata
2018-07-03 12:35:27,857 [1] INFO Amazon.AWSToolkit.VisualStudio.Shared.VSWebProjectInfo - EnvDTEProject.FullName lookup yielded 'C:\Users\denni\Desktop\JerrichoTerrace\JerichoTerrace\JerichoTerrace\JerichoTerrace.csproj'
2018-07-03 12:35:27,905 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Null/empty hosted files location override
2018-07-03 12:35:27,905 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Loading hosted file 'CloudFormationTemplates/TemplatesManifest.xml' from local path 'C:\Users\denni\AppData\Local/AWSToolkit/downloadedfiles/CloudFormationTemplates/TemplatesManifest.xml'
2018-07-03 12:35:28,038 [1] INFO Amazon.AWSToolkit.MobileAnalytics.SimpleMobileAnalytics - Queuing analytics event in local queue with timestamp: 07/03/2018 10:35:28
2018-07-03 12:35:28,038 [1] INFO Amazon.AWSToolkit.MobileAnalytics.SimpleMobileAnalytics - Queuing analytics event in local queue with timestamp: 07/03/2018 10:35:28
2018-07-03 12:35:28,102 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Null/empty hosted files location override
2018-07-03 12:35:28,102 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Loading hosted file 'flags/us.png' from local path 'C:\Users\denni\AppData\Local/AWSToolkit/downloadedfiles/flags/us.png'
2018-07-03 12:35:28,150 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Null/empty hosted files location override
2018-07-03 12:35:28,150 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Loading hosted file 'AccountTypes.xml' from local path 'C:\Users\denni\AppData\Local/AWSToolkit/downloadedfiles/AccountTypes.xml'
2018-07-03 12:35:28,166 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Null/empty hosted files location override
2018-07-03 12:35:28,166 [1] INFO Amazon.AWSToolkit.S3FileFetcher - Loading hosted file 'AccountTypes.xml' from local path 'C:\Users\denni\AppData\Local/AWSToolkit/downloadedfiles/AccountTypes.xml'
2018-07-03 12:35:36,491 [1] INFO Amazon.AWSToolkit.MobileAnalytics.SimpleMobileAnalytics - Queuing analytics event in local queue with timestamp: 07/03/2018 10:35:36
2018-07-03 12:35:40,945 [1] INFO Amazon.AWSToolkit.MobileAnalytics.SimpleMobileAnalytics - Queuing analytics event in local queue with timestamp: 07/03/2018 10:35:40
2018-07-03 12:35:42,709 [37] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - Publishing 'JerichoTerrace' to Amazon Web Services
2018-07-03 12:35:42,741 [57] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - ..building configuration 'Debug|Any CPU' for project 'JerichoTerrace'
2018-07-03 12:35:52,725 [1] INFO Amazon.AWSToolkit.VisualStudio.BuildProcessors.WebAppProjectBuildProcessor - IVsUpdateSolutionEvents.UpdateSolution_Done, fSucceeded=1, fModified=1, fCancelCommand=0
2018-07-03 12:35:53,029 [57] INFO Amazon.AWSToolkit.VisualStudio.BuildProcessors.WebAppProjectBuildProcessor - Project build completed successfully, starting package build
2018-07-03 12:35:53,036 [57] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - ..creating deployment package obj\Debug\Package\JerichoTerrace.zip...
2018-07-03 12:35:53,077 [50] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - ....packaging - executing target(s) "Build;Package"
2018-07-03 12:36:02,400 [21] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - ....packaging - Csc: build warning: 'Controllers\EstimatesViewModelsController.cs' at (581,41): This async method lacks 'await' operators and will run synchronously. Consider using the 'await' operator to await non-blocking API calls, or 'await Task.Run(...)' to do CPU-bound work on a background thread.
2018-07-03 12:36:15,260 [8] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - ....packaging - CreateProviderList: build error: 'C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v14.0\Web\Deploy\Microsoft.Web.Publishing.MSDeploy.Common.targets' at (55,5): Web deployment task failed. (Cannot connect to the database 'MojaBaza'. Learn more at: http://go.microsoft.com/fwlink/?LinkId=221672#ERROR_CANNOT_CONNECT_TO_DATABASE.)
Cannot connect to the database 'MojaBaza'. Learn more at: http://go.microsoft.com/fwlink/?LinkId=221672#ERROR_CANNOT_CONNECT_TO_DATABASE.
Object of type 'dbFullSql' and path 'data source=DESKTOP-0L8HK6U\SQLEXPRESS;initial catalog=MojaBaza;integrated security=True;user id=awsuser;pooling=False' cannot be created.
Failed to connect to server DESKTOP-0L8HK6U\SQLEXPRESS.
Cannot open database "MojaBaza" requested by the login. The login failed.
Login failed for user 'DESKTOP-0L8HK6U\denni'.
Learn more at: http://go.microsoft.com/fwlink/?LinkId=221672#ERROR_EXECUTING_METHOD.
Failed to connect to server DESKTOP-0L8HK6U\SQLEXPRESS.
Cannot open database "MojaBaza" requested by the login. The login failed.
Login failed for user 'DESKTOP-0L8HK6U\denni'.
2018-07-03 12:36:15,260 [8] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - ....packaging - project build completed with errors.
2018-07-03 12:36:15,260 [57] ERROR Amazon.AWSToolkit.VisualStudio.BuildProcessors.WebAppProjectBuildProcessor - Deployment package build failed to complete.
2018-07-03 12:36:15,260 [57] INFO Amazon.AWSToolkit.MobileAnalytics.SimpleMobileAnalytics - Queuing analytics event in local queue with timestamp: 07/03/2018 10:36:15
2018-07-03 12:36:15,260 [37] INFO Amazon.AWSToolkit.VisualStudio.AWSToolkitPackage - ..build of project archive failed, abandoning deployment
2018-07-03 12:37:28,061 [77] INFO Amazon.AWSToolkit.MobileAnalytics.AMAServiceCallHandler - Reponse from AMAClient.PutEvents(request) meta data: Amazon.Runtime.ResponseMetadata, response HttpStatusCode: Amazon.Runtime.ResponseMetadata
When I go to the AWS console I found one field, Upload and Deploy.
So, will this work? If I deploy the application in this step, will the database be deployed correctly? Guys, please help me.
You need to change the database connection string.
You must use a database connection string with a user id and password. Remove integrated security=True;
Then you also need to check your database user's permissions for your application.
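For example, based on the connection string visible in the log above, something along these lines should work (the password is a placeholder you need to substitute):
data source=DESKTOP-0L8HK6U\SQLEXPRESS;initial catalog=MojaBaza;user id=awsuser;password=YOUR_PASSWORD;pooling=False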

How to determine the file size of a remote download without reading the entire file with R

Is there a reasonably straightforward way to determine the file size of a remote file without downloading the entire file? Stack Overflow answers how to do this with PHP and curl, so I imagine it's possible in R as well. If possible, I believe it would be better to avoid RCurl, since that requires an additional installation for non-Windows users.
On this survey analysis website, I write lots of scripts to automatically download large data files from government agencies (like the US Census Bureau and the CDC). I am trying to implement an additional component that will not download a file that has already been downloaded, by creating a "download cache" - but I am concerned that this "download cache" might get corrupted if: 1) the host website changes a file, or 2) the user cancels a download midway through. Therefore, when deciding whether to download a file from the source HTTP or FTP site, I want to compare the local file size to the remote file size, and if they are not the same, download the file again.
Nowadays a straightforward approach might be:
response = httr::HEAD(url)
httr::headers(response)[["Content-Length"]]
My original answer was: a more 'by hand' approach is to set the CURLOPT_NOBODY option (see man curl_easy_setopt on Linux; basically inspired by looking at the answers to the linked question) and tell getURL and friends to return the header along with the request:
library(RCurl)
url = "http://stackoverflow.com/questions/20921593/how-to-determine-the-file-size-of-a-remote-download-without-reading-the-entire-f"
xx = getURL(url, nobody=1L, header=1L)
strsplit(xx, "\r\n")
## [[1]]
## [1] "HTTP/1.1 200 OK"
## [2] "Cache-Control: public, max-age=60"
## [3] "Content-Length: 60848"
## [4] "Content-Type: text/html; charset=utf-8"
## [5] "Expires: Sat, 04 Jan 2014 14:09:58 GMT"
## [6] "Last-Modified: Sat, 04 Jan 2014 14:08:58 GMT"
## [7] "Vary: *"
## [8] "X-Frame-Options: SAMEORIGIN"
## [9] "Date: Sat, 04 Jan 2014 14:08:57 GMT"
## [10] ""
A peek at url.exists suggests parseHTTPHeader(xx) for parsing HTTP headers. getURL also works with FTP URLs:
url = "ftp://ftp2.census.gov/AHS/AHS_2004/AHS_2004_Metro_PUF_Flat.zip"
getURL(url, nobody=1L, header=1L)
## [1] "Content-Length: 21288307\r\nAccept-ranges: bytes\r\n"
url <- "http://cdn.meclabs.com/training/misc/2013_Marketing_Analytics_BMR-StrongView.pdf"
library(RCurl)
res <- url.exists(url, .header=TRUE)
as.numeric(res['Content-Length'])
# [1] 42413630
## bytes
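Putting this together for the download-cache scenario in the question, here is a minimal sketch using the httr approach above (the helper name, URL, and destination path are illustrative, not from the original post):
library(httr)

# Re-download only when the remote Content-Length differs from the local file size
download_if_changed <- function(url, destfile) {
  response <- HEAD(url)
  remote_size <- suppressWarnings(as.numeric(headers(response)[["content-length"]]))
  local_size <- if (file.exists(destfile)) file.info(destfile)$size else -1
  if (length(remote_size) == 0 || is.na(remote_size) || remote_size != local_size) {
    download.file(url, destfile, mode = "wb")
  } else {
    message("Cached copy matches the remote size; skipping download")
  }
  invisible(destfile)
}

# Hypothetical usage:
# download_if_changed("http://example.com/data.zip", "data.zip")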

R connecting to EC2 instance for parallel processing

I am having trouble initialising a connection to an AWS EC2 instance from R, as I keep getting the error Permission denied (publickey). I am currently using Mac OS X 10.6.8 as my OS.
The code that I try to run in the terminal ($) and then R (>) is as follows:
$ R --vanilla
> require(snowfall)
> sfInit(parallel=TRUE,socketHosts =list("ec2-xx-xxx-xx-xx.zone.compute.amazonaws.com"))
Permission denied (publickey)
But weirdly, when trying to ssh into the instance I don't need a password, as I had already imported the public key into the instance upon initialization (I think).
So from my normal terminal, when running
$ ssh ubuntu@ec2-xx-xxx-xx-xx.zone.compute.amazonaws.com
it automatically connects (so I'm not 100% sure whether it's a password-less issue like in Using snow (and snowfall) with AWS for parallel processing in R).
I have tried looking through a fair amount of the material on keys etc., but none of it seems to be making much of a difference. Also, my ~/.ssh/authorized_keys is a folder rather than a file for some reason, and I can't access it even when trying sudo cd .ssh/authorized_keys; in terms of permissions it has drw-------.
The end goal is to connect to a lot of EC2 instances and use foreach to carry out some parallel processing... but connecting to one for now would be nice. Also, I would like to use my own AMI, so StarCluster isn't really what I am looking for (unless I am able to use private AMIs and run all commands privately...).
Also, if doRedis is better, could someone show me how one would connect to the EC2 instance from a local machine? That would be good too.
EDIT
I have managed to deal with the password-less ssh login using makePSOCKcluster from the parallel package, as shown in R and makePSOCKcluter EC2 socketConnection... but am now coming across socketConnection issues, as shown in the question in the link.
Any ideas how to connect to it?
Also, proof that everything is working would, I guess, mean that the following command/function works and gets all the different IP addresses:
d <- parLapply(cl1, 1:length(cl1),function(x)system("ifconfig",intern=T)[2])
where cl1 is the output of the make*cluster function
NOTE: since the bounty is really for the question in the link, I don't mind which question you post up an answer to; but so long as something is written on this question that links it to the correct answer on the linked question, I will award the points accordingly.
I had quite a few issues with parallel EC2 setup too when trying to keep the master node local. Using StarCluster to set up the pool helped greatly, but the real improvement came from using StarCluster and having the master node within the EC2 private IP pool.
StarCluster sets up all of the key handling for all the nodes as well as any mounts used. Dynamic node allocation wasn't doable, but unless spot instances are going to be used long term and your bidding strategy doesn't 'keep' your instances, dynamic allocation shouldn't be an issue.
Some other lessons learned:
Create a variable containing the private IPs to pass to createCluster and export it, so that when you need to restart with the same nodes it is easier.
Have the master node run byobu and set it up for R session logging.
Running RStudio server on the master can be very helpful at times, but should be a different AMI than the slave nodes. :)
Have the control script offload data (.rda files) to a path that is remotely monitored for new files, and automatically download them.
Use htop to monitor the slaves so you can easily see the instances and determine script requirements (memory/cpu/scalability).
Make use of processor hyper-threading enable/disable scripts.
I had quite a bit of an issue with the slave connections and serialize/unserialize, and found that one of the causes was the connection limit, which needed to be reduced by the number of nodes. When the control script was stopped, the easiest method of cleanup was restarting the master R session and using a script to kill the slave processes instead of waiting for the timeout.
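For the cleanup part, a minimal sketch of such a helper (illustrative only; the cl object name matches the initializer shown further down, and closeAllConnections is a blunt base-R tool that closes every user connection, not just cluster sockets):
library(parallel)

cleanup.cluster <- function() {
  # Stop the PSOCK cluster explicitly instead of waiting for socket timeouts
  if (exists("cl", envir = .GlobalEnv)) {
    try(stopCluster(get("cl", envir = .GlobalEnv)), silent = TRUE)
    rm("cl", envir = .GlobalEnv)
  }
  # Each worker holds an open connection and R has a fixed connection limit,
  # so inspect and clear whatever is still open before creating a new cluster
  print(showConnections())
  closeAllConnections()
}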
It did take a bit of work to set up, but hopefully these thoughts help...
Although it was 8 months ago and both StarCluster and R have changed, here's some of how it was set up... You'll find 90% of this in the StarCluster docs.
Set up the .starcluster/config AWS and key-pair sections based on the security info from the AWS console.
Define the [smallcluster]
key-name
availability-zone
Define a cluster template extending [smallcluster], using AMIs based on the StarCluster 64-bit HVM AMI. Instead of creating new public AMIs I just saved a configured instance (with all the tools I needed) and used that as the AMI.
Here's an example of one...
[cluster Rnodes2]
EXTENDS=smallcluster
MASTER_INSTANCE_TYPE = cc1.4xlarge
MASTER_IMAGE_ID= ami-7621f91f
NODE_INSTANCE_TYPE = cc2.8xlarge
NODE_IMAGE_ID= ami-7621f91f
CLUSTER_SIZE= 8
VOLUMES= rdata
PLUGINS= pkginstaller
SPOT_BID= 1.00
Set up the shared volume; this is where the screen/byobu logs, the main .R script checkpoint output, shared R data, and the source for the production package live. It was monitored for new files in a child path called export, so that if the cluster or control script died/abended, at most a bounded number of records would be lost and need to be re-calculated.
After creating the shared volume, the definition was simply:
[volume rdata]
VOLUME_ID = vol-1145497c
MOUNT_PATH = /rdata
The package installer plugin ensured the latest (and identical) R versions on all nodes:
[plugin pkginstaller]
setup_class = starcluster.plugins.pkginstaller.PackageInstaller
packages = r-base, r-base-dev, r-recommended
Lastly, access permissions for both ssh and RStudio Server. HTTPS via a proxy would be safer, but since RStudio was only used for the control script setup...
[permission ssh]
# protocol can be: tcp, udp, or icmp
protocol = tcp
from_port = 22
to_port = 22
# [permission http]
protocol = tcp
from_port = 8787
to_port = 8787
Then start up a cluster using the StarCluster interface. It handles all of the access controls, system names, shares, etc. Once the cluster was running, I ran an ssh session into each node from my local system and ran a script to stop hyper-threading:
#!/bin/sh
# disable hyperthreading
for cpunum in $(
cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list |
cut -s -d, -f2- | tr ',' '\n' | sort -un); do
echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done
Then I started an htop session on each node for monitoring scalability against the exported checkpoint logs.
Then I logged into the master, started a screen session (I've since come to prefer byobu) and fired up R from within the StarCluster-mounted volume. That way, when the cluster stopped for some reason, I could easily set up again just by starting R. Once in R, the first thing was to create a workers.list variable using the nodeXXX names, which was simply something along the lines of:
cluster.nodes <- c("localhost", paste("node00", 1:7, sep='' ) )
workers.list <- rep( cluster.nodes, 8 )
Then I loaded up the control script, quit, and saved the workspace. The control script handled all of the table output for exporting, the checkpoints, and the par-wrapped calls to the production package. The main function of the script also took a cpus argument, which is where the workers list was placed; it was then passed as cores to the cluster initializer.
initialize.cluster <- function( cores )
{
    if( exists( 'cl' ) ) stopCluster( cl )
    print("Creating Cluster")
    cl <- makePSOCKcluster( cores )
    print("Cluster created.")
    assign( 'cl', cl, envir=.GlobalEnv )
    print( cl )
    # All workers need to have the bounds generator functions...
    clusterEvalQ( cl, require('scoreTarget') )
    # All workers need to have the production script and package.
    clusterExport( cl, varlist=list('RScoreTarget', 'scoreTarget'))
    return ( cl )
}
Once the R session was restarted (after initially creating the workers.list), the control script was sourced and the main function called. That was it. With this setup, if the cluster ever stopped, I'd just quit the R session on the main host, stop the slave processes via htop on each of the slaves, and start up again.
Here's an example of it in action:
R
R version 2.15.0 (2012-03-30)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
[Previously saved workspace restored]
> source('/rdata/buildSatisfactionRangeTable.R')
Loading required package: data.table
data.table 1.7.7 For help type: help("data.table")
Loading required package: parallel
Loading required package: scoreTarget
Loading required package: Rcpp
> ls()
[1] "build.satisfaction.range.table" "initialize.cluster"
[3] "initialize.table" "parallel.choices.threshold"
[5] "rolled.lower" "rolled.upper"
[7] "RScoreTarget" "satisfaction.range.table"
[9] "satisfaction.search.targets" "search.range.bound.offsets"
[11] "search.range.bounds" "search.range.center"
[13] "Search.Satisfaction.Range" "update.bound.offset"
[15] "workers.list"
> workers.list
[1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
[7] "localhost" "localhost" "node001" "node002" "node003" "node004"
[13] "node005" "node006" "node007" "node001" "node002" "node003"
[19] "node004" "node005" "node006" "node007" "node001" "node002"
[25] "node003" "node004" "node005" "node006" "node007" "node001"
[31] "node002" "node003" "node004" "node005" "node006" "node007"
[37] "node001" "node002" "node003" "node004" "node005" "node006"
[43] "node007" "node001" "node002" "node003" "node004" "node005"
[49] "node006" "node007" "node001" "node002" "node003" "node004"
[55] "node005" "node006" "node007" "node001" "node002" "node003"
[61] "node004" "node005" "node006" "node007" "node001" "node002"
[67] "node003" "node004" "node005" "node006" "node007" "node001"
[73] "node002" "node003" "node004" "node005" "node006" "node007"
[79] "node001" "node002" "node003" "node004" "node005" "node006"
[85] "node007" "node001" "node002" "node003" "node004" "node005"
[91] "node006" "node007" "node001" "node002" "node003" "node004"
[97] "node005" "node006" "node007" "node001" "node002" "node003"
[103] "node004" "node005" "node006" "node007" "node001" "node002"
[109] "node003" "node004" "node005" "node006" "node007" "node001"
[115] "node002" "node003" "node004" "node005" "node006" "node007"
> build.satisfaction.range.table(500000, FALSE, workers.list )
[1] "Creating Cluster"
[1] "Cluster created."
socket cluster with 120 nodes on hosts ‘localhost’, ‘node001’, ‘node002’, ‘node003’, ‘node004’, ‘node005’, ‘node006’, ‘node007’
Parallel threshold set to: 11000
Starting at: 2 running to: 5e+05 :: Sat Apr 14 22:21:05 2012
If you have read down to here, then you may be interested to know that I tested every cluster setup I could (including openMPI) and found that there wasn't a speed difference; perhaps that is because my calculations were so CPU-bound, perhaps not.
Also, don't give up even though it can be a pain to get going with HPC. It can be totally worth it. I would still be waiting to complete the first 100,000 iterations of the calculations I was running had I stuck with a naive implementation in base R on a commodity workstation (well, not really, as I would never have stuck with R :D ). With the cluster, 384,000 iterations completed in under a week. Totally worth the time (and it took a lot of it) to set up.

Sun Solaris 8 - Couldn't set locale correctly

Every time I log into a user account or the root account I receive the message "couldn't set locale correctly".
Running locale displays:
LANG=
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES=C
LC_ALL=
How can I solve this?
Have a look at /etc/TIMEZONE, which should be linked to /etc/default/init; you should have something like:
TZ=US/Pacific
LC_COLLATE=en_US
LC_CTYPE=en_US
LC_MESSAGES=C
LC_MONETARY=en_US
LC_NUMERIC=en_US
LC_TIME=en_US
If you don't, add those values, reboot, and test it out.
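If you want to test a value before rebooting, you can also set it in the current shell first; en_US here is only an example and must be a locale that is actually installed on the system (check with locale -a):
$ LANG=en_US; export LANG
$ locale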
