how to mass change folder names in amazon s3 with a script - unix

I uploaded a bunch of images but accidentally named the folders with spaces. Now you can't access them cause obviously urls cant have spaces.
I've downloaded the aws cli and was wondering how to change folder names? I've looked at the documentation but I'm still having trouble and hoping someone can help.
I've tried the below command without any success:
aws s3 mv "s3://mybucketname/firstfolder/second folder with spaces/" s3://mybucketname/firstfolder/secondfolderwithspaces/ --recursive
How do I change the name of "second folder with spaces" to "secondfolderwithspaces"?
Also, is there a way I can iterate through these folders? Something like
for folder in s3:/bucketname/firstfolder:
aws s3 mv "folder with spaces" folderwithspaces --recursive

I'd do it via a python script using the boto SDK:
import boto
conn = boto.s3.connect_to_region('ap-southeast-2')
bucket = conn.get_bucket('bucket-name')
for k in bucket.list():
if ' ' in k.key:
bucket.copy_key(k.key.replace(' ', '+'), bucket.name, k.key)
bucket.delete_key(k.key)
The script loops through each object, copies it to a new key (which is like a filename, but it includes the full path including the directory name), then deletes the old object. It fully executes within Amazon S3 -- the contents of the objects are not downloaded/uploaded.
Modify the replace command to suit your needs.

URLs can have spaces. You have to encode them. Space character becomes "%20".
If you have chrome or firefox, open the developper tools console, and type
encodeURI("second folder with spaces")
It prints
second%20folder%20with%20spaces
For the mass rename, it can't be done in the same way as you traditionally would on an OS (Linux/Windows/Mac). On S3, you cannot rename files, you must copy them. So you have to download their content, delete them, and new files.
Amazon S3 boto: How do you rename a file in a bucket?

Related

Different local and remote organisation R Project and GitHub

I want to version control my R scripts so I've created an R project and a GitHub repo. My scripts are scattered through several directories within the same directory where the R project is.
I would like that my GitHub repository harbors only the scripts, independently of the folders they are locally stored in. However when I run the below command:
git add folder/file.R
git commit -m "my_message"
git push -u origin master
A directory named folder is created containing file.R but I'd like to just see file.R without the folder. Do you know how can I do this? Also, would it be good practice? My local folders are organized so each directory contains its own scripts and results, that's the reason the scripts are separated.
Thank you very much
is there a way to add the file.R without specifying the path?
Not using git add, no. The design constraint for git add is that it should store the file's name exactly as it appears, including the forward slashes, so if the file's name is folder/file.R, that's the file's name.
You have some options here though:
You can make a parallel directory where you put the files with the names you want them to have. Run git init in that directory, copy the folder/file.R file to file.R in that directory. Then cd ../gitdir or whatever is appropriate to get there, and git add file.R.
This method is probably the best because it's the simplest.
You can write your own programs using git hash-file -w and git update-index, which are two of Git's plumbing commands. A plumbing command, in Git, is basically a command that exists so that you can build user-facing commands: they're not meant to be run by humans but rather by other programs. So you write a program (in whatever language you like) that uses these plumbing programs to achieve whatever you want.
In particular, you can create or find a Git blob object holding the contents of file.R as read from anywhere you like, then use git update-index to create an index entry holding whatever path you like and referring to the blob object you created (or found) with git hash-object with the -w flag.
Since Git is a suite of tools, not a solution, you can come up with your own method. The tools in Git are made with particular approaches in mind, but they are flexible enough to be repurposed.

WinSCP script to synchronize directories, but exclude several subdirectories

I need to write a script that synchronizes local files with a remote machine.
My file structure is:
ProjectFolder/
.git/
input/
output/
classes/
main.py
readme.md
I need to synchronize everything, but:
completely ignore .git folder
ignore files in input and output folders, but copy the folder
So far my code is:
open sftp://me:password#server -hostkey="XXXXXXXX"
option batch abort
option confirm off
synchronize remote "C:\Users\MYNAME\Documents\MY FOLDER\Python Projects\ProjectFolder" "/home/MYNAME/py_proj/ProjectFolder" -filemask="|C:\Users\MYNAME\Documents\MY FOLDER\Python Projects\ProjectFolder\.git"
close
exit
First question: it doesn't seems to work.
Second question, how to add mask for input and output folder if I have spaces in file paths?
Thanks to all in advance.
Masks for directories have to end with a slash.
To exclude files in a specific folder, use something like */folder/*
-filemask="|.git\;*/input/*;*/output/*"

Copy .rmd file included in a Rstudio addin package to a user defined directory

I have a rstudio addin package located here.
One of the addins allows the user to define a directory and it will copy a file that is located in the package to that directory.
the file is located:
atProjectManageAddins/inst/Docs/RMarkdownSkeleton.Rmd
And I am trying to copy it to the user defined directory with something like this:
file.copy("inst/Docs/RMarkdownSkeleton.Rmd",
paste0(Dir, FolderName, "/Reports/", FolderName, "_report.Rmd"))
Where I am trying to copy it from where it is in the package, to where the user defines it to be (Based on two separate arguments Dir and FolderName).
But this doesn't seem to work. My assumption is that I am not referring to the package directory in the correct way. I've tried ./Inst/, ~/Inst/ and maybe a couple more. My assumption now is that there is a more systematic reason for my inability to get file.copy() to work.
Any suggestions? Is this even possible?
Note that if I run the function locally via source() and runGadget(), it works just fine. Only when the package is installed and I use the RStudio addins GUI where it references the intalled package, does it fail. Thus, I'm quite certain I am not correctly defining the file path for the installed .Rmd files.
Edit: I've changed to the following, based on Carl's suggestion (as can be seen on github), but the files are still not being copied over.
file.copy(system.file("Docs","Rmarkdownskeleton.rmd",package="atProjectManageAd‌​dins"),
paste0(Dir, FolderName, "/Reports/", FolderName, "_report.Rmd"))
system.file is the best function for getting a file from a package. I believe this should work for you:
file.copy(system.file("Docs","Rmarkdownskeleton.rmd",package="atProjectManageAd‌​dins"),
paste0(Dir, FolderName, "/Reports/", FolderName, "_report.Rmd"))
You got the right idea putting the files in inst/.
Use this code to copy the file from the package dir to the current dir :
file.copy(from = file.path(path.package("packagename"), "path/to/file"),
to = file.path("path/to/file"), overwrite = T)
file.path creates a path by concatenating the strings passed to it (OS-specific separators are automatically added).
path.package retrieves the path of a loaded package. Files presents in inst/ are copied at the root of the package dir when installed, thus "path/to/file" here should be the path relative to your inst/ dir.
overwrite can be used to overwrite the file if it already exists.
In your specific case, this should do the trick :
file.copy(file.path(path.package("atProjectManageAddins"), "Docs/RMarkdownSkeleton.Rmd",
file.path(getwd(), "Reports", paste0(reportName, "_report.Rmd")))

Atom How can I get files under the project

I'm using Atom 1.0.5. I'm trying to create atom package.
I would like to get files name under the workspace.
Is it possible?
dirs = atom.project.getDirectories()
for dir in dirs
for entry in dir.getEntriesSync()
console.log entry.getPath()
You should be able to use the following two APIs:
Project.getDirectories: Get an Array of Directorys associated with this project.
Directory.getEntries: Reads file entries in this directory from disk asynchronously.
So you would use Project.getDirectories first to get a list of directories, and then iterate through them using Directory.getEntries. Each entry can either be a file or a directory (again).

Unix File naming conventions

I am in process of doing a remote to local copy using rsyncand the file list is picked up from a txt file which looks like below
#FILE_PATH FILENAME
/a/b/c test1.txt
/a/x/y test2.txt
/a/v/w test1.txt
The FILE_PATH is the same for remote and local servers. The Problem is, I need to copy the files to a Staging area in the local first and then need to move it to the FILE_PATH, so as to make sure Integrity.
If I simply copy all the files to the Staging area, test1.txt will get overridden. So I thought I can go with clubbing the FILE_PATH and FILENAME, thus it gets unique. To do so, I can not create the file as /a/b/c/test1.txt in my staging area.
So I thought to replace / with special chars that support Unix.
Tried with - _ : ., I got conflicts with all this.
-a-b-c-test1.txt
How I can achieve copying files to the same Staging directory though the file names are of same but supposed to reach different directory
your thoughts pls.

Resources