Cloud computing with FloydHub using PowerShell and Python in Visual Studio

Cloud computing can be a bit confusing: there are many solutions, and it is sometimes difficult to know how and where to start. In this tutorial we will use Visual Studio to create a small Python file with some input and output, and then use PowerShell to connect to FloydHub, where we will execute it on a CPU in the cloud.

Downloading the necessary software and creating a FloydHub account

If you don’t have Visual Studio you can download a free version here:

https://visualstudio.microsoft.com/vs/express/

This will install a Python editor and PowerShell.

Then, go to www.floydhub.com, create an account and follow the instructions. You will receive a username and a password to log in to the floydhub.com pages. You will need them again later to connect to the FloydHub server from PowerShell. From here on, whenever you have to type your username, it is written as {username}. A free account is enough for this tutorial. If you want to use a GPU later, you can still upgrade.

Creating a simple Python program

We will create a HelloWorld application that reads input from a file (HelloWorld.txt) in the input folder and writes it to an output file with the same name in the output folder.

First we will create a Python program using the Python editor in Visual Studio. Click on “File>New>Project” and select “Python Application”. Call your solution HelloWorld and click “OK”.

The editor should open and you can now start writing your Python program.

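If we were to write a program that reads a file and writes its contents to another file on your local computer, it would look something like this:

# Read the whole input file (raw strings keep the backslashes literal)
with open(r'c:\input\HelloWorld.txt', 'r') as input_file:
    data = input_file.read()
# Write the contents to the output file; "with" closes the file for us
with open(r'c:\output\HelloWorld.txt', 'w') as output_file:
    output_file.write(data)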

In this example the input folder is “c:\input” and the output folder is “c:\output”, both local folders. However, we will run the program in the cloud on the FloydHub servers. This means we will have to put our input in a specific folder, and we will only have rights to write to one specific output folder.

Let’s call the input folder we are going to use on Floyd “/my_data” for now.

The output will be written on the FloydHub server and we will be able to download it later. In our code we always have to refer to the output folder as “/output”. FloydHub only allows writing to this output folder; you will not have rights to write to any other folder. Of course you can also write to subfolders of this “/output” folder.
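As a minimal sketch of writing to a subfolder, the function below creates the subfolder if it is missing and then writes a file into it. The function name write_result, the subfolder name “results” and the output_root parameter are made up for this illustration. On FloydHub you would leave output_root at “/output”; the demo at the bottom uses a temporary folder instead so that it can also run on your own computer.

```python
import os
import tempfile


def write_result(text, output_root="/output", subfolder="results"):
    """Write text to <output_root>/<subfolder>/HelloWorld.txt and return the path."""
    target_dir = os.path.join(output_root, subfolder)
    # Create the subfolder (and any missing parents); do nothing if it exists.
    os.makedirs(target_dir, exist_ok=True)
    target_path = os.path.join(target_dir, "HelloWorld.txt")
    with open(target_path, "w") as output_file:
        output_file.write(text)
    return target_path


# Local demo: a temporary folder stands in for "/output".
with tempfile.TemporaryDirectory() as fake_output:
    path = write_result("Hello subfolder", output_root=fake_output)
    with open(path) as result_file:
        print(result_file.read())
```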

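So the final Python file, which we will run in the cloud later, is as follows:

# Read the input file from the dataset folder we call "/my_data"
with open('/my_data/HelloWorld.txt', 'r') as input_file:
    data = input_file.read()
# Write the contents to "/output", the only folder we may write to
with open('/output/HelloWorld.txt', 'w') as output_file:
    output_file.write(data)

Save this file in Visual Studio as “HelloWorld.py”.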

We will need the full path of the “HelloWorld.py” file later. If you don’t know it, you can right-click on your solution name “HelloWorld” and click “Open folder in file explorer”. You should now see the location of your local Python (*.py) file. (It should be something like this: “C:\TF\Tools\HelloWorld\HelloWorld.py”).

To recap, the program, when running on FloydHub, will read our input file from a folder we referred to as “my_data” and write its contents to the output folder “output”. In the next steps we will create a “HelloWorld.txt” file and upload it to the FloydHub servers.
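Before sending anything to the cloud, it can help to try the same read-and-write logic on your own computer. The sketch below is one way to do that, under the assumption that you wrap the logic in a function with the two folders as parameters. The name copy_hello_world and both parameters are hypothetical and not part of FloydHub. The defaults are the FloydHub folders from this tutorial, and the dry run at the bottom uses temporary folders in their place.

```python
import os
import tempfile


def copy_hello_world(input_dir="/my_data", output_dir="/output"):
    """Copy HelloWorld.txt from input_dir to output_dir and return the new path."""
    source = os.path.join(input_dir, "HelloWorld.txt")
    target = os.path.join(output_dir, "HelloWorld.txt")
    with open(source, "r") as input_file:
        data = input_file.read()
    with open(target, "w") as output_file:
        output_file.write(data)
    return target


# Local dry run: temporary folders replace "/my_data" and "/output".
with tempfile.TemporaryDirectory() as in_dir, tempfile.TemporaryDirectory() as out_dir:
    with open(os.path.join(in_dir, "HelloWorld.txt"), "w") as sample:
        sample.write("Hello from my own computer")
    result_path = copy_hello_world(in_dir, out_dir)
    with open(result_path) as result_file:
        print(result_file.read())
```

Calling copy_hello_world() without arguments uses the “/my_data” and “/output” folders, which only exist once the program runs on FloydHub.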

Creating the input file HelloWorld.txt

Create a text file and call it “HelloWorld.txt” (with Notepad for example). Create a “C:\input” folder and put the txt file in it. Open the file, type something in it and save it again. This is the file we are going to use as our input. We will now upload it to the FloydHub servers using PowerShell. You can access PowerShell in Visual Studio from the Python Environments tab. Clicking on “Open PowerShell” will open a PowerShell instance.

To access our input file in the cloud, we first have to upload it to FloydHub. To do that, we connect to FloydHub, create a dataset, and then upload the data from the local hard drive to the dataset we just created.

Connecting to FloydHub

To connect to FloydHub, type floyd login in the PowerShell editor. (If PowerShell does not recognize the floyd command, install the FloydHub command line tool first with pip install -U floyd-cli.) You will be presented with your username; just press enter here. It will then ask for the password. Type your password followed by enter (you won’t see a cursor, just type it and press enter). You should now be logged in.

Creating a dataset and uploading input data

We will now create a dataset called “helloworlddataset” on FloydHub and upload “C:\input\HelloWorld.txt” into it. Datasets are only used to store input data on the FloydHub servers. First, in the PowerShell window, move to the directory containing HelloWorld.txt: “c:\input”.

You can use “cd” to change directory and “dir” to show the contents of a directory. If your directory names are long, you can type the first letter after “cd” and then press tab to autocomplete your folder names.

Once you are in the correct folder, type “dir” to check if you can see the HelloWorld.txt in the listing.

Now type floyd data init helloworlddataset in the PowerShell editor. This should open a “Create a new dataset” webpage in your Internet browser. Click “Create dataset” and go back to the PowerShell editor and press enter where it says “Press ENTER to use the dataset name “helloworlddataset” or enter a different name:”.

The "Private" setting is only available if you have a paid FloydHub subscription.

What this does is link your local “c:\input” folder with your FloydHub dataset folder “{username}/datasets/helloworlddataset”.

Now type floyd data upload. This uploads the contents of the folder that is currently active in PowerShell (“C:\input”) to your dataset “helloworlddataset” on FloydHub. Every time you upload data from your computer to the same dataset, a new subfolder is created, which means you can keep different versions of the same dataset. The first upload creates the subfolder “1”, the second “2”, and so on. Since this is our first upload, our “HelloWorld.txt” is now on the FloydHub server in the “{username}/datasets/helloworlddataset/1” folder of your account.

To double check, you can log in to your FloydHub web account and click on “Datasets”; you should see that “helloworlddataset” has been created. If you open it, you should see “HelloWorld.txt” and its folder name on the server, in my case: “pietvt/datasets/helloworlddataset/1”.
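Because every upload gets its own number, the full name of one dataset version is made up of three parts: your username, the dataset name and the upload number. As a small illustration, the hypothetical helper below (dataset_reference is a name invented here, not a FloydHub function) assembles that name in the “{username}/datasets/{datasetname}/{version}” form used in this tutorial.

```python
def dataset_reference(username, dataset_name, upload_number):
    """Return the folder name of one uploaded version of a dataset."""
    if upload_number < 1:
        raise ValueError("upload numbers start at 1")
    return f"{username}/datasets/{dataset_name}/{upload_number}"


# The first and second upload of the same dataset.
print(dataset_reference("pietvt", "helloworlddataset", 1))
print(dataset_reference("pietvt", "helloworlddataset", 2))
```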

Running the Python file on the FloydHub server

This is the final step: we will upload our Python file to the FloydHub server and ask for it to be executed. First we use the PowerShell editor to move to the location of the Python file we just created. Again use “cd” to change directory and “dir” to check the folder contents. You should see the “HelloWorld.py” file in the listing.

Just like we upload data to “Datasets” in FloydHub, we will upload our code to FloydHub “Projects” and execute it there. On your FloydHub webpage you will find the Projects next to the Datasets.

To recap: we have “Projects” that run code and generate output, and we have “Datasets” that we can use as input. Both datasets and project code need to be uploaded in a similar way.

Back in PowerShell, we will now create a new project “helloworldproject” on FloydHub by typing floyd init {username}/helloworldproject. Don’t forget to replace {username} with your own username, so in my case:

floyd init pietvt/helloworldproject

Similar to uploading the dataset, as soon as you enter the command a web page will pop up where you have to click “Create project”. Then in the PowerShell editor you will get the question “Press ENTER to use project name “{username}/helloworldproject” or enter a different name:”. Just press enter again and the project will be created.

You can double check the creation of this project on your FloydHub page by clicking on Projects. You should now see that it created a project, with 0 jobs. Every time we send a job to this project a new subfolder with job information will be created, similar to the input data. So we will get “{username}/projects/helloworldproject/1” then “{username}/projects/helloworldproject/2” and so on…

To recap, we now have

·        an input dataset: “{username}/datasets/helloworlddataset/1”

·        a project folder: “{username}/projects/helloworldproject/1”

In my case:

·        an input dataset: “pietvt/datasets/helloworlddataset/1”

·        a project folder: “pietvt/projects/helloworldproject/1”

We will now run our HelloWorld.py file and link it to the input dataset and project folder above with this command:

floyd run --data {username}/datasets/helloworlddataset/1:my_data --cpu "python HelloWorld.py"

In my case:

floyd run --data pietvt/datasets/helloworlddataset/1:my_data --cpu "python HelloWorld.py"

This is where you link the “my_data” folder we used in our Python code to the dataset on FloydHub. You could have used any other folder name in your Python code, as long as you put the same name after the colon that follows the dataset here.
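To make this link concrete, here is a small sketch that takes the value given to --data and splits it into the dataset and the folder path the Python code has to open. The function split_data_flag is a hypothetical helper written for this explanation; it is not part of the floyd tool and only mirrors the “dataset:folder name” notation used in the command above.

```python
def split_data_flag(value):
    """Split '<dataset>:<folder name>' into the dataset and the folder path used in code."""
    dataset, separator, folder_name = value.rpartition(":")
    if not separator or not dataset or not folder_name:
        raise ValueError("expected '<dataset>:<folder name>'")
    # The code opens the folder from the root, for example '/my_data'.
    return dataset, "/" + folder_name


dataset, folder = split_data_flag("pietvt/datasets/helloworlddataset/1:my_data")
print(dataset)  # pietvt/datasets/helloworlddataset/1
print(folder)   # /my_data
```

So if you wrote “:my_input” after the dataset instead, the Python code would have to open “/my_input/HelloWorld.txt”.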

Make sure you are in the folder that contains the Python file, otherwise it won’t work. Every time you repeat the command, you send a new job to the “helloworldproject” project, so if you made a mistake you can try as many times as you want. You can check on your FloydHub page whether the job succeeded or not.

If a job failed, you can check the output by clicking on the project job, then on “overview”, and scrolling down to see more information about what went wrong.

If everything went as planned, you can find the output file in the "Output" tab once the job succeeds. If you can't see the "Output" tab, you can browse to the URL of the job and add "/output" to it, so it will look something like:

"https://www.floydhub.com/pietvt/projects/helloworldproject/6/output"

The "Output" tab may be missing when the output is smaller than 10 KB.
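The output address follows the same pattern for every job, so it can be put together from the username, the project name and the job number. The helper below is only an illustration: output_url is a made-up name, and the pattern is simply copied from the example address above.

```python
def output_url(username, project_name, job_number):
    """Return the address of the output page of one job, following the pattern above."""
    return f"https://www.floydhub.com/{username}/projects/{project_name}/{job_number}/output"


print(output_url("pietvt", "helloworldproject", 6))
```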

Summary

That’s it, 5 commands to run a program in the cloud:

To log in:

floyd login

To create a dataset and upload your data from the folder in PowerShell containing your input files:

floyd data init {datasetname}

floyd data upload

To create a project and upload your code and execute it from the folder in PowerShell containing your Python file:

floyd init {username}/{projectname}

floyd run --data {username}/datasets/{datasetname}/{version}:{input folder name used in your code} --cpu "python {python file}"

Advanced: Using TensorFlow and a GPU

If you want to start with deep learning and artificial neural networks, you can download the AI tools for Visual Studio here: https://visualstudio.microsoft.com/downloads/ai-tools-vs/

This will also install TensorFlow: an open source software library for high performance numerical computation, created by Google.

To run TensorFlow code on the Floyd servers you will additionally have to specify the environment you want to use in the floyd run command, like this:

floyd run --data {username}/datasets/helloworlddataset/1:my_data --cpu --env tensorflow-1.3 "python HelloWorld.py"

And, if you want to use a GPU (only for paying FloydHub users) you will need to specify it as follows:

floyd run --data {username}/datasets/helloworlddataset/1:my_data --gpu --env tensorflow-1.3 "python HelloWorld.py"
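The CPU and GPU variants differ only in a few options, so the command line can be assembled from its parts. The sketch below does this with a hypothetical helper, build_floyd_run, that only uses the options shown in this tutorial (--data, --cpu, --gpu and --env). It just builds the text of the command; it does not contact FloydHub, and you still paste the result into PowerShell yourself.

```python
def build_floyd_run(username, dataset_name, version, folder_name, script,
                    use_gpu=False, env=None):
    """Build the text of a floyd run command from the parts used in this tutorial."""
    parts = [
        "floyd", "run",
        "--data", f"{username}/datasets/{dataset_name}/{version}:{folder_name}",
        "--gpu" if use_gpu else "--cpu",
    ]
    if env:
        # Optional environment, for example "tensorflow-1.3".
        parts += ["--env", env]
    parts.append(f'"python {script}"')
    return " ".join(parts)


# Plain CPU run, as in the HelloWorld example.
print(build_floyd_run("pietvt", "helloworlddataset", 1, "my_data", "HelloWorld.py"))
# GPU run in the TensorFlow environment.
print(build_floyd_run("pietvt", "helloworlddataset", 1, "my_data", "HelloWorld.py",
                      use_gpu=True, env="tensorflow-1.3"))
```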

If you want to use other dependencies like “sklearn” or “matplotlib”, you have to include a “floyd_requirements.txt” file in the same folder as your .py file. This is just a text file with the names of the dependencies, one per line.
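You can create this file by hand in Notepad, or generate it with a few lines of Python as sketched below. The function write_requirements is a made-up helper, and the package list is only an example. One assumption to note: the sketch lists “scikit-learn”, which is the name pip uses for the package you import as sklearn. The demo writes into a temporary folder; in practice you would pass the folder that contains your .py file.

```python
import os
import tempfile


def write_requirements(folder, packages):
    """Write floyd_requirements.txt into folder, one package name per line."""
    path = os.path.join(folder, "floyd_requirements.txt")
    with open(path, "w") as requirements_file:
        requirements_file.write("\n".join(packages) + "\n")
    return path


# Demo in a temporary folder (assumption: scikit-learn is the pip name of sklearn).
with tempfile.TemporaryDirectory() as project_folder:
    path = write_requirements(project_folder, ["matplotlib", "scikit-learn"])
    with open(path) as requirements_file:
        print(requirements_file.read())
```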

Nice. Works also on Mac. I used Python version 3.7 and the Python Visual Studio Code extension. For the link with FloydHub I used Terminal instead of PowerShell.
