Hello, internet. Welcome to Data Science Dojo. My name is Phuc Duong and I’m going to bring you another video tutorial series. This time we’re going to expand your data science toolkit by teaching you how to data-mine using Azure Machine Learning Studio.

This will be a multi-part video series, and who is this video series intended for? Well, it’s for anyone who wants to learn Azure Machine Learning Studio. More importantly, Data Science Dojo hosts a five-day Bootcamp on data science and data engineering. It is an in-person Bootcamp. It lasts 50 hours, for an entire week. We go 8 to 5 every day, and this is one of the five modules that students have to learn before showing up on day 1. We teach the class mainly in R and in Azure ML Studio. Students are expected to be very comfortable before even showing up on day 1, so we can tackle more data science problems during the course.

Now, if you are not attending the course, don’t worry, because you will still learn everything you need to know about Azure Machine Learning Studio.

By the time this series is over, you should be very comfortable with Azure ML. You should be able to import and export data. You should be able to explore an unknown data set. You should be able to manipulate and transform data, mold data, preprocess data, and clean data, all within Azure ML. You should be able to build models and make predictions in Azure ML. You should be able to expose those predictive models as a web service and then consume those APIs, and then you should also be able to code within Azure ML itself.

This series assumes that you already have an introduction to data mining. If you do not have that introduction, go ahead and watch the video series that I’ve linked inside the description box, and I’ll get you up to speed on what data mining is.

A quick introduction about myself: I’ve been teaching data science and data engineering for about three years.
I was the lead author of an 85-page lab manual on how to data-mine using Azure Machine Learning Studio. I wrote three other books on various topics in data engineering and data science, none of which are available to the public. You have to sign up for our five-day Bootcamp to receive one of these manuals. And then I created an 11-part Azure ML tutorial series on YouTube three years ago, when Azure ML was still in beta. And a lot has changed, which is why we’re going to redo it now.

And what are we going to cover in this video? Well, we’re going to teach you what Azure Machine Learning Studio is, what it means to be in the cloud, and what its benefits are. We’ll also cover the subscriptions that you need to get and the pricing of Azure ML.

What is Azure Machine Learning? Well, it’s also called Azure ML for short. It is a data mining, data science, and machine learning tool in the cloud. But it’s different than most data science tools because there’s no coding involved.

Traditionally, data science tools involve R or Python or Matlab or SAS, which are all coding-based. You had a terminal, and you had a command line, and you had to learn syntax. Not this one. This one is a drag-and-drop approach to machine learning. It has a visual interface, and it feels more like Visio or PowerPoint than it does any other traditional data science tool. And I think this is the best tool for learning machine learning and data mining, because you don’t have to juggle learning the syntax of a programming language at the same time as learning how to data-mine: the data mining theory, the data mining frameworks, and the theories of machine learning in general.

But this is not just a tool for beginners. Advanced users will love this as well, because it integrates seamlessly with SQL, R, or Python, and you can mix and match.
So all of a sudden you can be using SQL, drop SQL, use a module, and then switch again and go into R. I do that all the time. Also, you can deploy these models automatically. Meaning, basically, they pack up your model, throw it into a cloud somewhere, and then you can contact those models via REST APIs, and then you can connect to those APIs using auto-generated code in C#, R, or Python.

Azure Machine Learning is a cloud-based machine learning tool that only exists in the cloud. Specifically, it is on the Azure stack, so the Azure cloud platform. And Azure ML is one of the services within the Azure ecosystem itself.

Azure is the cloud platform brought to you by Microsoft. So that makes Azure ML, by extension, a Microsoft technology.

Azure is also comparable to other cloud services, like Amazon Web Services and Google Cloud Services, all of which are infrastructure as a service. Meaning, they are a platform that can host robust IT services that can build entire software platforms, like Netflix or Snapchat. These cloud services are all software as a service, where you use these services, and they charge you by usage, like an electric bill, or by subscription, like a cable bill.

So, because Azure Machine Learning Studio is a cloud-based tool, it brings with it the strengths and weaknesses of cloud computing itself.

Let’s go over the pros and cons of cloud computing so that you know what you’re getting yourself into.

The first thing is, I’ve noticed that the cloud services will almost never ask you how much data you need to store. They would just store it and not ask any questions about that, and the reason for that is the price of storage space, especially in the cloud, has been plummeting pretty consistently every year. It’s gotten to the point where it’s only going to cost you about $0.02 per gigabyte per month to store anything online.
So, it doesn’t matter if you’re on AWS or on Azure, they will both charge you about $0.02 per gigabyte per month to store something, which is really cool because it basically removes data size from the equation when you have to plan capacity for your machine learning tools and hardware.

The next thing is machine learning does not exist in a vacuum. Machine learning is extremely dependent on large data sets. Large data sets, like big data, are dependent on IT infrastructure. In order to have an IT infrastructure big enough to support big data, you have to have either some kind of robust service like a data warehouse, or you can rent that stuff from AWS or Azure or Google Cloud, one of the cloud services. So specifically in Azure, you have all of that data infrastructure to back up your machine learning.

So, you have, for example, Azure SQL databases as your database. You have Spark and Apache Hadoop in the form of HDInsight. You have Blob storage. You have Data Lake storage. You have Stream Analytics for ETL. You have Azure Data Factory for orchestration of all your data pipelines. And more importantly, because it’s a Microsoft technology, it integrates with Excel, which is one of the most commonly used BI tools in existence.

The next thing is because you’re in the cloud, you’re freed from the hardware. Right?
None of your guys have to carry a beeper, and you don’t have a data warehouse where you have to keep upgrading hardware. You don’t have to worry about that. All you have to worry about is really the data and how much it’s charging you on a monthly basis.

The next thing I find that’s really cool is that it runs on someone else’s machine. That doesn’t sound like it means much, but the idea is you don’t need a very powerful computer anymore. You don’t need a workstation desktop anymore.
What this means is you can get an iPad and use any of these cloud-based tools, hit the Run button or the Execution button, and then close the tab. If the device can open up a browser, like Chrome, the idea is it can run the cloud service. Because it’s in the cloud, it’s also collaborative, which means you can invite other people to your cloud spaces, work together, and share the same cloud space.

Another thing: it’s scalable. You’re harnessing the entire power that is the cloud. The cloud is very good at distributing workloads among multiple nodes, multiple servers, calling upon extra help when it needs more processing power or more storage.

Let’s go over the cons of cloud computing.
By using a cloud-based tool, you’re committing to an unwavering internet connection. You can never lose internet, or you basically lose the access you need to do your job.

The next con is the biggest con of them all, which is compliance. The question is: can you even be in the cloud? Do the policies of your industry, of the government that oversees your industry, and of your company all allow your data to be in the cloud? This is all about data governance. So, before you even use this tool and start loading work-related data into it, you really need to ask someone at your company: is the cloud allowed, and specifically is Azure an allowed technology at this company?

I’ll show you how to make an Azure Machine Learning Studio workspace in the next video, but for now, let me explain how it all works.

There are two main ways to get Azure Machine Learning Studio. The first is the free trial method, and the second is the full workspace method. To get a free trial Azure ML workspace, what you do is simply go to the Azure ML website, sign in with your email, and you’ll be given a limited-access workspace.
If you want a fully working workspace, then you’ll need an actual Azure subscription. Then, once you’re inside that subscription, you’ll have to create an Azure ML workspace within it. So this is the full workspace.

Now, if you’ve never used Azure before, then Azure will give you a free trial subscription to start off. You’ll get $200 credit on your Azure subscription or 30 days, whichever limit is reached first. Now, if you’ve used up your Azure free trial subscription because you’ve used Azure before, then you’ll simply need to go the pay-as-you-go route, where you’ll add your credit card to an autopay system, and it will charge you for the fees that you’ve incurred based upon just cloud usage for the month. And for this web series, I’m going to go ahead and assume that you have a full workspace, and I’ll show you how to create that full workspace in the next video.

All right, let’s talk about cost. So there are really two parts to the pricing of Azure Machine Learning Studio. The first is a monthly subscription and the second is for usage. So, first is the subscription. You’re charged $9.99 per month per seat. For each workspace, you’ll be charged about $10 a month. The second is the charge for usage. There are three parts to the usage. The first one is runtime. You’re charged $1 per hour of experiment run time, so every time you hit the Run button on your experiment, you’re going to see a timer tick in the top right-hand corner of your screen. That is what you’re being charged for. The second part is that you’re also charged for deployment calls. Any time anyone calls your deployed web services, basically your REST APIs, for every 1,000 API calls, you’re charged about $0.10 to $0.50, depending on which tier you’re going with for your web service. Now, the first 1,000 API calls are basically always free.
So, unless you’re an e-commerce business, I don’t foresee you actually getting hit with these charges at all for deployment. And then the third part of usage is storage of data. So, basically, whatever data you pull into Azure ML, you’ll be charged for that as well. Azure Blob storage, which is the service we’re going to use, charges $0.02 per gigabyte per month. It’s actually less than $0.02, but they’re going to round it up, so it’s going to be about $0.02. And then, if you’re going the pay-as-you-go route for this web series, this whole series will probably cost you $20 for the month. So $10 for the seat and then another $10 for the usage.

Now, they will prorate it if you delete the workspace when you’re done with it, so that’s going to be nice.

All right, and then that will conclude this video. If you like these tutorials, and you want to see more videos like this in the future, go ahead and like and subscribe. I’m also going to leave you with a question.

What is your favorite data mining tool? Talk to me in the comment section. I’m really interested in knowing what everyone is using.

I’ll see you in the next video, where I’ll show you how to create an Azure ML workspace, and I will look forward to seeing you at our Bootcamp.
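
As a footnote to the pricing discussion above, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration, not an official calculator: the rates are the approximate figures quoted in this video, the function name `estimate_monthly_cost` and its parameters are made up for this example, and billing API calls proportionally (rather than in blocks of 1,000) is an assumption.

```python
# Rough monthly cost sketch using the approximate rates quoted in the video.
# All names here are hypothetical; actual Azure billing may differ.

SEAT_PER_MONTH = 9.99     # dollars per seat (workspace) per month
RUNTIME_PER_HOUR = 1.00   # dollars per hour of experiment run time
STORAGE_PER_GB = 0.02     # dollars per gigabyte per month (Blob storage)
FREE_API_CALLS = 1000     # the first 1,000 API calls are treated as free


def estimate_monthly_cost(seats, experiment_hours, api_calls, storage_gb,
                          api_rate_per_1000=0.50):
    """Estimate one month's bill in dollars.

    api_rate_per_1000 is the tier-dependent price per 1,000 API calls
    (quoted as roughly $0.10 to $0.50). Charging calls proportionally
    is an assumption made for this sketch.
    """
    billable_calls = max(0, api_calls - FREE_API_CALLS)
    api_cost = billable_calls / 1000 * api_rate_per_1000
    total = (seats * SEAT_PER_MONTH
             + experiment_hours * RUNTIME_PER_HOUR
             + api_cost
             + storage_gb * STORAGE_PER_GB)
    return round(total, 2)


if __name__ == "__main__":
    # One seat, ten hours of experiments, a few test API calls, 1 GB of data.
    print(estimate_monthly_cost(seats=1, experiment_hours=10,
                                api_calls=500, storage_gb=1))
```

With one seat, about ten hours of experiment run time, and a small amount of data, the estimate lands at roughly $20, in line with the "$10 for the seat and another $10 for the usage" figure mentioned above.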