Cloud Computations - Quick Data Analysis with AWS Athena, Glue and Databricks Spark

Throughout my career, I have regularly had to fix failing production jobs. Most of the time, debugging them meant analyzing the input data to find the error in the raw data. For the last ten years, I have also been doing data analysis to provide quick business insights, which often involves running a complex query over an extensive set of data. Most of the time, we do not have access to the production environment to debug a job or to install the required packages. It is also advisable not to debug jobs in the production environment, as that might have a negative performance impact or completely break the job. We have been using a few tools to debug, mainly Hive, Presto, Tableau, etc. These tools are not always the best option, because debugging jobs that fail on data issues often requires custom code, SerDes, or packages. I like to use Spark, however; ...
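To make the "quick query over raw data" workflow above concrete, here is a minimal sketch of running an ad-hoc Athena query from the AWS CLI. The database, table, and S3 output location used below (raw_logs_db, events_raw, s3://my-athena-results/adhoc/) are hypothetical placeholders, not anything from a real environment:

    # Kick off an ad-hoc Athena query to look for suspicious raw records.
    # Database, table, and bucket names are made-up placeholders.
    QUERY_ID=$(aws athena start-query-execution \
        --query-string "SELECT * FROM events_raw WHERE event_ts IS NULL LIMIT 100" \
        --query-execution-context Database=raw_logs_db \
        --result-configuration OutputLocation=s3://my-athena-results/adhoc/ \
        --query QueryExecutionId --output text)

    # Check the query state; in practice, poll until it reports SUCCEEDED.
    aws athena get-query-execution --query-execution-id "$QUERY_ID" \
        --query QueryExecution.Status.State --output text

    # Fetch the first page of results once the query has succeeded.
    aws athena get-query-results --query-execution-id "$QUERY_ID" --max-results 10

The same check can of course be done interactively in the Athena console; the CLI form is just easier to drop into a runbook or a quick script.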
I have been using Spark for a long time. It is an excellent distributed computation framework. I use it regularly at work, and I also have it installed on my local desktop and laptop. This document shows the steps for installing Spark 3+ on Windows 10 in pseudo-distributed mode.

Steps:

1. Install WSL2: https://docs.microsoft.com/en-us/windows/wsl/install-win10
2. Install Ubuntu 20.04 LTS from the Microsoft Store.
3. Install Windows Terminal from the Microsoft Store. This step is optional; you can use PowerShell or MobaXterm instead.
4. Fire up Ubuntu from WSL.
5. Once logged in, go to the home directory: cd ~

For Spark, we need:

- Python 3
- Java
- The latest Scala
- Spark with Hadoop (zip file)

Let's download and install all the prerequisites.

Install Python:

    sudo apt-get install software-properties-common
    sudo apt-get install python-software-properties

Install Java (OpenJDK):

    sudo apt-get install openjdk-8-jdk

Check the java and javac versions (a note on JAVA_HOME follows after these steps):

    java -version
    javac -version

Install Scala ...
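Before moving on to Scala and Spark themselves, a quick aside on the Java step above: Spark will later need to find this JDK, either via java on the PATH or via JAVA_HOME. Here is a minimal sketch of deriving JAVA_HOME from the javac that apt just installed; the path resolution assumes the standard Ubuntu openjdk-8-jdk layout, so adjust it if your install differs:

    # Resolve the JDK directory from the javac binary on the PATH.
    # Assumes the usual Ubuntu layout where javac lives under <jdk>/bin/javac.
    JAVAC_PATH="$(readlink -f "$(which javac)")"
    export JAVA_HOME="${JAVAC_PATH%/bin/javac}"
    echo "JAVA_HOME=$JAVA_HOME"

    # Optionally persist it for future shells.
    echo "export JAVA_HOME=$JAVA_HOME" >> ~/.bashrc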