Posts

Showing posts from October, 2021

Cloud Computations - Quick data analysis with AWS Athena, Glue and Databricks Spark

Throughout my career, I have often had to fix failing production jobs. Most of the time, debugging meant analyzing the input data to find the error in the raw data. For the last ten years, I have also been doing data analysis to provide quick business insights, which often involves running a complex query on a large data set. Most of the time, we do not have access to the production environment to debug a job or install the required packages. It is also advisable not to debug jobs in the production environment, as doing so might hurt performance or break the job completely. We have been using a few tools to debug, mainly Hive, Presto, Tableau, etc. These tools are not always the best option, because debugging jobs that fail on data issues often requires custom code, SerDes, or packages. I like to use Spark; however, ...