Anjan Data Engineer

ID: 82687

About Me

• Big Data professional with 8+ years of experience in the Big Data ecosystem using HDFS, YARN, MapReduce, Hive, Impala, Pig, Sqoop, HBase, Spark, Spark SQL, Kafka, Spark Streaming, Flume, Oozie, ZooKeeper, and Hue.
• Expertise in using the major components of the Hadoop ecosystem.
• Good understanding of distributed systems, HDFS architecture, and the internal workings of the MapReduce and Spark processing frameworks.
• Experience in ETL methods for data extraction, transformation, and loading in corporate-wide ETL solutions, and in data warehouse tools for reporting and data analysis.
• Develop data set processes for data modeling and data mining; recommend ways to improve data reliability, efficiency, and quality.
• Experience in importing and exporting data between HDFS and relational database systems using Sqoop, and loading it into partitioned Hive tables.
• Proficient in collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
• Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.
• Good knowledge of writing MapReduce jobs through Pig, Hive, and Sqoop.
• Hands-on use of the Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
• Experience in developing MapReduce programs using Apache Hadoop to analyze big data as required.
• Experience in designing star and snowflake schemas for data warehouse and ODS architectures.
• Experience in data analysis, data profiling, data integration, migration, data governance, metadata management, master data management, and configuration management.
• Hands-on experience with Spark MLlib utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction.
• Expertise in the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
• Skilled in system analysis, E-R/dimensional data modeling, database design, and implementing RDBMS-specific features.
• Excellent working knowledge of data modeling tools like Erwin, PowerDesigner, and ER/Studio.
• Experience with proofs of concept (PoCs) and gap analysis: gathered the necessary data from different sources and prepared it for data exploration using data munging and Teradata.
• Well experienced in normalization and denormalization techniques for optimum performance in relational and dimensional database environments.
• Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
• Expertise in designing complex mappings, performance tuning, and working with slowly changing dimension tables and fact tables.
• Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
• Experience in developing a data pipeline through the Kafka-Spark API.
• Expertise in developing relational and NoSQL databases, including OLTP, OLAP, MDM, data warehouse, and data governance solutions, using 3NF, star, and snowflake schema designs.
• Extensively worked with the Teradata utilities FastExport and MultiLoad to export and load data to/from different source systems, including flat files.
• Experienced in building automated regression scripts in Python to validate ETL processes between multiple databases such as Oracle, SQL Server, Hive, and MongoDB.
• Experience in working with Excel pivot tables and VBA macros for various business scenarios.
• Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS).
• Track record of results as a project manager working in an Agile methodology with data-driven analytics.
• Experience in data manipulation, data analysis, and data visualization of structured, semi-structured, and unstructured data.
• Proficiency in SQL across several dialects (commonly MySQL, PostgreSQL, Redshift, SQL Server, and Oracle).
• Good knowledge of data marts, OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake modeling for fact and dimension tables) using Analysis Services.
• Excellent at transferring data between SAS and various databases and data file formats such as XLS and CSV.
• Experienced in development and support with Oracle, SQL, PL/SQL, and T-SQL queries.
• Experience in designing and implementing data structures and using common business intelligence tools for data analysis.
• Expert in building enterprise data warehouses or data warehouse appliances from scratch using both the Kimball and Inmon approaches.
• Working knowledge of the Spark architecture and of programming Spark applications.
• Creative skills in developing elegant solutions to pipeline engineering challenges.
• Good understanding of data ingestion, Airflow operators for data orchestration, and related Python libraries.
• Experience in designing data marts following the star schema and snowflake schema methodologies.
• Highly skilled in business intelligence tools like Tableau, Power BI, Plotly, and Dataiku.
• Experience in managing and analyzing massive datasets on multiple Hadoop distributions such as Cloudera and Hortonworks.
• Experience in designing and developing Spark applications in Python to compare the performance of Spark with Hive.
• In-depth understanding of Snowflake cloud technology.
• Experience in Spark-Scala programming with good knowledge of the Spark architecture and its in-memory processing.
• Experience with Snowflake multi-cluster warehouses.
• Experience with Snowflake virtual warehouses.
• Hands-on experience with Amazon EC2, Amazon S3, Amazon RDS, VPC, IAM, Elastic Load Balancing, Auto Scaling, CloudFront, CloudWatch, SNS, SES, SQS, and other services of the AWS family.
• Good work experience with cutting-edge technologies like Kafka, Spark, and Spark Streaming.
• Partnered with cross-functional teams across the organization to gather requirements, architect, and develop proofs of concept for enterprise data lake environments like MapR, Cloudera, Hortonworks, AWS, and Azure.
• Strong experience in analyzing data using Hive, Impala, Pig Latin, and Drill, and in writing custom UDFs in Hive and Pig to extend their functionality.
• Experience in writing MapReduce programs in Java for data cleansing and preprocessing.
• Excellent understanding of the Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, and NodeManager.
• Hands-on experience with data analytics services such as Athena, Glue Data Catalog, and QuickSight.
• Good working experience with Hive and HBase/MapR-DB integration. Excellent understanding and knowledge of NoSQL databases like HBase and Cassandra.
• Experienced in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Python.
• Experience setting up instances behind Elastic Load Balancer in AWS for high availability, and cloud integration with AWS using Elastic MapReduce (EMR).
• Experience working in a Hadoop ecosystem integrated with the AWS cloud platform, using services like Amazon EC2 instances, S3 buckets, and Redshift.
• Good experience working with Azure cloud platform services like Azure Data Factory (ADF), Azure Data Lake, Azure Blob Storage, Azure SQL Analytics, and HDInsight/Databricks.
• Exposure to various software development methodologies like Agile and Waterfall.
• Extensive experience working with the Spark distributed framework, involving Resilient Distributed Datasets (RDDs) and DataFrames, using Python, Scala, and Java 8.
• Involved in developing applications on Windows, UNIX, and Linux platforms.
• Extensive experience in SQL, Python (3.x), and PySpark.
• Experience with batch and real-time data processing tools and technologies: Azure Databricks/Spark, Azure Synapse/DW, Azure Analysis Services, and Azure Data Factory.
• Deep experience with data engineering, big data, and analytical technologies using Azure cloud-based data platforms.
• Familiarity with Continuous Integration/Continuous Deployment and Git.
• Fluent in spoken and written English.
• Knowledge of Agile development methods like Scrum and Kanban.
• Technologies used: Microsoft Azure Databricks (Spark), Azure SQL Data Warehouse, Azure Tabular, Azure Data Factory, Azure Functions, Azure Containers, DevOps, scripting (PowerShell, Bash), Git, Terraform, Power BI, Snowflake, Docker.

Hire Remote Developers For Your Project

iT-Outstaffing.com is a reliable Staffing Vendor that provides Remote Developers for your project within 24h on request

Skills description

Python
The name “Python” was borrowed from the TV show Monty Python's Flying Circus, and the language itself was first released in 1991. Python supports modules and packages, which promote modularity and code reuse. The Python interpreter and standard library are available for all major platforms in both compiled and source form. Python's authors were guided by a philosophy of simplicity and directness, which they saw as a guarantee of quality. That philosophy values harmony, beauty, and the avoidance of excessive complexity, which is why Python is considered one of the easiest programming languages to learn and one of the most promising. The Python community is fairly young and very progressive, so even highly qualified Python developers tend to be young but ambitious professionals. Python freelancers have experience in implementing complex, dynamic, interactive websites and web applications. Python has no limits, so the professionalism of full-stack Python developers is limitless too. AOG lets you work directly with the Python developer, and our managers will help you 24/7!
SQL
If HTML is the basis of frontend programming, then SQL is the key to database programming and to managing almost every existing DBMS. Using this language, you can manage data and filter and analyze information even in very large volumes. SQL first appeared in 1974, but it is still widely used by programmers who work with database platforms such as MySQL because of its versatility. Did you know that the well-known Microsoft Access application is also based on SQL? Yes, SQL is much closer than it seems. It's everywhere! Experienced SQL developers typically work with multiple platforms and database systems, which is a big advantage. You can hire a highly qualified SQL database developer to get expert advice, or simply involve them in your project as a back-end SQL freelancer. You can test your candidate for one week, and if they do not meet your expectations, we will replace them with another one without any difficulty!
