Large company, more than 100 employees

Apache Spark

Spark is an open-source framework from the Apache Software Foundation for distributed processing of large volumes of data on computer clusters. It was designed for Big Data environments and created to improve on its predecessor, MapReduce.

Spark inherits the scalability and fault tolerance of MapReduce but far surpasses it in processing speed, ease of use, and analytical capabilities...

Google Cloud SQL

Google Cloud SQL is a fully managed relational database service that helps businesses set up, maintain, and manage their databases on Google Cloud Platform.
It supports MySQL, PostgreSQL, and SQL Server...

Apache Hive

Apache Hive SQL query editor

Hive is software that runs on Hadoop clusters and adds a layer that lets developers abstract away the management of HDFS files and MapReduce jobs through SQL-based queries written in the HiveQL language...
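
To make the idea of SQL-based querying concrete, here is a minimal sketch of the kind of aggregate query HiveQL is typically used for. It runs against Python's built-in sqlite3 module purely as a local stand-in, not against Hive itself. The table `page_views`, its columns, and the helper `views_by_country` are hypothetical names chosen for illustration; on a real cluster, an equivalent statement would be submitted to Hive and executed over files stored in HDFS.

```python
import sqlite3

# An aggregate query written in plain SQL of the kind HiveQL also accepts.
# The table and column names are hypothetical.
QUERY = """
SELECT country, COUNT(*) AS views
FROM page_views
GROUP BY country
ORDER BY views DESC
"""


def views_by_country(rows):
    """Load (user_id, country) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")  # local stand-in, not a Hive connection
    try:
        conn.execute("CREATE TABLE page_views (user_id INTEGER, country TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [(1, "ES"), (2, "ES"), (3, "FR")]
    print(views_by_country(sample))
```

The point of the sketch is that the developer only states what result is wanted; how the data is read and how the work is split into jobs is left to the engine underneath.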

Cloudera Data Platform

Cloudera Data Platform (CDP) is a comprehensive solution for data management and analytics in hybrid and multi-cloud environments. Designed to offer flexibility and scalability, it enables organizations to manage data across any cloud, perform advanced analytics, and ensure data security at all times.

Apache Hadoop

Apache Hadoop architecture

The Hadoop software library is a framework that enables distributed processing of large datasets across clusters of computers or servers using simple programming models.

Hadoop is designed to scale easily from a single server to thousands of machines...
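
As an illustration of what a "simple programming model" can look like, the following is a minimal single-process sketch of the MapReduce pattern associated with Hadoop, applied to a word count. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and in a real cluster the framework would distribute these phases across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    sample = ["big data on clusters", "data on many clusters"]
    print(word_count(sample))
```

Because each call to `map_phase` depends only on its own line and each call to `reduce_phase` only on its own key, both phases could in principle run in parallel on separate machines, which is the property that lets this model scale.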

IBM Cloud Pak for Data

IBM Cloud Pak for Data is a modular platform designed to integrate, manage, and analyze distributed data across hybrid and multi-cloud environments. It emphasizes data virtualization and centralized governance, enabling real-time access to trusted data while optimizing operational efficiency.