Comparative analysis of large data processing in Apache Spark using Java, Python and Scala
- Ivan Borodii, Illia Fedorovych, H. Osukhivska, D. Velychko, Roman Butsii
- 21 October 2025
Computer Science
The programming language was found to significantly affect the efficiency of data processing in Apache Spark: Scala and Java were more productive for large amounts of data and complex operations, while Python showed an advantage when working with small amounts of data.
Performance Benchmarking of Continuous Processing and Micro-Batch Modes in Spark Structured Streaming
- Illia Fedorovych, H. Osukhivska, N. Lutsyk
- 2024
Computer Science, Engineering
The findings indicate that although the Continuous Processing mode offers significantly lower latencies with the Rate source, its performance in high-throughput scenarios with the Kafka source may be less consistent. These results have practical implications for optimizing text data flow strategies in big data analytics.