Covers data science tools, Hadoop, Spark, data lake ecosystems, CAP theorem, batch vs. stream processing, HDFS, Hive, Parquet, ORC, and MapReduce architecture.
Explores storage management challenges in transitioning to data lakes, addressing software and hardware heterogeneity, unified storage design, and performance optimization.
Explores scalable synchronization mechanisms for many-core operating systems, focusing on the challenges of handling data growth and avoiding regressions in the OS.