Zeinalipour > Courses > EPL646 > Schedule

Schedule »

Week Content PDF
W01 Syllabus and Course Overview »
Readings: Chapter 1, Ramakrishnan and Gehrke 3ED
Additional Readings:
  • The Beckman Database Research Self-Assessment Meeting, Daniel Abadi et. al., Irvine, California, October 2013. => (The Washington Database Research Self-Assessment Meeting, October 2018 => The Seattle Report on Database Research, August 2022)
  • "Mobility Data Science (Dagstuhl Seminar 22021)", Mohamed F. Mokbel, Mahmoud Attia Sakr, Li Xiong, Andreas Züfle, Jussara M. Almeida, Taylor Anderson, Walid G. Aref, Gennady L. Andrienko, Natalia V. Andrienko, Yang Cao, Sanjay Chawla, Reynold Cheng, Panos K. Chrysanthis, Xiqi Fei, Gabriel Ghinita, Anita Graser, Dimitrios Gunopulos, Christian S. Jensen, Joon-Sook Kim, Kyoung-Sook Kim, Peer Kröger, John Krumm, Johannes Lauer, Amr Magdy, Mario A. Nascimento, Siva Ravada, Matthias Renz, Dimitris Sacharidis, Cyrus Shahabi, Flora D. Salim, Mohamed Sarwat, Maxime Schoemans, Bettina Speckmann, Egemen Tanin, Yannis Theodoridis, Kristian Torp, Goce Trajcevski, Marc J. van Kreveld, Carola Wenk, Martin Werner, Raymond Chi-Wing Wong, Song Wu, Jianqiu Xu, Moustafa Youssef, Demetris Zeinalipour, Mengxuan Zhang, Esteban Zimányi, Schloss Dagstuhl -- Leibniz-Zentrum für Informatik, Vol. 12, pp. 1--34, Dagstuhl, Germany, DOI: 10.4230/DagRep.12.1.1, 2022.
  • 01
    syllabus
    W02 DB Internals I (Storage) »
    Assignment 2 (Minibase) Announced
    Readings: Chapters 8.1-8.5, Ramakrishnan and Gehrke 3ED
    Content: DBMS Layer 1: Data on External Storage, Storage Mediums and Storage Hierarchy (HDD, SSD, Tape, NVM/NVRAM/NVDIMMs and DNA Storage), DBMS Layer 2: Disk Space Manager (DSM), DBMS Layer 3:Buffer Manager (BM), DBMS Layer 4: Alternative File Organizations and Indexes (Access Methods), B+ Tree Index Overview, Structure of Data Entry k*, Hash-Based Index Overview, Clustered vs. Unclustered Indexes, Primary vs. Secondary Indexes, Comparison of File Organizations (System and Cost Model, Assumptions), I/O cost analysis (Heap Files, Sorted Files and Clustered B+Tree Index File), Indexes and Performance Tuning (Understanding the Workload, Index Specification in SQL, Index-Only Plans,Index Selection Guidelines)
    02
    W03 DB Internals II (Storage) »
    Readings: Chapters 9.1-9.7, Ramakrishnan and Gehrke 3ED
    Content: Disks (Components, Accessing a Block, Arranging Pages), RAID (Basic Concepts, Levels: 0 to 5 and 0+1), Disk Space Manager, Buffer Manager: Definitions (Pin/Unpin, Dirty-bit), Replacement Policies (LRU, MRU, clock), Sequential Flooding, Buffer in OS, File, Page and Record Formats: File Structure (Linked-List/Directory-based), Page Structure with Fixed/Variable-length records, Record Structure (Fixed-length/Variable-length), System Catalog.
    AI/IoT/Geo Storage Content: File Formats for Analytics and IoT: Apache Parquet: structure, codecs, examples in Apache Arrow / Pandas, DB Connectivity (Microsoft Polybase), Delta Format (Delta Lake), Geo (Spatial) Formats: GeoParquet, ESRI. PostGIS/Postgres, PostGeese/DuckDB, Apache Sedona.
    03
      DB Internals III (Indexing) »
    Readings: Chapters 10.1-10.8, Ramakrishnan and Gehrke 3ED
    Content: Introduction to Tree Indexes, Structure of Nodes in Trees, Binary Search over Sorted Files, Binary vs. N-ary Search Trees, ISAM: Indexed Sequential Access Method (Outline, Search, Insert, Delete, Examples), Introduction to B+ Trees, B+Tree Functions: Search / Insert / Delete (Algorithms and Examples), B+ Trees in Practice: i) Prefix-Key Compression and ii) Bulk Loading B+Trees,
    IoT DB Content: LSM Trees (Log-structured merge-trees): internals, ingestion, compaction and reads (LevelDB, RocksDB, and Cassandra)). Time Series Database (TSDB): InfluxDB, Apache IoTDB, TimescaleDB. Using InfluxDB: Internals, TSM vs LSM, Sharding, Retention Policies, Operators and FluxQL. Row Serialization (AVRO), Data Funnels (Kafka)
    04
    W04 DB Internals IV (Indexing) »
    Readings: Chapters 11.1-11.4, Ramakrishnan and Gehrke 3ED
    Content: Static Hashing, Dynamic Hashing (Extendible Hashing, Linear Hashing), Extendible vs Linear Hashing
    Vector DBs Content: Embeddings (Text vs Sentence), Similarity Search & Approximate Similarity Search (Lp-Norms), Libraries, Chroma DB: Internals (Main-Memory vs. Persistency with DuckDB), Storage: Parquet | DuckDB, Indexing: Hierarchical navigable small world (HNSW), Other Vector Databases Products (Vespa, Milvus, Postgres/pgvector) and Libraries (FAISS, ScaNN/AlloyDB, Hnswlib, Annoy), LLMs, RAG and Vector Databases, Open-source LLMs.
    05
    05b
      DB Internals V (External Sorting and Query Evaluation)»
    Readings: Chapters 13.1-13.5 and Chapters 12.3,14.1,14.3,14.5, Ramakrishnan and Gehrke 3ED
    Content: Introduction (When a DBMS sorts data), Simple Two-Way Merge-Sort, External Merge-Sort, Double Bufferun Using B+Tree for Sorting, Introduction to Algorithms for Relational Operators, The Selection Operation (No Index/Unsorted Data, No Index/Sorted Data, B+Tree Index, Hash Index), General Selection Conditions (Conjunctive Normal Form and Index Matching, Selections with No Disjunctions, Selections with Disjunctions, The Project Operation (using Sorting, using Hashing, Sorting vs. Hashing)
    06
    W05 DB Internals VI (Query Optimization) »
    Assignment 1 (Overview Research Papers) Announced
    Assignment 2 (Minibase) due
    Readings: Chapter 14.4 and Chapter 12.6,15.1-15.4, Ramakrishnan and Gehrke 3ED
    Content: Simple Nested Loops Join (Tuple-at-a-time, Page-at-a-time), Block-Nested Loop Join (M-size, M-size with Hashtable, (B-2)-size, Index-Nested Loops Join, Sort-Merge Join, Sort-Merge Join Refinement (Combining merge phase of sorting with merge phase of joining), Sort-Merge-Join Implementation Details. What a Typical Optimizer Does (Alternative Plans Considered, Left-Deep Plans, Estimating the Cost of a Plan), Translating SQL Queries into Algebra, Estimating the Cost of a Plan, Relational Algebra Equivalences, Enumeration of Alternative Plans (Single-Relation Queries, Multiple-Relation Queries).
    07
      DB Internals VII (Transactions and Concurrency) »
    Readings: Chapters 16.1-16.3 and 16.6, Ramakrishnan and Gehrke 3ED and Chapters 17.4-17.5, Elmasri and Navathe, 5ED
    Content: 16.0) Introduction to Transactions, 16.1) The ACID (Atomicity-Consistency-Isolation-Durability) Properties, 16.2) Transactions and Schedules, 16.3) Concurrent Executions of Transactions and Problems, 16.6) Transaction Support in SQL 16.2) Transactions and Schedules (Serial, Complete Schedules), Serializability (Conflicting Actions, Conflict Equivalence, Conflict Serializability, Testing for Serializability using Precedence Graphs, View Equivalence and View Serializability), 16.3) Concurrent Execution of Transaction, 17.1) Recoverability (Recoverable Schedule, Cascadeless schedule, Strict Schedules)
    08
    W06 DB Internals VIII (Transactions and Concurrency) »
    Assignment 3 (Technical Research Papers) Announced
    Readings: Chapter 18.1, Elmasri and Navathe, 5ED
    Content: Introduction to DBMS Concurrency Control, Concurrency Control with Locking, Locks and Types of Locks, Implementing Locks in a DBMS, Conversion of Locks (Upgrade/Downgrade), CC with Locking Techniques (Conservative 2PL, Basic 2PL, Strict 2PL, Rigorous 2PL), Deadlocks and Starvation
    09
    W07 DB Internals IX (Crash Recovery) and NVM Databases (Research Outlook)»
    Recovery Readings: Chapters 17, Garcia-Molina, Ullman, Widom, 2ED.
    Recovery Content: Definitions, Purpose, Failure Reasons, ACID Properties and Responsibilities, Undo Logging and Recovery, Checkpointing and Nonquiescent Checkpointing, Redo Logging and Recovery

    NVM Readings: + Joy Arulraj and Andrew Pavlo. 2017. How to Build a Non-Volatile Memory Database Management System. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD '17). ACM, New York, NY, USA, 1753-1758. DOI: https://doi.org/10.1145/3035918.3054780.
    + Non-Volatile Memory Database Management Systems, Joy Arulraj, Georgia Institute of Technology, Andrew Pavlo, Carnegie Mellon University, ISBN: 9781681734842 | PDF ISBN: 9781681734859, Hardcover ISBN: 9781681734866, Copyright © 2019 | 191 Pages
    NVM Content: What is NVM, Types of NVM (NVDIMM-F, NVDIMM-D, NVDIMM-P), NVM Properties, Performance, Benchmark of Traditional DBMS with NVM DBMS (MySQL vs H-store), Disk-oriented DBMS vs Memory-oriented DBMS (PelotonDB), Quickly Glancing through the NVM DBMS stack (storage to query evaluation).
    10, 10b
    W08 MIDTERM: Tuesday, March 11, 2025
    This is an open exam: an A4 paper with notes from slides / book / papers is allowed.
    -
      Big Data I: Introduction and Overview »
    Content: Definitions and Background, 3V Examples: Velocity (Sensor Monitoring, Network Monitoring, Web2.0 Media, Smartphone Services), Volume (Text-Multimedia-Sciences, Web Data, Filesystems) and Variety. The New Database Landscape NoSQL (Document Stores, Replication, Consistency, Map-Reduce, Column Stores) and NewSQL
    11
    W09 Big Data II: NoSQL - JSON Document Stores and CouchDB »
    Assignment 1 (Overview Research Papers) due: presentation and discussion scheduled for Thursday, March 14, 2019, 17:00-21:00 in ΘΕΕ01-146. Presentations and papers are available under the reading list.
    Readings: Chapter 20, Abideboul et. al
    Content: Intro to Web2.0 and the JSON data interchange format, the JSON Key-Value data model, CouchDB: A JSON Database, Using Command Line CURL/ Web-based FUTON CouchDB Architecture (Btrees, Filesystem, Replication)
    12
      Big Data III: NoSQL - JSON Document Stores and CouchDB »
    Assignment 4 Announced
    Readings: Chapter 20, Abideboul et. al
    Content: REST Principles, CouchDB Queries: Creating DBs, Adding Docs, Updating Docs, Deleting Docs, _ID and _REV issues, Multi-Version CC (MVCC), Querying Data with (Materialized) Views (Map-Reduce style in Javascript), Replication, Scalability and Security Issues.
    13
    W10 Big Data IV: Introduction and HDFS »
    Readings: Chapters 14-15, Abideboul et. al.
    Content: Introduction to Cloud Computing: Typical Datacenters, Cloud Stack and Buzzwords, Public / Private Clouds, Utility Computing, Killer Apps, Economic Model, Distributed System Basics: I/O Performance, Replication Strategies; The Hadoop Project (Core, HDFS, Map-Reduce, HBase, HIVE), The Hadoop Distributed File System (HDFS), HDFS vs. NFS (Network File System), HDFS Example Deployments (Yahoo, Facebook)
    14
      Big Data V: Hadoop/MapReduce Internals »
    Readings: Chapters 14-15, Abideboul et. al.
    Content: Introduction to "Big-Data" Analytics (Example Scenarios and Architectures), Map-Reduce Programming Model, Microsoft's Dryad Programming Model, Map-Reduce Counting Problem Map-Reduce Architecture, Hadoop JobTracker, Tasktrackers and data-nodes, Failure Management, Map-Reduce Optimizations, Combiners, Compression, In-Memory Shuffling, Speculative Execution, Programming Map-ReducE with Languages / PIG and in-the-cloud.
    15
    W11 Big Data VI: MapReduce Programming »
    Readings: Chapters 14-15, Abideboul et. al.
    Content: MapReduce “Hello World” Program Explained (Wordcount in MR, Pseudocode – Mean Computation in MR, JAVA API Preview), Operational Issues, Combiners and In-Memory Combiners, Relational Operators in MR (Selection, Projection, Union, Intersection, Set Difference, (basic) Join, Aggregation), Beyond MR operators.
    16
      Big Data VII: Column Stores and NewSQL »
    Assignment 5 Announced
    Readings: Chapter 20, Abideboul et. al
    Content: BigTable (Examples, How-big are Big-tables, Conceptual vs. Physical View), Apache HBase (Architecture, Features, Shell, When to use?), NewSQL (The problem, what are spanner and F1, spanner true-time, Google F1 RDBMS overview, deployment and challenges)
    17
    W12 Big Data VIII: Research Talk »
    Readings:
    Content:
    18
      Big Data IX: In-Memory Processing with Spark »
    Readings:
    Content:
    19
    W13 Student Presentations »
    Assignment 3 (Research Research Papers) due
    For detailed plan see Moodle.
    -
    More: Selected III: Telco Big Data »
    Readings: "Telco Big Data Research and Open Problems", Constantinos Costa and Demetrios Zeinalipour-Yazti, Proceedings of the 35th IEEE International Conference on Data Engineering (ICDE '19), IEEE Computer Society, 8-12 April 2019, Macau SAR, China, 2019.
    Content: Introduction to Crowdsourcing, Definitions, Stakeholders, Incentives, Landscape, Challenges, Smartphone Era, Previous Tutorials, Web and DB Crowdsourcing, Implicit Crowdsourcing (reCAPTCHAs and ESP Game), Explicit Crowdsourcing (Contests, Microwork, Declarative CS, Wisdom-of-the-Crowd, Crowdfunding, Crowdvoting, Q/A)
    link
    Selected I: Indoor Data Management »
    Readings: "Mobile Data Management in Indoor Spaces", Christos Laoudias and Demetrios Zeinalipour-Yazti "Proceedings of the 16th IEEE International Conference on Mobile Data Management" (MDM '15), IEEE Press, Volume 1, Pages: 11-14, 2015.
    Content: Introduction to Indoor Mobile Data Management, Indoor Location, Modeling, Query Processing and Analytics, Privacy and Crowdsourcing, Conclusions and Future Directions
    link
    Selected II: Crowdsourcing »
    Readings: "Crowdsourcing for Mobile Data Management", G. Chatzimilioudis and D. Zeinalipour-Yazti "Proceedings of the 14th IEEE International Conference on Mobile Data Management" (MDM '13), Milan, Italy Volume 2, Pages: 3-4, 2013.
    Content: Introduction to Crowdsourcing, Definitions, Stakeholders, Incentives, Landscape, Challenges, Smartphone Era, Previous Tutorials, Web and DB Crowdsourcing, Implicit Crowdsourcing (reCAPTCHAs and ESP Game), Explicit Crowdsourcing (Contests, Microwork, Declarative CS, Wisdom-of-the-Crowd, Crowdfunding, Crowdvoting, Q/A)
    link
    Exams FINAL Exam scheduled on scheduled on Monday, May 15, 2025 @ 16:30-19:30 in ΧΩΔ01-001. Please always consult the schedule for updated details.
    This is an open exam: an A4 paper with notes from slides / book / papers is allowed.
    -