Big Data, Machine Learning, and Artificial Intelligence (112-1)

Course Outline

Prerequisites:

  • Programming Proficiency: Students must be familiar with at least one high-level programming language.
  • Experience with a scientific programming language, such as R, MATLAB, Python, Julia, or SAS, will be especially beneficial.
  • Relational Databases and SQL: Students must have a foundational understanding of relational databases and the Structured Query Language (SQL). This knowledge underpins many of the data engineering components of the course.
  • Data Structures Fundamentals: A grasp of basic data structures, including but not limited to arrays, lists, sets, and dictionaries, is vital. This understanding will serve students well as they work through the details of Data Engineering and Analytics.
  • Introductory Statistics: A solid grounding in statistical concepts is required. This includes descriptive analytics, which covers measures of central tendency and dispersion, and diagnostic analytics, such as hypothesis testing. This knowledge lays the groundwork for a deeper understanding of data analytics methodologies and statistical machine learning algorithms.
  • Analytical Problem-solving Skills: The practical components of this course require strong analytical thinking and problem-solving abilities. Students should be prepared to apply their knowledge to real-world challenges.
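As a rough self-check of the data-structure and introductory-statistics prerequisites, students should be comfortable reading a snippet like the one below. It is a minimal sketch using only Python's standard `statistics` module; the sample data and variable names are hypothetical and are not taken from the course materials.

```python
import statistics

# Hypothetical sample data (illustrative only): daily order counts
orders = [12, 15, 15, 18, 20, 22, 40]

# Measures of central tendency
mean = statistics.mean(orders)
median = statistics.median(orders)
mode = statistics.mode(orders)

# Measures of dispersion (sample variance and standard deviation use n - 1)
variance = statistics.variance(orders)
stdev = statistics.stdev(orders)
value_range = max(orders) - min(orders)

# A dictionary collects the summary; a set keeps only the distinct values
summary = {
    "mean": mean,
    "median": median,
    "mode": mode,
    "variance": variance,
    "stdev": stdev,
    "range": value_range,
}
distinct_values = set(orders)

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
print("distinct values:", sorted(distinct_values))
```

Note how the single outlier (40) pulls the mean above the median, which is one reason the course treats central tendency and dispersion together.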

Students who lack any of the prerequisites above are strongly advised to pursue supplementary coursework or dedicated self-study
beforehand. This preparation will make the learning experience throughout the course richer and more effective.

Course Objectives

This course offers an in-depth exploration of AI, Machine Learning (ML), and Big Data Analytics. Students will gain a foundational understanding of Data Science, tracing its evolution and its significance across sectors such as business, healthcare, insurance, and finance.
The curriculum covers essential topics including Big Data Analytics, Data Engineering, Business Analytics, Machine Learning Design Patterns, and the distinctions between Supervised and Unsupervised Learning. Students will also explore foundational concepts such as deep neural networks, model architecture design, functional programming, and ML Interpretability, the last of which aims to demystify the black-box nature of many ML models. A central component of the curriculum is the transformative potential of large language models, such as the GPT series. Students will see how these models are becoming pivotal in data analytics, especially through their capacity to generate R and Python code that streamlines and automates data processing.
Throughout the course, the emphasis on practical applications ensures that students gain hands-on experience with R and Python in addressing modern data analytics challenges. The course is designed for students eager to both understand and apply the principles of data science and ML/AI in real-world contexts.

Teaching Methods

  1. Onsite & online video lectures
  2. In-class quiz

Grading

  1. Quiz: 10%
  2. In-class exercise: 20%
  3. Homework: 30%
  4. Midterm Proposal: 20%
  5. Final Project: 20%
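The final grade is presumably the weighted sum of the five components above. The sketch below assumes each component is scored on a 0 to 100 scale, which the syllabus does not state; the function name `final_grade` and the example scores are hypothetical and for illustration only.

```python
# Weights taken from the grading breakdown (they sum to 100%)
WEIGHTS = {
    "quiz": 0.10,
    "in_class_exercise": 0.20,
    "homework": 0.30,
    "midterm_proposal": 0.20,
    "final_project": 0.20,
}


def final_grade(scores: dict[str, float]) -> float:
    """Weighted sum of component scores, each ASSUMED to be on a 0-100 scale."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the graded components")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores for one student
example = {
    "quiz": 80,
    "in_class_exercise": 90,
    "homework": 85,
    "midterm_proposal": 75,
    "final_project": 88,
}
print(round(final_grade(example), 2))  # prints 84.1
```

Homework carries the largest weight (30%), so in this example it contributes 25.5 of the 84.1 points.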

References / Textbooks / Readings

  • G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, 2nd Edition, Springer (Available free online: https://www.statlearning.com/)
  • G. James, D. Witten, T. Hastie, R. Tibshirani, and J. Taylor, An Introduction to Statistical Learning: with Applications in Python, Springer (Available free online: https://www.statlearning.com/)
  • F. Buisson, Behavioral Data Analysis with R and Python, O’Reilly Media, Inc., 2021.
  • K. Hwang and M. Chen, Big-Data Analytics for Cloud, IoT and Cognitive Computing, 1st ed. Wiley Publishing, 2017.
  • EMC Education Services, Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data, John Wiley & Sons, 2015.
  • N. Matloff, The Art of R Programming: A Tour of Statistical Software Design, 1st edition. No Starch Press, 2011.
  • R. Kabacoff, R in Action, Manning Publications Co., 2011.
  • C. O’Neil and R. Schutt, Doing Data Science: Straight Talk from the Frontline, 1st edition. O’Reilly Media, 2013.
  • F. Chollet and J. J. Allaire, Deep Learning with R, 1st edition. Manning Publications, 2018.
  • A. Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O’Reilly Media, Inc., 2019.
  • W. McKinney, Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O’Reilly Media, Inc., 2017.
  • F. Provost and T. Fawcett, Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking, O’Reilly Media, Inc., 2013.
  • C. Molnar, Interpretable Machine Learning: A Guide For Making Black Box Models Explainable. Munich, Germany: Independently published, 2022. (https://christophm.github.io/interpretable-ml-book/)
  • J. Pearl, M. Glymour, and N. P. Jewell, Causal Inference in Statistics – A Primer, 1st edition. Chichester, West Sussex: Wiley, 2016.

Course Content and Schedule

Week  Topic
1     Course Introduction
2     Data Engineering I
3     Data Engineering II
4     Data Engineering III
5     Fundamentals of Data Analytics I
6     Fundamentals of Data Analytics II
7     Fundamentals of Data Analytics III
8     Introduction to Statistical Learning I
9     Project Proposal Defense I
10    Project Proposal Defense II
11    Introduction to Statistical Learning II
12    Supervised Learning: Regression
13    Supervised Learning: Classification
14    Introduction to Unsupervised Learning
15    Topics in Interpretable Machine Learning and Causal Inference
16    Term Project Presentation I
17    Term Project Presentation II
18