Dr. Man-Tak Shing
Associate Professor
Department of Computer Science
shing@nps.edu

Dr. Mathias Kolsch
Associate Professor
Department of Computer Science
kolsch@nps.edu

Overview

A description of CS4921: Mining of Large Datasets, along with the background knowledge you should have before taking the class.
CS4921 Mining of Large Datasets (3-1)

Modern data-mining applications, often called "big-data" analysis, require us to manage immense amounts of data quickly. Big-data mining focuses on extracting information from very large amounts of data, that is, data so large that it does not fit in a single computer's memory or on its disk. Because of the emphasis on size, many of the examples covered in the course are about the Web or data derived from the Web. Rather than using data to "train" machine-learning engines, this course takes an algorithmic point of view, focusing on applying algorithms to data and on hands-on work with Hadoop. Topics covered in the course include:

  • Distributed file system and map-reduce algorithm
  • Data mining techniques - finding similar items, clustering
  • Technologies for search engines - link analysis, page ranking, link-spam detection

Prerequisites:

  • Ability to program in Python, C++, or Java
  • Basic knowledge of data structures and algorithms
  • Basic UNIX command line usage
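
To give a flavor of the map-reduce topic listed above, here is a minimal single-process sketch of a word count in Python. It only imitates the map, shuffle, and reduce steps in memory and does not use Hadoop or any distributed file system; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration, not taken from the course materials.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

In a real cluster the map and reduce calls would run on different machines and the shuffle would move data over the network; this sketch keeps everything in one process to show only the shape of the computation.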
Text:
