
A Two Day Workshop on Data Mining

I recently attended this two-day workshop, held on July 30 and 31, 2010.

Organized by Dept. of Computer Science & Engineering,
Reva Institute of Technology & Management, Bangalore.
  • Chief Guest: Dr. M K Banga, Senior Manager, Wipro.
  • Guest of Honour: Basawraj Patil, IBM Project Manager.
  • Prof. & Head – VijayKumar, Reva Institute.
  • Principal – Reva : Rana Pratap Reddy.
All awesome sessions:

Day - 1
Few Highlights:
  • Data + Mine = DataMining
  • Film = “A Beautiful Mind”
Current Problems

  • Data mining on Images / Audio / Video.
  • Patterns, Similarity and Regularity.
  • Mathematical Models, Statistics, Probability, Fuzzy Logic, Neural Networks
  • Developing mathematical models for leaf patterns / sunflowers
  • Collaborative Filtering
  • Stanford University offers – Dataset + Infrastructure
  • 90% data on Internet is junk


  • So, first clean the data: pre-processing steps [Data Cleansing and Transformation]
  • Data Types: Enterprise Data / RDBMS / Web Data
  • Book Reference: - “Pattern Classification” by Duda & Hart. 
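
To make the "clean the data first" point concrete, here is a minimal sketch of a cleansing step in Python. It is only an illustration and not something shown at the workshop: the function name `clean_records` and the sample rows are hypothetical, and the rules (trim whitespace, normalize case, drop incomplete rows and duplicates) are assumed as typical pre-processing steps.

```python
def clean_records(records):
    """Trim whitespace, normalize case, drop incomplete rows and duplicates."""
    seen = set()
    cleaned = []
    for name, city in records:
        # Drop rows with missing values.
        if name is None or city is None:
            continue
        row = (name.strip().title(), city.strip().title())
        # Drop rows where a field is empty after trimming.
        if not all(row):
            continue
        # Drop rows that duplicate an earlier row after normalization.
        if row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


# Hypothetical raw data with typical "junk": padding, case noise, gaps.
raw = [
    ("  asha ", "bangalore"),
    ("Asha", "Bangalore "),
    ("Ravi", None),
    ("", "Mysore"),
    ("kiran", "MYSORE"),
]
cleaned = clean_records(raw)
print(cleaned)
```

Only two of the five rows survive here, which is the whole point of doing this step before any mining.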

Prof & Head - ISE, K. Raghuveer


  • Supervised Learning/ Unsupervised Learning
  • Classification / Clustering / Association
  • Data Mining Tools:
  • gCLUTO – Graphical CLUstering TOolkit
  • WEKA – Waikato Environment for Knowledge Analysis [written in Java]
  • XLMiner – Add-in for Excel.


      3 types of Web Mining:
        1) Web Content Mining
        2) Web Structure Mining
        3) Web Usage Mining
   
  • Web Crawlers / Robots
  • Semantic Web
  • IBM’s Clever Search Engine
  • Extraction – Prepare a 1-page report from a 100-page report.
  • Abstraction – Prepare a summary/abstract from a 100-page report.
  • Generate a title – Given a 100-page project report.


Srinivas Gowda
Advisory S/W Engineer
India Software Labs
  • Data Quality – Critical factor in creating consistent, trustworthy information
  • Association Rule: Market Basket Analysis
  • Time Series Data – Mining ATM Applications
  • SPSS – Statistical Package for the Social Sciences
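
For the market basket analysis point, a small sketch of how support and confidence of an association rule are computed may help. This is my own illustration in Python, not the speaker's material: the transactions and the helper names `support` and `confidence` are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter", "jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the baskets containing the antecedent, the fraction that also
    contain the consequent (rule: antecedent -> consequent)."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base


print(support({"bread", "milk"}, transactions))       # 3 of 5 baskets
print(confidence({"bread"}, {"milk"}, transactions))  # 3 of 4 bread baskets
```

A rule such as "bread -> milk" is usually kept only if both numbers clear chosen thresholds; the thresholds themselves are a judgment call.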
Day - 2
Few Highlights:
Distributed Data Mining
Clustering Techniques


Dr. Srinivas K G
MSRIT, Bangalore

Support Vector Machines & Its Applications

NP-hard problem
Soft Computing

Tools – SPSS, WEKA
Spurious Correlation – Correlating anything to anything

Data is fuzzy, no data is measurable
Fuzzy Dataset – Ex: She is beautiful, He is tall.
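
A statement like "He is tall" can be sketched as a fuzzy membership function that returns a degree between 0 and 1 instead of a yes/no answer. The following Python snippet is only an illustration: the function name `tall_membership` and the 160 cm and 190 cm cut-offs are my assumptions, not values from the talk.

```python
def tall_membership(height_cm, low=160.0, high=190.0):
    """Degree (0.0 to 1.0) to which a height counts as 'tall'.

    Assumed shape: 0 at or below `low`, 1 at or above `high`,
    and a straight-line ramp in between.
    """
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


for h in (150, 175, 200):
    print(h, tall_membership(h))
```

With these assumed cut-offs, a person of 175 cm is "tall" to degree 0.5, which is the kind of graded answer a crisp dataset cannot express.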

Local Data to Mine - Breast Cancer

SVM - Support Vector Machines
Retrieving interesting content has become a very difficult task.
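
To give a feel for what an SVM does, here is a minimal sketch of a linear soft-margin SVM trained with subgradient descent on the hinge loss, in plain Python. It is not the method or data presented in the session: the toy points, the function names `train_linear_svm` and `predict`, and the learning rate, regularization strength and epoch count are all assumptions chosen for illustration.

```python
def train_linear_svm(points, labels, lr=0.1, lam=0.01, epochs=100):
    """Fit weights w and bias b for a linear SVM; labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin:
                # follow the hinge-loss subgradient plus regularization.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only apply regularization.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data in two dimensions.
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -1), (-1, -3)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
print(predictions)
```

For real datasets such as the breast cancer data mentioned above, a tested library implementation (for example in WEKA) is the sensible choice; the sketch only shows the idea of maximizing the margin.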

Web Operating System
Virtual Society

EDIA, EDAS and EasyChair

Challenges
  • DMQL – Not a standard like SQL
  • Scaling up for high dimensional data
  • Sequential/ Time series Data
  • Mining Complex Knowledge from Complex Data
  • Impact Factor – Who has cited your paper
  • Google Scholar
  • Data Mining in Social Networking sites
  • Distributed Data Mining and Multi agent data
  • Data Mining for Biological & environmental Problems
  • Dealing with non-static, unbalanced and sensitive data
  • UCI – Datasets – small & highly unbalanced
Prof Jay Bharatheesh Simha Ph.D
Abiba Systems
  • Mining Massive Data Sets
  • Data Mining for Knowledge Discovery
  • Bioinformatics




