Data Mining

Essay by   •  April 9, 2016  •  Research Paper  •  3,344 Words (14 Pages)  •  2,284 Views

PVA – FUNDRAISING

IDS 572 – DATA MINING

ASSIGNMENT 2

Arati Singh

Nirav Dedhia

Sachin Pandey

Table of Contents

Question 1: Data Analysis

Question 2: Modelling

Question 3: Classification under Asymmetric Response and Cost

Question 1: Data Analysis

  1. Data Statistics:

Table 1 in the appendix gives the mean, number of missing values, and minimum and maximum values for each attribute. The appendix also includes the distribution of the variables.

Following are some of the inferences from the data analysis:

  1. The Target_B variable shows 65% donors and 35% non-donors.
  2. There are no missing values for Target_B, State, DOB, and HIT; however, there are missing values in AGE, HOMEOWNR, NUMCHILD, INCOME, and GENDER.
  3. The donors consist of 5367 females and 4107 males.
  4. The average age of the population is 61.6. The minimum age among donors is 25.
  5. The largest number of donors come from the state CA.
  6. The average number of children is between 1 and 2.
  7. The number of homeowners is 5478.

Note: Click on DATA SUMMARY: for detailed view
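The statistics listed above (mean, number of missing values, minimum, maximum, and the donor share) could be computed along the following lines. This is only a minimal sketch, not the tool used in the assignment. The records are made up for illustration, `summarize` is a hypothetical helper, missing values are assumed to be stored as `None`, and Target_B is assumed to be coded 1 for donors and 0 for non-donors.

```python
from statistics import mean


def summarize(rows, attribute):
    """Mean, missing count, minimum and maximum of one numeric attribute."""
    values = [row[attribute] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "mean": mean(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
    }


# Made-up records for illustration only; None marks a missing value.
rows = [
    {"TARGET_B": 1, "AGE": 25},
    {"TARGET_B": 0, "AGE": None},
    {"TARGET_B": 1, "AGE": 70},
    {"TARGET_B": 1, "AGE": 61},
]

age_summary = summarize(rows, "AGE")
# Donor share, assuming TARGET_B is coded 1 for donors and 0 for non-donors.
donor_share = mean(row["TARGET_B"] for row in rows)
print(age_summary, donor_share)
```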

  1. Data cleaning and transformation model

Below we describe each process we performed on the attributes:

  1. Generate New Attribute: We used Generate New Attribute to replace missing values with 0 and non-missing values with 1, creating a new variable for every variable transformed. The details of the variables generated and the functions applied are given in appendix table 4.

Note: Click here Generate attribute for detailed view.

  1. Remove old attributes: For each variable transformed in the previous step, we selected the original attribute and removed it, keeping only the newly generated attribute. The attributes removed are listed below.

Remove old Attribute: CHILD03, CHILD07, CHILD12, CHILD18, DOMAIN, GENDER, HOMEOWNR, MAJOR, PEPSTRFL, PVASTATE, RECINHSE, RECP3, RECPGVG, RECSWEEP

Note: Click here  Remove old variables for detailed view.
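The two steps above (generate a 0/1 attribute, then remove the old one) can be sketched as follows. This is an illustration under assumptions rather than the process used in the assignment: the `_FLAG` suffix and the `add_presence_flags` helper are hypothetical, the records are made up, and missing values are assumed to be stored as `None`.

```python
def add_presence_flags(rows, attributes, suffix="_FLAG"):
    """Replace each listed attribute with a flag: 0 if missing, 1 otherwise."""
    result = []
    for row in rows:
        # Keep every attribute that is not being transformed.
        new_row = {k: v for k, v in row.items() if k not in attributes}
        # Add the generated attribute; the old attribute is dropped above.
        for name in attributes:
            new_row[name + suffix] = 0 if row.get(name) is None else 1
        result.append(new_row)
    return result


# Made-up records for illustration only.
rows = [
    {"AGE": 61, "GENDER": "F", "HOMEOWNR": "H"},
    {"AGE": 45, "GENDER": None, "HOMEOWNR": None},
]

flagged = add_presence_flags(rows, ["GENDER", "HOMEOWNR"])
print(flagged)
```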

  1. Eliminating less relevant Variables:

  1. Remove useless attributes: In this step, we eliminated variables that seemed less relevant for predicting the donor/non-donor target. We deleted variables with more than 50% null values and variables that were highly skewed. We also deleted variables such as past donation dates and donation amounts, because these did not contribute much to modelling.

The details of the less relevant attributes removed are given below.

Attributes: ADATE_1-ADATE_24
Reason for removal: These are all historical promotion values, which we found irrelevant for prediction.

Attributes: MDMAUD, MDMAUD_A, MDMAUD_F, MDMAUD_R
Reason for removal: These variables represent the major donor matrix for people who have given gifts previously, which we think is not necessary for our target prediction. The values of these fields are also highly skewed.

Attributes: WEALTH2
Reason for removal: WEALTH1 is already considered, making this redundant.

Attributes: ANC1 - ANC15
Reason for removal: A person's ancestry is of no significance in predicting donors and non-donors.

Attributes: LSC1 - LSC4
Reason for removal: A person's language is of no significance in predicting donors and non-donors.

Attributes: ODATEDW, OSOURCE, TCODE, DOB, NOEXCH, AGEFLAG, DATASOURCE, GEOCODE, LIFESRC, HPHONE_D, MAILCODE
Reason for removal: These variables, which cover geocode, zip, phone number and other basic donor information, are also irrelevant for predicting our target variable.

Attributes: RECINHSE, RECPGVG, RECP3, RECSWEEP, NUMCHILD, CHILD03-CHILD18, SOLP3, SOLIH, MAJOR, COLLECT1, VETERANS, BIBLE, CATALOG, HOME, PETS, CDPLAY, STEREO, FISHER, GARDEN, BOATS, WALKER, PEPSTRFL
Reason for removal: These variables were highly skewed and would over-predict the results, so we eliminated them.

Attributes: HHAGE1-HHAGE3, DW1-DW9, HV1-HV4, HU1-HU5, HHD1-HHD12, HHAS1-HHAS4, MC1-MC3, TPE1-TPE13, LFC1-LFC10, AFC1-AFC3, HC1-HC21
Reason for removal: We also removed some neighborhood population attributes with skewed values and some that we found redundant with respect to target variable prediction.

Attributes: TARGET_D
Reason for removal: As suggested in the case, we discarded this.

      Note: Click Here Remove useless variable for detailed view.
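The two quantitative removal criteria mentioned in this step (more than 50% null values, and high skew) could be screened for automatically. The sketch below is an assumption-laden illustration, not the procedure used in the assignment, where the selection also relied on judgment. The skewness cut-off of 2.0, the helper names, the `EXAMPLE_SKEWED` column and the sample records are all hypothetical.

```python
from statistics import mean, pstdev


def missing_share(rows, attribute):
    """Fraction of records in which the attribute is missing (None)."""
    return sum(1 for row in rows if row.get(attribute) is None) / len(rows)


def skewness(values):
    """Population skewness: mean of the cubed standardized deviations."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return 0.0
    return mean(((v - mu) / sigma) ** 3 for v in values)


def columns_to_drop(rows, attributes, max_missing=0.5, max_skew=2.0):
    """Flag attributes with too many nulls or a heavily skewed distribution."""
    drop = []
    for name in attributes:
        present = [row[name] for row in rows if row.get(name) is not None]
        if missing_share(rows, name) > max_missing:
            drop.append(name)
        elif len(present) > 1 and abs(skewness(present)) > max_skew:
            drop.append(name)
    return drop


# Made-up records: NUMCHILD is missing in 6 of 10 rows,
# EXAMPLE_SKEWED has a single large outlier, AGE is symmetric.
rows = [
    {
        "AGE": 30 + 10 * (i % 5),
        "NUMCHILD": 1 if i < 4 else None,
        "EXAMPLE_SKEWED": 100 if i == 9 else 0,
    }
    for i in range(10)
]

to_drop = columns_to_drop(rows, ["AGE", "NUMCHILD", "EXAMPLE_SKEWED"])
print(to_drop)
```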

  1.  How we handled missing values:
  1.  Map: In this step we replaced the value “?” in nominal variables with “N”.
  2.  Replace Missing Value: In this step we replaced unknown values in all numeric variables with “0”.
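The Map and Replace Missing Value steps can be sketched as below. This is a minimal illustration, assuming that unknown numeric values appear as `None` or “?”; the `fill_missing` helper and the sample records are hypothetical and are not taken from the assignment.

```python
def fill_missing(rows, nominal, numeric):
    """Map "?" to "N" in nominal attributes and unknown numerics to 0."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for name in nominal:
            if new_row.get(name) == "?":
                new_row[name] = "N"
        for name in numeric:
            # Assumption: unknown numeric values are stored as None or "?".
            if new_row.get(name) in (None, "?"):
                new_row[name] = 0
        cleaned.append(new_row)
    return cleaned


# Made-up records for illustration only.
rows = [
    {"GENDER": "F", "INCOME": 5},
    {"GENDER": "?", "INCOME": None},
]

cleaned = fill_missing(rows, nominal=["GENDER"], numeric=["INCOME"])
print(cleaned)
```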

Below is a summary table of the variables and the missing-value replacement techniques used.

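Attribute: Domain
Original value: 1st byte = U, C, S, T, R; 2nd byte = 1, 2, 3
Missing values replaced by: N/A
Transformation technique: Cut(Domain,0,1)

Attribute: collect1, cards, kidstuff
Original value: Y / N
Missing values replaced by: 0
Transformation technique: if(Value=Y,1,0)

Attribute: CHILD03, CHILD07, CHILD12, CHILD18
Original value: M, F, B
Missing values replaced by: 0
Transformation technique: if(value="M" || value="F" || value="B", "1", "0")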

...

...
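The three transformation rules shown in the table can be expressed as small functions. This is a sketch under assumptions, not the expressions used in the assignment: the function names are hypothetical, the flags are returned as integers rather than the quoted "1" and "0" of the table, and Cut(Domain,0,1) is read here as "take one character starting at position 0".

```python
def yes_no_flag(value):
    """collect1, cards, kidstuff: 1 for "Y", 0 for anything else."""
    return 1 if value == "Y" else 0


def child_flag(value):
    """CHILD03..CHILD18: 1 if the value is M, F or B, otherwise 0."""
    return 1 if value in ("M", "F", "B") else 0


def domain_first_byte(value):
    """Domain: keep only the first byte (assumed reading of the Cut call)."""
    return value[:1] if value else None


print(yes_no_flag("Y"), child_flag("B"), domain_first_byte("U1"))
```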
