Python Dictionary

Creating a dictionary

>>> my_dict = {}
>>> my_dict['key1'] = 'value1'
>>> my_dict['key2'] = 'value2'
>>> my_dict['key3'] = 'value3'
>>> my_dict
{'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}


>>> my_dict = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
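
A few other common ways to build the same dictionary, as a minimal sketch. It assumes Python 3 (the interpreter sessions in this post are Python 2), and the variable names are only illustrative.

```python
# Build the same mapping three ways (all names here are illustrative).
letters = ['A', 'B', 'C', 'D', 'E']

from_literal = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
from_zip = dict(zip(letters, range(len(letters))))
from_comprehension = {letter: index for index, letter in enumerate(letters)}

print(from_literal == from_zip == from_comprehension)  # True
```

The zip and comprehension forms are handy when the keys already live in a list.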

Iterating through python dictionary

>>> my_dict = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}
>>> for key, value in my_dict.iteritems():
...     print key, value
...
A 0
C 2
B 1
E 4
D 3

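We can also use my_dict.iterkeys() or my_dict.itervalues() to iterate over only the keys or only the values of the dictionary. These methods exist in Python 2 only; Python 3 uses my_dict.items(), my_dict.keys() and my_dict.values() instead.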
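
For comparison, here is a sketch of the same iteration in Python 3. It assumes Python 3.7 or later, where dictionaries keep insertion order, so the pairs come out in the order they were written rather than in the arbitrary order shown in the Python 2 session.

```python
# Python 3 equivalent of the iteritems() loop shown earlier.
my_dict = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}

pairs = []
for key, value in my_dict.items():   # replaces iteritems()
    pairs.append((key, value))
    print(key, value)

keys = list(my_dict.keys())          # replaces iterkeys()
values = list(my_dict.values())      # replaces itervalues()
```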


Reading csv files in SAS

SAS can read any delimited ASCII input file. However, it is important to assign an informat to each SAS variable so that the columns are read correctly.

The following options should be set to ensure that the file is read correctly.

  1. DLM -> ',' or '|' or any other character used as the delimiter.
  2. DSD -> (delimiter-sensitive data) a delimiter inside a quoted string is treated as part of the character value rather than as a separator; two consecutive delimiters are read as a missing value.
  3. TRUNCOVER -> assigns the value to the variable even if the value is shorter than the allocated informat width.
  4. LRECL -> logical record length used when reading an external file.
  5. FIRSTOBS -> row number at which reading starts.
  6. FORMAT -> :$20.

The most important part is the colon (:). The colon tells SAS to read each column up to the next delimiter instead of using the informat width (20 in this case) to split the columns; the width then only sets the maximum length of the value. $ indicates a character variable, and in $w. the w is the width in characters.

Following is an example:


DATA csv_data;

INFILE 'C:\path_to_dir\file_name.csv' DSD DLM=',' TRUNCOVER LRECL=1024 FIRSTOBS=2;

INPUT char_variable1 :$20. char_variable2 :$20. char_variable3 :$20. int_variable :5.3;

RUN;
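
For readers more familiar with Python, the effect of DLM, DSD and FIRSTOBS=2 can be approximated with the standard csv module. This is only an analogy, not a description of how SAS works internally; it assumes a comma-delimited file with one header row, and the sample rows and column names are made up.

```python
import csv
import io

# Made-up sample standing in for file_name.csv: a header row, then data.
sample = io.StringIO(
    'name,city,note,amount\n'
    '"Smith, John",Pune,"likes a|b",12.5\n'
    'Lee,Oslo,,3\n'
)

# delimiter/quotechar play the role of DLM=',' together with DSD:
# a comma inside quotes stays part of the value.
reader = csv.reader(sample, delimiter=',', quotechar='"')
next(reader)                    # like FIRSTOBS=2: skip the header row
rows = [row for row in reader]

for row in rows:
    print(row)
```

The quoted comma in the first row stays inside the value, and the empty field in the second row comes back as an empty string, much as DSD reads two consecutive delimiters as a missing value.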



Setup easy_install and Scrapy

On Windows:

Download Scrapy-0.16.3.tar.gz from

tar xvf Scrapy-0.16.3.tar.gz


easy_install lxml==2.3

easy_install -U Scrapy-0.16.3

Scrapy Example

On Windows without pywin32 installed, scrapy crawl <name> throws the following error:

ImportError: Error loading object 'scrapy.core.downloader.webclient.ScrapyHTTPClientFactory': No module named win32api

Please refer to

Install pywin32-218.win32-py2.7.exe from (for python 2.7 win32)
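
A quick way to confirm whether the missing module is the problem is to check for it before running the crawl. This is a minimal sketch that assumes a Python 3 interpreter (importlib.util.find_spec is not available in Python 2.7); the helper name module_available is hypothetical.

```python
import importlib.util

def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None

# win32api is provided by pywin32 and only exists on Windows.
if not module_available('win32api'):
    print('win32api is missing; install pywin32 before running scrapy crawl')
```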

FreeTDS Installation: Creating DSN for MS-SQL in Ubuntu

Installation of FreeTDS

FreeTDS is an open source implementation of the Tabular Data Stream (TDS) protocol, which is used to connect to Microsoft SQL Server and Sybase databases.

Install the following packages on Ubuntu:

sudo apt-get install unixodbc unixodbc-dev tdsodbc freetds-dev sqsh

We need to append to or create the following files:

  1. /usr/share/freetds/freetds.conf
  2. /etc/odbcinst.ini
  3. /etc/odbc.ini

sudo vi /usr/share/freetds/freetds.conf
host =
port = 1433
tds version = 7.0

To test whether FreeTDS is working, use tsql to connect to the database server.
Note that the TDS version depends on the SQL Server version: TDS 7.0 corresponds to SQL Server 7.0, and later server releases use later protocol versions. By default, TDS is set to 5.0.


You can look up the location of the driver using the following command.
find /usr/ -type f -name 'libtds*'

sudo vi /etc/odbcinst.ini
[FreeTDS]
Description = FreeTDS driver
Driver = /usr/lib/i386-linux-gnu/odbc/
FileUsage = 1
UsageCount = 1

sudo vi /etc/odbc.ini
[MSSQL]
Description = MS SQL Server
Driver = FreeTDS
Server = IP_ADDR
ReadOnly = No
Port = PORT_NO

Test your final DSN with the following command
isql -v MSSQL username password

Note: Do not leave whitespace at the beginning of any line while editing these files; otherwise the DSN will not be configured.
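
The leading-whitespace problem is easy to miss by eye, so it can help to check the files with a small script. This is a minimal sketch; the function name and the sample content are hypothetical, and the sample stands in for a real /etc/odbc.ini.

```python
def lines_with_leading_whitespace(text):
    """Return 1-based line numbers of non-blank lines that start with a space or tab."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() and line[0] in ' \t':
            bad.append(number)
    return bad

# Hypothetical odbc.ini content; line 3 is indented by mistake.
sample = "[MSSQL]\nDriver = FreeTDS\n  Server = IP_ADDR\nPort = PORT_NO\n"
print(lines_with_leading_whitespace(sample))  # [3]
```

To check a real file, read its contents into a string and pass that to the function instead of the sample.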

Feel free to report any corrections.


Lemmatization is the process of grouping the inflected forms of a word so that they can be identified and treated as a single item, the word's lemma (its dictionary form), in the same context.

For example:

(Drank, drinking, drunk) -> drink

(Good, better, best) -> good
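
The mapping in these examples can be sketched as a plain lookup table. This is a toy illustration only: the table and the lemmatize function are hypothetical, and real lemmatizers such as NLTK's WordNetLemmatizer rely on a full dictionary and part-of-speech information rather than a hand-written table.

```python
# Toy lookup table built from the examples above (not a real lemmatizer).
LEMMAS = {
    'drank': 'drink', 'drinking': 'drink', 'drunk': 'drink',
    'better': 'good', 'best': 'good',
}

def lemmatize(word):
    """Return the lemma for a known inflected form, else the lowercased word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)

print([lemmatize(w) for w in ['Drank', 'drinking', 'best', 'table']])
# ['drink', 'drink', 'good', 'table']
```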


Python: (NLTK)

from nltk.stem.wordnet import WordNetLemmatizer

Java: (Stanford NLP)

import java.util.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.ling.CoreAnnotations.*;