Cluster results visualisation in R with fviz_cluster

R is a handy tool for a data scientist. One package I found useful while working in R is factoextra, a package for visualising and extracting the results of multivariate data analyses, including various dimensionality reduction techniques and clustering methods. The official documentation of factoextra is at http://www.sthda.com/english/rpkgs/factoextra/. It can be installed from CRAN using the command:

install.packages("factoextra")

The fviz_cluster function from factoextra is useful for visualising clustering results. Its usage details are at http://www.sthda.com/english/rpkgs/factoextra/fviz_cluster.html

A quick example showing the clustering of the iris dataset:

# Load the iris dataset and scale its four numeric columns
data("iris")
irisDataScaled <- scale(as.matrix(iris[, 1:4]))
# Run k-means with 3 clusters and 25 random starts
kmeansClusters <- kmeans(irisDataScaled, 3, nstart = 25)

# Visualise the results using fviz_cluster from factoextra
library("factoextra")
fviz_cluster(kmeansClusters, irisDataScaled, stand = FALSE, geom = "point")

The resulting figure:

[Figure: kmeanswithiris]

The output of the fviz_cluster function is a ggplot object. If you call fviz_cluster inside a loop, the plot is not drawn automatically, so it has to be printed explicitly:

for (i in 1:iterEnd) {
  .....
  print(fviz_cluster(kmeansClusters, irisDataScaled, stand = FALSE, geom = "point"))
  ......
}

Working with .forward file in Linux

You can easily configure the emails arriving at your account on a Linux server to be forwarded to an email address you prefer.

For example, if you want your emails at stu01@ce.pera.lk to be forwarded to stu01@gmail.com, you can do this by editing the .forward file in your home folder.

.forward is a facility provided for individual users on Linux to add aliases for their email. When delivering email to a user, sendmail checks whether the user has a .forward file in their home directory. If the file is not empty, its contents are read and used as an alias for the user's original address. For example, if I had the line stu01@gmail.com in my .forward file, all my emails would be directed to that address instead of being delivered to my mailbox at ce.pera.lk.

The .forward file can contain anything that the right-hand side of an alias can, such as another user, a filename, or a program to pipe the message to. Multiple entries can be separated with a comma or a newline. Alias expansion is done recursively until there is nothing left to expand.

What if you want to keep a copy for yourself too? One might guess that the .forward file should look like this:

stu01@ce.pera.lk

stu01@gmail.com

This is wrong: when sendmail reaches stu01@ce.pera.lk it expands that entry again, which leads to an infinite loop. To avoid this, tell sendmail not to expand that entry by adding a '\' at the beginning. The correct .forward file looks like this:


\stu01@ce.pera.lk

stu01@gmail.com

Difference between cast and convert commands in MS SQL

While working in MS SQL, we often need to convert data from one type to another. Consider comparing two dates held in string format: the easiest approach is to cast the parameters from VARCHAR to DATETIME.

SQL Server supports two functions for casting: CAST and CONVERT.
But which is the better function to use?
I wanted to clarify this before proceeding, and came across a wonderful post describing an experiment on this matter.

According to Nakul's experiment, the choice between the two does not make a big difference, and he concludes that CAST in fact uses CONVERT in its underlying implementation.

Please refer to the link below for the complete reference.

http://beyondrelational.com/modules/2/blogs/77/posts/11334/cast-vs-convert-is-there-a-difference-as-far-as-sql-server-is-concerned-which-is-better.aspx

 

 

Celebration of Software Freedom Day 2013 at Virtusa CMB premises

In line with the global celebration of Software Freedom Day (http://softwarefreedomday.org), which falls on the third Saturday of September each year, the Virtusa Open Source Special Interest Group (VOSSIG) organized a Software Freedom Day event at the Virtusa Colombo office premises. The event was held from 1 p.m. to 5.30 p.m., with several geeks presenting and sharing their expertise on FOSS and enthusiastic attendance from Virtusans. I found it really great to be part of the organizing team of this year's event.

Attendees of Virtusa SFD 2013

Miffan Careem (Director, Solutions Architecture – WSO2) joined the event and spoke about the importance of practicing free and open source software development and the value it has added to his career. We also had Suchetha Wijenayake (FOSS advocate, Consultant for Linux/FOSS based services) as a guest speaker, and he shared the concepts behind FOSS and its history.

We also had Chandika Mendis (Senior Vice President and Global Head of Engineering at Virtusa) and Thushera Kawdawatta (Senior Director of Technology & Chief Software Architect at Virtusa) sharing how Virtusa supports FOSS and makes use of its value within the organization. The initiatives carried out at Virtusa have led to two prominent open source projects, Sahana (http://sahanafoundation.org/) and Akura (http://akuraschools.org/). Virtusa also has its own Ubuntu-based operating system customized for employees, named Virtusa OS, which I once worked on as an intern at the Virtusa R&D department. Hence, as was elaborated in the session, there is a lot of enthusiasm for and support of FOSS development at Virtusa.

Then there were two sessions conducted by VOSSIG team members, on Debian packaging and ethical hacking. The event ended with a technical discussion with the panel, which was conducted by me and a fellow VOSSIG member, Probodha.

It was the first ever Software Freedom Day held at Virtusa, and the event ended with a lot of encouraging feedback from the guest panel and the attendees. I hope that enthusiasm will continue in the years to come.

The VOSSIG team after having a successful SFD 2013 being held at Virtusa

XML file handling fails with external jars: "DOMSource cannot be processed: check that saxon9-dom.jar is on the classpath" and how to resolve it

When I run a simple XML file handling module without any external dependency jars, it works perfectly. But once I integrated it with other modules that use some external jars, it gave the error "DOMSource cannot be processed: check that saxon9-dom.jar is on the classpath".
Digging further, I found that the issue is triggered when I use the Transformer class to write the entire XML content to an output stream, i.e. a file, at the point where I call
final javax.xml.transform.TransformerFactory tfactory = TransformerFactory.newInstance();
The workaround was to change it to
TransformerFactory tfactory = new com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl();
The hack works because the newInstance() method in TransformerFactory (if we check the JRE system library) normally resolves to com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl when no other implementation is found on the classpath, so instantiating it directly presumably bypasses the factory that the external jars bring in 😉
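A minimal sketch of the same workaround, written against the standard two-argument TransformerFactory.newInstance(String, ClassLoader) method instead of calling the internal constructor directly. The class name XmlWriteSketch and the helpers serialize and sampleDocument are made up for illustration, and the factory class name is simply the one quoted above. On newer JDKs the internal package may not be accessible for direct instantiation, which is one reason to request the factory by name.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical class name, used only for this sketch.
class XmlWriteSketch {
    // Class name of the JDK's built-in factory, as quoted in the post.
    static final String JDK_FACTORY =
        "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    // Writes a DOM document to a string using the explicitly named factory.
    static String serialize(Document doc) {
        try {
            // Asking for the factory by name skips the usual classpath lookup,
            // so a factory shipped in an external jar is not selected.
            TransformerFactory tfactory = TransformerFactory.newInstance(JDK_FACTORY, null);
            Transformer transformer = tfactory.newTransformer();
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize document", e);
        }
    }

    // Builds a tiny document with a single <root/> element to write out.
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("root"));
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("could not create document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sampleDocument()));
    }
}
```

To write to a file instead of a string, the StreamResult can be constructed from a java.io.File.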
I hope you find this useful if you run into trouble with XML file handling in Java.

How a rabbit and a turtle help in identifying loops in a linked list data structure

A list is one of the fundamental data structures we frequently use in our programs. A linked list can be built from a simple data structure as follows.

class Node {
    Node next;
}

To identify a loop within a linked list built from the above data structure, we will use a rabbit and a turtle: the rabbit moves through the nodes n steps at a time, and the turtle moves one step at a time. If the two ever meet on the same node, the list contains a loop. Otherwise, the rabbit will reach the end of the list before the turtle.

The following code snippet implements the above logic, with the rabbit going through the list in steps of 2. It returns true if the list contains an infinite loop.

public static boolean isInInfiniteLoop(Node node) {
    if (node == null) {
        return false;
    }
    Node turtle = node;       // moves one step at a time
    Node rabbit = node.next;  // moves two steps at a time
    while (rabbit != null) {
        if (rabbit == turtle) {
            // The rabbit caught up with the turtle, so there is a loop
            return true;
        } else if (rabbit.next == null) {
            // The rabbit reached the end of the list
            return false;
        } else {
            turtle = turtle.next;
            rabbit = rabbit.next.next;
        }
    }
    return false;
}
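A small usage sketch for the check above. The LoopDemo class and the buildList helper are hypothetical names added only to construct test lists; the check itself is the same logic as in the snippet, repeated here so the example compiles on its own.

```java
class Node {
    Node next;
}

// Hypothetical wrapper class, used only for this sketch.
class LoopDemo {
    // Same check as above: the turtle moves 1 step, the rabbit moves 2.
    static boolean isInInfiniteLoop(Node node) {
        if (node == null) {
            return false;
        }
        Node turtle = node;
        Node rabbit = node.next;
        while (rabbit != null) {
            if (rabbit == turtle) {
                return true;
            } else if (rabbit.next == null) {
                return false;
            }
            turtle = turtle.next;
            rabbit = rabbit.next.next;
        }
        return false;
    }

    // Hypothetical helper: chains `length` nodes (length >= 1). If loopTo >= 0,
    // the last node points back to the node at index loopTo, creating a loop.
    static Node buildList(int length, int loopTo) {
        Node[] nodes = new Node[length];
        for (int i = 0; i < length; i++) {
            nodes[i] = new Node();
            if (i > 0) {
                nodes[i - 1].next = nodes[i];
            }
        }
        if (loopTo >= 0) {
            nodes[length - 1].next = nodes[loopTo];
        }
        return nodes[0];
    }

    public static void main(String[] args) {
        System.out.println(isInInfiniteLoop(buildList(5, -1))); // straight list: false
        System.out.println(isInInfiniteLoop(buildList(5, 2)));  // last node links to index 2: true
        System.out.println(isInInfiniteLoop(buildList(1, 0)));  // single node linked to itself: true
    }
}
```

The turtle and the rabbit are compared by reference, since two positions are "the same node" only when they are the same object.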

Measuring execution time of a program (C and C++) via command line in Linux

1. Using gprof in Linux.

For a C file, compile it as
gcc -pg filename.c -o test
For a C++ file, compile it as
g++ -pg filename.cpp -o test    (That's obvious! :))
Then run the file as ./test, and a file named gmon.out will be created in the working directory. This file can be viewed with gprof test gmon.out.

2. Using the time command in Linux.
Run the executable as time ./test (a plain time test would run the shell's built-in test command instead). It outputs the time used.