Cluster results visualisation in R with fviz_cluster

R is a handy tool for a data scientist. One package I have found useful while working in R is factoextra, a package for visualising and extracting results from multivariate data analyses, including various dimensionality reduction techniques and clustering methods. The official documentation of factoextra is at http://www.sthda.com/english/rpkgs/factoextra/. It can be installed from CRAN with the following command:

install.packages("factoextra")

The fviz_cluster function from factoextra is useful for visualising clustering results. Its usage details are at http://www.sthda.com/english/rpkgs/factoextra/fviz_cluster.html

A quick example showing k-means clustering of the iris dataset:

# Load the iris dataset and scale its four numeric columns
data("iris")
irisDataScaled <- scale(as.matrix(iris[, 1:4]))
# Run k-means with 3 clusters and 25 random starts
kmeansClusters <- kmeans(irisDataScaled, 3, nstart = 25)

# Visualisation of results using fviz_cluster from factoextra
library("factoextra")
fviz_cluster(kmeansClusters, irisDataScaled, stand = FALSE, geom = "point")

The resulting figure would be:

[Figure: kmeanswithiris, the k-means clusters of the scaled iris data]

The output of the fviz_cluster function is a ggplot object. If you call fviz_cluster inside a loop, the plot is not drawn automatically, so it has to be printed explicitly:

for (i in 1:iterEnd) {
  # ...
  print(fviz_cluster(kmeansClusters, irisDataScaled, stand = FALSE, geom = "point"))
  # ...
}

Difference between cast and convert commands in MS SQL

While working in MS SQL, we often need to convert data from one type to another. Consider comparing two dates held in string format: the easiest approach is to cast the parameters from VARCHAR to DATETIME.

SQL Server supports two functions for this kind of conversion, CAST and CONVERT.
But which is the better function to use?
I wanted to clarify this before proceeding, and came across this wonderful post describing an experiment on the matter.

According to Nakul’s experiment, the choice between the two does not make a big difference, and he concludes that CAST in fact uses CONVERT in its underlying implementation.

Please refer to the post below for the complete details.

http://beyondrelational.com/modules/2/blogs/77/posts/11334/cast-vs-convert-is-there-a-difference-as-far-as-sql-server-is-concerned-which-is-better.aspx

 

 

How a rabbit and a turtle help in identifying loops in a linked list data structure

The list is one of the fundamental data structures we frequently use in our programs. A linked list can be built from a simple node structure such as the following.

class Node {
    Node next;
}

To identify a loop within a linked list built from the above data structure, we use a rabbit and a turtle: the rabbit advances several nodes per iteration, while the turtle advances one node at a time. If the two ever land on the same node, the list contains a loop. If there is no loop, the rabbit reaches the end of the list before the turtle.

The following code snippet implements this logic with the rabbit moving through the list in steps of 2. It returns true if the list contains an infinite loop.

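public static boolean isInInfiniteLoop(Node node) {
    // An empty list cannot contain a loop.
    if (node == null) {
        return false;
    }
    Node turtle = node;      // advances one node per iteration
    Node rabbit = node.next; // advances two nodes per iteration
    while (rabbit != null) {
        if (rabbit == turtle) {
            // The rabbit has caught up with the turtle, so there is a loop.
            return true;
        } else if (rabbit.next == null) {
            // The rabbit has reached the end of the list.
            return false;
        } else {
            turtle = turtle.next;
            rabbit = rabbit.next.next;
        }
    }
    return false;
}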
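
To try the idea out, here is a small self-contained sketch. The names LoopDemo, buildList and hasLoop are made up for this illustration and are not part of the snippet above. It also uses a slightly different variant in which both the rabbit and the turtle start at the head of the list, so treat it as one possible way to exercise the technique rather than the definitive version.

```java
public class LoopDemo {
    // Minimal singly linked node, mirroring the Node class above.
    static class Node {
        Node next;
    }

    // Hypothetical helper: builds a chain of `length` nodes.
    // If loopTo is in [0, length), the last node links back to the node
    // at that index, creating a loop; pass -1 for a plain list.
    static Node buildList(int length, int loopTo) {
        if (length <= 0) {
            return null;
        }
        Node[] nodes = new Node[length];
        for (int i = 0; i < length; i++) {
            nodes[i] = new Node();
        }
        for (int i = 0; i + 1 < length; i++) {
            nodes[i].next = nodes[i + 1];
        }
        if (loopTo >= 0 && loopTo < length) {
            nodes[length - 1].next = nodes[loopTo];
        }
        return nodes[0];
    }

    // Rabbit-and-turtle check: both start at the head, the turtle moves
    // one node per iteration and the rabbit moves two.
    static boolean hasLoop(Node head) {
        Node turtle = head;
        Node rabbit = head;
        while (rabbit != null && rabbit.next != null) {
            turtle = turtle.next;
            rabbit = rabbit.next.next;
            if (turtle == rabbit) {
                return true; // the rabbit caught up, so there is a loop
            }
        }
        return false; // the rabbit fell off the end of the list
    }

    public static void main(String[] args) {
        System.out.println(hasLoop(buildList(5, -1))); // plain list of 5 nodes
        System.out.println(hasLoop(buildList(5, 2)));  // tail links back to the third node
        System.out.println(hasLoop(buildList(1, 0)));  // single node pointing to itself
        System.out.println(hasLoop(null));             // empty list
    }
}
```

Comparing nodes with == checks that both references point to the very same node, which is what a collision means here.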

Measuring execution time of a program (C and C++) via the command line in Linux

1. Using gprof in Linux.

For a C file, compile it as
gcc -pg filename.c -o test
For a C++ file, compile it as
g++ -pg filename.cpp -o test    (That’s obvious! :))
Then run the program as ./test, and a file named gmon.out will be created in the working directory. This file can be viewed with gprof test.

2. Using the time command in Linux.
Run the executable as time ./test. It outputs the time used. The ./ prefix matters here, because plain test is also a shell built-in command.