30 August 2012

Getting Started with IBM Netezza

Note: A more recent version of this post, covering Netezza 7, is available here: http://szahariev.blogspot.com/2013/07/getting-started-with-ibm-puredata.html

Data Warehouse, Business Intelligence, ETL and Data Analysis are terms we hear more and more every day. One name is becoming especially popular: Netezza, the state-of-the-art data warehouse appliance offered by IBM. If you want to get started with Netezza, there is good news: IBM distributes an emulator that you can download and run at home on a Windows machine. Here is what you need:

  1. Download and install VMware Player (free) or VMware Workstation (paid).
  2. Install the VIX API (free) if you are using VMware Player.
  3. Request to join the IBM Netezza Developer Network (NDN) by following this link, then download the Netezza emulator from there: https://www.ibm.com/developerworks/mydeveloperworks/groups/service/html/communityview?communityUuid=35ac05e2-4e00-42fe-b252-111e5f3ad8fa

Once you join the IBM NDN you will find not only the emulator but also the product documentation that will help you master the product. Here is what you will get when the emulator is running:

image

Currently IBM does not offer a Windows-based GUI for querying Netezza, so you will have to connect to the Netezza host with an ssh client and use nzsql.

One 3rd party alternative to nzsql is the Aginity Workbench for Netezza. It runs under Windows and provides a GUI for querying Netezza. The application requires a free registration after a 10-day trial period. The downside is that you need a Netezza ODBC or OLE DB driver to connect to the Netezza host. Unfortunately these drivers can be downloaded only by IBM customers.

29 August 2012

Using Notepad++ to Search and Replace using Regular Expressions

Not long ago I had to modify a 300-line SQL Server stored procedure that uses column names containing spaces into a version without spaces. For example, columns like [Price Rate] should be converted to Price_Rate. Doing this manually is a long and tedious task. Fortunately Notepad++ saved the day once again, this time with regular expressions.

To search for the spaces in the column names and replace them with underscores, open the Notepad++ Replace dialog (Ctrl+H) and type the following for the search pattern:

\[([0-9a-zA-Z]*) ([0-9a-zA-Z]*)\]

The above will match everything that starts with [, continues with a combination of letters or digits, has a single space after that, has another combination of letters or digits after the space, and ends with ].

Type the following for the “Replace with” field:

\1_\2

This means that Notepad++ will replace \1 with the text matched between the [ and the space in the column name, and \2 with the text matched between the space and the ]. Here we are using one special feature: if part of a regular expression is enclosed in parentheses, the text it matches can be referenced in the replacement as \1, \2, \3, etc.

Finally, select “Regular expression” for “Search Mode” and hit the “Replace All” button.

image

For more information about the special characters used in regular expressions take a look here.
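
The same search and replace can be tried outside Notepad++. The sketch below uses Python's re module purely for illustration; the function name and the sample SQL are made up for this example, and the pattern is the one from the search field above.

```python
import re

# Same pattern as the Notepad++ search field: two runs of letters/digits
# separated by a single space and wrapped in square brackets.
COLUMN_WITH_SPACE = re.compile(r"\[([0-9a-zA-Z]*) ([0-9a-zA-Z]*)\]")


def underscore_columns(sql: str) -> str:
    """Rewrite [First Second] as First_Second (groups 1 and 2 joined by _)."""
    return COLUMN_WITH_SPACE.sub(r"\1_\2", sql)


# Hypothetical sample statement, not taken from the real stored procedure.
sample = "SELECT [Price Rate], [Unit Cost], [Id] FROM Products"
print(underscore_columns(sample))
```

Note that the pattern only handles names with exactly one space: [Id] is left alone, and a three-word name such as [Unit Price Rate] would not be matched at all.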

12 February 2012

Introduction to DSCop

DSCop is an open source tool that analyzes IBM InfoSphere DataStage jobs and reports information such as violations of some commonly accepted best practices. It is developed in C# and provides a plugin-based architecture to allow 3rd party extensibility. The tool comes with a few sample plugins that should be enough for a basic understanding of how the tool works and how to implement your own plugins.

Requirements

A computer running Microsoft Windows with .NET Framework 4 installed.

Download and Install

The current publicly available version of DSCop is RC1, available here. After downloading, unzip the file and you are ready to use the tool.

Basic Usage Scenario

The tool automatically searches the folder where the DSCop.exe file is located and loads all the plugins it finds. Each plugin contains one or more rules. Each rule enforces a certain check that is performed on a DataStage job. The jobs must first be exported to one or more XML files.

To check a job with the tool, use this syntax:

DSCop jobfilename.xml

As a result you will get a list of the rules that were executed, the jobs that were processed and the rule violations that were found.

A sample output after running the tool is shown here:

image

DataStage has the ability to export multiple jobs into a single XML file. However, if your jobs are not in one file you can use wildcards to specify them like this:

DSCop Staging*.xml

Advanced Usage Scenario

Now we are going to explore some of the advanced use cases, where you run the tool with certain rules included or excluded.

The syntax for running the tool by explicitly enumerating the rules you want to execute is:

DSCop jobfile.xml –include RuleName1 RuleName2 RuleName3

A sample output using this syntax is shown below:

image

Note the “*Ignore” next to the rules that are not enforced.

To exclude certain rules use the following syntax:

DSCop jobfile.xml –exclude RuleName1, RuleName2, RuleName3

image

Sample Plugins

The RC1 version of the tool comes with the following plugins/rules:

  • CoreRules/StableSortRule – checks whether any Sort stage has StableSort=true. StableSort is enabled by default but should not be used due to decreased performance.
  • CoreRules/TeradataConnectorParametersRule – checks whether the ServerName/Username/Password properties of all Teradata connector stages are parameterized. Hardcoding this information should be avoided.
  • NamingRules/PrefixNamingRule – checks whether stage names follow a predefined naming convention. The naming convention is described in the file NamingPrefixes.xml located in the same folder as the plugin.

Feedback

You can send your feedback to the following email: dscoptool-at-gmail.com

10 February 2012

Some tips for the SSD owners running Windows

Solid State Drives (SSD) are very different from Hard Disk Drives (HDD). Because of this you must change your usual pattern of usage. Otherwise you risk your shiny new SSD reaching the end of its life after only a few months of use. In this post I will describe some of the tricks that seem to work on my machine.

The main disadvantage of SSD devices is that each data cell can be rewritten only a limited number of times. This number is several times lower than for a data cell in an HDD. So you must make sure that the applications you are using are not constantly writing to the SSD. Below is a short list of tips that will extend the life of your SSD.

Must Have

 

Disable Windows Swap File

If Windows runs out of RAM it moves some of the data to the swap file on your disk drive. This is one of the primary sources of disk write operations on systems running with little RAM. Buy more RAM and disable the Windows swap file. The amount of RAM you need depends on the applications you are running and the version of Windows. IMHO for Windows 7 you need at least 4GB RAM; 8GB is recommended.

Turn off Disk Defragmenter Schedule

Classic hard disk drives use a moving head that reads the data from the disk. If a file is not stored in sequential blocks on the disk, the head positioning time increases and this slows down the read process. Defragmentation makes sure that the blocks of each file are located in sequence on the disk. SSDs do not suffer from this problem since there is no head that reads the information. So you don’t need this feature.

If you defragment an SSD you will only drain write cycles from its life.

Windows 7 automatically detects SSD drives and turns off defragmentation, but it does not hurt to check that it has been stopped. On Windows 7 you can disable the Disk Defragmenter Schedule like this.

Move Temp Folders to RAM Drive

A RAM drive is a disk drive that looks like a normal drive to Windows but stores its data in RAM instead of on an SSD or HDD. Operations on a RAM drive are several times faster than operations on an SSD or HDD. The downside is that you lose the drive content when you power off your computer. But the Temp folder is used by Windows to store temporary files that are not needed after power off, so you don’t have to worry that you will lose something important.

Moving the Windows Temp folder to a RAM drive decreases the write operations on your SSD and prolongs its life. Here is how to move the Windows Temp folder to another location. The only thing you have to choose is the software that emulates your RAM drive, since Windows still does not have such a feature. For example, QSoft’s RAMDisk does a perfect job for me. You can download a free version that expires after 6 months (at which point you need to download a new free version, which will be active for another 6 months).

You can even move the Internet Temp folder, but this will slow down your browsing: the cached images and pages are lost when you power off the PC, so everything has to be downloaded again.

Turn off Windows Search

Windows Search is another feature that generates a lot of write operations due to the indexing process. SSDs are quite fast at reading data, so you don’t need indexes to speed up your search. You can turn it off like this on Windows 7 systems. Of course, if you are not satisfied you can always turn it back on.

Monitor the Overall Health of the Drive

Install an application that reports your drive health. A good choice is the free version of SSDLife, which can be downloaded here. The most useful metric is the approximate date when the drive is expected to fail.

image

 

Nice To Have

 

Hibernation

Hibernation becomes very tempting when combined with a fast disk drive. The downside is that this feature writes gigabytes of data (depending on your RAM size) every time the system hibernates. Bearing in mind that SSD drives are sensitive to the number of writes, it may quickly drain the life from your new SSD drive. And of course, after disabling it you get some more free space on your drive. Here is how to do it on Windows 7.

Disable System Restore

System Restore is helpful if you mess up your system, for example by installing the wrong device driver. However, this feature consumes a significant amount of free space on your SSD drive. If you are short of free space and you are confident that the changes you make to your system are safe, you can disable System Restore.

28 April 2010

Migrating ASP.NET MVC 1.0 to MVC 2.0: Real World Scenario

As most of you have noticed, ASP.NET MVC 2.0 was released last month. The new version introduces lots of cool features, so most existing MVC 1 applications will be upgraded to the new release. In this post I will share my experience of migrating an existing ASP.NET MVC 1.0 application to MVC 2.0.

Scenario: There is an existing ASP.NET MVC 1.0 web application built on top of .NET Framework 3.5, jQuery and Visual Studio 2008. The goal is to migrate the application to MVC 2, while keeping the other libraries and tools (.NET 3.5, VS2008, etc).

Identifying the breaking changes

The first step in the process is to check the list of ASP.NET MVC 2.0 breaking changes. After carefully evaluating each item in the list, I found that the following would be a problem: “JsonResult now responds only to HTTP POST requests”. The problem was caused by a jQuery plug-in that uses only HTTP GET. There were two possible solutions:

  • Replace the plug-in with a more configurable one that can use HTTP POST. You should do this if your application exposes sensitive information and is vulnerable to the attack described here.
  • Explicitly allow HTTP GET on JsonResults. You can do it by using JsonRequestBehavior.AllowGet:
    [AcceptVerbs(HttpVerbs.Get)]
    public JsonResult GetData()
    {
        //Some other code
        return Json(data, JsonRequestBehavior.AllowGet);
    }


I was lucky that the information exposed in my application was not sensitive, so I decided to use the JsonRequestBehavior.AllowGet.


Migrating the solution to build against ASP.NET MVC 2.0


Being confident that I had resolved all breaking changes, I had to think about migrating the solution to build against MVC 2.0. If you don’t want to do everything by hand, you should use this tool. The tool was built by a Microsoft employee and works GREAT. However, I noticed a few gotchas. The tool insists on backing up your project before the conversion. This seems redundant, because the source code usually lives in a code repository, and if the conversion somehow produces a mess everything can be restored from there. As a result I had to wait a few extra minutes for the backup (more than 1GB in my case). The second issue I found is the update of the jQuery files: the tool updates the jQuery and Microsoft AJAX libraries. Since the web site was relying on several other 3rd party jQuery plug-ins, I didn’t want the jQuery upgrade, so I had to remove the update manually. Despite the above, the tool is absolutely FANTASTIC and you will need it.


Running the application


Up to this point everything went pretty well and the solution compiled without problems, so I was ready to run it. After hitting F5 I was stunned. The application started to close and open pages by itself! With the help of a few unit tests and Google I was able to identify the source of the problem. It turns out that there is another, undocumented breaking change: a value in the TempData dictionary is removed at the end of the request in which it is read! You can read more here. The fix was relatively easy and soon everything was working as usual.
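
To make the TempData change easier to picture, here is a toy model of the behaviour described above, written in Python only for illustration. It is not the MVC implementation: the class and method names are invented, and it only mimics the rule that a value survives until the end of the request in which it is read.

```python
class TempDataModel:
    """Toy model: a value is kept until the request that reads it ends."""

    def __init__(self):
        self._values = {}
        self._read_keys = set()

    def __setitem__(self, key, value):
        self._values[key] = value
        self._read_keys.discard(key)  # a fresh write is kept again

    def __getitem__(self, key):
        self._read_keys.add(key)  # mark for removal at end of request
        return self._values[key]

    def __contains__(self, key):
        return key in self._values

    def end_request(self):
        """Drop every value that was read during the request just finished."""
        for key in self._read_keys:
            self._values.pop(key, None)
        self._read_keys.clear()


temp = TempDataModel()
temp["message"] = "Saved"
temp.end_request()          # request 1 wrote the value but did not read it
print("message" in temp)    # True: still available to the next request
_ = temp["message"]         # request 2 reads it
temp.end_request()
print("message" in temp)    # False: gone after the request that read it
```

Code that expects a value to still be there on a later request, after it has already been read once, is the kind of code that breaks under this rule.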


Bottom line


The migration process from ASP.NET MVC 1.0 to MVC 2.0 is relatively easy and you should do it. The new features are awesome.

26 March 2010

ASP.NET MVC Best Practices – Routes to Ignore

Routing is a key feature of ASP.NET MVC. If you are not familiar with this concept, take a look here, because I’m not going to explain it. Instead, I’m going to share some routes that you should ignore in your configuration.
So maybe you are wondering: why should I ignore some routes? The simple answer is that search engines, crawlers and bots will try to index your site, and they will request specific files. For example, Google will try to access the following file: http://yoursite.com/robots.txt. The robots.txt file is used to give instructions about your site to web robots. More info can be found here. So far everything sounds great, but if your site does not have a robots.txt file, the ASP.NET MVC framework will raise an exception upon a request like the one above. So the log where your unhandled exceptions are reported can be flooded with messages like:
System.Web.HttpException : A public action method 'robots.txt' could not be found
on controller 'YourPage.Controllers.YourController'.
Obviously, there are two ways to handle this: provide a dummy robots.txt file or instruct the ASP.NET MVC framework to ignore such URLs. I prefer the second approach. In this post I will try to summarize all the URLs that should be ignored by the ASP.NET application. So, here they are:
You need to ignore the routes above ONLY if your site does not use such a file. If you are not familiar with the sitemaps concept, here is a quick overview that answers most of the questions.
Here is the code, taken from the Global.asax.cs file, that ignores some of the URLs outlined above:
public static void RegisterRoutes(RouteCollection routes)
{
    routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

    routes.IgnoreRoute("robots.txt");
    routes.IgnoreRoute("sitemap");
    routes.IgnoreRoute("sitemap.gz");
    routes.IgnoreRoute("sitemap.xml");
    routes.IgnoreRoute("sitemap.xml.gz");
    routes.IgnoreRoute("google_sitemap.xml");
    routes.IgnoreRoute("google_sitemap.xml.gz");
    routes.IgnoreRoute("favicon.ico");

    //Rest of the code is omitted
}
I will update the list above as soon as I find a new URL that should be ignored, so stay tuned. You are also welcome to contribute to this list by posting a comment.

14 February 2010

Some SharePoint 2010 Articles

A few months ago I installed the beta version of SharePoint 2010. The product looks really cool, although some of the requirements, like “must have Windows 2008 64bit”, are really annoying.

In the past few weeks I have published some articles on a new site dedicated to SharePoint technologies called SharepointMonitor.com. If anybody is interested, here is the list:

I have plans for several other articles, so stay tuned. I will update this post when something new comes out.

17 January 2010

NHibernate: Display executed SQL at the bottom of an ASP.NET page (MVC or WebForms)

Almost every modern application that uses a relational database is built on top of an ORM tool. ORM tools can significantly decrease the effort of interacting with the database, but unfortunately most developers are not familiar with the SQL code that these tools produce. It is no secret that my favorite ORM is NHibernate. I have been using it since the early beta, but from time to time even I am surprised by the SQL it produces. So you need to monitor closely what SQL is executed. This blog post describes one not widely used but very useful approach: the executed SQL is displayed as a footer on any ASP.NET web page. It does not matter if the web page is using MVC or WebForms. Interested? Then continue reading.

Currently, there are several ways to display the SQL produced by NHibernate:

  • Database profiler tool – some modern RDBMS products come with a profiler tool that can hook into the database engine and display the executed SQL. For example, SQL Server comes with SQL Server Profiler. Unfortunately the SQL Server Express edition is missing this tool.
  • NHibernate Profiler – this is a third party tool that hooks into your application and monitors NHibernate. It comes with a cool WPF UI and lots of other features. Unfortunately, at the time of writing you need to purchase a license to use it. You also need to switch from your application to the tool after each action to see the produced SQL.
  • Log4Net – you can configure the log4net library to save the produced SQL, for example to a file. Unfortunately reading long files of SQL statements is tedious, and it is sometimes tricky to figure out which SQL was executed as a result of a single action.
  • Display the executed SQL at the bottom of an ASP.NET web page. The idea is to have an IHttpModule that injects the SQL at the end of each response. This approach saves time, because the SQL is visible immediately after the action is executed. Unfortunately this trick works only for web applications.

There are several ways to access the SQL that has been executed by NHibernate. I will use an NHibernate interceptor. The code for the interceptor looks like this:

using System.Text;
using NHibernate;
using NHibernate.Cfg;

public class NHibernateSQLMonitor : EmptyInterceptor
{
    //Registers the interceptor with the NHibernate configuration
    public static void Init(Configuration config)
    {
        config.SetInterceptor(new NHibernateSQLMonitor());
    }

    //Shared buffer that collects every intercepted SQL statement
    private static StringBuilder mExecutedSQL = new StringBuilder();

    public static string ExecutedSQL
    {
        get
        {
            return mExecutedSQL.ToString();
        }
    }

    public static void ClearExecutedSQL()
    {
        mExecutedSQL = new StringBuilder();
    }

    public override NHibernate.SqlCommand.SqlString OnPrepareStatement(
        NHibernate.SqlCommand.SqlString sql)
    {
        mExecutedSQL.AppendLine(sql.ToString());

        return base.OnPrepareStatement(sql);
    }
}



The important method is OnPrepareStatement. This method is invoked every time NHibernate prepares an SQL statement for execution. The sql parameter holds the SQL that will be executed.


Before using it, the interceptor must be registered. One possible way to do it is like this:

NHibernate.Cfg.Configuration cfg = new NHibernate.Cfg.Configuration();
NHibernateSQLMonitor.Init(cfg);

ISessionFactory sessionFactory = cfg.BuildSessionFactory();



Now everything is ready for our IHttpModule that will inject the executed SQL at the bottom of a web page. Here is the implementation:

using System;
using System.Web;

public class NHibernateSQLMonitorModule : IHttpModule
{
    public void Init(HttpApplication context)
    {
        context.PostRequestHandlerExecute += new EventHandler(
            PostRequestHandlerExecute);
    }

    void PostRequestHandlerExecute(object sender, EventArgs e)
    {
        HttpContext httpContext = ((HttpApplication)sender).Context;
        HttpResponse response = httpContext.Response;

        if (response.StatusCode == 302)
        {
            //Browser performs redirect. Do nothing.
            //The executed SQL will be shown on the next page
        }
        else if (response.ContentType == "text/html")
        {
            response.Write("<hr>");
            response.Write("<b>SQL Executed by NHibernate</b>");
            response.Write("<br>");

            string executedSQL = NHibernateSQLMonitor.ExecutedSQL.Replace(
                "\n", "<br>");
            response.Write(executedSQL);

            NHibernateSQLMonitor.ClearExecutedSQL();
        }
    }

    public void Dispose()
    {
        //Not required
    }
}



Now there is one final step: the HttpModule needs to be registered in the config file of the application. To do this, open the web.config file, find the <httpModules> section (<modules> under <system.webServer> if you are running IIS7 in integrated mode) and place the following line there:

<add name="NHibernateSQLMonitor"
type="Data.NHibernateSQLMonitorModule, Data"/>



Data is the assembly name where the NHibernateSQLMonitorModule class is defined. Now everything is ready and if there are no compilation errors you should see something like this:


image


After adding new record, the page will look like this:


image


So, what about improvements?

  • Instead of using an NHibernate interceptor to monitor the executed SQL, it is possible to display the output from log4net. The log4net output contains the SQL parameter values, which may be a great benefit. As you may have noticed, the values in the screenshot above are replaced by the ? character.
  • The current version uses a static variable to hold the executed SQL. If more than one user interacts with the application at the same time, their SQL gets mixed together and synchronization problems can occur. One way to address this is to extend the NHibernateSQLMonitor class to use the ASP.NET Session store, so that each user sees only the SQL executed as a result of their own actions.
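
The per-user idea above can be sketched independently of ASP.NET. The Python below is only an illustration under stated assumptions: SqlLog, record and drain are hypothetical names, and the session key stands in for whatever identifies the current user (the ASP.NET session, in the real application). Statements are grouped by that key and guarded by a lock, instead of being appended to one shared buffer.

```python
import threading
from collections import defaultdict


class SqlLog:
    """Collects executed SQL per session key instead of in one shared buffer."""

    def __init__(self):
        self._lock = threading.Lock()
        self._statements = defaultdict(list)

    def record(self, session_id: str, sql: str) -> None:
        """Remember one statement for the given session."""
        with self._lock:
            self._statements[session_id].append(sql)

    def drain(self, session_id: str) -> list:
        """Return and clear the statements recorded for one session."""
        with self._lock:
            return self._statements.pop(session_id, [])


log = SqlLog()
log.record("alice", "SELECT * FROM Orders WHERE Id = ?")
log.record("bob", "INSERT INTO Customers (Name) VALUES (?)")
print(log.drain("alice"))  # only the statement recorded for "alice"
```

Draining at the end of each request plays the role of ClearExecutedSQL, but it touches only the current user's statements.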


P.S. This post was inspired a long time ago by Steve Sanderson’s Pro ASP.NET MVC Framework. He has a similar application that connects to Linq to SQL and displays the executed SQL.

09 December 2009

eBay Architecture

Recently I was curious to find out how big sites like eBay are built and run. After some quick research I was able to find some quite interesting articles.

I was surprised to find out that eBay doesn't use transactions. As well they do not use foreign keys in their database (the same thing as the SharePoint_AdminContent database). It looks like this is the latest trend ;-)

If you are curious like me, take a look at the following:

Randy Shoup on eBay's Architectural Principles
eBay Architectural Strategies, Patterns, and Forces
Software Engineering Radio: eBay's Architecture Principles with Randy Shoup
Scalability Best Practices: Lessons from eBay
Dan Pritchett on Architecture at eBay
You Scaled Your What?
High Scalability: eBay Architecture

24 August 2009

jQuery FullCalendar and ASP.NET MVC

Recently I had to integrate jQuery FullCalendar into an ASP.NET MVC application. So far I have not been able to find such an example, so I will try to provide one. I assume that you are familiar with ASP.NET MVC, jQuery and the FullCalendar component, so I’m not going to introduce each technology.
I will use the default ASP.NET MVC Web Application template to create a new project. Note that jQuery is included by default, so there is no need to download and reference it.
The first step is to place the FullCalendar JavaScript and style-sheet files inside the project’s Scripts and Content directories, as shown in the picture.
image
Then we need to reference the files we just added to our project. I will do it by placing the following lines of code inside the <head> section of the Site.Master file:
<head runat="server">
   <title><asp:ContentPlaceHolder ID="TitleContent" runat="server" /></title>
   <link href="../../Content/Site.css" rel="stylesheet" type="text/css" />
   <link href="../../Content/fullcalendar.css" rel="stylesheet" type="text/css" />
   <script src="../../Scripts/jquery-1.3.2.js" type="text/javascript"></script>
   <script src="../../Scripts/fullcalendar.js" type="text/javascript"></script>
</head>
Now we are ready to use the calendar routines inside our views. How to do it? Just create a div tag and render the calendar’s HTML code inside. Here is how to modify Home’s Index.aspx:
<asp:Content ID="indexContent" ContentPlaceHolderID="MainContent" runat="server">

   <script type="text/javascript">
       $(document).ready(function() {
           $('#calendar').fullCalendar({
               events: "/Home/CalendarData"
           });
       });  
   </script>

   <div id="calendar">
   </div>
</asp:Content>
The code above will render the calendar’s HTML inside a div with id=”calendar”. The calendar data will be delivered by invoking the following URL: /Home/CalendarData. This corresponds to the CalendarData method of the Home controller. This method is supposed to return the data in JSON format. Here is a sample implementation:
    [HandleError]
   public class HomeController : Controller
   {
       public ActionResult CalendarData()
       {
           IList<CalendarDTO> tasksList = new List<CalendarDTO>();

           tasksList.Add(new CalendarDTO
           {
               id = 1,
               title = "Google search",
               start = ToUnixTimespan(DateTime.Now),
               end = ToUnixTimespan(DateTime.Now.AddHours(4)),
               //Absolute URL, otherwise the browser treats it as relative
               url = "http://www.google.com"
           });
           tasksList.Add(new CalendarDTO
           {
               //Each event gets its own id
               id = 2,
               title = "Bing search",
               start = ToUnixTimespan(DateTime.Now.AddDays(1)),
               end = ToUnixTimespan(DateTime.Now.AddDays(1).AddHours(4)),
               url = "http://www.bing.com"
           });

           return Json(tasksList);
       }

       private long ToUnixTimespan(DateTime date)
       {
           TimeSpan tspan = date.ToUniversalTime().Subtract(
    new DateTime(1970, 1, 1, 0, 0, 0));

           return (long)Math.Truncate(tspan.TotalSeconds);
       }

       public ActionResult Index()
       {
           return View();
       }

       public ActionResult About()
       {
           return View();
       }
   }
The code above creates two calendar entries called “Google search” and “Bing search”. Everything should be pretty simple, except the stuff around ToUnixTimespan routine.
There is a well known problem with the serialization of dates in JSON format. There is no strict standard, so there are several approaches to this problem. For example, take a look here. The implementation adopted by Microsoft was not recognized by FullCalendar, so I had to introduce the ToUnixTimespan routine. Basically, this routine returns the number of seconds elapsed since 1 January 1970 (UTC).
Because of the above, you should notice that the start and end dates are represented as long values:
    public class CalendarDTO
   {
       public int id { get; set; }
       public string title { get; set; }
       public long start { get; set; }
       public long end { get; set; }
       public string url { get; set; }
   }
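For readers who want to check the timestamp arithmetic outside .NET, here is a minimal Python sketch of the same conversion. The function name and the sample date are chosen for this example only; the dictionary keys mirror the CalendarDTO properties above.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: midnight on 1 January 1970, UTC.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_seconds(moment: datetime) -> int:
    """Whole seconds between the Unix epoch and the given moment."""
    delta = moment.astimezone(timezone.utc) - UNIX_EPOCH
    return int(delta.total_seconds())  # truncates any fractional second


# Arbitrary sample date, used only to build one calendar entry.
start = datetime(2009, 8, 24, 12, 0, 0, tzinfo=timezone.utc)
event = {
    "id": 1,
    "title": "Google search",
    "start": to_unix_seconds(start),
    "end": to_unix_seconds(start + timedelta(hours=4)),
}
print(event)
```

As with ToUniversalTime in the C# version, the value is converted to UTC before the epoch is subtracted, so the result does not depend on the server's time zone.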
If you have done everything correctly, the final result will be:
image

Enjoy!

Edit: Due to the high interest I have published the source code of this post here. Please note that Visual Studio 2008 was used, so you need to convert the project if you are using a more recent version.

21 June 2007

Anti-development methodologies

The software industry is growing constantly and development methodologies are following the same trend. The Internet is full of shiny articles describing each methodology and all its benefits. Most of the people I talk to are bored when they hear stuff like this. If you are like them, don't miss this article. A friend of mine just sent me the link. Also, check out the comments. All of them are about dummy development practices. Some of the titles are cynical, but there is a lot of truth in each line.
Enjoy!