Thursday, May 13, 2010

Date to String Conversion in Pentaho Kettle

0 comments
Here is the simple conversion from Date to String in the spoon
Go the design mode select Modified Java Script Step lying under the Scripting folder drag & drop it.
image
Double click the Java Modified Script  Step and write the following code in the scriptVale
var dateToStringConversion = date2str(datefield_name, "yyyyMMdd");
where dateToStringConversion  is the name of the variable and datefield_name is the field name to whom you want to convert which is coming from the previous step

How to get Distinct Count in Pentaho

0 comments
In order to get Distinct Count in the Kettel spoon
Go to the design mode drill down the Transformation folder and Drag & drop the  Unique Rows Step
image
this step remove the double and repeated Rows and give only unique occurrences but this works only on sorted input so you need to set a Sorted Rows Step available under the Transformation Folder jus drag & drop it  and make the hope between the two steps
image
If the input is not sorted, then only double consecutive rows are handled correctly.

How to Write ETL Script in Pentaho

0 comments
In this posting i m going to exemplify for the beginners that how to write a simple script in the Pentaho Kettle Spoon let say extracting the data from the source or transactional database or from the one data source to destination or target or  another database, here is the example steps.
Extract Customer information from Customer_database from MYSQL
Transform and Load The extracted data in SQL SERVER Customer_database Table
Open the Pentaho Spoon
Go to the file menu and select Transformation or press Ctr+N
Go to the design mode and select Table Input from the Input folder and the Table Output Steps from  the Output folder respectively
Press Shift button and drag from Table input to Table Out Put called a hope having green color that showing all the values shifting to another step.
image
Extracting required fields from the source table representing here in this example as Table Input from MYSQL and loading all the fields in the destination database table representing here as Table output in SQLSERVER.
Double click the each step to set source and destination connections as shown below image name the connection, if working on the local system name local host else IP
select MYSQL from the connection type set ,user id and password followed by the database name and test the connection.
image
after defining the connection write the query in the SQL text area and double click the Table Output set the connection same as defined above here you also have to define the target table that will hold the extracted data as can view in the below image.
image

Wednesday, May 12, 2010

Check Table Column Existence in Pentaho

0 comments
In order to find out the existence of a specific column in a database use the Check if Column Exists Step available under the Lookup Folder in the design mode  
image
  1. Drag & Drop the above icon.
  2. image
  3. Double click the step and name it , defined the connection for MYSQL or SQL SERVER etc.
  4. Table Name: The name of the column to whom the column belongs.
  5. Table name in  Field: Enabled it if you want to read the name of the table from the input field.
  6. Tablename Field:Set fields here that have parameters and its type 
  7. Columnname Field:Set the name of the column field reading from the input field.
  8. Result fieldname:Resulting Boolean flag  that shows weather it exist or not.



Performing Calculations in Pentaho Spoon

0 comments
To perform simple calculations and get new fields on different output fields extracting from the database table on constant values.
  1. Go to Transformation in the design mode
  2. Drill Down the Transformation folder and drag & drop the Calculator
  3.    image
  4. Double Click the Calculator Step a pop window same as below will appear
  5. image
  6. In the New Field  column name your new Field
  7. Select the type of calculation from the list box on the fields you want to perform calculations.
  8. Field A, Field B and Field C combos are available to select your fields that are need to be performed calculations
  9. Select the value type from the combo box and set decimal precision.

Aggregate Functions in Pentaho Spoon

0 comments
To Calculate the Aggregate Functions likewise Sum(), Average(), Count(), MIn(), Max(), First() and Last() usually on the data coming and reading from text files do the followings

  1. Drill Down the Input in the design mode and select Text Input File

  2. image

  3. Double Click the Text File Input in the File tab and brows your file add it  to the selected path

  4. In the Content tab select File Type

  5. Go to Fields tab and click the Get Fields , it will extract all the fields along with data their data types, reading each coma separated field

  6. Go to the design mode again find Deprecated the 3rd last folder drill down it and drag & drop the Aggregate Rows Step as the same icon  mentioned below.

  7. image

  8. Double click the Aggregate Rows a pop window will open click Get Fields it will extract all the fields as it has read from the Text Input File, set the name for each field in the New Name column and finally select the aggregate function from the combo box.
image

What is Checksum in Pentaho Spoon

0 comments
By definition checksum is a way to determine the integrity of data in which both the sender and receiver can test the data integrity weather it is correct or not, the sender calculates the checksum value  by taking the sum of binary data transmitted on the other end the receiver can perform the same calculation on the data and compare it with the checksum value provided by the sender. If the two values match, the receiver has surety that the data was received correctly.
Pentaho Spoon provide the Add a Checksum Step  lying under the  Transformation Tab.


  1. Go to Design
  2. Drill Down Transformation

  3. Drag & Drop Add a Checksum

  4. image

  5. Name the Step

  6. Define the Checksum Algorithm most used Algorithm for  Checksum is MD-5

  7. Click the Get Fields Button, this will extract and show all the fields coming from the previous step at which your going to apply  checksum might be extracting using Input Step from the database table or Stored Procedure
image

Tuesday, May 11, 2010

Hex to integer Conversion in Kettle

0 comments
To convert your Hex value to integer in spoon do the followings
Go to the design mode.
Drill Down the Scripting step.
Drag and drop the Modified Java Script Value.
image
Double Click the Modified Java Script Value and drill down the Transformation Script available at the left hand side of the Script Mod, double click the ScriptValue
in the Script Value paste the following lines
java;
var nummerke = Packages.java.lang.Integer.valueOf(Hex.getString(), 16).intValue();
here Hex is your input value coming from another step, at the bottom of a Script Mod click the Get fields Button now you have get the converted variable with its data type showing in a grid, finally test the script.
image

How to Set Environment Variables in Kettle

0 comments
To set Environment Variables go to Edit menu and click Set Environment Variables or press CTRL+ALT+J, in the name column name your variable and assigned it to any value by default the variable is of String type. You can later in your job or transformation access these variable using GET VARIABLE STEP but it is recommended not to use Environment Variables if your working in multiple or more than two Transformations  or jobs as these variable have global scope and  Changes to the environment variables would conflict in Transformations or jobs for this reason Kettle variables were introduced to provide a way to define variables that are local to the job in which the variable is set. The Set Variable step in a transformation allows you to specify in which job you want to set the variable's scope (i.e. parent job, grandparent job or the root job).
image 
in the same tab of Edit menu you can see your all Environment Variables using Show Environment Variables option, here you can also up and down your Variable on clicking name column header.

Monday, May 10, 2010

What is Get Variable in Pentaho Kettle

0 comments
image
The Get variable step is used to get access your declared variable’s values, whenever you need a variable value you have declared earlier in your script using Set Variable step what you need to do is drag and drop a Get Variable step and do the followings.
image
  1. Go to the Job Folder drag & drop a Get Variable Step
  2. Nam the value of accessing variable that you can use in other steps
  3. in the second column give the same name of the required variable as you have declared it  in Set Variable Step in below format
  4. ${variable} or %%variable%%
Example:
variable value               ${my_variable as declared in Set Varible}