Tuesday, May 25, 2010

Generate Mapping against target Step

0 comments
In cases where you have a fixed target table, you will want to map the fields from the stream to their corresponding fields in the target output table. This is normally  accomplished using a Select Values step in your transformation. The 'Generate mapping against target' option provides you with an easy to use dialog for defining these mappings that will automatically create the resulting Select Values step that can be dropped into your transformation flow prior to the table output step. To...

Pentaho Spoon Installation

1 comments
The first step is the installation of Sun Microsystems Java Runtime Environment version 1.4 or higher. You can download a JRE After this, you can simply unzip the zipfile: Kettle3.0.0. zip in a directory of your choice. In the Kettle directory where you unzipped the file, you will find a number of files. Under Unixlike environments (Solaris, Linux, MacOS, …) you will need to make the shell scripts executable. Execute these commands to make all shell scripts in the Kettle directory executable: cd...

Friday, May 21, 2010

Group by SQL Query

0 comments
To extract Country wise sales count for the period  MAY-2010 use below code. Select c.country_name, count(s.id) from tbl_sales s inner join tbl_country c on s.country_id=c.country_id where s.saels_date between '2010-05-01 00:00:00' and '2010-05-31 23:59:59' group by c.coutnry_name...

Sunday, May 16, 2010

How to Insert Extracted Values in Pentaho Spoon

0 comments
To insert the extracted records  from the transactional source database into the destination or data warehouse  following is simple way to do it. Open the Pentaho Spoon go to file menu and open new Transformation. in the design mode drill down the Input Folder and drag & drop the Table Input Step, double click the Table Input  and set the source database (How to Set Connection Parameters in Pentaho Spoon)connection parameters this will extract all required fields Again...

Loading XML Data Using Pentaho ETL Tool

0 comments
To load the data from a XML file using pentaho kettle do the followings.  Go to the design mode, drill down the input folder and drag & drop the XML Input Step. Double click the XML Input Step as shown in the image below in the first tab of file click the Brows Button and add your desire XML File by clicking Add Button this will add your file(s) inside the selected files Grid under the File /Directory Grid Label. Below this Grid there is a Show File(s) options that allow you to view your...

SQL Query For Monthly Calculation

0 comments
To get individual library member's monthly status report use the following column level query that includes how many books have been issued and returned in the month of may 2010 for those members who have their registration date in the same month here is the query select m.member_id as "Member ID",m.member_name as "Member Name"         ,(select count(book_id)          from tbl_book_issue b        ...

Saturday, May 15, 2010

Concatenation in Pentaho Kettle

0 comments
To get the  concatenated resulted values of the two variables do the followings. in the design mode drill down the Scripting folder drag and drop the  Modified Java Script Value step Double click the icon  and go the java script value Text area, copy the below code and past it into the Script Value Text Area var fullName =  FirstName+ LastName; here in this code of java “var” is the reserved word for declaring String variables while “fullName is  the name of variable that...

How to Split a String in pentaho Kettle

0 comments
To get values from a comma separated string we can spilt it in following way.In the design mode drill down the Scripting folder drag and drop the  Modified Java Script Value step Double click the the icon  and go the java script value Text area, copy the below code and past it into the Script Value Text Area var employee = EmployeeInfo.getString().split("\\,"); var ID = employee [0]; var  Name= employee [1]; var Address = employee [2]; here...

Thursday, May 13, 2010

Date to String Conversion in Pentaho Kettle

0 comments
Here is the simple conversion from Date to String in the spoonGo the design mode select Modified Java Script Step lying under the Scripting folder drag & drop it. Double click the Java Modified Script  Step and write the following code in the scriptVale var dateToStringConversion = date2str(datefield_name, "yyyyMMdd");where dateToStringConversion  is the name of the variable and datefield_name is the field name to whom you want to convert which is coming from the previous st...

How to get Distinct Count in Pentaho

0 comments
In order to get Distinct Count in the Kettel spoon Go to the design mode drill down the Transformation folder and Drag & drop the  Unique Rows Step this step remove the double and repeated Rows and give only unique occurrences but this works only on sorted input so you need to set a Sorted Rows Step available under the Transformation Folder jus drag & drop it  and make the hope between the two stepsIf the input is not sorted, then only double consecutive rows are handled correctly....

How to Write ETL Script in Pentaho

0 comments
In this posting i m going to exemplify for the beginners that how to write a simple script in the Pentaho Kettle Spoon let say extracting the data from the source or transactional database or from the one data source to destination or target or  another database, here is the example steps. Extract Customer information from Customer_database from MYSQLTransform and Load The extracted data in SQL SERVER Customer_database TableOpen the Pentaho Spoon Go to the file menu and select Transformation...

Wednesday, May 12, 2010

Check Table Column Existence in Pentaho

0 comments
In order to find out the existence of a specific column in a database use the Check if Column Exists Step available under the Lookup Folder in the design mode   Drag & Drop the above icon. Double click the step and name it , defined the connection for MYSQL or SQL SERVER etc. Table Name: The name of the column to whom the column belongs. Table name in  Field: Enabled it if you want to read the name of the table from the input field. Tablename Field:Set fields here that...

Performing Calculations in Pentaho Spoon

0 comments
To perform simple calculations and get new fields on different output fields extracting from the database table on constant values. Go to Transformation in the design mode Drill Down the Transformation folder and drag & drop the Calculator    Double Click the Calculator Step a pop window same as below will appear In the New Field  column name your new Field Select the type of calculation from the list box on the fields you want to perform calculations. Field A, Field B and...

Aggregate Functions in Pentaho Spoon

0 comments
To Calculate the Aggregate Functions likewise Sum(), Average(), Count(), MIn(), Max(), First() and Last() usually on the data coming and reading from text files do the followings Drill Down the Input in the design mode and select Text Input File Double Click the Text File Input in the File tab and brows your file add it  to the selected path In the Content tab select File Type Go to Fields tab and click the Get Fields , it will extract all the fields along...

What is Checksum in Pentaho Spoon

0 comments
By definition checksum is a way to determine the integrity of data in which both the sender and receiver can test the data integrity weather it is correct or not, the sender calculates the checksum value  by taking the sum of binary data transmitted on the other end the receiver can perform the same calculation on the data and compare it with the checksum value provided by the sender. If the two values match, the receiver has surety that the data was received correctly.Pentaho Spoon provide...

Tuesday, May 11, 2010

Hex to integer Conversion in Kettle

0 comments
To convert your Hex value to integer in spoon do the followings Go to the design mode.Drill Down the Scripting step.Drag and drop the Modified Java Script Value.Double Click the Modified Java Script Value and drill down the Transformation Script available at the left hand side of the Script Mod, double click the ScriptValuein the Script Value paste the following linesjava; var nummerke = Packages.java.lang.Integer.valueOf(Hex.getString(), 16).intValue();here Hex is your input value coming...

How to Set Environment Variables in Kettle

0 comments
To set Environment Variables go to Edit menu and click Set Environment Variables or press CTRL+ALT+J, in the name column name your variable and assigned it to any value by default the variable is of String type. You can later in your job or transformation access these variable using GET VARIABLE STEP but it is recommended not to use Environment Variables if your working in multiple or more than two Transformations  or jobs as these variable have global scope and  Changes to the environment...

Monday, May 10, 2010

What is Get Variable in Pentaho Kettle

0 comments
The Get variable step is used to get access your declared variable’s values, whenever you need a variable value you have declared earlier in your script using Set Variable step what you need to do is drag and drop a Get Variable step and do the followings. Go to the Job Folder drag & drop a Get Variable Step Nam the value of accessing variable that you can use in other steps in the second column give the same name of the required variable as you have declared it  in Set Variable Step...

What is Set Variable in Pentaho Kettle

0 comments
  The Set Variable is used to set or declare variables to hold values usually database fields likewise you want to set start_date value which you have extracted from your time_dimension table using INPUT Step so that this start_date field will be accessible through out your script. in the upper text filed named the step and then click the get fields button to have all desired table fields in the field name column that you have extracted using input step, in the second step name the variable...

Sunday, May 9, 2010

Storing Transformations and Jobs in local Files System

0 comments
Spoon provides you with the ability to store transformation and job files to the local file system or in the Kettle repository. The Kettle repository can be housed in any common relational database. This means that in order to load a transformation from a database repository, you need to connect to this repository. To do this, you need to define a database connection to this repository. You can do this using the repositories dialog you are presented with when you start up Spoon bat file you will...

What is Pentaho

0 comments
The Pentaho BI Project is open source application software for enterprise reporting, analysis, dashboard, data mining, workflow and ETL capabilities for business intelligence needs. What is SpoonKettle is an acronym for “Kettle E.T.T.L. Environment”. This means it has been designed to help you with your ETTL needs: the Extraction, Transformation, Transportation and Loading of data. Spoon is a graphical user interface that allows you to design transformations and jobs that can be run with the Kettle...

ETL Scripting with Open Source ETL Tools

2 comments
Extract Transform Load (ETL)Extracting, transforming, and loading (ETL) is a process used in database especially in data warehousing and have followin stepsExtracting data from sources (The transactinal Database)Transforming it to fit operational needs (which can include quality levels)Loading in destination (database or data warehouse or staging where you practice about data) Extraction:The first part of an ETL process involves extracting the data from the source systems usually...

Saturday, May 8, 2010

SELECT AS CLAUSE

0 comments
In SQL Query you can define column heading jus write heading near to field name with out any space but if you want to make space in column heading then use AS clause like wise Column Heading with out AS Clause SELECT LASTNAME, SALARY TotalSalary FROM EMP WHERE WORKDEPT = 'C01' ORDER BY 2 DESC Column Heading with space SELECT LASTNAME, SALARY + BONUS + COMM AS TOTAL EANINGS FROM EMP WHERE WORKDEPT = 'C01' ORDER BY TOTAL_EANINGS D...

Calculations in SQL Query

0 comments
What are total earnings for sales reps? SELECT LASTNAME, SALARY + BONUS + COMM FROM EMP WHERE JOB = 'FIELDREP' Which operators have total earnings less then $20,000? SELECT LASTNAME, SALARY + BONUS + COMM FROM EMP WHERE JOB = 'OPERATOR' AND SALARY + BONUS + COMM < 20...

SQL Distinct Unique Records

0 comments
The Below Query will select all rows this will also include repetative or redundent rows i..e un necessary or duplicated rows  i.e. UNIQUE Records SELECT WORKDEPT FROM EMP Distinct rows: This is will extract all the rows but will remove the duplicated rows 1.SELECT DISTINCT WORKDEPT FROM EMP 2.SELECT DISTINCT WORKDEPT, JOB FROM EMP Distinct Count One person can send many feedack / comments against this post so count the unique no of person who posted their comments                                 ...

SQL IN, NOT IN and NOT BETWEEN Clauses

0 comments
IN Cluase The IN clause onlty extract those values that match or exist in the database against the values which are being given in the IN clause but it does not  diplay the values that do not meet the condition not even with a null values that is why some time we need to make a temporary table to stroe these values and then they are extracted using left joins so that all the values should be visible. 1. SELECT EMPNO, LASTNAME, SALARY FROM EMP WHERE WORKDEPT IN ('D11', 'D21',...

SQL Between Clause

0 comments
The BETWEEN expression  with AND specified the range and tells the database that to select desired records for this range like wise to find out all those transactions that have been done in the month of may-2010. Example Select c.customer_name,i.invoice_no,i.invoice_Date from sales s inner join customer c on s.customer_id=c.customer_id where s.sales_date between '2010-05-01 00:00:00' and '2010-05-31 23:59:5...

SQL Multiple Conditional Queries

0 comments
Which woman designers have an annual salary over $22,000? Exampes 1.SELECT LASTNAME, SALARY FROM EMP WHERE SALARY > 22000 AND JOB = 'DESIGNER' AND SEX = 'F' Who makes more than $40,000 or is a manager? 2.SELECT LASTNAME, FIRSTNAME, SALARY, JOB FROM EMP WHERE SALARY > 40000 OR JOB = 'MANAGER' 3.Which Filed Rep makes more than $2,000 in commissions or who has an education lever higher then 18? SELECT LASTNAME, JOB, COMM FROM EMP WHERE (COMM > 2000 AND JOB = 'FIELDREP') OR EDLEVEL >...

SQL NULL Conditional Queries

0 comments
A special value that represents absence of information. Examples:  1. SELECT DEPTNO, DEPTNAME, MGRNO FROM DEPT WHERE MGRNO IS NULL SQL Server Select top ten emplyees who do not have any manager in SQLSERVER select top ten emplyees who do not have any manager in mysql2. SELECT TOP 10 * FRO M EMPLOYEE WHERE MANGAER IS NOT NULL                             MYSQL                            Select...

SQL Comparison Operators

0 comments
= Equal <> Not Equal > Greater than >= Greater than or equal < Less than <= Greater than or equal Examples: SELECT EMPNO, LASTNAME, SALARY FROM EMP WHERE SALARY >= 20000 SELECT LASTNAME, FIRSTNAME, JOB                                     FROM EMP                                    ...

Saturday, May 1, 2010

Candidate Key

0 comments
In the relational model of databases, a candidate key of a relation is a minimal superkey for that relation; that is, a set of attributes such that 1. the relation does not have two distinct tuples with the same values for these attributes (which means that the set of attributes is a superkey) 2.there is no proper subset of these attributes for which (1) holds (which means that the set is minimal). Since a relation contains no duplicate tuples, the set of all its attributes is a superkey...

Super Key

1 comments
A superkey is defined in the relational model of database organization as a set of attributes of a relation variable (relvar) for which it holds that in all relations assigned to that variable there are no two distinct tuples (rows) that have the same values for the attributes in this set. Equivalently a superkey can also be defined as a set of attributes of a relvar upon which all attributes of the relvar are functionally dependent. A superkey is a set of columns within a table whose values...

Composite & Compound keys

0 comments
In database design, a compound key is a key that consists of 2 or more attributes that uniquely identify an entity occurrence. Each attribute that makes up the compound key is a simple key in its own right. This is often confused with a composite key whereby even though this is also a key that consists of 2 or more attributes that uniquely identify an entity occurrence, at least one attribute that makes up the composite key is not a simple key in its own right. An example might be an entity that...

Foreign key

0 comments
In the context of relational databases, a foreign key is a referential constraint between two tables.[1] The foreign key identifies a column or a set of columns in one (referencing) table that refers to set of columns in another (referenced) table. The columns in the referencing table must be the primary key or other candidate key in the referenced table. The values in one row of the referencing columns must occur in a single row in the referenced table. Thus, a row in the referencing table cannot...

Alternate or Secondary Key

0 comments
With respect to relational databases, an alternate key (or secondary key) is any candidate key which is not selected to be the primary key (PK). For example, a relational database with a table "employee" could have attributes like "employee_id", "national_insurance_number", and so on. In this case, both "employee_id" and "national_insurance_number" serve as unique identifiers for a given employee, and could thus arguably be used for a primary key. Hence, both of them are called "candidate keys"....

Surrogate Key

0 comments
A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data.Two definitions of a surrogate appear in the literature:Surrogate (1) This definition is based on that given by Hall, Owlett and Todd (1976). Here a surrogate represents an entity in the outside world. The surrogate is internally generated by the system but is nevertheless visible to the user or application. Surrogate...