TOP 100 Data Warehousing Interview Questions and Answers pdf fresher and experienced ~ all interview questions answers pdf

Read the most frequently asked 100 top Data Warehousing interview questions and answers for freshers and experienced.

Data Warehousing Interview Questions and Answers PDF Experienced Freshers

1 What is Meta data?
Metadata is data about data. E.g. if in data mart we are receiving any file. Then metadata will contain information like how many columns, file is fix width/limited, ordering of fields, data types of field etc.

2 Briefly state different between data ware house & data mart?
Data warehouse is made up of many datamarts. DWH contain many subject areas. However, data mart focuses on one subject area generally. E.g. If there will be DHW of bank then there can be one data mart for accounts, one for Loans etc. This is high-level definitions.

3 What is galaxy schema?
Galaxy schema is also known as fact constellation scheme. It requires no of fact tables to share dimension tables. In data, wares housing mainly the people are using the conceptual hierarchy.

4 Suppose you are filtering the rows using a filter transformation only the rows meet the condition pass to the target. Tell me where the rows will go that does not meet the condition.
Informatica filter transformation default value is 1 i.e. true. If you place a break point on filter transformation and run the mapping in a debugger mode, you will find these values 1 or 0 for each row passing through filter. If you change 0 to 1, the particular row will be passed to next stage.

5 After we create a SCD table, can we use that particular Dimension as a dimension table for Star Schema?
Yes.

6 What is Core Dimension?
Core Dimension is a Dimension table, which is used dedicated for single fact table or Datamart. Conform Dimension is a Dimension table which is used across fact tables or Data marts.

7 How much data hold in one universe.
Universe does not hold any data. However, practically the universe is known to have issues when the objects cross 6000.

8 Can any one explain about Core Dimension, Balanced Dimension, and Dirty Dimension?
Dirty Dimension is nothing but Junk Dimensions. Core Dimensions are dedicated for a fact table or Data mart. Conformed Dimensions are used across fact tables or Data marts.

9 Can any one explain the Hierarchies level Data warehousing.
In Data warehousing, levels are columns available in dimension table. Levels are having attributes. Hierarchies are used for navigational purpose; there are two types of Hierarchies. You can define hierarchies in top down or bottom up.
1. Natural Hierarchy: Best example is Time Dimension - Year, Month, Day etc. In natural Hierarchy definite relationship exists between each level
2. Navigational Hierarchy: You can have levels like
Ex - Production cost of Product, Sales Cost of Product.
Ex - Lead Time defined to procure, Actual Procurement time,
In this, two levels need not to have relationship. This Hierarchy is created for navigational purpose.

10 What is data cleaning? How can we do that?
Data cleaning is a self-explanatory term. Most of the data warehouses in the world source data from multiple systems - systems that were created long before data warehousing was well understood, and hence without the vision to consolidate the same in a single repository of information. In such a scenario, the possibilities of the following are there:
? Missing information for a column from one of the data sources;
? Inconsistent information among different data sources;
? Orphan records;
? Outlier data points;
? Different data types for the same information among various data sources, leading to improper conversion;
? Data breaching business rules

In order to ensure that the data warehouse is not infected by any of these discrepancies, it is important to cleanse the data using a set of business rules, before it makes its way into the data warehouse.

11 What is dimension modeling?
A logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. There are different data modeling concepts like ER Modeling (Entity Relationship modeling), DM (Dimensional modeling), Hierarchal Modeling, Network modeling. However, popular are ER and DM only.

12 Where the cache files stored?
Caches are stored in Repository.

13 How can you import tables from a database?
In Business Objects Universe Designer you can open Table Browser and select the tables needed then insert them to designer.

14 What is drilling across?
Drill across corresponds to switching from 1 classification in 1 dimension to a different classification in different dimension.

15 How Many different schemas or DW Models can be used in Siebel Analytics. I know Only STAR and SNOW FLAKE and any other model that can be used?
Integrated schema design is also used to define an integrated schema design we have to define the following concepts
? Fact constellation
? Act less fact table
? Onformed dimension
A: Â A fact constellation is the process of joining two or more fact tables
B: A fact table with out any facts is known as fact less fact table
C:A dimension which is re useful and fixed is known as conformed dimensionA dimension, which is, shared with multiple fact tables known as conformed dimension

16 What is an error log table in Informatica occurs and how to maintain it in mapping?
Error Log in Informatica is a one of output file created by Informatica Server while running the session for error messages. It is created in Informatica home directory.

17 What is loop in Data warehousing?
In DWH loops may exist between the tables. If loops exist, then query generation will take more time, because more than one path is available. It creates ambiguity also. Loops can be avoided by creating aliases of the table or by context.
Example: 4 Tables - Customer, Product, Time, Cost forming a close loop. Create alias for the cost to avoid loop.

18 How many clustered indexes can u create for a table in DWH? In case of truncate and delete command what happens to table, which has unique id.
You can have only one clustered index per table. If you use delete command, you can rollback... it fills your redo log files.
If you do not want records, you may use truncate command, which will be faster and does not fill your redo log file.

19 What is hybrid slowly changing dimension?
Hybrid SCDs are combination of both SCD 1 and SCD 2.It may happen that in a table, some columns are important and we need to track changes for them i.e. capture the historical data for them whereas in some columns even if the data changes, we don't care.For such tables we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.You can add that it is not an intelligent key but similar to a sequence number and tied to a timestamp typically!

20 Can a dimension table contain numeric values?
Yes. However, those data type will be char (only the values can numeric/char).Yes, dimensions even contain numerical because these are descriptive elements of our business.

21 How do you create Surrogate Key using Ab Initio?
There are many ways to create Surrogate key but it depends on your business logic. Here you can try these ways.
1. Use next in sequence () function in your transform
2. Use Assign key values component (if your GDE is higher than 1.10)
3. Write a stored proc to this and call this store proc wherever you need.Yes, dimension table contains numeric but not contain measures and facts

22 What is the difference between star and snowflake schemas?
Star schema:
A single fact table with N number of DimensionSnowflake schema: Any dimensions with extended dimensions are known as snowflake schema.

23 What is a CUBE in data warehousing concept?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

24 What is the difference between Snowflake and Star Schema? What are situations where Snowflake Schema is better than Star Schema when the opposite is true?
Star schema contains the dimension tables mapped around one or more fact tables.It is a renormalized model and no need to use complicated joins. Also Queries results fast.Snowflake schema is the normalized form of Star schema. It contains in-depth joins, because the tables are spited in to many pieces. We can easily do modification directly in the tables.We have to use complicated joins, since we have more tables. There will be some delay in processing the Query.

25 What is ER Diagram?
The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram, which is used to visually represent data objects. Since Chen wrote his paper the model has been extended and today it is commonly used for database design for the database designer, the utility of the ER model is: it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables. It is simple and easy to understand with a minimum of training. Therefore, the database designer to communicate the design to the end user can use the model. In addition, the model can be used as a design plan by the database developer to implement a data model in specific database management software.

26 What is degenerate dimension table?
Degenerate Dimensions: If a table contains the values, which r neither dimension nor measures is called degenerate dimensions. For example invoice id, employee no.A degenerate dimension is data that is dimensional in nature but stored in a fact table.

27 What is VLDB?
The perception of what constitutes a VLDB continues to grow. A one-terabyte database would normally be considered VLDB.Degenerate dimension: it does not have any link with dimensions and it will not have any attribute.
28 What is Dimensional Modeling?
Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.Dimension modeling is a method for designing data warehouse. Three types of modeling are there
1. Conceptual modeling
2. Logical modeling
3. Physical modeling

29 What are the various ETL tools in the Market?
Various ETL tools used in market are Informatica Data Stage Oracle Warehouse Builder Ab Initio Data Junction

30 What are the possible data marts in Retail sales?
Product information and sales information

31 What is meant by metadata in context of a Data warehouse and how it is important?
Metadata is the data about data; Business Analyst or data modeler usually capture information about data - the source (where and how the data is originated), nature of data (char, varchar, nullable, existence, valid values etc) and behavior of data (how it is modified / derived and the life cycle) in data dictionary a.k.a metadata.
Metadata is also presented at the Datamart level, subsets, fact and dimensions, ODS etc. For a DW user, metadata provides vital information for analysis / DSS.

32 What is a linked cube?
Linked cube in which a sub-set of the data can be analyzed into detail. The linking ensures that the data in the cubes remain consistent.

33 What is surrogate key? Where we use it? Explain with examples.
Surrogate key is a substitution for the natural primary key.It is just a unique identifier or number for each row that can be used for the primary key to the table. The only requirement for a surrogate primary key is that it is unique for each row in the table.
Data warehouses typically use a surrogate, (also known as artificial or identity key), key for the dimension tables primary keys. They can use Info sequence generator, or Oracle sequence, or SQL Server Identity values for the surrogate key.
It is useful because the natural primary key (i.e. Customer Number in Customer table) can change and this makes updates more difficult.
Some tables have columns such as AIRPORT_NAME OR CITY_NAME which are stated as the primary keys (according to the business users) but ,not only can these change, indexing on a numerical value is probably better and you could consider creating a surrogate key called, say, AIRPORT_ID. This would be internal to the system and as far as the client is concerned, you may display only the AIRPORT_NAME.

34 What are the data types present in BO? What happens if we implement view in the designer n report?
Three different data types: Dimensions, Measure, and DetailView is nothing but an alias and it can be used to resolve the loops in the universe.

35 What are data validation strategies for data mart validation after loading process?
Data validation is to make sure that the loaded data is accurate and meets the business requirements. Strategies are different methods followed to meet the validation requirements.

36 What is Data warehousing Hierarchy?
Hierarchies are logical structures that use ordered levels as a means of organizing data. A hierarchy can be used to define data aggregation. For example, in a time dimension, a hierarchy might aggregate data from the month level to the quarter level to the year level. A hierarchy can also be used to define a navigational drill path and to establish a family structure.Within a hierarchy, each level is logically connected to the levels above and below it. Data values at lower levels aggregate into the data values at higher levels. A dimension can be composed of more than one hierarchy. For example, in the product dimension, there might be two hierarchies--one for product categories and one for product suppliers.Dimension hierarchies also group levels from general to granular. Query tools use hierarchies to enable you to drill down into your data to view different levels of granularity. This is one of the key benefits of a data warehouse.When designing hierarchies, you must consider the relationships in business structures. Hierarchies impose a family structure on dimension values. For a particular level value, a value at the next higher level is its parent, and values at the next lower level are its children. These familial relationships enable analysts to access data quickly.

37 What is BUS Schema?
BUS Schema is composed of a master suite of confirmed dimension and standardized definition if facts.

38 What are the methodologies of Data Warehousing?
Every company has methodology of their own. However, to name a few SDLC Methodology, AIM methodology is standard used.

39 What is conformed fact?
Conformed dimensions are the dimensions, which can be used across multiple Data Marts in combination with multiple facts tables accordingly

40 What is Difference between E-R Modeling and Dimensional Modeling?
Basic difference is E-R modeling will have logical and physical model. Dimensional model will have only physical model. E-R modeling is used for normalizing the OLTP database design.Dimensional modeling is used for de-normalizing the ROLAP/MOLAP design.

41 Why fact table is in normal form?
The fact table consists of the Index keys of the dimension/look up tables and the measures. So whenever we have the keys in a table. That it implies that the table is in the normal form.

42 What is junk dimension? What is the difference between junk dimension and degenerated dimension?
Junk dimension: Grouping of Random flags and text attributes in a dimension and moving them to a separate sub dimension. Degenerate Dimension: Keeping the control information on Fact table ex: Consider a Dimension table with fields like order number and order line number and have 1:1 relationship with Fact table, In this case this dimension is removed and the order information will be directly stored in a Fact table in order eliminate unnecessary joins while retrieving order information.

43 What is the main difference between Inmon and Kimball philosophies of data warehousing?
Both differed in the concept of building the data warehouse.According to Kimball, Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence, a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level.Inmon beliefs in creating a data warehouse on a subject-by-subject area basis. Hence, the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.

44 What is the difference between view and materialized view?
View - store the SQL statement in the database and let you use it as a table. Every time you access the view, the SQL statement executes. Materialized view - stores the results of the SQL in table form in the database. SQL statement only executes once and after that every time you run the query, the stored result set is used. Pros include quick query results.

45 What is the advantages data mining over traditional approaches?
Data Mining is used for the estimation of future. For example, if we take a company/business organization, by using the concept of Data Mining, we can predict the future of business in terms of Revenue (or) Employees (or) Customers (or) Orders etc.Traditional approaches use simple algorithms for estimating the future. However, it does not give accurate results when compared to Data Mining.

46 What are the steps to build the data warehouse?
Gathering business requirements>>Identifying Sources>>Identifying Facts>>Defining Dimensions>>Define Attributes>>Redefine Dimensions / Attributes>>Organize Attribute Hierarchy>>Define Relationship>>Assign Unique Identifiers

47 What is the data type of the surrogate key?
Data type of the surrogate key is integer, numeric, or number.

48 What is a source qualifier?
When you add a relational or a flat file source definition to a mapping, you need to connect it to a Source Qualifier transformation. The Source Qualifier represents the rows that the Informatica Server reads when it executes a session.

49 What do you mean by static and local variable?
Static variable is not created on function stack but is created in the initialized data segment and hence the variable can be shared across the multiple call of the same function. Usage of static variables within a function is not thread safe.On the other hand, local variable or auto variable is created on function stack and valid only in the context of the function call and is not shared across function calls.

50 What are the different types of data warehousing?
Types of data warehousing are:
1. Enterprise Data warehousing
2. ODS (Operational Data Store)
3. Data Mart

51 What are Fact, Dimension, and Measure?
Fact is key performance indicator to analyze the business. Dimension is used to analyze the fact. Without dimension there is no meaning for fact.

52 What is the data type of the surrogate key?
There is no data type for a Surrogate Key. Requirement of a surrogate Key: UNIQUE Recommended data type of a Surrogate key is NUMERIC.

53 What are Data Marts?
A data mart is a collection of tables focused on specific business group/department. It may have multi-dimensional or normalized. Data marts are usually built from a bigger data warehouse or from operational data.

54 What are the differences between star and snowflake schema?
Star schema: A single fact table with N number of DimensionSnowflake schema: Any dimensions with extended dimensions are known as snowflake schema.

55 What is a cube in data warehousing concept?
Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values.

56 What are the difference between Snow flake and Star Schema? What are situations where Snow flake Schema is better than Star Schema to use and when the opposite is true?
Star schema contains the dimension tables mapped around one or more fact tables. It is a renormalized model and no need to use complicated joins. Also queries results fast.Snowflake schema: It is the normalized form of Star schema. It contains in-depth joins, because the tables are split in to many pieces. We can easily do modification directly in the tables. We have to use complicated joins, since we have more tables.There will be some delay in processing the query.

57 What is Dimensional Modelling?
Dimensional Modelling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. Fact table contains the facts/measurements of the business and the dimension table contains the context of measurements i.e., the dimensions on which the facts are calculated.

58 What is Log Switch?
The point at which ORACLE ends writing to one online redo log file and begins writing to another is called a log switch.

59 What is On-line Redo Log?
The On-line Redo Log is a set of tow or more on-line redo files that record all committed changes made to the database. Whenever a transaction is committed, the corresponding redo entries temporarily stores in redo log buffers of the SGA are written to an on-line redo log file by the background process LGWR. The on-line redo log files are used in cyclical fashion.

60 Which parameter specified in the DEFAULT STORAGE clause of CREATE TABLESPACE cannot be altered after creating the table space?
All the default storage parameters defined for the table space can be changed using the ALTER TABLESPACE command. When objects are created their INITIAL and MINEXTENS values cannot be changed.

61 What are the steps involved in Database Startup?
Start an instance, Mount the Database and Open the Database.

62 What are the steps involved in Instance Recovery?
Rolling forward to recover data that has not been recorded in data files yet has been recorded in the on-line redo log, including the contents of rollback segments. Rolling back transactions that have been explicitly rolled back or have not been committed as indicated by the rollback segments regenerated in step a.
1) Releasing any resources (locks) held by transactions in process at the time of the failure.
2) Resolving any pending distributed transactions undergoing a two-phase commit at the time of the instance failure.

63 Can Full Backup be performed when the database is open?
No

64 What are the different modes of mounting a Database with the Parallel Server?
Exclusive Mode If the first instance that mounts a database does so in exclusive mode, only that Instance can mount the database. Parallel Mode If the first instance that mounts a database is started in parallel mode, other instances that are started in parallel mode can also mount the database.

65 What are the advantages of operating a database in ARCHIVELOG mode over operating it in NO ARCHIVELOG mode?
Complete database recovery from disk failure is possible only in ARCHIVELOG mode. Online database backup is possible only in ARCHIVELOG mode.

66 What are the steps involved in Database Shutdown?
Close the Database; Dismount the Database and Shutdown the Instance.

67 What is Archived Redo Log?
Archived Redo Log consists of Redo Log files that have archived before being reused.

68 What is Restricted Mode of Instance Startup?
An instance can be started in (or later altered to be in) restricted mode so that when the database is open connections are limited only to those whose user accounts have been granted the RESTRICTED SESSION system privilege.

69 What is Partial Backup?
A Partial Backup is any operating system backup short of a full backup, taken while the database is open or shut down.

70 What is Mirrored on-line Redo Log?
A mirrored on-line redo log consists of copies of on-line redo log files physically located on separate disks; changes made to one member of the group are made to all members.

71 What is a full backup?
A full backup is an operating system backup of all data files, on- line redo log files and control file that constitute ORACLE database and the parameter.

72 Can a View based on another View?
Yes

73 Can a Table space hold objects from different Schemes?
Yes

74 Can objects of the same Schema reside in different table spaces?
Yes

75 What is the use of Control File?
When an instance of an ORACLE database is started, its control file is used to identify the database and redo log files that must be opened for database operation to proceed. It is also used in database recovery.

76 Do you View contain Data?
Views do not contain or store data.

77 What are the Referential actions supported by FOREIGN KEY integrity constraint?
Update And Delete Restrict - A referential integrity rule that disallows the update or deletion of referenced data. DELETE Cascade - When a referenced row is deleted all associated dependent rows are deleted.

78 What are the types of Synonyms?
There are two types of Synonyms Private and Public

79 What is a Redo Log?
The set of Redo Log files YSDATE, UID, USER or USERENV SQL functions, or the pseudo columns LEVEL or ROWNUM.

80 What is an Index Segment?
Each Index has an Index segment that stores all of its data.

81 Explain the relationship among Database, Table space and Data file?
Each databases logically divided into one or more table spaces one or more data files are explicitly created for each table space.

82 What are the different types of Segments?
Data Segment,
Index Segment,
Rollback Segment
and
Temporary Segment

83 What are Clusters?
Clusters are groups of one or more tables physically stores together to share common columns and are often used together.

84 What is an Integrity Constrains?
An integrity constraint is a declarative way to define a business rule for a column of a table.

85 What is an Index?
An Index is an optional structure associated with a table to have direct access to rows, which can be created to increase the performance of data retrieval. Index can be created on one or more columns of a table.

86 What is an Extent?
An Extent is a specific number of contiguous data blocks, obtained in a single allocation, and used to store a specific type of information.

87 What is a View?
A view is a virtual table. Every view has a Query attached to it. (The Query is a SELECT statement that identifies the columns and rows of the table(s) the view uses.)

88 What is Table?
A table is the basic unit of data storage in an ORACLE database. The tables of a database hold all of the user accessible data. Table data is stored in rows and columns.

89 What is schema?
A schema is collection of database objects of a User.

90 Describe Referential Integrity?
A rule defined on a column (or set of columns) in one table that allows the insert or update of a row only if the value for the column or set of columns (the dependent value) matches a value in a column of a related table (the referenced value). It also specifies the type of data manipulation allowed on referenced data and the action to be performed on dependent data as a result of any action on referenced data.

96 What are the Characteristics of Data Files?
A data file can be associated with only one database. Once created a data file can't change size. One or more data files form a logical unit of database storage called a table space.

97 How do you define Data Block size?
A data block size is specified for each ORACLE database when the database is created. A database users and allocated free database space in ORACLE data blocks. Block size is specified in INIT.ORA file and can't be changed latter.

98 What does a Control file Contain?
A Control file records the physical structure of the database. It contains the following information. Database Names and locations of a database's files and redo log files. And Time stamp of database creation.

99 What is the effect of setting the value "CHOOSE" for OPTIMIZER_GOAL, parameter of the ALTER SESSION Command?
The Optimizer chooses Cost based approach and optimizes with the goal of best throughput if statistics for at least one of the tables accessed by the SQL statement exist in the data dictionary. Otherwise the OPTIMIZER chooses RULE based approach.

100 What is the function of Optimizer?
The goal of the optimizer is to choose the most efficient way to execute a SQL statement.

all interview questions answers pdf

TOP 100 Data Warehousing Interview Questions and Answers pdf fresher and experienced

0 comments:

Post a Comment