Tuesday, October 20, 2015

GAC - Global Assembly Cache

The GAC is a central repository for assemblies on a computer. Only strong-named assemblies can be registered in the GAC.
The GAC is located at C:\Windows\assembly\ (for .NET 4.0 and later, C:\Windows\Microsoft.NET\assembly\).
Command to register an assembly:
gacutil -i assemblyname.dll

Use & advantages of GAC:

  1. Multiple applications can reference the same assembly from the GAC.
  2. The GAC can maintain multiple versions of an assembly side by side, so different applications can reference the versions they require.
    Example :
    Our exe requires MyAssembly.dll version 1.0.0. A new version of MyAssembly.dll, 2.0.0, is now available in the GAC. We can place a config file next to the exe that maps the old version to the new one, so the exe calls the new dll.
    <bindingRedirect oldVersion="1.0.0" newVersion="2.0.0" />
  3. Assemblies are secured with Windows security; any user of the system requires admin rights to update or delete an assembly.
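A minimal sketch of where the redirect lives in the exe's app.config (the assembly name and publicKeyToken are placeholders):

```xml
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- name and publicKeyToken below are placeholders for illustration -->
        <assemblyIdentity name="MyAssembly" publicKeyToken="0123456789abcdef" culture="neutral" />
        <bindingRedirect oldVersion="1.0.0" newVersion="2.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```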

Strong Name: A strong name contains the assembly's name, version, culture and a public key token. A signed assembly is called a strong-named assembly.

Weak Name: A weak-named assembly doesn't have a public key token.

Delay Signing: Whenever we sign an assembly there is an option 'Delay sign only'; if that is chosen, we can later re-sign the assembly with the full key file (.snk file).
Use of delay signing: You often require a strong-named assembly during development, so you can sign the assembly with the public key only (delay signing) and provide it to developers.
At deployment to production you re-sign it with the full (private) key.
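The typical delay-signing workflow uses the Strong Name tool (sn.exe); a rough sketch, with placeholder file names:

```
sn -k keypair.snk                 (generate a public/private key pair)
sn -p keypair.snk public.snk      (extract the public key for delay signing)
sn -Vr MyAssembly.dll             (skip strong-name verification on dev machines)
sn -R MyAssembly.dll keypair.snk  (re-sign with the full key pair before release)
```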

Use of IDisposable

IDisposable is used to write object-cleanup functionality for our class.
  1. Implement IDisposable and define Dispose() to clean up managed objects.
  2. Implement a destructor (finalizer) to clean up unmanaged objects; do not write a destructor if the class does not hold at least one unmanaged resource.

How to implement?

1. Write a Dispose(bool) method
        protected virtual void Dispose(bool disposing)
        {
            if (_dispose) return;

            if (disposing)
            {
                //Free managed objects
            }

            //Free unmanaged objects
            _dispose = true;
        }
2. Call Dispose(true) from the actual Dispose() method of IDisposable, to release managed objects.
        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

The statement below suppresses the finalizer of the class.
If we do not write it, the GC must run the finalizer, so the object survives the first collection and is promoted to generation 1 (and possibly 2), which degrades performance.
With this statement the object can be reclaimed while it is still in generation 0, which improves performance.
GC.SuppressFinalize(this);

3. Call Dispose(false) from the destructor, to release unmanaged objects.
        ~clsSecurityCritical() //Finalizer
        {
            Dispose(false);
        }
4. GC.SuppressFinalize(this) requests that the CLR not call the finalizer for the specified object.

Code :
    public class clsSecurityCritical : IDisposable
    {
        public clsSecurityCritical()
        {
   
        }
        public void CallCheck()
        {
            Console.WriteLine("Security Critical code executed");
        }
        ~clsSecurityCritical() //Finalizer
        {
            Dispose(false);
        }

        bool _dispose = false;
        protected virtual void Dispose(bool disposing)
        {
            if (_dispose) return;

            if (disposing)
            {
                //Free managed objects
            }

            //Free unmanaged objects
            _dispose = true;
        }

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

    }
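A quick usage sketch: wrapping the instance in a using statement guarantees Dispose() is called even if an exception is thrown.

```csharp
using System;

class Program
{
    static void Main()
    {
        // Dispose() runs automatically at the end of the using block
        using (var critical = new clsSecurityCritical())
        {
            critical.CallCheck();
        }
    }
}
```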

Garbage Collector

What is Garbage Collection?
Garbage collection is a service that frees up memory occupied by dead objects, or objects that have not been used for a long time.
It is performed by the 'Garbage Collector', which is a background thread and part of the CLR.
The Garbage Collector maintains a heap of the objects created by your application; it periodically checks the heap and de-allocates the memory occupied by objects that are no longer required (dead or not used for a long time).

When does it perform collection?
  1. Periodically
  2. When the memory occupied by your application's heap crosses a predefined threshold (memory limit)
  3. When physical memory becomes low
What are Generations?
The GC maintains a heap of objects for your application, in which it categorizes objects into three generations:
generation 0, generation 1 & generation 2.

Newly created objects are marked as generation 0. Large newly created objects (over about 85,000 bytes) are placed on the Large Object Heap, which is collected together with generation 2, on the assumption that they will be long-lived.

When the GC performs a collection on generation 0, it releases memory from dead objects & moves surviving objects to generation 1.
When the GC performs a collection on generation 1, it releases memory from dead objects & moves surviving objects to generation 2.
When the GC performs a collection on generation 2, it releases memory from dead objects.

Generation 0 objects are short-lived objects.
Generation 1 objects are middle-lived objects.
Generation 2 objects are long-lived objects.
Most short-lived objects reside in generation 0, so the GC collects generation 0 more often than generation 1, and so on.
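Promotion between generations can be observed with GC.GetGeneration; a minimal sketch:

```csharp
using System;

class Program
{
    static void Main()
    {
        var obj = new object();
        // Freshly allocated objects start in generation 0
        Console.WriteLine(GC.GetGeneration(obj));

        // After a collection, an object that is still alive is promoted
        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj));
    }
}
```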

Sunday, October 18, 2015

Managed Vs Unmanaged Code

Managed Code: Code written in any .NET language that runs under the CLR environment is managed code.

The CLR provides its services to managed code:
  1. Garbage Collection
  2. Error Handling
  3. CTS (Common Type System)
  4. CAS (Code Access Security)
  5. Code Verification 
  6. Performance Improvement: decides which JIT to use & hands the code to the JIT. The JIT compiles IL to native machine code.
Unmanaged Code: Code that does not target the CLR environment for execution is unmanaged code.

CAS & Changes in the .Net Security Model in .Net 4.0

CAS (Code Access Security): CAS identifies code based on evidence, and allows the code to access only those resources and perform only those operations for which it has permission.
There is a set of permissions and policies that can be viewed with the ".NET Framework Configuration" tool. Using this tool we can assign/revoke specific permissions for any code (exe).

This model was deprecated in .NET Framework 4.0.

From .NET 4.0 onward, granting permissions depends on the host, not on the CAS model.
Now the host decides the permissions for any code & assigns them.
For example, when we execute an assembly, Windows identifies it, categorizes it & assigns permissions to it.

Code is divided into three types with respect to security:
SecurityCritical: trusted code, has access to your system.
SecurityTransparent: untrusted code, should not have access to the system; it must not make any direct call to security-critical code.
SecuritySafeCritical: used to create a bridge between transparent and critical code, when transparent code needs to call critical code.

How can we mark our code with any of the above categories?
Add attributes to the AssemblyInfo.cs file of the project.
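For example, assembly-level attributes along these lines (these are the real attributes from System.Security; which ones you apply depends on the trust level you want):

```csharp
using System.Security;

// Opt in to the level-2 transparency rules
[assembly: SecurityRules(SecurityRuleSet.Level2)]

// Mark the whole assembly as transparent; individual types or members
// can instead be annotated with [SecurityCritical] or [SecuritySafeCritical].
[assembly: SecurityTransparent]
```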


If you want to use some third-party dll and don't want to give it access to security-critical code, then wrap the third-party dll with your own code and mark that wrapper SecurityTransparent.
Sand Boxing:
If you want to use some untrusted third-party dll, then create your own app domain, assign permissions to that app domain & run the third-party dll under this domain. This approach is called sandboxing.
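A rough sketch of the classic sandboxing pattern with AppDomain.CreateDomain (the paths and names are placeholders; this API applies to the .NET Framework, not .NET Core):

```csharp
using System;
using System.Security;
using System.Security.Permissions;

class Sandbox
{
    static void Main()
    {
        // Grant only bare execution permission to the sandboxed domain
        var permissions = new PermissionSet(PermissionState.None);
        permissions.AddPermission(
            new SecurityPermission(SecurityPermissionFlag.Execution));

        var setup = new AppDomainSetup
        {
            ApplicationBase = @"C:\Sandbox"   // placeholder folder
        };

        AppDomain domain = AppDomain.CreateDomain(
            "SandboxDomain", null, setup, permissions);

        // Run the untrusted assembly inside the restricted domain
        domain.ExecuteAssembly(@"C:\Sandbox\Untrusted.exe");
    }
}
```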

Saturday, October 17, 2015

CLR, CLS, MSIL, CTS & JIT

MSIL (Microsoft Intermediate Language): An intermediate language generated by .NET compilers (C#, VB, VC++, ...) when they compile source code.

CLR (Common Language Runtime): The CLR is the runtime environment of .NET, responsible for compiling & managing .NET code. It verifies code for security, provides code access security (restricting code from accessing restricted areas & resources of the system), and provides the garbage collection service.

CLS (Common Language Specification): A defined set of specifications that should be followed by every compiler whose language is intended to interoperate with other .NET languages.

CTS (Common Type System): A defined set of types; every .NET language compiler compiles code to these common data types, so that all .NET languages can communicate.
After compilation, the IL code of every .NET language has common types.

JIT (Just-In-Time compiler): The JIT is responsible for compiling MSIL code into native machine code and saving it in memory.

Types of JIT compiler
  1. Pre JIT  - Compiles the whole code in a single cycle.
  2. Econo JIT - Compiles code part by part when required: it compiles methods when they are called at runtime, and then removes the compiled code when it is no longer required. That is, called code is compiled, serves, and is then freed.
  3. Normal JIT - Compiles code part by part like Econo JIT, but only once, when a code part is called for the first time; it then puts that code in a cache and, if required later, serves that part from the cache.
    Normal JIT = Econo JIT + Caching


Wednesday, October 14, 2015

SQL - Pivot & Unpivot

Pivot & Unpivot are relational operators.
Pivot is used to rotate a table-valued expression: it turns the unique values of one column into multiple columns and performs aggregation on another column, as required in the output.
Unpivot performs the opposite operation: it turns multiple columns into the values of a single column, with multiple rows.

Syntax
SELECT <non-pivoted column>,
    [first pivoted column] AS <column name>,
    [second pivoted column] AS <column name>,
    ...
    [nth pivoted column] AS <column name>
FROM
    (<SELECT query that produces the data>)
    AS <alias for the main source query>
PIVOT
(
    <aggregation function>(<column being aggregated>)
FOR
[<column that contains the values that will become column headers>]
    IN ( [first pivoted column], [second pivoted column],
    ... [nth pivoted column])
) AS <alias for the pivot table>

<optional ORDER BY clause>

Example
 Table valued sql expression
Pivoted table 

Unpivot Example
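A concrete, self-contained sketch (the table and column names are made up for illustration):

```sql
-- Sample source table
CREATE TABLE Sales (Product VARCHAR(20), Quarter VARCHAR(2), Amount MONEY);
INSERT INTO Sales VALUES ('Pen', 'Q1', 100), ('Pen', 'Q2', 150), ('Ink', 'Q1', 80);

-- PIVOT: quarters become columns, amounts are summed
SELECT Product, [Q1], [Q2]
FROM (SELECT Product, Quarter, Amount FROM Sales) AS src
PIVOT (SUM(Amount) FOR Quarter IN ([Q1], [Q2])) AS pvt;

-- UNPIVOT: turn quarter columns back into rows
SELECT Product, Quarter, Amount
FROM (VALUES ('Pen', 100, 150)) AS pvt (Product, Q1, Q2)
UNPIVOT (Amount FOR Quarter IN (Q1, Q2)) AS unpvt;
```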



Monday, October 12, 2015

SQL - Change Data Capture (CDC)

CDC - Change Data Capture helps us to track inserts/updates/deletes on any table.
CDC is not available in the Express edition; it is available in the Enterprise, Developer & Enterprise Evaluation editions.

Steps to enable CDC on the tables you want to track:
  1. First enable CDC at database level
    EXEC sys.sp_cdc_enable_db
  2. Enable CDC at table level
    EXEC sys.sp_cdc_enable_table
    @source_schema = 'dbo',
    @source_name = 'table1',
    @role_name = null

    EXEC sys.sp_cdc_enable_table
    @source_schema = 'dbo',
    @source_name = 'table2',
    @role_name = null
SQL Server creates some 6 new system tables and a number of procedures to track and store the change data.
For every table on which you enable CDC, it creates a change table; for the example above the naming pattern is
cdc.dbo_table1_CT
cdc.dbo_table2_CT

SQL Server stores all changed data in the respective change table, with a column "__$operation" that has the values given below:
1 - deleted record
2 - inserted record
3 - record before update
4 - record after update
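Once enabled, the change rows can be inspected directly (continuing the hypothetical table1 example above):

```sql
-- __$operation: 1=delete, 2=insert, 3=before update, 4=after update
SELECT __$operation, __$start_lsn, *
FROM cdc.dbo_table1_CT
ORDER BY __$start_lsn;
```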

SQL - Useful system functions

1. Coalesce
--COALESCE ( expression [ ,...n ] )
 --Returns the first non-null value from the provided expressions
  select Coalesce(FirstName,MiddleName,LastName) from tb_userdesc

2. Row_Number
 --Row_number()
 --Sorted on FirstName, with serial numbers assigned
 select Row_number() over (order by Firstname) as SNo, userid, FirstName, Address_Type from tb_userdesc

 -- Records grouped on values of Address_Type; each group is sorted on UserId with the row number restarted
 select Row_number() over (PARTITION by Address_Type order by UserId) as SNo, userid, FirstName, Address_Type from tb_userdesc

3. Rank
 --Rank()
 -- Generates a sequence like Row_number, but gives the same number for equal values of the order-by column. For the next distinct value it uses the number Row_number would have assigned, so the sequence has gaps.
 select rank() over (order by FirstName) as SNo, userid, FirstName, Address_Type from tb_userdesc

4. Dense_Rank
 --dense_rank()
 --Generates a sequence like Row_number, but gives the same number for equal values of the order-by column. For the next distinct value it uses the next number in the sequence, with no gaps.
 select dense_rank() over (order by FirstName) as SNo, userid, FirstName, Address_Type from tb_userdesc



SQL - Stored Procedures

Stored procedures are precompiled SQL statements.

SQL Server caches the execution plan for a stored procedure; on every execution of the procedure it fetches the plan from the cache by procedure id and executes it.
So SQL Server does not need to go through the steps of syntax checking (step 1) and plan selection (step 2), which improves the performance of the application that uses the procedure.

SQL Server also caches execution plans for the ad-hoc SQL statements we execute, but if we change any value the statement is treated as a new one and cached again under a different id.
Example
Select * from employees where Name like '%Ram%'
Select * from employees where Name like '%Raj%'
The plans for both statements above will be cached, but with different ids, so we can't take advantage of the cache.


Procedure Syntax :
CREATE PROCEDURE <Procedure_Name, sysname, ProcedureName>
       -- Add the parameters for the stored procedure here
       <@Param1, sysname, @p1> <Datatype_For_Param1, , int> = <Default_Value_For_Param1, , 0>,
       <@Param2, sysname, @p2> <Datatype_For_Param2, , int> = <Default_Value_For_Param2, , 0>
AS
BEGIN
       -- SET NOCOUNT ON added to prevent extra result sets from
       -- interfering with SELECT statements.
       SET NOCOUNT ON;

    -- Insert statements for procedure here
       SELECT <@Param1, sysname, @p1>, <@Param2, sysname, @p2>
END
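Filling in the template, a minimal concrete procedure might look like this (the table and column names are illustrative):

```sql
CREATE PROCEDURE usp_GetEmployeesByName
       @NamePart VARCHAR(50)
AS
BEGIN
       SET NOCOUNT ON;

       -- Parameterized search: the cached plan is reused for any @NamePart value
       SELECT ID, FullName, Salary
       FROM employee
       WHERE FullName LIKE '%' + @NamePart + '%';
END
```

Executed with: EXEC usp_GetEmployeesByName @NamePart = 'Ram';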

SQL - Inserted & Deleted Tables in Trigger

'Inserted' & 'Deleted' tables are temporary tables created by SQL Server.
These tables have the same structure as the main table.
Their scope is limited to triggers.
They can be used for validation and other calculations, as required, inside triggers.

Example :
ALTER TRIGGER [dbo].[Reminder]
ON [dbo].[Filings]
AFTER UPDATE
AS

declare @FilingId as int
declare @FilingSId as int
declare @IsOnTime as bit

SELECT @FilingId = i.FilingId FROM INSERTED i
SELECT @FilingSId = i.filingstatus FROM INSERTED i
SELECT @IsOnTime = case when getdate() > A.DueDate then 0 else 1 end from vw_Companies A inner join INSERTED i on A.crno = i.crno

IF ( UPDATE (filingstatus) and @FilingSId = 12)
BEGIN
update Filings set isontime = @IsOnTime where filingid = @FilingId
END;

SQL - Effect of Indexes on DML operations

Indexes degrade the performance of DML operations (Insert / Update / Delete).
The more indexes a table has, the higher the cost of DML statements.

Insert - SQL Server needs to find the correct nodes, as per the indexes defined on the table, in which to insert the data. It then checks whether the node has enough space to store the data; if not, it splits the node and distributes the entries across the split nodes, which also has a cost. It also rebalances the nodes of the tree.
For a table without an index, SQL Server can insert the entry directly into the available data blocks.
The more indexes, the higher the cost.
Insert statements have a higher cost than update and delete on indexed tables.

Update - Traverses the tree, removes the old references and adds new references, rebalances the tree.
The more indexes, the higher the cost.

Delete - Traverses the tree, removes the old references, rebalances the tree.
The more indexes, the higher the cost.

Saturday, October 10, 2015

SQL - Clustered Vs Non Clustered Index

  • Clustered index 
Clustered indexes sort and store the data rows in the table or view based on their key values (the columns included in the index definition). There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.
A clustered index forms a B-tree (balanced tree) structure for fast search.
A clustered index has the data pages at its leaf nodes.
Only one clustered index can be created per table.
Why? Because a clustered index defines the physical ordering of the data, and we can't have multiple physical orderings of the same data.
If a table has a clustered index, its data is stored in sorted order.
If a table has no clustered index, its data is stored in a structure called a heap.
SQL Server automatically creates a clustered index on the primary key.

  • Non clustered index 
Non-clustered indexes have a structure separate from the data rows. A non-clustered index contains the index key values, and each key-value entry has a pointer to the data row that contains that key value.
The pointer from an index row in a non-clustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or in a clustered table. For a heap, the row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
You can add non-key columns to the leaf level of a non-clustered index to bypass the index key limits (900 bytes and 16 key columns) and execute fully covered, indexed queries.
It is a logical ordering of the data, not a physical one; physically the data is stored in a different order.
It has index rows at its leaf nodes, not data pages; these work like pointers to the actual data pages.
A non-clustered index can be created on either a heap or a clustered table.
999 non-clustered indexes can be created on a table (249 before SQL Server 2008).
SQL Server automatically creates a non-clustered index on any other unique key.
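A short sketch of creating both kinds of index (the table and column names are illustrative):

```sql
CREATE TABLE Orders (
    OrderId INT NOT NULL,
    CustomerId INT NOT NULL,
    OrderDate DATE NOT NULL,
    Amount MONEY NOT NULL
);

-- The single clustered index: rows are physically ordered by OrderId
CREATE CLUSTERED INDEX IX_Orders_OrderId ON Orders (OrderId);

-- A non-clustered index; INCLUDE adds non-key columns to the leaf level,
-- so queries on CustomerId that also read Amount are fully covered
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
    ON Orders (CustomerId)
    INCLUDE (Amount);
```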

Thursday, October 8, 2015

SQL - Sub Query Vs Co-related Query

Sub Query : Independent of the outer query.
The sub query is executed first and its result is passed to the outer query as input.

Ex.
create table employee (
ID int primary key identity(1,1),
FullName Varchar(20),
Salary money)

insert into employee values ('Megha', 22000)
insert into employee values ('Mukesh', 25000)
insert into employee values ('Renuka', 21000)
insert into employee values ('Ghanshyam', 132000)

Second highest salary using Sub Query
select top 1 o.Salary from
(select top 2 c.Salary from employee c order by c.Salary desc) o order by o.Salary asc

Co-related query : Dependent on the outer query.
Values from the outer query become input for the inner query, and then the final output is generated.

The inner query runs once for each record of the outer query, comparing all of its records with that outer record.

Second highest salary using Co-related Query
select o.Salary from employee o
where 2 = (select count(c.id) from employee c where c.Salary >= o.Salary)

SQL - Having Clause

Two reasons for using the Having clause:
  1. 'Where' can't filter the groups produced by 'Group by'
  2. Aggregate functions can't be used in the where clause
SELECT column_name, aggregate_function(column_name)
FROM table_name
WHERE column_name operator value
GROUP BY column_name
HAVING aggregate_function(column_name) operator value;

Ex.
This will execute
select count(id), max(cast(isactive as varchar(2))) from tb_city
where regionid > 2
group by regionid 

Syntax error
select count(id), max(cast(isactive as varchar(2))) from tb_city
group by regionid 
where regionid > 2

SQL - Data Pages

There are two types of SQL Server pages -
User data pages - store the data inserted by users
System configuration pages - IAM page, index page, etc.

Index Allocation Map (IAM) page - stores information about the actual data pages; page_type = 10
Data page - stores the actual data; page_type = 1
For example, page id 697 could be an IAM page that contains information about data pages such as page ids 696, 698, ...

SQL Server stores data in one page; when the data exceeds the page size, it starts to write the data to the next new page, and so on.

One page has a total size of 8 KB = 8192 bytes.

--------------------
Section-1
Page header: 96 bytes
--------------------
Section-2
Actual data: uses the remaining bytes out of 8192
--------------------
Section-3
Offset table: 2 bytes per record stored in section 2
--------------------

Sunday, October 4, 2015

OLTP Vs OLAP

OLTP (Online Transaction Processing) : Deals with operational data, where frequent DML (insert/update/delete) operations happen; this is a normalized system.
OLTP provides data to OLAP.

OLAP (Online Analytical Processing) : Deals with historical & archival data; the operations are analysis and search of data, used in reporting for the purposes of data analysis and planning.
It is the archiving and mining of OLTP data; OLAP is used for data warehouses.
It requires a design that is cost-effective for select statements, not for insert/update/delete.

OLAP mainly requires facts, dimensions & measures for analysis & planning.
A fact combines dimension keys with measures. Example: Dimension - Product, Measure - SalePerMonth = $30,000.
Another example: Dimension - Employee (grouped by age), Measure - Salary = $2,500.

So OLAP has a denormalized design.

There are two techniques for designing OLAP :

  1. Star schema
  2. Snowflake schema
The difference between them is that dimensions can be normalized in a snowflake schema.

SQL - 1NF, 2NF & 3NF

Normal forms fall under database normalization techniques.

1NF (First Normal Form) : Each table cell should contain a single value, and there should be no repeating groups in rows.

2NF (Second Normal Form) : The table should be in 1NF & each non-key column of the table should depend on the full primary key.
In this example we have the composite key ProductId + SupplierId. Here Price depends on ProductId only, not on SupplierId, so it should be moved to the table where it depends on the whole primary key, i.e. the Products table. In the Products table, Store does not depend on ProductId, so it should be moved to a new Store table.
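The split described above, sketched as DDL (the schema is reconstructed from the description, since the original screenshots are not available):

```sql
-- Before 2NF: Price depends only on ProductId, part of the composite key
-- ProductSuppliers(ProductId, SupplierId, Price)

-- After 2NF:
CREATE TABLE Stores (
    StoreId INT PRIMARY KEY,
    StoreName VARCHAR(50)
);

CREATE TABLE Products (
    ProductId INT PRIMARY KEY,
    Price MONEY,
    StoreId INT REFERENCES Stores (StoreId)
);

CREATE TABLE ProductSuppliers (
    ProductId INT REFERENCES Products (ProductId),
    SupplierId INT,
    PRIMARY KEY (ProductId, SupplierId)
);
```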

3NF (Third Normal Form) : Each non-key column should depend only on the primary key, not on other non-key columns.
While 2NF says a column must depend on the full primary key, 3NF additionally removes dependencies between non-key columns.
In this example, Total depends on Quantity and Price, not on ProductId; Total can be calculated from Price and Quantity, so it should be removed.

Saturday, September 26, 2015

SQL - Query Optimization Techniques - I

Below are some facts that can be used to optimize query performance:
  • Unique key: If there is no primary key defined on a table, SQL Server uses the "Table Scan" operator. The table scan operator reads records one by one until it gets the required record, so this operator is costly if the table has a large number of rows.
    Creating a primary key on the table improves performance drastically.
  • Scan vs Seek: If we have a clustered / non-clustered index on the table, SQL Server uses the 'Index Scan' or 'Index Seek' operator.
    Index Scan : Touches every row, qualified or not, from start to end, so the cost of this search is proportional to the total number of rows.
    A scan is the right choice if the table is small or if most of the rows qualify for the query's condition.
    Index Seek : Touches only the rows / pages that qualify for the query's condition.
    So if we provide a selective condition, SQL Server prefers 'Index Seek'; if the condition matches a large number of rows, the optimizer changes the plan and uses a Scan instead of a Seek.
    A seek is the right choice if the table has a large number of rows and the count of qualifying rows is very small compared to all rows.
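A small sketch to see the operator change in the actual execution plan (the table name is illustrative):

```sql
-- Heap: a query on this table shows a Table Scan in the plan
CREATE TABLE Customers (CustomerId INT NOT NULL, Name VARCHAR(50));

-- Adding a primary key creates a clustered index, so the same
-- point query now shows a Clustered Index Seek
ALTER TABLE Customers
    ADD CONSTRAINT PK_Customers PRIMARY KEY (CustomerId);

-- Selective predicate: the optimizer prefers a Seek
SELECT Name FROM Customers WHERE CustomerId = 42;

-- Non-selective query: the optimizer prefers a Scan
SELECT Name FROM Customers;
```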

Monday, September 21, 2015

SQL - Basics Sql execution plan

At a high level, a SQL query is executed through the following steps.
SQL query processing steps : There are some steps in the execution of SQL statements.
  1. Parsing : The parser checks the syntax of the SQL statement, including spellings, reserved keywords etc., and prepares a sequence tree.
  2. Normalization : The normalization process mainly performs binding for the objects of the sequence tree: it checks whether tables and columns exist, loads metadata, adds information required for conversions to the sequence tree, and replaces views with their definitions.
  3. Compilation : The compilation process includes extracting the true transact statements (select, insert, update, delete etc.) to pass to the optimizer, because the whole batch may include variable declarations, assignments, loops, if conditions etc. that do not require optimization.

Tuesday, September 1, 2015

SOLID Principles

SOLID is an acronym for basic object-oriented programming & design principles. Applying these principles makes a system easy to maintain and to extend over time.
  • Single Responsibility Principle (SRP)
  • Open Closed Principle (OCP)
  • Liskov Substitution Principle (LSP)
  • Interface Segregation Principle (ISP)
  • Dependency Inversion Principle (DIP) 
Also refer to the nice article at https://www.syncfusion.com/blogs/post/mastering-solid-principles-csharp 
  • Single Responsibility Principle (SRP): The single responsibility principle says a software module / class should have only one reason to change. If a class has more than one reason to change, that is a violation of SRP.
    public class Circle
    {
        private int _radius;
        public int Radius
        {
            get
            {
                return _radius;
            }
            set
            {
                _radius = value;
            }
        }

        public int Area()
        {
            return Radius * Radius * 22 / 7;
        }

        public string PrintHTMLArea()
        {
            return "<h2>" + Area().ToString() + "</h2>";
        }
    }

The class above contains two responsibilities:
Calculating the area
Printing the area

If properties need to be added or modified, or a print method needs to be modified or added, the class needs to be updated. So this is a violation of SRP; an entity class should have only entity-specific members such as variables / properties.
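One way to separate the two responsibilities, as a sketch (the class names are illustrative):

```csharp
using System;

// Responsibility 1: the shape and its geometry
public class Circle
{
    public int Radius { get; set; }

    public double Area()
    {
        return Math.PI * Radius * Radius;
    }
}

// Responsibility 2: presentation/printing, in its own class
public class CircleHtmlPrinter
{
    public string PrintHtmlArea(Circle circle)
    {
        return "<h2>" + circle.Area() + "</h2>";
    }
}
```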
