This tutorial aims to give a brief yet advanced introduction to
programming with C#. The prerequisites for understanding this tutorial
are a working knowledge of programming, the C programming language and a
little bit of basic mathematics. Some basic knowledge of C++ or Java
is helpful, but not required.
Basic concepts, object-oriented programming and using the .NET-Framework
This is the first part of a series of tutorials on C#. In this part
we introduce the fundamental concepts of the language and its
output, the Microsoft Intermediate Language (MSIL). We will take a look
at object-oriented programming (OOP) and at what C# does to make OOP as
efficient as possible to realize in practice.
For further reading a list of references will be given at the end.
The references provide a deeper look at some of the topics discussed in
this tutorial.
A word of warning before reading this tutorial: it is organized along
a path of usability. This part of
the tutorial is essentially required before doing anything with C#.
After this first tutorial everyone should be able to write a simple C#
program using basic object-oriented principles. Hence more advanced or
curious readers will probably miss some parts. As an example we will not
investigate exception handling, exciting possibilities of the
.NET-Framework like LINQ, or more advanced language features like
generics or lambda expressions. Here our aim is to give a gentle
introduction to C# for people coming from other languages.
The right development environment
Before we can start doing something we should think about where we
want to do it. A logical choice for writing a program is a text
editor. In the case of C# there is an even better option: Microsoft
Visual Studio (VS). However, there are other options that have
been designed specifically with C# in mind. If we want a
cross-platform IDE, then MonoDevelop is a good choice. On Windows a free
alternative to VS and MonoDevelop is SharpDevelop.
Using a powerful IDE like Visual Studio will help us a lot when
writing programs with the .NET-Framework. Features like IntelliSense (an
intelligent version of auto-complete, which shows us the possible
method calls, variables and available keywords for a certain code
position), breakpoints (the program's execution can be paused and
inspected at desired code positions) and graphical designers for UI
development let us program efficiently.
Additionally Visual Studio gives us integrated version control,
an object browser and documentation tools. Another bonus of writing C#
projects with Visual Studio is the concept of solutions and
projects. In Visual Studio one usually creates a solution for a
development project. A solution file can contain several projects, which
can be compiled to libraries (.dll files) and executables (.exe files).
The solution explorer enables us to efficiently manage even large scale
projects.
The project files are used by the MSBuild application to compile all
required files and link to the dependencies like libraries or the
.NET-Framework. As developers, we no longer need to worry about
writing makefiles. Instead we just add files and references
(dependencies) to projects, which will be compiled and linked in the
right order automatically.
There are several shortcuts / functions that will make VS a real pleasure:
- CTRL + SPACE, which forces IntelliSense to open.
- CTRL + ., which opens the menu if VS shows an option point (this will happen if a namespace is missing or if we rename a variable).
- F5, to build and start debugging.
- CTRL + F5, to build and execute without debugging.
- F6, just build.
- F10, to jump to the next line within the current function (step over) when debugging.
- F11, to jump to the line in the next or current function (step into) when debugging.
- F12, to go to the definition of the identifier at the caret's position.
- SHIFT + F12, to find all references of the identifier at the caret's position.
Of course those keyboard shortcuts can be changed and are not
required. Everything that can be done with shortcuts is also accessible
by using the mouse. On the other hand there are more options and
possibilities than shortcuts.
Where do we get Visual Studio? There are several options, some of
them even free. Students enrolled in a university that participates
in the DreamSpark / MSDNAA program usually have the option to download
Visual Studio (up to Ultimate) for free. Otherwise one can download
public beta versions or language-bound specialized versions of
Visual Studio, called Express Editions.
In this tutorial we will only focus on console applications. GUI will
be introduced in the next tutorial. To create a new console application
project in VS we simply use the File menu: we select New, then
Project. In the dialog we select C# on the left side and then Console
Application on the right side. Finally we can give our project a name
like SampleConsoleApp. That's it! We have already created our first C#
application.
Basic concepts
C# is a managed, statically and strongly typed language with a C-like
syntax and object-oriented features similar to Java. All in all one can
say that C# is very close to Java to start with. There are some really
great features in the current version of C#, but in this first tutorial
we exclude them.
The term managed means two things. First of all, we do not need to care
about memory anymore. This means that people coming from C or C++
can stop worrying about freeing the memory allocated for their objects.
We only create objects and do something with them. Once we have stopped
using them, a smart program called the Garbage Collector (GC) will take
care of them. The next figure shows approximately how the Garbage
Collector works. It will detect unreferenced objects, collect them and
free the corresponding memory. We do not have much control over the
point in time when this happens. Additionally the GC will do some
memory optimization; however, this is usually not done directly after
freeing the memory.
This results in some overhead on the memory and performance side, but
has the advantage that it is basically impossible to have segmentation
faults in C#. However, memory leaks are still a problem if we keep
references to objects that are no longer required.
The statically typed part means that the C# compiler needs to
know the exact type of every variable and that the type-system must be
coherent. There is no cast to a void datatype that would let us do
basically anything; however, we have a datatype called Object on top,
which might result in similar problems. We will discuss the
consequences of the Object base type later on. The strong part gives us
a hint that operations may only be used if the operation is defined for
the elements. A conversion that could lose information never happens
without our knowledge; it has to be written as an explicit cast.
We have another consequence of C# being managed: C# is neither native
nor interpreted; it is something in between. The compiler does not
generate assembly code, but the so-called Microsoft Intermediate
Language (MSIL). This trick saves us the re-compilation for different
platforms. In the case of C# we just compile once and obtain a so-called
Common Language Runtime (CLR) assembly. This assembly will be
Just-In-Time (JIT) compiled at runtime. Another feature is that
optimizations also take place at runtime: frequently occurring method
calls will be inlined and statements that are not required will be
omitted automatically.
In theory this could result in (depending on the used references)
platform-independent programs; however, this implies that all platforms
meet the requirements to start and JIT compile CLR assemblies. Right now
Microsoft limits the .NET-Framework to the Windows family; however,
Xamarin offers a product called Mono, which gives us a solution that
also works on Linux and Mac.
Coming back to the language itself we will see that the
object-oriented features are inspired by those of Java. In C# only a
slightly different set of keywords has been used. The extends keyword
of Java has been replaced by the colon (:), as in C++. There are other
areas where the colon is used as an operator with a different (but
related) meaning.
Let's have a look at a sample Hello Tech.Pro code.
//Those are the namespaces that are (by default) included
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
//By default Visual Studio creates already a namespace
namespace HelloWorld
{
    //Remember: Everything (variables, functions) has to be encapsulated
    class Program
    {
        //The entry point is called Main (capital M) and has to be static
        static void Main(string[] args)
        {
            //Writing something to the console is easily possible using the
            //class Console with the static method Write or WriteLine
            Console.WriteLine("Hello Tech.Pro!");
        }
    }
}
This looks quite similar to Java, except for the casing. Comments can be
placed by using slashes (comment goes until the end of the line with no
possibility to switch back before), or using a slash and an asterisk
character. In the latter case the comment has to be ended with the
reverse, i.e. an asterisk and a slash. This kind of comment is called a
block comment. Visual Studio also has a special mode of comments, that
is triggered once three slashes are entered.
Back to the code: While we do not need to place our classes in a
namespace, Visual Studio creates one by default. The creation of a class
is, however, required. Every method and variable needs to be
encapsulated. This is why C# is considered to be
strongly-object-oriented. Data encapsulation is, as we will see, an
important feature of object-oriented programming and therefore necessary
in C#.
There are already some other things that we can learn from this small
sample. First, C# uses (as it has to) a class called Console for
console based interaction. This class only contains so-called static
members like the WriteLine function. A static member is a member that
can be accessed without creating an instance of the class, i.e. static
members cannot be reproduced. They exist only once and exist without
being explicitly created. Functions of a class are called methods.
The main entry point called Main can only exist once, which is why it
has to be static. Another reason is that it has to live in a class. If
it were not static, the class would first have to be instantiated (as
all non-static members can only be accessed through created objects,
called instances of a class). Now we have a chicken-and-egg problem: how
can we tell the program to build an instance of the class (such that the
Main method can be accessed) without having a method where we start
doing something? Hence the requirement that the Main method is static.
Another thing we can learn is that the parameter args is obviously an
array. It seems that the array is part of the datatype, whereas in C
the variable is the array and the type would be string. This is by
design and has some useful implications. We will later see that every
array is based on an object-oriented principle called inheritance,
which specifies a basic set of fields and implementations. Every array
type has a property called Length, which contains the number of
elements in the array. The length is no longer hidden and does not need
to be passed as an additional parameter, which is why the standard Main
method of a C# program only has one parameter, compared to the standard
C main function with two parameters.
Namespaces
Namespaces might be a new concept for people coming from the C
programming language. Namespaces try to bring order to the world of
types. Even though C# allows us to have multiple methods with the same
name as long as each has a unique signature (called method overloading;
the signature consists of the name and the parameter list, not the
return type), it does not allow us to use the same name for two types.
This restriction could result in serious problems. If we consider the
case of using (requiring) two (independent) internal libraries, it is
possible that both libraries define a type with the same name. Now there
would be no way to use both types (if compiling were possible at
all!). At best we could use one type.
This is where namespaces come to the rescue. A namespace is like a
container in which we can place types. However, that container is only a
string that will be used by the compiler to distinguish between types.
Even though the dot (like in a.b) is usually used to create the
impression of a relation between two strings (in this case a and b),
there is no restriction that a namespace has to exist before creating
another one in it, e.g. a does not have to be defined or used somewhere
before using a.b.
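The name clash described above can be sketched as follows. This is only an illustration: the namespaces LibraryA.Tools and LibraryB.Tools, the type Logger and the class NamespaceDemo are hypothetical names, not part of any real library. Note that LibraryA is never declared on its own before LibraryA.Tools is used.

```csharp
using System;

// Hypothetical namespaces; both define a type with the same name.
namespace LibraryA.Tools
{
    public class Logger
    {
        public string Describe() { return "Logger from LibraryA.Tools"; }
    }
}

namespace LibraryB.Tools
{
    public class Logger
    {
        public string Describe() { return "Logger from LibraryB.Tools"; }
    }
}

namespace SampleConsoleApp
{
    public static class NamespaceDemo
    {
        public static string DescribeBoth()
        {
            // The namespace in front of the type name tells the compiler
            // which of the two Logger types we mean.
            LibraryA.Tools.Logger first = new LibraryA.Tools.Logger();
            LibraryB.Tools.Logger second = new LibraryB.Tools.Logger();
            return first.Describe() + " / " + second.Describe();
        }

        public static void Main()
        {
            Console.WriteLine(DescribeBoth());
        }
    }
}
```

Without the namespaces the two Logger definitions would collide and the program would not compile.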
Usually we would have to place that namespace in front of every type;
however, with C#'s using keyword we tell the compiler to do that
implicitly for all types of the used namespace. Types of the current
namespace are also implicitly used by the compiler. There is only one
exception: if we want to use a type that is defined in multiple used
namespaces, we always have to explicitly specify the namespace of the
type we want to use.
Datatypes and operators
Before we can actually start doing something we need to introduce the
basic set of datatypes and operators. As already mentioned C# is a
static strongly-typed language, which is why we need to care about
datatypes. In the end we will have to specify what's the type behind any
variable.
There is a set of elementary datatypes, which contains the following types:
- bool (1 byte), used for logical expressions (true, false)
- char (2 bytes), used for a single (Unicode) character
- short (2 bytes), used for storing short integers
- int (4 bytes), used for computations with integers
- long (8 bytes), used for computations with long integers
- float (4 bytes), used for computations with single precision floating point numbers
- double (8 bytes), used for computations with double precision floating point numbers
- decimal (16 bytes), used for computations with fixed precision floating point numbers
There are also modified datatypes like unsigned integer (uint),
unsigned long (ulong) and many more available. Additionally a very
capable class for working with strings is included. The class is just
called string and works as one would expect, except that a lot of useful
helpers are directly available.
Types alone are quite boring, since we can only instantiate them,
i.e. create objects based on their definition. It gets more interesting
once we connect them by using operators. In C# we have the same set of
operators (and more) as in C:
- Logical operators like ==, !=, <, <=, >, >=, !, &&, ||
- Bit operators like ^, &, |, <<, >>
- Arithmetic operators like +, -, *, /, %, ++, --
- The assignment operator = and combinations of the assignment operator with the binary operators
- The ternary operator ? : (inline condition) to return either the left or right side of the colon depending on a condition specified before the question mark
- Brackets () to change the operator hierarchy
Additionally C# has some inbuilt methods (defined as unary operators)
like typeof or sizeof and a set of really useful type operators:
- The standard cast operator () as in C
- The reference cast operator as
- The type conformance checker is
- The null-coalescing operator ??
- The inheritance operator :
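A minimal sketch of how the type operators from this list behave. The class TypeOperatorDemo and its helper methods are hypothetical names chosen for illustration; only the operators themselves are part of C#.

```csharp
using System;

public static class TypeOperatorDemo
{
    // Returns the length of the text if value is a string, otherwise -1.
    public static int LengthOrMinusOne(object value)
    {
        // "as" yields null instead of failing if the cast is not possible.
        string text = value as string;
        return text != null ? text.Length : -1;
    }

    // Returns "unknown" if name is a null reference.
    public static string NameOrDefault(string name)
    {
        return name ?? "unknown";
    }

    public static void Main()
    {
        object boxed = 42;
        Console.WriteLine(boxed is int);             // True
        Console.WriteLine(boxed is string);          // False
        Console.WriteLine((int)boxed + 1);           // 43
        Console.WriteLine(LengthOrMinusOne("abc"));  // 3
        Console.WriteLine(LengthOrMinusOne(boxed));  // -1
        Console.WriteLine(NameOrDefault(null));      // unknown
        Console.WriteLine(typeof(int));              // System.Int32
        Console.WriteLine(sizeof(int));              // 4
    }
}
```

The standard cast (int)boxed would fail at runtime if boxed did not contain an int, which is why is and as are the safer choice when the type is not known in advance.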
Let's see some of those types and operators in action:
using System;
class Program
{
    static void Main(string[] args)
    {
        //5 here is an integer literal
        int a = 5;
        //A double value is given by a floating point literal.
        double x = 2.5 + 3.7;
        //A single value is given by a floating point literal with the suffix f.
        float y = 3.1f;
        //Using quotes will automatically generate constant strings
        string someLiteral = "This is a string!";
        string input = Console.ReadLine();
        //int, string, double, float, ... are just keywords (aliases)
        //for Int32, String, Double, Single - which have static and non-static members.
        a = int.Parse(input);//Here we use the static method Parse of Int32.
        //Output of a mod operation - adding string and something (or something
        //and string) will always result in a string. Additionally we have the
        //regular operator ordering which performs a % 10 before the string and
        //the result get concatenated.
        Console.WriteLine("a % 10 = " + a % 10);
    }
}
We will see more operators and basic types in action throughout this
series of tutorials. It should be noted that operators by default
return a new value, leaving the original variables unchanged
(assignments and the increment and decrement operators are the obvious
exceptions). Another important aspect is that there is a fixed operator
hierarchy, which is basically like the operator hierarchy in C. We do
not need to learn it by heart, since we can always use brackets to
change the default hierarchy. Additionally the operator hierarchy is
quite natural in following rules like dot before dash, i.e.
multiplication and division before addition and subtraction.
Reference and value types
A very important concept for understanding C# is the difference between
reference and value types. Once we use a class, we are using a reference
type. On the other side any object that is a struct will be handled as a
value type. The important difference is that reference types will be
passed by reference, i.e. we will only pass a copy of the position of
the object, and not a copy of the object itself. Any modification of the
object will result in a modification of the original object.
The other case is the one with value types. If we give a method an
argument that is a struct (that includes integers, floating point
numbers, a boolean, a character and more) we will get a copy of the
passed object. This means that modifications of the argument will never
result in a modification of the original object.
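The difference can be made visible with a small sketch. PointClass, PointStruct and PassingDemo are hypothetical names used only for this illustration; both types carry the same single field, once as a class and once as a struct.

```csharp
using System;

// Hypothetical types with the same layout: one reference type, one value type.
class PointClass { public int X; }
struct PointStruct { public int X; }

public static class PassingDemo
{
    static void Increment(PointClass p) { p.X++; }   // works on the original object
    static void Increment(PointStruct p) { p.X++; }  // works on a copy

    public static int AfterClassCall()
    {
        PointClass p = new PointClass();
        p.X = 1;
        Increment(p);
        return p.X;
    }

    public static int AfterStructCall()
    {
        PointStruct p = new PointStruct();
        p.X = 1;
        Increment(p);
        return p.X;
    }

    public static void Main()
    {
        Console.WriteLine(AfterClassCall());  // 2
        Console.WriteLine(AfterStructCall()); // 1
    }
}
```

The method bodies are identical; only the kind of type decides whether the caller sees the change.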
The whole concept is quite similar to the concept of pointers in
C. The only difference is that we do not actually have access to the
address and we do not have to dereference the variable to get access to
the values behind it. One could say that C# handles some of the things
automatically that we would have to write by hand in C/C++.
In C# we have access to two keywords that can be passed together with
the argument definition. One important keyword is called ref. This
allows us to access the original value in the case of passing in
structures. For classes the consequence is that the original pointer is
actually passed in, allowing us to change the position it points to.
This sounds quite strange at first, but we will see that in most cases
there is no difference between an explicit ref on a class instance and
implicitly passing the reference by just passing the instance. The only
difference is that we can actually reset the pointer, like in the
following code.
using System;
class Program
{
    static void Main()
    {
        //A string is a class, which can be instantiated like this
        string s = "Hi there";
        Console.WriteLine(s);//Hi there
        ChangeString(s);
        Console.WriteLine(s);//Hi there
        ChangeString(ref s);
        Console.WriteLine(s);//s is now a null reference
    }
    static void ChangeString(string str)
    {
        str = null;
    }
    static void ChangeString(ref string str)
    {
        str = null;
    }
}
The other important keyword is called out. Basically out is like ref.
There are a few differences that are worth mentioning:
- out variables need to be assigned in the method that has them as parameters.
- out variables do not need to be assigned in the method that passes them as arguments.
- out variables mark the variable as being used for (additional) outgoing information.
The main usage of out parameters is as the name suggests: We now have
an option to distinguish between "just" references and parameters, which
will actually return something. The .NET-Framework uses out parameters
in some scenarios. Some of those scenarios are the TryParse methods
found on the basic structures like int, double and others. Here a bool
is returned, giving us an indicator if the given string can be
converted. In case of a successful conversion the variable given in form
of an out parameter will be set to the corresponding value.
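A short sketch of the TryParse pattern described above. int.TryParse is part of the .NET-Framework; the class TryParseDemo and the helper ParseOrFallback are hypothetical names for this illustration.

```csharp
using System;

public static class TryParseDemo
{
    // Returns the parsed number, or the fallback if the text is not a valid integer.
    public static int ParseOrFallback(string text, int fallback)
    {
        int value; // no assignment needed before passing it as an out argument
        if (int.TryParse(text, out value))
        {
            return value;
        }
        return fallback;
    }

    public static void Main()
    {
        Console.WriteLine(ParseOrFallback("42", -1));   // 42
        Console.WriteLine(ParseOrFallback("abc", -1));  // -1
    }
}
```

Compared to int.Parse, which fails for invalid input, the combination of a bool return value and an out parameter lets us handle bad input with a simple condition.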
All the talk about reference (class) and value (struct) types is
useless if we do not know how to create such types. Let's have a look at
a simple example:
using System;
class Program
{
    static void Main(string[] args)
    {
        //Usually C# forbids us to leave variables uninitialized
        SampleClass sampleClass;
        SampleStruct sampleStruct;
        //However, C# thinks that an out-Function will do the initialization
        HaveALook(out sampleClass);
        HaveALook(out sampleStruct);
    }
    static void HaveALook(out SampleClass c)
    {
        //Insert a breakpoint here to see the
        //value of c before the assignment:
        //It will be null...
        c = new SampleClass();
    }
    static void HaveALook(out SampleStruct s)
    {
        //Insert a breakpoint here to see the
        //value of s before the assignment:
        //It will NOT be null...
        s = new SampleStruct();
    }
    //In C# you can create nested classes
    class SampleClass
    {
    }
    //A structure always inherits ONLY from object,
    //we cannot specify other classes (more on that later)
    //However, an arbitrary number of interfaces can
    //be implemented (more on that later as well)
    struct SampleStruct
    {
    }
}
In the example above we are creating two types called SampleClass and
SampleStruct. We can instantiate new objects from those type definitions
using the new keyword. This is not new (pun intended) for programmers
coming from Java or C++, but certainly something new for programmers
coming from C. In C we would use the malloc function in case of a class
(giving us a pointer) and nothing with a structure (giving us a value).
There is, however, one big advantage of using that new keyword: It will
not only do the right memory allocation (on the heap in case of a
reference type, or typically the stack in case of a value type), but
also call the corresponding constructor. We will later see what a
constructor is, and what kind of benefits it gives us.
Coming back to our example we see that we do not instantiate anything
with the new keyword in the Main method; however, we do use it in the
methods called HaveALook, which differ by the parameter type they
expect. Using breakpoints in those methods we can see that the class
variable is actually NOT set (the passed in value of the variable is
null, which is the constant for a pointer that is not set), while the
structure already has some value.
Control flow
Now that we have introduced the basic concepts behind C#, as well as
elementary datatypes and available operators, we just need one more
thing before we can go ahead and write actual C# programs. We
need to know how to control the program flow, i.e. how to introduce
conditions and loops.
This is pretty much the same as in C, as we are dealing with a C-style syntax. That being said we have to follow these rules:
- Conditions can be introduced by using if or switch.
- A loop is possible by using for, while, do-while.
- A loop will be stopped by using the break keyword (will stop the innermost loop).
- A loop can skip the rest and return to the condition with the continue keyword.
- C# also has an iterator loop called foreach.
- Another possibility is the infamous goto statement - we will not discuss this.
- There are other ways of controlling the program flow, but we will introduce those later.
The next program code will introduce a few of the mentioned possibilities.
using System;
class Program
{
    static void Main(string[] args)
    {
        //We get some user input with the ReadLine() method
        string input = Console.ReadLine();
        //Let's see if this is empty or not
        if(input == "")
        {
            Console.WriteLine("The input is empty!");
        }
        else
        {
            //A string is a char array and has a Length property
            //It also can be accessed like a char array, giving
            //us single chars.
            for(int i = 0; i < input.Length; i++)
            {
                switch(input[i])
                {
                    case 'a':
                        Console.Write("An a - and not ... ");
                        //This is always required, we cannot just fall through.
                        goto case 'z';
                    case 'z':
                        Console.WriteLine("A z!");
                        break;
                    default:
                        Console.WriteLine("Whatever ...");
                        //This is required even in the last block
                        break;
                }
            }
        }
    }
}
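The keywords while, do-while, break and continue work as in C. The following is a minimal sketch; LoopDemo, SumOdd and CountDigits are hypothetical names chosen for this illustration.

```csharp
using System;

public static class LoopDemo
{
    // Sums the odd numbers from 1 up to limit, but stops once the sum exceeds 20.
    public static int SumOdd(int limit)
    {
        int sum = 0;
        int i = 0;
        while (i < limit)
        {
            i++;
            if (i % 2 == 0)
            {
                continue; // skip even numbers and go back to the condition
            }
            sum += i;
            if (sum > 20)
            {
                break; // leave the loop early
            }
        }
        return sum;
    }

    // Counts the decimal digits of a non-negative number.
    // The body of a do-while loop runs at least once.
    public static int CountDigits(int number)
    {
        int digits = 0;
        do
        {
            digits++;
            number /= 10;
        } while (number > 0);
        return digits;
    }

    public static void Main()
    {
        Console.WriteLine(SumOdd(5));        // 9  (1 + 3 + 5)
        Console.WriteLine(SumOdd(100));      // 25 (1 + 3 + 5 + 7 + 9, then break)
        Console.WriteLine(CountDigits(0));   // 1
        Console.WriteLine(CountDigits(1234));// 4
    }
}
```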
The iterator loop called foreach is not available in C. It is possible
to use foreach with every type that defines some kind of iterator. We
will see what this means later on. For now we only have to know that
every array already defines such an iterator. The following code snippet
will use the foreach-loop to output each element of an array.
//Creating an array is possible by just appending [] to any datatype
int[] myints = new int[4];
//This is now a fixed array with 4 elements. Arrays in C# are 0-based
//hence the first element has index 0.
myints[0] = 2;
myints[1] = 3;
myints[2] = 17;
myints[3] = 24;
//This foreach construct is new in C# (compared to C):
foreach(int myint in myints)
{
    //Write will not start a new line after the string.
    Console.Write("The element is given by ");
    //WriteLine will do that.
    Console.WriteLine(myint);
}
There are some restrictions on foreach as compared to for. First of
all, it is in general not as efficient as a for-loop, taking one more
operation to start the loop and always calling the iterator's MoveNext
method for each iteration. Second, we cannot change the current element.
This is due to the fact that foreach operates on iterators, which in
general expose their elements read-only, i.e. a single element cannot be
replaced. This is also required to keep the iteration consistent.
Object-oriented programming
Object-oriented programming is a method of focusing around objects
instead of functions. Therefore the declaration of types is a key aspect
of object-oriented programming. Everything has to be part of a type,
even if it is just static without any instance dependency.
There are downsides of this pattern of course. Instead of writing
sin(), cos(), sign() etc. we have to write Math.Sin(), Math.Cos() and
Math.Sign() since the (very helpful) math functions need to be inside a
type (in this case the class Math) as well.
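As a small illustration of calling such static methods through their type, the following sketch uses Math.Sin, Math.Cos, Math.Sign and Math.Abs from the .NET-Framework. The class MathDemo and the method SinCosIdentity are hypothetical names.

```csharp
using System;

public static class MathDemo
{
    // Evaluates sin(x)^2 + cos(x)^2, which should be 1 up to rounding errors.
    public static double SinCosIdentity(double x)
    {
        return Math.Sin(x) * Math.Sin(x) + Math.Cos(x) * Math.Cos(x);
    }

    public static void Main()
    {
        Console.WriteLine(Math.Sign(-3.5)); // -1
        // Floating point results should be compared with a tolerance.
        Console.WriteLine(Math.Abs(SinCosIdentity(0.7) - 1.0) < 1e-12); // True
    }
}
```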
So what are the key aspects of object-oriented programming?
- Data encapsulation
- Inheritance
- Relations between types
- Declaring dependencies
- Maintainability
- Readability
By creating classes to carry large, reusable packages of data we
provide encapsulation. The inheritance process helps us mark a strong
relation between types and reuse the same basic structure. Encapsulating
functions in types will group what belongs together and reduce code by
omitting otherwise required parameters. Also misuse will be prevented by
default. All in all, the main goal is to reduce maintenance efforts by
improving readability and increasing the compiler's power in error
detection.
The main concept for OOP is the type-focus. The central type is
certainly a class. Structures are also important, but will only be used
in edge cases. Structures make sense if we have only a small payload, or
want to create quite elementary small types that will have stronger
immutable features than classes.
Let's have a look again at how we create a class (the type) and how we create class objects (instances):
//Create a class definition
class MyClass
{
    public void Write(string name)
    {
        Console.WriteLine("Hi there... {0}!", name);
    }
}
//Create a class instance (code to be placed in a method)
MyClass instance = new MyClass();
A class makes sense once we want to reuse a set of methods with a
fixed set of variables in addition to some parameters for those
methods. A class is also very useful once we want to use an already
existing set of variables and / or methods. If we just want a collection
of functions that is unrelated to any set of fixed (called instance
dependent) variables, then we create static classes where we can just
insert static methods and variables. Good examples of such static
classes are the Console and Math classes. They cannot be instantiated
(instances of static classes, i.e. classes that do not contain instance
dependent code, do not make any sense) and provide only functions with a
set of parameters.
Inheritance and polymorphism
Now we are coming to the inheritance issue. To simplify things we can
think of inheritance as a recursive copy paste process by the compiler.
All members of the parent (base) class will be copied.
class MySubClass : MyClass
{
    public void WriteMore()
    {
        Console.WriteLine("Hi again!");
    }
}
As already mentioned the inheritance operator is the colon (:). In this
example we create a new type called MySubClass, which inherits from
MyClass. MyClass has been defined in the previous section and does not
define an explicit inheritance. Therefore MyClass inherits from Object.
Object itself just defines four public instance methods, which are:
- ToString, which is a very comfortable way of defining how an instance of the type should be presented as a string.
- Equals, which is a generic way of comparing two arbitrary objects for equality.
- GetHashCode, which gets a numeric indicator if two objects could be equal.
- GetType, which gets the meta-information on the specific type of the current instance.
These four methods are available on MyClass and MySubClass instances
(copy paste!). Additionally MyClass defines a method called Write, which
will also be available for all MySubClass instances. Finally MySubClass
defines a method called WriteMore, which will only be available for
MySubClass instances.
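Which members end up on which instance can be checked with a short sketch. MyClass and MySubClass are the types from this tutorial; the class InheritanceDemo and its method SubClassIsMyClass are hypothetical additions for illustration.

```csharp
using System;

class MyClass
{
    public void Write(string name)
    {
        Console.WriteLine("Hi there... {0}!", name);
    }
}

class MySubClass : MyClass
{
    public void WriteMore()
    {
        Console.WriteLine("Hi again!");
    }
}

public static class InheritanceDemo
{
    // Every MySubClass instance is also a MyClass instance.
    public static bool SubClassIsMyClass()
    {
        object sub = new MySubClass();
        return sub is MyClass;
    }

    public static void Main()
    {
        MySubClass sub = new MySubClass();
        sub.Write("Flo");  // inherited from MyClass
        sub.WriteMore();   // defined in MySubClass
        Console.WriteLine(sub.GetType() == typeof(MySubClass)); // True, GetType comes from Object
        Console.WriteLine(sub.Equals(sub));                     // True, Equals comes from Object

        MyClass plain = new MyClass();
        plain.Write("Flo");
        // plain.WriteMore(); // would not compile: MyClass has no WriteMore
        Console.WriteLine(SubClassIsMyClass());                 // True
    }
}
```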
Right now the inheritance concept is already a little bit useful, but
it is not very powerful. The concept of polymorphism will enable us to
specialize objects using inheritance. First we will introduce the
virtual keyword. This keyword lets us specify that a method (marked as
virtual) can be re-implemented by more specialized (or derived) classes.
class MyClass
{
    //This method is now marked as re-implementable
    public virtual void Write(string name)
    {
        //Using a placeholder in a string.Format()-able method
        Console.WriteLine("Hi {0} from MyClass!", name);
    }
}
If we now want to re-implement the Write method in the MySubClass
class, then we have to do that explicitly by marking the
re-implementation as override. Let's have a look:
class MySubClass : MyClass
{
    //This method is now marked as re-implemented
    public override void Write(string name)
    {
        Console.WriteLine("Hi {0} from MySubClass!", name);
    }
}
What is the great benefit of this? Let's check out some example code snippet:
//We create two variables of type MyClass
MyClass a = new MyClass();
MyClass b = new MySubClass();
//Now we call the Write() method on each of them, the only
//difference being that in fact b is a more specialized type
a.Write("Flo"); //Outputs ... from MyClass!
b.Write("Flo"); //Outputs ... from MySubClass!
So the trick is that without knowing about the more specialized
instance behind it, we are able to access the specialized implementation
available in MySubClass. This is called polymorphism and
basically states that classes can re-implement certain methods, which
can then be used again without knowing about the specialization or
re-implementation at all.
Already here we can benefit from polymorphism, since we are able to
override the four methods given by Object. Let's consider the following
example:
using System;
class Program
{
    static void Main(string[] args)
    {
        MyClassOne one = new MyClassOne();
        MyClassTwo two = new MyClassTwo();
        Console.WriteLine(one);//Displays a strange string that is basically the type's name
        Console.WriteLine(two);//Displays "This is my own class output"
    }
}
class MyClassOne
{
    /* Here we do not override anything */
}
class MyClassTwo
{
    public override string ToString()
    {
        return "This is my own class output";
    }
}
Here the method WriteLine solves the problem of having to display any
input as a sequence of characters by using the ToString method of
Object. This enables WriteLine to output any object, even objects that
are unknown. All that WriteLine cares about is that the given argument
is actually an instance of Object (that applies to every object in C#),
which means that the argument has a ToString method. Finally the
specific ToString method of the argument is called.
Access modifiers
Access modifiers play an important role in forcing programmers to
adhere to a given object-oriented design. They hide members to prevent
undefined access, define which members take part in the inheritance
process and what objects are visible outside of a library.
Right here we already have to note that all restrictions placed by
modifiers are only artificial. The compiler is the only protector of
those rules. This means that those rules will not prevent unauthorized
access to e.g. a variable during runtime. Therefore setting access
modifiers to spawn some kind of security system is certainly a really
bad idea. The main idea behind those modifiers is the same as with
object-oriented programming: Creating classes that encapsulate data and
force other programmers in a certain pattern of access. This way,
finding the right way of using certain objects should be simpler and
more straight forward.
C# knows a whole bunch of such modifier keywords. Let's have a look at them with a short description:
- private declares that a member is neither visible from outside the object, nor does it take part in the inheritance process.
- protected declares that a member is not visible from outside the object; however, the member takes part in the inheritance process.
- internal declares that a member or type is visible outside the object, but not outside the current library.
- protected internal (also accepted as internal protected) has the meaning of internal OR protected.
- public declares that a member or type is visible everywhere.
Most of the time we can specify the modifier (there are some
exceptions to this rule, as we will see later); however, we can also
always omit it. For types directly placed in a namespace, the default
modifier is internal. For types and members placed in a type (like a
class or structure), the default modifier is private.
This matches a best practice from C++, where the members of a class are
private by default, but the members of a struct are public unless we
start with a private declaration. C# does not make this distinction: if
we omit the modifier, we get the most restrictive sensible one (internal
or private).
using System;
//No modifier, i.e. the class Program is internal
class Program
{
//No modifier, i.e. the method Main() is private
static void Main()
{
MyClass c = new MyClass();
//Works
int num = c.WhatNumber();
//Does not work
//int num = c.RightNumber();
}
}
//MyClass is visible from this library and other libraries
public class MyClass
{
//This one can only be accessed from MyClass
private int a;
//Classes inheriting from MyClass can access b like MyClass can
protected int b;
//No modifier, i.e. the method RightNumber() is private
int RightNumber()
{
return a;
}
//This will be seen from the outside
public int WhatNumber()
{
//Access inside the class is possible
return RightNumber();
}
}
//MySubClass is only visible from this library
internal class MySubClass : MyClass
{
int AnotherRightNumber()
{
//Works
b = 8;
//Does not work - a cannot be accessed since it is private
//return a;
return b;
}
}
There are some restrictions that will be enforced by the compiler. The reverse case of the example above, where we set MyClass to internal and MySubClass to public, is not possible. The compiler detects that having MySubClass visible to the outside requires MyClass to be visible to the outside as well. Otherwise we would have a specialization of a type whose base type is unknown.
The same is true in general, for example when a method that is visible to the outside (a public method of a public type) returns an instance of a type that is internal. In this case the compiler will also tell us that the returned type is less accessible than the method.
In C# every non-static method has access to the class instance pointer variable this. This variable is treated like a keyword and points to the current class instance. Usually the keyword can be omitted before calling methods of the class instance; however, there are multiple scenarios where this is very useful.
One of those scenarios is to distinguish between local variables and variables of the class instance that carry the same name. Consider the following example:
class MyClass
{
string str;
public void Change(string str)
{
//Here this.str is the variable str of the class instance and
//str is the local (passed as parameter) variable
this.str = str;
}
}
Since methods marked as static are independent of instances, we cannot use the this keyword in them. In addition to the this pointer there is also a base pointer, which gives us access to all members of the base class instance that are accessible from the derived class. This way it is possible to call methods that have been re-implemented or hidden.
class MySubClass : MyClass
{
public override void Write(string name)
{
//First we want to use the original implementation
base.Write(name);
//Then our own
Console.WriteLine("Hi {0} from MySubClass!", name);
}
}
In the example, we are accessing the original implementation of the Write method from the re-implementation.
Properties
People coming from C++ will know the problem of restricting access to
variables of a class. In general one should never expose variables of
a class in such a way that other classes could change them without the
class being notified. Therefore the following piece of code was written
quite often in C++ (code given in C#):
private int myVariable;
public int GetMyVariable()
{
return myVariable;
}
public void SetMyVariable(int value)
{
myVariable = value;
}
This is clean code, and we (as developers) now have the possibility
to react to external changes of the variable by inserting some lines of
code before myVariable = value. The problem with this code is that
- we really only want to show that this is just a wrapper around myVariable, and that
- we need to write too much code for this simple pattern.
Therefore the C# team introduced a new language feature called properties. Using properties the code above boils down to:
private int myVariable;
public int MyVariable
{
get { return myVariable; }
set { myVariable = value; }
}
This looks much cleaner now. Also the access changed. Before we accessed myVariable like a method (using a = GetMyVariable() or SetMyVariable(b)), but now we access myVariable like a variable (using a = MyVariable or MyVariable = b). This is more like the programmer's original intention and saves us some lines of code.
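A property is more than a shortcut, because the set block is a natural place to react to changes. The following sketch assumes a hypothetical Thermostat class whose setter clamps the incoming value to an arbitrarily chosen range of 5 to 30; neither the class nor the range comes from the tutorial.

```csharp
using System;

// Hypothetical class for illustration only
class Thermostat
{
    private int temperature;

    public int Temperature
    {
        get { return temperature; }
        set
        {
            // React to the external change before storing it:
            // clamp the value to the (assumed) range of 5 to 30
            if (value < 5)
                temperature = 5;
            else if (value > 30)
                temperature = 30;
            else
                temperature = value;
        }
    }
}

class Program
{
    static void Main()
    {
        Thermostat t = new Thermostat();

        t.Temperature = 21;
        Console.WriteLine(t.Temperature); // 21

        t.Temperature = 100;
        Console.WriteLine(t.Temperature); // 30, the setter clamped the value
    }
}
```

The caller still writes a plain assignment, while the class keeps full control over what ends up in its variable.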
Internally the compiler will still create those (get / set) methods,
but we do not care about this. We will just use properties with either a
get block, a set block, or both, and everything will work.
The constructor
The constructor is a special kind of method that can only be called
implicitly and never explicitly. A constructor is automatically called
when we allocate memory with the new keyword. In perfect alignment with
standard methods, we can overload the constructor by having multiple
definitions that differ by their parameters.
Every class (and structure) has at least one constructor. If we did
not write one (until now we did not), then the compiler places a
standard (no parameters, empty body) constructor. Once we define one
constructor, the compiler does not insert a default constructor.
The signature of a constructor is special. It has no return type,
since it does implicitly return the new instance, i.e. an instance of
the class. Also a constructor is defined by its name, which is the same
name as the class. Let's have a look at some constructors:
class MyClass
{
public MyClass()
{
//Empty default constructor
}
public MyClass(int a)
{
//Constructor with one argument
}
public MyClass(int a, string b)
{
//Constructor with two arguments
}
public MyClass(int a, int b)
{
//Another constructor with two arguments
}
}
This looks quite straightforward. In short, a constructor is a
method with the name of the class that specifies no return type. Using
the various constructors is possible when instantiating an object of the
class.
MyClass a = new MyClass();//Uses the default constructor
MyClass b = new MyClass(2);//Uses the constructor with 1 argument
MyClass c = new MyClass(2, "a");//Uses the constructor with 2 arguments
MyClass d = new MyClass(2, 3);//Uses the other constructor with 2 arguments
Of course it could be that one constructor would need to do the same
work as another constructor. In this case it seems like we only have two
options:
- Copy & paste the content.
- Extract the content into a method, which is then called by both constructors.
The first is a simple no-go according to the DRY (Don't Repeat
Yourself) principle. The second one may also not be ideal, since the
method could be misused in other locations. Therefore C# introduces the
concept of chaining constructors. Before we actually execute
instructions from one constructor, we call another constructor. The
syntax relies on the colon : and the current class instance pointer this:
class MyClass
{
public MyClass()
: this(1, 1) //This calls the constructor with 2 arguments
{
}
public MyClass(int a, int b)
{
//Do something with a and b
}
}
Here the default constructor uses the constructor with 2 parameters
to do some initialization work. The initialization work is the most
popular use-case of a constructor. A constructor should be a lightweight
method that does some preprocessing / setup / variable initialization.
The colon operator for the constructor chaining is used for a reason.
As with inheritance, every constructor has to call another
constructor. If no call is specified (i.e. no previous constructor),
then the called constructor is the default constructor of the base
class. Therefore the second constructor in the previous example
actually looks like the following:
public MyClass(int a, int b)
: base()
{
//Do something with a and b
}
The additional line is, however, redundant, since the compiler will
automatically insert it. There are only two cases where we have to
specify the base constructor for the constructor chaining:
- When we actually want to call another base constructor than the default constructor of the base class.
- When there is no default constructor of the base class.
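The second case can be sketched as follows. The classes Vehicle and Car, their members and the number of wheels are assumptions made for this illustration only.

```csharp
using System;

// Hypothetical classes for illustration only
class Vehicle
{
    private int wheels;

    // The only constructor takes a parameter,
    // so Vehicle has no default constructor
    public Vehicle(int wheels)
    {
        this.wheels = wheels;
        Console.WriteLine("Vehicle constructor");
    }

    public int Wheels
    {
        get { return wheels; }
    }
}

class Car : Vehicle
{
    // Without ": base(4)" this class would not compile,
    // since the compiler cannot insert a call to base()
    public Car()
        : base(4)
    {
        Console.WriteLine("Car constructor");
    }
}

class Program
{
    static void Main()
    {
        Car car = new Car();
        Console.WriteLine("Wheels: {0}", car.Wheels);
    }
}
```

Running this prints "Vehicle constructor" before "Car constructor", followed by "Wheels: 4". The base class constructor always runs first, because the derived part can only be set up once the base part exists.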
The reason for the constructor chaining with the base class constructor is illustrated in the next figure.
We see that in this class hierarchy, in order to create an instance of Porsche, an instance of Car has to be created. This creation, however, requires the creation of an instance of a Vehicle, which requires the instantiation of Object. Each instantiation is associated with calling a constructor, which has to be specified. The C# compiler will automatically call the empty constructor, but this is only possible in case such a constructor exists. Otherwise we have to tell the compiler explicitly what to call.
There are also cases where other access modifiers for a constructor
might make sense. If we want to prevent instantiation of a certain type
(like with abstract), we could create one default constructor and make
it protected. On the other hand the following is a simple so-called
Singleton pattern:
class MyClass
{
private static MyClass instance;
private MyClass() { }
public static MyClass Instance
{
get
{
if(instance == null)
instance = new MyClass();
return instance;
}
}
}
Now we cannot create instances of the class, but we can access the static property Instance by using MyClass.Instance. This property not only has access to the static variable instance, but also has access to all private members like the private constructor. Therefore, it can create an instance and return the created instance.
This implementation has two main advantages:
- Because the instance is created inside the Instance property method, the class can exercise additional functionality (for example, instantiating a subclass), even though it may introduce unwelcome dependencies.
- The instantiation is not performed until an object asks for an instance. This approach is referred to as lazy instantiation. This avoids instantiating unnecessary singletons when the application starts.
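The lazy behavior can be observed with a small sketch. It repeats the singleton from above and only adds an output line to the private constructor (an addition made purely for this illustration). Note that this simple variant is not thread-safe, which is fine for a single-threaded console program.

```csharp
using System;

class MyClass
{
    private static MyClass instance;

    private MyClass()
    {
        // Output added for illustration, to show when the instance is created
        Console.WriteLine("Constructor called");
    }

    public static MyClass Instance
    {
        get
        {
            if (instance == null)
                instance = new MyClass();
            return instance;
        }
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine("Before first access");

        MyClass first = MyClass.Instance;
        MyClass second = MyClass.Instance;

        // Both variables refer to the very same object
        Console.WriteLine(Object.ReferenceEquals(first, second));
    }
}
```

The output is "Before first access", then "Constructor called" exactly once, then "True".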
We will not discuss other design patterns in this series of tutorials.
Abstract classes and interfaces
There is one more thing we need to discuss in this tutorial.
Sometimes we want to create classes that should just be a sketch for
some more specialized implementations. This is like creating a template
for classes. We do not want to use the template directly (instantiate
it), but we want to derive from the class (use the template), which
should save us some time. The keyword for marking a class as being a
template is abstract. Abstract classes cannot be instantiated, but can
of course be used as types. Such a class can also mark members as being
abstract. This will require derived classes to deliver the
implementation:
abstract class MyClass
{
public abstract void Write(string name);
}
class MySubClass : MyClass
{
public override void Write(string name)
{
Console.WriteLine("Hi {0} from an implementation of Write!", name);
}
}
Here we mark the Write method as being abstract, which has two consequences:
- There is no method body (the curly brackets are missing) in the first method definition.
- MySubClass is required to override the Write method (or in general: all methods that are marked abstract and not implemented yet).
Also the following code will fail to compile, since we try to create an instance of MyClass, which is now marked as being abstract.
//Ouch, MyClass is abstract!
MyClass a = new MyClass();
//This works fine, MyClass can still be used as a type
MyClass b = new MySubClass();
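The template idea becomes clearer when the abstract class also contains finished code. In the following sketch the hypothetical Report class (all names here are invented for illustration) implements Print once and leaves only the variable parts to derived classes.

```csharp
using System;

// Hypothetical classes for illustration only
abstract class Report
{
    // Shared implementation that every derived class gets for free
    public void Print()
    {
        Console.WriteLine("--- {0} ---", GetTitle());
        Console.WriteLine(GetBody());
    }

    // Derived classes have to fill in these parts
    protected abstract string GetTitle();
    protected abstract string GetBody();
}

class StatusReport : Report
{
    protected override string GetTitle()
    {
        return "Status";
    }

    protected override string GetBody()
    {
        return "Everything is fine.";
    }
}

class Program
{
    static void Main()
    {
        // The abstract class serves as the type, the subclass as the instance
        Report report = new StatusReport();
        report.Print();
    }
}
```

This prints "--- Status ---" followed by "Everything is fine.". Another kind of report would only need to override the two abstract methods.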
An important restriction in doing OOP with C# is the limitation to
inheritance from one class only. If we do not specify a base class, then
Object will be used implicitly; otherwise the explicitly specified class
will be used. The restriction to one class in the inheritance process
makes sense, since it keeps everything well-defined and prohibits weird
edge cases. There is an elegant way around this limitation, which builds
upon using so-called interface types.
An interface is like a code-contract. Interfaces define which
functionalities should be provided by the classes or structures that
implement them, but they do not say a word about what the exact
implementation looks like. That being said, we can think of those
interfaces as abstract classes without variables and with only abstract
members (methods, properties, ...).
Let's define a very simple interface:
interface MyInterface
{
void DoSomething();
string GetSomething(int number);
}
The defined interface contains two methods called DoSomething and GetSomething. The definitions of these methods look very similar to the definitions of abstract methods, except we are missing the keywords public and abstract. This is by design. The idea is that since every member of an interface is abstract (or to be more precise: misses an implementation), the keyword is redundant. Another feature is that every method is automatically treated as public.
Implementing an interface is possible by using the same syntax as with classes. Let's consider two examples:
class MyOtherClass : MyInterface
{
public void DoSomething()
{ }
public string GetSomething(int number)
{
return number.ToString();
}
}
class MySubSubClass : MySubClass, MyInterface
{
public void DoSomething()
{ }
public string GetSomething(int number)
{
return number.ToString();
}
}
This snippet should demonstrate a few things:
- It is possible to implement only one interface and no class (this will result in inheriting directly from Object).
- We can also implement one or more interfaces and additionally a class (explicit inheritance).
- We always have to implement all methods of the "inherited" interface(s).
- As a side note, we do not need to re-implement the Write method in MySubSubClass, since MySubClass already implements it.
It should be clear that we cannot instantiate interfaces (they are
like abstract classes), but we can use them as types. Therefore it would
be possible to do the following:
MyInterface myif = new MySubSubClass();
Usually interface types start with a capital I in the .NET-Framework.
This is a useful convention to recognize interfaces immediately. In our
journey, we will discover some useful interfaces that are quite
important for the .NET-Framework. Some of these interfaces are used by
C# implicitly.
Interfaces also give us another option for implementing their
methods. Since we can implement multiple interfaces, it is possible that
two methods with the same name and signature will be included. In this
case there must be a way to distinguish between the different
implementations. This is possible by a so-called explicit
implementation. An explicitly implemented interface will not contribute
to the class directly. Instead one has to cast the class to the specific
interface type in order to access the members of the interface.
Here is an explicit implementation:
class MySubSubClass : MySubClass, MyInterface
{
//Explicit (no public and MyInterface in front)
void MyInterface.DoSomething()
{ }
//Explicit (no public and MyInterface in front)
string MyInterface.GetSomething(int number)
{
return number.ToString();
}
}
Explicit and implicit implementations of definitions from an
interface can be mixed. Hence we can only be sure to get access to all
members defined by an interface, if we cast an instance to that
interface.
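The name clash mentioned above can be sketched like this. The interfaces IPrinter and IScanner and the class Device are hypothetical and were chosen only to show two explicit implementations of a method with the same name and signature.

```csharp
using System;

// Hypothetical interfaces that happen to declare the same method
interface IPrinter
{
    string Describe();
}

interface IScanner
{
    string Describe();
}

class Device : IPrinter, IScanner
{
    // Explicit implementations: no access modifier, interface name in front
    string IPrinter.Describe()
    {
        return "I can print";
    }

    string IScanner.Describe()
    {
        return "I can scan";
    }
}

class Program
{
    static void Main()
    {
        Device device = new Device();
        // device.Describe() would not compile here, since both
        // implementations are explicit and not part of Device itself

        IPrinter printer = device;
        IScanner scanner = device;

        Console.WriteLine(printer.Describe()); // I can print
        Console.WriteLine(scanner.Describe()); // I can scan
    }
}
```

The same object answers differently depending on the interface type through which it is accessed.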
Exception handling
There are many things that have been designed with OOP in mind in C#.
One of those things is exception handling. Every exception has to
derive from the Exception class, which has been placed in the System
namespace. In general we should always try to avoid exceptions;
however, there are cases where an exception could easily happen. One
such example is found in communication with the file system. Here we are
talking to the OS, which sometimes has no other choice than to throw an
exception. There could be various reasons, e.g.:
- The given path is invalid.
- The file cannot be found.
- We do not have sufficient rights to access the files.
- The file is corrupt and cannot be read.
Of course the OS API could just return a pseudo file or pseudo
content and everything would work. The problem with such a handling is
that this does not represent reality, and we would have no way to detect
that obviously something went wrong. Another option would be to return
an error code, but this would result in a C-like API and it would leave
the handling to the programmer. If the programmer now would do a bad job
(like ignoring the returned error code), the user would never see that
something went wrong.
Here is where exceptions come into play. The important thing about an
exception is that once an exception is possible, we should think about
handling it. In order to handle such an exception, we need a way to
react to it. The construct is the same as in C++ or Java: Thrown
exceptions can be caught.
try
{
FunctionWhichMightThrowException();
}
catch
{
//React to it
}
In the example, we call a method named FunctionWhichMightThrowException. Calling this method might result in an exception, which is why we put it in a try-block. The catch-block is only entered if an exception is thrown, otherwise it will be ignored. What this example is not capable of doing is reacting to the specific exception. Right now we just react to any exception, without touching the exception that has been thrown. This is, however, very important and should therefore be done:
try
{
FunctionWhichMightThrowException();
}
catch(Exception ex)
{
//React to it e.g.
Console.WriteLine(ex.Message);
}
Since every exception has to derive from Exception, this will always work and we will always be able to access the property Message. This is a so-called catch-all block. Sometimes, however, we want to distinguish between the various exceptions. Coming back to our example with the file system above, we can expect that every unique scenario (e.g. path invalid, file not found, insufficient rights, ...) will throw a different kind of exception. We could differentiate between those exceptions by defining more catch-blocks:
byte[] content = null;
try
{
content = File.ReadAllBytes(/* ... */);
}
catch (PathTooLongException)
{
//React if the path is too long
}
catch (FileNotFoundException)
{
//React if the file has not been found
}
catch (UnauthorizedAccessException)
{
//React if we have insufficient rights
}
catch (IOException)
{
//React to a general IO exception
}
catch (Exception)
{
//React to any exception that is not yet handled
}
There are two lessons in this example:
- We can specify multiple catch-blocks, each with its own handling. The only limitation is that we have to specify them in such an order that the most specific exception comes first, while the most general one comes last (otherwise the compiler reports an error).
- We do not need to name the variable of the exception. If we name it we will get access to the Exception object, but sometimes we do not care about the specific object. Instead we just want to differentiate between the various exceptions.
Now that we can catch those nasty exceptions, we may want to throw exceptions ourselves. This is done by using the throw keyword. Let's see some sample code:
void MyBuggyMethod()
{
Console.WriteLine("Entering my method");
throw new Exception("This is my exception");
Console.WriteLine("Leaving my method");
}
If we call this method we will see that the second WriteLine method will not be called. Once an exception is thrown the method is left immediately. This goes on until a suitable try-catch-block is wrapping the method call. If no such block is found then the application will crash. This behavior is called bubbling. Alternatively we could have also written our own class that derives from Exception:
class MyException : Exception
{
public MyException()
: base("This is my exception")
{
}
}
Now our code above could have been changed to become the following:
void MyBuggyMethod()
{
Console.WriteLine("Entering my method");
throw new MyException();
Console.WriteLine("Leaving my method");
}
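To make the bubbling visible, here is a minimal self-contained sketch. The exception type MyCustomException and the methods Inner and Outer are hypothetical names introduced only for this illustration.

```csharp
using System;

// Hypothetical exception type for illustration only
class MyCustomException : Exception
{
    public MyCustomException(string message)
        : base(message)
    {
    }
}

class Program
{
    static void Inner()
    {
        Console.WriteLine("Entering Inner");
        throw new MyCustomException("Something went wrong");
    }

    static void Outer()
    {
        // No try-catch here, so the exception bubbles up to the caller
        Inner();
        Console.WriteLine("This line is never reached");
    }

    static void Main()
    {
        try
        {
            Outer();
        }
        catch (MyCustomException ex)
        {
            // The most specific type comes first
            Console.WriteLine("Caught MyCustomException: {0}", ex.Message);
        }
        catch (Exception ex)
        {
            // The most general type comes last
            Console.WriteLine("Caught Exception: {0}", ex.Message);
        }
    }
}
```

The program prints "Entering Inner" and then "Caught MyCustomException: Something went wrong". The exception leaves both Inner and Outer before the first matching catch-block in Main handles it.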
Let us come back once more to our example that plays around with the
file system. In this scenario we might end up with some open file
handle. Therefore, whether we get some exception or not, we want to
close that handle to clean up the open resources. In this scenario
another block would be very helpful: a block that performs a final
action that does not depend on the actions in the try or any catch
block. Of course such a block exists and is called a finally-block.
FileStream fs = null;
try
{
fs = new FileStream("Path to the file", FileMode.Open);
/* ... */
}
catch(Exception)
{
Console.WriteLine("An exception occurred.");
return;
}
finally
{
if(fs != null)
fs.Close();
}
Here we should note that return in one block will still call the code in the finally-block. So in total we have the option of using a try-catch, a try-catch-finally or a try-finally block. The last one will not catch the exception (i.e. the exception will bubble up), but still invoke the code that is given in the finally-block (no matter what happens in the try-block).
Outlook
In the next tutorial we will learn about more advanced features in C#
and extend our knowledge in object-oriented programming. With our
knowledge in C# improving, we are ready to dive more into the
.NET-Framework.