As several people have pointed out, the code in this tutorial does not create a true CSV parser. We did, however, publish a follow-up tutorial that is a fully functional CSV parser.
And here it is: C# Tutorial - Using The Built In OLEDB CSVParser
So you are sitting around and you somehow have 100 Comma Separated Value (CSV) files, and you are not quite sure what the best way to read them is. Well, if you are using Visual Studio and C#, you are in quite a bit of luck, because you can read a CSV file quite easily. With one very small function you can spit out a list of values, conveniently separated into rows and columns. Then you can take this list and use it however you want, perhaps in a DataTable or GridView object.
The parser we are going to build today is extremely simple, and will in fact break on more complicated CSV files (for example, files with quoted fields that contain commas). But for most CSV files it will work fine, and the follow-up tutorial linked above covers a parser that can deal with even the most convoluted CSV files.
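To make that limitation concrete, here is a tiny sketch (the class and method names are hypothetical, for illustration only) that applies the same Split(',') approach we use later to a line whose second field is quoted and contains a comma. Where a spreadsheet program would see three fields, the naive split produces four pieces.

```csharp
using System;

// Hypothetical helper used only to show where a plain Split(',') goes wrong.
public static class NaiveSplitDemo
{
    // Splits one CSV line on every comma, exactly like the simple parser does.
    // It has no notion of quoted fields, so commas inside quotes also split.
    public static string[] SplitLine(string line)
    {
        return line.Split(',');
    }
}
```

For example, SplitLine("1,\"Smith, John\",42") returns the four strings 1, "Smith, " John" and 42, with the quote characters still attached.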
To start off, open up Visual Studio and start a new C# Windows Forms application project, naming it whatever you want. Once your project is up and ready, you need a place to build and call your parser function. If you right-click Form1.cs and choose 'View Code', you will get Form1's code. Inside the main Form1 : Form class, just below the public Form1() constructor, is the perfect place for your function for now. Later on you can move it somewhere more permanent, but for now we will just get the function working.
Sadly, one of the namespaces we will be using is not declared by default
in the standard 'using' statements at the top of our file. But all you
have to do is add it below all the others:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO; //System.IO is not included by default
Now we can tear up some serious code. The first step is to declare our
parser function. It will look something like:
public List<string[]> parseCSV(string path)
{
}
This is a pretty simple function: it takes a string holding the path of the CSV file and spits out a List of string arrays. Now you may be asking, why not just use an array of string arrays (string[][])? Well, C# arrays have a fixed size, so adding a row means allocating a new, larger array and copying the old contents into it. A list, on the other hand, can be added to and removed from, and is just generally a lot more flexible.
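As a rough illustration of that trade-off, this sketch (hypothetical names, not part of the tutorial's parser) grows a string[][] one row at a time with Array.Resize, and then does the same job with a List<string[]>, converting back to a jagged array with ToArray() in case some other code really wants one.

```csharp
using System;
using System.Collections.Generic;

// Illustrative comparison (hypothetical names): growing a jagged array versus a List.
public static class RowStorageDemo
{
    // With a plain string[][], every new row means Array.Resize, which allocates
    // a new array one element longer and copies the existing row references into it.
    public static string[][] AppendToArray(string[][] rows, string[] newRow)
    {
        Array.Resize(ref rows, rows.Length + 1);
        rows[rows.Length - 1] = newRow;
        return rows;
    }

    // A List<string[]> manages its own buffer, so we just call Add;
    // if an array is needed afterwards, ToArray() produces a string[][].
    public static string[][] BuildWithList(IEnumerable<string[]> source)
    {
        var rows = new List<string[]>();
        foreach (string[] row in source)
        {
            rows.Add(row);
        }
        return rows.ToArray();
    }
}
```

List<T> grows its capacity geometrically behind the scenes, so a long run of Add calls stays cheap on average, whereas resizing an array by one element per row copies the whole array every time.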
The next thing we need to do is declare our return variable, which is
really just one line. So inside the function, as the first line, we
have:
List<string[]> parsedData = new List<string[]>();
This is just declaring a List of string arrays that will hold our file information as we read each line. Now the cool thing is that the System.IO namespace has this neat class called StreamReader, which can open a text-based file and read it line by line. This means we can call a single method to get the next whole line, rather than reading the file byte by byte. StreamReader is extremely handy for reading text files and is a perfect candidate for us in this case.
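To see what ReadLine hands back, here is a small sketch that reads text line by line with a StreamReader. The class name is hypothetical, and a MemoryStream stands in for the file (an assumption made so the example needs nothing on disk); giving the StreamReader constructor a path instead works the same way.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

// Sketch: reading text line by line with StreamReader.
// A MemoryStream stands in for a real file so the example is self-contained.
public static class ReadLineDemo
{
    public static List<string> ReadAllLines(string text)
    {
        var lines = new List<string>();
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream))
        {
            string line;
            // ReadLine returns the next line without its "\n" or "\r\n" terminator,
            // and null once the end of the stream is reached.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

Because the line terminators are stripped for us, each string that comes back is ready to be split on commas.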
We are going to declare our StreamReader with a using statement so it will be disposed once it leaves scope, and disposing it closes the file automatically. So declaring the new StreamReader object will look something like:
using (StreamReader readFile = new StreamReader(path))
{
}
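For reference, the using statement is shorthand for a try/finally block that calls Dispose. The hypothetical helper here spells that out by hand, roughly the way the compiler expands it, and simply returns the first line of a file.

```csharp
using System;
using System.IO;

// Roughly what the compiler generates for
//   using (StreamReader readFile = new StreamReader(path)) { ... }
// written out by hand in a hypothetical helper that returns a file's first line.
public static class UsingExpansionDemo
{
    public static string FirstLine(string path)
    {
        StreamReader readFile = new StreamReader(path);
        try
        {
            return readFile.ReadLine();
        }
        finally
        {
            // Runs even if ReadLine throws; disposing the reader also closes the file.
            // The null check mirrors the compiler's expansion.
            if (readFile != null)
            {
                ((IDisposable)readFile).Dispose();
            }
        }
    }
}
```

Writing it with using is shorter and makes it impossible to forget the finally block, which is why the tutorial sticks with it.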
Take notice that the actual declaration is inside the using statement. Inside this statement we will do everything involved in reading the file and building our list of string arrays. A comma separated values file is exactly what you would think: a file full of values separated by commas. Each line corresponds to a row of data, so all we have to do is read the file line by line, then separate the values. Since the StreamReader class can read a file line by line, all we really need to do is take each line and split it. But first we have to declare some variables, inside the using block of course.
We will need two variables, one to hold the line as it is read and an array to hold the separated values. We will call these line and row:
using (StreamReader readFile = new StreamReader(path))
{
    string line;
    string[] row;
    // the reading loop goes here
}
Next we have to read the file line by line, which can be done with a very simple while statement. We will keep reading until ReadLine returns null, which happens when there are no more lines to read. To do this, we assign each line we read to our line variable and check it against null in the loop condition. It will look like so:
while ((line = readFile.ReadLine()) != null)
{
}
A very simple while loop that runs through the file until it reaches the end. *Take note that a blank line is not null: ReadLine returns an empty string for it. A line is only null when there is truly nothing left to read.* Inside this loop, all we need to do is split each line at the commas, then add the resulting array to our list. Luckily, splitting a string is such a common task that the basic string class has a Split() method to do just this. After we split the line, adding it to the list is just as simple: we call our list's Add() method. So with two lines of code we can do what we need to. After our additions our while statement will look like:
while ((line = readFile.ReadLine()) != null)
{
    row = line.Split(',');
    parsedData.Add(row);
}
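Since blank lines come back as empty strings rather than null, it is worth knowing what the loop does with them. This sketch runs the same read-and-split loop over any TextReader (StreamReader and StringReader both derive from it), so it can be tried without a file. The skipBlankLines flag is a hypothetical addition, not part of the tutorial's function.

```csharp
using System.Collections.Generic;
using System.IO;

// Sketch of the tutorial's read/split loop over any TextReader.
// The skipBlankLines parameter is a hypothetical extra, shown for comparison.
public static class BlankLineDemo
{
    public static List<string[]> Parse(TextReader reader, bool skipBlankLines)
    {
        var parsedData = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // A blank line arrives as "", and "".Split(',') yields a
            // one-element array containing a single empty string.
            if (skipBlankLines && line.Length == 0)
            {
                continue;
            }
            parsedData.Add(line.Split(','));
        }
        return parsedData;
    }
}
```

With skipBlankLines set to false this behaves like the tutorial's loop: a blank line in the middle of the file shows up as a row with one empty value.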
Simple yet effective. So simple, in fact, that you don't have to read only comma separated files: any file separated by a single standard character can be read. All you have to do is change the character passed to the Split() call. As mentioned above, the first line sets our row variable to the values of our split string, and the second line adds that string array to our list. Not difficult to understand at all.
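If you want to switch separators without editing the function each time, one option is to pass the separator in as a parameter. This is a minimal sketch; the name ParseDelimited and its signature are hypothetical, and it leaves out the try-catch so any exception goes to the caller.

```csharp
using System.Collections.Generic;
using System.IO;

// Sketch: the tutorial's loop with the separator passed in instead of hard-coded.
// ParseDelimited is a hypothetical name; errors are not caught here.
public static class DelimitedParser
{
    public static List<string[]> ParseDelimited(string path, char separator)
    {
        var parsedData = new List<string[]>();
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            while ((line = readFile.ReadLine()) != null)
            {
                parsedData.Add(line.Split(separator));
            }
        }
        return parsedData;
    }
}
```

For a tab separated file you would call ParseDelimited(path, '\t'), and for a pipe separated one ParseDelimited(path, '|').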
The end of the while loop also marks the end of our using block. After the using block we have a completely filled list of string arrays, which represent the rows of data in our CSV file. All we need to do now is wrap the whole thing in a try-catch block, which will catch any errors we may get when attempting to open or read the file.
We don't need anything fancy. In fact, we will just catch any exception thrown in the using block (since there are several different kinds that could be thrown). So our final function will look something like:
public List<string[]> parseCSV(string path)
{
    List<string[]> parsedData = new List<string[]>();
    try
    {
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            string[] row;
            while ((line = readFile.ReadLine()) != null)
            {
                row = line.Split(',');
                parsedData.Add(row);
            }
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }
    return parsedData;
}
Notice that we just take the message from the exception and display it with the standard MessageBox class. This means our code doesn't break on an error; we just don't get the data. If the file cannot be opened, the function returns an empty list, and if an error happens partway through, it returns whatever rows were read before the error. So after we return our parsedData list, whether filled or not, our function ends.
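If you would rather react differently to different problems, a variation is to catch specific exception types instead of a bare Exception. In this sketch the name TryParse and the out parameter used to report the error are hypothetical choices made so the example does not depend on Windows Forms; the exception classes themselves are standard .NET types.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Sketch: catching a few specific exception types instead of Exception.
// TryParse and its out parameter are hypothetical; the caller decides how to show the error.
public static class SpecificCatchDemo
{
    public static List<string[]> TryParse(string path, out string error)
    {
        var parsedData = new List<string[]>();
        error = null;
        try
        {
            using (StreamReader readFile = new StreamReader(path))
            {
                string line;
                while ((line = readFile.ReadLine()) != null)
                {
                    parsedData.Add(line.Split(','));
                }
            }
        }
        catch (FileNotFoundException e)
        {
            error = "File not found: " + e.FileName;
        }
        catch (DirectoryNotFoundException e)
        {
            error = "Folder not found: " + e.Message;
        }
        catch (IOException e)
        {
            // Base class of the two exceptions above; must be caught after them.
            error = "I/O error: " + e.Message;
        }
        catch (UnauthorizedAccessException e)
        {
            error = "Access denied: " + e.Message;
        }
        return parsedData;
    }
}
```

The order matters: FileNotFoundException and DirectoryNotFoundException derive from IOException, so they have to be listed before it or the compiler will reject the code.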
A small function that is easy to understand and even easier to build. Even better, since it returns a basic list object, you can use the data returned to do anything from filling a grid to making complex calculations. You can also make this function read any type of separated file; just change the separator in the Split() call.
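As one example of using the returned list, here is a sketch that copies it into a DataTable, which a grid control can then display. The class name is hypothetical, and treating the first row as column headers is an assumption; drop that step if your files have no header row.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Sketch: copies the List<string[]> returned by parseCSV into a DataTable.
// Assumes the first row holds column names (hypothetical convention).
public static class CsvToDataTable
{
    public static DataTable ToDataTable(List<string[]> rows)
    {
        var table = new DataTable();
        if (rows.Count == 0)
        {
            return table;
        }

        // First row becomes the column names; Columns.Add throws
        // a DuplicateNameException if two headers have the same name.
        foreach (string header in rows[0])
        {
            table.Columns.Add(header);
        }

        // Remaining rows become data rows. Missing fields stay DBNull,
        // and extra fields beyond the header count are ignored.
        for (int i = 1; i < rows.Count; i++)
        {
            DataRow dataRow = table.NewRow();
            int count = Math.Min(rows[i].Length, table.Columns.Count);
            for (int c = 0; c < count; c++)
            {
                dataRow[c] = rows[i][c];
            }
            table.Rows.Add(dataRow);
        }
        return table;
    }
}
```

In a Windows Forms app you could then set a DataGridView's DataSource to ToDataTable(parseCSV(path)).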
I hope this tutorial was informative and, most of all, useful.