Wednesday, June 19, 2013

Comparing public folder item counts

A question that is often asked of support in regard to legacy Public Folders is whether they are replicating and how much progress they are making.  The most common scenario for asking this question arises when the administrator is adding a new Public Folder database to the organization and replicating a large amount of data to it.  What commonly happens is that the administrator calls support and says "The database on the old server is 300GB, but the new database is only 150GB!  How can I tell what still needs to be replicated?  Is it still progressing??"  The administrator can raise diagnostic logging for public folders, but reading the events to see what folders are replicating is tedious.  Most administrators want a more detailed way of estimating the progress of replication than comparing file sizes.  They also want to avoid checking all the individual replication events.

There are a number of ways to monitor the progress of the replication so that one can make a guess as to how long a particular environment will take to complete an operation.  In this blog I am going to provide a detailed example of one approach to estimating the progress of replication by comparing item counts between different public folder stores.

To get the item counts in an Exchange 2003 Public folder database you can use PFDAVAdmin.  The process is outlined in this previous EHLO blog post.  For what we are doing below you will need the displayname, folderpath and the total number of items in the folder; the rest of the fields are not necessary.

To get the item counts on an Exchange 2007 Server you use (remember there is only one Pub per server):

Get-publicfolderstatistics -server <servername> | export-csv c:\file1.txt

To get the item counts on an Exchange 2010 server you use:

Get-publicfolderstatistics -server <servername> -resultsize unlimited | export-csv c:\file1.txt

There are some very important caveats to this whole procedure.  The things you need to watch out for are:

  • We are only checking item counts.  If you delete 10 items and add 10 items between executions of the statistics data gathering this type of query will not reveal whether they have replicated.  Therefore having the same number on both sides is not an assurance that the folders are in sync. 
  • If you are comparing folders that contain recurring meetings the item counts can be different on Exchange 2007 and older because of the way WebDAV interacts with those items
  • I have seen many administrators try to compare the size of one Public Folder database to the size of another.  Such an approach to checking on replication does not take into account space for deleted items, overhead and unused space.  Checking item counts is more reliable than simply comparing item sizes
  • The two databases might be at very different stages of processing replication messages.  It is unlikely that both pubs will present the same numbers of items if the folders are continuously active.  Even if the folders are seeing relatively low activity levels it is not uncommon for the item count to be off by one or two items because the replication cycle (which defaults to every 15 minutes) simply hasn't gotten to the latest post
  • If you really want to know if two replicas are in sync try to remove one.  If Exchange lets you remove the instance then you know Exchange believes the folders are in sync.  If Exchange cannot confirm the folders are in sync it will keep the instance until it can complete the backfill from it.  In most cases the administrators I have spoken with are not in a position where they can use this approach.

For the actual comparison you can use any number of products.  For this blog I have chosen Microsoft Access for demonstrating the process of comparing the CSV files from the different servers.  To keep things simple I am going to use the Jet database engine packaged with Access.  There are some limitations to my approach:

  • Jet has a maximum file size of 2GB so if your public folder infrastructure is particularly large (i.e.  your CSV files are over 500MB) you may have to switch to using Microsoft SQL.
  • I am not going to compare public folders with a Folder path greater than 254 characters because the Jet database engine that ships with Access cannot join memo fields in a query.  Working around the join limitation by splitting the path across multiple text fields is beyond the scope of this blog.
  • I am going to look at folders that exist in both CSV files.   If the instance has not been created and its data exported into the CSV file the folder will not be listed.

An outline of the process is:

  • Export the item counts from the two servers you wish to compare
  • Import the resulting text files
  • Clean up the data for the final query
  • Run a query to list the item counts for all folders that are in Both files and the difference in the item counts between the originally imported files

Assumptions for the steps below:

  • You have exported the public folder statistics with the PowerShell commands presented above
  • You have fields named FolderPath, ItemCount and Name in the CSV file

If your file is different than expected you will have to modify the steps as you go along

Here are the steps for conducting the comparison:

1. Create a new blank Microsoft Access database in a location that has more than double the size of your CSV files available as free space.

2. By default export-csv places a line at the top of the text file.  This line will interfere with the import so we need to remove it.  Open each CSV file in notepad (this can take a while for larger files) and remove the line highlighted below.  In this example the line starting with "AdminDisplayName" would become the topmost line of the file.  Once the top line is deleted close and save the file.

image 
Figure 1

3. Import the CSV file to a new table:

  • Click on the External Data tab as highlighted in Figure 2
  • Browse to the CSV file and select it (or type in its path and name directly)
  • Make sure the "Import the source data into a new table in the current database' option is selected
  • Click OK

image 
Figure 2

4. In the wizard that starts specify the file is delimited as shown and then click Next.

image 
Figure 3

5. Tell the wizard that the text qualifier is the double quote (character 34 in ASCII), the delimiter is the comma and that the "First Row Contains Field Names" as shown in Figure 4.

Note:  It is possible that you will receive a warning when you click "First Row Contains Field Names".  If any of the field names violate the rules for a field name Access will display a warning.  Don't panic.  Access will replace the non-conforming names with ones it considers appropriate (typically Field1, Field2, etc.).  You can change the names if you wish on the Advanced screen.

image 
Figure 4

6. Switch to Advanced view (click the Advanced button highlighted in Figure 4) so that we can change the data type of the FolderPath field.  In Access 2010 and older the data type needs to be changed from Text to Memo.  In Access 2013 it needs to be changed from Short Text to Long Text.  While we are in this window you have the option to exclude columns that are not needed by placing a checkmark in the box from the skip column.  In this blog we are only going to use the FolderPath, name and the item count.  You can also exclude fields earlier in the process by specifying what fields will be exported when you do the export-csv.  The following screenshots show the Advanced properties window.

image 
Figure 5a: Access 2010 and older

image 
Figure 5b: Access 2013

Note:  If you think you will be doing this frequently you can use the Save As button to save your settings.  The settings will be saved inside the Access database and can then be selected during future imports by clicking on the Specs button.

7. Click OK on the Advanced dialog and then click Finish in the wizard.

8. When prompted to save the Import steps click Close.  If you think you will be repeating this process in the future feel free to explore saving the import steps.

9. Access will import the data into a table.  By default the table will have the same name as the source CSV file.  The files used in creating this blog were called 2007PF_120301 and 2010 PF_120301.  If there are any import errors they will be saved in a separate table.  Take a moment to examine what they are.  The most common is that a field got truncated.  If that field is the folderpath it will affect the comparisons later.  If there are other problems you will have to troubleshoot what is wrong with the highlighted lines (typically there should be no import errors as long as the FolderPath is set as a Memo field).

10. Go back to Step 2 to import the second file that will be used in the comparison. 

11. Now a query must be run to determine if any folderpath exceeds 255 characters.  Fields longer than 255 characters cannot be used for a join in an Access query.  If we have values that exceed 255 characters in this field we will need to exclude them from the comparison.  Additional work to split a long path across multiple fields can be done, but that is being left as an exercise for any Access savvy readers. 

12. To get started select the options highlighted in Yellow in Figure 6:

image 
Figure 6

13. Highlight the table where we want to check the length of the folderpath field as shown in Figure 7.  Once you have selected the table click Add and then Close:

image 
Figure 7

14. Switch to SQL view as shown in Figure 8:

image 
Figure 8

15. Replace the default select statement with one that looks like this (please make sure you substitute your own table name for the one that I have Bolded in the example):

SELECT Len([FolderPath]) AS Expr1, [2007PF_120301].FolderPath 
FROM 2007PF_120301 
WHERE (((Len([FolderPath]))>254));

Note:  Be sure the semi-colon is the last character in the statement.

16. Run the query using the red "!" as shown in Figure 9: 

image 
Figure 9

image 
Figure 10

17. If the result is a single empty row (as shown in Figure 10) then skip down to step 19.  If the result is at least one row then go back to SQL view (as shown in Figure 8) and change the statement to look like this one (as before please make sure 2007PF_120301 is replaced with the table name actually being used in your database):

SELECT [2007PF_120301].FolderPath, [2007PF_120301].ItemCount, 
[2007PF_120301].Name, [2007PF_120301].Identity INTO 2007PF_120301_trimmed 
FROM 2007PF_120301 
WHERE (((Len([FolderPath]))<255));

18. You will get a prompt like the one in Figure 11 when you run the query.  Select Yes:

image 
Figure 11

19. After it is done repeat steps 11-18 for the other CSV file that was imported to be part of the comparison.  If you have done steps 11-18 for both files you will be comparing then advance to step 20.

20. Originally the FolderPath was imported as a memo field (Long Text if using Access 2013).  However we cannot join memo fields in a query.  We need to convert them to a text field with a length of 255. 

If you got a result greater than zero rows in step 16 this step and the subsequent steps will all be carried out on the table specified in the INTO clause of the SQL statement (in this blog that table is named 2007PF_120301_trimmed). 

If you were able to skip steps 17 and 18 this step and the subsequent steps will be carried out on the table you imported (2007PF_120301 in this example).

Open the table in Design view by right-clicking on it and selecting Design View as shown in Figure 12.  If you select the wrong tables for the subsequent steps you will get a lot of unwanted duplicates in your final comparison output.

image 
Figure 12

21. Change the folderpath from Memo to Text as shown in Figure 13.  If you are using Access 2013 change it from Long Text to Short Text.

image 
Figure13

22. With the FolderPath field highlighted look to the lower part of the Design window where the properties of the currently selected field are displayed.  Change the field size of folderpath to 255 characters as shown in Figure 14.

image 
Figure 14

23. Save the table and close its design view.  You will be prompted as shown in Figure 15.  Don't panic.  All the folderpaths should be shorter than the 255 characters specified in the properties of the table.  The dialog is just a standard warning from Access.  No data should be truncated (the earlier queries should have seen to that).  Say Yes and repeat steps 20-23 for the other table being used in this comparison.  If you make a mistake here remember that you will still have your original CSV files and can always fix the mistake by removing the tables and redoing the import.

image 
Figure 15

24. We have been on a bit of a journey to make sure we prepared the tables.  Now for the comparison.  Create a new query (as shown in Figure 6) and highlight both tables that have had the FolderPath shortened to 255 characters as shown in Figure 16.  Once they are highlight click Add and then close.

image 
Figure 16

25. Drag Folderpath from the table that is the source of your replication to Folderpath on the other database.  The result will look like Figure 17.

image 
Figure 17

26.   In the top half of the Query Design window we have the tables with their fields listed.  In the bottom half we have the query grid.  You can make fields appear in the grid in 3 ways:

  • Switch the SQL view and add them to the Select statement
  • Double-click the field in the top half of the window
  • Drag the field from the top half of the window to the grid
  • Click in the Field line of the grid and a drop down will appear that you can use to select the fields
  • Type the field name you want on the Field in the grid

For this step we need to add:

  • One copy of the folderpath field from one table (doesn't matter which one)
  • The ItemCount field from each table

27.   Go to an empty column in the grid.  We need to enter the text that will tells us the difference between the two item counts.  Type the following text into the column (be sure to use the table names from your own database and not my example): 

Expr1:  Abs([2007PF_120301_trimmed].[itemcount]-[2010pf_120301_trimmed].[itemcount])

Note:  After steps 25-27 the final result should look like  Figure 18.  The equivalent SQL looks like this:

SELECT [2007PF_120301_trimmed].FolderPath, [2007PF_120301_trimmed].ItemCount, [2010PF_120301_trimmed].ItemCount, Abs([2007PF_120301_TRIMMED].[ItemCount]-[2010PF_120301_TRIMMED].[ItemCount]) AS Expr1 
FROM 2007PF_120301_trimmed INNER JOIN 2010PF_120301_trimmed ON [2007PF_120301_trimmed].FolderPath = [2010PF_120301_trimmed].FolderPath;

image 
Figure 18

28. Run the query using the red "!" shown in Figure 9.  The results will show you all the folders that exist in BOTH public folder databases, the itemscount in each database and the difference between them.  I like the difference reported as a positive number, but you might prefer to remove the absolute value function.

There is more that can be done with this.  You can use Access to run a Find Unmatched query to find all items from one table that are not in the other table (thus locating folders that have an instance in one database, but not the other).  You can experiment with different Join types in the query and you can deal with Folderpaths longer than a single text field can accommodate.  These and any other additional functionality you desire are left as an exercise for the reader to tackle.  I hope this provides you with a process that can be used to compare the item counts between two Public Folder stores (just remember the caveats at the top of the article).

Thanks To Bill Long for reviewing my caveats and Oscar Goco for reviewing my steps with Access.

Chris Pollitt


http://blogs.technet.com/b/exchange/archive/2013/05/28/comparing-public-folder-item-counts.aspx




--
Thanks & Regards
G.BalaKrishna
Contact : 9962552505
            : 9841710719

No comments:

Post a Comment