weewar.com corner
Subscribe via e-mail:
Search this site:

plaxoed!

[Mark Jen’s life @ Plaxo]

cased style

November 2, 2006

Little tool to extract all files out of Sharepoint

Filed under: work, technology, Microsoft, Plaxo — markjen @ 3:05 pm

We’ve been using Microsoft Sharepoint here at Plaxo to organize our projects and files. For those who have used Sharepoint, you know it’s a love/hate relationship :)

Anyways, we needed to get a copy of all the files in our Sharepoint site recently. Looking around on the web, we found a few solutions, but couldn’t really find one that did exactly what we wanted. So, I wrote up a quick little app to grab all the files we needed out of Sharepoint, while preserving Sharepoint’s directory structure. It’s a command line tool and it doesn’t have any options.

It’s really not that hard; here’s the source in case you’re interested:

using System;
using System.Collections.Generic;
using System.Text;
using System.Data;
using System.Data.SqlClient;
using System.IO;
 
// replace this string with your Sharepoint content DB connection string
string DBConnString = “Server=YOURSHAREPOINTSERVER;Database=CONTENTDATABASE;Trusted_Connection=True;”;
 
// create a DB connection
SqlConnection con = new SqlConnection(DBConnString);
con.Open();
 
// the query to grab all the files.
// Note: Feel free to alter the LeafName like ‘%.extension’ arguments to suit your purpose
SqlCommand com = con.CreateCommand();
com.CommandText = “select DirName, LeafName, Content from Docs where (LeafName like ‘%.doc’ or LeafName like ‘%.xls’ or LeafName like ‘%.pdf’ or LeafName like ‘%.ppt’) and Content is not NULL”;
 
// execute query
SqlDataReader reader = com.ExecuteReader();
 
while (reader.Read())
{
    // grab the file’s directory and name
    string DirName = (string)reader[“DirName”];
    string LeafName = (string)reader[“LeafName”];
 
    // create directory for the file if it doesn’t yet exist
    if (!Directory.Exists(DirName))
    {
        Directory.CreateDirectory(DirName);
        Console.WriteLine(“Creating directory: “ + DirName);
    }
 
    // create a filestream to spit out the file
    FileStream fs = new FileStream(DirName + “/” + LeafName, FileMode.Create, FileAccess.Write);
    BinaryWriter writer = new BinaryWriter(fs);
 
    // depending on the speed of your network, you may want to change the buffer size (it’s in bytes)
    int bufferSize = 1000000;
    long startIndex = 0;
    long retval = 0;
    byte[] outByte = new byte[bufferSize];
 
    // grab the file out of the db one chunk (of size bufferSize) at a time
    do
    {
        retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize);
        startIndex += bufferSize;
 
        writer.Write(outByte, 0, (int)retval);
        writer.Flush();
    } while (retval == bufferSize);
 
    // finish writing the file
    writer.Close();
    fs.Close();
 
    Console.WriteLine(“Finished writing file: “ + LeafName);
}
 
// close the DB connection and whatnots
reader.Close();
con.Close();

Here’s my VS.NET 2005 project file. If you need help with the code, please let me know.

25 Responses to “Little tool to extract all files out of Sharepoint”

  1. rose Says:

    hmm, i think you could have just used sharepoint in the folder view, then dragged everything onto your desktop :)

    ah, how i wished we used sharepoint here at work…

  2. adam herscher Says:

    out of curiosity, why did you need to do this?

  3. Chupes Says:

    hello. pardon my newbieness here, but is this a sql script?o what other language?

  4. markjen Says:

    Hi Chupes,

    Yup, this is a simple SQL query that extracts all the files in sharepoint and writes them to the file system.

    It’s written in C#, but it’s .NET, so you could easily convert it to VB.NET, C++.NET, etc.

  5. m Says:

    you rock

  6. Steven Borg Says:

    Nice work!!! Yes, the theory is that you can just drag and drop from a folder view, however, that sucks when you’re trying to automate the retrieval! Or, like in our case, where the entire WSS 2.0 site got completely hosed during the upgrade to WSS 3.0 and we could see all the files, but not download them. This script was a lifesaver!

  7. Eric Says:

    Mark, When I parse or execute this script in the Query Analyzer I get the following error, and I cannot pinpoint the problem. Any suggestions? Because I too need to pull documents from a wss2 database that is not in use anymore.

    [Microsoft][ODBC SQL Server Driver]Syntax error or access violation

  8. Eric Says:

    Ignore the message above. I now have the code and your C# project file loaded in vs.net. Running the debug led me to find an error that states the Content column dosen’t exist. This database is the Moss 07 beta, so I am not sure if that is the issue. Since the code just checks to see if it is null, I took it out.

    Now I think I am so close becasue it will actually create a directory structure, but errors on pulling the file over in “chunks”. The debug says the “Index was outside the bounds of the array” and it is point to the outByte portion of this statement:

    retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize)

    I’m still looking for the answer myself and I am hoping I can get this to work. Any help would be great, thanks.

  9. Eric Says:

    I have modified the code to work with SharePoint 07. Since the Content is stored in a new table with 07, the query piece needs to be a little different. Just replace line 24 with the sting below. It worked perfectly for me after that.

    Find
    com.CommandText = “select DirName, LeafName, Content from Docs where (LeafName like ‘%.doc’ or LeafName like ‘%.xls’ or LeafName like ‘%.pdf’ or LeafName like ‘%.ppt’) and Content is not NULL”;

    Replace With
    com.CommandText = "select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content] from [AllDocs],[AllDocStreams] where (LeafName like '%.doc' or LeafName like '%.xls' or LeafName like '%.pdf' or LeafName like '%.ppt') and [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]";

  10. mINDAUGAS Says:

    Hi, is there any way to use your tool for default Windows Internal Database msde i’m trying to connect to database seting Serter=fs\microsoft##ssee with no sucess. fs -is the same host there installated sharepoint service and internal database .
    thanks.

  11. bud Says:

    you da man!sharepoint config db corrupted but was able to recover our files with your script!thank you! thank you!

  12. JeffD Says:

    Thanks!!! The WSS 2>3 converter left my database (with over 2 gigs of files) hosed. Your script got me back an intact file system full of my files. Don’t know how to say thanks enough.

  13. wbrproductions Says:

    markjen and #9 are the shiznizkydoodle! Thanks, folks, you just saved my butt. That worked like a goddam lucky charm. 2 weeks of frustration dissolved by a 15 second script.

  14. Alex Guzun Says:

    My MOSS2007 just gone to hell… got you app. Fantastic job…
    Is it possible to get lists? We had a big amount of data in lists :(
    To get wiki, i think it is enough to add *.html

    Thank you very much in advance.

  15. Alex Guzun Says:

    Another problem :( It can’t created looong directories. Program stops when tried to create dir witch full path contain 194 symbols :( Is it possible to made something (at least) to pass that dir, not to fail?

  16. Alex Guzun Says:

    Any solution to skip ‘long directory’?

    Unhandled Exception: System.IO.PathTooLongException: The specified path, file name, or both are too long. The fully qual
    ified file name must be less than 260 characters, and the directory name must be less than 248 characters.
    at System.IO.Path.NormalizePathFast(String path, Boolean fullCheck)
    at System.IO.Path.GetFullPathInternal(String path)
    at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShar
    e share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy)
    at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access)
    at DoxoEater.Program.Main(String[] args) in c:\!!!\DoxoEater\DoxoEater\Program.cs:line 44

  17. markjen Says:

    Hi Alex, you could run a quick check on the path size before line 44 in program.cs. Not sure what you’d do with the files that were supposed to be in the long-name directory though. Perhaps you could write a small function that would shorten the directory name?

  18. Happy Says:

    Thanks a million. You program saved our documents.
    I have used several days to try to get Moss to do this. Now I can delete all and make a new installment.

    thanks
    :-)

  19. Ryan Says:

    How can I get the latest version of this code? I have been reading through this and there seems like some bugs have been fixed.

    What would make this perfect for me is if there was logging that confirms each file did or did not get written to a drive.

    Does anyone know of a program similar to this that I can purchase?

  20. ryan Says:

    It’s me again. I have been using this script and it has worked very well.

    Does anyone know if there is a simular script to extract List data?

  21. joe Says:

    Is there any code out there to unload/extract calendar data in a script for some automation purposes I have?

  22. Wordpress Says:

    developmentchinese jacketcompetition Chinesedressdevelopment capitalsilk clothesStreet grilchinese dresscultural womancheongsamhistory ձӰ 0.8833553045874069

  23. Dwain Says:

    I thought I’d pile on and say THANK YOU - this is a great little utility. I added an args parameter so I could get the documents out of any specific site, but your code was a great start. Thanks!

  24. abhijeet Says:

    Getting errors on pulling the file over in “chunks”. The debug says the “Index was outside the bounds of the array” and it is point to the outByte portion of this statement:

    retval = reader.GetBytes(2, startIndex, outByte, 0, bufferSize)

    earlier eric posted the same problem but here i haven’t see any comment about that, i f any one have idea about this please help me..

    I’m still looking for the answer myself and I am hoping I can get this to work. Any help would be great, thanks.

  25. Paul Says:

    This is a great script, but you will notice that this does not maintain version information. In order to do so you need the following commands

    select [DirName], [LeafName], [Content], [UIVersion] as Version from [Docs] where [Content] is not NULL

    select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content],[AllDocs].[UIVersionString] as Version from [AllDocs],[AllDocStreams] where [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]

    Add the following to the reader.Read loop

    string Version = (string)reader[”Version”].ToString();

    //Add this below the reader assignment of leafName
    string leafName= “[” + Version.ToString() + “]” + LeafName;

    I think that this will address all scenarios where a file is over written. As a precaution I have also added some logging in my script to check and see if the exists before I extract it. I recommend that you do the same.

Leave a Reply

XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>