How do I split a large XML file?

Question

We export "records" to an XML file; one of our customers has complained that the file is too big for their other system to process. So I need to split the file up, repeating the "header section" in each of the new files.

I am therefore looking for something that lets me define some XPaths for the section(s) that should always be output, and another XPath for the "rows", with a parameter that says how many rows to put in each file and how to name the files.

Before I start writing custom .NET code for this: is there a standard command-line tool that runs on Windows and does it?

(Since I know how to program in C#, I am more inclined to write code than to mess about with complex XSL and so on, but an off-the-shelf solution would be better than custom code.)

windows xml

7 answers:

bill seacham · Accepted Answer · 2010-12-18 21:16:58

"Is there a standard command-line tool that runs on Windows and does it?"

Yes. http://xponentsoftware.com/xmlSplit.aspx

Robert Rossney · Accepted Answer · 2010-12-01 22:29:17

There is no general-purpose solution for this, because there are so many different possible ways the source XML can be structured.

It is reasonably straightforward to build an XSLT transform that outputs a slice of an XML document. For example, given this XML:
<header>
  <data rec="1"/>
  <data rec="2"/>
  <data rec="3"/>
  <data rec="4"/>
  <data rec="5"/>
  <data rec="6"/>
</header>
you can use this XSLT to output a copy of the file containing only the data elements within a given range:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:param name="startPosition"/>
  <xsl:param name="endPosition"/>

  <xsl:template match="@* | node()">
      <xsl:copy>
          <xsl:apply-templates select="@* | node()"/>
      </xsl:copy> 
  </xsl:template>

  <xsl:template match="header">
    <xsl:copy>
      <xsl:apply-templates select="data"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="data">
    <xsl:if test="position() &gt;= $startPosition and position() &lt;= $endPosition">
      <xsl:copy>
        <xsl:apply-templates select="@* | node()"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>
(Note, by the way, that because this is based on the identity transform, it works even if header is not the top-level element.)

You still need to count the data elements in the source XML and run the transform repeatedly, each time with the values of $startPosition and $endPosition that select the next slice.
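The counting step can be scripted. Here is a minimal Python sketch, not part of the original answer, that counts the data elements in the sample document and prints the parameter pairs to pass on each run. The helper name position_ranges and the chunk size of 4 are assumptions made for illustration; running the XSLT itself is left to whichever processor you use.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<header>
  <data rec="1"/>
  <data rec="2"/>
  <data rec="3"/>
  <data rec="4"/>
  <data rec="5"/>
  <data rec="6"/>
</header>"""


def position_ranges(total, per_file):
    """Yield 1-based, inclusive (start, end) pairs that cover `total` items."""
    for start in range(1, total + 1, per_file):
        yield start, min(start + per_file - 1, total)


root = ET.fromstring(SAMPLE)
# Count every <data> element, wherever <header> sits in the tree.
total = sum(1 for _ in root.iter("data"))

ranges = list(position_ranges(total, 4))
for start, end in ranges:
    # Each pair corresponds to one run of the stylesheet.
    print(f"startPosition={start} endPosition={end}")
```

With six data elements and four per file, this gives two runs: positions 1 to 4 and positions 5 to 6.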

ewroman · Accepted Answer · 2014-08-30 08:47:06

First download the foxe XML editor from this link: http://www.firstobject.com/foxe242.zip

Then watch this video: http://www.firstobject.com/xml-splitter-script-video.htm The video explains how the split code works.

That page also has the script code (it starts with split()). Copy the code, and in the XML editor create a "New Program" under "File". Paste the code and save it. The code is:
split()
{
  CMarkup xmlInput, xmlOutput;
  xmlInput.Open( "**50MB.xml**", MDF_READFILE );
  int nObjectCount = 0, nFileCount = 0;
  while ( xmlInput.FindElem("//**ACT**") )
  {
    if ( nObjectCount == 0 )
    {
      ++nFileCount;
      xmlOutput.Open( "**piece**" + nFileCount + ".xml", MDF_WRITEFILE );
      xmlOutput.AddElem( "**root**" );
      xmlOutput.IntoElem();
    }
    xmlOutput.AddSubDoc( xmlInput.GetSubDoc() );
    ++nObjectCount;
    if ( nObjectCount == **5** )
    {
      xmlOutput.Close();
      nObjectCount = 0;
    }
  }
  if ( nObjectCount )
    xmlOutput.Close();
  xmlInput.Close();
  return nFileCount;
}
Change the fields shown in bold (or marked with ** **) to suit your needs. (This is also explained on the video page.)

In the XML editor window, right-click and choose Run (or just press F9). The window has an output pane that shows the number of files generated.

Note: the input file name can be a full path such as "C:\\Users\\AUser\\Desktop\\a_xml_file.xml" (with doubled backslashes), and the output file "C:\\Users\\AUser\\Desktop\\anoutputfolder\\piece" + nFileCount + ".xml"
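For readers without foxe, the same loop can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not a port of the firstobject script: the function name split_stream and its parameters are made up here, the element name ACT, the wrapper root and the group size of 5 simply mirror the values in the script above, and record elements are assumed not to be nested inside one another.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def split_stream(source, record_tag, per_file, out_prefix, root_tag="root"):
    """Write every `per_file` <record_tag> elements to out_prefix + N + '.xml'.

    Returns the number of files written.
    """
    file_count = 0
    batch = []

    def flush():
        nonlocal file_count
        file_count += 1
        wrapper = ET.Element(root_tag)
        wrapper.extend(batch)
        ET.ElementTree(wrapper).write(
            f"{out_prefix}{file_count}.xml", encoding="utf-8", xml_declaration=True
        )
        # Drop the content of records that are already on disk to limit memory use.
        for done in batch:
            done.clear()
        batch.clear()

    # The "end" event fires once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            batch.append(elem)
            if len(batch) == per_file:
                flush()
    if batch:  # leftover records that did not fill a whole file
        flush()
    return file_count


# Small demonstration on a generated file with 12 records.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.xml")
    with open(src, "w", encoding="utf-8") as f:
        f.write("<log>" + "".join(f'<ACT id="{i}"/>' for i in range(12)) + "</log>")
    count = split_stream(src, "ACT", 5, os.path.join(tmp, "piece"))
    sizes = [
        len(ET.parse(os.path.join(tmp, f"piece{n}.xml")).getroot())
        for n in range(1, count + 1)
    ]
print(count, sizes)
```

Like the script above, this wraps each group in a fresh root element and does not copy any header section from the input.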

loomi · Accepted Answer · 2015-09-22 15:52:01

As already mentioned, xml_split from the Perl package XML::Twig does a great job.

Usage
xml_split < bigFile.xml

#or if compressed e.g.
bzcat bigFile.xml.bz2 | xml_split
Without any arguments, xml_split creates one file per top-level child node.

There are options to specify the number of elements you want per file (-g) or an approximate file size (-s <Kb|Mb|Gb>).

Installation

Windows

See here


Linux

sudo apt-get install xml-twig-tools

Oded · Accepted Answer · 2010-12-01 17:26:20

There is nothing built in that can handle this situation easily.

Your approach sounds reasonable, though I would probably start with a "skeleton" document containing the elements that need to be repeated, and generate several documents with the "records" from it.

Update:

After a bit of digging, I found this article describing a way to split files using XSLT.
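The skeleton idea can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the original post: the element names export, header, records and record are invented, all rows are assumed to share one container element, and the whole document is assumed to fit in memory.

```python
import copy
import xml.etree.ElementTree as ET

SAMPLE = """<export>
  <header><customer>ACME</customer></header>
  <records>
    <record id="1"/><record id="2"/><record id="3"/><record id="4"/><record id="5"/>
  </records>
</export>"""


def split_with_skeleton(root, container_path, row_tag, per_file):
    """Return new root elements, each with the repeated sections plus
    at most `per_file` rows."""
    # The skeleton is the document with every row removed.
    skeleton = copy.deepcopy(root)
    container = skeleton.find(container_path)
    rows = container.findall(row_tag)
    for row in rows:
        container.remove(row)

    documents = []
    for start in range(0, len(rows), per_file):
        doc = copy.deepcopy(skeleton)  # header and other fixed sections
        doc.find(container_path).extend(rows[start:start + per_file])
        documents.append(doc)
    return documents


docs = split_with_skeleton(ET.fromstring(SAMPLE), "records", "record", 2)
for number, doc in enumerate(docs, 1):
    ids = [row.get("id") for row in doc.find("records")]
    print(number, doc.findtext("header/customer"), ids)
```

Each element in docs can then be saved with ET.ElementTree(doc).write(...) under whatever naming scheme is needed.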

Gfy · Accepted Answer · 2014-04-05 18:01:04

xml_split: split huge XML documents into smaller chunks

http://www.perlmonks.org/index.pl?node_id=429707

http://metacpan.org/pod/XML::Twig

Steve Black · Accepted Answer · 2015-06-11 05:54:56

Using UltraEdit, based on https://www.ultraedit.com/forums/viewtopic.php?f=52&t=6704

All I added were the XML header and footer bits. The first and last files need to be fixed manually (or remove the root element from your source).

    // from https://www.ultraedit.com/forums/viewtopic.php?f=52&t=6704 

var FoundsPerFile = 200;      // Global setting for number of found split strings per file.
var SplitString = "</letter>";  // String where to split. The split occurs after next character.
var xmlHead = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>';
var xmlRootStart = '<letters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" letterCode="OA01" >';
var xmlRootEnd = '</letters>';

/* Find the tab index of the active document */
// Copied from http://www.ultraedit.com/forums/viewtopic.php?t=4571
function getActiveDocumentIndex () {
   var tabindex = -1; /* start value */

   for (var i = 0; i < UltraEdit.document.length; i++)
   {
      if (UltraEdit.activeDocument.path==UltraEdit.document[i].path) {
         tabindex = i;
         break;
      }
   }
   return tabindex;
}

if (UltraEdit.document.length) { // Is any file open?
   // Set working environment required for this job.
   UltraEdit.insertMode();
   UltraEdit.columnModeOff();
   UltraEdit.activeDocument.hexOff();
   UltraEdit.ueReOn();

   // Move cursor to top of active file and run the initial search.
   UltraEdit.activeDocument.top();
   UltraEdit.activeDocument.findReplace.searchDown=true;
   UltraEdit.activeDocument.findReplace.matchCase=true;
   UltraEdit.activeDocument.findReplace.matchWord=false;
   UltraEdit.activeDocument.findReplace.regExp=false;
   // If the string to split is not found in this file, do nothing.
   if (UltraEdit.activeDocument.findReplace.find(SplitString)) {
      // This file is probably the correct file for this script.
      var FileNumber = 1;    // Counts the number of saved files.
      var StringsFound = 1;  // Counts the number of found split strings.
      var NewFileIndex = UltraEdit.document.length;
      /* Get the path of the current file to save the new
         files in the same directory as the current file. */
      var SavePath = "";
      var LastBackSlash = UltraEdit.activeDocument.path.lastIndexOf("\\");
      if (LastBackSlash >= 0) {
         LastBackSlash++;
         SavePath = UltraEdit.activeDocument.path.substring(0,LastBackSlash);
      }
      /* Get active file index in case of more than 1 file is open and the
         current file does not get back the focus after closing the new files. */
      var FileToSplit = getActiveDocumentIndex();
      // Always use clipboard 9 for this script and not the Windows clipboard.
      UltraEdit.selectClipboard(9);
      // Split the file after every x found split strings until source file is empty.
      while (1) {
         while (StringsFound < FoundsPerFile) {
            if (UltraEdit.document[FileToSplit].findReplace.find(SplitString)) StringsFound++;
            else {
               UltraEdit.document[FileToSplit].bottom();
               break;
            }
         }
         // End the selection of the find command.
         UltraEdit.document[FileToSplit].endSelect();
         // Move the cursor right to include the next character and unselect the found string.
         UltraEdit.document[FileToSplit].key("RIGHT ARROW");
         // Select from this cursor position everything to top of the file.
         UltraEdit.document[FileToSplit].selectToTop();
         // Is the file not already empty?
         if (UltraEdit.document[FileToSplit].isSel()) {
            // Cut the selection and paste it into a new file.
            UltraEdit.document[FileToSplit].cut();
            UltraEdit.newFile();
            UltraEdit.document[NewFileIndex].setActive();
            UltraEdit.activeDocument.paste();


            /* Add line termination on the last line and remove automatically added indent
               spaces/tabs if auto-indent is enabled if the last line is not already terminated. */
            if (UltraEdit.activeDocument.isColNumGt(1)) {
               UltraEdit.activeDocument.insertLine();
               if (UltraEdit.activeDocument.isColNumGt(1)) {
                  UltraEdit.activeDocument.deleteToStartOfLine();
               }
            }

            // add headers and footers 

            UltraEdit.activeDocument.top();
            UltraEdit.activeDocument.write(xmlHead);
            UltraEdit.activeDocument.write(xmlRootStart);
            UltraEdit.activeDocument.bottom();
            UltraEdit.activeDocument.write(xmlRootEnd);
            // Build the file name for this new file.
            var SaveFileName = SavePath + "LETTER";
            if (FileNumber < 10) SaveFileName += "0";
            SaveFileName += String(FileNumber) + ".raw.xml";
            // Save the new file and close it.
            UltraEdit.saveAs(SaveFileName);
            UltraEdit.closeFile(SaveFileName,2);
            FileNumber++;
            StringsFound = 0;
            /* Delete the line termination in the source file
               if last found split string was at end of a line. */
            UltraEdit.document[FileToSplit].endSelect();
            UltraEdit.document[FileToSplit].key("END");
            if (UltraEdit.document[FileToSplit].isColNumGt(1)) {
               UltraEdit.document[FileToSplit].top();
            } else {
               UltraEdit.document[FileToSplit].deleteLine();
            }
         } else break;
         UltraEdit.outputWindow.write("Progress " + SaveFileName);
      }  // Loop executed until source file is empty!

      // Close source file without saving and re-open it.
      var NameOfFileToSplit = UltraEdit.document[FileToSplit].path;
      UltraEdit.closeFile(NameOfFileToSplit,2);
      /* The following code line could be commented if the source
         file is not needed anymore for further actions. */
      UltraEdit.open(NameOfFileToSplit);

      // Free memory and switch back to Windows clipboard.
      UltraEdit.clearClipboard();
      UltraEdit.selectClipboard(0);
   }
}