1. Introduction
Functional requirement: upload office documents and provide an online preview of them.
Candidate solutions:
- Option 1: use the Aspose.cells.jar package to convert the document to PDF;
- Option 2: use LibreOffice to convert the document to PDF;
- Option 3: use POI to convert the document to HTML.
Evaluation:
- Option 1: Aspose is a paid product and would have to be cracked to use for free, so it was ruled out.
- Option 2: LibreOffice must be installed on the server (on Linux, unoconv as well), and the code needs the commons-io pom dependency. At the time I could not find that dependency in the official Maven repository, so I gave up on this option. When I checked again while writing this article, the dependency was available; the official Maven repository probably had a problem for a while.
- Option 3: only the required dependencies need to be added, but the converted HTML has some formatting problems, which are discussed later.
2. Add dependencies
<dependencies>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi</artifactId>
        <version>3.12</version>
    </dependency>
    <dependency>
        <groupId>org.apache.poi</groupId>
        <artifactId>poi-scratchpad</artifactId>
        <version>3.12</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.document</artifactId>
        <version>1.0.5</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
        <version>1.0.4</version>
    </dependency>
    <dependency>
        <groupId>fr.opensagres.xdocreport</groupId>
        <artifactId>fr.opensagres.xdocreport.core</artifactId>
        <version>2.0.1</version>
    </dependency>
</dependencies>
3. Word document to HTML
Word files generally use one of two extensions: doc and docx. docx is the extension for documents saved by Word 2007 and later, and doc is the extension for documents saved by Word 2003. The two formats need different conversion methods.
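Because the two formats need different converters, the caller has to choose one from the file name. A minimal sketch of that dispatch, assuming the file extension can be trusted; the WordKind enum and the detect helper are hypothetical names used for illustration and are not part of the original code:

```java
import java.util.Locale;

public class WordFormatDetector {

    /** Hypothetical enum naming the converter that should handle a file. */
    public enum WordKind { DOC, DOCX, UNSUPPORTED }

    /** Chooses a converter from the file extension (case-insensitive). */
    public static WordKind detect(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        // Check .docx first for clarity; ".docx" does not end with ".doc"
        if (lower.endsWith(".docx")) {
            return WordKind.DOCX;   // would be passed to word2007ToHtml
        }
        if (lower.endsWith(".doc")) {
            return WordKind.DOC;    // would be passed to word2003ToHtml
        }
        return WordKind.UNSUPPORTED;
    }

    public static void main(String[] args) {
        System.out.println(detect("report.DOCX"));
        System.out.println(detect("legacy.doc"));
        System.out.println(detect("notes.txt"));
    }
}
```

Checking the extension is the cheapest test; a stricter variant could inspect the file header instead, since a renamed file would otherwise be sent to the wrong converter.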
- Conversion problems: for Word 2003 files, an automatically generated table of contents renders incorrectly after conversion; in both the 2003 and 2007 formats some special characters may not be displayed, although this is rarely noticeable; images in the document must be in a common format (jpeg, jpg, png, etc.), otherwise they cannot be displayed.
- Word 2003 (.doc) to HTML
import java.io.*;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.PicturesManager;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;
import org.apache.poi.hwpf.usermodel.PictureType;
import org.w3c.dom.Document;

public static boolean word2003ToHtml(Map params) {
    logger.debug("***** word2003ToHtml start params:{}", params);
    try {
        // Directory where extracted images are stored
        final String fileImg = params.get("fileImg").toString();
        // URL prefix of the images inside the generated HTML
        final String viewImgPath = params.get("viewImgPath").toString();
        // Target HTML file
        File htmlFile = new File(params.get("htmlFile").toString());
        File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());

        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // Tell the converter where to store images and which URL to emit for them
        wordToHtmlConverter.setPicturesManager(new PicturesManager() {
            public String savePicture(byte[] content, PictureType pictureType, String suggestedName,
                                      float widthInches, float heightInches) {
                File imgPath = new File(fileImg);
                if (!imgPath.exists()) { // create the image directory if it does not exist
                    imgPath.mkdirs();
                }
                // new File(parent, child) works whether or not fileImg ends with a separator
                try (OutputStream os = new FileOutputStream(new File(imgPath, suggestedName))) {
                    os.write(content);
                } catch (IOException e) {
                    e.printStackTrace();
                }
                // URL under which the image is referenced in the HTML
                return viewImgPath + "/" + suggestedName;
            }
        });

        // 1) Load the Word document into an HWPFDocument and parse it
        try (InputStream inputStream = new FileInputStream(file)) {
            HWPFDocument wordDocument = new HWPFDocument(inputStream);
            wordToHtmlConverter.processDocument(wordDocument);
        }

        // 2) Serialize the resulting DOM tree as HTML
        Document htmlDocument = wordToHtmlConverter.getDocument();
        try (OutputStream outputStream = new FileOutputStream(htmlFile)) {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "utf-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            serializer.transform(new DOMSource(htmlDocument), new StreamResult(outputStream));
        }
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}
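The last step of the method above, writing a DOM tree out as HTML, uses only the JDK's javax.xml.transform API and can be tried without POI. A small sketch under that assumption; the DomToHtmlDemo class and its hand-built sample document are illustrative stand-ins for the tree that the converter produces:

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToHtmlDemo {

    /** Serializes a DOM tree as HTML text with the JDK's identity transformer. */
    public static String toHtml(Document doc) {
        try {
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");
            serializer.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("HTML serialization failed", e);
        }
    }

    /** Builds a tiny stand-in document: html > body > p. */
    public static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element body = doc.createElement("body");
            Element p = doc.createElement("p");
            p.setTextContent("Hello");
            body.appendChild(p);
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml(sampleDocument()));
    }
}
```

Setting METHOD to html makes the serializer follow HTML output rules rather than XML ones, which is why the same three properties appear in every converter in this article.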
- Word 2007 (.docx) to HTML
import java.io.*;
import java.util.Map;
import org.apache.poi.xwpf.converter.core.BasicURIResolver;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public static boolean word2007ToHtml(Map params) {
    logger.debug("***** word2007ToHtml start params:{}", params);
    try {
        // URL prefix of the images inside the generated HTML
        String viewImgPath = params.get("viewImgPath").toString();
        // Directory where extracted images are stored
        String fileImg = params.get("fileImg").toString();
        File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());
        File htmlFile = new File(params.get("htmlFile").toString());

        // Both streams are closed automatically, so the HTML is fully flushed to disk
        try (InputStream inputStream = new FileInputStream(file);
             OutputStream outputStream = new FileOutputStream(htmlFile)) {
            // 1) Load the Word document into an XWPFDocument
            XWPFDocument document = new XWPFDocument(inputStream);
            // 2) Configure the XHTML options (URIResolver sets the image URL prefix,
            //    the extractor sets the directory where images are written)
            XHTMLOptions options = XHTMLOptions.create();
            options.URIResolver(new BasicURIResolver(viewImgPath));
            options.setExtractor(new FileImageExtractor(new File(fileImg)));
            // 3) Convert the XWPFDocument to XHTML
            XHTMLConverter.getInstance().convert(document, outputStream, options);
        }
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}
4. Excel document to HTML
POI's Excel-to-HTML converter only accepts an HSSFWorkbook (the 2003 xls format), so an xlsx file is first read and converted into an xls workbook, after which both formats go through the same conversion method.
- Excel 2003 to HTML (images in the workbook are not carried over, because POI does not provide a way to convert images embedded in Excel)
import java.io.*;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.commons.io.FileUtils;
import org.apache.poi.hssf.converter.ExcelToHtmlConverter;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
import org.w3c.dom.Document;

public static boolean excelToHtml(Map params) {
    try {
        String file = params.get("FILE_NAME").toString();
        String filePath = params.get("filePath").toString() + file;
        HSSFWorkbook excelBook = new HSSFWorkbook();
        try (InputStream input = new FileInputStream(filePath)) {
            // EXCEL_XLS and EXCEL_XLSX are suffix constants defined elsewhere in the class.
            // A 2007+ workbook is converted to the 2003 format first.
            if (file.endsWith(EXCEL_XLS)) {
                // Excel 2003
                excelBook = new HSSFWorkbook(input);
            } else if (file.endsWith(EXCEL_XLSX)) {
                // Excel 2007/2010
                ExcelTransFormUtil xls = new ExcelTransFormUtil();
                XSSFWorkbook workbookOld = new XSSFWorkbook(input);
                xls.transformXSSF(workbookOld, excelBook);
            }
        }

        ExcelToHtmlConverter excelToHtmlConverter = new ExcelToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        // Do not output the column header row (A, B, C, ...)
        excelToHtmlConverter.setOutputColumnHeaders(false);
        // Do not output the row numbers
        excelToHtmlConverter.setOutputRowNumbers(false);
        excelToHtmlConverter.processWorkbook(excelBook);

        Document htmlDocument = excelToHtmlConverter.getDocument();
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(new DOMSource(htmlDocument), new StreamResult(outStream));

        // Decode with the same charset the serializer used, not the platform default
        String content = outStream.toString("UTF-8");
        FileUtils.writeStringToFile(new File(params.get("htmlFile").toString()), content, "UTF-8");
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}
- Converting a 2007 workbook (XSSF) to a 2003 workbook (HSSF)
import java.util.HashMap;
import java.util.Map;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.ss.util.CellRangeAddress;
import org.apache.poi.xssf.usermodel.*;

public class ExcelTransFormUtil {

    // Fields used by the methods below; their declarations were not shown,
    // so the types here are inferred from how they are used.
    private int lastColumn = 0;
    private final Map<Integer, HSSFCellStyle> styleMap = new HashMap<Integer, HSSFCellStyle>();

    /** Copies every sheet of the XSSF workbook into the (empty) HSSF workbook. */
    public void transformXSSF(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew) {
        workbookNew.setMissingCellPolicy(workbookOld.getMissingCellPolicy());
        for (int i = 0; i < workbookOld.getNumberOfSheets(); i++) {
            XSSFSheet sheetOld = workbookOld.getSheetAt(i);
            HSSFSheet sheetNew = workbookNew.createSheet(sheetOld.getSheetName());
            this.transform(workbookOld, workbookNew, sheetOld, sheetNew);
        }
    }

    /** Copies sheet-level settings, rows, column widths and merged regions. */
    private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                           XSSFSheet sheetOld, HSSFSheet sheetNew) {
        sheetNew.setDisplayFormulas(sheetOld.isDisplayFormulas());
        sheetNew.setDisplayGridlines(sheetOld.isDisplayGridlines());
        sheetNew.setDisplayGuts(sheetOld.getDisplayGuts());
        sheetNew.setDisplayRowColHeadings(sheetOld.isDisplayRowColHeadings());
        sheetNew.setDisplayZeros(sheetOld.isDisplayZeros());
        sheetNew.setFitToPage(sheetOld.getFitToPage());
        sheetNew.setHorizontallyCenter(sheetOld.getHorizontallyCenter());
        sheetNew.setMargin(Sheet.BottomMargin, sheetOld.getMargin(Sheet.BottomMargin));
        sheetNew.setMargin(Sheet.FooterMargin, sheetOld.getMargin(Sheet.FooterMargin));
        sheetNew.setMargin(Sheet.HeaderMargin, sheetOld.getMargin(Sheet.HeaderMargin));
        sheetNew.setMargin(Sheet.LeftMargin, sheetOld.getMargin(Sheet.LeftMargin));
        sheetNew.setMargin(Sheet.RightMargin, sheetOld.getMargin(Sheet.RightMargin));
        sheetNew.setMargin(Sheet.TopMargin, sheetOld.getMargin(Sheet.TopMargin));
        // These four settings must be read from the old sheet
        // (reading them from sheetNew would copy the new sheet onto itself)
        sheetNew.setPrintGridlines(sheetOld.isPrintGridlines());
        sheetNew.setRightToLeft(sheetOld.isRightToLeft());
        sheetNew.setRowSumsBelow(sheetOld.getRowSumsBelow());
        sheetNew.setRowSumsRight(sheetOld.getRowSumsRight());
        sheetNew.setVerticallyCenter(sheetOld.getVerticallyCenter());

        for (Row row : sheetOld) {
            HSSFRow rowNew = sheetNew.createRow(row.getRowNum());
            if (rowNew != null) {
                this.transform(workbookOld, workbookNew, (XSSFRow) row, rowNew);
            }
        }
        for (int i = 0; i < this.lastColumn; i++) {
            sheetNew.setColumnWidth(i, sheetOld.getColumnWidth(i));
            sheetNew.setColumnHidden(i, sheetOld.isColumnHidden(i));
        }
        for (int i = 0; i < sheetOld.getNumMergedRegions(); i++) {
            CellRangeAddress merged = sheetOld.getMergedRegion(i);
            sheetNew.addMergedRegion(merged);
        }
    }

    /** Copies one row and tracks the widest row seen so far. */
    private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                           XSSFRow rowOld, HSSFRow rowNew) {
        rowNew.setHeight(rowOld.getHeight());
        for (Cell cell : rowOld) {
            HSSFCell cellNew = rowNew.createCell(cell.getColumnIndex(), cell.getCellType());
            if (cellNew != null) {
                this.transform(workbookOld, workbookNew, (XSSFCell) cell, cellNew);
            }
        }
        this.lastColumn = Math.max(this.lastColumn, rowOld.getLastCellNum());
    }

    /** Copies one cell: comment, style (cached by hash) and value. */
    private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew,
                           XSSFCell cellOld, HSSFCell cellNew) {
        // Note: HSSFCell expects an HSSF comment, so this call may fail
        // for cells that actually carry an XSSF comment.
        cellNew.setCellComment(cellOld.getCellComment());
        Integer hash = cellOld.getCellStyle().hashCode();
        if (this.styleMap != null && !this.styleMap.containsKey(hash)) {
            this.transform(workbookOld, workbookNew, hash, cellOld.getCellStyle(),
                    (HSSFCellStyle) workbookNew.createCellStyle());
        }
        cellNew.setCellStyle(this.styleMap.get(hash));
        switch (cellOld.getCellType()) {
            case Cell.CELL_TYPE_BLANK:
                break;
            case Cell.CELL_TYPE_BOOLEAN:
                cellNew.setCellValue(cellOld.getBooleanCellValue());
                break;
            case Cell.CELL_TYPE_ERROR:
                cellNew.setCellValue(cellOld.getErrorCellValue());
                break;
            case Cell.CELL_TYPE_FORMULA:
                cellNew.setCellValue(cellOld.getCellFormula());
                break;
            case Cell.CELL_TYPE_NUMERIC:
                cellNew.setCellValue(cellOld.getNumericCellValue());
                break;
            case Cell.CELL_TYPE_STRING:
                cellNew.setCellValue(cellOld.getStringCellValue());
                break;
            default:
        }
    }

    /** Copies a cell style and registers it in the style cache. */
    private void transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew, Integer hash,
                           XSSFCellStyle styleOld, HSSFCellStyle styleNew) {
        styleNew.setAlignment(styleOld.getAlignment());
        styleNew.setBorderBottom(styleOld.getBorderBottom());
        styleNew.setBorderLeft(styleOld.getBorderLeft());
        styleNew.setBorderRight(styleOld.getBorderRight());
        styleNew.setBorderTop(styleOld.getBorderTop());
        styleNew.setDataFormat(this.transform(workbookOld, workbookNew, styleOld.getDataFormat()));
        styleNew.setFillBackgroundColor(styleOld.getFillBackgroundColor());
        styleNew.setFillForegroundColor(styleOld.getFillForegroundColor());
        styleNew.setFillPattern(styleOld.getFillPattern());
        styleNew.setFont(this.transform(workbookNew, (XSSFFont) styleOld.getFont()));
        styleNew.setHidden(styleOld.getHidden());
        styleNew.setIndention(styleOld.getIndention());
        styleNew.setLocked(styleOld.getLocked());
        styleNew.setVerticalAlignment(styleOld.getVerticalAlignment());
        styleNew.setWrapText(styleOld.getWrapText());
        this.styleMap.put(hash, styleNew);
    }

    /** Maps a data format index of the old workbook to one in the new workbook. */
    private short transform(XSSFWorkbook workbookOld, HSSFWorkbook workbookNew, short index) {
        DataFormat formatOld = workbookOld.createDataFormat();
        DataFormat formatNew = workbookNew.createDataFormat();
        return formatNew.getFormat(formatOld.getFormat(index));
    }

    /** Creates an HSSF font with the same properties as the XSSF font. */
    private HSSFFont transform(HSSFWorkbook workbookNew, XSSFFont fontOld) {
        HSSFFont fontNew = workbookNew.createFont();
        fontNew.setBoldweight(fontOld.getBoldweight());
        fontNew.setCharSet(fontOld.getCharSet());
        fontNew.setColor(fontOld.getColor());
        fontNew.setFontName(fontOld.getFontName());
        fontNew.setFontHeight(fontOld.getFontHeight());
        fontNew.setItalic(fontOld.getItalic());
        fontNew.setStrikeout(fontOld.getStrikeout());
        fontNew.setTypeOffset(fontOld.getTypeOffset());
        fontNew.setUnderline(fontOld.getUnderline());
        return fontNew;
    }
}

(Facepalm.) That is a lot of code; please bear with it.
5. PPT document to HTML
Each slide is rendered as an image, and the images are placed into an HTML page.
Note:
1) Chinese text in a presentation may not render correctly. Installing a Chinese font on the server solves this.
2) Tables in 2007 (pptx) files are not rendered.
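Once the slides exist as image files, the HTML page is only a list of img tags. A minimal sketch of that assembly step, assuming the images are named 1.jpeg, 2.jpeg and so on under a common URL prefix, as in the converters below; the SlidePageBuilder class is a hypothetical helper, not part of the original code:

```java
public class SlidePageBuilder {

    /**
     * Builds an HTML page that embeds slide images named 1.jpeg .. n.jpeg,
     * each referenced under the given URL prefix.
     */
    public static String build(String viewImgPath, int slideCount) {
        StringBuilder body = new StringBuilder();
        for (int i = 1; i <= slideCount; i++) {
            body.append("<img src='")
                .append(viewImgPath).append('/').append(i).append(".jpeg")
                .append("' style='width:960px;height:530px;'><br>");
        }
        return "<html><head>"
                + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">"
                + "</head><body>" + body + "</body></html>";
    }

    public static void main(String[] args) {
        // "/preview/img" is an example prefix only
        System.out.println(build("/preview/img", 2));
    }
}
```

Keeping the page assembly separate from the rendering loop means the HTML file is written once, after all slides have been processed.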
- PPT 2003 (.ppt) to HTML
import java.awt.*;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.*;
import java.util.Map;
import org.apache.commons.io.FileUtils;
import org.apache.poi.hslf.model.TextRun;
import org.apache.poi.hslf.usermodel.RichTextRun;
import org.apache.poi.hslf.usermodel.SlideShow;

public static boolean ppt2003Tohtml(Map params) {
    try {
        String imgPath = params.get("fileImg").toString();
        String viewImgPath = params.get("viewImgPath").toString();
        File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());
        SlideShow ppt;
        try (InputStream inputStream = new FileInputStream(file)) {
            ppt = new SlideShow(inputStream);
        }
        Dimension pgsize = ppt.getPageSize();
        org.apache.poi.hslf.model.Slide[] slide = ppt.getSlides();
        StringBuilder imghtml = new StringBuilder();
        for (int i = 0; i < slide.length; i++) {
            logger.debug("Rendering slide {}", i + 1);
            // Force a Chinese font so that Chinese text renders correctly
            TextRun[] truns = slide[i].getTextRuns();
            for (int k = 0; k < truns.length; k++) {
                RichTextRun[] rtruns = truns[k].getRichTextRuns();
                for (int l = 0; l < rtruns.length; l++) {
                    rtruns[l].setFontIndex(1);
                    rtruns[l].setFontName("宋体"); // SimSun
                }
            }
            BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
            Graphics2D graphics = img.createGraphics();
            // Clear the drawing area (white, as in the pptx version below)
            graphics.setPaint(Color.WHITE);
            graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));
            slide[i].draw(graphics);
            graphics.dispose();
            // Storage path and format of the image (jpeg, png, bmp, etc.)
            try (FileOutputStream out = new FileOutputStream(imgPath + (i + 1) + ".jpeg")) {
                javax.imageio.ImageIO.write(img, "jpeg", out);
            }
            // URL of the image as referenced from the HTML
            String imgs = viewImgPath + "/" + (i + 1) + ".jpeg";
            imghtml.append("<img src='").append(imgs)
                   .append("' style='width:960px;height:530px;vertical-align:text-bottom;'><br><br><br><br>");
        }
        // Write the page once, after all slides are rendered. The original also ran an
        // empty DOMSource through a Transformer into the image stream, which served no purpose.
        String ppthtml = "<html><head><META http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body>"
                + imghtml + "</body></html>";
        FileUtils.writeStringToFile(new File(params.get("htmlFile").toString()), ppthtml, "utf-8");
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}
- PPT 2007 (.pptx) to HTML
import java.awt.*;
import java.awt.geom.Rectangle2D;
import java.awt.image.BufferedImage;
import java.io.*;
import java.util.Map;
import org.apache.commons.io.FileUtils;
import org.apache.poi.xslf.usermodel.*;

public static boolean ppt2007Tohtml(Map params) {
    try {
        String imgPath = params.get("fileImg").toString();
        String viewImgPath = params.get("viewImgPath").toString();
        File file = new File(params.get("filePath").toString() + params.get("FILE_NAME").toString());
        XMLSlideShow ppt;
        try (InputStream inputStream = new FileInputStream(file)) {
            ppt = new XMLSlideShow(inputStream);
        }
        Dimension pgsize = ppt.getPageSize();
        XSLFSlide[] slides = ppt.getSlides();
        StringBuilder imghtml = new StringBuilder();
        for (int i = 0; i < slides.length; i++) {
            try {
                // Force a Chinese font so that Chinese text renders correctly
                for (XSLFShape shape : slides[i].getShapes()) {
                    if (shape instanceof XSLFTextShape) {
                        XSLFTextShape tsh = (XSLFTextShape) shape;
                        for (XSLFTextParagraph p : tsh) {
                            for (XSLFTextRun r : p) {
                                r.setFontFamily("宋体"); // SimSun
                            }
                        }
                    }
                }
                BufferedImage img = new BufferedImage(pgsize.width, pgsize.height, BufferedImage.TYPE_INT_RGB);
                Graphics2D graphics = img.createGraphics();
                // Clear the drawing area
                graphics.setPaint(Color.white);
                graphics.fill(new Rectangle2D.Float(0, 0, pgsize.width, pgsize.height));
                // Render the slide
                slides[i].draw(graphics);
                graphics.dispose();
                // Storage path of the image
                String imgName = imgPath + (i + 1) + ".jpeg";
                try (FileOutputStream out = new FileOutputStream(imgName)) {
                    javax.imageio.ImageIO.write(img, "jpeg", out);
                }
                // URL of the image as referenced from the HTML
                String imgs = viewImgPath + "/" + (i + 1) + ".jpeg";
                imghtml.append("<img src='").append(imgs)
                       .append("' style='width:960px;height:530px;vertical-align:text-bottom;'><br><br><br><br>");
            } catch (Exception e) {
                // A slide that fails to render is skipped; the remaining slides are still converted
                System.out.println(e);
                System.out.println("Failed to convert slide " + (i + 1));
            }
        }
        // The original also ran an empty DOMSource through a Transformer into the last
        // image stream, which served no purpose and failed when there were no slides.
        String ppthtml = "<html><head><META http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body>"
                + imghtml + "</body></html>";
        FileUtils.writeStringToFile(new File(params.get("htmlFile").toString()), ppthtml, "utf-8");
    } catch (Exception e) {
        e.printStackTrace();
        return false;
    }
    return true;
}
6. Preview
- Each time a source file has been converted to HTML for preview, upload the HTML file to a directory on the nginx server and access it through nginx, which avoids browser caching problems. I package the HTML file together with its image directory, upload the archive to the nginx server over SFTP, and unpack it there.
- Use an iframe tag to display the HTML file. Note that when a PDF file is previewed this way, the browser shows download and print buttons. If the requirement is that files may only be viewed and not downloaded, a PDF cannot be previewed through an iframe.
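The packaging step mentioned above can be done with the JDK's java.util.zip classes. A minimal sketch, assuming the HTML file and its image directory sit under one output directory; the PreviewPackager class, the file names in demo(), and the temporary paths are illustrative only, and the SFTP upload itself is not shown:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class PreviewPackager {

    /** Zips every regular file under sourceDir into zipFile, keeping relative paths. */
    public static void zipDirectory(Path sourceDir, Path zipFile) {
        try {
            // Collect the files first so the archive itself is never picked up
            List<Path> files;
            try (Stream<Path> walk = Files.walk(sourceDir)) {
                files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            try (OutputStream out = Files.newOutputStream(zipFile);
                 ZipOutputStream zip = new ZipOutputStream(out)) {
                for (Path file : files) {
                    // Zip entry names always use forward slashes
                    String name = sourceDir.relativize(file).toString().replace('\\', '/');
                    zip.putNextEntry(new ZipEntry(name));
                    Files.copy(file, zip);
                    zip.closeEntry();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Packs a small sample preview directory and returns the sorted entry names. */
    public static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("preview");
            Files.createDirectories(dir.resolve("img"));
            Files.write(dir.resolve("index.html"), "<html></html>".getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("img").resolve("1.jpeg"), new byte[] {1, 2, 3});
            Path zip = Files.createTempFile("preview", ".zip");
            zipDirectory(dir, zip);

            List<String> names = new ArrayList<>();
            try (ZipFile zf = new ZipFile(zip.toFile())) {
                Enumeration<? extends ZipEntry> entries = zf.entries();
                while (entries.hasMoreElements()) {
                    names.add(entries.nextElement().getName());
                }
            }
            Collections.sort(names);
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Storing relative paths in the archive keeps the img references in the HTML valid after the archive is unpacked on the nginx server.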