Purpose of PDFBox
- Create PDF documents
- Extract content from and edit existing PDF documents
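To make the "create PDF documents" item concrete, here is a minimal sketch that writes a one-page PDF with a single line of text. It assumes the PDFBox 2.0.x dependency used in this post; the class name `PdfCreateSketch`, the method `createPdf`, and the text position are my own choices for illustration.

```java
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.font.PDType1Font;

// Hypothetical helper class, not part of PDFBox.
public class PdfCreateSketch {

    // Writes a one-page PDF containing a single line of text.
    public static void createPdf(String outputPath, String message) throws IOException {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(); // default page size is US Letter
            document.addPage(page);
            // The content stream must be closed before the document is saved.
            try (PDPageContentStream content = new PDPageContentStream(document, page)) {
                content.beginText();
                content.setFont(PDType1Font.HELVETICA, 12);
                content.newLineAtOffset(50, 700); // arbitrary position near the top-left
                content.showText(message);
                content.endText();
            }
            document.save(outputPath);
        }
    }
}
```

Note that the built-in Helvetica font only covers Latin characters. For Korean text you would most likely need to embed a TrueType font (for example via `PDType0Font.load`) instead.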
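Advantages of PDFBox
- Free (open source, Apache License 2.0)
- Can extract or edit the text inside a PDF document
- Can split one PDF file into several files, or merge several files into one
- Can convert PDF pages to images (PNG or JPEG)
- Can digitally sign PDFs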
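As a rough illustration of the split and merge item, the sketch below uses `Splitter` and `PDFMergerUtility` from PDFBox 2.0.x. The class name `PdfSplitMergeSketch`, the method names, and the output naming scheme (`prefix` + page number + `.pdf`) are assumptions of mine, not something PDFBox prescribes.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.multipdf.Splitter;
import org.apache.pdfbox.pdmodel.PDDocument;

// Hypothetical helper class, not part of PDFBox.
public class PdfSplitMergeSketch {

    // Splits a PDF into one file per page and returns the paths that were written.
    public static List<String> splitPerPage(String inputPath, String outputPrefix) throws IOException {
        List<String> written = new ArrayList<>();
        try (PDDocument source = PDDocument.load(new File(inputPath))) {
            // By default Splitter produces one document per page.
            List<PDDocument> parts = new Splitter().split(source);
            try {
                for (int i = 0; i < parts.size(); i++) {
                    String path = outputPrefix + (i + 1) + ".pdf";
                    parts.get(i).save(path); // save while the source document is still open
                    written.add(path);
                }
            } finally {
                for (PDDocument part : parts) {
                    part.close();
                }
            }
        }
        return written;
    }

    // Merges the given PDFs, in order, into a single output file.
    public static void merge(List<String> inputPaths, String outputPath) throws IOException {
        PDFMergerUtility merger = new PDFMergerUtility();
        for (String path : inputPaths) {
            merger.addSource(new File(path));
        }
        merger.setDestinationFileName(outputPath);
        merger.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
    }
}
```

`setupMainMemoryOnly()` keeps everything in memory, which is fine for small files. For large inputs a temp-file-backed `MemoryUsageSetting` is probably the safer choice.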
Disadvantages of PDFBox
- None that I have found so far
Implementation
<!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.24</version>
</dependency>
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
import org.apache.pdfbox.io.RandomAccessRead;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfReaderUtil {

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("usage: PdfReaderUtil <path-to-pdf>");
            return;
        }
        // Path of the PDF to read, taken from the first command-line argument.
        String filePath = args[0];
        System.out.println("start!");
        try {
            String text1 = readPdfBox1(filePath);
            String text2 = readPdfBox2(filePath);
            System.out.println("readPdfBox1:" + text1);
            System.out.println("readPdfBox2:" + text2);
        } catch (Exception e) {
            System.out.println("exception : " + e);
        }
        System.out.println("end!");
    }

    // Simplest route: PDDocument.load() parses the file and PDFTextStripper extracts the text.
    public static String readPdfBox1(String filePath) throws Exception {
        File file = new File(filePath);
        // try-with-resources closes the document even if extraction fails.
        try (PDDocument pDDocument = PDDocument.load(file)) {
            return new PDFTextStripper().getText(pDDocument);
        }
    }

    // Lower-level route: run PDFParser by hand, then set the separators on the stripper.
    public static String readPdfBox2(String filePath) throws Exception {
        try (InputStream inputStream = new FileInputStream(new File(filePath))) {
            RandomAccessRead source = new RandomAccessBufferedFileInputStream(inputStream);
            PDFParser pDFParser = new PDFParser(source);
            pDFParser.parse();
            // Closing the PDDocument also closes its underlying COSDocument and the parser source.
            try (PDDocument pDDocument = pDFParser.getPDDocument()) {
                PDFTextStripper pDFTextStripper = new PDFTextStripper();
                pDFTextStripper.setLineSeparator("\n");
                pDFTextStripper.setWordSeparator(" ");
                String result = pDFTextStripper.getText(pDDocument);
                System.out.println("Total : " + pDDocument.getNumberOfPages());
                return result;
            }
        }
    }
}
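The advantages list also mentions converting a PDF to images, which the reader class above does not cover. Below is a minimal sketch of that with `PDFRenderer` from PDFBox 2.0.x. It writes PNG files through the JDK's `javax.imageio.ImageIO`, so it needs nothing beyond the `pdfbox` artifact; the class name `PdfToImageSketch`, the method `renderToPng`, and the output naming scheme are assumptions for illustration.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Hypothetical helper class, not part of PDFBox.
public class PdfToImageSketch {

    // Renders every page to "<outputPrefix><pageNumber>.png" and returns the page count.
    public static int renderToPng(String pdfPath, String outputPrefix, float dpi) throws IOException {
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            PDFRenderer renderer = new PDFRenderer(document);
            int pageCount = document.getNumberOfPages();
            for (int page = 0; page < pageCount; page++) {
                // Page indexes are zero-based; file names are one-based for readability.
                BufferedImage image = renderer.renderImageWithDPI(page, dpi, ImageType.RGB);
                ImageIO.write(image, "png", new File(outputPrefix + (page + 1) + ".png"));
            }
            return pageCount;
        }
    }
}
```

A call such as `PdfToImageSketch.renderToPng("input.pdf", "page-", 150f)` should produce `page-1.png`, `page-2.png`, and so on. A higher DPI gives sharper images at the cost of memory and rendering time.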