Find Same Strings in Two Files & Output Them in a Certain Format

Question

I am writing a program that reads two files and compares them word by word and line by line. For each line in the first file, I need to check whether it is a substring of any line in the second file and, if it is, display the first word of that line of the second file. Additionally, I need to do this without using built-in Java methods such as contains().

For each line in the first file, I need to check the first word with each word in the lines of the second file till I find a match. Once I find a match I need to check if the second word in the first file is the same as the next word in the second file and so on until the end of the line in the first file. If the entire line in the first file is contained in a line of the second file then the program must print the first word of that line from the second file.

For example:
File1.txt:
like parks
went out
go out

File2.txt:
I like to go out because I like parks
Ben does not go out much
Shelly went out often but does not like parks
Harry does not go out neither does he like parks

Desired Output:
q1. like parks
I
Shelly
Harry
q2. went out
Shelly
q3. go out
I
Ben
Harry

// Import io so we can use file objects
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.*;

public class wordc {
    public static void main(String[] args) {
        try {
            //reads the files
            BufferedReader bf1 = new BufferedReader(new FileReader("File1.txt"));
            BufferedReader bf2 = new BufferedReader(new FileReader("File2.txt"));
            int k =0, l = 0, i = 0, j = 0, count = 0, linecount1 = 0, linecount2 = 0, wordcount1 = 0, wordcount2 = 0;
            String line1, line2;

            //counts the number of lines in File1
            while((line1 = bf1.readLine()) != null)
            {
                linecount1++;
            }

            //counts the number of lines in File2
            while((line2 = bf2.readLine()) != null)
            {
                linecount2++;
            }

            // loop to iterate through File1
            while((line1 = bf1.readLine()) != null && k < linecount1)
            {
                System.out.println("q"+ k++ + "line1");

                //store words in the current line in the File1 in a word array
                String[] word1 = line1.split(" ");
                //number of words in the line
                wordcount1 = word1.length;

                //loop to iterate through File2
                while ((line2 = bf1.readLine()) != null && l < linecount2)
                {
                    //store words in current line in the File2 in a word array
                    String[] word2 = line2.split(" ");
                    // number of words in the line
                    wordcount2 = word2.length;
                    count = 0;

                    while(j < wordcount1)
                    {
                        while(i < wordcount2)
                        {
                            //compare first word in word1 array to first word in word2 array
                            //continue to compare till a match is found
                            //once a match is found increment count
                            // and compare the next word in the word1 array with the next word in the word2 array
                            //and so on
                            if (word1[j].equals(word2[i]))
                            {
                                i++;
                                j++;
                                count++;
                            }
                            //if the current word in word1 does not match the word in word2
                            //check the current word in word1 with the next word in word2
                            else
                            {
                                i++;
                                break;
                            }
                        }
                    }

                    //if the number of words in a line in File1 matched a portion of a line in File2
                    //print the first word of that line
                    if(count == wordcount1)
                        System.out.println(line2[l]);
                    l++;
                }
                k++;
            }

            bf1.close();
            bf2.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred:" + e.toString());
        }
    }
}

Thanks in advance for all the help!

 

Answer

This isn’t difficult: a two-level loop plus a few string operations (searching, splitting, concatenation and position lookup) will do it. Coding all of that from scratch is cumbersome, though. In SPL (Structured Process Language) it takes 3 lines of code:

     A
1    =file("D:\\file1.txt").read@n()
2    =file("D:\\file2.txt").read@n()
3    =A1.conj(A2.select(pos(~,A1.~)).(~.words()(1)))

A1: Read file1.txt line by line;

A2: Read file2.txt line by line;

A3: conj() function calculates members of a set iteratively and concatenates results of subsets. select() function gets certain members. pos() function checks if a string is contained in another string. words() function splits a string into individual words.

conj(), select() and pos() are loop functions. They replace explicit loop statements and keep the code concise.
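For readers more at home in Java, the shape of the A3 expression can be approximated with streams. This is only a rough analogue, not how SPL works internally: it assumes pos() behaves like a plain substring search (indexOf here) and words() like splitting on whitespace. The class and method names are made up for illustration, and because it relies on indexOf it does not meet the "no contains()" restriction in the question.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical name, used only for this illustration.
class SplAnalogue {

    // Rough counterpart of A1.conj(A2.select(pos(~,A1.~)).(~.words()(1))):
    // flatMap plays the role of conj(), filter of select(), indexOf of pos().
    static List<String> firstWords(List<String> a1, List<String> a2) {
        return a1.stream()
                .flatMap(query -> a2.stream()
                        // keep the lines of the second file that contain the query
                        .filter(line -> line.indexOf(query) >= 0)
                        // take the first whitespace-separated word of each kept line
                        .map(line -> line.trim().split("\\s+")[0]))
                .collect(Collectors.toList());
    }
}
```

Like the SPL script, this returns only the matching first words as one flat list; it does not print the "q1. ..." header lines from the desired output.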

To call an SPL script in a Java application, see How to Call an SPL Script in Java.
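If you need to stay in plain Java and avoid contains(), the two-level loop described in the question could look roughly like the sketch below. It is a minimal version under a few assumptions: words are separated by whitespace, matching is case-sensitive and whole-word, and blank lines in the first file are skipped. The class name PhraseFinder and the helper names are invented for this example; the file names are the ones from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; not taken from the original post.
class PhraseFinder {

    // True if all words of `phrase` occur consecutively, in order, inside `words`.
    static boolean containsPhrase(String[] words, String[] phrase) {
        // Try every position in `words` as a possible start of the phrase.
        for (int start = 0; start + phrase.length <= words.length; start++) {
            int matched = 0;
            while (matched < phrase.length
                    && words[start + matched].equals(phrase[matched])) {
                matched++;
            }
            if (matched == phrase.length) {
                return true;
            }
        }
        return false;
    }

    // Builds the output lines: a "qN. <query>" header followed by the first
    // word of every target line that contains the query.
    static List<String> report(List<String> queries, List<String> targets) {
        List<String> out = new ArrayList<>();
        int n = 0;
        for (String query : queries) {
            String trimmed = query.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank query lines are ignored
            }
            String[] phrase = trimmed.split("\\s+");
            out.add("q" + (++n) + ". " + trimmed);
            for (String target : targets) {
                String[] words = target.trim().split("\\s+");
                if (containsPhrase(words, phrase)) {
                    out.add(words[0]);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Read both files fully so the second file can be scanned once per query.
        List<String> queries = Files.readAllLines(Paths.get("File1.txt"));
        List<String> targets = Files.readAllLines(Paths.get("File2.txt"));
        for (String line : report(queries, targets)) {
            System.out.println(line);
        }
    }
}
```

Reading both files into lists up front avoids the problem of a BufferedReader being exhausted after one pass, and restarting the word comparison at each start position avoids carrying loop counters over from one line to the next.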