28 Find the Index of the First Occurrence in a String
Given two strings needle
and haystack
, return the index of the first occurrence of needle
in haystack
, or -1
if needle
is not part of haystack
.
Example 1:
Input: haystack = "sadbutsad", needle = "sad"
Output: 0
Explanation: "sad" occurs at index 0 and 6.
The first occurrence is at index 0, so we return 0.
Example 2:
Input: haystack = "leetcode", needle = "leeto"
Output: -1
Explanation: "leeto" did not occur in "leetcode", so we return -1.
class Solution {
public int strStr(String haystack, String needle) {
if (haystack == null || needle == null || haystack.length() < needle.length()){
return -1;
}
if (needle.length() == 0){
return 0;
}
for (int i = 0; i <= haystack.length() - needle.length(); i++){
int j = 0;
while(j < needle.length() && haystack.charAt(j+i) == needle.charAt(j)){
j++;
}
if (j == needle.length()){
return i;
}
}
return -1;
}
}
/*
l e e t c o d e
i
l e e t o
j
*/
class Solution {
public int strStr(String haystack, String needle) {
int n = haystack.length();
int m = needle.length();
if(n < m) {return -1;}
for(int i = 0; i <= n-m; i++) {
int j;
for(j = 0; j < m; j++) {
if(haystack.charAt(i+j) != needle.charAt(j)) {
break;
}
}
if(j == m) {
return i;
}
}
return -1;
}
}
class Solution {
// CONSTANTS
final int RADIX_1 = 26;
final int MOD_1 = 1000000033;
final int RADIX_2 = 27;
final int MOD_2 = 2147483647;
// Return Array of Hash Values
public long[] hashPair(String string, int m) {
long hash1 = 0, hash2 = 0;
long factor1 = 1, factor2 = 1;
for (int i = m - 1; i >= 0; i--) {
hash1 += ((int) (string.charAt(i) - 'a') * (factor1)) % MOD_1;
factor1 = (factor1 * RADIX_1) % MOD_1;
hash2 += ((int) (string.charAt(i) - 'a') * (factor2)) % MOD_2;
factor2 = (factor2 * RADIX_2) % MOD_2;
}
return new long[] { hash1 % MOD_1, hash2 % MOD_2 };
}
public int strStr(String haystack, String needle) {
int m = needle.length();
int n = haystack.length();
if (n < m)
return -1;
// COMPUTING CONSTANTS
long MAX_WEIGHT_1 = 1;
long MAX_WEIGHT_2 = 1;
for (int i = 0; i < m; i++) {
MAX_WEIGHT_1 = (MAX_WEIGHT_1 * RADIX_1) % MOD_1;
MAX_WEIGHT_2 = (MAX_WEIGHT_2 * RADIX_2) % MOD_2;
}
// Compute hash pair of needle
long[] hashNeedle = hashPair(needle, m);
long[] hashHay = { 0, 0 };
// Check for each m-substring of haystack, starting at index windowStart
for (int windowStart = 0; windowStart <= n - m; windowStart++) {
if (windowStart == 0) {
// Compute hashPair of First Substring
hashHay = hashPair(haystack, m);
} else {
// Update Hash Pair using Previous Hash Value in O(1)
hashHay[0] = ((hashHay[0] * RADIX_1) % MOD_1
- ((int) (haystack.charAt(windowStart - 1) - 'a') * MAX_WEIGHT_1) % MOD_1
+ (int) (haystack.charAt(windowStart + m - 1) - 'a') + MOD_1) % MOD_1;
hashHay[1] = ((hashHay[1] * RADIX_2) % MOD_2
- ((int) (haystack.charAt(windowStart - 1) - 'a') * MAX_WEIGHT_2) % MOD_2
+ (int) (haystack.charAt(windowStart + m - 1) - 'a') + MOD_2) % MOD_2;
}
// If the hash matches, return immediately. Probability of Spurious Hit tends to zero.
if (hashNeedle[0] == hashHay[0] && hashNeedle[1] == hashHay[1]) {
return windowStart;
}
}
return -1;
}
}
P.S. 1, 2 在string I 里
Q3(Strstr). Substring problem: how to determine whether a string is a substring of another string.
java API index of (string str) https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String)
return the index of the first occurrence of the specified substring, or -1 if there is no such occurrence.
Example:
text (s1) = "a b c d e"; length = m
pattern (s2) = "c d e"; length = n
return 2
Method 1: Brute Force O(\(n^2\))
for every starting inde in s1:
[0 ... s1.length() - s2.length()]
try to match every chars in s1 compare to s2
package TheoreticalClass.Class9StringII;
public class SubstringFinding {
public static void main(String[] args){
String string1 = "abcde";
String string2 = "cde";
System.out.println("BruteForce:" + helper(string1, string2));
System.out.println("BruteForce2:" + helper2(string1, string2));
}
public static int helper(String s1, String s2){
if (s1 == null || s2 == null || s1.length() < s2.length()){
return -1;
}
if (s2.length() == 0){
return 0;
}
int result = 0;
int count = 0;
int index = 0;
// a b c d e
// i
// c d e
// j
for(int i = 0; i < s1.length(); i++){
for (int j = 0; j < s2.length(); j++){
if (s1.charAt(i) == s2.charAt(j)) {
if (j == 0 || i == 0){
count = 1;
}else if (s1.charAt(i-1) == s2.charAt(j-1)){
count++;
}else{
count = 1;
}
}
if (count >= result){
index = i - count + 1;
}else {
index = i - result + 1;
}
result = Math.max(count, result);
}
}
return index;
}
// class 参考答案
public static int helper2(String s1, String s2){
if (s1 == null || s2 == null || s1.length() < s2.length()){
return -1;
}
if (s2.length() == 0){ // 容易漏掉 空串是任何空串的substring
return 0;
}
// i: every s1 starting index
for (int i = 0; i <= s1.length() - s2.length(); i++){ // 等号不能去掉
int j = 0;
//保持i的站位不动, j向后延伸
while (j < s2.length() && s1.charAt(i + j) == s2.charAt(j)){
j++;
}
if (j == s2.length()){
return i;
}
}
return -1;
}
/*
s1 = a b c d e
i i+1 i+2 i+3 i+4
s2 = cde
j
*/
}
所有指针的题都不要偷懒, 把它的意义写下来
i: every s1 starting index
问这个等于号能不能去掉
别慌 过例子
s1 = a
i
s2 = a
0要进去比一下, 所以等号不能去掉
TC: O(m*n)
SC: O(1)
High Level: for all possible starting index i from 0 to s1.length -2s.length
check from i pattern.length char == pattern
check i 后面的每一位是否和pattern里的每一位相同
Method 2 : Rabin-Karp (P2)
http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm
s2 = "c d e" hash("cde") = x0
s1 = "a b c d e";
____
a b c = X1
b c d = X2
c d e = X0
principle: If we can hash the short string to a unique integer (via a hash function without collision), then we can just compare each substring of s1's hashed value and compare it with s2's hash value.
Assumption: only lower case letter (base = 26 a--z)
a = 97 (ASCII) -> 0
b = 98 -> 1
...
z -> 25
=>
'a' - 'a' = 0
'b' - 'a' = 1
...
'z' - 'a' = 25
x - 'a' = ?
类似于10进制计算:
123 = 1 * 10 ^ 2 + 2 * 10 ^ 1 + 3 * 10 ^ 0
125 = (10 进制) = 1 * 10 ^ 2 + 2 * 10 ^ 1 5*10^0
100 + 20 + 5
abc = 012 (26进制) = 0 * 26 ^ 2 + 1 * 26 ^1 + 2 * 26 ^ 0 = 28
bcd = 123 (26进制) = 1 * 26 ^ 2 + 2 * 26 ^1 + 3 * 26^0 = 731
cde = 234 (26进制) = 2 * 26 ^2 + 3 * 26 ^ 1 + 4 * 26 ^0 = 1434
a b c
0*26^2 + 1 * 26 ^1 + 2 * 26 ^ 0 = 28
b c d
1 * 26 ^2 + 2 * 26^1 + 3 * 26 ^0 = 731
1 2 3 -> 2 3 4
234 = (123 - 1*10 ^2 ) * 10 + 4 = 234 O(1)
23
前面踢一个后面加一个O(1)
a b c
b c d
bcd = (28(abc) - 0 * 26 ^2) * 26 + 3
hash("bcd") = (hash("abc") - ('a' - 'a') * 26 ^2) + ('d' - 'a') * 26 ^ 0
c d e
2 * 26 ^2 + 3* 26^1 + 4 * 26 ^0 = 1434
Initialization: compute hash(s2) -- O(n) n = s2.length()
Each time we move the sliding window m = s1.length()
Time complexity for each step O(1)
Because we have m-n sliding windows, the total time complexity is O(m-n)
- remove the leftmost item from the polynomial function
- all the rest items of (bc's hashed value) x 26
- add new item d
starting index has range of
0 --> s1.size() - s2.size()
In total we have m - n +1 windos. --> O(m) time in average
Total time = O(m + n) average, O(mn) worst case
我用了这招, 我能省下哪儿段时间呢?
就是你从一个窗口比较下一个窗口的时候, 其实你是可以通过上一个窗口hash值 O(1)的时间知道下一值, 并且对比S2也是O(1)的, 所以可以降低到O(m+n)
public static int strStr(String haystack, String needle) {
// We need a prime to minimize the chance of collisions, as in a HashMap.
// The reason a prime is better is that when we generate hashcodes that are
// multiples of each other, they don't land on the same bucket when we calculate
// their mod (num_buckets) (unless they are multiples of the chosen prime).
// As of where I got this from, I just searched for primes on Wikipedia ;)
final int prime = 9999991;
// Most people use a base here representative of the base the data is encoded in.
// I have found 10 to be more intuitive and to work as fast as others,
// at least for the leetcode test dataset.
// Sedgewick uses 256, so feel free to switch to that, or to 26 if you restrict your values
// to a-z.
final int base = 10;
// Handle cornercase: needle is empty
if (needle.length() == 0) return 0;
// Handle cornercase: needle is longer than haystack
if (haystack.length() < needle.length()) return -1;
// Since we are expressing the string in the hashcode as
// char[i]*base^N + char[i-1] * base^N-1....+char[window.length()-1],
// when we remove the most significant digit in each step a few lines down from here,
// we need to multiply the letter we are removing from the window by the base elevated
// to the number of digits in the window (char[i]*base^N) mod prime.
// Needle (and the window) can have up to 50,000 characters as per requirements,
// so this can go up to Math.pow(10, 50,000), which in Java just returns "Infinity" - it's a huge number.
// To mitigate this, since we are going to be using this number to calculate a hash mod prime,
// we can multiply the base n times and calculate its mod prime factor by factor.
// This works because (a*b)%p = ((a%p)*(b%p))%p
// (https://en.wikipedia.org/wiki/Modulo_operation#Properties_(identities))
// Here we pre-calculate this number so we can reuse it, as we must do it iteratively
// and it would consume cycles character by character otherwise.
int maxBase = 1;
for (int i = 1; i < needle.length(); i++) {
maxBase = (maxBase * base) % prime;
}
int needlehash = hash(needle, needle.length(), base, prime);
int haystackhash = hash(haystack, needle.length(), base, prime);
// Did we get lucky with the first (needle.length()) characters?
if (needlehash == haystackhash) return 0;
// This loop slides the window of size needle.length() along the haystack.
// For example for [a,b,c,d,...] and a needle of size 3,
// it goes from [a,b,c] to [b,c,d] and so on.
// At each step, it recalculates the window's hashcode in O(1), by reusing the previous result. This is the key of the algorithm.
// For more info, please check https://en.wikipedia.org/wiki/Rolling_hash
for (int i = needle.length(); i < haystack.length(); i++) {
// Take first char out of the hash. You may note here that instead of subtracting the value for the most significant
// character directly from haystackhash, I'm subtracting it from haystackhash+prime. The reason is that, since I'm
// subtracting 2 numbers mod prime, the result could easily become negative. To prevent this, we add prime as the result
// should be the same (a%p = (a+p)%p) https://en.wikipedia.org/wiki/Modulo_operation#Properties_(identities)
// since now the number on the left of the subtraction is larger than the number on the right, the result
// cannot be negative.
haystackhash = (haystackhash+prime - (maxBase*haystack.charAt(i-needle.length())%prime))%prime;
// Shift to the left, add new char
haystackhash = (haystackhash*base + haystack.charAt(i))%prime;
// hashes matching can mean a match or a collision. Only way to tell one from the other
// is to linearly verify if the match on hash is also a match character by character.
if (needlehash == haystackhash && checkSolution(haystack, needle, i-needle.length()+1)) return i-needle.length()+1;
}
return -1;
}
// Calculates a hashcode similar to how java.lang.String does it.
// We reimplement it here because we need control over what the prime and the base is.
// Java uses base=31 and prime=Integer.MAX_VALUE (implicitly)
private static int hash(String chars, int count, int base, int prime) {
int hash = 0;
for (int i = 0; i < count; i++) {
char c = chars.charAt(i);
hash = (hash * base + c) % prime;
}
return hash;
}
// Check solution linearly, fail fast on first character not matching
private static boolean checkSolution(String haystack, String needle, int i) {
// Many strings have the same hashcode, we need to check whether this is a collision
for (int j = 0; j < needle.length(); j++) {
if (haystack.charAt(j+i) != needle.charAt(j)) return false;
}
return true;
}