個人資料
  • 博客訪問:
正文

軟件開發麵試C++問題

(2011-03-07 22:17:26) 下一個
當年每次經過計算機係走廊,到處聽到的是老美響亮的“PLUS PLUS”的聲音--都在討論呢。那時遇到問題時的“聖經”是ARM (C++ Annotated Reference Manual)。過了10多年,C++已經實現了標準化,標準文件就有上千頁。C++水平往往是HACKER水平高低的標誌。

有次,有人把一個C++麵試的問題發給我。我一看,現在麵試問題越來越刁難了嘛。題目是:有10萬個文本文件,輸入一個詞組,如何迅速找出含這些詞的文件。把題發過來後,開始計時,必須60-90分鍾之內完成,將答案寄回。

我一看,這不是要構造一個簡單的搜索引擎嘛。

時間緊迫,我立刻打開VI,開始迅速TYPE,並進行了測試。然後將代碼與測試的截屏在40多分鍾的時候EMAIL了回去。下麵將題及我的解答貼出,各位遇到類似的問題可以作為參考。

/////////////////////////////////麵試問題//////////////////////////////////////////////

/*************
 *
 * Copyright (C) D. Yue, March 4, 2009
 *
 */

#include
#include
#include
#include
#include
#include
#include
#include




/********************************

Problem 1 Time cut-off: 60 - 90 min.

1.Data: a set of 100,000 ASCII files (strings).
2.Each file contains 1 or more words.
3.A query is entered from stdin (may contain multiple words).
4.Code in C++: find files in (1) that partially matche to query input. 
5.Assume: the function will be invoked repeatedly.
6.Optimize for time.

********************************************************************************************/


/**************************

Solution to Problem 1

Algorithm & datastructure

*) Assign a unique integer ID to each of the N ASCII strings, so one can retrieve them by their ID, call these IDs StringIDs;
*) Store the strings in some container, which allows easy retrieval by IDs, a hashtable is approriate.
*) Break each string into words (tokenization)
*) In a hash table with the words as keys, store the StringIDs of the strings that contain a key;
*) On user input search the keys in the hashtable, once a match is found, get the StringIDs, then print the corresponding string
*) take care of the requirement that multiple words must all match


******************************/


using namespace std;
using namespace boost;

struct StringWithID {
 StringWithID() {}

 StringWithID(int i, const string& s): id(i), value(s) {};
 int id;
 string value;

 static int getNextID() { return NextID++; } // used when assigning a new ID to a new string

private:
 static int NextID;
 

};

int StringWithID::NextID =1; //initialize

class Search{ 


 typedef map IdToString; // given an ID find the string
 typedef map StringToId; // given a string, find its ID

 typedef map > WordToIds; // given a word, find the IDs of the strings that contains the word

public:
 //Initialize the StringSet and WordsToIDs map with the ASCII strings
 bool initialize( const vector strs) {

 using namespace std;
        using namespace boost;

 for(vector::const_iterator p=strs.begin(); p != strs.end(); p++ ) {

 if(str2id.find(*p) != str2id.end()) continue; // string already there
 int new_id = StringWithID::getNextID(); // assign this ID to the new guy

 id2str[new_id] = StringWithID(new_id, *p); // stored
 str2id[*p] = new_id; //reverse lookup stored

 tokenizer<> tok(*p);
 for(tokenizer<>::iterator i=tok.begin(); i!=tok.end();i++){
 cout<<"Got token " << *i<< endl;
 w2id[*i].push_back(new_id);
 }

 }
 return true;

 }
 
 //pattern is a space separated list of words
 //we iterate through the w2id map to check if 
 bool find_match(const string & pattern) {
 //first we tokenize the pattern into words

 list pats;
 tokenizer<> tok(pattern);
 for(tokenizer<>::iterator i=tok.begin(); i!=tok.end();i++){
 pats.push_back(*i);
 }

 size_t pat_cnt = pats.size();

 map > matched_sets;

 for(list::iterator pi = pats.begin(); pi != pats.end(); pi++) {
 for(WordToIds::iterator witer = w2id.begin(); witer != w2id.end(); witer++) {//iteratate through the words

 string word = witer->first;
 
 if(word.find(*pi) != string::npos) { // the word matched the pattern
 cout<<"Found: "<< *pi << " in: "<< word<

 copy((witer->second).begin(), (witer->second).end(), inserter(matched_sets[*pi], matched_sets[*pi].begin()));

 for(set::iterator i = matched_sets[*pi].begin(); i != matched_sets[*pi].end(); i++) {

 int str_id = *i;
 string str = id2str[str_id].value;
 cout<<"Found sub-pattern "<<*pi << " in: "<
 }
 break;
 }
 }
 }
 //at this point we have sets of IDs for the individual sub input patterns, we must find the ones that in all of the sets (intersection)

      set good_ids;
 bool first_run = true;

 for(map >::iterator i = matched_sets.begin(); i!= matched_sets.end(); i++) {

 if(first_run) {
 good_ids = i->second;
 first_run = false;
 continue;
 }
 
 set old_good_ids = good_ids;
 
 set & int_set = i->second;
 good_ids.clear();
 set_intersection(int_set.begin(), int_set.end(), old_good_ids.begin(), old_good_ids.end(), inserter(good_ids, good_ids.begin()));
 }
 
 //now print out the strings

 if(good_ids.empty() ) {

 cout<<"No match found!"<
 }else {
 for(set::iterator i = good_ids.begin(); i != good_ids.end(); i++) {

 int str_id = *i;
 string str = id2str[str_id].value;
 cout<<"Matched string: "<
 }

 }
 return !good_ids.empty();


 }


protected:
 IdToString id2str;
 StringToId str2id;
 WordToIds w2id;

};

//test program

int main() {
 vector strings;
 strings.push_back("angry brad pitt");
 strings.push_back("pitt likes jolie");

 Search search;
 search.initialize(strings);

 char buf[255];
 while(std::cin.getline(buf, 255)) {
 search.find_match(buf);
 }
}
[ 打印 ]
閱讀 ()評論 (0)
評論
目前還沒有任何評論
登錄後才可評論.